Neuromorphic computing with nanoscale spintronic oscillators Jacob Torrejon1*, Mathieu Riou1*, Flavio Abreu Araujo1*, Sumito Tsunegi2, Guru Khalsa3†, Damien Querlioz4, Paolo Bortolotti1, Vincent Cros1, Akio Fukushima2, Hitoshi Kubota2, Shinji Yuasa2, M. D. Stiles3 and Julie Grollier1
1 - Unité Mixte de Physique, CNRS, Thales, Univ. Paris-Sud, Université Paris-Saclay, 91767 Palaiseau, France 2 - National Institute of Advanced Industrial Science and Technology (AIST), Spintronics Research Center, Tsukuba, Ibaraki 305-8568, Japan 3 - Center for Nanoscale Science and Technology, National Institute of Standards and Technology, Gaithersburg, Maryland 20899-6202, USA 4 - Centre de Nanosciences et de Nanotechnologies, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91405 Orsay France
* [email protected], [email protected], [email protected]
† now at: Cornell University, Department of Materials Science and Engineering, Ithaca, NY 148531501, USA
In the brain, hundreds of billions of neurons develop rhythmic activities and interact to process information. Taking inspiration from this behavior to realize high-density, low-power neuromorphic computing will require huge numbers of nanoscale non-linear oscillators. However, there is no proof of concept today of neuromorphic computing with nano-oscillators. Indeed, nanoscale devices tend to be noisy and to lack the stability required to process data in a reliable way. Here, we show experimentally that a nanoscale spintronic oscillator can achieve spoken digit recognition with accuracies similar to those of state-of-the-art neural networks. We pinpoint the regime of magnetization dynamics that leads to the highest performance in the space of bias conditions. These results, combined with the exceptional ability of these spintronic oscillators to interact with each other, their long lifetime and their low energy consumption, open the path to fast, parallel, on-chip computation based on networks of oscillators.

For cognitive tasks such as image or speech recognition, the brain processes information faster and with much less energy than any computer. While the underlying processes have not yet been fully elucidated, we know that biological neurons behave as non-linear oscillators, which spike periodically, interact and synchronize1. This behavior has led to proposals for neuromorphic computing that leverage the non-linear dynamics of coupled oscillators2–6. However, realizing the most challenging tasks requires many oscillators. The brain contains hundreds of billions of neurons, today's deep networks contain millions of neurons7, and the artificial neural networks of the future will contain hundreds of millions. Simulating huge, complex arrays of oscillators in software is a slow and energy-intensive process. Implementing them in hardware could allow fast, low-power computation. However, if we want to build small and useful chips using non-linear dynamics as a principle for computing, it will be necessary to emulate neurons with nanoscale oscillators. Indeed, a simple estimate indicates that, in order to fit a hundred million oscillators organized in a two-dimensional array inside a chip the size of a thumb, their lateral dimensions must be smaller than one micrometer. Unfortunately, oscillators with nanoscale dimensions are susceptible to noise: their overall behavior is affected by thermal fluctuations and their features may drift with time8–11. This is an issue for neuromorphic computing, which can tolerate unreliability in the inputs to be classified but typically not in the processing circuitry3. Indeed, automatic pattern classification requires generating the same output of the neural network each time similar inputs are presented. Because of these difficulties, there has been no demonstration yet of neuromorphic computing with nanoscale oscillators as neural primitives, despite multiple theoretical proposals2–6. In this article, we prove experimentally that nanoscale magnetic oscillators called spin-torque nano-oscillators12–14 overcome this challenge and can be used to emulate the oscillatory behavior of collections of neurons (Fig. 1a). We first highlight their two main assets for neuromorphic computing: their exceptional ability to synchronize and their well-controlled magnetization dynamics. We then demonstrate experimentally that a spin-torque oscillator can perform neuromorphic tasks such as spoken digit recognition with state-of-the-art performance.
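As a quick check of the density estimate above, the short sketch below computes the maximum lateral pitch of a two-dimensional array of one hundred million oscillators; the 1 cm2 chip area is an assumption chosen for illustration.

```python
import math

# Illustrative density estimate: fit N oscillators on a square chip of area A.
n_oscillators = 1e8      # one hundred million oscillators
chip_area_m2 = 1e-4      # assumed thumb-sized chip: 1 cm^2

# Maximum pitch of a two-dimensional square array.
pitch_m = math.sqrt(chip_area_m2 / n_oscillators)
print(f"Maximum lateral pitch: {pitch_m * 1e6:.1f} um")  # prints 1.0 um
```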
Step-by-step, we dissect the mechanisms underlying pattern recognition using a much simpler, archetypal task: the classification of waveforms such as sines and squares. We show that perfect classification is synonymous with high signal-to-noise ratios and is achieved, for these oscillators, in specific dynamical regimes. Finally, we compare spin-torque nano-oscillators to other hardware oscillators, show that they are among the most promising for neuromorphic computing, and show that they are ready to be interconnected in large neural networks for fast, parallel, dynamical neuromorphic computing.
Leveraging the amplitude dynamics of spin-torque nano-oscillators for computing
Spin-torque nano-oscillators, illustrated in Fig. 1b, have a lateral size on the order of tens to hundreds of nanometers15,16. Composed of two nanometer-thick ferromagnetic layers separated by a non-magnetic spacer, they have exactly the same structure as current magnetic memory cells. Therefore, they have high endurance, operate at room temperature, and can be fabricated in large numbers (up to hundreds of millions today) on a chip17,18. When a charge current (between microamps and milliamps) flows through these junctions, it becomes spin-polarized and in turn exerts a torque on the magnetization19,20, leading to sustained precession at frequencies from hundreds of megahertz to several tens of gigahertz. Through the effect of magneto-resistance, the magnetization oscillations are converted into voltage oscillations of up to tens of millivolts (Fig. 1b, middle), which are easy to detect21. These features of spin-torque nano-oscillators are promising for neuromorphic computing, as highlighted recently22–27. However, their use for an actual computing task has never been demonstrated.
Figure 1: Spin-torque nano-oscillator for neuromorphic computing. (a) Schematic of a neuron and its action potentials (spikes). (b) Left: schematic of a spin-torque nano-oscillator, consisting of a non-magnetic spacer sandwiched between two ferromagnetic layers. Middle: measured voltage emitted by the oscillator as a function of time, for a steady current injection of 7 mA at µ0H = 430 mT. The dotted blue line highlights the amplitude Ṽ. Right: voltage amplitude Ṽ as a function of current at µ0H = 430 mT. The typical excursion of the voltage amplitude when an input signal Vin of 250 mV is injected is highlighted in magenta (here for a bias current of 6.5 mA and µ0H = 430 mT). (c) Schematic of the experimental set-up. A dc current IDC as well as a fast-varying waveform encoding the input are injected into the spin-torque nano-oscillator. The microwave voltage emitted by the oscillator in response to the excitation is measured with an oscilloscope. For computing, the amplitude Ṽ of the oscillator is used, measured directly with a microwave diode. (d) Input (magenta) and measured microwave voltage emitted by the oscillator as a function of time. Here IDC = 6 mA, µ0H = 430 mT. The envelope Ṽ(t) of the oscillator signal is highlighted in blue.
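In the experiment, the envelope Ṽ(t) shown in Fig. 1d is obtained with a microwave diode. In software, an equivalent envelope can be extracted numerically with a Hilbert transform, as in the illustrative sketch below; the synthetic carrier frequency, modulation and amplitudes are assumptions, not the measured device parameters.

```python
import numpy as np
from scipy.signal import hilbert

# Synthetic oscillator voltage: 300 MHz carrier with a slowly varying amplitude.
fs = 10e9                                   # assumed oscilloscope sampling rate (10 GS/s)
t = np.arange(0, 5e-6, 1 / fs)              # 5 us trace
amplitude = 10e-3 * (1 + 0.3 * np.sin(2 * np.pi * 1e6 * t))   # ~10 mV envelope
v = amplitude * np.cos(2 * np.pi * 300e6 * t)

# Software analogue of the microwave diode: extract the envelope V~(t).
envelope = np.abs(hilbert(v))
```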
The most basic properties required to emulate neurons in hardware are a non-linear response to inputs and insensitivity to noise. Our central idea is to exploit the stability and non-linearity of the amplitude dynamics of spin-torque nano-oscillators for neuromorphic computing. At steady state, the amplitude of the oscillations is confined in an attractive potential well set by the balance between the torque exerted by the injected current and the magnetic damping, which enhances its robustness to noise28. In addition, as shown in the right panel of Fig. 1b, the response of the amplitude Ṽ to the current I is non-linear: Ṽ ∝ √(I − Ith), where Ith is the threshold current above which steady oscillations occur29. We therefore propose to encode neural inputs in the current injected into the oscillator and to use the amplitude response Ṽ as the neural output. The nanoscale oscillators used for this study are circular magnetic tunnel junctions, with a 6 nm-thick FeB free layer, a diameter of 375 nm, and a magnetic vortex as the ground state (more information in Methods). Using an arbitrary waveform generator, we inject a varying current through the junctions in addition to the dc current, using the set-up schematized in Fig. 1c. The resulting voltage oscillations, recorded with an oscilloscope, are shown in Fig. 1d. As can be observed, the amplitude of the oscillator, highlighted in blue, varies in response to the injected current, with a relaxation time of a few hundred nanoseconds28. In the following, for neuromorphic computing tasks, we measure the amplitude dynamics Ṽ(t) directly with a microwave diode.

Spoken digit recognition with a single spin-torque nano-oscillator
Recent studies have elegantly pointed out that time-multiplexing can be leveraged to emulate a full neural network with a single oscillator30–32. Here we use this approach, a form of "reservoir computing"4,5, to demonstrate the ability of spin-torque nano-oscillators to perform neuromorphic tasks. We experimentally perform a benchmark task called spoken digit recognition. The input data, taken from the TI-46 database33, are audio waveforms of isolated spoken digits (0 to 9) pronounced by five different female speakers. The goal is to recognize the digits, independently of the speaker. The main steps of the procedure for spoken digit recognition are illustrated in Fig. 2a. As acoustic features are mainly encoded in frequencies, each audio waveform is first filtered into different frequency channels, a standard procedure for speech recognition34. In this study, we compare two different filtering methods: a spectrogram obtained by a Fourier transform, and a more complex cochlear model35 based on the behavior of biological ears. The filtered input is preprocessed as in refs 30–32 (see Methods), then applied as a current to the oscillator via the arbitrary waveform generator, using the set-up of Fig. 1c. The transient responses of the oscillator's voltage amplitude Ṽ are recorded for each utterance of each spoken digit. Then, using a computer, data points sampled from these time traces (for example, Ṽ(t) in Fig. 1d) are linearly combined to reconstruct ten different outputs, one for each digit (see details in Methods). Each of these outputs should be as close as possible to 1 when the corresponding digit is presented at the input and close to 0 otherwise. Finding the optimal coefficients of the linear combination satisfying these conditions is called training.
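The encoding described at the start of this section can be summarized by a minimal numerical sketch: the steady-state envelope grows as √(I − Ith) above threshold, and the transient response relaxes towards it over a few hundred nanoseconds. The threshold current, prefactor and relaxation time used below are illustrative assumptions rather than fitted device values.

```python
import numpy as np

I_TH = 5.0e-3    # assumed threshold current (A)
K = 1.0          # assumed prefactor (V per sqrt(A))
TAU = 300e-9     # assumed amplitude relaxation time (s)

def steady_state_amplitude(i):
    """Steady-state envelope V~ = K * sqrt(I - Ith) above threshold, 0 below."""
    return K * np.sqrt(np.maximum(i - I_TH, 0.0))

def amplitude_response(i_t, dt=2e-9, v0=0.0):
    """Relax the envelope towards its steady-state value for a current waveform i_t."""
    v = np.empty_like(i_t)
    v_prev = v0
    for n, i in enumerate(i_t):
        v_prev += (steady_state_amplitude(i) - v_prev) * dt / TAU
        v[n] = v_prev
    return v

# Neural input encoded as a current waveform around a 6 mA bias.
t = np.arange(2000) * 2e-9
i_input = 6.0e-3 + 0.5e-3 * np.sign(np.sin(2 * np.pi * 1e6 * t))
v_out = amplitude_response(i_input)   # neural output: envelope V~(t)
```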
As each digit has been pronounced ten times by each of the five speakers, we can use part of the data to determine the coefficients (training), and the rest to evaluate the recognition performance (testing) (see Methods for more details). In order to assess the impact of our oscillator on the quality of recognition, we always perform a control trial without the oscillator. In that case, the pre-processed input traces are used directly to reconstruct the outputs on the computer, without going through the experimental set-up.
Figure 2: Spoken digit recognition. (a) Principle of the experiment. (b-c) Spoken digit recognition rates on the testing set as a function of the number of utterances N used for training (since there are many ways to pick the N utterances used for training out of 10, the recognition rate is an average over all combinations), (b) for the spectrogram filtering (µ0H = 430 mT, IDC = 6 mA) and (c) for the cochlear filtering (µ0H = 448 mT, IDC = 6 mA). The magenta curves are the experimental results using the magnetic oscillator. The black curves are a control trial, in which the pre-processed inputs are used directly to reconstruct the output on a computer, without going through the experimental set-up. The error bars correspond to the standard deviation of the word recognition rate over all possible choices of N training utterances out of ten, shown above and below the mean value.

In both Fig. 2b and c, the magenta curves corresponding to experiments lie above the black curves of the control trial, indicating that the spintronic oscillator improves the quality of spoken digit recognition, despite the added noise that comes with its nanometric size. In Fig. 2b, we present a case in which the acoustic feature extraction, achieved through the linear spectrogram filtering, plays a minimal role in classification. As can be seen from the black line corresponding to the control trial, without the oscillator the system is unable to recognize the digits independently of the speaker: the recognition rates are low, around 10 %, and the performance decreases with the amount of training data (see Methods). With the oscillator (magenta line in Fig. 2b), the recognition rates are improved by more than 55 % and reach values above 65 %. This example highlights the crucial role played by the oscillator in the recognition process. With the non-linear cochlear filtering (Fig. 2c), which is the standard in reservoir computing30–32 and has been optimized for acoustic feature extraction, we achieve recognition rates of up to 99.2 % (99.8 % when experimental traces are averaged sixteen times), as high as the state of the art (with cochlear filtering, existing software and hardware neural networks achieve recognition rates between 95.7 % and 99.8 % for the same task)30–32,36.
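For reference, the averaging over training-set choices described in the caption can be written compactly as below; evaluate_split is a hypothetical placeholder for a routine that trains the readout on the chosen utterances and returns the word recognition rate on the remaining ones.

```python
from itertools import combinations
from statistics import mean, stdev

def recognition_rate_vs_n(n_train, evaluate_split, n_utterances=10):
    """Mean and standard deviation of the word recognition rate over every way of
    picking n_train of the n_utterances for training; the rest are used for testing."""
    rates = []
    for train_ids in combinations(range(n_utterances), n_train):
        test_ids = [u for u in range(n_utterances) if u not in train_ids]
        rates.append(evaluate_split(train_ids, test_ids))  # hypothetical routine
    return mean(rates), stdev(rates)
```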
Mechanism underlying pattern recognition: analysis of a simple classification task
We now examine in detail the role played by the spin-torque nano-oscillator in pattern recognition. For this purpose, we use simpler input waveforms: a series of randomly ordered full sine and square waveforms with the same period (Fig. 3a), allowing a step-by-step analysis of the processes at play. The task is to recognize whether the input is part of a sine or a square waveform at each discrete value of the input, indicated by red dots in Fig. 3a (ref. 31).
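A possible way to generate such an input sequence (8 samples per full period, random ordering) is sketched below; the number of waveforms and the random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sine_square_sequence(n_waveforms=80, points_per_period=8):
    """Random sequence of full sine and square periods sampled at 8 points each,
    with the class label of every sample (0 = sine, 1 = square)."""
    k = np.arange(points_per_period)
    sine = np.sin(2 * np.pi * k / points_per_period)          # includes the values +1 and -1
    square = np.where(k < points_per_period // 2, 1.0, -1.0)
    samples, labels = [], []
    for _ in range(n_waveforms):
        if rng.integers(2) == 0:
            samples.append(sine)
            labels.append(np.zeros(points_per_period))
        else:
            samples.append(square)
            labels.append(np.ones(points_per_period))
    return np.concatenate(samples), np.concatenate(labels)

inputs, labels = sine_square_sequence()
```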
Figure 3: Classification of sine and square waveforms. (a) Input waveform. The task is to discriminate sines from squares at each red point. There are 8 discrete red points in each sine and square waveform. (b) Pathway taken by the information when it propagates non-linearly through a spatial neural network. (c) The trajectory of a single non-linear oscillator in time can emulate neural information pathways. The state of the oscillator at time t influences the state of the oscillator at t + θ, which in turn influences the state at time t + 2θ, and so on. (d) Zoom on the preprocessed input waveform for a sine and a square. The corresponding fast binary input sequences are numbered from 1 to 16 (8 for the sine, 8 for the square). Here, 12 virtual neurons are used. (e) Envelope Ṽ(t) of the experimental oscillator's emitted voltage amplitude (µ0H = 466 mT, IDC = 7 mA). The trajectories created in response to the input waveform are numbered from 1 to 16 (8 for the sine, 8 for the square). (f) Superposition of the preprocessed inputs generating trajectories (3) and (11), corresponding respectively to a sine and a square. They are identical because they both encode the input value +1. (g) Superposition of the oscillator voltage trajectories (3) and (11). (h) Target for the output. (i) Output Ṽout = Σi wi·Ṽi reconstructed from the voltages Ṽi extracted from each of the oscillator's trajectories (red) and target (grey), in response to an input waveform with 80 randomly arranged sines and squares. The magnetic field is µ0H = 447 mT and the applied current is 7.2 mA. The results in this panel are based on 24 virtual neurons separated by θ = 100 ns.

The working principle of neural networks is to process the information presented at the input by transforming it non-linearly while it propagates through the network7. In a trained neural network, similar inputs trigger similar pathways of information, while different inputs result in distinct
pathways, enabling classification and recognition of input data. In spatial neural networks, as in Fig. 3b, many non-linear neurons are required to create these pathways (illustrated in blue). On the other hand, as illustrated in Fig. 3c, neural pathways can also be defined in time, for example by the non-linear trajectory of the amplitude of a single oscillator. However, in this case, generating a complex pathway in response to each value of the input (for example, each red dot in Fig. 3a) requires preprocessing the input30. This preprocessing consists of multiplying each point of the input by the same fast-varying binary sequence (as illustrated in Fig. 3d). Each input then triggers a specific trajectory of the oscillator's amplitude37. In addition, if the time step of the sequence is set to a fraction of the oscillator's relaxation time, a transient dynamical state is generated. In that case, the oscillator's state at time t determines its state at time t + θ and so on (Fig. 3c), just as the state of neuron i determines the state of neuron j in Fig. 3b. In other words, the single oscillator functions as a set of virtual neurons coupled in time, in contrast to standard neural networks in which they are separated in space. We use this procedure to trigger a specific transient trajectory of the oscillator's voltage amplitude for each of the input values in red in Fig. 3a. The fast binary sequence used for preprocessing is defined by its duration τ and the number of "neurons" N that it emulates. The interval of time separating each "neuron" is θ = τ/N. In Fig. 3a-h, for the purpose of clarity, N is chosen small, equal to 12. Fig. 3d shows the resulting preprocessed input corresponding to 16 input (red) points, 8 for a sine and 8 for a square. The corresponding 16 measured trajectories of the oscillator's voltage amplitude are shown in Fig. 3e. The oscillator has accomplished its role if the 8 trajectories obtained in response to a sine are all different from the 8 trajectories obtained in response to a square. This separation is difficult because a sine and a square can take the same values at the input (for example, +1), as shown in Fig. 3f, where the input sequences numbered 3 (sine) and 11 (square) have been superposed. However, even in this case, the response of the oscillator to these identical inputs is different, as shown in Fig. 3g, where they are superposed. This behavior, which allows a perfect separation between sine and square inputs, results from the finite relaxation time of the oscillator. Owing to its memory of past events, the oscillator responds differently to identical inputs if the preceding inputs were different (as is the case for sequences 2 and 10). Fig. 3g also demonstrates that the amplitude noise of the spin-torque nano-oscillator is low enough to distinguish the two traces clearly. In order to finalize the classification, the output is reconstructed through a linear combination of all the "neuron" values (amplitudes Ṽi of the envelope Ṽ(t)) in each of the oscillator's trajectories: Ṽout = Σi wi·Ṽi, as illustrated in Fig. 3h. For sine/square classification, we choose as a target for the reconstructed output the value 0 for a sine and 1 for a square31. This requires finding a fixed set of weights wi (the same for all input sequences) satisfying the condition Ṽout = Ṽtarget at all times.
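The preprocessing step just described, multiplying every input point by the same fast binary mask so that N virtual neurons separated by θ = τ/N are created, can be sketched as follows. The mask length and time step match the values quoted in the text, but the code is an illustration rather than the exact experimental script.

```python
import numpy as np

rng = np.random.default_rng(42)

N_NEURONS = 24            # virtual neurons per input point (N)
THETA = 100e-9            # time separating two virtual neurons (s)
TAU = N_NEURONS * THETA   # duration of the binary mask applied to each input point

# Fixed random binary mask of +1 / -1 values, reused for every input point.
mask = rng.choice([-1.0, 1.0], size=N_NEURONS)

def preprocess(inputs):
    """Each sample of `inputs` becomes a fast sequence of N_NEURONS masked values
    that drives the oscillator for a duration TAU."""
    return np.concatenate([u * mask for u in inputs])

# Example: a sine sample and a square sample that are both equal to +1 produce
# identical drive sequences; only the oscillator's memory separates them.
drive = preprocess(np.array([1.0, 1.0]))
```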
We realize this training procedure by recording the voltage emitted by the oscillator during the first half of the sequence composed of 160 sines and squares, then performing with a computer a simple linear regression to find the optimum coefficients wi, as described in Methods. The voltages emitted by the oscillator during the second half of the input sequence, combined with the coefficients wi found on the first half, are then used to reconstruct Ṽout with a computer. As can be seen in Fig. 3i, the output reconstructed from the oscillator response (with N = 24) follows the target closely, with a root mean square (RMS) deviation of 11 % ± 1 % (4.1 % ± 0.2 % when traces are averaged 16 times). As a single line (dashed blue line at Ṽout = 0.5 in Fig. 3i) discriminates without any error between Ṽout < 0.5 for sines and Ṽout > 0.5 for squares, classification is perfect.
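A sketch of the readout training just described: the recorded envelope samples are arranged into a state matrix, the weights follow from a linear least-squares fit on the first half of the sequence, and classification on the second half uses the 0.5 threshold. The synthetic `states` array only illustrates the shapes involved; in the experiment it holds the measured envelope values of the 24 virtual neurons for each input point.

```python
import numpy as np

def train_readout(states, targets):
    """Least-squares readout weights; states has shape (time steps, N neurons),
    targets is 0 for sine samples and 1 for square samples."""
    w, *_ = np.linalg.lstsq(states, targets, rcond=None)
    return w

def classify(states, w, threshold=0.5):
    """Reconstructed output and its thresholded sine/square decision."""
    out = states @ w
    return out, (out > threshold).astype(int)

# Illustrative use with synthetic data.
rng = np.random.default_rng(1)
states = rng.normal(size=(640, 24))
targets = rng.integers(0, 2, size=640).astype(float)
w = train_readout(states[:320], targets[:320])
out, prediction = classify(states[320:], w)
rms_deviation = np.sqrt(np.mean((out - targets[320:]) ** 2))
```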
Dynamical regime for optimal waveform classification
Figure 4: Conditions for optimal waveform classification. (a) Top: map of the root mean square of the deviations of the output from the target versus dc current IDC and magnetic field Happ. Bottom: map of the maximal response over noise ratio Vup·Vdw/ΔV. (b) Extraction of parameters from the time traces of the oscillator's response. Top: maximum positive (Vup) and negative (Vdw) variations of the oscillator's amplitude in response to the varying preprocessed input. Bottom: noise ΔV of the oscillator's voltage at steady state under IDC. (c) Maps in the IDC-Happ plane of, top: the maximal response of the oscillator to the input, Vup·Vdw; bottom: the inverse of the amplitude noise, 1/ΔV. The threshold current Ith is indicated by a white dashed line.

The ability of the reconstructed output to follow the target varies strongly with the bias conditions of the oscillator, as can be seen in the top panel of Fig. 4a, where the root mean square of the deviations changes from 10 % to over 30 %. Indeed, the dynamical features of spin-torque nano-oscillators, such as the amplitude of oscillations and the noise, depend on the injected dc current IDC and the applied magnetic field Happ. We show below that the breadth of the amplitude variations and the amplitude noise can be used as predictors of the oscillator's ability to perform pattern recognition. Good recognition performance is found when the oscillator responds well to the time-varying preprocessed input, with large amplitude variations Vup and Vdw in both the positive and negative directions (Fig. 4b, top). On the other hand, poor classification results when the oscillator's noise ΔV at the bias point (Fig. 4b, bottom) is high. As shown in Fig. 4b, we extract these parameters from the time traces of the oscillator's emitted voltage at each bias point, and plot in Fig. 4c the corresponding figures of merit Vup·Vdw and 1/ΔV as a function of current and field. The red regions of large oscillation amplitude in the top panel of Fig. 4c correspond to low magnetic fields, where the magnetization is weakly confined, and high currents, where the spin torque on the magnetization is maximal. The blue regions of high noise in the bottom panel of Fig. 4c correspond to areas just above the threshold current Ith for oscillations, where the amplitude of oscillations Ṽ grows rapidly as a function of current and becomes sensitive to external fluctuations29. As can be seen by comparing the top and bottom panels of Fig. 4c, a range of bias conditions (currents between 6 mA and 7 mA and magnetic fields from 350 mT to 400 mT) features wide variations of the oscillation amplitude together with low noise, leading to perfect classification (typically achieved with an RMS deviation below 15 %).
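The two figures of merit introduced above can be estimated directly from time traces, as in the sketch below: the maximum positive and negative excursions of the envelope under drive give Vup and Vdw, and the standard deviation of the envelope at the dc bias point gives the noise ΔV. The way the traces are segmented here is an assumption made for illustration.

```python
import numpy as np

def response_over_noise(envelope_driven, envelope_steady):
    """Vup * Vdw / dV from two envelope traces: one recorded while the preprocessed
    input is applied, one recorded under the dc bias current alone."""
    baseline = np.mean(envelope_steady)
    v_up = np.max(envelope_driven) - baseline   # largest positive excursion
    v_dw = baseline - np.min(envelope_driven)   # largest negative excursion
    dv = np.std(envelope_steady)                # amplitude noise at steady state
    return v_up * v_dw / dv
```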
The good match between the map of Vup·Vdw/ΔV in the bottom panel of Fig. 4a and the map giving the classification performance (Fig. 4a, top) confirms that the best conditions for classification correspond to regions with an optimal compromise between low noise and large amplitude variations. The necessity of a high signal-to-noise ratio for efficient neuromorphic computing, highlighted here for magnetic oscillators, is a rule of thumb that applies to any type of nanoscale oscillator.

Towards a fully parallel neural network of interconnected spin-torque nano-oscillators
The time-multiplexing approaches that we have used to demonstrate that spin-torque nano-oscillators can perform neuromorphic computing are convenient for emulating small neural networks. For the next step, parallel spatial networks containing many oscillators will be necessary to speed up the most challenging computing tasks. Indeed, a fully parallel neural network composed of one million neurons opens the possibility of computing in only one microsecond what a single oscillator would compute in one second. However, not all types of oscillators are suitable for building such networks. First, they should be able to interact dynamically, like the neurons they emulate. A good measure of this property is their ability to mutually synchronize. Second, the energy consumed by an oscillator during each computation should be low. The associated figure of merit is the energy per oscillation, which is minimized if the oscillator is fast. Third, the power dissipated by the chip should be below 100 W/cm2 (ref. 38). In other words, to fit several million oscillators in a chip smaller than 1 cm2, the power dissipated by each oscillator should not exceed 10 µW. Lastly, the oscillators should have a long lifetime, allowing a large number of computations before they fail.
Oscillator type | Lateral dimensions | Energy / oscillation | Frequency | Power consumption | Demonstrated operating time | Ability to synchronize | References
CMOS neuron | > 30 µm | 265 pJ | 10 Hz | 265 nW | Infinite | Yes | [39]
Accelerated CMOS neuron | 10 µm | 8.5 pJ | 1 MHz to 10 MHz | 8 µW to 40 µW | Infinite | Yes | [40]
CMOS ring oscillator | 6 µm | 6 nJ | 200 kHz | 1.2 nW | Infinite | Unknown | [41]
CMOS ring oscillator | 6 µm | 30 fJ | 1.5 GHz | 50 µW | Infinite | Unknown | [41]
CMOS ring oscillator | 300 µm | 1 pJ | 8 GHz to 16 GHz | 20 mW | Infinite | Yes | [42]
Electro-optical oscillator | > 1 cm | 1 nJ | GHz | 1 W (laser) | Infinite | Yes | [30]
Memristive oscillator | > 100 nm | > 33 pJ | 300 MHz | > 20 µW | > Hours | Yes | [43,44,11]
Vortex spin-torque oscillator | 300 nm | 3 pJ | 300 MHz | 1 mW | > Months | Yes | [45]
10 nm spin-torque oscillator (projection) | 10 nm | 100 aJ | 10 GHz | 1 µW | > Months | Yes | [16]
Figure 5: Table comparing different oscillator candidates for neuromorphic computing.

Figure 5 compares different existing classes of oscillators according to these criteria. Oscillators based on CMOS technology can have low energy consumption and power dissipation, but they cannot be scaled down below tens of micrometers39–42. Optical and electro-optical oscillators achieve neuromorphic computing with high performance thanks to their delayed feedback dynamics, but their size is macroscopic and their energy consumption is high30–32. Memristive oscillators can emulate complex neural dynamics43,44,46, but their power consumption will need to be lowered and
their lifetime increased before they can be used11. Spin-torque nano-oscillators, through their strong phase-amplitude coupling9,10,28, have an enhanced ability to interact compared to other types of oscillators29, which is an advantage for building large arrays. Recently, tens of spin-torque nano-oscillators have been synchronized experimentally through magnetic interactions, and the results indicate that there is no upper bound on the number of oscillators that can synchronize47,48. The spin-torque vortex oscillators studied in this article fulfill all criteria, except that their power consumption is close to 1 mW, which probably limits the size of achievable arrays to hundreds of thousands of oscillators. However, the power consumption of spin-torque nano-oscillators can be reduced by scaling down their dimensions. In particular, with a power consumption of 1 µW, 10 nm spin-torque nano-oscillators (achievable with current technologies15,16) fulfill all the criteria that need to be combined for building large-scale arrays. These considerations, combined with our first demonstration of neuromorphic computing with a spin-torque nano-oscillator, open the path to building large-scale spintronic neural networks that exploit magnetization dynamics for computing27.
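The energy-per-oscillation figure of merit used in the table is simply the dissipated power divided by the oscillation frequency; the short sketch below reproduces the orders of magnitude quoted for the vortex oscillator studied here and for the projected 10 nm device.

```python
def energy_per_oscillation(power_w, frequency_hz):
    """Energy dissipated during one oscillation period (J)."""
    return power_w / frequency_hz

vortex = energy_per_oscillation(1e-3, 300e6)          # ~3e-12 J, i.e. a few pJ
projected_10nm = energy_per_oscillation(1e-6, 10e9)   # 1e-16 J = 100 aJ
```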
Methods

Samples
Magnetic tunnel junction films with a stacking structure of buffer/PtMn(15)/Co71Fe29(2.5)/Ru(0.9)/Co60Fe20B20(1.6)/Co70Fe30(0.8)/MgO(1)/Fe80B20(6)/MgO(1)/Ta(8)/Ru(7) (thicknesses in nm) were prepared by ultra-high vacuum (UHV) magnetron sputtering. After annealing at 360 °C for 1 h, the resistance-area product (RA) was 3.6 Ω·µm2. Circular MTJs with a diameter of 375 nm were patterned using Ar ion etching and e-beam lithography. The resistance of the samples is close to 40 Ω, and the magneto-resistance ratio is about 135 % at room temperature. The FeB layer presents a vortex structure as the ground state for the dimensions used here. In a small region called the core of the vortex, the magnetization spirals out of plane. Under dc current injection, the core of the vortex steadily gyrates around the center of the dot with a frequency in the range of 250 MHz to 400 MHz for the oscillators considered here. Vortex dynamics driven by spin torque are well understood49, well controlled, and have been shown to be particularly stable28.
Measurement set-up
The experimental implementation used to perform the spoken digit recognition and sine/square classification tasks is illustrated in Fig. 1c. The preprocessed input signal Vin is generated by a high-frequency arbitrary waveform generator and injected as a current through the magnetic nano-oscillator. The sampling rate of the source is set to 200 MHz (20 points per interval of time θ) for the spoken digit recognition task, and to 500 MHz (50 points per interval of time θ) for the classification of sines and squares. The peak-to-peak variation of the input signal is 500 mV, which corresponds to peak-to-peak current variations of 6 mA, as illustrated in the right panel of Fig. 1b (part of the incoming signal is reflected owing to impedance mismatch). The bias conditions of the oscillator are set by a dc current source and an electromagnet that applies a field perpendicular to the plane of the magnetic layers. The oscillating voltage emitted by the nano-oscillator is rectified by a planar tunnel microwave diode with a bandwidth from 0.1 GHz to 12.4 GHz and a response time of 5 ns. The input dynamic range of the diode is between 1 µW and 3.15 mW, corresponding to a dc output level between 0 mV and 400 mV, respectively.
We use an amplifier to bring the emitted power of the nano-oscillator into the working range of the diode. The output signal is then recorded by a real-time oscilloscope. In Figs 1b (right panel), 3e and g, and 4a, b and c, the amplitude of the signal emitted by the oscillator is shown without amplification (the signal measured after the diode has been divided by the total amplification of the circuit, +21 dB). If, owing to sampling errors, the measured oscillator envelope is shifted with respect to the input, the classification accuracy can be degraded. We use alignment marks to align our measurements with the input when we reconstruct the output. The alignment precision is 1 ns.

Sine and square classification
For this classification task the input is a random sequence of 160 sines and squares with the same period. Each period is discretized into 8 points separated by a time step τ. The preprocessing consists in multiplying the value of each point by the same binary sequence, generated by a random distribution of +1 and -1 values. The sequence contains 24 values, so during each time step τ, 24 neurons Ṽi are emulated. For efficient classification, the time step between two consecutive transient states is set to θ = 100 ns (about 5 times smaller than the relaxation time of the oscillator, as recommended in ref. 30). This time is short enough to guarantee that the oscillator is maintained in its transient regime, so that the emulated neurons are connected to each other, but it is also long enough to let the oscillator respond to the input excitation. The post-processing of the output consists of two distinct steps. The first is called the training (or learning) process. Its goal is to determine the set of weights wi, using some of the oscillator's trajectories as examples. The second is called the classification (or recognition) process, which consists of classifying the rest of the trajectories using the weights determined during the training. This post-processing is done offline on a computer. The target Ṽtarget for the network output is 0 for all the trajectories in response to a sine and 1 for all the trajectories in response to a square. The best weights are found by minimizing the difference between Ṽout and Ṽtarget for all the examples. In practice, the optimal values are determined by using the linear Moore-Penrose pseudo-inverse operator, denoted by a dagger symbol (†). If we consider the vector Ṽtarget containing the targets for all the time steps τ used for the training, and the matrix S containing all the neuron responses for all the time steps τ used for the training, then the vector W containing the optimal weights is given by W = S† Ṽtarget. For sine/square recognition, we record five points instead of one for each "neuron" when we measure the output of the oscillator. During post-processing, we use these additional states between Ṽi and Ṽi+1 to increase the number of coefficients available for solving the problem, and thus increase the classification accuracy. In addition, the best performance does not necessarily correspond to a target exactly in phase with the oscillator's output. For the sine/square recognition task, shifting the target by an integer number of neurons increases the performance. The results presented here correspond to shifts of N/2, a value that was optimal in our case. The standard deviation of the RMS deviation, obtained with 10 repetitions, is around 1 % (0.2 % when traces are averaged 16 times).
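A compact sketch of this training step, assuming the neuron responses are stored row-wise in a matrix S (one row per time step τ, one column per recorded neuron value): the weights follow directly from the Moore-Penrose pseudo-inverse. The array shapes below are illustrative.

```python
import numpy as np

def train_weights(S, v_target):
    """Moore-Penrose training: W = pinv(S) @ V_target, where each row of S holds
    the recorded neuron values for one time step tau of the training sequence."""
    return np.linalg.pinv(S) @ v_target

# Illustrative shapes: 640 training time steps, 24 neurons x 5 recorded points each.
rng = np.random.default_rng(2)
S = rng.normal(size=(640, 24 * 5))
v_target = rng.integers(0, 2, size=640).astype(float)   # 0 for sine, 1 for square
W = train_weights(S, v_target)
```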
Spoken digit recognition
For this task, the inputs are taken from the NIST TI-46 data corpus33. The input consists of isolated spoken digits pronounced by five different female speakers. Each speaker pronounces each digit 10 times. The 500 audio waveforms are sampled at a rate of 12.5 kHz and have variable time lengths. We have used two different filtering methods: a spectrogram and a cochlear model. The spectrogram (cochlear) filtering gives rise to inputs with 65 (78) frequency channels of variable duration (different numbers of time steps τ). In comparison with the sine/square classification case, the input is composed of 65 (78) temporal variables instead of just one, so the mask is a binary matrix instead of a binary vector. We use 400 equivalent neurons Ṽi to recognize the digits, so the mask is a 65 × 400 matrix for the spectrogram (78 × 400 matrix for the cochlear model) randomly filled with +1 or -1 values. Each emulated neuron is fed with a combination of the 65 (78) inputs with coefficients +1 or -1. The resulting matrix is reshaped column-wise into a vector, which is the input of the oscillator. For this task we record one single point for each "neuron". The target is kept in phase with the output of the oscillator. Once the response of the oscillator is recorded, 10 outputs corresponding to the recognition of each class of digit (that is, 0, 1, ..., 9) are constructed. Each output should be +1 if the signal is the corresponding digit and 0 otherwise. The training thus consists in finding 10 sets of weights, each of them corresponding to the recognition of one class of digit. The optimum weights are found by linear regression, as explained previously for the sine and square classification task. During the classification phase, the 10 reconstructed outputs corresponding to one digit are averaged over all the time steps τ of the signal, and the digit is identified by taking the maximum of the 10 averaged reconstructed outputs. Indeed, the averaged reconstructed output corresponding to the digit should be close to one and the others should be close to zero. The efficiency of the recognition is evaluated with the word success rate, which is the fraction of digits correctly identified. The training can be done using more or less data (here, utterances). We always trained the system using the 10 digits spoken by the 5 speakers. The only parameter changed is the number of utterances used for the training. If we use N utterances for training, we use the remaining 10 − N utterances for testing. However, some utterances are very well pronounced while others are hardly distinguishable. As a consequence, the resulting recognition rate depends on which N utterances are picked for training out of the set of 10 (for example, if N = 2, the utterances picked for training could be numbers 1 and 2, but also 2 and 3, or 6 and 10, or any other combination of 2 out of 10). In order to avoid this bias, the recognition rates that we give in the paper are the average of the results over all possible combinations. The width of the error bars corresponds to the standard deviation of the word recognition rate, shown above and below the mean value. During the control test, in the particular case of the spectrogram filtering method, the success rate decreases as the number of utterances used for the training increases. This phenomenon is due to underfitting50.
The raw spectrogram is not complex enough to allow a correct reconstruction of the target during the training and it gets worse when the number of utterances used for training increases. Adding the oscillator brings complexity and suppresses this phenomenon.
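A sketch of the classification step for spoken digits under the shapes described above: the frequency channels are combined through a fixed binary mask matrix, ten sets of weights give ten reconstructed outputs, which are averaged over the time steps of an utterance, and the recognized digit is the index of the largest average. Array contents are synthetic and the reshape ordering is schematic.

```python
import numpy as np

rng = np.random.default_rng(3)

N_CHANNELS, N_NEURONS, N_DIGITS = 65, 400, 10          # 78 channels for the cochlear model
mask = rng.choice([-1.0, 1.0], size=(N_CHANNELS, N_NEURONS))   # fixed binary mask matrix

def preprocess(utterance):
    """utterance: (time steps, N_CHANNELS) filtered audio -> flattened oscillator drive."""
    return (utterance @ mask).reshape(-1)

def predict_digit(states, W):
    """states: (time steps, N_NEURONS) recorded neuron values for one utterance;
    W: (N_NEURONS, N_DIGITS) readout weights. Returns the recognized digit."""
    outputs = states @ W                         # one reconstructed output per digit
    return int(np.argmax(outputs.mean(axis=0)))  # average over time steps, take maximum

# Synthetic example of the shapes involved.
states = rng.normal(size=(70, N_NEURONS))
W = rng.normal(size=(N_NEURONS, N_DIGITS))
digit = predict_digit(states, W)
```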
1. Buzsaki, G. Rhythms of the Brain. (OUP USA, 2011).
2. Hoppensteadt, F. C. & Izhikevich, E. M. Oscillatory Neurocomputers with Dynamic Connectivity. Phys. Rev. Lett. 82, 2983–2986 (1999).
3. Aonishi, T., Kurata, K. & Okada, M. Statistical Mechanics of an Oscillator Associative Memory with Scattered Natural Frequencies. Phys. Rev. Lett. 82, 2800–2803 (1999).
4. Jaeger, H. & Haas, H. Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication. Science 304, 78–80 (2004).
5. Maass, W., Natschläger, T. & Markram, H. Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations. Neural Comput. 14, 2531–2560 (2002).
6. Hölzel, R. W. & Krischer, K. Pattern recognition with simple oscillating circuits. New J. Phys. 13, 073031 (2011).
7. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
8. Abidi, A. A. Phase Noise and Jitter in CMOS Ring Oscillators. IEEE J. Solid-State Circuits 41, 1803–1816 (2006).
9. Quinsat, M. et al. Amplitude and phase noise of magnetic tunnel junction oscillators. Appl. Phys. Lett. 97, 182507 (2010).
10. Bianchini, L. et al. Direct experimental measurement of phase-amplitude coupling in spin torque oscillators. Appl. Phys. Lett. 97, 032502 (2010).
11. Li, S., Liu, X., Nandi, S. K., Venkatachalam, D. K. & Elliman, R. G. High-endurance megahertz electrical self-oscillation in Ti/NbOx bilayer structures. Appl. Phys. Lett. 106, 212902 (2015).
12. Kiselev, S. I. et al. Microwave oscillations of a nanomagnet driven by a spin-polarized current. Nature 425, 380–383 (2003).
13. Rippard, W. H., Pufall, M. R., Kaka, S., Russek, S. E. & Silva, T. J. Direct-Current Induced Dynamics in Co90Fe10/Ni80Fe20 Point Contacts. Phys. Rev. Lett. 92, 027201 (2004).
14. Özyilmaz, B., Kent, A. D., Sun, J. Z., Rooks, M. J. & Koch, R. H. Current-Induced Excitations in Single Cobalt Ferromagnetic Layer Nanopillars. Phys. Rev. Lett. 93, 176604 (2004).
15. Sato, H. et al. Properties of magnetic tunnel junctions with a MgO/CoFeB/Ta/CoFeB/MgO recording structure down to junction diameter of 11 nm. Appl. Phys. Lett. 105, 062403 (2014).
16. Gajek, M. et al. Spin torque switching of 20 nm magnetic tunnel junctions with perpendicular anisotropy. Appl. Phys. Lett. 100, 132408 (2012).
17. Apalkov, D., Dieny, B. & Slaughter, J. M. Magnetoresistive Random Access Memory. Proc. IEEE 104, 1796–1830 (2016).
18. Hanyu, T. et al. Standby-Power-Free Integrated Circuits Using MTJ-Based VLSI Computing. Proc. IEEE 104, 1844–1863 (2016).
19. Slonczewski, J. C. Current-driven excitation of magnetic multilayers. J. Magn. Magn. Mater. 159, L1–L7 (1996).
20. Berger, L. Emission of spin waves by a magnetic multilayer traversed by a current. Phys. Rev. B 54, 9353–9358 (1996).
21. Tsunegi, S., Yakushiji, K., Fukushima, A., Yuasa, S. & Kubota, H. Microwave emission power exceeding 10 μW in spin torque vortex oscillator. Appl. Phys. Lett. 109, 252402 (2016).
22. Macià, F., Kent, A. D. & Hoppensteadt, F. C. Spin-wave interference patterns created by spin-torque nano-oscillators for memory and computation. Nanotechnology 22, 095301 (2011).
23. Csaba, G., Papp, A. & Porod, W. Spin-wave based realization of optical computing primitives. J. Appl. Phys. 115, 17C741 (2014).
24. Pufall, M. R. et al. Physical Implementation of Coherently Coupled Oscillator Networks. IEEE J. Explor. Solid-State Comput. Devices Circuits 1, 76–84 (2015).
25. Nikonov, D. E. et al. Coupled-Oscillator Associative Memory Array Operation for Pattern Recognition. IEEE J. Explor. Solid-State Comput. Devices Circuits 1, 85–93 (2015).
26. Yogendra, K., Fan, D. & Roy, K. Coupled Spin Torque Nano Oscillators for Low Power Neural Computation. IEEE Trans. Magn. 51, 1–9 (2015).
27. Grollier, J., Querlioz, D. & Stiles, M. D. Spintronic Nanodevices for Bioinspired Computing. Proc. IEEE 104, 2024–2039 (2016).
28. Grimaldi, E. et al. Response to noise of a vortex based spin transfer nano-oscillator. Phys. Rev. B 89, 104404 (2014).
29. Slavin, A. & Tiberkevich, V. Nonlinear Auto-Oscillator Theory of Microwave Generation by Spin-Polarized Current. IEEE Trans. Magn. 45, 1875–1918 (2009).
30. Appeltant, L. et al. Information processing using a single dynamical node as complex system. Nat. Commun. 2, 468 (2011).
31. Paquot, Y. et al. Optoelectronic Reservoir Computing. Sci. Rep. 2 (2012).
32. Martinenghi, R., Rybalko, S., Jacquot, M., Chembo, Y. K. & Larger, L. Photonic Nonlinear Transient Computing with Multiple-Delay Wavelength Dynamics. Phys. Rev. Lett. 108, 244101 (2012).
33. Texas Instruments-Developed 46-Word Speaker-Dependent Isolated Word Corpus (TI46), September 1991, NIST Speech Disc 7-1.1 (1 disc) (1991).
34. Alalshekmubarak, A. Towards A Robust Arabic Speech Recognition System Based On Reservoir Computing. (2014).
35. Lyon, R. A computational model of filtering, detection, and compression in the cochlea. in Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '82 7, 1282–1285 (1982).
36. Yildiz, I. B., Kriegstein, K. von & Kiebel, S. J. From Birdsong to Human Speech Recognition: Bayesian Inference on a Hierarchy of Nonlinear Dynamical Systems. PLOS Comput. Biol. 9, e1003219 (2013).
37. Appeltant, L., Sande, G. V. der, Danckaert, J. & Fischer, I. Constructing optimized binary masks for reservoir computing with delay systems. Sci. Rep. 4, 3629 (2014).
38. Mahajan, R., Chiu, C. & Chrysler, G. Cooling a Microprocessor Chip. Proc. IEEE 94, 1476–1486 (2006).
39. Livi, P. & Indiveri, G. A current-mode conductance-based silicon neuron for address-event neuromorphic systems. in 2009 IEEE International Symposium on Circuits and Systems 2898–2901 (2009). doi:10.1109/ISCAS.2009.5118408
40. Wijekoon, J. H. B. & Dudek, P. Compact silicon neuron circuit with spiking and bursting behaviour. Neural Netw. 21, 524–534 (2008).
41. Tran, D. X. & Dang, T. T. An ultra-low power consumption and very compact 1.49 GHz CMOS Voltage Controlled Ring Oscillator. in 2014 International Conference on Advanced Technologies for Communications (ATC 2014) 239–244 (2014). doi:10.1109/ATC.2014.7043391
42. Tomita, Y. et al. An 8-to-16GHz 28nm CMOS clock distribution circuit based on mutual-injection-locked ring oscillators. in 2013 Symposium on VLSI Circuits C238–C239 (2013).
43. Shukla, N. et al. Ultra low power coupled oscillator arrays for computer vision applications. in 2016 IEEE Symposium on VLSI Technology 1–2 (2016). doi:10.1109/VLSIT.2016.7573439
44. Sharma, A. A., Bain, J. A. & Weldon, J. A. Phase Coupling and Control of Oxide-Based Oscillators for Neuromorphic Computing. IEEE J. Explor. Solid-State Comput. Devices Circuits 1, 58–66 (2015).
45. Lebrun, R. et al. Mutual synchronization of spin torque nano-oscillators through a non-local and tunable electrical coupling. arXiv:1601.01247 [cond-mat] (2016).
46. Pickett, M. D., Medeiros-Ribeiro, G. & Williams, R. S. A scalable neuristor built with Mott memristors. Nat. Mater. 12, 114–117 (2013).
47. Houshang, A. et al. Spin-wave-beam driven synchronization of nanocontact spin-torque oscillators. Nat. Nanotechnol. 11, 280–286 (2016).
48. Awad, A. A. et al. Long-range mutual synchronization of spin Hall nano-oscillators. Nat. Phys., advance online publication (2016).
49. Dussaux, A. et al. Field dependence of spin-transfer-induced vortex dynamics in the nonlinear regime. Phys. Rev. B 86, 014402 (2012).
50. Guyon, X. & Yao, J. On the Underfitting and Overfitting Sets of Models Chosen by Order Selection Criteria. J. Multivar. Anal. 70, 221–249 (1999).
Acknowledgements
This work was supported by the European Research Council (ERC) under Grant bioSPINspired 682955. The authors would like to thank Laurent Larger, Bogdan Penkovsky and François Duport for useful discussions.

Author contributions
The study was designed by JG and MDS, samples were optimized and fabricated by ST, experiments were performed by JT and MR, and numerical studies were realized by FAA, MR and GK; all authors contributed to the analysis of the results and to writing the paper.

Additional information
Reprints and permissions information is available online at www.nature.com/reprints. Correspondence and requests for materials should be addressed to J.G.

Competing financial interests
The authors declare no competing financial interests.