Reconfigurable, Generic, and Programmable Feed Forward Neural Network Implementations on FPGA
By Ayman Youssef Mahgoub
A Thesis submitted to the Faculty of Engineering at Cairo University In partial fulfillment of the Requirements for the Degree of MASTER OF SCIENCE In ELECTRONICS AND COMMUNICATION ENGINEERING
FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
2012
Reconfigurable, Generic, and Programmable Feed Forward Neural Network Implementations on FPGA
By Ayman Youssef Mahgoub
A thesis submitted to the Faculty of Engineering at Cairo University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in ELECTRONICS AND COMMUNICATION ENGINEERING
Under the supervision of
Prof. Dr. Amin Nasar
Electronics and Communication Department, Faculty of Engineering, Cairo University
Dr. Karim Abbass
Electronics and Communication Department, Faculty of Engineering, Cairo University
FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
2012
Reconfigurable, Generic, and Programmable Feed Forward Neural Network Implementations on FPGA
By Ayman Youssef Mahgoub
A thesis submitted to the Faculty of Engineering at Cairo University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in ELECTRONICS AND COMMUNICATION ENGINEERING
Approved by the Examining Committee:
Prof. Dr. Amin Nasar, Thesis Main Advisor _____________________________
Prof. Dr. Elsayed Mostafa, Member _____________________________
Assistant Prof. Hossam A. H. Fahmy, Member _____________________________
FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
2012
Abstract
Neural networks have diverse applications in different fields, for example pattern recognition, control, and time series prediction. This wide variety of applications has motivated researchers to investigate efficient hardware neural network implementations. This work presents two novel generic, scalable, and reconfigurable neural network architectures. Both architectures are implemented on field-programmable gate arrays (FPGAs) to prove their viability. Traditional implementations of feed-forward neural networks have two major drawbacks: 1) the limited resources available on the FPGA compared to the large number of multiplication operations required by the neural network, and 2) the limited reusability of the design across different applications and different neural connections. Our proposed implementations circumvent both issues. The designs reduce resource requirements by time-sharing. The time-shared resources are arranged in a scalable, configurable processing unit. The scalability allows the user to program and implement the design with a variable number of neurons, from a single neuron up to the maximum number of neurons in any layer. The architectures also give the user the ability to reconfigure them for different applications, with programming-like ease and flexibility. A GUI was implemented to allow automatic configuration of the processors for different applications. The performance of the proposed architectures is compared to that of conventional neural network implementations and of a DSP. The results validate the ability of the proposed architectures to be reconfigured for different applications; they also show that the network works at speeds comparable to dedicated implementations of neural networks. To the authors' knowledge, this is the first published work that introduces the ability to trade off between speed and area in neural network hardware implementations.
Table of contents

List of figures ............................................................. ix
List of tables .............................................................. xi
List of abbreviations ....................................................... 1

Chapter 1 Introduction ...................................................... 2
1.0 Design Motivation ....................................................... 2
1.1 Thesis Overview ......................................................... 3
1.2 Thesis Organization ..................................................... 4

Chapter 2 Background Information ............................................ 5
2.1 Introduction to Neural Networks ......................................... 5
2.2 Neural Network Advantages ............................................... 6
2.3 The Human Brain ......................................................... 7
2.4 Basic Elements in a Biological Brain .................................... 8
2.5 How Does a Bio-Neuron Work .............................................. 9
2.6 Translating from a Biological Neuron to an Artificial Neuron ............ 9
2.7 Neural Network Model .................................................... 10
2.8 Neural Network Architectures ............................................ 12
2.8.1 Single-Layer Feed Forward Neural Networks ............................. 12
2.8.2 Multilayer Neural Networks ............................................ 13
2.8.3 Recurrent Networks .................................................... 13
2.9 The Feed Forward Phase .................................................. 14
2.10 Neural Network Learning Algorithms ..................................... 14
2.10.1 Supervised Learning .................................................. 15
2.10.2 Unsupervised Learning ................................................ 17
2.11 Neural Network Applications ............................................ 18
2.11.1 Pattern Recognition .................................................. 18
2.11.2 Function Approximation ............................................... 18
2.11.3 Control .............................................................. 18
2.12 Hardware Neural Network Implementations ................................ 18
2.12.1 Hardware versus Software Implementations ............................. 18
2.12.2 ASIC Neural Network Implementations .................................. 19
2.12.3 FPGA Reconfigurable Neural Network Implementations ................... 20
2.13 Summary ................................................................ 20

Chapter 3 FPGA Implementation Methodologies ................................. 21
3.1 Direct Implementation ................................................... 21
3.2 Reducing the Number of Multipliers ...................................... 22
3.3 Layer Multiplexing ...................................................... 26
3.4 One Neuron Implementation ............................................... 27
3.5 Pipelined FPGA Neural Network Implementations ........................... 27
3.6 Applications Using FPGA Neural Network Implementations .................. 29
3.6.1 ECG Classification .................................................... 29
3.6.2 Face Detection ........................................................ 29
3.6.3 Mobile Robot Navigation ............................................... 30
3.6.4 Communication ......................................................... 30
3.6.5 Control ............................................................... 31
3.6.6 Industry and Other Applications ....................................... 31
3.7 Summary ................................................................. 31

Chapter 4 The Proposed Architectures ........................................ 33
4.1 Introduction ............................................................ 33
4.2 The First Design ........................................................ 33
4.2.1 The Neuron Cell ....................................................... 34
4.2.2 The Code RAM .......................................................... 35
4.2.3 Weight Memory ......................................................... 35
4.2.5 Control Unit .......................................................... 35
4.3 The Second Architecture ................................................. 38
4.3.1 Control Unit .......................................................... 39
4.4 Software Components ..................................................... 40
4.5 System Operation ........................................................ 42
4.6 Summary ................................................................. 44

Chapter 5 Applications and Testing .......................................... 45
5.0 Applications and Testing ................................................ 45
5.1 Time Series Prediction .................................................. 45
5.2 ECG Classification ...................................................... 49
5.3 Discussion and Conclusion ............................................... 53

Chapter 6 Conclusion and Suggestions for Future Work ........................ 53
6.0 Work Conclusion ......................................................... 53
6.1 Future Work ............................................................. 54

Appendix A .................................................................. 55
Appendix B Back Propagation MATLAB Code ..................................... 57
References .................................................................. 59
Acknowledgement
I take this opportunity to express my regards and gratitude to all who contributed to this work, as it was not the outcome of a single person. First of all, I want to thank my supervisors, Dr. Amin Nasar and Dr. Karim Abbass, without whose help and encouragement this work would not have been possible, and who guided me through the steps of this thesis; most of the novel ideas and solutions found in this thesis are the result of numerous discussions with them. Finally, I want to thank my family, without whose support and sacrifice it would not have been possible for me to complete this work.
Ayman Mahgoub
List of figures
Figure (2.1) Brain neural network diagram .................................... 7
Figure (2.2) Neuron cell ..................................................... 8
Figure (2.3) Neural network mathematical model ............................... 10
Figure (2.4) Neuron activation functions ..................................... 11
Figure (2.5) Neural network structure ........................................ 11
Figure (2.6) AND, OR, and NOT gate truth tables .............................. 12
Figure (2.7) AND, OR, and NOT gate neural networks ........................... 12
Figure (2.8) Neural network architectures .................................... 13
Figure (2.9) Supervised learning architecture ................................ 15
Figure (2.10) Clustering algorithm ........................................... 16
Figure (2.11) Neural network control model ................................... 18
Figure (3.1) Conventional neural network ..................................... 21
Figure (3.2) Neural network with reduced number of multipliers ............... 22
Figure (3.3) Serial neuron architecture ...................................... 23
Figure (3.4) Neural network layer architecture ............................... 24
Figure (3.5) Neuron architecture with approximated sigmoid ................... 25
Figure (3.6) Neuron activation approximation ................................. 25
Figure (3.7) Layer multiplexing architecture ................................. 26
Figure (3.8) One neuron, one layer ........................................... 27
Figure (3.9) Conventional neural network without pipelining .................. 28
Figure (3.10) Pipelined neural network implementation ........................ 28
Figure (4.1) First design architecture ....................................... 34
Figure (4.2) Load weight flow chart .......................................... 36
Figure (4.3) Store flow chart ................................................ 37
Figure (4.4) Second design ................................................... 39
Figure (4.5) GUI flow chart .................................................. 40
Figure (4.6) Compile GUI ..................................................... 41
Figure (4.7) Memory component GUI ............................................ 42
Figure (4.8) Flow chart explaining the system operation ...................... 43
Figure (5.1) Time series prediction flow chart ............................... 46
Figure (5.2) 2-4-1 neural network ............................................ 47
Figure (5.3) Simulated results ............................................... 47
Figure (5.4) ECG normal beats ................................................ 49
Figure (5.5) ECG classification steps ........................................ 50
Figure (5.6) Neural network vs epochs ........................................ 51
List of tables
Table 4.1 Control unit structures ............................................ 35
Table 5.1 Time series prediction clock cycles and time ....................... 48
Table 5.2 Resources needed (time series prediction) .......................... 48
Table 5.3 ECG dataset ........................................................ 50
Table 5.4 Clock cycles (ECG classification) .................................. 51
Table 5.5 Resources needed (ECG classification) .............................. 52
List of equations
Equation(2.1) the weight update ................................................................................................... 18 Equation(2.2) output neuron delta ................................................................................................ 18 Equation(2.3) the hidden layer delta ............................................................................................. 18
Equation(5.1) Mackey-Glass series ................................................................................................ 45 Equation(5.2) Mackey-Glass discrete ............................................................................................ 45
List of abbreviations
ALU    arithmetic logic unit
FPGA   field-programmable gate array
DSP    digital signal processing
VHDL   VHSIC hardware description language
GUI    graphical user interface
3D     three dimensions
Chapter 1
INTRODUCTION
1.0 DESIGN MOTIVATION
Neural networks have many applications in different fields: pattern recognition [1-4], function approximation, robotics navigation, and control applications. This wide variety of applications makes neural networks a very important subject for research. Neural networks are implemented either in software or in hardware. Software implementations have the advantage of available toolboxes that help the designer implement the application without needing to know the inner elements of the neuron. The designer specifies only the number of layers, the number of neurons in each layer, the inputs, and the target. The software then evaluates the neural network outputs using the available general-purpose processor hardware (ALU units, registers, etc.). However, software implementations of neural networks are only viable for finding offline solutions to problems and are not suitable for real-time applications. This disadvantage opened the way for hardware implementations of neural networks. The idea is to design dedicated hardware just for the neural network. This hardware is then able to implement neural network applications in a time-efficient manner. Hardware implementation achieves this performance gain by exploiting the natural parallelism in neural networks. There are two major issues with such implementations: first, the designer needs to know the inner elements of the neural network and make some approximations to model the neuron with an acceptable error; second, the neural network hardware must be easy to reconfigure for different applications with different numbers of layers and different numbers of neurons per layer. This thesis addresses both of these issues. The main target of this research is to design scalable and reconfigurable neural network hardware that approximates the neural network and is easy for the designer to reconfigure for different applications.
1.1 THESIS OVERVIEW
In this thesis, two novel architectures for neural network hardware implementations are proposed. The designs are scalable and reconfigurable; in both designs we integrate resource-reutilization techniques with pipelining to implement a generic feed forward network on FPGA with high speed and low cost. The designs are parameterized to give the user the ability to choose the number of neurons, the number of layers, and the range of both weights and input samples. The design can be programmed to solve problems with different numbers of neurons, from one neuron up to the number of neurons in the largest layer. This is all done by changing one parameter in the VHDL code (the number of neurons in the design); all other parameters depend on this parameter. This gives the user the ability to trade between speed and the resource utilization of the network, and to choose the appropriate specifications for the application at hand. A compiler and a GUI that allow the user to reconfigure the design for different applications are implemented. Both designs are tested using two applications (ECG classification and time series prediction). Both applications were implemented with different neural network architectures. A comparison between these configurations is presented and shows the user's ability to trade between area, power, and speed. A comparison is also made between the FPGA solution's speed and a DSP solution's speed. The architecture consists of different components: a programmable control unit, a scalable neuron, data memory, code memory, and scalable registers. The design components are described in VHDL and integrated into a single reconfigurable and programmable design.
1.2 THESIS ORGANIZATION
The thesis is organized as follows. Chapter 2 discusses the needed background information about neural networks, with a brief introduction to their history and structure. Chapter 3 summarizes the related work from the literature. Chapter 4 introduces the proposed neural network architectures and the design of their components. It also discusses the software used in programming the neural network designs and its scripting language commands. Chapter 5 discusses two different applications of the neural network, and tests these applications on the hardware designs. Finally, chapter 6 concludes the work and makes suggestions for future work.
Chapter 2
Neural networks
2.1 INTRODUCTION TO NEURAL NETWORKS
The human brain is one of the most powerful calculating and learning machines. It can do sophisticated tasks in almost no time and with very high accuracy. One example of such a task is reading a written text. In this task the human eyes take a snapshot of the written text. The brain processes this image and recognizes each character separately. Finally, the brain combines these characters and extracts information. The brain, as it is ALLAH's creation, can do this task and other tasks with more accuracy than any known human-made system. Another example that demonstrates the power of neural networks is the pigeon's brain. To test the pigeon brain, a pigeon was placed in a closed box and presented with paintings by two different artists (e.g. Chagall / Van Gogh). The pigeons were rewarded for pecking at the right artist when presented with that artist's work. Pigeons were able to discriminate between Van Gogh and Chagall with 95% accuracy (when presented with pictures they had not been trained on) [5]. This shows that neural networks do not simply memorize the pictures; they can extract and recognize patterns (e.g. artistic style), and they generalize the artistic style from the already seen paintings to make predictions about new paintings. This is what neural networks (biological and artificial) are good at: unlike conventional computers, they can generalize the problem. There is no precise definition of a neural network, but an acceptable definition is that a neural network involves a network of simple processing elements (called neurons). These elements should be able to show a complex global behavior that simulates the behavior of the human brain [5]. Another definition is that a neural network is an artificial representation of the human brain that simulates its learning process [6].
2.2 NEURAL NETWORK ADVANTAGES
The two main strength points of neural networks are their highly parallel structure and their ability to generalize problems. However, neural networks have other strength points:
1- Nonlinearity. Artificial neural networks can solve both linear and nonlinear problems. This is very important, as there are many nonlinear problems that we need the neural network to solve [7].
2- Learning through examples. Artificial neural networks learn by examples. A randomly selected input is presented to the neural network, then the error (the difference between the neural network response and the target response) is used to update the neural network weights [5].
3- Fault tolerance. Neural networks are very robust against noise, because the computations are distributed between the neurons. An error in one neuron will have a small effect on the final decision [8].
4- VLSI. The highly parallel nature of neural networks motivates VLSI implementations that benefit from this parallelism.
5- Adaptability. Neural networks have a built-in capability to adapt to changes in the surrounding environment. This is done by adapting the neuron weights (the learning process), as the neuron weights represent the knowledge stored in the neural network.
In conclusion, neural networks can learn from the input sequence and the surrounding environment; they are nonlinear systems and thus more suitable for solving nonlinear problems. Solutions developed by ANNs are fault tolerant; this is due to the distributed nature of the computation, as a change in one neuron's weight or output will have a small effect on the network output. The final advantage of ANNs is that their calculations are highly parallel; this inherent parallelism makes them more suitable for hardware implementations, particularly VLSI implementations.
2.3 THE HUMAN BRAIN
To understand the structure and components of neural networks, we first need to understand how the human brain works and its biology. The human brain can be viewed as a three-stage system, as shown in Figure (2.1). The first stage is the receptors, which convert the inputs coming from other parts of the human body into electrical pulses. The second stage is the neural network, which takes the electrical pulses coming from the receptors and makes appropriate decisions. The third stage is the effectors, which are responsible for converting brain pulses into actual responses.
FIGURE (2.1) Brain neural network diagram
The brain is slower than digital computers in mathematical computation; however, it is many times faster in pattern recognition, perception, motor control, and computer vision applications. Neuron cells work in the millisecond range, while silicon logic gates work in the nanosecond range. However, the highly parallel nature of neural networks more than compensates for this speed difference. It is estimated that there are approximately 10 billion neurons in the human cortex, and 60 trillion synapses or neuron connections [4]. As a result, the human brain is more energy efficient than the best computer.
2.4 BASIC ELEMENTS IN A BIOLOGICAL BRAIN
Neuron cells are the basic elements in a biological neural network. It is estimated that there are approximately 100,000,000,000 neurons in a human neural network, and each neuron is connected to at least 10,000 other neurons. It is also estimated that the average power dissipation due to brain activity is in the range of 10 watts [5]. The biological neuron structure is shown in Figure (2.2).
FIGURE (2.2) Biological neuron
Source: http://www.neuralpower.com/technology.htm
The neuron cell consists of four parts (cell body, dendrites, axon, and synapses). The cell body is the signal processing unit of the neuron [9]. The dendrites are small fibers attached to the main body of the neuron. The cell body accepts input signals from the dendrites; this means that the dendrites are the receptors for the neuron's inputs. The axon is a long fiber that carries the output from the neuron cell. The last part is the synapse; the synapse works as the connection junction between this neuron and other neurons.
2.5 HOW DOES A BIO-NEURON WORK
Input electrical signals come from the dendrites of other connected neurons into the cell's synapses. When an input signal reaches a synapse, it causes the release of chemicals into the cell. The chemicals coming from all the connected neurons sum up in the cell body; if this sum is bigger than a certain threshold, the neuron fires and sends an impulse signal to other neurons through the axon. This firing action makes the neuron suitable to model by an activation function with 0 and 1 outputs (fire and not fire).
2.6 TRANSLATING FROM A BIOLOGICAL NEURON TO AN ARTIFICIAL NEURON
The next step is building a mathematical model of the neuron's behavior. The dendrites work as input receptors from other neuron cells, so they can be replaced by input samples coming from other neurons. The synapses can be modeled as multiplications of the input samples by different weights. The cell body sums all the chemicals coming from the different synapses and checks whether to fire an output signal or not, so it can be modeled with a simple summation operation followed by an activation function. This artificial neuron model simulates the physical neuron with a good approximation, and at the same time it is easy to implement.
2.7 NEURAL NETWORK MODEL
A neuron is the basic processing unit that is fundamental to the operation of a neural network. In this section we discuss the neuron model, which is the basic unit of neural networks. Biological neural networks are more complex than the mathematical models discussed here for ANNs, but this is considered an acceptable approximation. We identify three parts of the neuron model, as shown in Figure (2.3) [9].
Figure (2.3) Artificial neuron model
1- A set of synapses, or connecting links, each characterized by a weight. As shown in the block diagram, each input xj at the input of synapse j connected to neuron k is multiplied by the synaptic weight wkj.
2- An adder that sums the products of the input samples and weights.
3- An activation function that limits the amplitude of the neuron's output. The activation function is a very important part of the artificial neural network and has a critical effect on its performance. The activation function defines the output of a neuron in terms of its induced input.
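The three parts above can be sketched in Python (an illustrative model of our own, not code from the thesis; the sigmoid is one common choice of activation function):

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    """A single artificial neuron: synaptic weighting, an adder stage,
    and a sigmoid activation stage."""
    v = sum(x * w for x, w in zip(inputs, weights)) + bias  # adder: sum of xj * wkj
    return 1.0 / (1.0 + math.exp(-v))                       # activation function

# Example: a neuron with two inputs and two synaptic weights
y = neuron_output([1.0, 0.5], [0.4, -0.2])
```

The output is always limited to the (0, 1) range by the sigmoid, which is exactly the amplitude-limiting role the activation function plays in the model.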
Three types of activation functions are popular: the threshold function, the piecewise-linear function, and the sigmoid function [5]. These three types are shown in Figure (2.4).
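The three popular activation functions can be sketched as follows (a minimal illustration with our own function names; the saturation limits of the piecewise-linear version are one common choice):

```python
import math

def threshold(v):
    """Step function: the neuron fires (1) or does not fire (0)."""
    return 1.0 if v >= 0 else 0.0

def piecewise_linear(v):
    """Linear in the middle region, saturating at 0 and 1 outside it."""
    if v <= -0.5:
        return 0.0
    if v >= 0.5:
        return 1.0
    return v + 0.5

def sigmoid(v):
    """Smooth and differentiable, which is what back propagation requires."""
    return 1.0 / (1.0 + math.exp(-v))
```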
Figure (2.4) Step, sign, and sigmoid neuron activation functions
The artificial neural network is constructed from neuron cells arranged in a layered structure. Each layer consists of a number of neurons. There are three types of layers: an input layer, an output layer, and one or two hidden layers. Figure (2.5) shows an example of a neural network structure; this network has three input neurons, four hidden neurons, and three output neurons.
Figure (2.5) Neural network
2.8 Neural network architectures
There are three fundamentally different classes of network architectures [5]:
2.8.1 Single-layer feed forward networks
Single-layer feed forward networks have only an input and an output layer. The single-layer notation refers to the output layer, as the input layer is not counted. Single-layer feed forward networks can be used to implement the basic logic gates. This is done by finding appropriate connection weights and neuron thresholds. Figure (2.7) shows explicitly how to construct simple networks that perform NOT, AND, and OR.

NOT:  0 -> 1,  1 -> 0
AND:  0 0 -> 0,  0 1 -> 0,  1 0 -> 0,  1 1 -> 1
OR:   0 0 -> 0,  0 1 -> 1,  1 0 -> 1,  1 1 -> 1

Figure (2.6) AND, OR, and NOT gate truth tables
Figure (2.7) AND, OR, and NOT gate neural networks
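A single threshold neuron implementing each gate can be sketched as follows. The weights and thresholds below are one possible choice of our own, not necessarily the ones used in Figure (2.7):

```python
def fires(inputs, weights, threshold):
    """Single threshold neuron: output 1 iff the weighted input sum reaches the threshold."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

# One possible set of weights and thresholds for the basic gates
def AND(a, b): return fires([a, b], [1, 1], 2)   # fires only when both inputs are 1
def OR(a, b):  return fires([a, b], [1, 1], 1)   # fires when at least one input is 1
def NOT(a):    return fires([a], [-1], 0)        # a negative weight inverts the input
```

Running each function over all input combinations reproduces the truth tables of Figure (2.6).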
It is well known that any complex logic circuit can be implemented using these simple logic gates. The resulting network will of course be more complex and have many layers.
2.8.2 Multilayer neural network
This type of neural network has an input layer, an output layer, and one or more hidden layers in between. The term hidden refers to the fact that this part of the neural network is not seen directly from either the input or the output of the network. The source nodes in the input layer of the network supply the input vector. Figure (2.8) shows an example of a multilayer neural network.
2.8.3 Recurrent networks
A recurrent neural network, unlike feed forward neural networks, has at least one feedback loop. Figure (2.8) shows an example of a recurrent network.
Figure (2.8) Neural network architectures
2.9 THE FEED FORWARD PHASE
Neural networks work in two phases:
- The learning phase.
- The feed forward phase.
The feed forward phase starts after the learning phase of the neural network. It begins by presenting the input samples to the input-layer neurons. Each neuron computes the sum of the products of its inputs and weights, and then applies the activation function. The outputs of the neurons in this layer become the inputs to the neurons of the next layer. This process is repeated layer by layer through the hidden layers, and the outputs of the last hidden layer feed into the output layer as its input. Of course, before the neural network can work in the feed forward phase, it must be trained so that the weights converge to values that minimize the error. This phase is called the learning phase.
2.10 NEURAL NETWORK LEARNING ALGORITHMS
Basically, learning is the process by which the synaptic weights of a neural network are adapted. Learning processes may be classified as follows:
Learning with a teacher, also referred to as supervised learning.
Learning without a teacher, also referred to as unsupervised learning.
2.10.1 Supervised learning
A teacher is present during the learning process. The teacher presents the expected output for every input pattern used to train the network; in other words, the teacher produces the desired output. The network output is then compared with the target patterns to calculate the error (the difference between the neural network output and the desired output produced by the teacher). This error is used to calculate the weight updates. After updating the weights, the new weights are used to improve the system performance. This process continues until the target error percentage is reached. This form of supervised learning is the basis of error-correction learning; as we see in Figure (2.9) [5], the supervised-learning process constitutes a closed-loop feedback system.
Figure (2.9) Supervised learning architecture
Back propagation learning algorithm
The back propagation algorithm [13] is one of the most widely used algorithms for ANN training. Back propagation is an example of a supervised learning algorithm. It was developed by Paul Werbos in 1974 and rediscovered by Rumelhart and Parker [14]. The main idea of the back propagation algorithm is to update the neural network weights until they reach optimum values for solving the problem or application at hand. Back propagation networks are ideal for simple pattern recognition applications. The weight update equation is as follows:
W(t+1) = W(t) + \Delta W    (2.1)

where
W(t+1): the new weights.
W(t): the old weights.
\Delta W: the weight update.

This weight update depends on the error (the difference between the neural network output and the target output): \Delta W = \eta \cdot \delta \cdot x. For the output neuron,

\delta = (t - y) \cdot y \cdot (1 - y)    (2.2)

and for a hidden neuron,

\delta_h = y_h \cdot (1 - y_h) \cdot \sum_k \delta_k \cdot w_k    (2.3)

where
\eta: the learning rate.
t: the target output, y: the neuron output, and the sum in (2.3) runs over the neurons of the following layer.
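The back propagation update rule can be sketched as follows (a minimal Python illustration with our own function names, for a sigmoid-activated network; the thesis's full MATLAB implementation is given in Appendix A):

```python
def output_delta(target, y):
    """Delta of an output neuron with sigmoid activation: (t - y) * y * (1 - y)."""
    return (target - y) * y * (1.0 - y)

def hidden_delta(y_h, downstream_deltas, downstream_weights):
    """Delta of a hidden neuron: y_h * (1 - y_h) * sum of downstream delta * weight."""
    return y_h * (1.0 - y_h) * sum(d * w for d, w in zip(downstream_deltas, downstream_weights))

def update_weights(weights, inputs, delta, eta):
    """w(t+1) = w(t) + eta * delta * x for each weight of one neuron."""
    return [w + eta * delta * x for w, x in zip(weights, inputs)]
```

One pass of learning computes the output deltas first, propagates them backwards to the hidden layer, and then applies `update_weights` to every neuron.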
The final point we need to discuss in the back propagation algorithm is when the algorithm stops, i.e. when it reaches the optimum solution. There are two stopping criteria: the first is when the neural network reaches an acceptable error (a very small error that can be specified by the user); the second is to stop after a certain number of trials (epochs), for example 1000 epochs. A detailed MATLAB code for the back propagation algorithm is given in Appendix A.
2.10.2 Unsupervised learning
In this type of learning there is no teacher for the neural network; the network is responsible for learning on its own. This means that the neural network learns by adapting to the structural features in the input patterns. In unsupervised learning the machine simply receives inputs x1, x2, . . . but it does not know the desired outputs for those inputs. It may seem difficult to see how the machine could possibly learn given that it gets no feedback from its environment. However, unsupervised learning can be thought of as a process of finding patterns in the input data and building a representation for each data class.
Clustering: Clustering is defined as a technique for finding similarity groups in data, called clusters [10-12]. It groups data samples that are similar to each other into one cluster, and data samples that are very different from each other into different clusters, as shown in Figure (2.10).
Figure (2.10) Clustering algorithm
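The clustering idea can be illustrated with a short k-means sketch. K-means is one common clustering algorithm; the chapter does not commit to a specific one, so this choice, and the one-dimensional data, are illustrative assumptions.

```python
import random

def kmeans_1d(data, k, iters=20, seed=0):
    """Minimal 1-D k-means: group similar samples into k clusters."""
    random.seed(seed)
    centers = random.sample(data, k)
    for _ in range(iters):
        # Assignment step: each sample joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            clusters[nearest].append(x)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters
```

Run on two well-separated groups of samples, the centers converge to the group means and each group forms one cluster, which is the similarity-grouping behavior described above.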
2.11 Neural network applications There are three main neural network tasks [5]. 2.11.1 Pattern recognition Pattern recognition is defined as the process whereby a received signal/pattern is assigned to one of a prescribed number of classes [5]. In pattern recognition applications the problem is divided into two main functions: the first is feature extraction, and the second is classification using previously learned knowledge [15]. 2.11.2 Function approximation Function approximation is the problem of predicting new function values from previously known output values, without the need to solve the function's equations. Function approximation is the fundamental problem in a majority of real world applications, such as prediction, pattern recognition, data mining, and classification [16]. Different methods have been developed to address this problem; one of them is using artificial neural networks. 2.11.3 Control Neural networks are also used in system control in different types of systems [17], for example in mobile robot movements. Figure (2.11) gives a detailed block diagram of a neural network used in control systems. A neural network can be used to control the system or to predict the system response for certain inputs.
Figure (2.11) Neural network control model
2.12 HARDWARE NEURAL NETWORKS IMPLEMENTATIONS
2.12.1 Hardware versus software implementations
Neural networks can be implemented either in software or in hardware. Software solutions are naturally fast to develop and very flexible to reprogram; however, a general purpose processor takes too much time to execute an ANN application. This is because an ANN application by definition involves a large number of multiplications, associated with the large number of connections in the network, and these operations are handled very inefficiently by the general purpose ALU of a processor. This opened the way for dedicated hardware implementations on Field-Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs).
2.12.2 ASIC neural network implementations
ASIC implementations allow a very efficient implementation of the ANN; they are both area and power efficient. However, they do not offer configurability by the user: once a design is fabricated, it cannot be changed. The second disadvantage of ASIC implementation is its cost; producing an ASIC chip requires a large ordered volume of ICs, which may not be the case in research settings. ASIC implementations can be either analog [18], [19] or digital. Both implementations have advantages and disadvantages.

Analog ASIC implementation
Advantages:
1- A natural implementation for the sigmoid nonlinearity.
2- Analog hardware is easy to interface with real time applications.
Disadvantages:
1- Affected by temperature variations, electrical noise, and power supply variations.
2- Complex tradeoff relations between noise, power, and area.

Digital ASIC implementation
Advantages:
1- The design is less affected by electrical noise, temperature variations, and power supply variations.
2- Simple tradeoff between area, speed, and power.
Disadvantages:
1- Not easy to interface with real time applications; needs A/D and D/A converters.
2- Cannot implement the sigmoid function without an approximation.
2.12.3 FPGA reconfigurable neural network implementations FPGA implementation is the most suitable hardware implementation for neural network applications, as it allows the user to reconfigure it for different applications and designs. FPGA neural networks preserve the parallel architecture of the neural network while providing low cost hardware implementations. FPGA implementations face three major challenges: first, NNs require large resources, as there are many multiplications and nonlinear excitation functions; second, the design needs to be generic in order to solve multiple applications; third, the design cycle must be short and the design must be able to practically solve real time problems. This work addresses these challenges.
2.13 Summary This chapter presented the background information related to neural networks. The chapter discussed the definition of neural networks, their importance, and their advantages. It introduced the neuron's mathematical model and the different types of neural networks, and discussed neural network learning algorithms and applications. Finally, the chapter introduced the different neural network hardware implementations and the advantages of FPGA neural network implementations.
Chapter 3 Literature survey and previous architectures 3.0 Introduction This chapter discusses different architectures and ideas for implementing neural networks. It also introduces examples of these architectures from the literature and compares the different architectures. Through this discussion the advantages of the proposed architectures are explained.
3.1 Direct implementation (conventional neural network) In this implementation the neural network is implemented in hardware and all the neuron multiplications are done in parallel. This means that all the neurons and the connections between them are implemented as shown in figure (3.1). The advantage of this architecture is high speed as all the neurons work in parallel. The disadvantage here is the large number of multipliers needed. This architecture is not often used in FPGA implementations.
Figure (3.1) Conventional neural network
3.2 reducing the number of multipliers The direct implementation needs a high number of multipliers and connections. Multipliers are hardware-expensive. The number of multipliers can be reduced by sharing one multiplier for all incoming connections in every neuron. Consequently, each neuron only processes one input value every clock cycle. Now all neurons in one layer serially process all outputs from the previous layer.
Figure (3.2) Neural network with reduced number of multipliers
This architecture has attracted many researchers implementing neural network applications, as it requires a small number of multipliers and few FPGA resources. This small number of multipliers allows large applications to fit in small FPGAs. The neuron structure used for this implementation is shown in Figure (3.3). In this structure there is one multiplier and one accumulator per neuron; the inputs enter the neuron serially and are multiplied by the weights from the ROM. The result of each multiplication is added to the accumulated result of the previous multiplications. The number of clock cycles needed for the neuron to finish its work equals the number of input samples from the previous layer. This reduces the design speed but at the same time reduces the resources needed for the neuron (the number of multipliers).
Figure (3.3) Serial neuron architecture
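The serial neuron of Figure (3.3) can be modeled cycle by cycle: one multiply-accumulate per clock, so N inputs cost N cycles. The following is a behavioral sketch (not the thesis's VHDL), with a sigmoid activation assumed at the output.

```python
import math

def serial_neuron(inputs, weights):
    """Behavioral model of the serial neuron: one MAC operation per clock.

    Each 'cycle' multiplies one input by its weight from the weight ROM
    and adds it to the accumulator register, so N inputs take N cycles.
    """
    acc = 0.0                  # accumulator register, cleared by reset
    cycles = 0
    for x, w in zip(inputs, weights):
        acc += x * w           # one multiplier shared over all inputs
        cycles += 1
    out = 1.0 / (1.0 + math.exp(-acc))   # sigmoid activation (assumed)
    return out, cycles
```

The cycle count returned makes the area/speed tradeoff explicit: the single shared multiplier saves resources, but each neuron needs as many clock cycles as it has inputs.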
The layer architecture in this implementation has one input which is connected to all the neurons in that layer as shown in Figure (3.4).
Figure (3.4) Neural network layer architecture Chaui and others [20] introduced a neural network based adaptive control of a flexible joint using this architecture. The proposed controller takes into account the nonlinearities in the motor and load models. The design implements both the learning phase and the feed forward phase, and was tested on a Virtex 2 FPGA. The proposed controller is capable of working at high speed, due to its highly parallel structure. Boubaker and others [21] proposed a neural network FPGA architecture for alertness classification based on this architecture. The proposed system works on EEG signals. The
system has on-line learning capabilities. In the proposed work they used the serial neuron architecture to reduce the number of multipliers needed. Smach and others [22] introduced a neural network FPGA detector using this architecture. The proposed work used C code to extract the neurons' weights, and the feed forward neural network was implemented in VHDL. Another work we considered is "Design artificial neural network using FPGA" [23]. The neuron design in this work is close to the previously discussed neuron structure: it consists of a multiplier, an accumulator, and an activation function. The input enters the neuron block serially and is accumulated with the previous multiplier results; the number of clock cycles needed for the neuron to finish its work equals the number of connections from the previous layer. In this work each neuron had its own activation function, as shown in Figure (3.5) [23]; this means more resources for the neurons but fewer clock cycles and hence higher speed.
Figure (3.5) Neuron architecture with approximated sigmoid [23] The work also presented an approximation approach to implement the sigmoid function, reducing the resources needed for the activation function as shown in Figure (3.6).
Figure (3.6) Neuron activation approximation
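One well-known way to realize such a sigmoid approximation in hardware is a piecewise-linear scheme; the segments below follow the published PLAN approximation, and whether [23] used these exact breakpoints is not stated in the text, so they are an illustrative assumption.

```python
def pwl_sigmoid(x):
    """Piecewise-linear sigmoid approximation (PLAN-style segments).

    Saturates to 0/1 outside [-5, 5]; the linear segments map to shifts
    and adds in hardware instead of an exponential evaluation.
    """
    a = abs(x)
    if a >= 5.0:
        y = 1.0
    elif a >= 2.375:
        y = 0.03125 * a + 0.84375
    elif a >= 1.0:
        y = 0.125 * a + 0.625
    else:
        y = 0.25 * a + 0.5
    # Symmetry: sigmoid(-x) = 1 - sigmoid(x), so only |x| is tabulated.
    return y if x >= 0 else 1.0 - y
```

The slope constants are powers of two, so each segment costs only a shift and an add, which is the kind of resource saving the approximation in Figure (3.6) targets.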
3.3 LAYER MULTIPLEXING Layer multiplexing is a resource efficient neural network implementation. In this approach only the layer with the maximum number of neurons is implemented, which lowers the resources needed for the design as shown in figure (3.7). The control unit controls this layer to implement all the neural network layers. This approach allows implementation of large neural network applications using fewer resources on FPGA. Consequently we can fit a large application within a small FPGA.
Figure (3.7) Layer multiplexing architecture
This methodology was first introduced in 2007 by S. Himavathi et al. [24]. They used a reconfigurable FPGA (Xilinx xcv400hq240) to implement a multilayer feed forward neural network.
In this design the neuron layers are reused: only one layer of neurons is implemented, and the control unit drives this layer to realize all the neural network layers, which reduces the logic resource usage of the chip.
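The layer multiplexing idea can be sketched behaviorally: a single physical layer, sized for the widest logical layer, is stepped through every layer of the network by the control unit. The sigmoid activation and the weight layout below are illustrative assumptions.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def run_multiplexed(inputs, layer_weights):
    """Reuse one physical layer for every logical layer of the network.

    layer_weights[l][j] is the weight row of neuron j in logical layer l;
    the physical layer only needs as many neurons as the widest layer.
    """
    x = list(inputs)
    for W in layer_weights:      # control unit steps through the layers
        # The same physical neurons compute this logical layer's outputs,
        # then the outputs are fed back as the next layer's inputs.
        x = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W]
    return x
```

Hardware cost is set by the widest layer only, at the price of one pass per logical layer, which is exactly the resource/time tradeoff described above.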
3.4 ONE NEURON IMPLEMENTATION The last approach to resource reduction is using only one neuron to implement the whole application, as shown in Figure (3.8). The idea is simple: this single neuron replaces all the neurons in the network. Each time, the neuron's output is saved and re-entered as an input. A control unit is also needed to make this one neuron perform the functionality of all the neurons in the neural network.
Figure (3.8) One neuron one layer
3.5 PIPELINED FPGA NEURAL NETWORK IMPLEMENTATION Another approach to neural network implementation is pipelining. The conventional multilayer neural network has the characteristic that the neurons in a layer depend on the neurons in the previous layer, as shown in Figure (3.9).
Figure (3.9) Conventional neural network without pipelining
The main idea for pipelining is making all layers work together at the same time as shown in figure (3.10). The first layer is computing the input pattern t, the second layer is busy calculating the result for pattern t-1 [25]. The third layer is producing the output of pattern t-2. The neural network in this case produces an output in every clock cycle.
Figure (3.10) Pipelined neural network implementation
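The pipelined schedule of Figure (3.10) can be sketched as a timetable: at clock tick t, layer l works on pattern t − l, so once the pipeline is full a new output emerges every tick. The helper below is an illustrative model, not part of any cited design.

```python
def pipeline_schedule(num_layers, num_patterns):
    """Which pattern each layer works on at every clock tick.

    Layer l processes pattern t - l; pattern 0's result appears after
    num_layers ticks, then one new result appears every tick (None
    marks a bubble while the pipe fills or drains).
    """
    schedule = []
    for t in range(num_patterns + num_layers):
        row = []
        for l in range(num_layers):
            p = t - l
            row.append(p if 0 <= p < num_patterns else None)
        schedule.append(row)
    return schedule
```

With three layers and five patterns, the pipe is full at tick 2 and the last layer then finishes one new pattern per tick, which is the throughput gain pipelining buys at the cost of duplicating every layer in hardware.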
Lin and others [25] proposed an architecture that integrates layer multiplexing and pipelining together. The proposed architecture aims to decrease the needed resources using the layer multiplexing architecture, and to decrease the speed penalty using pipelining techniques.
3.6 APPLICATIONS USING FPGA NEURAL NETWORK IMPLEMENTATIONS There are many papers in the literature that discuss FPGA neural network implementations, as FPGA NN implementations are much faster than DSP implementations. We discuss some of this work here; this section is divided between different types of applications. 3.6.1 ECG classification Recognition of cardiac arrhythmias from the electrocardiogram (ECG) is a very important tool for diagnosing heart diseases. Ozdemir and others [26] introduced a fully parallel neural network based ECG classifier implemented on FPGA. This work uses the principal component analysis (PCA) algorithm to reduce the feature vector dimensions. A simple 8-2-1 neural network is implemented on the FPGA. The system classifies three different heart diseases with 97% accuracy, and was trained using the MIT ECG database. Armato and others [27] introduced an FPGA based arrhythmia recognition system for wearable applications. The proposed system depends on QRS feature extraction and the discrete Fourier transform. The system was implemented using MATLAB Simulink and converted to VHDL using Xilinx System Generator. It was trained using the MIT ECG database: 70% of the database records were used for training and the remaining 30% were used to test the hardware. 3.6.2 Face detection and recognition Face detection plays a crucial role in different applications, for example video surveillance and robotic vision. He [28] and his research group introduced a novel FPGA neural network implementation for face detection. The implementation depends on a cascade of neural network classifiers trained on Haar features. Nine neural network classifiers and a Haar feature generator are implemented on a Virtex 5. The architecture outperforms DSP implementations; the detection speed reaches 625 frames per second. Sadha and others [29] introduced a face recognition system based on an FPGA implemented neural network.
The system depends on neural networks and Eigenface features.
It is capable of updating the Eigenfaces when the face database is updated. The system demonstrates a recognition rate of more than 85%. 3.6.3 Mobile robot Most mobile robot algorithms are based on artificial intelligence and neural networks. Mhuyddin [30] and his research group implemented a neuro-fuzzy algorithm for robot obstacle avoidance. The algorithm was implemented on a field programmable gate array and allows the robot to avoid obstacles in different situations. The training process is done on a PC until the desired weights are reached. The algorithm was tested on a robot with three servo motors: two motors for the robot's motion, and a third motor for controlling the ultrasonic sensor used in obstacle detection. The robot was simulated and tested in nine different situations, and the hardware experiments confirmed the accuracy of the simulated results. 3.6.4 Communication Jamel and others [31] introduced a hardware implementation of an adaptive neural equalizer. Equalizers are used to compensate for the channel effect in wireless or wired communication systems; this allows accurate detection and correction of the received bits. These applications use neural networks for their ability to reject noise. The adaptive neural equalizer is implemented as a multilayer perceptron (MLP) with a three layer structure (one input layer, one hidden layer, one output layer). The back propagation learning algorithm is used for training. The simulation results show that the hardware implementation has similar performance to the software implementation, with the advantage of higher speed.
3.6.5 Control Kim and others [32] introduced a hardware implementation of a neural network controller. The developed neural control hardware was tested by balancing an inverted pendulum; the PID controller and the neural network were implemented on an FPGA. Despite the large sampling time, the proposed controller succeeded in balancing the pendulum. El-Madaney and others [33] introduced a spacecraft neural network controller on FPGA. Implementing the multi-layer NN using lookup tables (LUTs) reduces the resource utilization and the execution time; the proposed design utilizes less than 25% of a Virtex 5 FPGA. 3.6.6 Industry and other applications Kennedy and Saeid introduced an FPGA neural network device for gas detection. The neural network is first trained on the gases that need to be detected, and the FPGA board is attached to a robot that moves around the environment and detects the gas. This can be used for security (explosives detection) [34]. Moradi and others [35] introduced a new method for FPGA implementation of Farsi handwritten digit recognition. The proposed design uses a neural network as a classifier, and experimental results showed that it achieved 96% accuracy.
3.7 Summary This chapter introduced different architectures for neural network implementations and discussed their advantages and disadvantages. This discussion shows the importance of a configurable design that allows the user to trade off between area, speed, and power. The chapter also discussed related examples from the literature and reviewed applications implemented on FPGA. This wide variety of FPGA neural network applications shows the importance of a generic design that can be configured for different applications.
Chapter 4 The Proposed Architectures 4.1 INTRODUCTION
This chapter discusses the proposed feed forward architectures. The two proposed architectures are scalable programmable designs. All the design parameters are generic and can be changed by the user at will, for example the number of neural layers, the number of fraction bits, the number of integer bits, and the number of neurons in each layer. This allows the designer to reconfigure the designs for different applications. The new methodology aims to give the user the ability to implement the same application with different speed and resources, depending on the application at hand and the FPGA used. The first design uses a resource sharing technique for the activation function; this decreases the design area and resource requirements at the expense of speed. The second design uses a pipelining technique to increase the design speed at the expense of area. To help the user program both designs, a GUI was built using C#; this user interface allows the user to program the chip very easily. Neural network applications can be programmed on both designs, and each application can be implemented with different numbers of neurons, ranging from one neuron up to the number of neurons in the largest layer. In short, the designs offer the user two levels of flexibility. First, the hardware is parameterized, and the entire network and its connections can be changed at will through a simple interface. Second, the workings of the network can be easily described using a programming language. Thus, the user can describe different applications and trade off performance and area for such applications at will.
4.2 The First methodology The first design aims to reduce logic resources at the expense of the number of clock cycles. This is done by using a resource sharing technique on the activation function: only one activation function is used, and the neurons' outputs pass through it before being saved in the input memory. The design has two different buses for the weight memory and the input memory; this allows the control unit to transfer the neurons' weights and inputs in one clock cycle, reducing the needed clock cycles. The design consists of five main components. The first component is the neuron; this is considered the basic unit of the design, and it consists of a multiplier, a summation unit
and an accumulation register. The second component is the control unit. This is the most important component in the design, as its output signals control all the design components (neuron cells, multiplexer, input memory, weight memory, etc.). The remaining components are the memory components: the weight memory, the read/write memory, and the code memory. These components save the neurons' output values and the neurons' weights. The design parameters are generic variables, namely: the number of fraction bits, the number of integer bits, the number of neurons, and the memory widths. Figure (4.1) gives a detailed diagram of the first design.
Figure (4.1) First design architecture
4.2.1 The neuron cell The neuron cell consists of an adder, a multiplier, and an accumulation register. The neuron design has two generic variables: the number of fraction bits and the number of integer bits. These two generic variables can be assigned different values for different applications, which allows the user to control the design accuracy.
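The effect of the fraction/integer bit generics on accuracy can be sketched as fixed-point quantization. Round-to-nearest and saturation on overflow are assumptions here; the thesis does not specify the rounding or overflow policy.

```python
def to_fixed(value, frac_bits, int_bits):
    """Quantize a real value to signed fixed point with the given widths.

    The representable grid step is 2**-frac_bits; values outside the
    integer range saturate, mirroring a register of
    1 + int_bits + frac_bits bits (sign included).
    """
    step = 2.0 ** -frac_bits
    max_val = 2.0 ** int_bits - step
    min_val = -(2.0 ** int_bits)
    q = round(value / step) * step          # round to nearest grid point
    return min(max(q, min_val), max_val)    # saturate on overflow
```

Increasing `frac_bits` shrinks the grid step and hence the quantization error, which is how these generics let the user trade accuracy against register width.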
4.2.2 The code RAM The code RAM stores the control unit instructions; the code memory size and width are generic. This allows the number of neurons to be generic. 4.2.3 Weight memory The weight memory stores the neural network weights. It has three generic variables: the first is the number of neurons in the hidden layer, the second is the memory width, and the third is the memory size. 4.2.4 Sigmoid The sigmoid memory stores a binary approximation of the input/output relation of the sigmoid function. The VHDL code of this memory is generated using MATLAB with a generic code that can be used to change the type of the activation function and its accuracy (number of fraction bits). 4.2.5 Control unit The control unit controls all the design components. It takes its input from the code RAM and executes the instruction. The control unit can execute one of the three instructions shown in Table 4.1. The first is "loadweinp", which loads the weights and the inputs to the neuron at the same time; this decreases the number of clock cycles needed. The second instruction is "store", which stores the output of a single neuron cell in a specified memory location; it takes two arguments, the neuron number and the target memory address. The third instruction is "reset", which resets all the memory registers.

Table 4.1 Control unit instructions
Instruction  Starting bits  Fields
loadweinp    "00"           weight address, input address
store        "01"           neuron choosing bits (neuron address), memory address
reset        "11"           bits with no function
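The compiler's job of turning these mnemonics into code RAM words can be sketched as bit-field packing. The field layout and widths below are illustrative assumptions; in the actual design they are set by the generic memory-bits and neuron-bits parameters entered in the GUI.

```python
# Starting bits from Table 4.1 ("10" is unused in the table as given).
OPCODES = {"loadweinp": 0b00, "store": 0b01, "reset": 0b11}

def encode(instr, addr_bits=6, arg1=0, arg2=0):
    """Pack one instruction into a code-RAM word (illustrative layout).

    Word = [2-bit opcode | arg1 field | arg2 field]. For loadweinp the
    args are the weight and input addresses; for store, the neuron
    number and target memory address; reset ignores both fields.
    """
    op = OPCODES[instr]
    word = (op << (2 * addr_bits)) | (arg1 << addr_bits) | arg2
    return format(word, "0{}b".format(2 + 2 * addr_bits))
```

The resulting bit strings are what would be pasted into the code RAM initialization, one word per instruction.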
To execute each of these instructions the control unit runs through a sequence of steps. The following sections discuss the execution steps of each instruction. a) Load weight and input: In the first clock cycle of this instruction the control unit writes the weight and memory address bits on the address buses of both memories. In the second clock cycle the data are transferred from the input memory and the weight memory to the neuron multiplier inputs. The last clock cycle enables the neuron by enabling the accumulator register. These steps are shown in Figure (4.2).
Figure (4.2) Load weight flow chart
b) Store: Figure (4.3) shows the steps of this instruction. In the first cycle the control unit enables the output register of the neuron cell and selects the neuron to store the data from. The second step is enabling the neuron and writing the bits on the memory data buses. The third step is writing to the activation memory, and the fourth step is storing the data into the memory.
Figure (4.3) Store flow chart
c) Reset: This instruction resets all the memory components and registers. The reset instruction is used to clear the neurons before presenting the new layer data. In the last clock cycle of every instruction, the program counter is incremented to load the next instruction opcode. This reduces the number of clock cycles needed to load each instruction, and therefore the total clock cycles needed to implement the application. The load instruction loads the input sample and the neurons' weights in the same clock cycle, using separate data buses for the input samples and the neurons' weights; this further decreases the number of clock cycles needed to implement the application.
4.3 THE SECOND ARCHITECTURE
The main drawback of the previous architecture is the clock cycles needed to store the neurons' outputs. The neurons' outputs are saved serially, because there is only one shared activation function that all neurons have to wait to use. To address this drawback the second architecture is proposed. In this design each neuron has its own activation function; there is no resource sharing on the activation function. This allows all the neurons' outputs to be saved in one clock cycle. The input memory in this design is divided into banks, and each neuron has its own memory bank. Figure (4.4) shows the new architecture. The new architecture contains a multiplexer, which is responsible for choosing the memory bank to load data bits from. The following section discusses the new architecture's control unit and the modifications made to the other components.
Figure (4.4) Second design 4.3.1 Control unit: The control unit executes three instructions: loadweinp, reset, and store. The loadweinp instruction arguments are different from those of the previous architecture, because each input memory address saves the outputs of n neurons. The loadweinp instruction takes three arguments: the first argument is the neurons' weight address, the second argument is the
memory data address, and the third argument selects which neuron output is chosen from the memory data output. 4.4 SOFTWARE COMPONENTS The main idea of this work is to introduce a new architecture for neural network implementations that has the flexibility of software implementations and a speed comparable to fixed implementations. That is why user friendly software is needed to program the designs. The software flow chart is given in Figure (4.5).
Figure (4.5) GUI flow chart
A very simple compiler is built to convert the neural architecture commands ("loadweinp", "store", and "reset") to machine language. Figure (4.6) shows the compiler user interface and an example of how it works. Some variables need to be entered into the GUI: the first variable is the memory bits, which gives the compiler the width of the code RAM to be considered in the conversion; the second variable is the number of neuron bits. In the following example the compiler converts load instructions to their binary equivalent; this binary code is copied directly into the code RAM VHDL code. This allows the user to program the design for different applications and different neural network structures.
Figure (4.6) Compiler GUI There is also a memory user interface program that takes weights from the user and converts them to binary bits. The memory interface has three variables: the fraction
number of bits, the integer number of bits, and the number of neurons. These variables allow the user to define the accuracy of the architecture and the number of neurons in each layer.
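The conversion the memory GUI performs can be sketched as fixed-point encoding of each weight into a bit string. Two's-complement format and saturation are assumptions; the thesis only states that the fraction bits, integer bits, and neuron count are inputs to the tool.

```python
def weight_to_bits(w, frac_bits, int_bits):
    """Encode a weight as a two's-complement fixed-point bit string.

    Total width = 1 sign bit + int_bits + frac_bits, matching the
    generic widths of the weight memory (assumed layout).
    """
    width = 1 + int_bits + frac_bits
    raw = round(w * (1 << frac_bits))        # scale onto the integer grid
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    raw = min(max(raw, lo), hi)              # saturate on overflow
    return format(raw & ((1 << width) - 1), "0{}b".format(width))
```

Each trained weight from MATLAB would pass through a conversion like this before being pasted into the weight memory initialization.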
Figure (4.7) Memory component GUI
4.5 SYSTEM OPERATION This section discusses how all these components work together to implement applications on the designs. Figure (4.8) shows how the user can program the architectures
to implement the required application, and at the same time how the tradeoff between speed and area is realized.
Figure (4.8) Flow chart to explain the system operation
4.6 Summary This chapter discussed the two proposed architectures. Section 4.2 discussed the first architecture's components. Section 4.3 introduced the second, highly parallel architecture and the differences between the two proposed architectures. Section 4.4 introduced the software components needed for programming the architectures. The last section discussed the operation of the architectures, and how the software and hardware components work together.
Chapter 5 Applications and testing 5.0 INTRODUCTION This chapter introduces two applications to test the proposed architectures. The two applications validate the designs' ability to be reconfigured for different neural network structures and to solve different problems. The applications were tested using different numbers of neurons to validate the designs' ability to trade off between area, power, and speed.
5.1 TIME SERIES PREDICTION The first application is time series prediction, a very important problem in many real-time applications. Time series prediction is useful for weather prediction, agricultural activities (predicting water levels, predicting seed growth), and economic forecasting [36-39]. The time series predicted in this work is the Mackey-Glass time series, whose general equation is [40]:
ẋ(t) = a·x(t−τ) / (1 + xⁿ(t−τ)) − b·x(t)        (5.1)
Such an equation cannot be implemented in MATLAB in its given form, so we use the discrete form of the equation:
x(t+Δt) = x(t) + Δt·[a·x(t−τ) / (1 + xⁿ(t−τ)) − b·x(t)]        (5.2)
In this work we assume a = 0.2, b = 0.1, x(0) = 1.2, a time step size of 0.1, and we use the 4th order Runge-Kutta method to solve the equation. A detailed MATLAB code for this problem is given in Appendix B.
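The discrete update above can be sketched as follows. This sketch uses the simple forward-difference form (the thesis solves the equation with 4th order Runge-Kutta, which is more accurate), and it assumes the standard values τ = 17 and n = 10, which the text does not state.

```python
def mackey_glass(steps, a=0.2, b=0.1, n=10, tau=17.0, dt=0.1, x0=1.2):
    """Generate the Mackey-Glass series via the discrete form above.

    Forward-difference update (the thesis uses 4th-order Runge-Kutta);
    tau and n are assumed standard values, with x(t) = x0 for t <= 0.
    """
    lag = int(round(tau / dt))                 # delay expressed in samples
    xs = [x0]
    for t in range(steps):
        x_tau = xs[t - lag] if t >= lag else x0    # delayed term x(t - tau)
        dx = a * x_tau / (1.0 + x_tau ** n) - b * xs[-1]
        xs.append(xs[-1] + dt * dx)
    return xs
```

The resulting sequence is chaotic but bounded, which is what makes it a standard benchmark for prediction: the network is trained to predict future samples from past ones.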
Figure (5.1) shows the steps for configuring the proposed architectures for this application.
Figure (5.1) Time series prediction flow chart As shown in the flow chart, a 2-4-1 neural network is implemented and trained in MATLAB. The weight values produced by MATLAB are loaded into the GUI and converted to binary representation. In the next step a functional simulation is done in Xilinx ISE Design Suite 12.1, and the outputs of the neural network are saved to a text file. This output is finally loaded into MATLAB and drawn as red and green points on the graph, as shown in Figure (5.2): the black line represents the solution of the time series equations, the blue line represents the MATLAB neural network output, and the green and red points represent the proposed FPGA
architectures' outputs. The network is implemented using three different design settings (four neurons, two neurons, and one neuron).
Figure (5.2) 2-4-1 neural network fixed point simulation vs. VHDL
Figure (5.3) Simulated results
Table 5.1 shows the clock cycles needed for the first design (FD) and the second design (SD) to implement this application (with different numbers of hardware neurons), and a comparison between both proposed architectures, conventional neural network (CNN) implementations, and layer multiplexing (LM) implementations. The proposed designs are presented in three different settings: using one neuron (1nn), two neurons (2nn), and four neurons (4nn).

TABLE 5.1 Time series prediction clock cycles and time
Design    Clock cycles   Time
FD(1nn)   120            1.09 us
SD(1nn)   120            1.09 us
FD(2nn)   88             0.8 us
SD(2nn)   70             0.7 us
FD(4nn)   78             0.709 us
SD(4nn)   53             0.4545 us
LM        40             0.3636 us
CNN       30             0.2727 us
DSP       29640          29.64 us
MATLAB    -              8.4 ms
It can be seen from the previous table that the two proposed neural network designs are much faster than the DSP implementation, and at the same time comparable to the fixed layer multiplexing and conventional feed forward neural network designs. Table 5.2 shows a comparison between the resources needed for the different architectures.

TABLE 5.2 Resources needed (time series prediction)
Design    Multipliers and adders   Block RAMs   NN slices
FD(1nn)   1                        4            472
SD(1nn)   1                        4            472
FD(2nn)   2                        4            944
SD(2nn)   2                        5            944
FD(4nn)   4                        4            1888
SD(4nn)   4                        7            1888
LM        4                        7            1888
CNN       7                        10           3304
First, it can be seen from Table 5.1 that the difference in clock cycles between the two designs increases as the network size increases; when only one neuron is used, both designs need the same number of clock cycles. It can also be seen from the previous tables that the second design SD(4nn) has the same resources as LM, with comparable speed. This proves that the designs have speeds comparable to fixed feed forward neural network implementations, with the ability to trade off between speed (clock cycles) and resources. 5.2 ECG CLASSIFICATION An ECG recording is a measure of the activity of the heart from electrodes placed at specific locations on the patient. The electrocardiogram (ECG) is a useful diagnostic technique for monitoring heart activity [41-43]. Figure (5.4) shows one heartbeat (ECG signal 100.dat).
Figure (5.4) ECG normal beat One of the common approaches in ECG beat classification is Artificial Neural Networks, which have shown accurate performance in different classification tasks [44-45]. In this application the MIT-BIH arrhythmia database is used. A neural network classifier to 49
50
classify between 4 different heart diseases (normal beat, left bundle beat, right bundle beat,) is built. Using 61 feature vectors extracted from the dataset signals, Back propagation algorithm is used to teach the neural network. After that the extracted neural network weights are used in the feed forward FPGA implementation. First the maximum peak of the signal is detected and 60 samples around the peak were taken, which contain useful information for classification. 41 patient files were used for training and testing the neural network and the error rate from Matlab. Table.5.3 shows the number of sample beats used to train and test the neural network. The total number of samples used in training is 5747 samples. The total number of samples used for testing 745 sample beats. TABLE.5.3 ECG dataset total 1400 Normal beat Left bundle branch block beat 1634 Right bundle branch block 1821 beat 1637 Paced beat
train 1214 1448 1635
test 186 186 186
1450
187
Figure (5.5) shows the steps of the ECG classification: first the peak of the ECG signal is detected, then the features are extracted from the signal, and finally the neural network performs the classification.

Figure (5.5): ECG classification steps
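The peak-plus-window preprocessing described above can be sketched as follows. This is a hypothetical illustration: the thesis only states that the 60 samples around the detected peak are taken, so the centered window placement and the clamping at the signal edges are assumptions.

```python
import numpy as np

def extract_beat_window(ecg, window=60):
    """Step 1: locate the beat's maximum peak.
    Step 2: keep the `window` samples around it (centered by assumption)."""
    peak = int(np.argmax(ecg))
    lo = max(peak - window // 2, 0)
    lo = min(lo, len(ecg) - window)  # clamp so the window fits in the signal
    return ecg[lo:lo + window]

beat = np.sin(np.linspace(0.0, np.pi, 200))  # toy beat with a single peak
features = extract_beat_window(beat)
print(features.shape)  # -> (60,)
```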
The error rate was found to be less than 10%. The weights were then converted to binary representation and used inside the VHDL code. ECG beat samples were tested, and the results were identical to the Matlab simulation, as shown in Figure (5.6).

Figure (5.6): Neural network error (%) vs. epochs (2000 epochs)
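The conversion of the trained weights to a binary representation can be illustrated with a small fixed-point quantizer. This is a hedged sketch: the 16-bit format with 12 fractional bits used below is an assumption for illustration, not necessarily the word length chosen in the thesis.

```python
def to_fixed(w, frac_bits=12, total_bits=16):
    """Quantize a real weight to a two's-complement fixed-point bit pattern,
    saturating on overflow (assumed Q-format, for illustration only)."""
    scaled = round(w * (1 << frac_bits))
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    scaled = max(lo, min(hi, scaled))        # saturate out-of-range weights
    return scaled & ((1 << total_bits) - 1)  # two's-complement bit pattern

print(f"{to_fixed(0.75):04X}")   # -> 0C00
print(f"{to_fixed(-0.5):04X}")   # -> F800
```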
Table.5.4 compares the clock cycles needed by the different design settings.

TABLE.5.4 Clock cycles (ECG classification)

Design     Clock cycles   Time
FD(2nn)    1028           9.34 us
SD(2nn)    1000           9.09 us
FD(4nn)    572            5.2 us
SD(4nn)    525            4.77 us
FD(8nn)    332            3.018 us
SD(8nn)    285            2.59 us
LM         230            2.09 us
CNN        213            1.936 us
DSP        64500          64.5 us
Matlab     -              16.6 ms
Table.5.5 shows the resources needed by the different architectures (number of multipliers and adders, block RAMs, and equivalent slices).

TABLE.5.5 Resources (ECG classification)

Design     Multipliers and adders   Block RAMs   NN slices
FD(2nn)    2                        4            944
SD(2nn)    2                        5            944
FD(4nn)    4                        4            1888
SD(4nn)    4                        7            1888
FD(8nn)    8                        4            3772
SD(8nn)    8                        11           3772
LM         8                        11           3772
CNN        10                       13           4720
5.3 SUMMARY

This chapter introduced two applications, time series prediction and ECG classification, both implemented using the two proposed architectures with different numbers of neurons. The results validate the designs' ability to implement different applications with different numbers of neurons. The chapter also presented a comparison between the proposed designs, conventional neural network designs, layer multiplexing designs, and a digital signal processor implementation.
Chapter 6

CONCLUSION AND SUGGESTED FUTURE WORK

6.0 DISCUSSION AND CONCLUSION

This work introduced two neural network architectures: the first is a resource sharing design, and the second is a highly pipelined design. Both designs are scalable and generic. The scalability of the designs was tested using two different applications, and their ability to work with different numbers of neurons was demonstrated. A comparison between the proposed architectures, DSP implementations, and fixed feed forward implementations was presented. The comparison shows that the proposed architectures are much faster than the DSP implementation and are comparable in speed and area with dedicated feed forward neural network implementations. The two designs give comparable results when the number of neurons is small; as the number of neurons increases, the difference between them in speed and resources grows. The second design has a small overhead over fixed layer multiplexing designs when configured in the layer multiplexing setting (using the maximum layer number of neurons), with the advantage of giving the user the ability to trade speed against area. In conclusion, the proposed architectures give the user the ability to decrease resources at a cost in speed, and they can be programmed for different applications with different accuracies.
6.1 FUTURE WORK

The proposed architecture can be improved further. For example, the neuron cell design is fixed with one multiplier, which limits the neuron cell speed; this was done to save FPGA resources. A modification of the design could make the number of multipliers in each neuron a user specified variable. This would let the user choose both the number of neurons used to implement the application and the number of multipliers in each neuron, increasing the degrees of freedom for trading off speed against logic resources.

The activation function in this work is implemented using a look up table (LUT), which is not resource efficient. An approximation of the activation function could reduce the resources it needs while preserving the design's accuracy.

The current work only discussed the feed forward phase, as all supervised learning is done in software. The next step is to extend the designs with on-line learning capabilities, so that the new architecture can learn on its own: the weights would be initialized randomly, and after the learning phase they could be fixed. This would minimize the need for computer computations.

The current work was tested on two different areas, function approximation and pattern classification. The next step is to test the new architectures on control problems, where the dynamic behavior of the neural network is most needed.
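As one concrete candidate for the activation-function approximation mentioned above, a piecewise linear approximation of the sigmoid can replace the LUT with a few shifts and adds. The sketch below uses the well-known PLAN breakpoints as an example; these are a common published choice, not the thesis's own design.

```python
import math

def sigmoid_plan(x):
    """Piecewise linear approximation of the logistic sigmoid (PLAN-style).
    All slopes are negative powers of two, so hardware needs only shifts/adds."""
    ax = abs(x)
    if ax >= 5.0:
        y = 1.0
    elif ax >= 2.375:
        y = 0.03125 * ax + 0.84375
    elif ax >= 1.0:
        y = 0.125 * ax + 0.625
    else:
        y = 0.25 * ax + 0.5
    return y if x >= 0 else 1.0 - y  # exploit sigmoid symmetry for x < 0

for x in (-4.0, 0.0, 1.0, 4.0):
    exact = 1.0 / (1.0 + math.exp(-x))
    print(f"x={x:+.1f}: plan={sigmoid_plan(x):.4f} exact={exact:.4f}")
```

The absolute error of this approximation stays below about 0.02 over the whole range, which is typically small relative to fixed-point quantization error.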
Appendix A

Back propagation Matlab code

clc;
load mydata;                        % provides Attributes, Classifications,
                                    % testing and Classifications2
n = 1.1;                            % learning rate
nbrOfNodes  = 15;                   % number of hidden nodes
nbrOfEpochs = 2000;
y = zeros(1, nbrOfEpochs);          % test error rate per epoch
f = @(x) 1./(1 + exp(-x));          % logistic activation (matches the
                                    % O.*(1-O) derivative terms below)
% Initialize matrices with random weights 0-1
W = rand(nbrOfNodes, length(Attributes(1,:)));
U = rand(length(Classifications(1,:)), nbrOfNodes);
e = size(Attributes);
for m = 1:nbrOfEpochs
    % Iterate through all training examples
    for i = 1:e(1)
        % Input data from current example set
        I = Attributes(i,:).';
        D = Classifications(i,:).';
        % Propagate the signals through the network
        H = f(W*I);
        O = f(U*H);
        % Output layer error
        delta_i = O.*(1-O).*(D-O);
        % Calculate error for each node in layer (n-1)
        delta_j = H.*(1-H).*(U.'*delta_i);
        % Adjust weights in matrices sequentially
        U = U + n.*delta_i*(H.');
        W = W + n.*delta_j*(I.');
    end
    % Calculate classification error on the 625 test samples
    error = 0;
    for i = 1:625
        D = Classifications2(i,:).';
        I = testing(i,:).';
        classnew = f(U*f(W*I));
        [~, z]  = max(classnew);
        [~, z1] = max(D);
        if z ~= z1
            error = error + 1;
        end
    end
    y(m) = (error/625)*100;
end
plot(y, '+r');
Appendix B

Mackey-Glass time series prediction Matlab code

%% Input parameters
a        = 0.2;    % value for a in eq (1)
b        = 0.1;    % value for b in eq (1)
tau      = 17;     % delay constant in eq (1)
x0       = 1.2;    % initial condition: x(t=0) = x0
deltat   = 0.1;    % time step size (coincides with the integration step)
sample_n = 12000;  % total no. of samples, excluding the given initial condition
interval = 1;      % output is printed at every 'interval' time steps

%% Main algorithm
% * x_t             : x at instant t, i.e. x(t) (current value of x)
% * x_t_minus_tau   : x at instant (t-tau), i.e. x(t-tau)
% * x_t_plus_deltat : x at instant (t+deltat), i.e. x(t+deltat) (next value of x)
% * X               : the (sample_n+1)-dimensional vector containing x0 plus all
%                     other computed values of x
% * T               : the (sample_n+1)-dimensional vector containing the time samples
% * x_history       : a circular vector storing all computed samples between
%                     x(t-tau) and x(t)
time  = 0;
index = 1;
history_length = floor(tau/deltat);
x_history = zeros(history_length, 1);  % here we assume x(t) = 0 for -tau <= t < 0