
Speech Recognition based Microcontroller for Wheelchair Movements  

By:

Mohammed Ehsan Safi
Dr. Eyad I. Abbas

Dedication
To the souls of my father and my grandfather; to all those who stood beside me during my life to accomplish this goal: my mother, my wife, my sister, my grandmother, my uncle Emad Raof, my aunt Eyman, my teachers and my friends.

Mohammed E. Safi


Abstract
This work introduces an approach to design and implement a system that controls the movement of a wheelchair by means of the human voice, for paralyzed patients. The work is divided into two parts. The first part covers speaker and word recognition, simulated on a computer with the aid of MATLAB (R2012a); it recognizes the speaker (user) and the Arabic commands for the four directions together with a stop instruction. It contains preprocessing, feature extraction and feature classification. Preprocessing generates a signal with a fine structure as close as possible to that of the original speech signal and reduces the amount of unwanted data, easing the analysis task. Feature extraction in this work is based on the Mel-Frequency Cepstral Coefficient (MFCC) technique, combined with its first and second derivatives and with the power of the speech frames, which yields the total multi-dimensional feature set (E + MFCC + ΔE + ΔMFCC + ΔΔE + ΔΔMFCC). Dynamic Time Warping (DTW), a method for computing the similarity between two sequences, is used for feature classification. The experimental results showed that the proposed methods gave a recognition rate of 100% for the already trained speakers. The tests were conducted at different ambient sound levels (53 to 73 dB), as measured by a Sound Level Meter (SLM). The second part is the interface between the computer and the microcontroller, which controls the direction of the wheelchair through the command coming from the computer according to the speech recognition result. The wheelchair is driven by two DC motors, each controlled by the microcontroller through H-bridge relays.


List of Contents
Abstract....................................................................................................... I
List of Contents......................................................................................... II
List of Abbreviations................................................................................ V
List of Symbols...................................................................................... VII
Chapter One: Introduction
1.1 Overview................................................................................................ 1
1.2 Automatic Speech Recognition.............................................................. 1
1.2.1 Preprocessing................................................................................ 3
1.2.2 Feature Extraction......................................................................... 3
1.2.3 Dynamic Time Warping............................................................... 4
1.3 Introduction to Microcontroller............................................................. 4
1.4 Literature Survey.................................................................................... 5
1.5 Aim of the Work..................................................................................... 8
1.6 Thesis Layout......................................................................................... 8
Chapter Two: Speaker Recognition and Microcontroller
2.1 Introduction.......................................................................................... 10
2.2 Features Extraction and Word Recognition......................................... 10
2.2.1 Classification of Speech Recognition......................................... 11
2.2.2 Model of Speech Production....................................................... 11
2.2.3 Preprocessing.............................................................................. 13
2.2.3.1 Analog Anti-Aliasing Filter............................................. 14
2.2.3.2 Sampling.............................................................................. 14
2.2.3.3 Analog to Digital Converter............................................. 14
2.2.3.4 Remove Mean (DC Level)................................................... 15
2.2.3.5 Pre-Emphasis........................................................................ 15
2.2.3.6 End-Points Detection......................................................... 16


2.2.4 Feature Extraction....................................................................... 19
2.2.4.1 Cepstral Analysis............................................................... 20
2.2.4.2 Mel-Frequency Cepstral Coefficient................................. 22
2.2.5 Features Matching using DTW................................................. 26
2.3 Microcontroller Interfaces with Computer..................................... 30
2.3.1 The Microcontroller.................................................................. 30
Chapter Three: Proposed System Design
3.1 Introduction.......................................................................................... 33
3.2 Speech Recognition Procedure............................................................ 34
3.2.1 Training Step............................................................................. 35
3.2.2 Recognition Step.......................................................................... 38
3.3 Manual Control..................................................................................... 40
3.4 8051-Ready Additional Board.......................................................... 41
3.5 Interface between Laptop and Board................................................ 43
3.6 Driver Circuit...................................................................................... 44
Chapter Four: Implementation and Simulation Results
4.1 Introduction......................................................................................... 49
4.2 Speech Recognition Results................................................................. 49
4.2.1 Pre-processing Results................................................................ 49
4.2.2 Feature Extraction Results......................................................... 55
4.2.3 Feature Matching Results........................................................... 73
4.2.4 Recognition Results.................................................................... 78
4.3 Microcontroller Part............................................................................. 80
4.3.1 Simulation and Test of UART.................................................... 81
4.3.2 Loaded Program to Microcontroller.......................................... 83


Chapter Five: Conclusions and Suggestions for Future Work
5.1 Conclusions.......................................................................................... 84
5.2 Suggestions for Future Work............................................................... 85
References................................................................................................ 86
Appendix A................................................................................................ A

List of Abbreviations

A/D

Analog to Digital Converter

AC

Alternating Current

ASIC

Application Specific Integrated Circuit

ASR

Automatic Speech Recognition

dB

decibel

DC

Direct Current

DCT

Discrete Cosine Transform

DIP

dual in-line package

DFT

Discrete Fourier Transform

DSK

DSP Starter Kit

DSP

Digital Signal Processing

DTW

Dynamic Time Warping

EPD

End Point Detection

FFT

Fast Fourier Transform

FIR

Finite Impulse Response

HMM

Hidden Markov Model

I/O

Input / Output

ITL

Lower Energy Threshold

ITU

Upper Energy Threshold

IZCT

Zero-Crossing Rate Threshold

LDB

Local Discriminate Bases

LPC

Linear Predictive Coding

MFCC

Mel Frequency Cepstral Coefficient

NNT

Neural Network

PLCC

Plastic Leaded Chip Carrier

ROM

Read-Only Memory

RAM

Random Access Memory

SPI

Serial Programming Interface

STE

Short Term Energy

STZC

Short Term Zero Crossing

UART

Universal Asynchronous Receiver/Transmitter

USART

Universal Synchronous/Asynchronous Receiver/Transmitter

USB

Universal Serial Bus

VQ

Vector Quantization

ZCR

Zero Crossing Rate


List of Symbols

C

The cepstral

d

Local distance

D

cumulative distance

e(n)

Excitation sequence

E

energy

fs

Sampling frequency

Fmel

Mel-frequency scale

h(n)

Vocal tract function

H(f)

Vocal tract spectral shaping

MSF

Magnitude sum function

N(f)

unvoiced excitation

P(f)

voiced excitation

R(f)

Lips emission

S

The spectrum of speech signal

u

Noise generator

v

Pulse generator

ΔE

First derivative of energy

ΔMFCC

First derivative of MFCC

ΔΔE

Second derivative of energy

ΔΔMFCC

Second derivative of MFCC

W

Warping path


Chapter One: Introduction

1.1 Overview
Wheelchair users represent a large proportion of the population around the world, owing to the remnants of war, traffic accidents, illness, and other causes. Assistive technology developed in the last few years will significantly impact and improve the lives of wheelchair users. A wheelchair is a mobility device in which a disabled user sits; it is controlled either manually, by pushing the wheels with the hands, or via various automatic systems. Wheelchairs are used by people for whom walking is difficult or impossible due to illness, injury or disability. Electric wheelchairs have become more efficient and quieter, and more features have been added to make users more independent, more comfortable and less dependent on others than with a conventional manual wheelchair; however, each added feature increases the cost of the wheelchair. This work adds an important feature, speech recognition, at low cost, because many severely disabled patients find it difficult or impossible to use a conventional electric wheelchair; this feature provides an easy way to control the movement of the wheelchair.

1.2 Automatic Speech Recognition
Human beings usually communicate with each other by voice. Developments in electronics have led people to use the voice to command machines, especially wheelchairs, to ease the lives of persons with disabilities who suffer from spasms and paralysis of the extremities and who cannot use, or have difficulty using, a joystick. There are several proposed approaches to automatic speech recognition by machine. Generally, the three approaches to speech recognition are as follows [1]:
1. Pattern recognition approach
2. Acoustic-phonetic approach
3. Artificial intelligence approach
This work focuses on the pattern recognition approach, in which speech patterns are used directly without explicit feature determination and segmentation. The method has two steps: training of speech patterns, and recognition of patterns via pattern comparison. A speech database is brought into the system via the training procedure. The idea is that, if enough versions of a pattern to be recognized (e.g. a word) are included in the training set provided to the algorithm, the training procedure should be able to adequately characterize the acoustic properties of the pattern, with no regard for or knowledge of any other pattern presented to the training procedure. The pattern recognition approach is chosen for three reasons [2]:
1. Simplicity of use. This method is easy to understand and is rich in mathematical and communication theory.
2. Robustness and invariance to different speech vocabularies, users, feature sets, pattern comparison algorithms and decision rules.
3. Proven high performance. The pattern recognition approach consistently provides high performance on any task that is reasonable for the technology and provides a clear path for extending the technology in a wide range of directions, such that performance degrades gracefully as the problem becomes more difficult.


1.2.1 Preprocessing
Once the speech signal is captured and stored as digital data, it needs to be processed prior to being used for the speech recognition process [3]. The basic idea behind speech preprocessing is to generate a signal with a fine structure as close as possible to that of the original speech signal; this reduces the amount of data and eases the subsequent analysis [4]. A number of preprocessing techniques are adopted in this work, such as pre-emphasis, removal of the DC offset, and end-point detection, as described in the next chapters.

1.2.2 Feature Extraction
The general methodology of audio classification involves extracting discriminatory features from the audio data and feeding them to a pattern classifier. Different approaches and various kinds of audio features have been proposed, with varying success rates. The features can be extracted either directly from the time-domain signal or from a transform domain, depending upon the choice of signal analysis approach. Audio features that have been used successfully for audio classification include Mel-frequency cepstral coefficients (MFCC), linear predictive coding (LPC), and Local Discriminate Bases (LDB). Some techniques generate a pattern from the features and use it for classification by the degree of correlation; others use the numerical values of the features coupled with a statistical classification method [5]. The most popular features are the Mel-Frequency Cepstral Coefficients (MFCC). There are several approaches to computing the MFCCs. The first is based on spectral estimation, Mel-scale warping, and cepstral computation; it utilizes the Discrete Fourier Transform (DFT) and carries significant information about the structure of the signal [6]. This method was chosen for speech feature extraction in this work, as described in more detail in the next chapters.

1.2.3 Dynamic Time Warping
Dynamic time warping (DTW) is a much more robust distance measure for time series [7]; it is an algorithm for measuring the similarity between two sequences which may vary in time or speed. For instance, similarities in walking patterns would be detected even if in one video the person was walking slowly and in another more quickly, or even if there were accelerations and decelerations during the course of one observation. DTW has been applied to video, audio, and graphics; indeed, any data which can be turned into a linear representation can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds [8]. The DTW algorithm is well known in many areas: first introduced in the 1960s and extensively explored in the 1970s for application to speech recognition, it is currently used in handwriting and online signature matching, sign language and gesture recognition, data mining and time series clustering (time series database search), computer vision and computer animation, surveillance, protein sequence alignment and chemical engineering, and music and signal processing [9]. In this thesis, Dynamic Time Warping is used to match the features extracted using Mel-Frequency Cepstral Coefficients (MFCC).

1.3 Introduction to Microcontroller
Even when Intel presented the first microprocessor, the 4004, there was already a demand for microcontrollers: the contemporary TMS1802 from Texas Instruments, designed for use in calculators, was by the end of 1971 advertised for applications in cash registers, watches and measuring instruments. The TMS1000, introduced in 1974, already included RAM, ROM, and I/O on-chip and can be seen as one of the first microcontrollers, even though it was called a microcomputer. The first controllers to gain really widespread use were the Intel 8048, which was integrated into PC keyboards, and its successor, the Intel 8051, as well as the 68HCxx series of microcontrollers from Motorola. Today, microcontrollers are produced in the billions per year and are integrated into many appliances [10]:
• Household appliances (microwave, washing machine, coffee machine)
• Telecommunication (mobile phones)
• Automotive industry (fuel injection, ABS)
• Aerospace industry
• Industrial automation, etc.
The difference between a microcontroller and a microprocessor is that a microcontroller has a CPU (a microprocessor) in addition to a fixed amount of RAM, ROM, I/O ports, and a timer, all on a single chip. In other words, the processor, RAM, ROM, I/O ports, and timer are all embedded together on one chip, while a microprocessor contains no RAM, no ROM, and no I/O ports on the chip itself [11].

1.4 Literature Survey
Research in the area of wheelchair control systems is still going on, alongside research on Automatic Speech Recognition (ASR), to provide an easy way for disabled people to control the wheelchair. Researchers pursue their studies on both the hardware and software implementation of this field. Their relevant works are briefly described below:
• K.S. Ananthakrishnan and R. Andrew, 2002 [12] present a speaker-adaptable, voice-controlled model vehicle. The algorithm analyses the speech by detecting the beginning and end of the spoken command using the zero-crossing rate and an energy threshold. For the selected command word, the Mel-frequency cepstral coefficient (MFCC) parameters are computed and compared to the MFCCs of all the commands of the reference speaker stored in the library. The command word which gives the minimum distance is chosen as the "right" word. Typical commands such as Forward, Reverse, Left and Right were used in the experiment.
• Z. Abd Ghani, 2007 [13] introduced a wireless wheelchair control system which employs voice recognition using a voice recognition processor (HM2007) for triggering and controlling its movements. The wheelchair is also equipped with two infrared sensors, mounted at the front and rear of the wheelchair, to detect obstacles for collision avoidance. It utilizes a PIC controller to control the system operations; it communicates with the voice recognition processor to detect the spoken word and then determines the corresponding output command to drive the left and right motors.
• S. Ke et al., 2008 [14] introduced an embedded speech recognition system using an FPGA hardware platform, implementing a left-to-right HMM structure to model the words on an IP core. For speech feature extraction, the system adopts the LPCC algorithm and obtains 16 LPCC coefficients. Vector quantization was chosen for speech data compression, which reduces the storage space.
• H. Nik, 2009 [15] implemented a speech-recognition-controlled wheelchair using two digital signal processors (dsPIC30F family) from Microchip, mounted on a custom-designed printed circuit board to perform smooth humming control and speech recognition. One DSP is dedicated to speech recognition and implements Hidden Markov Models using the dsPIC30F speech recognition library developed by Microchip; the other DSP implements Fast Fourier Transforms on humming signals.
• M. Qadri and S. Ahmed, 2009 [16] implemented a voice-activated wheelchair through speech processing using a Digital Signal Processor (DSP). The Texas Instruments TMS320C6711 DSP Starter Kit (DSK) is connected to the wheelchair for processing of the voice signal. The DSK calculates the energy, zero crossings and standard deviation of the spoken word. It also generates different desired analog signals according to the spoken words, which are further amplified and converted into digital form. These digital signals are used to operate the stepper motor. Five words are recognized: forward, reverse, left, right and stop.
• S. Jothilakshmi, V. Ramalingam, and S. Palanivel, 2009 [17] proposed a method for improving speaker segmentation performance by fusing residual phase and MFCC features. The method is evaluated using television broadcast interviews and the NIST 2004 database. Support vector machines are used to detect the speaker change. The system reports a performance of 85.97%. The proposed system can be extended to detect speaker changes in conversations containing more than two speakers.
• L. Muda, M. Begam, and I. Elamvazuthi, 2010 [18] present voice recognition algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) techniques. The technique was able to authenticate a particular speaker based on the individual information included in the voice signal.
• A. AL-Thahab, 2011 [19] proposed a technique in which the Multiregdilet transform was used for isolated word recognition, finally using the outputs of a neural network (NNT) to control the wheelchair through a notebook computer and special interface hardware. The recognition rate for the command "GO" is 90%, and 100% for the other commands.

1.5 Aim of the Work
The objective of this research is to design and implement a wheelchair control system based on speech recognition, recognizing user speech instructions whether they are a single word or compound words (i.e. two words). This is achieved by using the MFCC technique for feature extraction of the speech signal and DTW for feature classification. The speech recognition part is implemented on a computer with the aid of MATLAB (R2012a), whereas the direction control part is implemented using an Atmel AT89S8253 microcontroller chip on a MikroElektronika 8051-Ready HW REV. 1.10 additional board. The software and utility programs used for programming are MikroC Pro for 8051 version 2.2 and the 8051Flash programmer version 2.09a.

1.6 Thesis Layout
In addition to the subjects covered in this chapter, the remaining chapters are arranged as follows:
• Chapter Two gives an overview of speech recognition, the classification of speech recognition according to speaker dependence, and feature extraction, focusing on the MFCC technique and the use of the Discrete Cosine Transform (DCT) to reduce the dimensionality of the speech feature vectors; it also covers the use of DTW for feature matching, and an overview of the microcontroller, the UART, and the microcontroller software.
• Chapter Three introduces the proposed wheelchair control system and the speech recognition algorithm steps: training, testing and recognition. It also introduces the electric manual control, the proposed 8051-Ready additional board, and the driver circuit, with its specification, used to implement this work.
• Chapter Four presents practical and simulation results for speech recognition using MATLAB and the PRAAT program, microcontroller simulation using the Proteus program and MikroC tools, and the implementation of the control part using the MikroC Pro software and flash programs.
• Chapter Five gives conclusions obtained from the results of the proposed work, with several suggestions for future relevant work.


Chapter Two: Speaker Recognition and Microcontroller

2.1 Introduction
A control system is a device or set of devices that manages, commands, directs or regulates the behavior of other devices or systems [20]. The wheelchair control system consists of speech recognition of the instruction word and speaker identification (of the user) for triggering and controlling the movement of the wheelchair via the microcontroller. This chapter introduces the methods of feature extraction and feature matching used for speech recognition, and the microcontroller and its interface with the computer.

2.2 Features Extraction and Word Recognition
Speech is the most natural way for humans to communicate, and this has been true since the dawn of civilization. The invention and widespread use of the telephone, audio storage media, radio, and television have given even further importance to speech communication and speech processing. Advances in digital signal processing technology have led to the use of speech processing in many different application areas such as speech compression, enhancement, synthesis, and recognition [3]. Nowadays, human beings use speech to communicate with machines and make them perform specific actions by voice command; this requires man-made machines to understand spoken words and respond to them by recognizing speech [1]. Speech recognition in this work is for the Arabic language (or any other language, according to the training of the user on the system) and for the identity of the user, or of many users, of the wheelchair. Speech recognition, as shown in Figure (2.1), contains three steps:
1. Preprocessing.
2. Feature Extraction.
3. Feature Matching using DTW.

2.2.1 Classification of Speech Recognition
Speech recognition systems are divided into two categories according to their dependence on the speaker:
• Speaker Dependent
• Speaker Independent
In the first kind, reference patterns are constructed for a single speaker; for the system to recognize the speech of different speakers, the reference patterns must be updated for the new speakers. In the second kind, the system recognizes a word for any speaker [3]. The speaker-dependent type is used here to control the movement of the wheelchair according to the voiced instructions of the wheelchair user only, which prevents unauthorized use and adds to the system's advantages.

2.2.2 Model of Speech Production
The speech production model shown in Figure (2.2) contains two parts:
• Excitation part
• Spectral shape part


Figure 2.2: model of speech production [21]

The excitation part can be separated into voiced excitation (produced by a pulse generator (v) which generates a pulse train), given by P(f), and unvoiced excitation (modeled by a white noise generator (u)), given by N(f). The output of both generators is added and fed into the box modeling the vocal tract, which performs the spectral shaping with the transmission function H(f). The emission characteristics of the lips are modeled by R(f). Hence, the spectrum S(f) of the speech signal is given as [21]:

S(f) = (v · P(f) + u · N(f)) · H(f) · R(f) = X(f) · H(f) · R(f)

(2.1)

To influence the speech sound, we have the following parameters in our speech production model:
• The mixture between voiced and unvoiced excitation (determined by v and u)
• The fundamental frequency (determined by P(f))
• The spectral shaping (determined by H(f))
• The signal amplitude (depending on v and u)
These are the technical parameters describing a speech signal. The spectral shape is determined over short intervals of time, e.g., every 10 ms. In speech recognition the most important information is the spectral shape; in the cepstrum the glottal excitation can easily be identified and removed, as will be seen after the pre-processing step, in feature extraction using MFCC.

2.2.3 Preprocessing
The speech signal must be transformed into discrete form and prepared for feature extraction. The common processes for transforming the speech signal and accommodating it to the next stage, feature extraction, are shown in Figure (2.3) [1].

Figure 2.3: Pre-processing of the speech signal [1]


2.2.3.1 Analog Anti-Aliasing Filter: The analog signal (the output of the microphone) should pass through an anti-aliasing filter, which is a low-pass filter with a cutoff below the folding frequency (maximum frequency), defined as half of the sampling frequency (fs/2); in other words, it restricts the bandwidth of the signal so that the sampling theorem is approximately satisfied [1]. Once the anti-aliasing filtering is done, the new analog signal has lost most of its high-frequency components [6].

2.2.3.2 Sampling: The continuous speech data recorded by the microphone must be transformed into discrete data so that it can be processed by computer. The values recorded at fixed time instants describe the speech wave. The sampling process is an important source of loss of speech data: a higher sampling frequency gives less loss but more data to process, while a lower sampling frequency gives more loss but less sampled data. According to the sampling theorem, the sampling frequency cannot be smaller than twice the bandwidth of the signal [22]; 16 kHz is used as the sampling frequency.

2.2.3.3 Analog to Digital Converter: The main purpose of the analog-to-digital (A/D) converter is to quantize (i.e. digitally represent) each discrete sample x(n), n = 0, ..., N−1, as a specific number [1].


2.2.3.4 Remove Mean (DC Level): DC level or DC offset occurs when hardware, such as a sound card, adds DC current to a recorded audio signal. This current produces a recorded waveform that is not centered on the baseline. Therefore, removing the DC offset is the process of forcing the input signal mean to the baseline by adding a constant value to the samples in the sound file [4].

2.2.3.5 Pre-Emphasis: In this step the signal is passed through a filter which emphasizes the higher frequencies, increasing the energy of the signal at high frequency [18]. The most commonly used filter for this step is the FIR filter described below [23]:

Y[n] = X[n] − 0.95 X[n − 1]     (2.2)

The filter response of this FIR filter can be seen in Figure (2.4) and Figure (2.5).

Figure 2.4: Pre-emphasis filter [23]


Figure 2.5: Pre-emphasis for the Arabic word "امام" (forward)
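As an illustration of Eq. (2.2), a minimal MATLAB sketch of the pre-emphasis filter is shown below. The 0.95 coefficient comes from the text; the placeholder signal, variable names and sampling rate are only assumptions for the sketch, not the thesis's actual code.

% Pre-emphasis, Y[n] = X[n] - 0.95*X[n-1]  (Eq. 2.2): a minimal sketch.
% In practice x would be the recorded word (e.g. loaded with wavread in R2012a);
% a placeholder tone is used here so the lines run on their own.
fs = 16000;
x  = sin(2*pi*200*(0:fs-1)'/fs);       % placeholder 200 Hz tone, 1 s long
x  = x - mean(x);                      % remove any DC offset first (Sec. 2.2.3.4)
y  = filter([1 -0.95], 1, x);          % FIR pre-emphasis filter of Eq. (2.2)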

2.2.3.6 End-Points Detection: The purpose of the voicing detector is to classify a given frame as voiced or unvoiced. In many instances, voiced/unvoiced classification can easily be accomplished by observing the waveform: a frame with clear periodicity is designated as voiced, and a frame with a noise-like appearance is labeled as unvoiced [24]. Two methods were used to detect the end points: Short-Term Energy (STE) and Short-Term Zero Crossing (STZC). The amplitude of the speech signal varies appreciably with time; in particular, the amplitude of unvoiced segments is generally much lower than the amplitude of voiced segments. The short-time energy of the speech signal provides a convenient representation that reflects these amplitude variations [25]. STE is the most obvious and simple indicator of "voicedness". Typically, voiced sounds are several orders of magnitude higher in energy than unvoiced signals. For the frame (of length N) ending at instant m, the energy is given by the following equations [24]:


E(m) = Σ_{n=m−N+1}^{m} s²[n]     (2.3)

For simplicity, the Magnitude Sum Function (MSF) is defined by:

MSF(m) = Σ_{n=m−N+1}^{m} |s[n]|     (2.4)

The value of N is chosen so that the frame length is in the range (10–40) ms, since over such an interval the speech signal can be considered unchanged, i.e. its statistical properties are relatively constant [26]. Since voiced speech has its energy concentrated in the low-frequency region, due to the relatively low value of the pitch frequency, better discrimination can be obtained by low-pass filtering the speech signal prior to the energy calculation; that is, only the energy of the low-frequency components is taken into account [24]. Prior to the STZC calculation, any DC offset must be removed from the speech signal. This process counts how many times the signal crosses the time axis during the frame. The rate at which zero-crossings occur is a simple measure of the frequency content of a narrowband signal [25]. This algorithm was first described in [27], as illustrated by the flow process in Figure (2.6). The zero crossing rate of the frame ending at time instant m is defined by the following equation [24]:

ZC[m] = (1/2) Σ_{n=m−N+1}^{m} |sgn(s[n]) − sgn(s[n−1])|     (2.5)

The sgn(·) function returns ±1 depending on the sign of the operand. Eq. (2.5) computes the zero crossing rate by checking the samples in pairs to determine where the zero crossings occur. Note that a zero crossing is said to occur if successive samples have different signs [24].
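A short MATLAB sketch of Eqs. (2.3)–(2.5), computing the short-term energy, magnitude sum function and zero-crossing rate of one frame, is given below; the placeholder signal, frame-end index and variable names are assumptions for illustration only.

% Short-term energy and zero-crossing rate of one frame (Eqs. 2.3-2.5).
% s is the speech vector and m the frame-end index; placeholders are used here.
fs = 16000; s = randn(fs, 1);          % placeholder signal, 1 s of noise
N = 160; m = 1600;                     % 10 ms frame at 16 kHz, ending at sample 1600
frame = s(m-N+1:m);                    % samples s[m-N+1] ... s[m]
E   = sum(frame.^2);                   % Eq. (2.3): short-term energy
MSF = sum(abs(frame));                 % Eq. (2.4): magnitude sum function
ZC  = 0.5 * sum(abs(sign(frame(2:end)) - sign(frame(1:end-1))));  % Eq. (2.5)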

Figure 2.6: Flow process of the end-points algorithm [27]

The zero-crossing rate is a useful parameter for estimating whether speech is voiced or unvoiced. Voiced speech has most of its

energy concentrated in the lower frequencies, whereas most of the energy of unvoiced speech is found in the higher frequencies [25]. Since high frequencies imply high zero-crossing rates and low frequencies imply low zero-crossing rates, high and low zero-crossing rates correspond to unvoiced and voiced speech, respectively [26]. An end-point detection example is shown in Figure (2.7).

Figure (2.7): The word "امام" (forward) and its end-point detection

2.2.4 Feature Extraction
Feature extraction is one of the most important issues in the field of speech recognition, as it provides the representation of the speech [28]. Speech is a dynamic and non-stationary process: the amplitude of the speech waveform varies with time due to variations in the vocal tract and articulators. However, speech analysis usually presumes that the statistical properties of the non-stationary speech process change relatively slowly over time. Although this assumption is not strictly valid, it makes it possible to process short-time speech frames, ranging typically from 10 ms to 40 ms, as a stationary process. Generally speaking, short frame durations and overlapping frames are chosen to capture the rapid dynamics of the spectrum [26]. There are many methods for speech feature extraction; one of the most widely used is the Mel Frequency Cepstral Coefficients (MFCC). This work adopts this technique to extract the observation vectors of the uttered words, since, in recent studies of speech recognition systems, MFCC parameters perform better than others in terms of recognition accuracy [1].

2.2.4.1 Cepstral Analysis
A speech signal is produced by filtering an excitation waveform through the vocal tract filter, as shown in Figure (2.8).

Figure (2.8): Linear Acoustic Model of Human Speech Production [26]

In this model, the speech signal is expressed as the convolution of an excitation signal e(n) with the vocal tract response h(n). Homomorphic signal processing offers a fairly simple method, known as cepstral deconvolution, to decouple the vocal tract response from the excitation response, thereby enabling better modeling of the vocal tract characteristics. The decomposition of a speech signal s(n) into the excitation sequence e(n) and the vocal tract function h(n) can be described as follows [26]:

s(n) = e(n) ⊗ h(n)     (2.6)

where the operator "⊗" represents the convolution operation. Recall that convolution in time corresponds to multiplication in the frequency domain. Thus, Eq. (2.6) becomes [26]:

S(f) = E(f) · H(f)     (2.7)

Note that the complex speech spectrum S(f) is composed of a quickly varying part, the excitation spectrum E(f) (which corresponds to high-frequency components), and a slowly varying part, the vocal tract response H(f) (which corresponds to low-frequency components). Considering that the speech signal is real-valued, taking the logarithm of Eq. (2.7) on both sides leads to [26]:

log(S(f)) = log(E(f) · H(f)) = log(E(f)) + log(H(f))     (2.8)

Now that the signal components in Eq. (2.8) are linearly combined, a linear filter (also known as a liftering operation in speech engineering terminology) can be applied to remove the noise-like, quickly varying excitation part from the speech spectrum. Then, the inverse Fourier transform is applied to the remaining component to compute the real cepstrum. In short, under a cepstral transformation, the non-linear convolution of two signals e(n) ⊗ h(n) becomes equivalent to the linear sum of the cepstral representations of the signals, Ce(s) + Ch(s). As a result, the real cepstrum is the inverse Fourier transform of the logarithm of the power spectrum of a speech signal [26]:

Cs(n) = (1/N) Σ_{k=0}^{N−1} log|S(k)| e^{j2πkn/N},  for n = 0, 1, ..., N−1     (2.9)

Figure (2.9) shows a block diagram representation of the short-term real cepstrum computation.

Figure (2.9) Real Cepstrum Computation [26]
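The real-cepstrum computation of Eq. (2.9) and Figure (2.9) can be sketched in MATLAB as follows; the placeholder frame and variable names are assumptions for illustration, not the thesis code.

% Real cepstrum of one windowed frame (Eq. 2.9): inverse DFT of log|DFT|.
K = 400;                                              % e.g. a 25 ms frame at 16 kHz
w = 0.54 - 0.46*cos(2*pi*(0:K-1)'/(K-1));             % Hamming window (Eq. 2.12)
frame = randn(K, 1) .* w;                             % placeholder windowed frame
S = fft(frame);                                       % DFT of the frame
c = real(ifft(log(abs(S) + eps)));                    % real cepstrum; eps avoids log(0)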

2.2.4.2 Mel-Frequency Cepstral Coefficient
The MFCCs are based on the known variation of the critical frequency bandwidths of the human ear. This is captured by the Mel-frequency scale, which is linear below 1000 Hz and logarithmic above 1000 Hz [29]. A popular relation between the frequency f (in Hz) and the mel-frequency scale Fmel is as below [30]:

Fmel = 2595 · log10(1 + f(Hz)/700)     (2.10)
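Eq. (2.10) maps linear frequency to the mel scale; a one-line MATLAB sketch is given below (the function-handle name is illustrative).

% Hz-to-mel mapping of Eq. (2.10)
hz2mel = @(f) 2595 * log10(1 + f/700);
hz2mel(1000)    % approximately 1000 mel, by construction of the scale
hz2mel(8000)    % roughly 2840 mel, the upper band edge for a 16 kHz sampling rate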

MFCC provides a baseline acoustic feature set for speech and ASR applications. The MFCC parameters for monosyllabic word recognition are computed as follows [28]:

MFCC_n = Σ_{k=1}^{40} X_k cos[ n (k − 1/2) π/40 ],  for n = 0, 1, 2, ..., L     (2.11)

where L is the number of MFCC coefficients and X_k, k = 1, 2, ..., 40, represents the log energy output of the k-th filter. Mel-frequency cepstral coefficients with a single energy term and their dynamic derivatives were used for feature extraction in this study. Figure (2.10) shows the block diagram of the MFCC feature extraction step by step.

Figure (2.10): Block Diagram for the MFCC Feature Extraction.

The details of the block diagram are described below:

• Framing and Windowing: The next thing to do with the speech signal after pre-processing is to divide it into frames and apply a window to each frame. Each frame is K samples long, with adjacent frames separated by P samples, see Figure (2.11) [23]. A commonly used window is the Hamming window [23], calculated as:

w(k) = 0.54 − 0.46 cos(2πk / (K − 1))     (2.12)

Figure (2.11): Frame blocking of a sequence x1(n) [23]

• Fast Fourier Transform (FFT): the Fast Fourier Transform is a fast implementation of the Discrete Fourier Transform (DFT), which converts the N samples of a frame into the frequency spectrum.

• Mel Scaled Filter Banks: The Mel-scale filter bank implementation used in this study includes 40 triangular filters non-uniformly spaced along the frequency axis [26], as shown in Figure (2.12).

• Signal Energy: Furthermore, the signal energy is added to the set of parameters. It can simply be computed from the speech samples s(n) within the time window by [21]:

E = Σ_{n=0}^{N−1} s²(n)     (2.13)

Figure (2.12): Mel-scale Filter Bank [31]

• Discrete Cosine Transform (DCT): The cepstrum is defined as the inverse Fourier transform of the log magnitude of the Fourier transform of the signal. Since the log Mel filter bank coefficients are real and symmetric, the inverse Fourier transform operation can be replaced by the DCT to generate the cepstral coefficients. This step is crucial in speech recognition as it separates the vocal tract shape function from the excitation signal of the speech production model. The lower-order cepstral coefficients represent the smooth spectral shape, or vocal tract shape, while the higher-order coefficients represent the excitation information [32]. The cepstral coefficients are the DCT of the M filter outputs, obtained from [28]:

C(n) = Σ_{k=0}^{M−1} X_k cos[ n (k − 1/2) π/M ],  n = 1, ..., L     (2.14)

where L is the number of MFCC coefficients and X_k, k = 0, 1, 2, ..., M−1, represents the log energy output of the k-th filter.

• Dynamic Parameters: The voice signal changes from frame to frame, for example in the slope of a formant at its transitions. Therefore, there is a need to add features related to the change in cepstral features over time: 13 delta or velocity features (12 cepstral features plus energy) and 13 double-delta or acceleration features are added, for a total of 39 features. Each of the 13 delta features represents the change between frames, according to equation (2.15), of the corresponding cepstral or energy feature, while each of the 13 double-delta features represents the change between frames of the corresponding delta feature [18].

d(t) = ( c(t+1) − c(t−1) ) / 2     (2.15)
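Eq. (2.15) can be applied column-wise to the 13-dimensional static features to obtain the delta features, and again to the deltas to obtain the double deltas. A minimal MATLAB sketch is shown below; the 13 × J matrix layout follows Chapter Three, while the placeholder data and variable names are assumptions.

% Delta and double-delta features, d(t) = (c(t+1) - c(t-1)) / 2   (Eq. 2.15).
% C holds the static features, 13 x J: [energy; 12 MFCCs] per frame.
J = 100; C = randn(13, J);                     % placeholder static features
Cpad   = [C(:,1) C C(:,end)];                  % repeat edge frames so t-1 and t+1 exist
delta  = (Cpad(:,3:end) - Cpad(:,1:end-2)) / 2;
Dpad   = [delta(:,1) delta delta(:,end)];
ddelta = (Dpad(:,3:end) - Dpad(:,1:end-2)) / 2;
features = [C; delta; ddelta];                 % 39 x J feature matrix, as in Chapter Three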

 

2.2.5 Features Matching using DTW
Dynamic time warping (DTW) is an algorithm for measuring the similarity between two sequences which may vary in time or speed. For instance, similarities in walking patterns would be detected even if in one video the person was walking slowly and in another more quickly, or even if there were accelerations and decelerations during the course of one observation. DTW has been applied to video, audio, and graphics; indeed, any data which can be turned into a linear representation can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. In general, DTW is a method that allows a computer to find an optimal match between two given sequences (e.g. time series) under certain restrictions. The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension [8]. DTW allows a non-linear warping alignment of one signal to another by minimizing the distance between the two, as shown in Figure (2.13) [18].

Figure (2.13): A Warping between two time series [18]

This warping between two signals can be used to determine the similarity between them and is thus very useful for feature recognition. DTW is a pattern matching algorithm with a non-linear time optimization effect based on Bellman's principle of optimality, which states that, given an optimal path from A to B and a point C lying somewhere along this path, the path segments AC and CB are optimal paths from A to C and from C to B, respectively [32]. For an utterance of a word, we get a sequence of vectors X = {x1, x2, ..., xn} from the acoustic preprocessing stage. What is needed here is a way to compute a "distance" between this unknown sequence of vectors X and a known sequence of vectors Y = {y1, y2, ..., ym}, which is a prototype for a word we want to recognize. The main problem is to find the optimal assignment between the individual vectors of the unequal-length sequences X and Y. Figure (2.14) shows two sequences X and Y consisting of unequal numbers of vectors. Each path through this grid (such as the path shown in the figure) represents one possible assignment of the vector pairs: for example, the first vector of X is assigned to the first vector of Y, the second vector of X to the second vector of Y, and so on. Figure (2.14) shows as an example a path W given by a sequence of time index pairs of the vector sequences (or grid point indices): W = {w1, w2, ..., wK} = {(1, 1), (2, 2), ..., (n, m)}. The length K of the path W is determined by the maximum of the numbers of vectors contained in X and Y [21]. The DTW objective is to find the warping path W = {w1, w2, w3, ..., wK} of contiguous elements on the distance matrix, with max(n, m) ≤ K ≤ m + n − 1 and wk = distMatrix(i, j), such that it minimizes the following function [33]:

DTW(X, Y) = min { √( Σ_{k=1}^{K} w_k ) }     (2.17)


Figure (2.14): Possible assignment between the vector pairs of X and Y.

The warping path is subject to several constraints, see Figure (2.15). Given wk = (i, j) and wk−1 = (i′, j′), with i, i′ ≤ n and j, j′ ≤ m [33]:
1. Boundary conditions: w1 = (1, 1) and wK = (n, m).
2. Continuity: i − i′ ≤ 1 and j − j′ ≤ 1.
3. Monotonicity: i − i′ ≥ 0 and j − j′ ≥ 0.
This path can be found by using dynamic programming to evaluate the following recurrence, which defines the cumulative distance D(i, j) as the local distance d(i, j) of the current cell plus the minimum of the cumulative distances of the adjacent elements [7]:

D(i, j) = d(xi, yj) + min { D(i − 1, j − 1), D(i − 1, j), D(i, j − 1) }     (2.18)

The Euclidean distance between two sequences can be seen as a special case of DTW where the k-th element of W is constrained such that wk = (i, j)k with i = j = k. Note that it is only defined in the special case where the two sequences have the same length [7].
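As an illustration of Eq. (2.18), a minimal MATLAB sketch of the DTW cumulative-distance computation is given below. The Euclidean norm is used as the local distance d(i, j); the function name is illustrative and this is a sketch, not the thesis's exact implementation.

% DTW global distance between feature matrices X (39 x n) and Y (39 x m), Eq. (2.18)
function dist = dtw_distance(X, Y)
    n = size(X, 2);  m = size(Y, 2);
    D = inf(n + 1, m + 1);            % cumulative-distance matrix with a border
    D(1, 1) = 0;
    for i = 1:n
        for j = 1:m
            d = norm(X(:, i) - Y(:, j));                              % local distance d(i, j)
            D(i+1, j+1) = d + min([D(i, j), D(i, j+1), D(i+1, j)]);   % Eq. (2.18)
        end
    end
    dist = D(n+1, m+1);               % global distance between the two sequences
end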

Figure (2.15): Local path alternatives for a grid point

2.3 Microcontroller Interfaces with Computer
This part interprets the instructions that come from the PC through the serial port, according to the speech recognition result or manually via the keyboard. The brain of this part is the microcontroller (AT89S8253), which controls the direction of the motors of the wheelchair.

2.3.1 The Microcontroller
A microcontroller has a CPU (a microprocessor) in addition to a fixed amount of RAM, ROM, I/O ports, and a timer, all on a single chip. In other words, the processor, RAM, ROM, I/O ports, and timer are all embedded together on one chip; therefore, the designer cannot add any external memory, I/O, or timer to it. The fixed amount of on-chip ROM and RAM and the fixed number of I/O ports make microcontrollers ideal for many applications in which cost and space are critical [11]. Microcontrollers are useful to the extent that they communicate with other devices, such as sensors, motors, switches, keypads, displays, memory and even other microcontrollers. Many interface methods have been developed over the years to solve the complex problem of balancing circuit design criteria such as features, cost, size, weight, power consumption, reliability, availability, and manufacturability. Despite its relatively old age, the 8051 is one of the most popular microcontrollers in use today. Many derivative microcontrollers have been developed that are based on, and compatible with, the 8051. Thus, the ability to program an 8051 is an important skill for anyone who plans to develop products that take advantage of microcontrollers. The original 8051 core is an accumulator-based design with 255 instructions. Each basic instruction cycle takes 12 clocks. The CPU has four banks of eight 8-bit registers in on-chip RAM for context switching; these registers reside within the 8051's lower 128 bytes of RAM, along with a bit-operation area and scratchpad RAM [34]. The latest versions are by far more advanced than the original one. Many of them carry the label "compatible", "compliant" or "family" in order to emphasize their heritage. These tags imply that the microcontrollers have a similar architecture and are programmed in a similar way using the same instruction set. Practically, if you know how to handle one microcontroller belonging to this family, you will be able to handle any of them.

The microcontroller utilized in this work is the AT89S8253, manufactured by Atmel, chosen because it is widely used, cheap, and uses flash memory for storing programs. This last feature makes it ideal for experimentation, since a program can be loaded to and erased from it many times. Besides, thanks to the built-in SPI (Serial Programming Interface) system, the program can be loaded to the microcontroller even after the chip is embedded in the target device [35]. More details about the AT89S8253 microcontroller and the software tools are given in Appendix A.


Chapter Three: Proposed System Design

3.1 Introduction
The proposed wheelchair looks like a conventional mechanical wheelchair, but components were added to it that are cheap compared with the cost of an electric wheelchair that does not even have speech recognition technology. The proposed wheelchair, as shown in Figure (3.1), provides both speech recognition of user instructions and manual control.

Figure (3.1): Proposed wheelchair

The wheelchair direction-control system consists of: a speech recognition part, implemented in MATLAB and installed on a laptop or notebook, which runs the speech recognition algorithm and the manual control (the keyboard works as a joystick); a microcontroller board, which consists of the microcontroller and an interface between the laptop and the microcontroller using the on-board USB-to-serial converter; and finally an interface between the microcontroller and the driver circuits of the two motors, which consist of H-bridge relays. The general diagram of the proposed system is shown in Figure (3.2):

Figure (3.2): Wheelchair control system

3.2 Speech Recognition Procedure
The speech recognition system was implemented in MATLAB (R2012a) from MathWorks, where the system was developed and tested. MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation [3]. Implementation of the system consists of two steps: training and testing (recognition).


3.2.1 Training Step
The proposed system is speaker-dependent; therefore the system must be trained for the user, to teach the system the instruction patterns for directional movement and to build the wheelchair database for one user (or several users), see Figure (3.2). The spoken word is picked up using a headphone-mic, which reduces the ambient noise because the microphone is close to the user's mouth, and is more comfortable for the user's movement than a desktop or conventional microphone. The user listens to the instruction through the headphone immediately after it is recorded, to make sure it has been recorded correctly; if something went wrong, the user re-records the instruction. As in the preceding chapters, feature extraction of the speech signal for the predetermined words includes pre-processing and filtering according to the Mel-frequency filter bank, and then finding the features. In this work, the time allowed for the uttered words was chosen to be 2 sec, recording 16000 samples at a sample rate of 16 kHz in mono wave file format. After the speech signal is pre-emphasized, the important problem in speech processing is to detect the presence of speech against a background of noise; in other words, to determine the start and end of the speech signal and isolate the parts that do not contain speech, which helps to minimize the number of samples to be processed later. The following popular algorithm depends on the STE and STZC measurements to determine the speech boundaries; it was adopted in this work and modified to meet the proposed requirements.


Figure (3.2): Flowchart of Speech recognition system training

After pre-emphasis and removal of any DC offset in the signal, the proposed program computes the magnitude and zero-crossing rate of the signal, computing the STE and ZCR per frame with a frame size of 10 ms (160 samples) and a window size equal to half the frame size, 5 ms (80 samples). Three thresholds are computed: the upper energy threshold (ITU), the lower energy threshold (ITL), and the zero-crossing rate threshold (IZCT). The STE curve is then searched to find the first and second successive crossings of ITU, and it is necessary to move backward and forward from the first and second crossings to find the end points where the lower threshold ITL is first exceeded. This initial search yields tentative endpoints. Next, a fine search is performed on the ZCR curve, moving outward from N1 and N2 for no more than 10 frames, checking the ZCR to find three occurrences of counts above the threshold IZCT. If three such occurrences are found, the final end-point estimate is obtained by moving backward and/or forward to the time of the first threshold crossing; N1 and N2 remain at their initial values when three such occurrences over the ZCR are not found. After this point, the speech signal is ready for feature extraction, by applying the speech signal to the Mel-frequency filter bank, as described in Chapter Two, using Eqs. (2.10, 2.11, 2.12, 2.13, and 2.14) to find the MFCC coefficients as features of the speech signal, composed of one coefficient for energy and twelve MFCC coefficients, and using Eq. (2.15) to find additional delta and double-delta coefficients of the original ones. This yields the total multi-dimensional feature set (1E + 12MFCC + ΔE + ΔMFCC + ΔΔE + ΔΔMFCC). The total features are arranged in the form of a matrix MFCC[I, J], where I = 39 represents the number of elements of each frame and J represents the total number of overlapped frames of the speech signal (the number of feature vectors).

The feature matrix of each word that represents a movement instruction of the wheelchair is saved in the memory of the computer as a database for speech recognition. The training process ends when all instruction words have been recorded and the feature matrix of each word has been extracted. For one user, five words are recorded; if there are two users, ten words are recorded, five for each user, representing the movement directions, and their matrices are extracted and stored as the database; and so on if there are more users.

3.2.2 Recognition Step
The recognition (testing) process starts when a word (instruction) is recorded; the same pre-processing and feature extraction steps of the training process are performed on the speech signal to obtain the multi-dimensional feature matrix, as shown in Figure (3.3). This matrix is then matched against the feature matrices of the database that were extracted in the training process. The matching is done using dynamic time warping (DTW), as described in Chapter Two, using Eqs. (2.17) and (2.18), to find the optimal warping path between the new feature matrix (instruction) and each feature matrix in the database. The matrix d(i, j), representing the local distance for each point of the grid constructed by aligning the two feature matrices, is calculated, and then the accumulated (global) distance D(i, j) is calculated, with the initial condition:

D(1, 1) = 2 · d(1, 1)     (3.1)

and the accumulated distance along the diagonal:

D(i, j) = D(i−1, j−1) + 2 · d(i, j)     (3.2)

The minimum distance is then compared with a threshold to recognize the instruction.
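Eqs. (3.1) and (3.2) correspond to giving the diagonal step a weight of 2 in the DTW recurrence of Eq. (2.18). The MATLAB sketch below repeats the Chapter Two DTW sketch with only that change; the function name is illustrative and the threshold decision described in the comments is an assumption about how the comparison could be coded, not the thesis's exact program.

% DTW with the doubled diagonal weight of Eqs. (3.1) and (3.2); a sketch only.
% X and Y are 39 x n and 39 x m feature matrices (new word and stored word).
function dist = dtw_weighted(X, Y)
    n = size(X, 2);  m = size(Y, 2);
    D = inf(n + 1, m + 1);
    D(1, 1) = 0;                       % so that D(2,2) = 2*d(1,1), i.e. Eq. (3.1)
    for i = 1:n
        for j = 1:m
            d = norm(X(:, i) - Y(:, j));
            D(i+1, j+1) = min([D(i, j) + 2*d, ...   % diagonal step, Eq. (3.2)
                               D(i, j+1) + d, ...   % vertical step
                               D(i+1, j) + d]);     % horizontal step
        end
    end
    dist = D(n+1, m+1);
end
% The recognised word is the database entry with the smallest dist; it is accepted
% only if that distance is below the chosen threshold.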


Figure (3.3): Flowchart of the speech recognition system


3.3 Manual Control
To control movement using the keyboard as a joystick for the wheelchair, MATLAB (R2012a) was used to program the four movement instructions and the stop instruction. The program sends the instruction code for the movement directly when the button assigned to that direction is pressed. The buttons proposed are those commonly used on the computer for controlling direction in games, as shown in Figure (3.4):

Figure (3.4): keyboard control

The buttons on the keyboard form a plus shape "+": the number eight represents the forward direction, number two the backward direction, number six the right direction, number four the left direction, and number five, which is at the center of the plus, represents the stop instruction.

The microcontroller receives the instruction codes and then controls the wheelchair motors through the driver circuit.
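A possible MATLAB sketch of this manual control is given below: it opens the virtual COM port created by the FT232R USB-UART converter (Section 3.5) and sends a one-byte instruction code for the pressed key. The port name, baud rate and code characters are assumptions for illustration, not the thesis's actual values.

% Manual keyboard control: send an instruction code over the USB-UART link.
% 'COM3' and 9600 baud are placeholders; the code characters are illustrative.
s = serial('COM3', 'BaudRate', 9600);    % serial() is available in MATLAB R2012a
fopen(s);
codes = containers.Map({'8','2','6','4','5'}, {'F','B','R','L','S'});  % key -> code
key = input('Press 8/2/6/4/5 then Enter: ', 's');
if isKey(codes, key)
    fwrite(s, codes(key));               % one byte: F, B, R, L or S
end
fclose(s); delete(s);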

3.4 8051-Ready Additional Board
The 8051-Ready additional board, shown in Figure (3.4), enables a hex code to be quickly and easily loaded into 8051 microcontrollers using the 8051prog programmer. The board is supplied with three sockets for 8051 microcontrollers in DIP40, DIP20 and PLCC40 packages, 2x5 male connectors connected to the microcontroller pins, pads, a screw terminal for the power supply, a USB connector, pull-up resistors and a reset button, see Figure (3.5) [36]. Key features [36]:
- Data transfer via USB-UART communication;
- Programming via the external programmer;
- Pads;
- 8 to 16 V AC/DC power supply voltage.

Figure (3.4): 8051-Ready additional board [36]


Figure (3.5): Additional board connection schematics [36]


The Atmel AT89S8253 microcontroller in the 40P6 (40-lead PDIP) package, shown in Figure (3.6), was placed in the DIP40 socket of the 8051-Ready additional board.

Figure (3.6): AT89S8253 (40P6, 40-lead PDIP) [36]

3.5 Interface between Laptop and Board
The interface between the laptop and the 8051-Ready additional board is made via a USB cable, as shown in Figure (3.7).

Figure (3.7): Connection between laptop and board

The FT232R USB UART IC, shown in Figure (3.8), is a single-chip USB to asynchronous serial data transfer interface at TTL level; unlike conventional USB-to-serial converters it does not need external TTL level converters, and it provides true 5V/3.3V/2.8V/1.8V CMOS drive output and TTL input. USB-UART communication is enabled by placing jumpers J2 and J3. By doing this, the RX and TX pins of the USB-UART module are connected to the appropriate pins of the microcontroller (P3.0 for RX-MCU and P3.1 for TX-MCU), as shown in Figure (3.8).

Figure (3.8): FT232R USB UART IC

The output of the microcontroller is on port P1, which drives the direction (movement) circuit.

3.6 Driver circuit
A driver circuit was built to connect the microcontroller board to the two high-power motors. The driver circuit, shown in Figure (3.9), consists of a ULN2803 chip and eight relays that construct two H-bridges to coordinate the movement of the two motors.


Figure (3.9): Driver circuit

The ULN2803A is a high-voltage, high-current Darlington transistor array. The device consists of eight NPN Darlington pairs that feature high-voltage outputs with common-cathode clamp diodes for switching inductive loads. The collector-current rating of each Darlington pair is 500 mA; see Figure (3.10) [37].

Figure (3.10): Darlington pair in ULN2803 [37]


The outputs of the ULN2803A are connected to the inputs of 12V relays that construct the H-bridge circuits driving the two 12V motors. An H-bridge of relays, as shown in Figure (3.11), is a method to drive a motor either clockwise or anti-clockwise according to the inputs of the four relays. To drive the motor clockwise, logic one must be applied to relay A and relay D, so the current passes through relay A to the motor and returns to ground through relay D; to drive the motor anti-clockwise, logic one must be applied to relay B and relay C, so the current passes through relay B to the motor and returns to ground through relay C.

Figure (3.11): H-bridge relays

Two H-bridges of relays are used in the driver board, as shown in Figure (3.12): one for the right motor and one for the left motor, where the Port 1 pins (P1.0 … P1.7) are the inputs to the relays after current amplification by the ULN2803A.

Figure (3.12): Right and Left motors with H-bridge relays

To control the wheelchair movement direction, the microcontroller output coordinates which motor works and its direction of rotation according to the instructions coming from the notebook. If the wheelchair is to move forward, both the right and left motors must work; to move back, both motors work in the opposite direction; to move right, only the right motor works; and to move left, only the left motor works, as shown in Table (3.1).


Table (3.1): Microcontroller output to control the driver circuit

Direction    Microcontroller output (Port 1)
Forward      00110011
Back         11001100
Right        00000011
Left         00110000
Stop         00000000


Chapter 4
Implementation and Results

4.1 Introduction
Both the practical and the simulation results are discussed in this chapter. The speech recognition part was implemented in MATLAB, and the PRAAT program was used in addition to MATLAB for the simulation of the speech. PRAAT is a free scientific computer software package for the analysis of speech in phonetics; it was designed, and continues to be developed, by Paul Boersma and David Weenink of the University of Amsterdam. The microcontroller part was also simulated before implementation using Proteus, a software package for microprocessor simulation. After the simulation in Proteus, the MikroC program was loaded into the microcontroller using the flash programmer, and the MikroC tools were used to test the UART data sent to the microcontroller.

4.2 Speech Recognition results
As seen in the previous chapters, the speaker's voice must pass through the speech recognition steps of pre-processing, feature extraction, and feature matching to perform speech recognition.

4.2.1 Pre-processing results
After recording the speech through the microphone, the signal is saved as a mono wave file of 16000 samples at a sampling rate of 16 kHz. The pre-emphasis step and end-point detection are performed on the signal, and the PRAAT program is used to compare the results by manual selection of the voiced part of the word. Figure (4.1) shows the Arabic word Ammam (ϡΎϣ΃) selected in pink using the PRAAT program, and the result of end-point detection in MATLAB can be seen in Figure (4.2):

Figure (4.1): word Ammam "ϡΎϣ΃" using PRAAT program

Figure (4.2): word Ammam "ϡΎϣ΃" before and after end point detection
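The recording and pre-processing steps above can be reproduced with a short MATLAB (R2012a) sketch such as the one below; the use of audiorecorder and the pre-emphasis coefficient of 0.97 are illustrative assumptions, not necessarily the exact implementation of this work.

```matlab
% Record 16000 samples at 16 kHz (mono), remove the DC level, then pre-emphasize.
fs  = 16000;
rec = audiorecorder(fs, 16, 1);      % 16-bit, single channel
recordblocking(rec, 1);              % 1 second = 16000 samples
x   = getaudiodata(rec);
x   = x - mean(x);                   % remove mean (DC level)
y   = filter([1 -0.97], 1, x);       % pre-emphasis H(z) = 1 - 0.97 z^-1 (coefficient assumed)
wavwrite(y, fs, 'word.wav');         % save as a mono wave file (R2012a API)
```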

Figures (4.3) and (4.4) show the Arabic word Yameen "ϦϴϤϳ" using the PRAAT program and before and after end-point detection in MATLAB, respectively.

Figure (4.3): word Yameen "ϦϴϤϳ" using PRAAT program

Figure (4.4): word Yameen "ϦϴϤϳ" before and after end point detection

The same applies to the Arabic words Yassar "έΎδϳ", Kalf "ϒϠΧ" and Kef "ϒϗ", as shown in Figures (4.5), (4.6), (4.7), (4.8), (4.9) and (4.10), respectively.

Figure (4.5): word Yassar "έΎδϳ" using PRAAT program

Figure (4.6): word Yassar "έΎδϳ" before and after end point detection


Figure (4.7): word Kalf "ϒϠΧ" using PRAAT program

Figure (4.8): word Kalf "ϒϠΧ" before and after end point detection


Figure (4.9): word Kef "ϒϗ" using PRAAT program

Figure (4.10): word Kef "ϒϗ" before and after end point detection


4.2.2 Feature extraction results
The resolution of the features is increased by increasing the number of frames (through high overlapping) and the number of feature vectors. The frame size in this work is 400 sample points with a frame shift of 160 sample points, using a Hamming window as shown in Figure (4.11).

Figure (4.11): Hamming window

The frame number depends on the length of the speech signal after end-point detection and on the word length, i.e. on how the speaker pronounces the word; for example, the Arabic word Ammam "ϡΎϣ΍" may be pronounced by a speaker like Ammaam "ϡ΍Ύϣ΍". Figure (4.12) shows the speech signal of the word Ammam "ϡΎϣ΍" after end-point detection with respect to the frame number; in this example the frame number is equal to 40 frames.
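A minimal framing and windowing sketch consistent with these parameters (400-sample frames, 160-sample shift, Hamming window) is shown below; it assumes the pre-processed signal y from the recording sketch above, and the variable names are illustrative.

```matlab
% Split the end-point-detected signal into overlapping Hamming-windowed frames.
frameLen   = 400;                  % samples per frame
frameShift = 160;                  % frame shift (high overlap)
numFrames  = floor((length(y) - frameLen) / frameShift) + 1;
win        = hamming(frameLen);
frames     = zeros(frameLen, numFrames);
for k = 1:numFrames
    idx = (k-1)*frameShift + (1:frameLen);
    frames(:, k) = y(idx) .* win;  % windowed frame k
end
```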


Figure (4.12): speech signal with respect to the frame number

After applying the FFT, the signal passes to the Mel-frequency filter bank, which consists of 40 filters as shown in Figure (4.13).

Figure (4.13): Mel-frequency filter bank

Then the frequency-domain signal is converted back to the time domain by applying the Discrete Cosine Transform to the spectrum, which generates the MFCC of each frame, as shown in Figure (4.15).

Figure (4.15): MFCC of each frame

Then the log energy of each frame is computed and added to the feature vectors; Figure (4.16) shows the energy of the word Ammam "ϡΎϣ΃", where the frame number in this example is equal to 41. The feature matrix now consists of the 12 MFCC vectors plus the energy vector, which represent the columns, while the rows represent the number of frames, as shown in Figure (4.17).
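The MFCC and log-energy computation described above can be sketched as follows. The 40 filters, 12 cepstral coefficients, and log frame energy follow the text, while the FFT length (512) and the triangular Mel filter-bank construction are illustrative assumptions; the sketch continues from the frames and numFrames of the framing sketch above.

```matlab
nfft = 512;  nFilt = 40;  nCeps = 12;  fs = 16000;
% Mel-spaced triangular filter bank (construction assumed)
melMax = 2595 * log10(1 + (fs/2)/700);
melPts = linspace(0, melMax, nFilt + 2);
hzPts  = 700 * (10.^(melPts/2595) - 1);
bins   = floor((nfft + 1) * hzPts / fs) + 1;
fbank  = zeros(nFilt, nfft/2 + 1);
for m = 2:nFilt+1
    fbank(m-1, bins(m-1):bins(m)) = linspace(0, 1, bins(m)   - bins(m-1) + 1);
    fbank(m-1, bins(m):bins(m+1)) = linspace(1, 0, bins(m+1) - bins(m)   + 1);
end
mfcc = zeros(nCeps + 1, numFrames);
for k = 1:numFrames
    spec = abs(fft(frames(:, k), nfft)).^2;              % power spectrum of frame k
    melE = fbank * spec(1:nfft/2 + 1);                   % Mel filter-bank energies
    c    = dct(log(melE + eps));                         % cepstrum via DCT
    mfcc(1:nCeps, k) = c(2:nCeps + 1);                   % keep 12 MFCCs per frame
    mfcc(end, k)     = log(sum(frames(:, k).^2) + eps);  % log frame energy
end
```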


Figure (4.16): Energy with respect to frame number

Figure (4.17): MFCC vectors with respect frame number

After that, the delta coefficients of the 13 vectors are computed to capture the speed of change in each frame; these are 13 vectors as well, as shown in Figures (4.18) and (4.19).


Figure (4.18): Delta-MFCC

Figure (4.19): Delta Energy

The same is done for the delta-delta MFCC coefficients and the delta-delta energy coefficient, as shown in Figures (4.20) and (4.21).

Figure (4.20): Delta-Delta MFCC

Figure (4.21): Delta-Delta Energy

The total of 39 feature vectors of each frame is stored in the matrix X(i, j), where i represents the feature vector and j represents the frame number, as shown in Figure (4.22).

Figure (4.22): Feature vector matrix

The plot of this matrix is shown in Figure (4.23); the x-axis represents the feature vectors and the y-axis represents the value of each frame.


Figure (4.23): word frames with respect to feature vectors
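One simple way of obtaining the delta and delta-delta streams and stacking them into the 39-dimensional matrix X(i, j) is sketched below; a plain two-frame difference is used here for illustration, whereas the work may use the regression formula of Chapter Two. The sketch continues from the mfcc matrix of the previous sketch.

```matlab
% Delta and delta-delta coefficients by simple central differences (assumed form).
delta = zeros(size(mfcc));
for k = 2:numFrames-1
    delta(:, k) = (mfcc(:, k+1) - mfcc(:, k-1)) / 2;
end
delta(:, 1) = delta(:, 2);  delta(:, end) = delta(:, end-1);   % edge frames copied

deltaDelta = zeros(size(delta));
for k = 2:numFrames-1
    deltaDelta(:, k) = (delta(:, k+1) - delta(:, k-1)) / 2;
end
deltaDelta(:, 1) = deltaDelta(:, 2);  deltaDelta(:, end) = deltaDelta(:, end-1);

X = [mfcc; delta; deltaDelta];   % 39-dimensional feature matrix X(i, j)
```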

At this point the feature extraction ends. As seen in the previous chapter, feature extraction is carried out in the two steps of speech recognition, training and testing. In this work, the training step must be done for both the word and the speaker to perform speaker and word recognition; then, in the test step, the features are extracted and matched with the training feature matrices. As can be seen in Figures (4.24) to (4.43), which show the feature vectors of four different speakers for the Arabic words Ammam "ϡΎϣ΃", Yameen "ϦϴϤϳ", Yassar "έΎδϳ", Kalf "ϒϠΧ" and Kef "ϒϗ", the difference between the feature vectors can be noted for each speaker and each word.


Figure (4.24): Speaker 1 feature vectors for word Ammam "ϡΎϣ΃"

Figure (4.25): Speaker 2 feature vectors for word Ammam "ϡΎϣ΃"


Figure (4.26): Speaker 3 feature vectors for word Ammam "ϡΎϣ΃"

Figure (4.27): Speaker 4 feature vectors for word Ammam "ϡΎϣ΃"


Figure (4.28): Speaker 1 feature vectors for word Yameen "ϦϴϤϳ"

Figure (4.29): Speaker 2 feature vectors for word Yameen "ϦϴϤϳ"


Figure (4.30): Speaker 3 feature vectors for word Yameen "ϦϴϤϳ"

Figure (4.31): Speaker 4 feature vectors for word Yameen "ϦϴϤϳ"


Figure (4.32): Speaker 1 feature vectors for word Yassar "έΎδϳ"

Figure (4.33): Speaker 2 feature vectors for word Yassar "έΎδϳ"


Figure (4.34): Speaker 3 feature vectors for word Yassar "έΎδϳ"

Figure (4.35): Speaker 4 feature vectors for word Yassar "έΎδϳ"


Figure (4.36): Speaker 1 feature vectors for word Kalf "ϒϠΧ"

Figure (4.37): Speaker 2 feature vectors for word Kalf "ϒϠΧ"


Figure (4.38): Speaker 3 feature vectors for word Kalf "ϒϠΧ"

Figure (4.39): Speaker 4 feature vectors for word Kalf "ϒϠΧ"


Figure (4.40): Speaker 1 feature vectors for word Kef "ϒϗ"

Figure (4.41): Speaker 2 feature vectors for word Kef "ϒϗ"


Figure (4.42): Speaker 3 feature vectors for word Kef "ϒϗ"

Figure (4.43): Speaker 4 feature vectors for word Kef "ϒϗ"


4.2.3 Feature matching results
Feature matching of the previously extracted feature vectors is performed using DTW to compute the optimal warping path between two feature matrices. First, the local distance matrix is computed, as shown in Figure (4.44).

Figure (4.44): local distance

After that, the slope of the path is computed according to the sizes of the two feature matrices, and then the global distance is computed to obtain the optimal path, as shown in Figure (4.45).

Figure (4.45): DTW path

The test word is recognized as the word corresponding to the feature matrix with the lowest matching score. Figures (4.46) to (4.50) show the DTW path taken for recognition of the utterance Ammam "ϡΎϣ΃" with respect to the other direction words.
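Putting the pieces together, a hedged sketch of this matching stage is shown below: the test feature matrix is compared against every stored template using the dtw_distance sketch from Chapter Three, and the label of the closest template is accepted if its distance falls below the threshold. The templates cell array, its label field, and the threshold value are illustrative names, not the data structures of the actual implementation.

```matlab
% Illustrative matching loop over the trained templates (names are assumed).
% templates{k}.X is a 39-by-M feature matrix, templates{k}.label its word/speaker.
bestDist = inf;  bestLabel = '';
for k = 1:numel(templates)
    dist = dtw_distance(X, templates{k}.X);   % DTW global distance to template k
    if dist < bestDist
        bestDist  = dist;
        bestLabel = templates{k}.label;
    end
end
if bestDist <= threshold
    fprintf('Recognized: %s (distance %.3f)\n', bestLabel, bestDist);
else
    fprintf('No match below the threshold (%.3f)\n', bestDist);
end
```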

Figure (4.46): DTW path taken for word Ammam "ϡΎϣ΃" with respect to word Kalf "ϒϠΧ"

Figure (4.47): DTW path taken for word Ammam "ϡΎϣ΃" with respect to word Yammen "ϦϴϤϳ"


Figure (4.48): DTW path taken for word Ammam "ϡΎϣ΃" with respect to word Yassar "έΎδϳ"

Figure (4.49): DTW path taken for word Ammam "ϡΎϣ΃" with respect to word Keef "ϒϗ"


Figure (4.50): DTW path taken for word Ammam "ϡΎϣ΃" with respect to word Ammam "ϡΎϣ΃"

The recognition of the speaker for the same test word is performed in the same way; Figures (4.51) to (4.54) show the word Ammam "ϡΎϣ΃" for speaker 1 in comparison with the other speakers.

Figure (4.51): DTW path taken for word Ammam "ϡΎϣ΃" by speaker1 with respect to word Ammam "ϡΎϣ΃" by speaker2


Figure (4.52): DTW path taken for word Ammam "ϡΎϣ΃" by speaker1 with respect to word Ammam "ϡΎϣ΃" by speaker3

Figure (4.53): DTW path taken for word Ammam "ϡΎϣ΃" by speaker1 with respect to word Ammam "ϡΎϣ΃" by speaker4


Figure (4.54): DTW path taken for word Ammam "ϡΎϣ΃" by speaker1 with respect to word Ammam "ϡΎϣ΃" by speaker1

4.2.4 Recognition Results
The recognition rate was computed against the effect of background noise for four speakers by testing each word ten times at noise levels of 40dB, 50dB, 55dB, 60dB, 66dB, and 73dB, as shown in Tables (4.1) to (4.4). The first row represents the SLM measurements at the different values; under each value of the SLM reading, the recognition rate corresponding to each uttered word is given.
- Speaker 1: The test shows a recognition rate of 100% for all words below 66dB, while at 66dB the rate decreases for the words Ammam ϡΎϣ΃ and Yameen ϦϴϤϳ to 80% and 90%, respectively. It decreases further at 73dB for the words Ammam ϡΎϣ΃, Yameen ϦϴϤϳ and Kef ϒϗ to 40%, 90% and 90%, respectively, as shown in Table (4.1).
- Speaker 2: The test shows 100% for all words below 66dB, while at 66dB the rate decreases for the word Yameen ϦϴϤϳ to 80%. It decreases further at 73dB for the words Ammam ϡΎϣ΃, Yameen ϦϴϤϳ, Yassar έΎδϳ and Kef ϒϗ to 80%, 60%, 90% and 70%, respectively, as shown in Table (4.2).
- Speaker 3: The test shows 100% for all words below 66dB, while at 66dB the rate decreases for the words Ammam ϡΎϣ΃, Yameen ϦϴϤϳ and Kef ϒϗ to 80%, 90% and 90%, respectively. It decreases further at 73dB for the words Ammam ϡΎϣ΃, Yameen ϦϴϤϳ, Yassar έΎδϳ and Kef ϒϗ to 40%, 60%, 80% and 60%, respectively, as shown in Table (4.3).
- Speaker 4: The test shows 100% for all words below 66dB, while at 66dB the rate decreases for the word Kalf ϒϠΧ to 90%. It decreases further at 73dB for the words Ammam ϡΎϣ΃, Yameen ϦϴϤϳ, Kalf ϒϠΧ and Kef ϒϗ to 70%, 60%, 70% and 70%, respectively, as shown in Table (4.4).

Table (4.1): Speaker1 recognition rate with noise level

Word     40dB    50dB    55dB    60dB    66dB    73dB
ϡΎϣ΃    100%    100%    100%    100%    80%     40%
ϦϴϤϳ    100%    100%    100%    100%    90%     90%
έΎδϳ    100%    100%    100%    100%    100%    100%
ϒϠΧ     100%    100%    100%    100%    100%    100%
ϒϗ      100%    100%    100%    100%    100%    90%

Table (4.2): Speaker2 recognition rate with noise level

Word     40dB    50dB    55dB    60dB    66dB    73dB
ϡΎϣ΃    100%    100%    100%    100%    100%    80%
ϦϴϤϳ    100%    100%    100%    100%    80%     60%
έΎδϳ    100%    100%    100%    100%    100%    90%
ϒϠΧ     100%    100%    100%    100%    100%    100%
ϒϗ      100%    100%    100%    100%    100%    70%

Table (4.3): Speaker3 recognition rate with noise level

Word     40dB    50dB    55dB    60dB    66dB    73dB
ϡΎϣ΃    100%    100%    100%    100%    80%     40%
ϦϴϤϳ    100%    100%    100%    100%    90%     60%
έΎδϳ    100%    100%    100%    100%    100%    80%
ϒϠΧ     100%    100%    100%    100%    100%    100%
ϒϗ      100%    100%    100%    100%    90%     60%

Table (4.4): Speaker4 recognition rate with noise level

Word     40dB    50dB    55dB    60dB    66dB    73dB
ϡΎϣ΃    100%    100%    100%    100%    100%    70%
ϦϴϤϳ    100%    100%    100%    100%    100%    60%
έΎδϳ    100%    100%    100%    100%    100%    100%
ϒϠΧ     100%    100%    100%    100%    90%     70%
ϒϗ      100%    100%    100%    100%    100%    70%

4.3 Microcontroller part
As seen in the previous chapter, the microcontroller was programmed in the MikroC language, and the program was then loaded using the flash programmer after simulating it in Proteus and testing the UART communication with the microcontroller using the MikroC tools.

4.3.1 Simulation and test of UART
The practical circuit was built in the Proteus program, and a tool called the virtual terminal was connected to the circuit to send the character code representing each word; this code coordinates the microcontroller output to the relays which control the motor directions, as shown in Figure (4.55).

Figure (4.55): Simulation using Proteus program

The sending of UART data from the microcontroller was tested, and the sending and processing time was measured, by using a loop-back program in the microcontroller and the MikroC PRO tool called USART Terminal, as shown in Figure (4.56).


Figure (4.56): Test UART communication between microcontroller and computer
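From the PC side, a similar loop-back check can also be reproduced with a few MATLAB serial commands, as sketched below; the COM port name, baud rate, and test byte are assumptions for illustration, and the MikroC USART Terminal remains the tool actually used in this work.

```matlab
% Send one byte to the microcontroller and time the echoed reply
% (assumes a loop-back program running on the AT89S8253; settings are assumed).
s = serial('COM3', 'BaudRate', 9600, 'Timeout', 2);
fopen(s);
tic;
fwrite(s, uint8('F'));        % test byte
echoByte = fread(s, 1);       % byte echoed back by the loop-back program
rtt = toc;                    % round-trip send + processing time, in seconds
fprintf('Echoed %c in %.4f s\n', char(echoByte), rtt);
fclose(s); delete(s);
```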

4.3.2 Loading the program to the microcontroller
After simulating the MikroC program, the hex file of the program was loaded into the microcontroller using the flash programmer, as shown in Figure (4.57).

Figure (4.57): Loaded hex file to microcontroller


Chapter Five
Conclusions and Suggestions for Future Work

5.1 Conclusions
The overall objective of this project was to build a wheelchair control system with both speech recognition control and manual (keyboard joystick) control. Based on all the experiments conducted in this work, the following points are concluded:
1. Increasing the number of feature vectors makes the system more immune to noise and increases the recognition rate. Feature extraction is based on the MFCC method, which provides an excellent way of compressing the audio signal by extracting most of the features of the voice signal into a feature matrix; adding the energy, then the first derivative and the second derivative of these feature vectors, yields 39-dimensional feature vectors for each frame, where the number of frames depends on the length of the speech signal.
2. The use of dynamic time warping with the feature matrices gives excellent alignment and recognition rate, because it is a good method for computing the similarity of two matrices of unequal length. The minimum optimal path between the feature matrix of the test signal and each feature matrix of the training signals identifies the recognized word, with respect to a threshold that increases with the number of feature vectors.
3. This system does not require recording the word more than once in the training step to be able to recognize the uttered word correctly.

4. The use of the Atmel AT89S8253 microcontroller with the 8051-Ready additional board and MikroC PRO for programming the microcontroller provides a suitable platform for implementing an embedded control system, and it can be modified to meet future requirements easily and quickly.

5.2 Suggestions for Future Work
Several extensions of this research may be achieved through the following suggestions:
1. The wheelchair can be developed with features such as automatic avoidance of walls or barriers by adding sensors attached to the microcontroller, speed control, and wireless control of the system using ZigBee technology. All these features and more can easily be added through the microcontroller part of the system.
2. The system can be developed to control robots via both speech recognition and manual control.
3. The system can be developed to control secure doors depending on speaker recognition for security.
4. Speech recognition can be used to control the operation of electrical appliances in smart houses.
5. The speech recognition system can be developed using other methods such as HMM and vector quantization (VQ), in addition to DTW, to match the feature vectors.


References
[1] A. Refeis, "FPGA Implementation of Speech Recognition System Based on HMM", M.Sc. Thesis, University of Technology, Baghdad, Iraq, August 2012.
[2] L. Rabiner and B. Juang, "Fundamentals of Speech Recognition", Prentice-Hall International, Inc., ISBN 0-13-285826-6, 1993.
[3] V. Tunali, "A Speaker Dependent, Large Vocabulary, Isolated Word Speech Recognition System for Turkish", M.Sc. Thesis, Marmara University Institute for Graduate Studies in Pure and Applied Sciences, Istanbul, Turkey, 2005.
[4] M. Al-Hassani and A. Kadhim, "Design a Text-Prompt Speaker Recognition System Using LPC-Derived Features", The 13th International Arab Conference on Information Technology (ACIT), 2012.
[5] V. Tiwari, "MFCC and its Applications in Speaker Recognition", International Journal on Emerging Technologies, pp. 19-22, ISSN 0975-8364, 2010.
[6] H. Beigi, "Fundamentals of Speaker Recognition", Springer, e-ISBN 978-0-387-77592-0, 2011.
[7] E. Keogh and C. Ratanamahatana, "Exact Indexing of Dynamic Time Warping", Knowledge and Information Systems, pp. 358-386, DOI 10.1007/s10115-004-0154-9, 2005.
[8] S. Gaikwad et al., "A Review on Speech Recognition Technique", International Journal of Computer Applications, Vol. 10, No. 3, pp. 16-24, November 2010.
[9] P. Senin, "Dynamic Time Warping Algorithm Review", Information and Computer Science Department, University of Hawaii at Manoa, Honolulu, USA, December 2008.
[10] G. Gridling and B. Weiss, "Introduction to Microcontrollers", Courses 182.064 & 182.074, Version 1.4, Vienna University of Technology, Institute of Computer Engineering, Embedded Computing Systems Group, 2007.
[11] M. Mazidi and J. Mazidi, "The 8051 Microcontroller and Embedded Systems", First Edition, Prentice Hall, ISBN-13: 9780138610227, 1999.
[12] K. S. Ananthakrishnan and R. Andrew, "Speaker Adaptable Voice Controlled Model-Vehicle Using Energy Threshold and MFCC Parameters", 9th Australian International Conference on Speech Science & Technology, Melbourne, pp. 220-225, December 2002.
[13] Z. Abd Ghani, "Wireless Speed Control with Voice for Wheelchair Application", M.Sc. Thesis, Universiti Teknologi Malaysia, May 2007.
[14] S. Ke, Y. Hou, Z. Huang, and H. Li, "A HMM Speech Recognition System Based on FPGA", IEEE Congress on Image and Signal Processing, Vol. 5, pp. 305-309, 2008.
[15] H. Nik, "Hum-Power Controller for Powered Wheelchairs", M.Sc. Thesis, George Mason University, 2009.
[16] M. Qadri and S. Ahmed, "Voice Controlled Wheelchair Using DSK TMS320C6711", International Conference on Signal Acquisition and Processing, pp. 217-220, 2009.
[17] S. Jothilakshmi, V. Ramalingam, and S. Palanivel, "Unsupervised Speaker Segmentation with Residual Phase and MFCC Features", Expert Systems with Applications, Elsevier, Vol. 36, Issue 6, pp. 9799-9804, August 2009.
[18] L. Muda, M. Begam, and I. Elamvazuthi, "Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques", Journal of Computing, Vol. 2, Issue 3, ISSN 2151-9617, pp. 138-143, March 2010.
[19] A. AL-Thahab, "Controlled of Mobile Robots by Using Speech Recognition", Journal of Babylon University, Pure and Applied Sciences, Vol. 19, Issue 3, pp. 1123-1139, 2011.
[20] H. Moslehi, "Design and Development of Fuzzy Logic Operated Microcontroller Based Smart Motorized Wheelchair", M.Sc. Thesis, Dalhousie University, Halifax, Nova Scotia, April 2011.
[21] B. Plannerer, "An Introduction to Speech Recognition", Germany, 2005.
[22] S. Pan, C. Chen, and J. Zeng, "Speech Recognition via Hidden Markov Model and Neural Network Trained by Genetic Algorithm", Proceedings of the Ninth International Conference on Machine Learning and Cybernetics, Qingdao, pp. 2950-2955, 11-14 July 2010.
[23] M. Nilsson and M. Ejnarsson, "Speech Recognition using Hidden Markov Model", Department of Telecommunications and Signal Processing, Blekinge Institute of Technology, March 2002.
[24] W. Chu, "Speech Coding Algorithms", John Wiley & Sons, ISBN 0-471-37312-5, 2003.
[25] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Prentice Hall, ISBN 0-13-213603-1, 1978.
[26] R. S. Kurcan, "Isolated Word Recognition from In-Ear Microphone Data Using Hidden Markov Models (HMM)", M.Sc. Thesis, Naval Postgraduate School, California, March 2006.
[27] L. R. Rabiner and M. R. Sambur, "An Algorithm for Determining the Endpoints of Isolated Utterances", The Bell System Technical Journal, Vol. 54, No. 2, pp. 297-315, February 1975.
[28] J.-C. Wang, J.-F. Wang, and Y.-S. Weng, "Chip Design of MFCC Extraction for Speech Recognition", CiteSeer, Vol. 32, No. 1-2, pp. 111-131, November 2002.
[29] J. Cao et al., "A Two-stage Pattern Matching Method for Speaker Recognition of Partner Robots", IEEE, 2010.
[30] M. Hossan and M. Gregory, "Speaker Recognition Utilizing Distributed DCT-II Based Mel Frequency Cepstral Coefficients and Fuzzy Vector Quantization", International Journal of Speech Technology, pp. 103-113, DOI 10.1007/s10772-012-9166-0, 2013.
[31] S. Jothilakshmi, V. Ramalingam, and S. Palanivel, "Unsupervised Speaker Segmentation with Residual Phase and MFCC Features", Expert Systems with Applications, Elsevier, Vol. 36, Issue 6, pp. 9799-9804, August 2009.
[32] S. Chapaneri, "Spoken Digits Recognition using Weighted MFCC and Improved Features for Dynamic Time Warping", International Journal of Computer Applications, DOAJ, Vol. 40, No. 3, pp. 6-12, February 2012.
[33] A. Karahoca, "Advances in Data Mining Knowledge Discovery and Applications", InTech, 1st Edition, ISBN 978-953-51-0748-4, 2012.
[34] L. Kesen, "Implementation of an 8-Bit Microcontroller with SystemC", M.Sc. Thesis, The Graduate School of Natural and Applied Sciences, Middle East Technical University, November 2004.
[35] http://www.mikroe.com/chapters/view/67/chapter-4-at89s8253microcontroller/#ch4.1
[36] MikroElektronika Corporation, "8051-Ready Manual", Ver. 1.00, 2010.
[37] Texas Instruments Incorporated, "ULN2803A", 2006.

Appendix A
Many features of the AT89S8253 microcontroller are as follows:
- Compatible with MCS®-51 Products
- 12K Bytes of In-System Programmable (ISP) Flash Program Memory
  - SPI Serial Interface for Program Downloading
  - Endurance: 10,000 Write/Erase Cycles
- 2K Bytes EEPROM Data Memory
  - Endurance: 100,000 Write/Erase Cycles
- 64-byte User Signature Array
- 2.7V to 5.5V Operating Range
- Fully Static Operation: 0 Hz to 24 MHz (in x1 and x2 Modes)
- Three-level Program Memory Lock
- 256 x 8-bit Internal RAM
- 32 Programmable I/O Lines
- Three 16-bit Timer/Counters
- Nine Interrupt Sources
- Enhanced UART Serial Port with Framing Error Detection and Automatic Address Recognition
- Enhanced SPI (Double Write/Read Buffered) Serial Interface
- Low-power Idle and Power-down Modes
- Interrupt Recovery from Power-down Mode
- Programmable Watchdog Timer
- Dual Data Pointer
- Power-off Flag
- Flexible ISP Programming (Byte and Page Modes)
  - Page Mode: 64 Bytes/Page for Code Memory, 32 Bytes/Page for Data Memory
- Four-level Enhanced Interrupt Controller
- Programmable and Fuseable x2 Clock Option
- Internal Power-on Reset
- PDIP Package Option for Reduced EMC Emission
- Green (Pb/Halide-free) Packaging Option

The architecture of the microcontroller (AT89S8253) is shown in Figure (A).

Figure (A): architecture of microcontroller (AT89S8253)

Software tools
The following software tools are used for programming the microcontroller and loading the program into it:

MikroC and 8051FLASH
MikroC PRO for 8051 is a full-featured C compiler for 8051 microcontrollers from Atmel and Silicon Labs. It is designed for developing, building and debugging 8051-based embedded applications. This development environment has a wide range of features such as: an easy-to-use Integrated Design Environment (IDE), very compact and efficient code, many hardware and software libraries, comprehensive documentation, a software simulator and many more. MikroC for 8051 allows the programmer to quickly develop and deploy complex applications:
- The programmer writes C source code using the built-in Code Editor (Code and Parameter Assistants, Code Folding, Syntax Highlighting, Auto Correct, Code Templates, and more).
- The included MikroC libraries are used to dramatically speed up development: data acquisition, memory, displays, conversions, communication, etc.
- The programmer then monitors the program structure, variables, and functions in the Code Explorer.
- The compiler generates commented, human-readable assembly and standard HEX files compatible with all programmers.
- Program flow can be inspected and executable logic debugged with the integrated Software Simulator.
- Detailed reports and graphs are available: RAM and ROM map, code statistics, assembly listing, calling tree, and more.

The on-board mikroProg™ programmer requires special programming software called 8051FLASH. It is used for programming all Atmel® AT89 microcontrollers. The software has an intuitive interface and SingleClick™ programming technology.

8051prog Programmer
The 8051prog programmer, shown in Figure (B), is a tool used for programming 8051 microcontrollers from Atmel®. As a low-consumption device, it is ideal for use with notebooks. The 8051prog programmer communicates with the microcontroller through a USB cable, which is also used for powering the programmer. The programmer, working with the 8051FLASH program, loads a hex code generated by any 8051 compiler into an 8051 microcontroller.

Figure (B): 8051prog Programmer