Academy of Romanian Scientists
University of Oradea, Faculty of Electrical Engineering and Information Technology

Journal of Computer Science and Control Systems
Vol. 4, Nr. 2, October 2011

University of Oradea Publisher

EDITOR-IN-CHIEF
Eugen GERGELY - University of Oradea, Romania

EXECUTIVE EDITORS
Cornelia GORDAN - University of Oradea, Romania
Cristian GRAVA - University of Oradea, Romania
Dorel HOBLE - University of Oradea, Romania
Teodor LEUCA - University of Oradea, Romania
Erica MANG - University of Oradea, Romania
Daniela E. POPESCU - University of Oradea, Romania
Ioan C. RADA - University of Oradea, Romania
Helga SILAGHI - University of Oradea, Romania
Daniel TRIP - University of Oradea, Romania

ASSOCIATE EDITORS
Mihail ABRUDEAN - Technical University of Cluj-Napoca, Romania
Lorena ANGHEL - I.N.P. Grenoble, France
Gheorghe Daniel ANDREESCU - "Politehnica" University of Timisoara, Romania
Angelica BACIVAROV - University Politehnica of Bucharest, Romania
Valentina BALAS - "Aurel Vlaicu" University of Arad, Romania
Barnabas BEDE - The University of Texas at El Paso, USA
Eugen BOBASU - University of Craiova, Romania
Dumitru Dan BURDESCU - University of Craiova, Romania
Petru CASCAVAL - "Gheorghe Asachi" Technical University of Iasi, Romania
Horia CIOCARLIE - "Politehnica" University of Timisoara, Romania
Tom COFFEY - University of Limerick, Ireland
Geert DECONINCK - Katholieke Universiteit Leuven, Belgium
Ioan DESPI - University of New England, Armidale, Australia
Jozsef DOMBI - University of Szeged, Hungary
Toma Leonida DRAGOMIR - "Politehnica" University of Timisoara, Romania
Ioan DZITAC - Agora University of Oradea, Romania
János FODOR - Szent Istvan University, Budapest, Hungary
Voicu GROZA - University of Ottawa, Canada
Kaoru HIROTA - Tokyo Institute of Technology, Yokohama, Japan
Stefan HOLBAN - "Politehnica" University of Timisoara, Romania
Štefan HUDÁK - Technical University of Kosice, Slovakia
Geza HUSI - University of Debrecen, Hungary
Ferenc KALMAR - University of Debrecen, Hungary
Jan KOLLAR - Technical University of Kosice, Slovakia
Tatjana LOMAN - Technical University of Riga, Latvia
Marin LUNGU - University of Craiova, Romania
Anatolij MAHNITKO - Technical University of Riga, Latvia
Ioan Z. MIHU - "Lucian Blaga" University of Sibiu, Romania
Shimon Y. NOF - Purdue University, USA
George PAPAKONSTANTINOU - National Technical University of Athens, Greece
Mircea PETRESCU - University Politehnica of Bucharest, Romania
Emil PETRIU - University of Ottawa, Canada
Constantin POPESCU - University of Oradea, Romania
Dumitru POPESCU - University Politehnica of Bucharest, Romania
Alin Dan POTORAC - "Stefan cel Mare" University of Suceava, Romania
Nicolae ROBU - "Politehnica" University of Timisoara, Romania
Hubert ROTH - Universität Siegen, Germany
Eugene ROVENTA - Glendon College, York University, Canada
Ioan ROXIN - Universite de Franche-Comte, France
Imre J. RUDAS - Tech Polytechnical Institution, Budapest, Hungary
Rudolf SEISING - European Centre for Soft Computing, Mieres (Asturias), Spain
Ioan SILEA - "Politehnica" University of Timisoara, Romania
Lacramioara STOICU-TIVADAR - "Politehnica" University of Timisoara, Romania
Athanasios D. STYLIADIS - Alexander Institute of Technology, Greece
Lorand SZABO - Technical University of Cluj Napoca, Romania
Janos SZTRIK - University of Debrecen, Hungary
Lucian VINTAN - "Lucian Blaga" University of Sibiu, Romania
Mircea VLADUTIU - "Politehnica" University of Timisoara, Romania

ISSN 1844 - 6043
This volume includes papers in the following topics: Artificial intelligence and robotics, Real-time systems, Software engineering and software systems, Advanced control of electrical drives, Dependable computing, data security and cryptology, Computer networks, Modern control systems, Process control and task scheduling, Web design, Databases and data mining, Computer graphics and virtual reality, Image processing.


CONTENTS

ABDELMALEK Ibtissem 1, GOLÉA Noureddine 2 - 1 University of Batna, Algeria, 2 University of Oum El Bouaghi, Algeria
LMI-Based Design of Fuzzy Controller and Fuzzy Observer for Takagi-Sugeno Fuzzy Systems: New Non-Quadratic Stability Approach ... 5

DAXIAO Wang, YINGJIAN Xu, WEI Zhang, LEIJUN Xiang - Shanghai Jiao Tong University, China
Controller Designing and Parameter Tuning in the First Order plus Large Delay Time System ... 13

DRĂGHICIU Nicolae, POPA Sorin - University of Oradea, Romania
Computer Analysis of Doppler Signals ... 19

FILIP Andrei-Toader, HANGIU Radu-Petru, MARŢIŞ Claudia, BIRO Karoly-Agoston - Technical University of Cluj-Napoca, Romania
Design of an Interior Permanent-Magnet Synchronous Machine for an Integrated Starter-Alternator System Used on a Hybrid-Electric Vehicle ... 23

GIRIRAJKUMAR S.M. 1, KUMAR Atal. A. 2, ANANTHARAMAN N. 3 - 1 M.A.M. College of Engineering, India, 2 SASTRA University, India, 3 National Institute of Technology, India
Application of Simulated Annealing for Tuning of a PID Controller for a Real Time Industrial Process ... 29

HARLIŞCA Ciprian, SZABÓ Loránd - Technical University of Cluj-Napoca, Romania
Wavelet Analysis and Park's Vector Based Condition Monitoring of Induction Machines ... 35

MANG Erica, MANG Ioan, ANDRONIC Bogdan, POPESCU Constantin - University of Oradea, Romania
Theoretical Implementation of the Rijndael Algorithm Using GPU Processing ... 39

MANG Ioan, MANG Erica, POPESCU Constantin - University of Oradea, Romania
VHDL Implementation of an Error Detection and Correction Module Based on Hamming Code ... 43

NEAMTU Iosif Mircea - "Lucian Blaga" University of Sibiu, Romania
Advanced Methods for Testing Security ... 47

NEAMTU Iosif Mircea - "Lucian Blaga" University of Sibiu, Romania
Software Tools to Detect Suspicious Files ... 51

NOVAC Ovidiu, NOVAC Mihaela, VARI-KAKAS Stefan, VLADU Ecaterina - University of Oradea, Romania
Aspects of Cache Memory Simulation using Programs Under Windows and UNIX Operating Systems ... 55

NOVAC Ovidiu 1, SZTRIK Janos 2, VARI-KAKAS Stefan 1, KIM Che-Soong 3 - 1 University of Oradea, Romania, 2 University of Debrecen, Hungary, 3 Sangji University, South Korea
Reliability Increasing Method Using a SEC-DED Hsiao Code to Cache Memories, Implemented with FPGA Circuits ... 59


SUKUMAR Senthilkumar, MT PIAH Abd Rahni - Universiti Sains Malaysia, Malaysia
A Study on Parallel RKAM for Raster FCNN Simulation ... 63

ŢEPELEA Laviniu, GAVRILUŢ Ioan, GACSÁDI Alexandru - University of Oradea, Romania
A CNN Based Algorithm for Medical Images Correlation ... 69

ZERROUKI Nadjib 1, GOLÉA Noureddine 2, BENOUDJIT Nabil 1 - 1 University of Batna, Algeria, 2 University of Oum El Bouaghi, Algeria
Genetic Algorithm Based High Performance Control for Rigid Robot Manipulators ... 73


LMI-Based Design of Fuzzy Controller and Fuzzy Observer for Takagi-Sugeno Fuzzy Systems: New Non-Quadratic Stability Approach

ABDELMALEK Ibtissem 1, GOLÉA Noureddine 2
1 University of Batna, Department of Electronics, Faculty of Engineering Sciences, 05000 Batna, Algeria, E-mail: [email protected]
2 University of Oum El Bouaghi, Department of Electronics, Faculty of Sciences and Technology, 04000 Oum El Bouaghi, Algeria, E-mail: [email protected]

Abstract - This paper deals with the design of a fuzzy controller and a fuzzy observer, one of the most important and basic concepts in fuzzy control system design. The fuzzy controller guarantees stability of the whole system, i.e. fuzzy controller plus fuzzy observer, via the Lyapunov stability approach, while the fuzzy observer estimates the states of the fuzzy dynamic plant. The two designs are formulated as two separate LMI feasibility problems using new non-quadratic stability conditions, based on a non-quadratic Lyapunov function and the Parallel Distributed Compensation scheme, to stabilize Takagi-Sugeno (T-S) fuzzy systems. A separation property based on a vector comparison principle is then applied to check the stability of the whole system; the obtained results are less conservative. Finally, numerical examples are presented to illustrate the effectiveness of our proposal, showing very satisfactory results.

Keywords: T-S fuzzy systems; Non-quadratic stability conditions; Linear Matrix Inequalities; Fuzzy controller; Fuzzy observer; Separation property

I. INTRODUCTION

In recent years, there has been great interest in using Takagi-Sugeno fuzzy models to approximate nonlinear systems [1-3]. Much research has been developed around this concept [4-5]; the main idea is the use of models that consist of fuzzy IF-THEN rules with linguistic terms in the antecedents and analytic dynamical equations in the consequents. Several researchers in the control community have also come up with different techniques for designing control systems. In fact, Takagi and Sugeno [3] proposed a multimodel-based approach to overcome the difficulties of conventional modeling techniques [6]. For this purpose, a nonlinear plant is represented by the T-S fuzzy model, where the local dynamics in different state regions are represented by linear models. The overall

model of the system is obtained by a fuzzy blending of these local models. The same fuzzy structure is used for control [3, 7-10] and for studying the stability of the T-S fuzzy system using the Lyapunov method [7, 11] and Linear Matrix Inequalities (LMI), where the problem can be numerically solved by convex optimization techniques [28]. This can provide an effective solution to the control of many complex systems that are difficult to describe using linearization or identification techniques. The fuzzy control design is carried out using the Parallel Distributed Compensation (PDC) scheme [7-8]. The main idea of the PDC controller design is to derive each control rule from the corresponding rule of the T-S fuzzy model so as to compensate it. The resulting overall fuzzy controller, which is nonlinear in general, is a fuzzy blending of the individual linear controllers. Wang et al. [8] used this concept to design fuzzy controllers that stabilize fuzzy systems. The advantage of the T-S fuzzy model lies in that the stability and performance characteristics of a system represented by a T-S fuzzy model can be analyzed using the Lyapunov function approach [7, 11] by satisfying a set of Lyapunov inequalities [8, 12]. This has been discussed in a large number of works, e.g. [7-9, 13-17] and [23], which used multiple Lyapunov functions due to their property of conservatism reduction.
The states of a system are not always available for measurement, which is the case in many practical problems. To overcome this problem, the notion of observer was introduced. The concepts of linear regulator and linear observer were introduced by Kalman [19] for linear systems in a stochastic environment, whereas for nonlinear systems different observer designs were proposed [20-21], which have the disadvantage of design complexity. To overcome these disadvantages, fuzzy observer concepts were introduced, of which the most popular is the T-S fuzzy observer, introduced by several authors in the literature. Feng et al. [30] and Tanaka [31] give a certain form of the T-S fuzzy observer with an asymptotic


convergence. Tanaka also proposed in his papers [27, 31] globally exponentially stable fuzzy controller and fuzzy observer designs for continuous and discrete fuzzy systems, for both measurable and non-measurable premise variables. Another approach was proposed by Ma and Sun [32] for fuzzy observer analysis and the design of reduced-dimensional and fuzzy functional observers with a separation property.

In this paper, we extend the stability results given in [16] to the case when the states are not available for measurement and feedback, in other terms to a fuzzy observer, while guaranteeing the stability of the whole system. The observer design is based on the T-S fuzzy model: each fuzzy rule is responsible for observing the states of a locally linear subsystem [24-25], and a separation property is then used to check the stability of the global system. The separation property was introduced by Jadbabaie et al. [25] and Ma et al. [26] through two different approaches that ensure an independent design of the controller and the observer together with the stability of the global T-S system. In our proposal we apply the Ma et al. method, due to its simplicity, since it does not depend on the stability conditions but rather on the fuzzy Lyapunov functions; indeed, the separation principle design proposed in [25] is not appropriate in the case of several stability conditions.

The remainder of this paper is organized as follows. Section II presents a short outline of the fuzzy controller design proposed in [16], based on the PDC concept, for the stabilization of T-S fuzzy systems. Sections III and IV discuss the proposed fuzzy observer and the separation property principle used to ensure the stability of the global system. In Section V, a simulation example shows the effectiveness of the new observer/controller design. Concluding remarks are given in Section VI.

II. FUZZY CONTROLLER DESIGN

Fuzzy controllers are required to satisfy $x(t) \to 0$ when $t \to \infty$, which implies stabilization of the fuzzy control system. A T-S fuzzy system is described by fuzzy IF-THEN rules that represent locally linear input-output relations of a system. The i-th rule of this fuzzy system has the following form:

Model Rule i: IF $z_1(t)$ is $M_{i1}$ and ... $z_p(t)$ is $M_{ip}$ THEN

$$\dot{x}(t) = A_i x(t) + B_i u(t), \quad y(t) = C_i x(t), \quad i = 1, 2, \dots, r \qquad (1)$$

where $z(t) = [z_1(t), \dots, z_p(t)]^T$ is the premise variables vector, $x(t) = [x_1(t), \dots, x_n(t)]^T$ is the state vector, $u(t) = [u_1(t), \dots, u_m(t)]^T$ is the input vector, $r$ is the number of fuzzy rules, and $M_{ij}$ is a fuzzy set. The final outputs of the fuzzy system are inferred as follows:

$$\dot{x}(t) = \sum_{i=1}^{r} h_i(z(t)) \left( A_i x(t) + B_i u(t) \right) \qquad (2)$$

$$y(t) = \sum_{i=1}^{r} h_i(z(t)) \, C_i x(t) \qquad (3)$$

where $h_i(z(t))$ is the normalized weight of each rule, that is $h_i(z(t)) \ge 0$, $\sum_{i=1}^{r} h_i(z(t)) = 1$, and is given by

$$h_i(z(t)) = \frac{w_i(t)}{\sum_{i=1}^{r} w_i(t)}, \quad w_i(t) = \prod_{j=1}^{p} M_{ij}(z_j(t)),$$

where $M_{ij}(z_j(t))$ is the grade of membership of $z_j(t)$ in $M_{ij}$.

The PDC scheme that stabilizes the Takagi-Sugeno fuzzy system was proposed by Wang et al. [8, 13] as a design framework for fuzzy control. The PDC controller is given by:

$$u(t) = -\sum_{i=1}^{r} h_i(z(t)) \, F_i x(t) \qquad (4)$$

The goal is to find appropriate gains $F_i$ that ensure closed-loop stability. By substituting (4) in (2), we obtain the Takagi-Sugeno closed-loop fuzzy system:

$$\dot{x}(t) = \sum_{i=1}^{r} \sum_{j=1}^{r} h_i(z(t)) h_j(z(t)) \left[ A_i - B_i F_j \right] x(t) \qquad (5)$$

which can be rewritten as

$$\dot{x}(t) = \sum_{i=1}^{r} h_i^2(z(t)) \, G_{ii} \, x(t) + 2 \sum_{i=1}^{r} \sum_{i<j} h_i(z(t)) h_j(z(t)) \left\{ \frac{G_{ij} + G_{ji}}{2} \right\} x(t) \qquad (6)$$

where $G_{ij} = A_i - B_i F_j$ and $G_{ii} = A_i - B_i F_i$.

The stabilization of a feedback system containing a state-feedback fuzzy controller has been extensively considered. The objective is to select the $F_i$ so as to stabilize the closed-loop system. Stability conditions corresponding to a quadratic Lyapunov function were derived by Tanaka and Sugeno in [7]; they give sufficient conditions for stable fuzzy models based on the Lyapunov approach. Due to its property of conservatism reduction, a fuzzy Lyapunov function $V(x(t)) = \sum_{i=1}^{r} h_i(z(t)) \, x^T(t) P_i x(t)$ [7, 15] is used to study the stability and stabilization of the Takagi-Sugeno fuzzy system (2). In this context, new non-quadratic stability conditions were proposed under two assumptions in [16].

III. FUZZY OBSERVER DESIGN

In practice, if the states are not available for measurement and feedback, an observer is needed. The objective is that the estimation error, i.e. the difference between the system states and the observer states, tends to zero; in other terms, the fuzzy observer is required to satisfy $x(t) - \hat{x}(t) \to 0$ when $t \to \infty$, where $\hat{x}(t)$ denotes the state vector estimated by the fuzzy observer, which is designed via the PDC. The i-th observer rule has the following form:

Observer Rule i: IF $z_1(t)$ is $M_{i1}$ and ... $z_p(t)$ is $M_{ip}$ THEN

$$\dot{\hat{x}}(t) = A_i \hat{x}(t) + B_i u(t) + K_i \left( y(t) - \hat{y}(t) \right), \quad \hat{y}_i(t) = C_i \hat{x}(t), \quad i = 1, 2, \dots, r \qquad (7)$$

where the premise variables $z(t)$ are independent of the estimated state variables, $K_i$, $i = 1, \dots, r$, are the observation gain matrices and $\hat{y}(t)$ is the estimated output. The final outputs of the fuzzy observer are inferred as follows:

$$\dot{\hat{x}}(t) = \sum_{i=1}^{r} h_i(z(t)) \left( A_i \hat{x}(t) + B_i u(t) \right) + \sum_{i=1}^{r} h_i(z(t)) \, K_i \left( y(t) - \hat{y}(t) \right) \qquad (8)$$

$$\hat{y}(t) = \sum_{i=1}^{r} h_i(z(t)) \, C_i \hat{x}(t) \qquad (9)$$

By substituting (3) and (9) into (8), we obtain:

$$\dot{\hat{x}}(t) = \sum_{i=1}^{r} h_i(z(t)) \left( A_i \hat{x}(t) + B_i u(t) \right) + \sum_{i=1}^{r} \sum_{j=1}^{r} h_i(z(t)) h_j(z(t)) \, K_i C_j \left( x(t) - \hat{x}(t) \right) \qquad (10)$$

which can be written as

$$\dot{\hat{x}}(t) = \sum_{i=1}^{r} \sum_{j=1}^{r} h_i(z(t)) h_j(z(t)) \left[ \left( A_i - K_i C_j \right) \hat{x}(t) + B_i u(t) + K_i C_j x(t) \right] \qquad (11)$$

The controller is also based on the estimate of the state, i.e. we have:

$$u(t) = -\sum_{i=1}^{r} h_i(z(t)) \, F_i \hat{x}(t) \qquad (12)$$

Using (12) instead of (4) in (2), we obtain:

$$\dot{x}(t) = \sum_{i=1}^{r} \sum_{j=1}^{r} h_i(z(t)) h_j(z(t)) \left( A_i x(t) - B_i F_j \hat{x}(t) \right) \qquad (13)$$

Defining the estimation error as $\tilde{x} = x - \hat{x}$ and subtracting (11) from (13), we obtain:

$$\dot{\tilde{x}}(t) = \sum_{i=1}^{r} \sum_{j=1}^{r} h_i(z(t)) h_j(z(t)) \left( A_i - K_i C_j \right) \tilde{x}(t) \qquad (14)$$

The design of the fuzzy observer consists in determining the local gains $K_i$, using the stability conditions proposed in [16], such that the estimation error tends to zero. Hence, the observer dynamics is stable if there exist positive definite matrices $P_{o1}, P_{o2}, \dots, P_{or}$ and matrices $K_1, K_2, \dots, K_r$ such that the following is satisfied:

$$P_{oi} > 0, \quad i = 1, 2, \dots, r \qquad (15)$$

$$\sum_{\rho=1}^{r} \phi_\rho P_{o\rho} + G_{jj}^T P_{oi} + P_{oi} G_{jj} < 0, \quad i, j = 1, 2, \dots, r \qquad (16)$$

$$\sum_{\rho=1}^{r} \phi_\rho P_{o\rho} + \left\{ \frac{G_{jk} + G_{kj}}{2} \right\}^T P_{oi} + P_{oi} \left\{ \frac{G_{jk} + G_{kj}}{2} \right\} < 0, \quad \forall i, j, k \in \{1, 2, \dots, r\} \text{ s.t. } j < k \qquad (17)$$

$$\begin{bmatrix} 1 & x^T(0) \\ x(0) & P_{oi}^{-1} \end{bmatrix} \ge 0, \quad i = 1, \dots, r \qquad (18)$$

$$\begin{bmatrix} \phi_\rho P_{oi} & W_{ij\rho}^T \\ W_{ij\rho} & \phi_\rho I \end{bmatrix} \ge 0, \quad \forall i, j, \rho \in \{1, 2, \dots, r\} \qquad (19)$$

where $G_{jk} = A_j - K_j C_k$, $G_{jj} = A_j - K_j C_j$ and $W_{ij\rho} = \xi_\rho \left( A_i - K_i C_j \right)$. These inequalities can be recast in terms of LMIs by the following changes of variables:

$$P_{oi} = X_{oi}^{-1}, \quad \forall i \in \{1, 2, \dots, r\}$$
$$X_{oi} = \alpha_{ij} X_{oj} \ \text{ s.t. } \alpha_{ij} = 1/\alpha_{ji}, \quad \forall i, j \in \{1, 2, \dots, r\} \text{ and } i \ne j$$
$$K_i = \beta_{ij} K_j \ \text{ s.t. } \beta_{ij} = 1/\beta_{ji}, \quad \forall i, j \in \{1, 2, \dots, r\} \text{ and } i \ne j$$
$$N_i = K_i C_i X_{oi}, \quad \forall i \in \{1, 2, \dots, r\}$$

The coefficients $\alpha_{ij}$, $\beta_{ij}$ and $\phi_\rho$ for $i, j, \rho = 1, 2, \dots, r$ and $i \ne j$ can be chosen heuristically according to the considered application; $\alpha_{ij}$ and $\beta_{ij}$ must be different from 1 (for $i = j$, $\alpha_{ij} = \beta_{ij} = 1$), and $\xi_\rho$ is obtained from $\dot{h}_\rho(z(t))$. Subsequently, we consider that the premise variables do not depend on the estimated states $\hat{x}(t)$.
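A quick numerical sanity check of the kind of certificate condition (16) encodes can be done without an LMI solver. The single-rule data below (A, C, the candidate gain K and the candidate matrix Po) are made-up illustrative numbers, not the paper's design; the sketch only verifies that $P_o > 0$ and $G^T P_o + P_o G < 0$, which certifies that the error dynamics (14) decay for that vertex:

```python
import numpy as np

# Illustrative single-vertex check; all numbers are assumptions for the sketch.
A = np.array([[0.0, 1.0], [1.5, 0.0]])
C = np.array([[1.0, 0.0]])
K = np.array([[5.0], [10.0]])            # candidate observer gain
G = A - K @ C                            # error-dynamics matrix of Eq. (14)

Po = np.array([[1.0, -0.05], [-0.05, 0.1]])  # candidate Lyapunov matrix

def is_pd(M):
    """Positive definiteness via eigenvalues of the symmetrized matrix."""
    return bool(np.all(np.linalg.eigvalsh((M + M.T) / 2.0) > 0))

lyap = G.T @ Po + Po @ G
print(is_pd(Po), is_pd(-lyap))  # both True => Po certifies stability of G
```

In the multi-rule case the same test must hold jointly at every vertex pair (i, j), which is what makes the problem an LMI feasibility problem rather than a single Lyapunov equation.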

IV. SEPARATION PROPERTY OF OBSERVER/CONTROLLER

By augmenting the states of the system with the state estimation error, we obtain the following 2n-dimensional state equations for the observer/controller closed-loop system:

$$\begin{bmatrix} \dot{x} \\ \dot{\tilde{x}} \end{bmatrix} = \sum_{i=1}^{r} \sum_{j=1}^{r} h_i h_j \begin{bmatrix} A_i - B_i F_j & B_i F_j \\ 0 & A_i - K_i C_j \end{bmatrix} \begin{bmatrix} x \\ \tilde{x} \end{bmatrix} \qquad (20)$$

$$y = \begin{bmatrix} \sum_{i=1}^{r} h_i C_i & 0 \end{bmatrix} \begin{bmatrix} x \\ \tilde{x} \end{bmatrix} \qquad (21)$$

To show that the whole system above is stable, we must show that the separation property holds. For this purpose, we extend the separation property principle proposed by Ma et al. in [26] to the non-quadratic design proposed in [16]. We construct a comparison system $\dot{w} = \bar{A} w$, where $\bar{A}$ is a function of $\gamma_i$ and $\tilde{\gamma}_i$, $i = 1, 2, 3, 4$; then, using the vector comparison principle, we can conclude that the whole system is globally asymptotically stable. The


separation property is expressed by the following theorem:

Theorem 1 [26]: If there exist two scalar functions $V(x): \mathbb{R}^n \to \mathbb{R}$ and $\tilde{V}(\tilde{x}): \mathbb{R}^n \to \mathbb{R}$ and positive real numbers $\gamma_1, \gamma_2, \gamma_3, \gamma_4, \tilde{\gamma}_1, \tilde{\gamma}_2, \tilde{\gamma}_3, \tilde{\gamma}_4$ such that:

$$\gamma_1 \|x\|^2 \le V(x) \le \gamma_2 \|x\|^2, \quad \tilde{\gamma}_1 \|\tilde{x}\|^2 \le \tilde{V}(\tilde{x}) \le \tilde{\gamma}_2 \|\tilde{x}\|^2 \qquad (22)$$

$$\frac{\partial V(x)}{\partial x} \sum_{i=1}^{r} \sum_{j=1}^{r} \mu_i \mu_j \left( A_i - B_i F_j \right) x \le -\gamma_3 \|x\|^2, \quad \frac{\partial \tilde{V}(\tilde{x})}{\partial \tilde{x}} \sum_{i=1}^{r} \sum_{j=1}^{r} \mu_i \mu_j \left( A_i - K_i C_j \right) \tilde{x} \le -\tilde{\gamma}_3 \|\tilde{x}\|^2 \qquad (23)$$

$$\left\| \frac{\partial V(x)}{\partial x} \right\| \le \gamma_4 \|x\|, \quad \left\| \frac{\partial \tilde{V}(\tilde{x})}{\partial \tilde{x}} \right\| \le \tilde{\gamma}_4 \|\tilde{x}\| \qquad (24)$$

then the whole system is globally asymptotically stable.

Hence, this theorem shows that the fuzzy controller and the fuzzy observer can be designed to be stable independently, and the whole system comprising the fuzzy controller and the fuzzy observer is still stable. The theorem is extended to the case of non-quadratic stability conditions, where $V(x) = \sum_{i=1}^{r} h_i x^T P_i x$ and $\tilde{V}(\tilde{x}) = \sum_{i=1}^{r} h_i \tilde{x}^T P_{oi} \tilde{x}$. The principle of the method is to find scalars $\gamma_1, \gamma_2, \gamma_3, \gamma_4, \tilde{\gamma}_1, \tilde{\gamma}_2, \tilde{\gamma}_3, \tilde{\gamma}_4$ that satisfy inequalities (22)-(24), and then to satisfy the following inequality:

$$\begin{bmatrix} \dot{V}(x(t)) \\ \dot{\tilde{V}}(\tilde{x}(t)) \end{bmatrix} \le \bar{A} \begin{bmatrix} V(x(t)) \\ \tilde{V}(\tilde{x}(t)) \end{bmatrix} \qquad (25)$$

where

$$\bar{A} = \begin{bmatrix} -\dfrac{\gamma_3}{2\gamma_2} & \dfrac{a \gamma_4^2}{2 \gamma_3 \tilde{\gamma}_1} \\ 0 & -\dfrac{\tilde{\gamma}_3}{\tilde{\gamma}_2} \end{bmatrix} \qquad (26)$$

is a stability matrix. Hence, the construction of the comparison system $\dot{w} = \bar{A} w$, which is obviously globally asymptotically stable, and the use of the vector comparison principle allow us to verify that the whole system (20) is globally asymptotically stable (the proof is given in [26]). In this paper we consider the continuous case; all results are easily transposable to the discrete case. We also note that the separation property is not applicable when $z(t)$ is replaced by $\hat{z}(t)$.

V. DESIGN EXAMPLE

This part presents a design example that illustrates the effectiveness of the proposed controller-observer design, with the separation property principle used to check the stability of the global system, i.e. fuzzy model, fuzzy controller and fuzzy observer. The equations of motion of the inverted pendulum on a cart are [6]:

$$\dot{x}_1(t) = x_2(t), \quad \dot{x}_2(t) = \frac{g \sin(x_1(t)) - a m l x_2^2(t) \sin(2 x_1(t))/2}{4l/3 - a m l \cos^2(x_1(t))} - \frac{a \cos(x_1(t))}{4l/3 - a m l \cos^2(x_1(t))} \, u(t) \qquad (27)$$

where $x_1(t)$ denotes the angle (in radians) of the pendulum from the vertical and $x_2(t)$ is the angular velocity, $g = 9.8\ \mathrm{m/s^2}$ is the gravity constant, $m$ is the mass of the pendulum, $M$ is the mass of the cart, $2l$ is the length of the pendulum, $u$ is the force applied to the cart (in Newtons) and $a = 1/(m + M)$. For the simulations, the parameter values are $m = 2.0\ \mathrm{kg}$, $M = 8.0\ \mathrm{kg}$, $2l = 1.0\ \mathrm{m}$.

The system (27) is modeled by the following two fuzzy rules:

Rule 1: IF $x_1(t)$ is about $0$ THEN $\dot{x}(t) = A_1 x(t) + B_1 u(t)$, $y(t) = C_1 x(t)$
Rule 2: IF $x_1(t)$ is about $\pm \pi/2$ ($|x_1| < \pi/2$) THEN $\dot{x}(t) = A_2 x(t) + B_2 u(t)$, $y(t) = C_2 x(t)$

where

$$A_1 = \begin{bmatrix} 0 & 1 \\ \dfrac{g}{4l/3 - a m l} & 0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0 & 1 \\ \dfrac{2g}{\pi \left( 4l/3 - a m l \beta^2 \right)} & 0 \end{bmatrix}, \quad C_1 = C_2 = \begin{bmatrix} 1 & 0 \end{bmatrix},$$

$$B_1 = \begin{bmatrix} 0 \\ -\dfrac{a}{4l/3 - a m l} \end{bmatrix}, \quad B_2 = \begin{bmatrix} 0 \\ -\dfrac{a \beta}{4l/3 - a m l \beta^2} \end{bmatrix}, \quad \beta = \cos(88°).$$

The control objective for this example is to balance the inverted pendulum over the approximate range $x_1 \in (-\pi/2, \pi/2)$ using our fuzzy controller. The PDC control laws are as follows:

Rule 1: IF $x_1$ is about $0$ THEN $u(t) = -F_1 \hat{x}(t)$
Rule 2: IF $x_1$ is about $\pm \pi/2$ THEN $u(t) = -F_2 \hat{x}(t)$

whereas the observer rules are:

Rule 1: IF $x_1$ is about $0$ THEN $\dot{\hat{x}}(t) = A_1 \hat{x}(t) + B_1 u(t) + K_1 C_1 \left( x(t) - \hat{x}(t) \right)$
Rule 2: IF $x_1(t)$ is about $\pm \pi/2$ THEN $\dot{\hat{x}}(t) = A_2 \hat{x}(t) + B_2 u(t) + K_2 C_2 \left( x(t) - \hat{x}(t) \right)$

Hence, the control law that guarantees the stability of the fuzzy model and fuzzy observer system is given by:

$$u(t) = -h_1(x_1(t)) \, F_1 \hat{x}(t) - h_2(x_1(t)) \, F_2 \hat{x}(t) \qquad (28)$$

where $h_1$ and $h_2$ are the triangular membership values of rules 1 and 2, respectively. Applying our approach, the objectives of balancing and stabilizing the pendulum and of the estimation process are reached with success for different initial conditions $x_1(0) \in (-\pi/2, \pi/2)$ and $x_2(0) = 0$. We consider two cases:


A. With pole placement We choose the closed-loop eigenvalues éê-3.0 -5.0ùú ë û for A1 - B1 .F1 and A2 - B2 .F2 , we then have: F1 = éê-645.8824 -160.0000ùú , ë û 3 é ù F2 = 10 ê-4.6525 -1.5279ú , ë û and the closed-loop eigenvalues éê-50.0 -60.0ùú for ë û A1 - K 1 .C 1 and A2 - K 2 .C 2 , and we have: T

K 1 = éê110.0000 174.4694 ùú , ë û T é K 2 = ê110.0000 321.5121ùú . ë û

Hence, for f1 = f2 = 0.5 and x11 = -0.0637, x12 = 0.0637, x21 = -0.0637, x22 = 0.0637 and for

the separation property design is very flexible since it do not depends on the stability conditions but directly on the Lyapunov functions. In fact, the separation property proposed by [24] depends on the stability conditions, it requires to find a positive l such that the bloc diagonal matrix P satisfies the quadratic stability conditions of the augmented system and is given by P = dig[lP, Po ] where P and Po are respectively the positive definite matrices of the controller and the observer. Applying Jadbabaie approach, the following results (Fig. 3 and Fig. 4) are obtained for the same pole placement. l > 0.0381 é 4.2519 0.1328ù ú > 0, P = 103 êê ú êë0.1328 0.1381úû é 1.0315 -0.1496ù ú>0 Po = 103 êê ú 0.1496 0.0714 ëê ûú

T

0.6

x1(t) and x 1est(t)

the initial condition x (0) = éê p / 6, 0ùú , we obtain the ë û following results: For the controller design: 0.1270ù ú > 0, 0.1262úú û 0.1267 ù ú>0 0.1259úú û

1

1.5

2

2.5

3

3.5

4

4.5

5

0

0.5

1

1.5

2

2.5 time

3

3.5

4

4.5

5

x2(t) and x 2est(t)

0

-1

1000

800

600

200

0

-200

0

0.5

1

1.5

2

2.5 time (sec)

3

3.5

4

4.5

5

Fig. 2: Inverted pendulum control evolution with pole placement. x 1(t) and x 1est(t)

0.6

0

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

0

0.5

1

1.5

2

2.5 time

3

3.5

4

4.5

5

0.5 x2(t) and x 2est(t)

g 3 = 5.0038 ´ 10 , g 4 = 4.3820 ´ 10 . Comparing our results with those obtained for the same example with a pole placement in [24], our results are very interesting, since on one side the stability design that depends on non-quadratic stability conditions is less conservative [16] and on the other side

0.2

1

~

g 1 = 1.3174 ´ 104 , g 2 = 4.4097 ´ 108 ,

0.4

-0.2

g 3 = 1.2411 ´ 108 , g 4 = 4.3764 ´ 108 , 8

400

~

positive real numbers obtained from simulation and that satisfy the inequality (22-24), their values are: g1 = 4.6738 ´ 107 , g 2 = 4.6039 ´ 108 ,

~

0.5

Fig. 1: Inverted pendulum performance with pole placement

~

6

0

-0.5

-1.5

principle, having two scalar functions V (x ), V (x ) and

~

0

0.5

For the observer design: é 7.3509 -1.0523ù ú > 0, Po1 = 107 êê -1.0523 0.5042 úú ëê û é 7.3102 -1.0432ù ú>0 Po 2 = 107 êê ú êë-1.0432 0.5000 úû Fig. 1 and Fig. 2 show the closed loop behavior of the fuzzy controller and the fuzzy observer, for respectively, the inverted pendulum position, velocity and control force evolution of the closed loop system. The stability of the whole system (fuzzy controller - fuzzy observer fuzzy model) is verified applying the vector comparison

~

0.2

-0.2

u(t)

é3.0097 P1 = 108 êê 0.1270 ëê é2.9974 P2 = 108 êê êë0.1267

x(t) xest(t)

0.4

0 -0.5 -1 -1.5 -2

Fig. 3: Inverted pendulum performance with pole placement: Jadbabaie approach

10__________________________________________________________________________________________________________ Volume 4, Number 2, October 2011

900

3500

800

3000

700 2500

600 2000

u(t)

u(t)

500

400

300

1500

1000

200 500

100 0

0

0

0.5

1

1.5

2

2.5 time

3

3.5

4

4.5

-500

5

B. Without pole placement For f1 = f2 = 1, x21 = -0.0064,

x11 = -0.0064, x22 = 0.0064 ,

x12 = 0.0064, a12 = 0.4,

a21 = 1/ a12 and b12 = 1.2, b21 = 1/ b12 , we obtain the following P1, P2 , F1, F2 , Po1, Po2 , K 1 and K 2 for the T

initial condition x (0) = éê p / 3 0ùú ë û

:

1.5

2

2.5 time

3

3.5

4

4.5

5

x(t) xest(t)

10

é0.6131 0.2181ù ê ú ê0.2181 0.0794 ú > 0 úû ëê

8 6 4 2 0

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

0

0.5

1

1.5

2

2.5 time

3

3.5

4

4.5

5

10

5

0

-5

-10

Fig. 7: Inverted pendulum performance without pole placement: Jadbabaie approach. 4

2.5

T

x 10

T 2

~

8

g 4 = 2.0327 ´ 10 ,

1

0.5

-6

g 1 = 1.6547 ´ 10 , ~

1.5 u(t)

Also very good results are obtained for the stability of the whole system which is checked applying the vector comparison principle, and the positive values are: g 2 = 2.0100 ´ 108 , g1 = 0.0868, g 3 = 0.1475, 8

1

12

K 1 = éê9.6602 24.7890ùú , K 2 = éê5.9401 12.6046ùú ë û ë û

~

0.5

Fig. 5 and Fig. 6 show the closed loop behavior of the whole system, for respectively, the inverted pendulum position, velocity and control force evolution for the initial condition above. However, with Jadbabaie approach, the following results given in Fig. 7 and Fig. 8 are obtained without pole placement with a l that tends to infinity.

x2(t) and x 2est(t)

é0.6122 0.2216ù ú > 0, P = P1 = êê 2 0.2216 0.0878úú ëê û F1 = éê-937.3591 -294.8718ùú , ë û 3 é ù F2 = 10 ê-6.0636 -2.0828ú ë û é 0.1459 -0.0034 ù ú > 0, Po1 = êê ú êë-0.0034 0.0095 úû é 0.0425 -0.0061ù ú>0 Po2 = êê ú 0.0061 0.0062 ëê ûú


Fig. 6: Inverted pendulum control evolution without pole placement.

Fig. 4: Inverted pendulum control evolution with pole placement: Jadbabaie approach.


The remaining positive values are γ̃2 = 1.9823 × 10^8, γ̃3 = 0.0053 and γ4 = 1.9619 × 10^8.


Fig. 8: Inverted pendulum control evolution without pole placement: Jadbabaie approach.


Fig. 5: Inverted pendulum performance without pole placement

According to these results, we conclude that the approach proposed in this paper, based on non-quadratic stability conditions [16] and a vector comparison principle [26], gives very good performance (especially in case B) compared with the Jadbabaie approach [24], which relies on quadratic stability conditions and is therefore conservative. The control actions of Fig. 2 and Fig. 4 show fast stabilization. Moreover, the separation property of Jadbabaie becomes complex in the presence of several stability conditions, which leads one to the separation property of Ma et al.


VI. CONCLUSION

In this paper a new design procedure for a fuzzy observer is discussed. The non-quadratic stability conditions developed in [16] are used for the stabilization of Takagi-Sugeno fuzzy models; they are based on fuzzy Lyapunov functions and fuzzy state-feedback laws. The controller and the observer are designed separately: the fuzzy controller guarantees the stabilization of the T-S fuzzy model, whereas the fuzzy observer guarantees that the state estimation error converges to zero. To check the stability of the whole system comprising the fuzzy controller, the fuzzy observer and the fuzzy model, we applied a separation property based on a vector comparison principle. This principle, originally conceived for quadratic Lyapunov functions, is adapted here to non-quadratic Lyapunov functions so as to obtain less conservative results. The design example allows us to assess the performance of the proposed observer/controller design, to verify the separation property, and hence to demonstrate that our proposal is less conservative and very flexible.

REFERENCES

[1] L. X. Wang and J. M. Mendel, "Fuzzy basis functions, universal approximators and orthogonal least-squares learning", IEEE Transactions on Neural Networks, vol. 3, pp. 807-814, 1992.
[2] L. X. Wang, "Adaptive fuzzy systems and control: Design and stability analysis", Englewood Cliffs, NJ: Prentice Hall, 1994.
[3] T. Takagi and M. Sugeno, "Fuzzy identification of systems and its applications to modelling and control", IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-15, no. 1, pp. 116-132, January/February 1985.
[4] B. Kosko, "Fuzzy systems as universal approximators", IEEE Transactions on Computers, vol. 43, no. 11, pp. 1329-1333, November 1994.
[5] A. Sala and C. Arino, "Fuzzy logic controllers are universal approximators", IEEE Transactions on Systems, Man and Cybernetics, part B, vol. 37, no. 3, pp. 727-732, June 2007.
[6] K. Tanaka and H. O. Wang, "Fuzzy control systems design and analysis: A Linear Matrix Inequality approach", John Wiley & Sons Inc, 2001.
[7] K. Tanaka and M. Sugeno, "Stability analysis and design of fuzzy control systems", Fuzzy Sets and Systems, vol. 45, no. 2, pp. 135-156, 1992.
[8] H. O. Wang, K. Tanaka and M. F. Griffin, "An approach to fuzzy control of nonlinear systems: Stability and design issues", IEEE Transactions on Fuzzy Systems, vol. 4, no. 1, pp. 14-23, February 1996.
[9] N. Goléa, A. Goléa and I. Abdelmalek, "Takagi-Sugeno fuzzy systems based nonlinear adaptive control", International Journal on Fuzzy Systems, vol. 8, no. 2, pp. 106-112, June 2006.
[10] S. Zhou, J. Lam and W. X. Zheng, "Control design for fuzzy systems based on relaxed nonquadratic stability and H∞ performance conditions", IEEE Transactions on Fuzzy Systems, vol. 15, no. 2, pp. 188-199, April 2007.

[11] J. Zhao, "System modeling, identification and control using fuzzy logic", PhD thesis, CESAME, UCL, Belgium, 1995.
[12] K. Tanaka, T. Ikeda and H. O. Wang, "Robust stabilization of a class of uncertain nonlinear systems via fuzzy control: Quadratic stabilizability, H∞ control theory, and linear matrix inequalities", IEEE Transactions on Fuzzy Systems, vol. 4, no. 1, pp. 1-13, February 1996.
[13] H. O. Wang, K. Tanaka and M. Griffin, "Parallel distributed compensation of nonlinear systems by Takagi and Sugeno's fuzzy model", Proceedings of the 4th IEEE International Conference on Fuzzy Systems, Yokohama, Japan, pp. 531-538, 1995.
[14] L. K. Wong, F. H. F. Leung and P. K. S. Tam, "Lyapunov-function-based design of fuzzy logic controllers and its application on combining controllers", IEEE Transactions on Industrial Electronics, vol. 45, no. 3, pp. 502-509, June 1998.
[15] K. Tanaka, T. Hori and H. O. Wang, "A multiple Lyapunov function approach to stabilization of fuzzy control systems", IEEE Transactions on Fuzzy Systems, vol. 11, no. 4, pp. 582-589, August 2003.
[16] I. Abdelmalek, N. Goléa and M. L. Hadjili, "A new fuzzy Lyapunov approach to non-quadratic stabilization of Takagi-Sugeno fuzzy models", International Journal of Applied Mathematics and Computer Science, vol. 17, no. 1, pp. 101-113, 2007.
[17] Y. Blanco, W. Perruquetti and P. Borne, "Non-quadratic stability of nonlinear systems in the Takagi-Sugeno form", Proceedings of the European Control Conference (ECC), Porto, Portugal, pp. 3917-3922, 2001.
[18] M. L. Hadjili, "Fuzzy logic in nonlinear modeling and control", PhD thesis, CESAME, UCL, Belgium, November 2002.
[19] R. E. Kalman, "On the general theory of control systems", Proceedings of IFAC, Butterworths, London, vol. 1, pp. 481-492, December 1961.
[20] V. Utkin and S. Drakunov, "Sliding mode observers. Tutorial", Proceedings of the 34th IEEE Conference on Decision and Control, Anchorage, AK, USA, vol. 4, pp. 3376-3378, September 1995.
[21] S. Nicosia and A. Tornambe, "High-gain observers in the state and parameter estimation of robots having elastic joints", Systems and Control Letters, vol. 13, pp. 331-337, 1989.
[22] K. Tanaka, T. Hori and H. O. Wang, "A fuzzy Lyapunov approach to fuzzy control system design", Proceedings of the American Control Conference, pp. 4790-4795, 2001.
[23] M. Bernal and P. Hušek, "Non-quadratic performance design for Takagi-Sugeno fuzzy systems", International Journal of Applied Mathematics and Computer Science, vol. 15, no. 3, pp. 383-391, 2005.
[24] A. Jadbabaie, "Robust non-fragile controller synthesis using model-based fuzzy systems: a linear matrix inequality approach", M.Sc. thesis in Electrical Engineering, University of New Mexico, Albuquerque, October 1997.
[25] A. Jadbabaie, A. Titli and M. Jamshidi, "Fuzzy observer-based control of nonlinear systems", Proceedings of the 36th Conference on Decision and Control, San Diego, California, USA, December 1997.
[26] X.-J. Ma, Z.-Q. Sun and Y.-Y. He, "Analysis and design of fuzzy controller and fuzzy observer", IEEE Transactions on Fuzzy Systems, vol. 6, no. 1, pp. 41-51, February 1998.
[27] K. Tanaka, T. Ikeda and H. O. Wang, "Fuzzy regulators and fuzzy observers: Relaxed stability conditions and LMI-based designs", IEEE Transactions on Fuzzy Systems, vol. 6, no. 4, pp. 250-265, May 1998.
[28] S. Boyd, L. El Ghaoui, E. Feron and V. Balakrishnan, "Linear Matrix Inequalities in System and Control Theory", SIAM Studies in Applied Mathematics, Philadelphia, PA: SIAM, 1994.
[29] K. Tanaka, T. Hori and H. O. Wang, "New parallel distributed compensation using time derivative of membership functions: a fuzzy Lyapunov approach", Proceedings of the 40th IEEE Conference on Decision and Control, Orlando, Florida, USA, vol. 4, pp. 3942-3947, 2001.

[30] G. Feng, S. G. Cao, N. W. Rees and C. K. Chark, "Design of fuzzy control systems with guaranteed stability", Fuzzy Sets and Systems, vol. 85, pp. 1-10, 1997.
[31] K. Tanaka and M. Sano, "On the concept of fuzzy regulators and fuzzy observers", Proceedings of the 3rd IEEE International Conference on Fuzzy Systems, vol. 2, pp. 767-772, 1994.
[32] X.-J. Ma and Z.-Q. Sun, "Analysis and design of fuzzy reduced-dimensional observer and fuzzy functional observer", Fuzzy Sets and Systems, vol. 120, pp. 35-63, 2001.


Controller Designing and Parameter Tuning in the First Order plus Large Delay Time System

DAXIAO Wang*, YINGJIAN Xu, WEI Zhang and LEIJUN Xiang
Shanghai Jiao Tong University, China, Department of Automation and Key Laboratory of System Control and Information Processing, Ministry of Education of China, 800 Dongchuan RD, Minhang District, 200240 Shanghai, China
E-Mail: jqrs1203@gmail.com (DAXIAO Wang); [email protected] (YINGJIAN Xu); [email protected] (WEI Zhang); [email protected] (LEIJUN Xiang)
*Corresponding author: Phone: +86-21-34204019; E-mail: [email protected]

Abstract – Based on the traditional PID controller structure, we propose a new controller that handles large-time-delay systems well. We also provide an easy-to-understand, pragmatic strategy for tuning the controller parameters. To verify the idea, we apply the method to a large dead-time system and to commonly employed systems, and compare it with the ideal PID structure and related PID tuning methods.

Keywords: PID; Large time delay; Ordinary system; Parameter tuning

I. INTRODUCTION

In automatic control theory and its applications, controller design is the central task. Since the proportional-integral-derivative (PID) controller was proposed and applied in industrial systems, it has been used widely in many fields. Today, although more advanced controllers such as robust, adaptive and model predictive controllers have been proposed, and some have been applied or tested well in real systems, more than 90% of the controllers in industrial systems are still PID controllers. It is therefore valuable to study the application of PID in different situations and forms. To model the real behavior of controlled plants accurately, a time delay term is introduced into the mathematical model of the system. A large time delay, however, is difficult to handle both in theory and in practice, which has led many researchers to propose methods for this problem. In recent years, for instance, Mohammad Bozorg and Edward J. Davison [1] analyzed the influence of uncertain time delays in time-delay systems. Jing Zhou [2] employed an appropriate Lyapunov-Krasovskii functional to compensate for large time-varying delays in an adaptive control system. Wenjian Cai et al. [3] proposed a new adaptive PI control technique for large dead-time processes with a sensitivity specification, which does

not need any prior knowledge of the process or of the previous controller while the control loop remains in normal operation. To ensure asymptotic stability and robust performance of the closed-loop system for all admissible uncertainties, Shengyuan Xu et al. [4,5] proposed different methods for systems with time-varying delays. Tan et al. [6] extended the standard internal model control (IMC) design from stable processes to unstable processes, so that the set-point tracking performance and the disturbance rejection can be designed separately. In the last decades, researchers and engineers have studied IMC and PID controllers for time delay systems. Lee and his co-researchers [7-10] did significant work on PID controllers and parameter tuning methods for time delay systems. Juan Chen et al. [11] proposed, based on IMC, a new method to design a Smith delay-compensation decoupling controller that works well for multiple-input multiple-output (MIMO) first-order-plus-dead-time non-square systems. In [12], M. Ramasamy and S. Sundaramoorthy used the impulse response instead of the step response of the plant, together with the Maclaurin series of the desired closed-loop transfer function, to design a PID controller and presented an effective PID parameter tuning strategy. Lo et al. extended the continuous-time self-tuning control algorithm to systems with unknown or varying time delay [13]. Peiying Chen, Weidong Zhang and Liye Zhu [14] used Pade approximation theory and IMC to treat the dead-time system. S. Alcantara, C. Pedret, R. Vilanova and Weidong Zhang addressed set-point robust PID tuning for stable first-order processes with time delay (FOPTD) from a general min-max model matching formulation [15]. Tan et al. [16] proposed an accurate but overly complex approach to control the time delay system, which impedes the spread of this method in industrial plants. Readers interested in a broader view of research on delay-time systems and the people who contributed to this field can consult [17]. As a useful and convenient way to control industrial systems, PID can and will still serve in many settings. Therefore, in this paper, we study the


traditional PID and propose a new PID controller based on IMC theory [18] and the Maclaurin-PID (MacPID) [9] to handle the large time delay and small time constant in the FOPTD system, where the classical PID deviates drastically from the desired target. This article is organized as follows. Section 2 gives a brief review of previous work in the FOPTD field. Section 3 proposes our idea and describes the method in detail. Sections 4 and 5 present the detailed procedures of controller design and tuning for two different situations, compare the results with three other strategies, and give our conclusions.

II. BRIEF LITERATURE REVIEW

Fig. 1 shows the classical control system, in which r is the reference signal of the system, e is the deviation between the desired value and the output y of the system, that is e(t) = r(t) − y(t),

u is the output of the controller, and d represents the disturbance of the system. C(s), G(s) and Gd(s) denote the controller, plant and disturbance transfer functions, respectively. PID is one of the most popular controllers in the industrial field. The ideal PID law is written as

u(t) = Kp (e(t) + (1/Ti) ∫ e(t)dt + Td de(t)/dt)    (1)

where Kp, Ti and Td denote the three parameters of the PID controller: the proportional gain, the integral time constant and the derivative time constant. With the help of the Laplace transform, we obtain the PID controller in the frequency domain as

C(s) = Kp (1 + 1/(Ti·s) + Td·s)    (2)
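The law (1) is easy to realize digitally. The sketch below, which is not from the paper, uses a forward-Euler integral and a backward-difference derivative; the gains and sample time in the usage line are purely illustrative.

```python
class DiscretePID:
    """Discrete realization of the ideal PID of eq. (1):
    u = Kp * (e + (1/Ti) * integral(e) + Td * de/dt)."""
    def __init__(self, Kp, Ti, Td, dt):
        self.Kp, self.Ti, self.Td, self.dt = Kp, Ti, Td, dt
        self.integral = 0.0
        self.prev_e = 0.0

    def step(self, e):
        self.integral += e * self.dt            # forward-Euler integral of e
        deriv = (e - self.prev_e) / self.dt     # backward difference
        self.prev_e = e
        return self.Kp * (e + self.integral / self.Ti + self.Td * deriv)

pid = DiscretePID(Kp=2.0, Ti=1.0, Td=0.0, dt=0.1)
u = pid.step(1.0)   # 2.0 * (1.0 + 0.1/1.0) = 2.2
```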

Although many engineers know the different structures of the PID controller, tuning the three parameters is difficult, and in practice trial and error is the usual strategy. Fortunately, Morari [18] and Lee [9] give two simple and effective approaches to tune the parameters. In [9], Lee et al., based on the IMC controller and the Maclaurin series, express the controller as

C(s) = f(s)/s = (1/s)[f(0) + f'(0)s + f''(0)s^2/2 + ... + f^(n)(0)s^n/n!]    (3)

To form the PID structure, we neglect the higher-order terms and keep only the first three terms of the Maclaurin series:

C(s) = f(s)/s ≈ (1/s)[f(0) + f'(0)s + f''(0)s^2/2]    (4)

Fig. 1. The unity feedback control system.

Comparing the parameters of equations (2) and (4), we derive the PID parameter tuning rule

Kp = f'(0)
Ti = f'(0) / f(0)
Td = f''(0) / (2 f'(0))    (5)
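The rule (5) maps the first derivatives of f(s) at s = 0 directly to the PID gains. A minimal helper is sketched below; the numeric values in the usage line are hypothetical, chosen only to illustrate the mechanics, not taken from the paper.

```python
def macpid_tune(f0, df0, ddf0):
    """MacPID rule of eq. (5): f0 = f(0), df0 = f'(0), ddf0 = f''(0)."""
    Kp = df0
    Ti = df0 / f0
    Td = ddf0 / (2.0 * df0)
    return Kp, Ti, Td

# illustrative derivative values (not from the paper):
Kp, Ti, Td = macpid_tune(1.0, 2.0, 3.0)
```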

If readers are interested in the details of this method, they can refer to [9,18].

III. PROPOSED STRATEGY

First, we give a definition of the large time delay system.

Definition 1. In the FOPTD family, a large time delay system is one in which the ratio of the dead time in the transfer function to the time constant is larger than 10.

Based on Definition 1, our research finds that MacPID can satisfy most situations in industrial systems except those with large time delay. Building on the work of our predecessors, we propose an effective and efficient method to solve this problem; the rules are easy to understand, and a notable feature of the method is that it suits not only the large dead-time system but also ordinary systems.

Second, we factor the controlled plant as G(s) = G−(s)G+(s), where G−(s) is the minimum-phase part, whose poles and zeros lie in the left half-plane, and G+(s) is the non-minimum-phase part, containing the time delay and any right half-plane poles or zeros. Using the IMC controller design method, the controller of the system is

C(s) = 1 / {G−(s)[(λs + 1)^n − G+(s)]}    (6)

where ( s  1) represents the filter and  is called the closed loop time constant which can be adjust to meet the desired performance of the whole system, and the number n is depend on the degree of the controller to guarantee the degree of the denominator is not less than numerator. To deal with the large time delay part, we introduced the lag function and rewrite the controller as, n


C(s) = F(s) / [s(αs^2 + βs + 1)]    (7)

where f(s) = C(s)·s, with C(s) the IMC-based controller of (6), so that F(s) = f(s)(αs^2 + βs + 1). Applying the Maclaurin series to expand the function F(s) gives

F(s) = f(0) + [f'(0) + βf(0)]s + [f''(0) + 2βf'(0) + 2αf(0)]s^2/2
     + [f^(3)(0) + 3βf''(0) + 6αf'(0)]s^3/3!
     + [f^(4)(0) + 4βf^(3)(0) + 12αf''(0)]s^4/4!
     + [f^(5)(0) + 5βf^(4)(0) + 20αf^(3)(0)]s^5/5! + ...    (8)

From equation (8), to guarantee the integrity of the controller and to determine the parameters of the lag transfer function, we impose

f^(4)(0) + 4βf^(3)(0) + 12αf''(0) = 0
f^(5)(0) + 5βf^(4)(0) + 20αf^(3)(0) = 0    (9)

We then obtain a new form of the controller,

C(s) = {f(0) + [f'(0) + βf(0)]s + [f''(0) + 2βf'(0) + 2αf(0)]s^2/2
     + [f^(3)(0) + 3βf''(0) + 6αf'(0)]s^3/3!} / [s(αs^2 + βs + 1)]    (10)

To imitate the classical PID controller, and to make the method convenient for engineers and other researchers to apply, we define

Kp = f'(0) + βf(0)
Ti = [f'(0) + βf(0)] / f(0)
Td = [f''(0) + 2βf'(0) + 2αf(0)] / {2[f'(0) + βf(0)]}
Ts = [f^(3)(0) + 3βf''(0) + 6αf'(0)] / {6[f'(0) + βf(0)]}    (11)

Thus, the controller has the structure

C(s) = Kp(1 + 1/(Ti·s) + Td·s + Ts·s^2) / (αs^2 + βs + 1)    (12)

IV. SIMULATION AND DISCUSSION

To examine and verify the proposed approach, we consider the first-order process with time delay, which is widely adopted in the literature to imitate real industrial plants:

G(s) = K e^(−θs) / (Ts + 1)    (13)

where K, T and θ are the gain, the time constant and the delay time of the system, respectively.

A. Detail of the parameter tuning rule

Based on the FOPTD model, the proposed controller parameters can be calculated following the procedure of Section III. First, applying (6) to the model (13), we obtain the IMC-based controller

C(s) = (Ts + 1) / {K[(λs + 1)^n − e^(−θs)]}    (14)

and then define

f(s) = C(s)·s = (Ts + 1)s / {K[(λs + 1)^n − e^(−θs)]}    (15)

Second, we expand equation (15) in a Maclaurin series and keep only the first six terms. Third, from equations (9)-(11) we calculate the parameters α and β and obtain the four-parameter tuning rule:

  2 (60 2T  6 2  2  180 2T 2  240T 2  12 3  60T 2 2   4 )    60(24 2T  60 2T 2  18T 2  4 3   120T 2  6 3T  60T 2 2  3 2 2   4 )   (24 2T 2  18T 2  6T 2 2    9 2T  6T 2   2  2 )   24 2T  60 2T 2  18T 2  4 3   4    120T 2  60T 2 2  3 2  2  (16) And

  T 2 Kp    K (    ) 2 K (   ) 2   2  T   T   i 2(   )    2 (3   )   Td  [   T  6(   )   (2T  2 T   2 ) 2 1   ]  2    4( ) Ti  2 2   (12  4   )  Ts  [ T  24(   )    2 (3   )(2T  2 T   2 )   12(   ) 2   2 2   (2  6T  6 T   ) ]1  24(   )3 Ti B.

(17)
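Given numerical derivatives of f at s = 0, the conditions (9) are two linear equations in α and β, after which (11) yields the four controller parameters. The sketch below is not from the paper: the derivative values are made up purely to illustrate the mechanics, and the divisors 2 and 6 in Td and Ts follow from matching the numerator of (10) to the structure (12).

```python
def lag_params(f2, f3, f4, f5):
    """Solve eq. (9) for (alpha, beta) by Cramer's rule:
         f4 + 4*beta*f3 + 12*alpha*f2 = 0
         f5 + 5*beta*f4 + 20*alpha*f3 = 0
    (f2..f5 are f''(0)..f^(5)(0))."""
    det = 12 * f2 * 5 * f4 - 4 * f3 * 20 * f3
    alpha = (-f4 * 5 * f4 - 4 * f3 * (-f5)) / det
    beta = (12 * f2 * (-f5) - (-f4) * 20 * f3) / det
    return alpha, beta

def proposed_tune(f0, f1, f2, f3, alpha, beta):
    """Eq. (11): the four-parameter rule of the proposed controller."""
    Kp = f1 + beta * f0
    Ti = Kp / f0
    Td = (f2 + 2 * beta * f1 + 2 * alpha * f0) / (2 * Kp)
    Ts = (f3 + 3 * beta * f2 + 6 * alpha * f1) / (6 * Kp)
    return Kp, Ti, Td, Ts

# illustrative derivative values (not taken from the paper):
alpha, beta = lag_params(f2=1.0, f3=1.0, f4=2.0, f5=3.0)
Kp, Ti, Td, Ts = proposed_tune(1.0, 1.0, 1.0, 1.0, alpha, beta)
```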

B. Simulation and discussion

To demonstrate our idea, we choose a first-order process with a large time delay,

G(s) = 5 e^(−20s) / (0.1s + 1)    (18)
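Plant (18) is easy to check by direct simulation. The sketch below is not from the paper; the step size and horizon are illustrative choices. It integrates the FOPTD dynamics with forward Euler and models the dead time by delaying the input samples.

```python
def foptd_step_response(K, T, theta, t_end, dt=1e-3):
    """Open-loop unit-step response of G(s) = K e^{-theta s}/(T s + 1),
    integrated with forward Euler; returns the final output value."""
    steps = int(t_end / dt)
    delay_steps = int(theta / dt)
    y = 0.0
    for k in range(steps):
        u_delayed = 1.0 if k >= delay_steps else 0.0  # step enters after the dead time
        y += dt * (K * u_delayed - y) / T
    return y

y_final = foptd_step_response(K=5.0, T=0.1, theta=20.0, t_end=60.0)
# settles to the static gain K = 5
```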



Fig. 2. The step response of the large time delay system (Z-N, C-C, MacPID and the proposed method).


Fig. 3. The step response of the ordinary time delay system (Z-N, C-C, MacPID and the proposed method).



In this paper we mainly consider the Ziegler-Nichols (Z-N) method [19], the Cohen-Coon (C-C) rule [20] and MacPID, and compare their performance with the proposed strategy. The parameters of the PID under the different methods are shown in Table 1. In the MacPID and proposed methods, we choose the closed-loop time constant λ = 0.4 (for details on how to choose this value, see [9] and [18]), and in the proposed rule the parameters of the lag part of the controller are α = 14.6440 and β = 1.0787. As Fig. 2 shows, the performance of these four methods on the large time delay system differs greatly. First, in large dead-time systems the Z-N method almost fails entirely. Second, the C-C approach works in the large delay system but has a long settling time. Third, the classical MacPID may drive the system in an unpredictable direction. The proposed strategy, however, works well in the large time delay system.

TABLE 1. Parameters of the controllers.

Method     Kp      Ti       Td      Ts
Z-N        0.0012  44       10      -
C-C        0.0553  16.9421  0.1805  -
MacPID     0.0517  7.2429   0.5682  -
Proposed   0.0594  8.3662   3.1953  3.9798

Now that our idea matches the large time delay system well, let us test whether it can also be applied to ordinary time delay systems. We choose the controlled plant as

G(s) = 5 e^(−10s) / (3s + 1)    (19)

In the same way, we calculate the parameters of the controllers; the values are shown in Table 2. In the MacPID method we choose λ = 0.45, and in our method we choose λ = 0.45 and calculate α = 2.9420 and β = 0.8618. The simulation results for this case are shown in Fig. 3. From Fig. 3 we find that in the ordinary situation the proposed strategy still works well and is no worse than the classical MacPID approach, which means our idea can be applied widely under different conditions with the desired performance.

TABLE 2. Parameters of the controllers for the ordinary time delay system.

Method     Kp      Ti       Td      Ts
Z-N        0.0720  22       5       -
C-C        0.1350  13.8889  2.2200  -
MacPID     0.0889  6.4483   1.6657  -
Proposed   0.1008  7.3043   2.6329  2.7212
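The two plants above sit on opposite sides of Definition 1, which reduces to a one-line predicate. The trivial sketch below, not from the paper, uses the paper's own examples:

```python
def is_large_time_delay(theta, T):
    """Definition 1: the dead time exceeds ten times the time constant."""
    return theta / T > 10

assert is_large_time_delay(20, 0.1)      # plant (18): 20/0.1 = 200, large delay
assert not is_large_time_delay(10, 3)    # plant (19): 10/3 ~ 3.3, ordinary
```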

V. CONCLUSIONS

An effective method based on the classical PID structure is proposed to deal with the large dead-time system; it works well not only in large time delay systems but also in ordinary time delay systems. To verify our idea, we compared it with the traditional methods Z-N, C-C and MacPID, and obtained good results. The goal of every new technique is to be used in real plants, and ours is no exception. Our next step is therefore to test and certify this strategy in industrial systems, and in the future we will extend this idea to other systems and continue our work in control theory.

REFERENCES

[1] M. Bozorg and E. J. Davison, "Control of time delay processes with uncertain delays: Time delay stability margins", Journal of Process Control, vol. 16, no. 4, pp. 403-408, April 2006.
[2] J. Zhou, "Decentralized adaptive control for large-scale time-delay systems with dead-zone input", Automatica, vol. 44, pp. 1790-1799, 2008.
[3] Y. Wang, X. Xu and W. Cai, "Robust adaptive PI controllers for large dead-time processes with sensitivity specification", Proceedings of the 5th IEEE Conference on Industrial Electronics and Applications (ICIEA), Taichung, pp. 724-729, June 2010.
[4] S. Xu, J. Lam and T. Chen, "Robust H∞ control for uncertain discrete stochastic time-delay systems", Systems & Control Letters, vol. 51, pp. 203-215, March 2004.
[5] S. Xu, J. Lam and Y. Zou, "New results on delay-dependent robust H∞ control for systems with time-varying delays", Automatica, vol. 42, pp. 343-348, February 2006.
[6] W. Tan, H. J. Marquez and T. Chen, "IMC design for unstable processes with time delays", Journal of Process Control, vol. 13, pp. 203-213, 2003.
[7] Y. Lee and S. Park, "PID controller tuning to obtain desired closed loop responses for cascade control systems", Ind. Eng. Chem. Res., vol. 37, pp. 1859-1865, 1998.
[8] Y. Lee, M. Skliar and M. Lee, "Analytical method of PID controller design for parallel cascade control", Journal of Process Control, vol. 16, pp. 809-818, 2006.
[9] Y. Lee, S. Park, M. Lee and C. Brosilow, "PID controller tuning for desired closed-loop responses for SI/SO systems", AIChE Journal, vol. 44, pp. 106-115, January 1998.
[10] Y. Lee, J. Lee and S. Park, "PID controller tuning for integrating and unstable processes with time delay", Chemical Engineering Science, vol. 55, pp. 3481-3493, September 2000.
[11] J. Chen, Z. He and X. Qi, "A new control method for MIMO first order time delay non-square systems", Journal of Process Control, vol. 21, pp. 538-546, April 2011.
[12] M. Ramasamy and S. Sundaramoorthy, "PID controller tuning for desired closed-loop responses for SISO systems using impulse response", Computers and Chemical Engineering, vol. 32, pp. 1773-1788, 2008.
[13] W. L. Lo, A. B. Rad and C. K. Li, "Self-tuning control of systems with unknown time delay via extended polynomial identification", ISA Transactions, vol. 42, pp. 259-272, 2003.
[14] P. Chen, W. Zhang and L. Zhu, "Design and tuning method of PID controller for a class of inverse response processes", Proceedings of the 2006 American Control Conference, Minneapolis, Minnesota, USA, June 2006.


[15] S. Alcantara, C. Pedret, R. Vilanova and W. D. Zhang, "Setpoint-oriented robust PID tuning from a simple min-max model matching specification", Proceedings of ETFA 2009, IEEE Conference on Emerging Technologies & Factory Automation, Mallorca, pp. 1-8, September 2009.
[16] K. K. Tan, T. H. Lee and X. Jiang, "On-line relay identification, assessment and tuning of PID controller", Journal of Process Control, vol. 11, pp. 483-496, 2001.
[17] J. Richard, "Time-delay systems: an overview of some recent advances and open problems", Automatica, vol. 39, pp. 1667-1694, 2003.
[18] D. E. Rivera, M. Morari and S. Skogestad, "Internal Model Control. 4. PID controller design", Ind. Eng. Chem. Process Des. Dev., vol. 25, pp. 252-265, 1986.
[19] J. G. Ziegler and N. B. Nichols, "Optimum settings for automatic controllers", Transactions of the American Society of Mechanical Engineers, vol. 64, pp. 759-768, November 1942.
[20] G. H. Cohen and G. A. Coon, "Theoretical consideration of retarded control", Trans. ASME, vol. 75, pp. 827-834, 1953.


Computer Analysis of Doppler Signals

DRĂGHICIU Nicolae, POPA Sorin
University of Oradea, Romania, Department of Electronics, Faculty of Electrical Engineering and Information Technology, Universitatii Street 1, 410087 Oradea, Romania, E-Mail: [email protected]

Abstract: The paper presents solutions used to process a sonogram, in which the start of each new heart cycle is determined with accuracy. The application's performance is based on a simple non-directional CW Doppler unit and its ability to interconnect with a computer over an interface.

Keywords: heart rate, threshold voltage, Doppler shift frequency, CW systems.

I. INTRODUCTION

Doppler devices function by transmitting a beam of ultrasound into the body and collecting and analyzing the returning echoes. The ultrasound used in Doppler applications consists of longitudinal waves. The source is normally a transducer in which the vibrating element is a piece of piezoelectric ceramic or plastic driven by an appropriate voltage signal. In most Doppler techniques [1] the frequencies employed are in the range 1-10 MHz, corresponding to wavelengths of 0.7-0.07 mm in soft tissue. From the point of view of Doppler techniques, the parameters that describe a wave (amplitude, frequency and phase) are as important as in pulse-echo imaging. Indeed, frequency and phase are even more important for Doppler methods, since for the beam scattered from a moving target:

fv f  f1  f r  2 1 * cos  were c

f - Doppler

shift frequency, f i -transmitted frequency,

f r -received frequency,

 -velocity of a source or target, c-velocity of sound in tissue,  -angle between a velocity and direction of wave propagation. II.SYSTEM DESCRIPTION AND COMPUTER ANALYSIS Doppler instrument generate either continuous wave (CW) or pulse wave (PW) ultrasound. In later case

the pulses vary in length from 2 to 10 cycles, depending on the design of the instrument. Typically, the intensities generated by CW and PW devices are 100 and 400 mW/cm², respectively. CW systems are the simpler of the two, but have some advantages over PW systems and are therefore still widely used even in sophisticated instruments. CW Doppler units transmit and receive ultrasound continuously, and because of this it is necessary to use separate transmitting and receiving crystals, although these are usually housed in the same probe. A block diagram of a simple non-directional CW Doppler unit, which produces an audio signal at its output and offers the possibility of further processing, is shown in Fig. 1 below.

Fig. 1. Block diagram of a simple non-directional CW Doppler unit.

The master oscillator usually produces a frequency of between 2 and 10 MHz. The frequency chosen depends on the depth of interest: for superficial vessels frequencies of around 8 MHz are used, and for deep vessels frequencies as low as 2 MHz. The transmitting amplifier amplifies the oscillations, and its output drives the transmitting crystal (AT). The crystal converts the electrical energy into acoustic energy, which is propagated as a longitudinal wave into the body, where it is reflected and scattered by both moving and stationary structures. The transducer looks like a microphone used for recording audio, as represented in Fig. 2. It consists of a cylindrical textolite handle containing, at the front, an aluminum capsule. As described in the following paragraphs, inside the capsule are mounted two piezoelectric ceramics in the form of semicircles, one millimeter thick.
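The Doppler shift formula from the introduction is straightforward to evaluate. The sketch below is not from the paper: it assumes c = 1540 m/s for soft tissue (a common textbook value) and illustrative numbers for the other inputs.

```python
import math

def doppler_shift(f_i, v, theta_rad, c=1540.0):
    """Doppler shift frequency: delta_f = 2 * f_i * v * cos(theta) / c."""
    return 2.0 * f_i * v * math.cos(theta_rad) / c

# 2 MHz carrier, target moving at 0.5 m/s straight along the beam:
df = doppler_shift(2e6, 0.5, 0.0)   # about 1.3 kHz, i.e. in the audio range
```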

20 Volume 4, Number 2, October 2011 __________________________________________________________________________________________________________

Fig. 2. Transducer structure

Contacts to the piezoelectric ceramic electrodes are made with thin wires, which are then fixed on a textolite plate with four contacts [2]. To these contacts a two-conductor cord is attached, each wire shielded separately. The shielded cord leaves the capsule through the plastic handle and is about 1.5 m long; at its end a connector is fitted, so that the contacts on the ceramics reach, through the cord, the BCF receiver in the device. The transducer consists of two piezoelectric ceramics in the form of semicircles, 1 mm thick and 20 mm in diameter. Mounted together, these piezoceramics basically form two transducers, one transmitter and one receiver, united in a single mechanical mount inside the capsule. On the surface of each piezoelectric ceramic a layer of silver is deposited, which in practice forms an electrode on each face of the ceramic; the two ceramics therefore provide four electrodes, i.e. a four-pole supply. It is important that all four electrical contacts be made on the rear face of the piezoceramics. Since the outer surface of the ceramics is silvered, the electrical contact is brought with silver paste over the side to an isolated island, thus connecting the front face to the back of the ceramic. In this way the two activation contacts of each ceramic are obtained from the two electrodes on a single surface.

Between the two piezoelectric ceramics there is a spacer material that absorbs ultrasound. The ceramics are then glued on a textolite cover that comes into intimate contact with the human body; the textolite cover is glued on a plastic ring mounted in an aluminium capsule. Inside the aluminium capsule run the four thin wires contacting the silver electrodes of the ceramics and the shielded cord that goes to the device. Electrically, the aluminium capsule is earthed to the device through the shield of the cord. The shielded cord combines thin electrical conductors, and the insulating plate holds its four independent contacts. The aluminium housing shields the electrical wires inside it, and the air trapped between the back of the ceramics and the bottom of the capsule is a good absorber of the ultrasound generated by the rear face of the piezoelectric ceramics. The electrical and physical characteristics of the two piezoelectric ceramics in the transducer are the same. Piezoelectric ceramics have the property that when pressure is applied to their surface and the material is deformed, a voltage appears; conversely, when a voltage is applied, the material shortens or lengthens depending on the polarity. This phenomenon is known as the piezoelectric effect [3]. The piezoelectric effect occurs in some polycrystalline ceramics such as barium titanate, lead zirconate titanate and lead metaniobate. The transducer ceramics tested by us are made of lanthanum-doped lead zirconate titanate, with the following composition: Pb - 0.94, La - 0.06, Zr - 0.64, Ti - 0.36, O3. These ceramics contain randomly oriented domains in which the charge distribution forms dipoles. Above a certain temperature, the domains can be oriented by applying a bias voltage, and this causes the material to become piezoelectric. The transducer ceramics were 1 mm thick and 20 mm in diameter, which makes them oscillate at a frequency of 2 MHz.
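The 2 MHz figure is consistent with simple half-wavelength thickness-mode resonance, f ≈ v/(2t). A minimal sketch (the longitudinal sound velocity used below is a typical value for PZT-type ceramics and is an assumption, not a measured property of these transducers):

```python
def thickness_resonance_hz(v_ms, t_m):
    """Fundamental thickness-mode resonance of a plate: at
    resonance the plate is half a wavelength thick, so f = v / (2 t)."""
    return v_ms / (2.0 * t_m)

# Assumed longitudinal sound velocity in PZT ceramic ~4000 m/s,
# plate thickness 1 mm (from the paper).
f = thickness_resonance_hz(4000.0, 1e-3)
print(f)  # about 2 MHz
```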
The working (series and parallel) resonant frequencies, measured exactly for the two ceramics, are:

(1) fs = 1827 kHz (0.2 dB), fp = 2152 kHz (10 dB)
(2) fs = 1869.6 kHz (0.2 dB), fp = 2094.9 kHz (13 dB)

In our case the series resonant frequency of the transducer is used. It is worth mentioning that the processing technology [4] seeks to obtain two ceramics with series resonant frequencies as close as possible when the two ceramics are used as a pair: one for transmission and one for reception. Some transducers use a single ceramic in both the transmitter and receiver roles, the switching being done electronically.

Fig. 3. Electrical activation contacts

This arrangement of the contacts is important so that the front surface of the ceramics is flat, without wires, which enables it to be applied to the human body. The electromechanical coupling coefficient of the ceramics is given by:

Kp = √(Wm / We)

where Wm is the mechanical energy developed by the ceramic and We is the electrical energy supplied to (or recovered from) the piezoelectric ceramic. Kp is calculated using the formula:

Journal of Computer Science and Control Systems 21 __________________________________________________________________________________________________________

Kp² = (π²/4)·(Δf / fs)·(1 + Δf/fs + …)

where fp is the parallel resonant frequency, fs the series resonant frequency and Δf = fp - fs. In our case Kp1 = 0.57 and Kp2 = 0.50. Depending on the amplitude of the signal, these frequencies are represented in Fig. 4.

Fig. 4. Frequency variation

Transducer operation is based on the Doppler effect [4,5]. According to this principle, the incident and reflected frequencies are equal if the reflecting surface has no moving component in the direction of propagation of the ultrasound. Movement of the reflecting surface towards the source shortens the reflected wavelength, and vice versa. Since the propagation speed is constant, these changes in wavelength produce corresponding changes in frequency. A small part of the beam finds its way back to the receiving crystal, which re-converts the acoustic energy into electrical energy. This small signal is then amplified by a radio-frequency (RF) amplifier and mixed with a reference signal from the master oscillator [7]. The low-pass filter (LPF) removes all signals outside the audio range, and this leaves only the Doppler difference frequency, which is amplified and used to drive a loudspeaker or headphones, or sent for further processing.

There is now a range of methods available for obtaining a pictorial record of the Doppler shift signal, of which the best and most commonly used is real-time spectral analysis. The output of spectral analyzers is usually represented as a sonogram (Fig. 5). The horizontal axis represents time (t [s]), the vertical axis the Doppler shift frequency (f [Hz]), and the intensity of each pixel the power of the signal at the corresponding frequency and time. To perform this computer analysis it is necessary to transfer the image of the sonogram into the computer [6]. If the Doppler unit on which the artery is investigated has a computer interface or a magnetic disk to record the sonogram, the transfer into the computer is much easier. If not, the Doppler unit prints the image on paper; this image is then scanned with a resolution higher than 600 dpi and saved on the computer as a "bmp" or "tif" file. The analysis of this image was done with a MATLAB program and has four steps.

First step: The operator defines a background rectangle in terms of its top-left and bottom-right corners, so that it lies entirely outside those areas of the sonogram corresponding to flow and covers approximately one heart cycle; its width gives a first estimate of the heart cycle's duration d_HC. This is the only instance where human intervention is required by the analysis software. The pixel values inside this rectangle give the mean noise level of the sonogram.

Second step: The maximum frequency envelope of the signal is computed. This is obtained by starting at the top of each spectral line and sliding down; when five pixels are found whose values exceed the mean noise level, the position is registered and a new spectral line is started.
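The envelope extraction and its median smoothing can be sketched in code (a Python stand-in for the MATLAB routines described in the text; the array layout, the toy sonogram and the threshold handling are assumptions of the sketch):

```python
import numpy as np

def max_freq_envelope(sono, noise_mean, run=5):
    """Maximum-frequency envelope of a sonogram.

    sono: 2-D array, rows = frequency bins (row 0 = highest
    frequency), columns = spectral lines (time). For each column,
    slide down from the top and register the first position where
    `run` consecutive pixels exceed the mean noise level.
    """
    n_freq, n_time = sono.shape
    env = np.zeros(n_time, dtype=int)
    for t in range(n_time):
        col = sono[:, t] > noise_mean
        for i in range(n_freq - run + 1):
            if col[i:i + run].all():      # `run` supra-noise pixels found
                env[t] = n_freq - i       # frequency index of the envelope
                break
    return env

def smooth(env, width=5):
    """Median filter: effective at removing isolated spikes."""
    pad = width // 2
    padded = np.pad(env, pad, mode="edge")
    return np.array([np.median(padded[i:i + width])
                     for i in range(len(env))])

# Toy sonogram: background noise plus a 'flow' band in each column.
rng = np.random.default_rng(0)
sono = rng.uniform(0.0, 1.0, (64, 40))
sono[40:, :] += 5.0                      # signal occupies the low rows
noise_mean = 1.0                         # from the background rectangle
env = smooth(max_freq_envelope(sono, noise_mean))
```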
This envelope is shown in Fig. 6.a). The maximum frequency envelope is then smoothed with a median filter, which is effective in eliminating spikes, as in Fig. 6.b).

Third step: The start of the systolic upstroke of each heart cycle is identified by means of the local curvature of the maximum frequency envelope. Local curvature has been used extensively in computer vision to locate dominant points of an object's shape. In our implementation the local curvature is a convolution product: the product of a window of (2n+1) points with the smoothed envelope. The function that describes this window is specified below:

h(j) = -1 for -n ≤ j < 0
h(j) = 0 for j = 0
h(j) = +1 for 0 < j ≤ n

Fig. 5. Sonogram of the Doppler signal.

The formula that describes this convolution product is:


y(k) = Σj=-n..n h(j)·x(k-j)

where y(k) is the resulting sequence (Fig. 6.c), h(j) is the filter (window) function, x(k) is the signal being processed, and n gives the half-length of the window, n = 0.008·d_HC.

Fourth step: The start of the systolic upstroke of the first cycle is located by searching for the maximum among the first c0·d_HC pixels of the difference-of-slopes sequence. Subsequent systolic upstrokes are identified by searching for the maximum value in the range [c1·d_HC, c2·d_HC] pixels from the last found systolic upstroke. The results of this search are plotted as vertical markers on the maximum frequency envelope (Fig. 6.d). Typical parameter values for this part of the processing are: c0 = 1.5, c1 = 0.5, c2 = 1.5.
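Steps three and four can likewise be sketched (again a Python stand-in for the MATLAB code; the kernel sign convention and the synthetic envelope are assumptions of the sketch):

```python
import numpy as np

def diff_of_slopes(env, n):
    """Convolve the smoothed envelope with the (2n+1)-point window
    h(j) = -1 for j < 0, 0 for j = 0, +1 for j > 0, giving a
    difference-of-slopes sequence that peaks at abrupt upstrokes."""
    h = np.concatenate([-np.ones(n), [0.0], np.ones(n)])
    # reversing the kernel makes np.convolve compute a correlation
    return np.convolve(env, h[::-1], mode="same")

def find_upstrokes(y, d_hc, c0=1.5, c1=0.5, c2=1.5):
    """First upstroke = maximum of the first c0*d_hc samples; each
    next one = maximum in [c1*d_hc, c2*d_hc] after the last mark."""
    marks = [int(np.argmax(y[: int(c0 * d_hc)]))]
    while marks[-1] + int(c2 * d_hc) <= len(y):   # only full windows
        lo = marks[-1] + int(c1 * d_hc)
        hi = min(marks[-1] + int(c2 * d_hc), len(y))
        marks.append(lo + int(np.argmax(y[lo:hi])))
    return marks

# Synthetic envelope: two heart cycles of 100 samples, each with a
# sharp systolic upstroke at its start followed by a slow decay.
cycle = np.concatenate([np.linspace(0, 50, 5), np.linspace(50, 0, 95)])
env = np.concatenate([cycle, cycle])
y = diff_of_slopes(env, n=4)
print(find_upstrokes(y, d_hc=100))  # markers near the two upstrokes
```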

III. CONCLUSIONS

In earlier applications an ECG recorder was used for synchronization, in order to know the length of the heart cycles. In our implementation the length of the heart cycles is approximated by the rectangle defined at the start of the software, so no external ECG synchronization is needed. The noise robustness of this software is very good, because the mean noise level is estimated, and this value is a very important parameter in this implementation. The software is indicated for sonogram segments in which the heart cycles remain relatively stable in both duration and content; otherwise, cycles that have been aligned correctly on the basis of their systolic upstrokes may represent unrelated haemodynamic conditions at later stages of the heart cycle. The automatic search for the start of the systolic upstroke of each heart cycle reduces the examination time of the patient, reduces human error and, above all, increases the accuracy of the measurements and results.

Fig. 6. The results obtained during this computer analysis

REFERENCES

[1] Y. Haga, M. Esashi, "Biomedical Microsystems for Minimally Invasive Diagnosis and Treatment", Proceedings of the IEEE, vol. 92, no. 1, January 2004, pp. 98-114.
[2] D. Andrew, "3D Modelling and Visualization in Computed Tomography: AVTK's Approach with TCTCL Scripts", Ed. Matrix, Bucharest, 2004.
[3] D.H. Evans, W.N. McDicken, R. Skidmore, P. Woodcock, "Doppler Ultrasound: Physics, Instrumentation, and Clinical Applications", Wiley, Chichester, 2001.
[4] S. Herman, "Physical Principles of Modern Medical Equipment", Ed. Teora, Bucharest, 2003.
[5] P. Ramon, "Medical Electronics", Ed. Matrix, Bucharest, 2006.
[6] T. Loupas, "Analysis of the early diastolic notch", Ultrasound in Medicine & Biology, vol. 21, no. 8, 2006.
[7] K. Boahen, "Neuromorphic Microchips", Scientific American, vol. 292, no. 5, May 2005, pp. 56-63.


Design of an Interior Permanent-Magnet Synchronous Machine for an Integrated Starter-Alternator System Used on a Hybrid Electric Vehicle

FILIP Andrei-Toader, HANGIU Radu-Petru, MARŢIŞ Claudia, BIRO Karoly-Agoston
Technical University of Cluj-Napoca, Romania, Department of Electrical Machines and Drives, Faculty of Electrical Engineering, Memorandumului 28, 400114, Cluj-Napoca, Romania, E-mail: [email protected]

Abstract  Nowadays, to reduce fuel consumption, we use more often vehicles with hybrid propulsion using for traction an electric motor and the regular combustion engine. There are three types of hybrid vehicles: serial, parallel and mixed propulsion. Hybrid vehicles use Integrated Starter Alternator (ISA) system instead of usual starter and alternator. This article points out the advantages of using an Integrated Starter Alternator System in comparison with the classical starter and alternator. This system saves energy by using the stop/start function and providing assistance during driving. For the study, a permanent magnet synchronous motor was chosen, due to its high efficiency. Keywords: hybrid Alternator, PMSM.

vehicles,

Integrated

Parallel Hybrid Vehicles are most commonly used today, having connected to the transmission system both an internal combustion engine and an electric one. In most cases a system consisting of an electric generator and an electric motor located between the internal combustion engine and transmission replacing the conventional system made of starter and alternator is used.

Starter-

I. INTRODUCTION

Fig. 1 : Parallel Hybrid Vehicle

Nowadays the use of a vehicle has become a necessity, but given the limited nature of oil resources automotive engineers looked for alternative solutions to this type of fuel. The best solution was the electric motor, to the attention of researchers for some time. Electric vehicles however had some problems which prevented their on a large scale use: autonomy was less than 100 km, the rapid change of speed regimes excluded, and charging a long process implied. However the saving idea was the use hybrid vehicles with mixed propulsion. In this case the classical internal combustion engine has a much lower power and electric motor is supplied from a battery and the transmission is common.

To store power, the vehicle uses a battery with a higher voltage than the 12 V used in traditional cars. Accessories such as automatic gearbox and air conditioning are connected to electric motors, instead of being connected to the combustion engine. This can provide increased efficiency such accessories running at constant speed which does not depend on the speed of rotation of the internal combustion engine.[17],[1]. Serial Hybrid Vehicles are driven only by electricity. Unlike the internal combustion engine, electric one has a big power / heavy ratio, producing a torque appropriate for a wide range of speeds. At this type of hybrid vehicles, trains an electric generator the combustion engine so an electric generator is not directly connected to vehicle wheels. This generator supplies energy to the electric motor that provides traction. Briefly, the serial hybrid vehicles have a simple structure, the drive being made with an electric engine and a generator supplies the necessary power. This type of arrangement is common in diesel locomotives and ships. Ferdinand Porsche used this gear in the early twentieth century to race cars, virtually inventing the serial hybrid vehicles. Using a blockmotor gear type using one engine on each of the front wheels got record speed. At that time it was used the

II. TYPES OF HYBRID VEHICLES Hybrid vehicles are classified according to the division of power between the two sources. Both can operate in parallel, propulsion being made by operating both engines, in serial when only one engine is used to drive and the other to get an additional power, or mixed when the vehicle is powered by one of the engines, the second can also participate directly in the drive, if necessary.[17],[1].


term "electric drive" because the electric generator and traction motor replaced the mechanical traction. However, the vehicle was moving only with the combustion engine running.

Fig. 2 : Serial Hybrid Vehicle

Currently, this kind of electric traction is used to replace mechanical traction. Nowadays, a serial hybrid vehicle is characterized by:
- electric traction only, using one or more electric motors for propulsion;
- the combustion engine drives only the electric generator;
- the generator acts as the starter;
- regenerative braking recovers part of the vehicle's kinetic energy and stores it in various forms (charging a battery, or a reservoir of gas or liquid under pressure) so that it can be used later;
- the vehicle can be connected to the mains supply for charging. [17]

The series-parallel hybrid system uses the electric motor to drive the vehicle at low loads and low speeds, and the combustion engine when loads and speeds increase. A control unit determines the best balance of power to achieve the most efficient vehicle operation. The electric motor and the gasoline engine can work individually or together, depending on the power required to drive the vehicle. In addition, as the system drives the wheels, the combustion engine drives a generator to simultaneously generate electricity to recharge the battery when necessary. [17]

Fig. 3 : Serial-Parallel Hybrid Vehicle

III. STARTER-ALTERNATOR SYSTEM

The trend of recent years is to improve comfort and decrease fuel consumption. The number of auxiliary electrical devices that replace mechanical or hydraulic elements has therefore increased significantly. Thus the power consumption per vehicle has increased, which means that the vehicle's generator must produce more power. The new ISA system on a 42-V power net is a good alternative: researchers have claimed that ISA can provide 20-25% fuel savings. Besides meeting the vehicle's electrical load demand, ISA helps the vehicle achieve significant fuel economy, good efficiency and low emissions. [14] The ISA drive system has two or more power sources for propulsion: an internal combustion engine and a secondary power source (battery, electrical generator, fuel cell). The two power sources combined provide a higher motor efficiency and allow the vehicle to accelerate quickly when running at low speed. The internal combustion engine is kept in its efficient operating range by running the electric motor alone at low speed and the combustion engine alone at high speed. When the ICE alone is driving the vehicle, especially at high speed, the traction motor takes the role of an alternator, charging the battery and supplying electrical energy to all the electrical equipment on the vehicle. [1],[11],[9] In classic cars, the electric power system consists of two separate devices: the starter (an electric motor) and the alternator. The conventional starter is a permanent-magnet DC commutator machine that cranks the internal combustion engine through a mechanical gear. The conventional alternator (limited to about 2 kW) is driven by a belt and provides energy to the whole system. The latest innovation in the field replaces these two essential devices of a classic car with a single one, the ISA. This system automatically turns the engine off when the vehicle comes to a stop and restarts it instantaneously when the accelerator is pressed, so that fuel is not wasted in idling. In addition, regenerative braking is often used to convert mechanical energy otherwise lost in braking into electricity, which is stored in a battery and used to power the automatic starter. ISA was first used around 1930.
Based on the reciprocity principle of electromechanical conversion, researchers first used a DC electric machine to perform two functions: to crank the combustion engine at starting and to generate electric power. If the DC generator is replaced with an alternator with an electronic rectifier, we obtain the ISA system often used in modern vehicles. [7] The Integrated Starter-Alternator system is well adapted to urban traffic, stopping and restarting the engine when we stop at traffic lights, providing extra power when accelerating and also providing regenerative braking during deceleration. [12] Fig. 4 shows a classic arrangement with two separate electric machines, starter and alternator: the starter acts on the internal combustion engine through a mechanical gear, and the belt-driven alternator distributes energy to the vehicle. Fig. 5 presents a direct-drive solution in which starter and alternator are integrated into a single electrical machine, mounted coaxially on the combustion-engine shaft, i.e. between it and the transmission system (clutch, gearbox). An ISA system functions first as a starter (at startup); then the electrical


machine works in generator mode, providing electricity. Thus the two functions, power generation and starting of the vehicle, are performed by the same electrical machine, replacing the two electrical machines existing on a classic car. [18]

Fig. 4 : Conventional Starter and Alternator (steering wheel, alternator, combustion engine, starter, clutch, gearbox)

Fig. 5 : Starter Alternator Integrated System
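The stop/start behaviour described above can be illustrated with a toy state machine (purely illustrative Python; the states and events are invented for the sketch and do not come from any real ISA controller):

```python
# Toy model of ISA operating modes: crank the engine on demand,
# then act as a generator; stop the engine at standstill and
# restart it when the accelerator is pressed.
class ToyISA:
    def __init__(self):
        self.engine_on = False
        self.mode = "idle"           # "starter" / "generator" / "idle"

    def accelerator_pressed(self):
        if not self.engine_on:       # restart instantaneously
            self.mode = "starter"
            self.engine_on = True
        self.mode = "generator"      # engine running -> charge battery

    def vehicle_stopped(self):
        self.engine_on = False       # stop/start: no fuel wasted idling
        self.mode = "idle"

isa = ToyISA()
isa.accelerator_pressed()
print(isa.mode, isa.engine_on)   # generator True
isa.vehicle_stopped()
print(isa.mode, isa.engine_on)   # idle False
```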

This new device brings several improvements: it provides starter mode for the combustion engine and generator mode for battery charging (replacing the regular alternator). It can also provide extra power to the combustion engine to obtain better acceleration and can operate as an electric motor for electric propulsion, reducing the system cost. Using the ISA to assist acceleration allows faster and smoother acceleration and reduced fuel consumption; emissions are lower, thus increasing the total efficiency of the vehicle. The ISA machine is required to provide high cranking and launch-assist torque in motoring mode and to supply a constant voltage for charging the battery over a wide speed range. This configuration, which has a 1:1 gear ratio, eliminates some mechanical elements (belts, transmission gears and other mechanical parts), whose function is taken over by the starter-alternator rotor mounted on the engine shaft. Using this type of configuration can cause problems, due to the different requirements of the two operating modes of the electrical machine. Both regimes are demanding: as a motor, high torque is required at startup; as a generator, the machine runs at high speeds, at constant power and with a weakened excitation flux. [18]

IV. ISA SYSTEM CONFIGURATION

The ISA electric drive subsystem consists of an electric machine and a power electronics box. The power

electronics box is composed of an inverter/rectifier and a bidirectional DC/DC converter. Unlike other components that are connected to the network, the ISA is connected to the battery, which has no stabilized output voltage. The available voltage may be below the motoring requirement, while the generated DC bus voltage must be higher than the battery voltage. The ISA machine has to be able to charge the battery and operate the electric loads with nearly constant voltage over the speed range corresponding to engine speeds from 600 rpm to 6000 rpm. The speed/voltage requirement for this system does not permit the use of a machine with low flux-weakening capability unless an expensive DC/DC converter is added to reach the DC bus voltage charging requirements. [12]

V. BEST CHOICE FOR ISA

Numerous studies have been made to determine the most suitable motor for an ISA application. The system can be mounted directly on the crankshaft or driven by a belt; accordingly, the two variants are named belt-driven alternator-starter (BAS) and normal ISA, respectively. [1] The induction motor (IM) and the PM synchronous motor (PMSM) are the most attractive candidates for automotive application. The PMSM is more expensive than the induction machine but has higher efficiency. Ford and General Motors are more inclined toward the IM, whereas Toyota and Honda have chosen the PMSM. [8] The PMSM is one of the most attractive candidates for ISA since it possesses high efficiency and reliability. The great advantage of this machine is its high efficiency, due to the absence of rotor excitation losses. The interior permanent-magnet synchronous machine (IPMSM) has good efficiency, and high power density and high-speed operation are possible with flux weakening at constant power. Another advantage of PM motors is that the air gap can actually be increased, because the permeability of the magnets is almost equal to the permeability of air. Permanent-magnet motors have the disadvantage that the flux control strategy is very complex. With this type of excitation (permanent magnets in addition to the classic machine coils), the air-gap flux density can be controlled only by changing the value of the current through the winding. The fact that the magnets are buried in the rotor core makes the IPMSM a hybrid machine, the torque being produced both by the permanent magnets and by the variable reluctance of the magnetic circuit. This gives freedom in designing the ISA (in particular it makes it possible to obtain a wide range of speeds at constant power). These benefits are an important factor in choosing the IPMSM as a solution for ISA. PMSMs can be classified according to several criteria, as shown in Fig. 6. [12],[10],[2] From the efficiency point of view, a PMSM with 8-12 poles is preferable. The most common PMSMs are surface-mounted permanent-magnet synchronous machines with distributed or concentrated windings. [7]
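The 600-6000 rpm constant-voltage requirement implies roughly a 10:1 flux-weakening range, since PM back-EMF is proportional to speed and flux. A minimal sketch (the back-EMF constant below is an illustrative assumption, not a machine parameter from the paper):

```python
def required_flux_ratio(n_min_rpm, n_max_rpm):
    """Back-EMF E = k * flux * n. To keep E constant from n_min to
    n_max, the flux must be weakened by the factor n_min / n_max."""
    return n_min_rpm / n_max_rpm

def back_emf(k_e, flux, n_rpm):
    return k_e * flux * n_rpm

ratio = required_flux_ratio(600, 6000)
print(ratio)  # 0.1 -> flux reduced to 10% at top speed

# Check: with an illustrative k_e, full flux at 600 rpm and
# weakened flux at 6000 rpm give (numerically) the same voltage.
k_e, flux = 0.07, 1.0                 # assumed values
e_low = back_emf(k_e, flux, 600)
e_high = back_emf(k_e, flux * ratio, 6000)
print(abs(e_low - e_high) < 1e-9)  # True
```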


Surface PM motor. The surface-magnet motor can have magnets magnetized radially or, sometimes, circumferentially. An external high-conductivity non-ferromagnetic cylinder is sometimes used. It protects the PMs against the demagnetizing action of the armature reaction and against centrifugal forces, provides an asynchronous starting torque, and acts as a damper. If rare-earth PMs are used, the synchronous reactances in the d- and q-axes are practically the same. [7],[3],[2]

Interior-type PM motor. The interior-magnet rotor has radially magnetized and alternately poled magnets. Because the magnet pole area is smaller than the pole area at the rotor surface, the air-gap flux density on open circuit is less than the flux density in the magnet. The synchronous reactance in the d-axis is smaller than that in the q-axis, since the q-axis magnetic flux can pass through the steel pole pieces without crossing the PMs. The magnets are very well protected against centrifugal forces. Such a design is recommended for high-frequency, high-speed motors. [7],[3],[2]

TABLE 1 : IPMSM vs SPMSM

Surface PM | Interior PM
Magnetic flux density in the air gap is smaller than the remanent induction | Magnetic flux density in the air gap can be bigger than the remanent induction
Simple construction | More complicated construction
Low armature reaction | Bigger armature reaction, consequently requiring a more expensive converter
PMs are not protected against the armature fluxes | PMs are protected against the armature fluxes
Eddy current losses in the PMs | No eddy current losses in the PMs

Fig. 6 : PMSM Classification

VI. ANALYTICAL APPROACH OF THE IPMSM

Based on a review of the state of the art related to ISA systems, an IPMSM was chosen. The design starts with the set of initial data that includes the input parameters presented in Table 2.

TABLE 2 : Motor main dimensions

DC voltage [V]: 42
Rated power [W]: 5000
Pole pair number: 12
Rated speed [rpm]: 400
Number of phases: 3
Torque [Nm]: 120

The chosen topology corresponds to the one presented in Fig. 7, with 48 stator slots and 24 rotor poles. The outer diameter of the stator core results from the output power equation [16]:

P = 1/(1+k) · (m/m1) · (π/2) · ke·ki·kp·η·B·A·(f/p)·λ0²·D0²·Le    (1)

where k is the ratio of the electrical loading on rotor and stator, m1 the number of phases of each stator, m the number of phases of the machine, ke the emf factor, ki the current waveform factor, kp the electrical power waveform factor, η the total machine efficiency, B the magnetic flux density in the air gap, A the total electrical loading, f the frequency, p the number of pole pairs, D0 the outer stator diameter, Le the effective stack length, and

λ0 = D / D0    (2)

with D the stator inner (air-gap) diameter.
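As a quick consistency check of the ratings in Table 2 (plain mechanics, P = T·ω, and f = p·n/60; this check is an addition, not part of the paper's derivation):

```python
from math import pi

P = 5000.0    # rated power [W]        (Table 2)
n = 400.0     # rated speed [rpm]      (Table 2)
p = 12        # pole pairs             (Table 2)

omega = 2.0 * pi * n / 60.0       # mechanical angular speed [rad/s]
torque = P / omega                # should be close to the rated 120 Nm
f_el = p * n / 60.0               # electrical frequency [Hz]

print(round(torque, 1), f_el)  # 119.4 80.0
```

So the 5000 W / 400 rpm / 120 Nm ratings are mutually consistent, and the machine operates at 80 Hz electrical at rated speed.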


For the permanent magnet, the volume required for the requested output power is obtained from:

Vm = 2·k·kad·(1+ε)·P / (ξ·f·klm·Br·HC·ηm·cos φ·km)    (3)

Choosing the air-gap width δ = 1 mm, the outer rotor diameter and the interior diameter for placing the magnets result from equations (4) and (5):

Dr = Ds - 2·δ    (4)

Dim = k1·(1 - αi)·Dr    (5)

To complete the sizing of the rotor, the last step is to calculate the main dimensions of the permanent magnet (height, width and length), which are given by equations (6), (7) and (8):

bpm = [1 - k1·(1 - αi)]·Dr / 2    (6)

hm = π·(1 - αi)·Dr / (2p)    (7)

lm = L    (8)

Fig. 7 : IPMSM Topology

In order to reduce the manufacturing cost of the machine, an existing stator lamination geometry was chosen. The topology and main dimensions of the stator lamination are presented in Fig. 8 and given in Table 3. The stack length of the machine results from the volume constraints and corresponds to the value given in Table 3.
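Equations (4) and (8) can be checked directly against the stated dimensions (simple arithmetic; only values printed in the paper's tables are used):

```python
Ds = 140.0    # stator inside diameter [mm]  (Table 3)
delta = 1.0   # air-gap width [mm]
L = 165.0     # stack / motor length [mm]    (Table 3)

Dr = Ds - 2.0 * delta    # eq. (4): outer rotor diameter
lm = L                   # eq. (8): magnet length equals stack length

print(Dr, lm)  # 138.0 165.0 -- matching Table 4
```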

Fig. 8 : Stator lamination: topology and dimensions

TABLE 3 : Stator main dimensions

Outside Diameter [mm]: 200
Inside Diameter [mm]: 140
Number of Slots: 48
Motor length [mm]: 165

Knowing these values, we started sizing the rotor, obtaining the values for the main dimensions given below. For the permanent magnet, equation (3) was used to obtain the volume required for the requested output power. The resulting values are given in Table 4, along with the electrical machine parameters and the materials of which the three parts of the PMSM (rotor, stator and permanent magnets) are made.

TABLE 4 : Rotor and PM main dimensions and materials

Rotor main dimensions
Outside Diameter Dr [mm]: 138
Shaft diameter [mm]: 14
Air gap [mm]: 1
Number of Slots: 24

Magnet main dimensions
Length [mm]: 165
Width [mm]: 4
Height [mm]: 48

Electrical machine parameters
Stator resistance [Ω]: 0.164
Winding losses [W]: 7593
Iron losses [W]: 59.421
Mechanical losses [W]: 41.595
Efficiency [%]: 55.5

Materials
Stator Core: Category NipponSteel, Type 50H600
Rotor Magnet: Category TDK Nd-Fe-B, Type NEOREC41, Magnetization: Parallel
Rotor Core: Category NipponSteel, Type 50H600

VII. CONCLUSIONS

This paper provides an overview of ISA systems, presenting the advantages of the new integrated


starter-alternator over the classical starter and alternator. An IPMSM was chosen following a rigorous selection, and the design process was started; the results have been presented at the end of this paper. In the future, we intend to develop 2D and 3D models using JMAG Studio. A numerical analysis of the electromagnetic field will be made to validate the analytical results. After that, the electrical machine will be optimized to obtain better efficiency and torque.

ACKNOWLEDGMENT

This paper was supported by the project "Improvement of the doctoral studies quality in engineering science for development of the knowledge based society - QDOC", contract no. POSDRU/107/1.5/S/78534, a project co-funded by the European Social Fund through the Sectorial Operational Program Human Resources 2007-2013.

REFERENCES

[1] T. Moldovan, M.M. Radulescu, G. Cimuca, "Simulation of an Integrated Starter-Alternator System for new-generation autovehicles", Proceedings of the 4th Int. Conf. on Electromechanical and Power Systems, SIELMEN'04, Chisinau, 26-27 Sept. 2003.
[2] E.L. Carrillo Arroyo, "Modeling and Simulation of PM Synchronous Motor Drive System", M.Sc. thesis in Electrical Engineering, University of Puerto Rico, Mayaguez Campus, 2006.
[3] J.F. Gieras, M. Wing, "Permanent Magnet Motor Technology", Marcel Dekker, New York, 2002.
[4] D. Iles-Klumpner, "Automotive Permanent Magnet Brushless Actuation Technologies", PhD thesis, Technical University of Timisoara, 2005.
[5] I. Cioc, N. Cristea, N. Bichir, "Masini electrice. Indrumar de proiectare", Ed. Scrisul Romanesc, Craiova, 1985.
[6] N. Vasile, S. Slaiher, "Servomotoare Electrice", Ed. Electra, Bucuresti, 2003.
[7] W. Cai, "Comparison and Review of Electric Machines for Integrated Starter Alternator Applications".

[8] H. Rehman, “An Integrated Starter–Alternator and LowCost High-Performance Drive for Vehicular Applications “IEEE Transactions on Vehicular Technology ,vol. 57, No. 3, May 2008,pp. 1454-1465. [9] C. Liu, K. T. Chau, J.Z. Jiang “A Permanent-Magnet Hybrid Brushless Integrated Starter–Generator for Hybrid Electric Vehicles”, IEEE Transactions on Industrial Electronics, Vol. 57, No. 12, December 2010 ,pp. 40554064 [10] D. Hagstedt, F. Márquez, M. Alaküla “A comparison between PMSM, EMSM and SMSM in a BAS application, “Proceedings of the 2008 International Conference on Electrical Machines Paper ID 1323 [11] I.A. Viorel, L. Szabo, L. Lovenstein, C. Şteţ “Integrated Starter Generators for Automotive Applications” I.A. Viorel, L. Szabo, L. Lovenstein, C. Şteţ, CNAE Volume 45, Number 3, 2004 [12] L. Alberti, M. Barcaro, M. Dai Pré, A. Faggion, L. Sgarbossa, N. Bianchi, S. Bolognani, “IPM Machine Drive Design and Tests for an Integrated Starter– Alternator Application”. IEEE Transactions on Industry Applications, Vol. 46, No. 3, May/June 2010 pp.993-1001. [13] M. Zeraoulia, M. El Hachemi Benbouzid, D. Diallo, “Electric Motor Drive Selection Issues for HEV Propulsion Systems: A Comparative Study” [14] F. Caricchi, F. Crescimbini, F. Giulii Capponi , L. Solero “Permanent-Magnet, Direct-Drive, Starter/Alternator Machine with Weakened Flux Linkage for Constant-Power Operation Over Extremely Wide Speed Range”, 2001 [15] J.Colton,Dissertation: “Design of an Integrated StarterAlternator for a Series Hybrid Electric Vehicle.” [16] S. Huang, J. Luo, F. Leonardi, T.A. Lipo, “A General Approach to sizing and Power Density Equation for Comparison of Electrical Machines”, University of Wisconsin –Madison, June 1996. [17] J.G. Kassakian, H.-C. Wolf, J.M. Miller, C.J. Hurton, “Automotive electrical system circa 2005” IEEE Transactions on Industry Applications, Vol. 46, No. 3, May/June 2010 pp.993-1001. [18] W.L. Soong and N. Ertugrul E.C. Lovelace T.M. 
Jahns, “Investigation of Interior Permanent Magnet OffsetCoupled Automotive Integrated Starter/Alternator”, IEEE Transactions on Industry Applications, Vol. 1, October 2001, pp 429 – 436.

Journal of Computer Science and Control Systems 29 __________________________________________________________________________________________________________

Application of Simulated Annealing for Tuning of a PID Controller for a Real Time Industrial Process

GIRIRAJKUMAR S.M.¹, KUMAR Atal A.², ANANTHARAMAN N.³

¹ M.A.M. College of Engineering, Department of Electronics and Instrumentation Engineering, Tiruchirappalli - 621105, India. Email: [email protected]
² SASTRA University, Department of Mechanical Engineering, Thanjavur - 613402, India. Email: [email protected]
³ National Institute of Technology, Department of Chemical Engineering, Tiruchirappalli - 620015, India.

Abstract: As part of control and automation, Proportional-Integral-Derivative (PID) control schemes have been widely used in most control engineering applications, including the field of mechanical engineering. A real-time mechanical industrial process is considered in this paper, for which a controller is designed using a non-traditional optimization technique. The stability and performance of this mechanical system are greatly governed by the controller settings. Traditionally tuned PID controllers often fail to meet industrial requirements in terms of accuracy and precision for systems with varied complexities. Many techniques have been developed to obtain the optimum PID parameters for a particular system. In this paper Simulated Annealing (SA) is proposed as a method for PID parameter optimization, and its effectiveness is compared with that of traditional PID tuning methods. The proposed approach has superior features, including easy implementation, stable convergence characteristics and good computational efficiency.

Keywords: Mechanical system, controller, PID tuning, SA algorithms.

I. INTRODUCTION

PID (Proportional Integral Derivative) controllers are important elements in various engineering applications: approximately 80 to 90 percent of control systems, particularly in industrial processes, use these controllers. It is well known that fine-tuned PID controllers function properly with suitable Proportional band (PB), Integral time (I) and Derivative time (D) settings. In general, these three parameters are determined according to open-loop step response methods, such as the Ziegler-Nichols method [1], the Cohen-Coon method [2], the IMC method [3], etc. According to the handbook of PI and PID tuning

rules, several sets of PID tuning formulas and methods have been discussed by many researchers. However, little research has been performed on practical PID tuning methods based on closed-loop measured data. For satisfactory control performance of a PID control system, appropriate PID parameter tuning is necessary [4]. In practice, PID parameter tuning depends on the operator's know-how; therefore the resulting PID parameters are frequently not optimal from a quality viewpoint. The traditional methods mentioned above are intended for the basic PID controller; they are therefore not optimal for practical controllers such as PI-D or I-PD controllers, which include nonlinear characteristics [5].

Optimization algorithms are another area that has been receiving increased attention in the past few years from the research community as well as industry [6, 7]. An optimization algorithm is a numerical method for finding the maxima or minima of a function subject to certain constraints [5]. Computational intelligence (CI) is a successor of artificial intelligence relying on evolutionary computation, a well-known optimization technique. Computational intelligence combines elements of learning, adaptation and evolution to create programs that are, in some sense, intelligent. Computational intelligence research does not reject statistical methods, but often provides a complementary view [9]. It finds fundamental application in the areas of fitness function design, methods for parameter control, and techniques for multimodal optimization. The importance of CI lies in the fact that these techniques often find optima in complicated optimization problems more quickly than traditional optimization methods [8].

Simulated Annealing (SA) is a derivative-free stochastic search method for determining the optimum solution of an optimization problem. The method was proposed by Kirkpatrick in 1983 and


has since been used extensively to solve large-scale combinatorial optimization problems. SA evolves a single solution in the parameter space following guiding principles that imitate the random behavior of molecules during the annealing process. It is similar to the physical process of heating up a solid until it melts, followed by cooling it down slowly until it crystallizes into a perfect lattice; the objective function corresponds to the energy of the states of the solid [10]. The SA algorithm requires the definition of a neighborhood structure as well as the parameters of the cooling schedule. The temperature parameter distinguishes between large and small changes in the objective function. An attractive feature of SA is that it is very easy to program and the algorithm typically has few parameters that require tuning, which has led to its wide application in industry and research in recent years [10, 11, 12].

In the proposed work we compare the time domain specifications and the values of performance measures such as the integral time absolute error (ITAE), the integral of absolute error (IAE), the integral of time-weighted square error (ITSE) and the integral of square time error (ISTE) [13, 14] obtained by the conventional technique and by our proposed SA method, to show that our method improves on the conventional ones. The sections that follow describe the experimental setup, review the conventional methods used, present the proposed algorithm and the values obtained, discuss the results and graphs, and finally give the conclusion.

II. INDUSTRIAL PROCESS BASED CLOSED LOOP SYSTEM

The closed loop system considered here is used to maintain the temperature in an agitated vessel. The agitator consists of three paddles connected to a vertical shaft, which is driven by an electric motor. The agitator is used to mix two acids, namely sulphuric acid (H2SO4) and oleum (H2S2O7). There are three inlets, for the inflow of sulphuric acid, oleum and steam. The steam supplies heat to the agitated vessel and maintains it at a temperature of 110-130 °C. It is supplied from a 10 ton boiler through a regulator at a maximum pressure of about 5 kgf/cm². The mixed acid solution is then sent to the reactor through an overhead pipeline. The steam from the boiler is sent through a pipeline connected as inlet to the control valve, which controls the steam entering the agitated vessel.

A resistance temperature detector (RTD) is used to measure the temperature in the vessel. The range of the RTD used is 0-200 °C. Its output signal is conditioned and converted to a current signal in the standard range of 4-20 mA; here 4 mA corresponds to 0 °C and 20 mA to 200 °C. The output signal is given to the host computer through the panel board of a Distributed Control System (DCS). A Man-Machine Interface allows human intervention in the process whenever necessary through the host computer, which acts as the controller. The output signal from the computer, a 4-20 mA current signal, is then sent to a current-to-pressure (I/P) converter, a device which supplies pressurized air in the range of 3-15 psi to the control valve, proportional to the current given to it. The control valve is of globe type and is used to control the steam supplied to the agitated vessel. The steam is supplied through metal pipes of about 2" diameter. When the pressure of the supply air from the I/P converter is 15 psi the valve is 100 % open, and when it is 3 psi the valve is 0 % open; thus the valve opening is proportional to the air pressure supplied to it. The steam supplied to the control valve is at about 4-5 kgf/cm², which increases the temperature as required. The piping and instrument diagram of the process is shown in figure 1.

Fig 1: Piping and Instrument diagram of the process

The industrial process is further considered as a closed loop system with the components having the specifications indicated. A mathematical model of the process was estimated by applying a step change of 10 % to the steam valve after putting the system in open loop mode. The response curve was traced and found to be similar to that of a first-order plus time delay (FOPTD) model, and the mathematical model was found to be:

G(s) = 0.468 e^(-42s) / (328s + 1)     (1)
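For a quick numerical check, the open-loop step response implied by this model can be simulated. The sketch below is a pure-Python Euler discretization of the FOPTD model in equation (1); the 10 % step magnitude, sampling step and horizon are illustrative assumptions, not values prescribed by the paper.

```python
# Euler simulation of the identified FOPTD model (equation 1):
#   G(s) = 0.468 * exp(-42 s) / (328 s + 1)
# Illustrative sketch; step size, sampling time and horizon are assumptions.

K, THETA, TAU = 0.468, 42.0, 328.0      # gain, dead time [s], lag time [s]

def foptd_step(step=10.0, dt=1.0, t_end=2000.0):
    """Open-loop response to a step of the given size on the valve input."""
    delay = int(THETA / dt)             # dead time expressed in samples
    y, out = 0.0, []
    for k in range(int(t_end / dt)):
        u = step if k >= delay else 0.0  # input is delayed by the dead time
        y += dt * (K * u - y) / TAU      # first-order lag: tau*dy/dt = K*u - y
        out.append(y)
    return out

resp = foptd_step()
```

The response stays at zero for the 42 s dead time and then rises toward K times the step size (about 4.68 for a 10 % step), matching the static gain of 0.468.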


III. NON-TRADITIONAL OPTIMIZATION TECHNIQUES

The implementation of non-traditional optimization techniques has been attempted for a process industry application in which temperature is the variable to be controlled. A PID controller is proposed for the system, as it fulfills the need for anticipatory control; PID controllers are also considered well suited to temperature based processes. The transfer function of the process, based on the operating conditions, was estimated as in equation (1). The conventional method chosen for comparison is the Internal Model Control (IMC) technique. The formulas used by this method for tuning the PID controller are given in Table 1.

Table 1: Tuning rules for IMC technique

Controller Type | Controller Gain (no units) | Integral Time (seconds) | Derivative Time (seconds)
PID control     |                            |                         |

The formulas are expressed in terms of the process dead time (seconds), the process lag time (seconds) and the process gain K (dimensionless); one value of the tuning constant is used for aggressive but less robust tuning, the other for more robust tuning. Some controller mechanisms use proportional band instead of gain; the proportional band is equal to 100 divided by the gain. The values in the table are for an ideal type controller, which computes the controller gain, integral time and derivative time using the formulas shown.

IV. SIMULATED ANNEALING (SA)

SA is a numerical optimization technique based on the principles of thermodynamics. The idea of SA comes from a paper published by Metropolis et al. in 1953 [15]. The simulated annealing method resembles the cooling of molten metals through annealing [16-18]. At high temperature the atoms in the molten metal can move freely with respect to each other, but as the temperature is reduced their movement gets restricted. The

atoms start to get ordered and finally form crystals having the minimum possible energy. However, the formation of the crystal depends on the cooling rate. If the temperature is reduced at a very fast rate, the crystalline state may not be achieved at all; instead the system may end up in a polycrystalline state, which may have a higher energy than the crystalline state. Therefore, in order to achieve the absolute minimum energy state, the temperature needs to be reduced at a slow rate. The process of slow cooling is known as annealing in metallurgical parlance. SA simulates this process of slow cooling of molten metal to achieve the minimum function value in a minimization problem. The cooling phenomenon is simulated by controlling a temperature-like parameter introduced through the concept of the Boltzmann probability distribution. According to the Boltzmann probability distribution, a system in thermal equilibrium at a temperature T has its energy distributed probabilistically according to P(E) = exp(-ΔE / kT), where k is the Boltzmann constant. This expression suggests that a system at a high temperature has an almost uniform probability of being at any energy state, while at a low temperature it has only a small probability of being at a high energy state. Therefore, by controlling the temperature T and assuming that the search process follows the Boltzmann probability distribution, the convergence of the algorithm can be controlled using the Metropolis algorithm.

Let us say that at any instant the current point is x(t) and the function value at that point is E(t) = f(x(t)). Using the Metropolis algorithm, the probability of the next point being x(t+1) depends on the difference of the function values at the two points, ΔE = E(t+1) - E(t), and is calculated using the Boltzmann probability distribution: P(E(t+1)) = min[1, exp(-ΔE / kT)]. If ΔE ≤ 0, this probability is one and the point x(t+1) is always accepted. In the function minimization context this makes sense, because if the function value at x(t+1) is better than that at x(t), the point x(t+1) must be accepted. When ΔE > 0, the function value at x(t+1) is worse than that at x(t); according to the Metropolis algorithm there is still some finite probability of selecting the point x(t+1), even though it is worse than the point x(t). The probability of accepting a worse state is high at the beginning and decreases as the temperature decreases. For each temperature the system must reach an equilibrium, i.e., a number of new states must be tried before the temperature is reduced, typically by 10 %. It can be shown that the


algorithm will then attain the global minimum and not get stuck in local minima, provided the cooling is sufficiently slow.
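The acceptance rule and cooling loop described above can be sketched in a few lines. This is a generic illustration on a toy one-dimensional objective, not the authors' implementation; the initial temperature, 10 % cooling factor and neighbourhood size are assumptions chosen for the example.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng):
    # P = min(1, exp(-dE/T)): improvements always pass,
    # worse moves pass with a temperature-dependent probability
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature)

def simulated_annealing(f, x0, t0=10.0, cooling=0.9, steps_per_t=20,
                        t_min=1e-3, seed=1):
    rng = random.Random(seed)
    x = best = x0
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):            # approach equilibrium at each T
            cand = x + rng.uniform(-1.0, 1.0)   # neighbour of the current point
            if metropolis_accept(f(cand) - f(x), t, rng):
                x = cand
            if f(x) < f(best):                  # remember the best point seen
                best = x
        t *= cooling                            # reduce the temperature by 10 %
    return best

# toy objective with its minimum at x = 3
best = simulated_annealing(lambda x: (x - 3.0) ** 2, x0=-5.0)
```

At high temperature almost any move is accepted; as the temperature falls, the search increasingly behaves like a greedy descent and settles near the minimum.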

The search process was initiated through SA, and the implementation of the algorithm on the respective transfer function model yielded the results shown in figures 4 to 6.

B. Objective function for the algorithm:

The objective functions considered are based on error criteria. A number of such criteria are available; in this work the controller's performance is evaluated in terms of the integral time absolute error (ITAE) criterion. The error criterion is given as a performance index by equation (2):

ITAE = ∫₀ᵀ t |e(t)| dt     (2)

The values of the objective function, taken to be the ITAE, were estimated and are presented in figure 3 for the various iterations at their best values.
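For sampled data, equation (2) and the related indices reduce to simple sums. The following sketch computes discrete approximations of ITAE, IAE, ISE and ITSE from an equally spaced error signal; the rectangular-rule approximation is an illustrative choice, not a method specified by the paper.

```python
# Rectangular-rule approximations of the error-based performance indices
# (equation 2 defines ITAE); t and e are equally spaced samples, dt the step.

def performance_indices(t, e, dt):
    return {
        "ITAE": sum(ti * abs(ei) for ti, ei in zip(t, e)) * dt,
        "IAE":  sum(abs(ei) for ei in e) * dt,
        "ISE":  sum(ei * ei for ei in e) * dt,
        "ITSE": sum(ti * ei * ei for ti, ei in zip(t, e)) * dt,
    }

idx = performance_indices([0.0, 1.0], [1.0, -2.0], dt=1.0)
```

The time weighting in ITAE and ITSE penalizes errors that persist late in the response, which is why ITAE favours fast settling.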

Fig 2: Flow chart for SA Algorithm

A. Realization of the SA algorithm:

The optimal values of the PID controller parameters Kp, Ki and Kd are to be found using the SA algorithm. All candidate sets of controller parameter values are solutions whose values are adjusted so as to minimize the objective function, which in this case is the error criterion. For the PID controller design it is ensured that the estimated controller settings result in a stable closed loop system. The SA algorithm is implemented for the industrial process with the objective of controlling the temperature of the process. The variables associated with the algorithm are presented in Table 2.

Fig 3: Variation of ITAE for 100 iterations

Table 2: Variables associated with SA for the industrial process system

Variable             | Specification
Range of Kp          | 0 - 30
Range of Ki          | 0 - 0.1
Range of Kd          | 0 - 500
Population size      | 100
Number of iterations | 100
Objective function   | ITAE
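During the search, new candidate PID settings must stay inside the ranges of Table 2. A minimal sketch of such a bounded neighbour move is shown below; the 5 % perturbation scale is an illustrative assumption, not a value from the paper.

```python
import random

# Candidate generation for the (Kp, Ki, Kd) search, clamped to the
# parameter ranges of Table 2; the 5 % move scale is an assumption.

BOUNDS = {"Kp": (0.0, 30.0), "Ki": (0.0, 0.1), "Kd": (0.0, 500.0)}

def neighbour(params, rng, scale=0.05):
    out = {}
    for name, value in params.items():
        lo, hi = BOUNDS[name]
        step = rng.uniform(-scale, scale) * (hi - lo)  # move within +/-5 % of range
        out[name] = min(hi, max(lo, value + step))     # clamp to the allowed range
    return out

rng = random.Random(0)
candidate = neighbour({"Kp": 17.83, "Ki": 0.078, "Kd": 269.25}, rng)
```

Scaling each move by the width of its own range keeps the step size meaningful for Ki (range 0.1) and Kd (range 500) alike.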

Fig 4: Variation of Kd for 100 iterations

Fig 5: Variation of Kp for 100 iterations


Based on these responses, the time domain specifications relevant to the real time data are noted; they are tabulated and presented in Table 4.

Table 4: Time domain specifications for the industrial process system

Fig 6: Variation of Ki for 100 iterations

The final values after implementing SA for the industrial process are found to be Kp = 17.83, Ki = 0.078 and Kd = 269.25.

V. RESULTS AND COMPARISON

The PID controller is designed for an industrial closed loop process in which the controlled variable is temperature. The process is allowed to reach the steady state condition at 122 °C, and the PID controller's response is studied by giving a servo change of 2 °C in the controlled variable, making the new set point 124 °C. The IMC controller is the best among the traditional techniques. The various PID controller parameters considered for analysis in this section are shown in Table 3.

Table 3: PID controller parameters for the industrial process

Controller parameter         | IMC    | SA
Proportional gain, Kp        | 3.620  | 17.830
Integral gain constant, Ki   | 0.0103 | 0.0780
Derivative gain constant, Kd | 71.445 | 269.25

Figure 7 gives the comparative response of IMC and SA based controllers for the industrial process considered.

Fig 7: Comparison of IMC and SA Algorithm

Specification               | IMC   | SA
Inverse peak (degree)       | 119.8 | 121.52
Inverse peak time (seconds) | 360   | 360
Rise time (seconds)         | 2500  | 1440
Peak time (seconds)         | 2500  | 1680
Overshoot (%)               | 1.2   | 9.0
Settling time (seconds)     | 2500  | 1860

The robustness of the process is analysed by calculating the performance indices for a transfer function model whose parameters are deviated by ±20 %. The altered model, which incorporates the uncertainties, is given by:

G(s) = 0.561 e^(-33.6s) / (393.6s + 1)     (3)
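The perturbed coefficients follow directly from equation (1): the gain and lag time are increased by 20 % while the dead time is decreased by 20 % (0.468 × 1.2 = 0.5616, rounded in the paper to 0.561). A two-line check:

```python
# +/-20 % deviation of the nominal FOPTD parameters from equation (1),
# reproducing the perturbed model of equation (3).
K, THETA, TAU = 0.468, 42.0, 328.0
K_dev, THETA_dev, TAU_dev = K * 1.2, THETA * 0.8, TAU * 1.2
```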

The performance indices calculated for this model with the various PID controllers are tabulated and presented in Table 5.

Table 5: Comparison of performance indices

Performance index | IMC         | SA
ITAE              | 6.2366e+003 | 1.1277e+004
IAE               | 46.3145     | 33.0220
MSE               | 1.1502      | 1.89
ISE               | 0.0033      | 4.1546e-006

VI. CONCLUSION

The response curve obtained with the IMC controller has a larger negative peak, as the delay is not properly taken care of, whereas the SA controller proves to be the better one for achieving the set point. The robustness investigation also illustrates that the proposed tuning technique generally yields lower index values than the traditional PID controller. SA presents multiple advantages to a designer: it operates with a reduced number of design methods to establish the type of the controller, it gives the possibility of configuring the dynamic behavior of the control system with ease, and it starts the design with a reduced amount of information about the controller (type and allowable range of the parameters), while keeping


sight of the behaviour of the control system. Hence the real time closed loop industrial process is tuned in the best possible way by the evolutionary computation method, outperforming the conventional tuning techniques.

REFERENCES

[1] J.G. Ziegler and N.B. Nichols, "Optimum Settings for Automatic Controllers", Trans. ASME, Vol. 64, pp. 759-768, 1942.
[2] G.H. Cohen and G.A. Coon, "Theoretical Consideration of Retarded Control", Trans. ASME, Vol. 75, pp. 827-834, 1953.
[3] S. Skogestad, "Simple analytical rules for model reduction and PID controller tuning", J. Process Control, Vol. 13, pp. 291-309, 2003.
[4] C. Knospe, "PID Control", IEEE Control Systems Magazine, pp. 30-31, 2006.
[5] A. Oi, C. Nakazawa, T. Matsui, "Development of PSO-based PID Tuning Method", International Conference on Control, Automation and Systems, COEX, Seoul, Korea, Oct. 14-17, 2008.
[6] K.E. Parsopoulos and M.N. Vrahatis, "Particle swarm optimizer in noisy and continuously changing environments", Indianapolis, IN, 2001.
[7] H. Yoshida, K. Kawata, Y. Fukuyama, Y. Nakanishi, "A particle swarm optimization for reactive power and voltage control considering voltage stability", IEEE International Conference on Intelligent System Applications to Power Systems (ISAP'99), Rio de Janeiro, April 4-8, 1999.
[8] K.E. Parsopoulos and M.N. Vrahatis, "Recent approaches to global optimization problems through Particle Swarm Optimization", Natural Computing, Vol. 1, pp. 235-306, 2002.
[9] J.A. Jan and B. Sulc, "Evolutionary computing methods for optimizing virtual reality process models", International Carpathian Control Conference ICCC'2002, Malenovice, Czech Republic, May 27-30, 2002.
[10] L.R. Varela, R.A. Ribeiro and F.M. Pires, "Simulated annealing and fuzzy optimization", Proceedings of the 10th Mediterranean Conference on Control and Automation (MED2002), Portugal, July 9-12, 2002.
[11] P. Caricato and A. Grieco, "Using simulated annealing to design a material handling system", IEEE Intelligent Systems, 2005.
[12] H.-Y. Tseng and C.-C. Lin, "A simulated annealing approach for curve fitting in automated manufacturing systems", Journal of Manufacturing Technology Management, Vol. 18, No. 2, pp. 202-216, 2007.
[13] M. Dorigo, V. Maniezzo, A. Colorni, "The ant system: optimization by a colony of cooperating agents", IEEE Trans. Systems, Man and Cybernetics B, Vol. 26, No. 1, pp. 29-41, 1996.
[14] M.M. Mwembeshi, C.A. Kent, S. Salhi, "A genetic algorithm based approach to intelligent modelling and control of pH in reactors", Computers and Chemical Engineering, Vol. 28, pp. 1743-1757, 2004.
[15] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller, "Equation of state calculations by fast computing machines", The Journal of Chemical Physics, Vol. 21, No. 6, June 1953.
[16] M. Fleischer, "Simulated annealing: past, present, and future", Proceedings of the 1995 Winter Simulation Conference, 1995.
[17] X. Yao, "Simulated annealing with extended neighbourhood", International Journal of Computer Mathematics, Vol. 40, pp. 169-189, 1991.
[18] S. Kirkpatrick, C.D. Gelatt Jr. and M.P. Vecchi, "Optimization by simulated annealing", Science, Vol. 220, No. 4598, May 13, 1983.


Wavelet Analysis and Park's Vector Based Condition Monitoring of Induction Machines

HARLIŞCA Ciprian, SZABÓ Loránd
Technical University of Cluj-Napoca, Department of Electrical Machines and Drives,
Memorandumului 28, 400114 Cluj-Napoca, Romania; e-mail: [email protected]

Abstract - Condition monitoring is an important issue in the maintenance of electrical machines. In recent years many fault detection methods have been cited in the literature. Most of the newly proposed advanced methods are intended for on-line and non-invasive monitoring, which means that faults can be detected already in their incipient phase, during the usual operation of the monitored machines. The aim of this paper is to give a brief review of two of the nowadays more and more frequently used induction machine fault detection practices: the wavelet analysis based method and the Park's vector approach.

Keywords: induction machines; diagnosis; fault detection; condition monitoring; wavelet analysis; Park's vector.

I. INTRODUCTION

Squirrel cage induction machines are widely used in modern industrial applications due to their robustness, low cost and easy maintenance. Nevertheless, many of their basic components are susceptible to failure. The faults occur in most cases in the bearings or shafts, the stator windings and the rotor bars. However, by means of advanced on-line diagnosis methods it is possible to detect the faults in their incipient phase, before the catastrophic effects of the failures on the machine, which can cause long downtimes and huge costs in an industrial environment [1]. About 40 % of the faults that occur in induction machines are in the bearings, 30 to 40 % in the stator, 10 % in the rotor and the remaining ones in the auxiliary devices of the machine [2].

Condition monitoring of induction motors requires wide-ranging multidisciplinary knowledge; therefore it is a challenging topic for numerous engineers working in various fields. Many condition monitoring methods are cited in the literature, including vibration, thermal, chemical and acoustic emission monitoring. Unfortunately all these methods require expensive sensors or specialized tools. The widely used current monitoring methods therefore have a main advantage over those mentioned above: they require only simple and cheap (frequently already existing) current sensors. The current monitoring based techniques can be used to detect almost all of the faults of a squirrel cage induction machine: rotor bar faults, shorted winding faults, air-gap eccentricity faults, bearing faults, load faults, etc. This approach is non-intrusive and can be applied both on-line and in a remotely controlled way. In this paper two of the most frequently used current monitoring methods for condition monitoring of squirrel cage induction machines are overviewed.

II. WAVELET ANALYSIS BASED MONITORING

By its basic definition a wavelet is a particular wave whose energy is concentrated in a specific temporal location. It can be considered a known signal with some peculiar characteristics, and can easily be used to study the properties of other signals simultaneously in the frequency and time domains. A typical plot of a wavelet is given in Fig. 1.

Fig. 1. A typical plot of a wavelet

A wavelet is an oscillation similar to that recorded by a seismograph or a heart monitor: its amplitude starts out at zero, increases and then decreases back to zero. There is a variety of wavelets with specific properties that can be used for signal analysis in various fields of science, such as computer imaging, climate analysis, heart monitoring, seismic signal denoising, audio and video compression and the fast solution of partial differential equations, but also in condition monitoring of rotating machines [3]. There are several common families of wavelets:
- wavelets for the Continuous Wavelet Transform (Gaussian, Morlet, Mexican Hat, etc.)
- Daubechies maxflat wavelets
- Symlets
- Coiflets
- biorthogonal spline wavelets
- complex wavelets


The most typical wavelet types used in condition monitoring of electrical machines are given in Fig. 2.

Fig. 2. The most typical wavelet types

The Fourier transform has been widely used for many years in fault diagnosis and condition monitoring, but it has several drawbacks. It gives excellent results if the analyzed signal is stationary or periodic, but it is not appropriate for a signal that has transitory characteristics such as drifts, abrupt changes and frequency trends [4]. The wavelet transform was introduced with the idea of overcoming the drawbacks of the Fourier transform. The wavelet transform assures both time-frequency analysis and time-scale analysis with a multiresolution characteristic. Due to all these advantages it can be very useful in the fault detection of electrical machines working in the frequently occurring variable load applications [5].

The Fourier transform decomposes a signal into complex exponential functions of different frequencies. It is defined by the following equation [6]:

X(f) = ∫ x(t) e^(-2πjft) dt     (1)

where f is the frequency, t the time and x the analyzed signal (the integral being taken over all time). In the short time Fourier transform (STFT) the signal to be processed is divided into small segments, which can be assumed to be stationary. For this purpose a window function is chosen. The expression of the STFT is the following [6]:

STFT_x(t', f) = ∫ x(t) ω*(t - t') e^(-j2πft) dt     (2)

where ω(t) is the window function and * marks the complex conjugate.

Fourier analysis consists of decomposing a signal into sine waves of different frequencies. In a similar way, wavelet analysis is the breaking up of a signal into shifted and scaled versions of the original wavelet, also known as the mother wavelet [7]. The scaling of a wavelet means stretching (or compressing) it, while the shifting is the delaying (or hastening) of its onset [8]. If the scale of a wavelet is high, it corresponds to a stretched, low frequency wavelet; if it is low, it corresponds to a compressed, high frequency wavelet: the smaller the scale factor, the more compressed the wavelet [9]. In the analysis of the starting current of an induction machine, two types of wavelet transform can be used: the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT) [10].

The continuous wavelet transform was developed to overcome the resolution problem of the short time Fourier transform. Wavelet analysis is very similar to STFT analysis: in both cases the signal is multiplied by a function (a wavelet in the case of wavelet analysis, a window function in the STFT), and the transform is computed separately for different segments of the time-domain signal. The CWT is defined by the following equation [6]:

CWT_x(τ, s) = (1/√|s|) ∫ x(t) ψ*((t - τ)/s) dt     (3)

where τ and s are the translation and scale parameters, and ψ(t) is the transforming function, also called the mother wavelet.

If the discrete wavelet transform is used, the signal is analyzed at different frequency bands with different resolutions in order to decompose it into a coarse approximation and detail information. This is done by using digital filtering techniques: the signal is passed through high pass filters to analyze the high frequencies and through low pass filters to analyze the low frequencies [6]. The DWT associates the low pass and high pass filters with two sets of functions, called scaling functions and wavelet functions, as shown in Fig. 3.

Fig. 3. The filtering of the signal by means of DWT
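The low pass/high pass splitting of the DWT can be illustrated with the simplest wavelet, the Haar wavelet. The sketch below performs one level of decomposition in pure Python; the Haar filter is an illustrative choice, and practical analyses typically use longer wavelets (e.g. Daubechies filters) via a wavelet library.

```python
import math

# One DWT level with the Haar wavelet: the signal is split into a
# low pass approximation and a high pass detail, each downsampled by two.

def haar_dwt_level(signal):
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    return approx, detail

approx, detail = haar_dwt_level([4.0, 4.0, 2.0, 0.0])
```

The transform is orthogonal, so the energy of the coefficients equals the energy of the input signal; a constant pair of samples produces a zero detail coefficient, which is why the detail branch highlights abrupt changes.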


The decomposition of the signal to be analyzed into different frequency bands is obtained by successive high pass and low pass filtering of the time domain signal (see a two-level decomposition in Fig. 4).
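The successive low pass / high pass filtering with downsampling can be sketched in a few lines. The document does not name a specific wavelet, so the orthonormal Haar filter pair is assumed here purely for illustration:

```python
import math

def analysis_step(x, h):
    """Convolve x with filter h, then downsample by two (odd indices are
    kept so that samples are paired as (x0, x1), (x2, x3), ...)."""
    y = [sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
         for n in range(len(x))]
    return y[1::2]

s = 1 / math.sqrt(2)
lo, hi = [s, s], [s, -s]   # Haar scaling (low pass) and wavelet (high pass) filters

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a1, d1 = analysis_step(x, lo), analysis_step(x, hi)    # level 1: approximation, detail
a2, d2 = analysis_step(a1, lo), analysis_step(a1, hi)  # level 2, applied to a1

# An orthonormal filter bank preserves the signal energy at every level
energy = lambda v: sum(t * t for t in v)
assert abs(energy(a1) + energy(d1) - energy(x)) < 1e-9
assert abs(energy(a2) + energy(d2) - energy(a1)) < 1e-9
```

Each level halves the number of samples, which is why the approximation branch can be filtered again to obtain the coarser level shown in the two-level scheme.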

Fig. 4. Two-level decomposition of the signal

The wavelet based detection methods have good sensitivity in detecting faults. They assure short fault detection times and can easily be applied also in on-line and remote condition monitoring of electrical machines. Wavelets can be used to examine signals simultaneously in both time and frequency.

III. PARK'S VECTOR APPROACH

One of the difficulties met in the analytical description of the behavior of most rotating electric machines is that the inductances are functions of the relative position of the rotor and stator. In order to simplify the study of electrical machines, R.H. Park developed a transformation that made their analysis more straightforward by transforming the motor equations into a two-phased orthogonal reference frame [11]. The transformation of the three-phased system into the two-phased orthogonal one can be performed upon:

[fd, fq, f0]ᵀ = Pdq0 · [fa, fb, fc]ᵀ                  (4)

where f is the function to be transformed (it can be the current, the voltage or the magnetic flux). The Park transformation matrix is:

Pdq0 = |  cos θ     cos(θ − 2π/3)    cos(θ − 4π/3) |
       | −sin θ    −sin(θ − 2π/3)   −sin(θ − 4π/3) |        (5)
       |   1/2          1/2              1/2       |

θ = ωt being the angular displacement. By using the above transformation the orthogonal components of the Park's current vector can be computed from the symmetrical three-phased current system having the components ia, ib and ic:

id = ia·cos θ + ib·cos(θ − 2π/3) + ic·cos(θ − 4π/3)
iq = −ia·sin θ − ib·sin(θ − 2π/3) − ic·sin(θ − 4π/3)        (6)

If the reference frame is fixed in the stator of the machine (θ = 0) the above equation becomes:

id = ia − ib/2 − ic/2
iq = (√3/2)·(ib − ic)        (7)

When the induction machine is healthy its three-phased stator current system is perfectly symmetric:

ia = √2·I·sin(ωst)
ib = √2·I·sin(ωst − 2π/3)        (8)
ic = √2·I·sin(ωst − 4π/3)

where I is the maximum value of the supply phase current, ωs is the supply frequency and t is the time variable. In this case, by replacing (8) in (7), the following equations can be obtained for the two orthogonal components of Park's current vector in the case of a healthy electrical machine:

id = (3/√2)·I·sin(ωst)
iq = −(3/√2)·I·cos(ωst)        (9)

Upon equation (9) it can be stated that a healthy machine shows a perfect circle in Park's vector representation, as shown in Fig. 5.

Fig. 5. The plot of Park’s current vector for a healthy machine during startup
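Equations (7)-(9) can be checked numerically. The sketch below (variable names are illustrative) confirms that a balanced three-phase current system maps onto a circle of radius (3/√2)·I in the (id, iq) plane:

```python
import math

def park_dq(ia, ib, ic):
    """Stator-fixed Park transform (theta = 0), as in eq. (7)."""
    i_d = ia - ib / 2 - ic / 2
    i_q = math.sqrt(3) / 2 * (ib - ic)
    return i_d, i_q

def healthy_currents(i_supply, ws, t):
    """Symmetric three-phased stator currents, as in eq. (8)."""
    return tuple(math.sqrt(2) * i_supply * math.sin(ws * t - k * 2 * math.pi / 3)
                 for k in range(3))

ws = 2 * math.pi * 50   # 50 Hz supply, an illustrative value
for n in range(200):
    i_d, i_q = park_dq(*healthy_currents(1.0, ws, n * 1e-4))
    # eq. (9): the locus is a circle of radius 3/sqrt(2) times the current
    assert abs(math.hypot(i_d, i_q) - 3 / math.sqrt(2)) < 1e-9
```

An unbalanced system (e.g. one phase amplitude reduced) makes the locus elliptic, which is exactly the deviation exploited for fault detection in the next paragraphs.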

38 Volume 4, Number 2, October 2011 __________________________________________________________________________________________________________

When any type of fault occurs, the three-phased current system becomes unbalanced. This results in an elliptic representation of the Park's current vector (see Fig. 6).

Fig. 6. The plot of Park’s current vector for a faulty machine in steady-state regime

The Park's vector based approach is widely used in the diagnosis of the most common faults of squirrel cage induction machines. From the parameters of the ellipse the severity of the fault can be estimated: as the severity of the fault increases, the elliptical form of the representation becomes more pronounced [13]. If this method is used alone it encounters difficulties in isolating different faults of the squirrel cage induction machine, because different faults may cause a similar deviation in the Park's vector. Unfortunately it also cannot distinguish the effects of asymmetries of the feeding voltage, or of the machine itself, that are not connected to faults.

IV. CONCLUSIONS

The presented methods are non-invasive fault detection techniques. Compared with Fourier transform based methods, the two methods in discussion can detect faults even in electrical machines with a light or variable load. Both detailed methods allow continuous real-time tracking of various types of faults in squirrel cage induction motors operating under both stationary and non-stationary conditions. They recognize the fault signatures produced by the electrical machine and can estimate the severity of the faults under different load conditions.

ACKNOWLEDGMENT

This paper was supported by the project "Improvement of the doctoral studies quality in engineering science for development of the knowledge based society-QDOC", contract no. POSDRU/107/1.5/S/78534, a project co-funded by the European Social Fund through the Sectorial Operational Program Human Resources 2007-2013.

REFERENCES

[1] M.E.H. Benbouzid, "A Review of Induction Motors Signature Analysis as a Medium for Faults Detection", IEEE Transactions on Industrial Electronics, vol. 47, no. 5 (October 2000), pp. 984-993.
[2] Motor Reliability Working Group, "Report of large motor reliability survey of industrial and commercial installations, Part I and II", IEEE Transactions on Industry Applications, vol. IA-21, no. 4 (July 1985), pp. 853-872.
[3] K.P. Soman, K.I. Ramachandran and N.G. Resmi, "Insight into Wavelets from Theory to Practice", Prentice-Hall of India, 2008.
[4] A.A. da Silva and B.R. Upadhayaya, "Rotating Machinery Monitoring and Diagnosis Using Short-Time Fourier Transform and Wavelet Techniques", in Proc. of the 1997 International Conference on Maintenance and Reliability, Knoxville (USA), vol. 1, pp. 14.01-14.15, 1997.
[5] H. Douglas, P. Pillay and A. Ziarani, "Detection of Broken Rotor Bars in Induction Motors using Wavelet Analysis", in Proc. of the IEEE International Electric Machines and Drives Conference (IEMDC '03), vol. 2, pp. 923-928, 2003.
[6] L. Szabó, J.B. Dobai and K.Á. Bíró, "Discrete Wavelet Transform Based Rotor Faults Detection Method for Induction Machines", in Intelligent Systems at the Service of Mankind, vol. 2 (eds: W. Elmenreich, J.T. Machado, I.J. Rudas), Ubooks, Augsburg (Germany), pp. 63-74, 2005.
[7] N. Mehala and R. Dahiya, "Rotor Fault Detection in Induction Motor by Wavelet Analysis", International Journal of Engineering, Science and Technology, vol. 1, no. 3, pp. 90-99, 2008.
[8] M. Misiti, Y. Misiti, G. Oppenheim and J.M. Poggi, "Wavelet Toolbox User's Guide", Version 1, The MathWorks, Inc., 1996.
[9] S.K. Ahmed, S. Karmakar, M. Mitra and S. Sengupta, "Diagnosis of Induction Motors Faults due to Broken Rotor Bars and Rotor Mass Unbalance through Discrete Wavelet Transform of Starting Current at No-Load", Journal of Electrical Systems, vol. 6, no. 3, pp. 442-456, 2010.
[10] R.S. Arashloo and A. Jalilian, "Induction Motor Broken Rotor Bar Faults Using Discrete Wavelet Transform", 24th International Power System Conference, 2009.
[11] C. Laughman, S. Leeb, L. Norford, S. Shaw and P. Armstrong, "A Park Transform-Based Method for Condition Monitoring of 3-Phase Electromechanical Systems", Mitsubishi Electric Research Laboratories, May 2010.
[12] H. Nejjari and M.E.H. Benbouzid, "Condition Monitoring and Diagnosis of Induction Motors Electrical Faults Using a Park's Vector Pattern Learning Approach", IEEE Transactions on Industry Applications, vol. 36, no. 3 (May/June 2000), pp. 730-735.
[13] L. Szabó, E. Kovács, F. Tóth and G. Fekete, "Rotor Faults Detection Method for Squirrel Cage Induction Machines Based on Park's Vector Approach", Oradea University Annals, Electrotechnical Fascicle, Computer Science and Control Systems Session, pp. 234-329, 2007.
[14] N. Mehala and R. Dahiya, "Detection of Bearing Faults on Induction Motor Using Park's Vector Approach", International Journal of Engineering and Technology, vol. 1, no. 3, pp. 90-99, 2010.
[15] C.J. Verucchi, G.G. Acosta and F.A. Benger, "A Review on Fault Diagnosis of Induction Machines", Latin American Applied Research, vol. 38, no. 2, pp. 113-121, 2008.


Theoretical Implementation of the Rijndael Algorithm Using GPU Processing MANG Erica, MANG Ioan, ANDRONIC Bogdan, POPESCU Constantin University of Oradea, Romania Department of Computer Science, Faculty of Electrical Engineering and Information Technology, 410087 Oradea, Bihor, Romania, 1, Universitatii street, Tel/Fax:+40 259/408412, Tel:+40 259/408104; +40 259/408204 E-Mail: [email protected], [email protected], [email protected]

Abstract – In 1997, the National Institute of Standards and Technology (NIST) initiated a process to select a symmetric-key encryption algorithm to be used to protect sensitive Federal information. In 1999 six of the fifteen candidate algorithms were selected, and in November 2000 Rijndael was announced as the winner. This paper presents a study of AES Rijndael implemented on modern Graphics Processing Units, highlighting the benefits of a parallel implementation over the standard Central Processing Unit implementation. It describes an approach based on the NVIDIA CUDA platform. AES is the standard encryption model adopted in current software applications. Keywords: Cipher, Rijndael, Encryption, Decryption, ShiftRow Transformation, CUDA, GPU, Byte Substitution, MixColumn Transformation.

I. INTRODUCTION

The three criteria taken into account in the design of Rijndael are the following:
- Resistance against all known attacks;
- Speed and code compactness on a wide range of platforms;
- Design simplicity.
In most ciphers the round transformation has the Feistel structure. In this structure, part of the bits of the intermediate State are typically transposed unchanged to another position. The round transformation of Rijndael does not have a Feistel structure. Instead, it is composed of three distinct invertible uniform transformations, called layers. By "uniform" we mean that every bit of the State is treated in a similar way. The specific choices for the different layers are for a large part based on the application of the Wide Trail Strategy, a design method to provide resistance against linear and differential cryptanalysis. In the Wide Trail Strategy, every layer has its own function:
- The linear mixing layer: guarantees high diffusion over multiple rounds.
- The non-linear layer: parallel application of S-boxes that have optimum worst-case nonlinearity properties.
- The key addition layer: a simple EXOR of the Round Key to the intermediate State.
Before the first round, a key addition layer is applied. The motivation for this initial key addition is the following: any layer after the last key addition in the cipher (or before the first in the context of known-plaintext attacks) can be simply peeled off without knowledge of the key and therefore does not contribute to the security of the cipher (e.g., the initial and final permutation in DES). Initial or terminal key addition is applied in several designs, e.g., IDEA, SAFER and Blowfish. In order to make the cipher and its inverse more similar in structure, the linear mixing layer of the last round is different from the mixing layer in the other rounds. It can be shown that this does not improve or reduce the security of the cipher in any way. This is similar to the absence of the swap operation in the last round of DES.

II. MAIN DIFFERENCES BETWEEN CPU AND GPU

The CPU (central processing unit) has often been called the brains of the PC. But increasingly, that brain is being enhanced by another part of the PC, the GPU (graphics processing unit), which is its soul. All PCs have chips that render the display images to monitors, but not all these chips are created equal. Intel's integrated graphics controller provides basic graphics that can display only productivity applications like Microsoft PowerPoint, low-resolution video and basic games [4].

Figure 1. CPU architecture


Figure 2. GPU architecture

The CPU architecture differs considerably from that of the GPU because of the CPU's main scope: the CPU is intended for multiple tasks and multiple types of computation, while the GPU has the single purpose of mathematical computation in order to display graphics.

III. SPECIFICATION

Rijndael is an iterated block cipher with a variable block length and a variable key length. The block length and the key length can be independently specified to 128, 192 or 256 bits.

A. The State, the Cipher Key and the number of rounds

The different transformations operate on the intermediate result, called the State. Definition: the intermediate cipher result is called the State. The State can be pictured as a rectangular array of bytes. This array has four rows; the number of columns is denoted by Nb and is equal to the block length divided by 32. The Cipher Key is similarly pictured as a rectangular array with four rows. The number of columns of the Cipher Key is denoted by Nk and is equal to the key length divided by 32. These representations are illustrated in Figure 3.

Figure 3. Example of State (with Nb = 6) and Cipher Key (with Nk = 4) layout.

In some instances, these blocks are also considered as one-dimensional arrays of 4-byte vectors, where each vector consists of the corresponding column in the rectangular array representation. These arrays hence have lengths of 4, 6 or 8 respectively and indices in the ranges 0..3, 0..5 or 0..7. 4-byte vectors will sometimes

be referred to as words. Where it is necessary to specify the four individual bytes within a 4-byte vector or word, the notation (a, b, c, d) will be used, where a, b, c and d are the bytes at positions 0, 1, 2 and 3 respectively within the column, vector or word being considered. The input and output used by Rijndael at its external interface are considered to be one-dimensional arrays of 8-bit bytes numbered upwards from 0 to 4·Nb − 1. These blocks hence have lengths of 16, 24 or 32 bytes and array indices in the ranges 0..15, 0..23 or 0..31. The Cipher Key is considered to be a one-dimensional array of 8-bit bytes numbered upwards from 0 to 4·Nk − 1. These blocks hence have lengths of 16, 24 or 32 bytes and array indices in the ranges 0..15, 0..23 or 0..31. The cipher input bytes (the "plaintext" if the mode of use is ECB encryption) are mapped onto the state bytes in the order a0,0, a1,0, a2,0, a3,0, a0,1, a1,1, a2,1, a3,1, a0,2 ..., and the bytes of the Cipher Key are mapped onto the array in the order k0,0, k1,0, k2,0, k3,0, k0,1, k1,1, k2,1, k3,1, k0,2 ... At the end of the cipher operation, the cipher output is extracted from the state by taking the state bytes in the same order. Hence if the one-dimensional index of a byte within a block is n and the two-dimensional index is (i,j), we have:

i = n mod 4;  j = ⌊n/4⌋;  n = i + 4·j.
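The index relations just given can be exercised directly; a trivial sketch (helper names are illustrative):

```python
def byte_to_state(n):
    """1-D byte index n -> 2-D State index (i, j): i = n mod 4, j = n div 4."""
    return n % 4, n // 4

def state_to_byte(i, j):
    """Inverse mapping: n = i + 4*j."""
    return i + 4 * j

# Bytes fill the State column by column: a0,0 a1,0 a2,0 a3,0 a0,1 a1,1 ...
assert [byte_to_state(n) for n in range(6)] == [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
assert all(state_to_byte(*byte_to_state(n)) == n for n in range(32))
```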

Moreover, the index i is also the byte number within a 4-byte vector or word and j is the index for the vector or word within the enclosing block. The number of rounds is denoted by Nr and depends on the values Nb and Nk. It is given in Table I. TABLE I. Number of rounds (Nr) as a function of the block and key length

Nr     | Nk = 4 | Nk = 6 | Nk = 8
Nb = 4 |   10   |   12   |   14
Nb = 6 |   12   |   12   |   14
Nb = 8 |   14   |   14   |   14
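The entries of Table I coincide with the closed form Nr = max(Nb, Nk) + 6, which can be checked mechanically:

```python
def num_rounds(nb, nk):
    """Rijndael round count Nr as a function of block columns Nb and key columns Nk."""
    return max(nb, nk) + 6

# Table I, keyed by (Nb, Nk)
TABLE_I = {
    (4, 4): 10, (4, 6): 12, (4, 8): 14,
    (6, 4): 12, (6, 6): 12, (6, 8): 14,
    (8, 4): 14, (8, 6): 14, (8, 8): 14,
}
assert all(num_rounds(nb, nk) == nr for (nb, nk), nr in TABLE_I.items())
```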

B. The round transformation

The round transformation is composed of four different transformations. In this notation the "functions" (Round, ByteSub, ShiftRow, ...) operate on arrays to which pointers (State, RoundKey) are provided. We implemented a reduced version of the Rijndael algorithm with 6 rounds and 128 bits for the cipher text and for the cipher key. It can be seen that the final round is equal to the round with the MixColumn step removed; we consider the last round to be similar to the other rounds. The component transformations are specified in the following subsections.

1) The ByteSub transformation
The ByteSub transformation is a non-linear byte substitution, operating on each of the State bytes independently. The substitution table (or S-box) is


invertible and is constructed by the composition of two transformations: 1. First, taking the multiplicative inverse in GF(2^8); '00' is mapped onto itself. 2. Then, applying an affine (over GF(2)) transformation.
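The two-step construction (inverse in GF(2^8), then the affine map over GF(2)) can be reproduced in a few lines; the spot-check values below are well-known S-box entries from the Rijndael specification:

```python
def gf_mul(a, b):
    """Multiplication in GF(2^8) modulo the Rijndael polynomial 0x11B."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p

def gf_inv(x):
    """Multiplicative inverse by exhaustive search; '00' maps onto itself."""
    return next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)

def sbox(x):
    """S-box entry: GF(2^8) inverse followed by the affine map, constant 0x63."""
    b = gf_inv(x)
    r = 0
    for i in range(8):
        bit = ((b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8))
               ^ (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (0x63 >> i))
        r |= (bit & 1) << i
    return r

assert sbox(0x00) == 0x63 and sbox(0x01) == 0x7C and sbox(0x53) == 0xED
```

In a real implementation the 256 entries would of course be precomputed into a lookup table rather than derived per byte.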

2) The ShiftRow transformation
In ShiftRow, the rows of the State are cyclically shifted over different offsets. Row 0 is not shifted, row 1 is shifted over C1 bytes, row 2 over C2 bytes and row 3 over C3 bytes. The shift offsets C1, C2 and C3 depend on the block length Nb.

3) The MixColumn transformation
In MixColumn, the columns of the State are considered as polynomials over GF(2^8) and multiplied modulo x^4 + 1 with a fixed polynomial c(x), given by: c(x) = '03'·x^3 + '01'·x^2 + '01'·x + '02'.

4) The Round Key addition
In this operation, a Round Key is applied to the State by a simple bitwise EXOR. The Round Key is derived from the Cipher Key by means of the key schedule. The Round Key length is equal to the block length Nb.

IV. OVERVIEW OF THE CUDA PROGRAMMING LANGUAGE

CUDA (originally an acronym for Compute Unified Device Architecture, although this is no longer used) is a parallel computing architecture developed by NVIDIA. Simply put, CUDA is the computing engine in NVIDIA graphics processing units (GPUs) that is accessible to software developers through industry standard programming languages. Programmers use 'C for CUDA' (C with NVIDIA extensions), compiled through a PathScale Open64 C compiler, to code algorithms for execution on the GPU. The CUDA architecture supports a range of computational interfaces including OpenCL and DirectX Compute. Third party wrappers are also available for Python, Fortran and Java. The latest drivers all contain the necessary CUDA components. CUDA works with all NVIDIA GPUs from the G8X series onwards, including the GeForce, Quadro and Tesla lines. NVIDIA states that programs developed for the GeForce 8 series will also work without modification on all future NVIDIA video cards, due to binary compatibility. CUDA gives developers access to the native instruction set and memory of the parallel computational elements in CUDA GPUs. Using CUDA, the latest NVIDIA GPUs effectively become open architectures like CPUs. Unlike CPUs, however, GPUs have a parallel "many-core" architecture, each core capable of running thousands of threads simultaneously; if an application is suited to this kind of architecture, the GPU can offer large performance benefits.
In the computer gaming industry, in addition to graphics rendering, graphics cards are used in game physics calculations (physical effects like debris, smoke, fire and fluids), examples being PhysX and Bullet. CUDA has also been used to accelerate non-graphical applications in computational biology, cryptography and other fields by an order of magnitude or more; an example of this is the BOINC distributed computing client. CUDA provides both a low level API and a higher level API. The initial CUDA SDK was made public on 15 February 2007. NVIDIA has released versions of the CUDA API for Microsoft Windows and Linux. Mac OS X was added as a fully supported platform in version 2.0, which supersedes the beta released on 14 February 2008. CUDA has several advantages over traditional general purpose computation on GPUs (GPGPU) using graphics APIs:
- Scattered reads: code can read from arbitrary addresses in memory.
- Shared memory: CUDA exposes a fast shared memory region (16 KB in size) that can be shared amongst threads. This can be used as a user-managed cache, enabling higher bandwidth than is possible using texture lookups.
- Faster downloads and readbacks to and from the GPU.
- Full support for integer and bitwise operations, including integer texture lookups.

Limitations
- CUDA uses a recursion-free, function-pointer-free subset of the C language, plus some simple extensions. However, a single process must run spread across multiple disjoint memory spaces, unlike other C language runtime environments.
- Texture rendering is not supported.
- For double precision there are no deviations from the IEEE 754 standard. In single precision, denormals and signalling NaNs are not supported; only two IEEE rounding modes are supported (chop and round-to-nearest-even), and those are specified on a per-instruction basis rather than in a control word (whether this is a limitation is arguable); and the precision of division/square root is slightly lower than single precision.
- The bus bandwidth and latency between the CPU and the GPU may be a bottleneck.
- Threads should run in groups of at least 32 for best performance, with the total number of threads numbering in the thousands. Branches in the program code do not impact performance significantly, provided that each group of 32 threads takes the same execution path; the SIMD execution model becomes a significant limitation for any inherently divergent task (e.g., traversing a ray tracing acceleration data structure).
- CUDA-enabled GPUs are only available from NVIDIA (GeForce 8 series and above, Quadro and Tesla).


Figure 4. CUDA processing flow [5]

V. STUDY CASE

The tests were conducted using an AMD Athlon 300+, 2.2 GHz, and an NVIDIA GeForce GT240. As can be seen, a comparison has been made between a GPU and a CPU implementation. A peak throughput rate of 8.28 Gbit/s is achieved with an input size of 8 MB; in that case the GPU is 19.60 times faster than the CPU. For an input file of 2 KB the GPU time was approximately 0.26 ms, while the CPU time was approximately 1 ms. For a file of 512 KB the GPU time was approximately 2.7 ms versus 9 ms for the CPU; for files of 1 MB the GPU time was 3.3 ms versus 19 ms for the CPU; for files of 4 MB the GPU time was 13.9 ms and the CPU time 74 ms; and for files of 8 MB the GPU time was 27.4 ms and the CPU time was 150 ms.

VI. CONCLUSIONS

- Encryption on the GPU can improve the speed for files of 8 MB and larger by up to 20 times.
- Current GPUs can provide good parallel processing for intensely threaded programs, thus relieving the CPU.
- Current unified architecture solutions can outperform CPUs in massively parallel programming.
There are still efforts to introduce software that relies solely on the GPU, but the effort is greatly constrained because the GPU cannot act alone and retrieve data without CPU aid; this increases the processing time in some cases by up to 50%, so small programs do not actually benefit from running on the GPU because of the data transfer time. If the data could be retrieved bypassing the CPU, the processing time could be decreased significantly.

REFERENCES

[1] OpenGL Architecture Review Board, M. Woo, J. Neider, T. Davis, D. Shreiner, "The OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 2", 5th edition, ISBN 0321335732, Addison-Wesley, New York, 2005.
[2] General Purpose Computation Using Graphics Hardware, http://www.gpgpu.org.
[3] NVIDIA CUDA, http://developer.NVidia.com/object/CUDA.html.
[4] http://blogs.nvidia.com/2009/12/whats-the-difference-between-a-cpu-and-a-gpu/
[5] http://en.wikipedia.org/wiki/CUDA


VHDL implementation of an error detection and correction module based on Hamming code MANG Ioan 1, MANG Erica2, POPESCU Constantin 3 University of Oradea, Romania, Department of Computer Science, Faculty of Electrical Engineering and Information Technology, 1, Universităţii St., 410087 Oradea, Romania, E-Mail1: [email protected], E-Mail2: [email protected], E-Mail3: [email protected]

Abstract – Error detection and correction (ECC) is found in many high-reliability and high-performance applications. Error correction codes (ECCs) permit detection and correction of errors that result from noise or other impairments during transmission from the transmitter to the receiver. ECC has become an important feature for many communication applications; it is more efficient in performance and cost to correct an error rather than retransmit the data. This paper describes the implementation of an Error Correction Control (ECC) module in a Virtex-4 device. Keywords: Hamming code; error detection; error correction; FPGA circuits.

I. INTRODUCTION

The design described in this paper implements error detection and correction at the speed of the data read/write rate. It detects double bit errors and corrects single bit errors anywhere within the codeword. The design detects and corrects all single bit errors (in a codeword consisting of either 64 data bits and 8 parity bits, or 32 data bits and 7 parity bits), and it detects double bit errors in the data. The design utilizes Hamming code, a relatively simple yet powerful ECC method, and as a result offers exceptional performance and resource utilization. Hamming code involves transmitting data with multiple check bits (parity) and decoding the associated check bits when receiving data to detect errors. The check bits are parallel parity bits generated by XORing certain bits in the original data word. If bit errors are introduced in the codeword, several check bits show parity errors after decoding the retrieved codeword. The combination of these check bit errors displays the nature of the error, and the position of any single bit error is identified from the check bits. The Hamming codeword is a concatenation of the original data and the check bits (parity). The data to be transmitted consists of a certain number of information

bits u, to which a number of check bits p is added such that, if a block is received that has at most one bit in error, then p identifies the bit that is in error (which may be one of the check bits). Specifically, in Hamming's code p is interpreted as an integer which is 0 if no error occurred, and otherwise is the 1-origined index of the bit that is in error. Let k be the number of information bits, and m the number of check bits used [2]. Because the m check bits must check themselves as well as the information bits, the value of p, interpreted as an integer, must range from 0 to m + k, which is m + k + 1 distinct values. Because m bits can distinguish 2^m cases, we must have:

2^m ≥ m + k + 1                  (1)

where: k - number of "information" or "message" bits; m - number of parity-check bits ("check bits," for short). The value of m can be determined by substituting the value of k (the original length of the data to be transmitted). The check bits will be interspersed among the information bits. Because p indexes the bit (if any) that is in error, the least significant bit of p must be 1 if the erroneous bit is in an odd position, and 0 if it is in an even position or if there is no error. A simple way to achieve this is to let the least significant bit of p, p0, be an even parity check on the odd positions of the block, and to put p0 in an odd position. The receiver then checks the parity of the odd positions (including that of p0). If the result is 1, an error has occurred in an odd position, and if the result is 0, either no error occurred or an error occurred in an even position [5]. This satisfies the condition that p should be the index of the erroneous bit, or be 0 if no error occurred. Similarly, let the next from least significant bit of p, p1, be an even parity check of positions 2, 3, 6, 7, 10, 11, ... (in binary, 10, 11, 110, 111, 1010, 1011, ...), and put p1 in one of these positions. Those positions have a 1 in their second from least significant binary position number. The receiver checks the parity of these positions (including the position of p1). If the result is 1, an error occurred in one of those positions, and if the


result is 0, either no error occurred or an error occurred in some other position. Continuing, the third from least significant check bit, p2, is made an even parity check on those positions that have a 1 in their third from least significant position number, namely positions 4, 5, 6, 7, 12, 13, 14, 15, 20, ..., and p2 is put in one of those positions. Putting the check bits in power-of-two positions (1, 2, 4, 8, ...) has the advantage that they are independent. That is, the sender can compute p0 independently of p1, p2, ... and, more generally, it can compute each check bit independently of the others [1].
Hamming codes can be computed in linear algebra terms through matrices because Hamming codes are linear codes. Two Hamming matrices can be defined: the code generator matrix G and the parity-check matrix H. The parity matrix [P] can be expressed as (equation 1):

[P] = [D]•[G]                  (1)

where [D] is the data matrix and [G] is the generator matrix. The [G] matrix consists of an identity matrix [I] and a creation matrix [C] (equation 2):

[G] = [I : C]                  (2)

To detect errors, the codeword vector is multiplied with the transpose of the generator matrix to produce an 8-bit vector [S], known as the syndrome vector (equation 3):

[S] = [D,P]•[G′]                  (3)

For example, the generator matrix of the (7,4) Hamming code is shown in equation (4):

    | 1 0 0 0 1 1 0 |
G = | 0 1 0 0 1 0 1 |        (4)
    | 0 0 1 0 0 1 1 |
    | 0 0 0 1 1 1 1 |

If all of the elements of the syndrome vector are zeros, no error is reported. Any other, non-zero, result represents the bit error type and provides the location of any single bit error; it is then used to correct the original incoming data [4].

II. PARITY ENCODER AND PARITY DECODER

A. The Encoder

Encoding is performed by multiplying the original message vector by the generator matrix. The encoder consists of XORs and a bit-error generator implemented in look-up tables (LUTs); it is computationally inexpensive. Figure 1 shows the block diagram of the parity encoder. This design uses the (72,64) Hamming code. This means that the Hamming codeword width is 72 bits, comprised of 64 data bits and eight check bits. The check bits are written in the memory along with the associated 64-bit data. During a memory read, the data and the check bits are read simultaneously. Any error introduced during the read/write access between the FPGA and the memory is detected. The parity bits are generated based on an unmodified Hamming code.

Figure 1. Parity encoder block diagram

B. The Decoder

Decoding is performed by multiplying the codeword vector by the parity check matrix. The Hamming decoder module generates a syndrome vector to determine if there is any error in the received codeword. The ECC detects single-bit and double-bit errors, but only single-bit errors are corrected. The decoder unit shown in Figure 2 consists of three blocks: the syndrome generation block, the syndrome LUT and mask generation block, and the data correction block [4].

Figure 2. ECC functional block diagram

In Figure 1 and Figure 2 the input/output signal descriptions are:
PARITY_OUT - parity bits generated by the encoder based on the data (ENCIN) registered at the same clock edge;
PARITY_IN - parity bits associated with the incoming data (DECIN) registered at the same rising clock edge;
ENCIN - original data input to the encoder;
ENCOUT - registered original data through the encoder;
DECIN - incoming data to the decoder;
DECOUT - corrected data from DECIN;
FORCE_ERROR - introduces bit errors in the encoded dataword for test purposes (00 - normal operation, 01 - inject single bit error, 10 - inject double bit error, 11 - inject triple bit error).
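The matrix view of equations (1)-(4) can be exercised on the small (7,4) code; G and H below follow the systematic [I : C] form, and the decode helper is an illustrative sketch, not the VHDL design itself:

```python
# Generator matrix [G] = [I : C] for the (7,4) Hamming code, as in eq. (4)
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
# Matching parity-check matrix H = [C^T : I]
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

def encode(d):
    """Codeword = data row-vector times G, with arithmetic mod 2."""
    return [sum(d[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

def syndrome(c):
    """Syndrome = H times the received word, mod 2; all zeros means no error."""
    return [sum(H[r][j] * c[j] for j in range(7)) % 2 for r in range(3)]

def correct(c):
    """Fix a single-bit error: a nonzero syndrome equals the H column of the bad bit."""
    s = syndrome(c)
    if any(s):
        err = next(j for j in range(7) if [H[r][j] for r in range(3)] == s)
        c = c[:]
        c[err] ^= 1
    return c

cw = encode([1, 0, 1, 1])
assert syndrome(cw) == [0, 0, 0]      # valid codewords have a zero syndrome
for pos in range(7):                  # every single-bit error is located and fixed
    bad = cw[:]
    bad[pos] ^= 1
    assert correct(bad) == cw
```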


The incoming 64-bit data along with the 8-bit parity are XOR'd together to generate the 8-bit syndrome (S1 through S8). This is very similar to check-bit generation; for example (equation 6):

S1 = DECIN0 ⊕ DECIN1 ⊕ DECIN3 ⊕ DECIN4 ⊕ DECIN6 ⊕ DECIN8 ⊕ DECIN10 ⊕ DECIN11 ⊕ DECIN13 ⊕ DECIN15 ⊕ DECIN17 ⊕ DECIN19 ⊕ DECIN21 ⊕ DECIN23 ⊕ DECIN25 ⊕ DECIN26 ⊕ DECIN28 ⊕ DECIN30 ⊕ DECIN32 ⊕ DECIN34 ⊕ DECIN36 ⊕ DECIN38 ⊕ DECIN40 ⊕ DECIN42 ⊕ DECIN44 ⊕ DECIN46 ⊕ DECIN48 ⊕ DECIN50 ⊕ DECIN52 ⊕ DECIN54 ⊕ DECIN56 ⊕ DECIN57 ⊕ DECIN59 ⊕ DECIN61 ⊕ DECIN63 ⊕ PARITY_IN(1)    (6)

The next stage uses the syndrome to look up the error type and the error location; an optional pipeline stage can be added here to improve performance further. To correct a single-bit error, a 64-bit correction mask is created, each bit of which is generated from the syndrome of the previous stage. When no error is detected, all bits of the mask are zero. When a single-bit error is detected, the mask is zero everywhere except at the error bit. The data correction stage then XORs the mask with the original incoming data, so the error bit is flipped back (corrected) to its proper state. If a double-bit error is detected, all mask bits are again zero, and the incoming data passes through the ECC unit unchanged. The error type and the corresponding correction mask are created during the same clock cycle.

To display the error type, the reference design also supports a diagnostic mode, reported on the ERROR port:
ERROR = 00: no single-bit, double-bit, or higher-multiplicity error is detected; the examined data has no parity error.
ERROR = 01: a single-bit error occurred within the 72-bit codeword; the error is corrected and the output data is error free.
ERROR = 10: a double-bit error occurred within the codeword; no error correction is possible.
ERROR = 11: errors beyond the detection capability may have occurred within the codeword and no error correction is possible; this is an invalid error type.

A deliberate bit error can be injected into the codeword at the output of the encoder as a way to test the system. FORCE_ERROR provides several error modes:
FORCE_ERROR = 00: normal operation. No bit error is imposed on the output of the encoder.
FORCE_ERROR = 01: single-bit error mode. One bit is inverted (0 to 1 or 1 to 0) in the codeword at every rising edge of the clock. The error position advances from bit 0 of the codeword to bit 71, and the sequence repeats as long as this error mode is active.
FORCE_ERROR = 10: double-bit error mode. Two consecutive bits are inverted in the codeword at every rising edge of the clock. The error pair advances from bits (0, 1) of the codeword to bits (70, 71), and the sequence repeats as long as this error mode is active.
FORCE_ERROR = 11: triple-bit error mode. Three consecutive bits are inverted in a codeword generated at every rising edge of the clock. The error triple advances from bits (0, 1, 2) of the codeword to bits (69, 70, 71), and the sequence repeats as long as this error mode is active.

III. IMPLEMENTATION OF HAMMING CODE USING FPGA CIRCUITS

The Hamming code implementation was done in VHDL [3] using the Xilinx ISE 8.2i package and the XC4VLX25 board [6][7]. We implemented three versions of the Hamming code: for 32-bit and 64-bit wide words, and finally, for the 64-bit word, a version with a pipeline stage. The I/Os of the module are registered. For the encoder, the latency from the time the input data is presented at ENCIN until the encoded data becomes available at ENCOUT is two clock cycles unpipelined, or three clock cycles pipelined. For the decoder, the latency from the time the input data is presented at DECIN until the processed data becomes available at DECOUT is likewise two clock cycles unpipelined, or three clock cycles pipelined. The status signal ERROR is synchronous to DECOUT. The device utilization is shown in figure 3. Analyzing the Map Report file, we can conclude that only 1% of the total CLB circuits have been used.

Figure 3. Device utilization

The next figures demonstrate how single errors appear and are then corrected. First the data vector is set to 0. On the next clock, using the FORCE_ERROR signal, we introduce an error: the COUNT signal differs from the ENCOUNT signal in a single bit (figure 4). On the rising edge of the next clock the error is detected and the ERROR signal changes from 00 to 01. The next clock corrects the error (figure 5).


Error detection and correction is found in many high-reliability and high-performance applications. For example, in enterprise data storage systems, memory caches are utilized to improve system performance. The cache is typically placed inside the controller, between the host interfaces and the disk array. A robust cache memory design often includes ECC functions to avoid single-point-of-failure losses of customer data. ECC is also an important feature for many communication applications, such as satellite receivers, where it is more performance- and cost-efficient to correct an error than to retransmit the data.

Figure 4. Introducing an error

Figure 5. Correcting the error

IV. CONCLUSIONS

The biggest advantage of the Hamming ECC is its simplicity, both in generating the Hamming code and in thereafter recovering the original data word. Moreover, by use of the relatively simple Hamming rule, data words of any length can be encoded along with the corresponding number of parity bits. The reference design utilizes a minimum amount of resources and has high performance. The design was synthesized using the Xilinx Synthesis Tool (XST). The performance summary is based on ISE 10.1 speed specifications and reflects the 64-bit version of the ECC reference design only [7].

REFERENCES

[1] A. Avizienis, J.-C. Laprie, B. Randell, C. Landwehr, "Basic Concepts and Taxonomy of Dependable and Secure Computing", IEEE Transactions on Dependable and Secure Computing, Vol. 1, No. 1, pp. 11-33, January-March 2004.
[2] Jonathan I. Hall, "Notes on Coding Theory", Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA, 2003.
[3] Mang Gerda Erica, "VHDL", University of Oradea Publishing House, ISBN 973-613-485-7, 2003.
[4] Simon Tam, "Single Error Correction and Double Error Detection", Xilinx Application Note, 2006.
[5] Luca Trevisan, "Some Applications of Coding Theory in Computational Complexity", U.C. Berkeley, Computer Science Division, work supported by NSF grant CCR-9984703, 2004.
[6] Xilinx, Virtex-4 Family Overview, http://www.xilinx.com/support/documentation/data_sheets/ds112.pdf, 2010.
[7] Xilinx ISE 10.1 Design Suite Software Manuals and Help, http://www.xilinx.com/support/documentation/dt_ise10-1_userguides.htm, 2008.


Advanced Methods for Testing Security NEAMTU Iosif Mircea “Lucian Blaga” University of Sibiu, Department of Informatics, Faculty of Science I. Ratiu Str. 5-7, Sibiu, Romania, E-Mail: [email protected]

Abstract – Information security technology aims at a comprehensive review of the vulnerabilities and risks assumed by an interconnected society. Information security technologies are aimed not only at preventing disasters; they are also a means of achieving business objectives. Therefore, the first thing that must be taken into account when developing an application is its security policy, because the consequences of computer security incidents can be disastrous. The program presented in this paper is an example of testing software for vulnerabilities using advanced exploitation methods.

Keywords: vulnerability, memory, string, buffer, crash, injecting, shellcode.

I. INTRODUCTION

This article presents this new exploitation technology. To create exploits and understand their action, it is necessary to know several Operating Systems concepts, such as memory organization, the structure of an executable file, and the phases of compilation. These principles are described below. Compared with other types of viruses this technology is very advanced, because it is rather hard for antivirus programs to detect. Many common applications, such as Adobe Reader, Flash Player, Firefox, Internet Explorer, or even media players such as VLC or Winamp, can be exploited just by opening their respective file type. After opening the exploited file the application will crash, but the exploited file can make many changes in the affected computer, such as opening a port which allows the attacker complete access to the infected computer, automatically creating a connection to a specific server, running a specific process, etc. Our application will be built on such a connection to the server.

This type of vulnerability is called a "buffer overflow". Basically, in computer security and programming, a buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory. In these situations, the data is still inserted into memory even if it overwrites data that it shouldn't. When critical data is overwritten, system failure occurs. This fact alone is bad enough if we think of servers that can no longer perform their task. Worse, corrupting some data in a program that runs with super-user privileges can have disastrous consequences. Programming languages commonly associated with buffer overflows include C and C++, which provide no built-in protection against accessing or overwriting data in any part of memory and do not automatically check that data written to an array (the built-in buffer type) is within the boundaries of that array. Bounds checking can prevent buffer overflows.

A. Basic example

In the following example, a program has defined two data items which are adjacent in memory: A (an 8-byte-long string buffer) and B (a two-byte integer). Initially, A contains nothing but zero bytes and B contains the number 1979. Characters are one byte wide.

TABLE 1
variable name | A                        | B
value         | [null string]            | 1979
hex value     | 00 00 00 00 00 00 00 00  | 07 BB

Now, the program attempts to store the null-terminated string "excessive" in the A buffer. By failing to check the length of the string, it overwrites the value of B:

TABLE 2
variable name | A                               | B
value         | 'e' 'x' 'c' 'e' 's' 's' 'i' 'v' | 25856
hex value     | 65 78 63 65 73 73 69 76         | 65 00

Although the programmer did not intend to change B at all, B's value has now been replaced by a number formed from part of the character string. In this example, on a big-endian system that uses ASCII code, "e" followed by a zero byte would become the number 25856. If B was the only other variable data item defined by the program, writing an even longer string that went past the end of B could cause an error such as a segmentation fault, terminating the process.


B. Methodology

Given a "buffer overflow" vulnerability, we try to take control of the computer by injecting shellcode into the buffer. Shellcode is used to directly manipulate registers and the functionality of an exploited program. Shellcode could of course be written in a high-level language, but it might not work in some cases, so assembly language is preferred. We write shellcode because we want the target program to function in a manner other than what was intended by the designer. One way to manipulate the program is to force it to make a system call, or syscall. A system call is how a program requests a service from the operating system's kernel that it does not normally have permission to run. System calls provide the interface between a process and the operating system. Most operations interacting with the system require permissions not available to a user-level process; e.g., I/O performed with a device present on the system, or any form of communication with other processes, requires the use of system calls.

Figure 1.

II. STE DEVELOPMENT - RUBY AND JAVA

A. Introduction

STE (Security Testing and Exploit) is a security research program written in Ruby (a dynamic language that provides great flexibility for extending the framework's functionality). The most important piece of the application architecture is the Ruby Extension Library (Rex), which is designed to have no dependencies other than what comes with the default Ruby install.

B. Application description

The framework is structured into different parts. The low-level area is the framework core, which is responsible for implementing all of the required interfaces that make possible the interaction with exploit modules, sessions, and plugins. This core library is extended by the framework base library, which is designed to provide simpler wrapper routines for dealing with the framework core, as well as utility classes for dealing with different aspects of the framework, such as serializing module state to different output formats. Above these two layers is the UI component, developed in Java, which eases the work with the framework and links the application with the database server, where all the vulnerabilities known at a certain point in time are stored so that they can be tested on the local system and secured.

Module types:
1) ENCODER
2) EXPLOIT
3) NOP
4) AUXILIARY
5) PAYLOAD

Unlike modules, plugins are meant to add features to the application or to change the behavior of existing aspects. Plugins have a very loose definition in terms of the scope in which they can operate. For example, a plugin could add an entirely new module type for use by the application.

Figure 2.

C. Operations management

The operations are handled by the framework managers, which are responsible for some of the basic aspects of the framework, such as module and plugin management.

D. Session management

The session manager is used to track sessions created from within a framework instance as the result of an exploit succeeding.


Figure 3.

The purpose of sessions is to expose features to a programmer that allow them to be interacted with. For instance, a command shell session allows programmers to send commands and read responses to those commands through a well-defined API.

E. Job management

Each framework instance supports running various tasks in the context of worker threads through the concept of jobs.

Figure 4.

F. Utility classes

Some classes in the framework core are intended to make certain tasks simpler without being outside the scope of the core aspects of the framework.

Figure 5.

III. CONCLUSIONS

In the end, every self-respecting company will need this kind of solution to prevent hacking attempts by testing system vulnerabilities. That is why we developed an easy-to-use Java application which meets the user's needs in every aspect. In the near future we want to implement more plugins and modules in our application.

REFERENCES

[1] D. Maynor, "Metasploit Toolkit for Penetration Testing, Exploit Development, and Vulnerability Research", ISBN-10: 1597490741, pp. 127-128, October 2, 2007.


[2] E. Eilam, "Reversing: Secrets of Reverse Engineering", ISBN-13: 978-0764574818, pp. 224-226, April 15, 2005.
[3] J. Seitz, "Gray Hat Python: Python Programming for Hackers and Reverse Engineers", ISBN-10: 1593271921, pp. 116-120, April 30, 2009.
[4] J. C. Foster, M. Price, S. McClure, "Sockets, Shellcode, Porting & Coding: Reverse Engineering Exploits and Tool Coding for Security Professionals", ISBN-13: 978-1597490054, pp. 467-468, April 26, 2005.
[5] J. C. Foster, "Buffer Overflow Attacks: Detect, Exploit, Prevent", ISBN: 1-932266-67-4, pp. 179-182, December 2004.
[6] J. Koziol, "The Shellcoder's Handbook: Discovering and Exploiting Security Holes", ISBN-13: 978-0764544682, pp. 220-224, April 2, 2004.
[7] J. Long, "Penetration Tester's Open Source Toolkit", ISBN-10: 1597490210, pp. 104-116, June 1, 2005.
[8] M. I. Neamtu, "Programare distribuita", Edit. Alma Mater, ISBN 973-632-192-4, Sibiu, 2005.
[9] M. Ligh, S. Adair, B. Harstein, M. Richard, "Malware Analyst's Cookbook and DVD: Tools and Techniques for Fighting Malicious Code", ISBN-13: 978-0-470-61303-0, pp. 544-554, November 2, 2010.
[10] P. Szor, "The Art of Computer Virus Research and Defense", Addison-Wesley Professional, ISBN-13: 978-0-321-30454-4, pp. 345-355, February 13, 2005.
[11] T. Wilhelm, "Professional Penetration Testing: Creating and Operating a Formal Hacking Lab", ISBN-10: 1597494259, pp. 404-405, August 28, 2009.


Software Tools to Detect Suspicious Files NEAMTU Iosif Mircea “Lucian Blaga” University of Sibiu, Department of Informatics, Faculty of Science I. Ratiu Str. 5-7, Sibiu, Romania, E-Mail: [email protected]

Abstract – Nowadays, in the technological era, when the total amount of information is growing rapidly and the Internet has become something common and insecure, most users on the web easily fall prey to viruses from accessing various links received from strangers or downloading unknown software from the Internet. Therefore we have developed a method to analyze the unknown applications which users run and install on their computers, and to see the changes made to the computer.

Keywords: vulnerability, memory, string, injecting, shellcode, Malware Analyzer, virtual machines.

I. INTRODUCTION

Today's antiviruses, in their attempt to detect viruses that are not in their signature databases, use a heuristic scanning method that analyzes the system functions called by the running application, functions used by most viruses. If an application uses many API call functions that are also used by viruses, then the antivirus marks the application as suspect. That is why we have implemented this method, similar to the methods used in the research departments of various antivirus companies, which acquire the suspicious files submitted by users for analysis and conclude whether the application is a virus.

A. Description

Malware Analyzer was programmed in Python. Python is an interpreted, general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python aims to combine "remarkable power with very clear syntax", and its standard library is large and comprehensive. Its use of indentation for block delimiters is unique among popular programming languages. Python supports multiple programming paradigms, primarily but not limited to object-oriented, imperative and, to a lesser extent, functional programming styles. It features a fully dynamic type system and automatic memory management, similar to that of Scheme, Ruby, Perl, and Tcl. Like other dynamic languages, Python is often used as a scripting language, but it is also used in a wide range of non-scripting contexts. Python is a multi-paradigm programming language: rather than forcing programmers to adopt a particular style of programming, it permits several styles. Object-oriented programming and structured programming are fully supported, and there are a number of language features which support functional programming and aspect-oriented programming (including metaprogramming and magic methods). Python uses dynamic typing and a combination of reference counting and a cycle-detecting garbage collector for memory management. An important feature of Python is dynamic name resolution (late binding), which binds method and variable names during program execution. Python can also be used as an extension language for existing modules and applications that need a programmable interface. Basically, Python is a programming language that lets you work more quickly and integrate your systems more effectively. Python runs on Windows, Linux/Unix, and Mac OS X, and has been ported to the Java and .NET virtual machines. We will be running the Malware Analyzer with the latest version of Immunity Debugger by typing "!MalwareAnalyzer" in the command line. After that, a menu will be displayed containing all the available commands:

Figure 1.


After choosing one of the available commands we will attach to the wanted process. Immunity Debugger is a powerful way to write exploits and reverse engineer binary files. It builds on a solid user interface with function graphing, the industry's first heap analysis tool built specifically for heap creation, and a large and well-supported Python API for easy extensibility. Immunity Debugger is a debugger with functionality designed specifically for the security industry. It has a very simple and understandable interface, and a robust and powerful scripting language for automating intelligent debugging.

Trojan horses may allow a hacker remote access to a target computer system. Once a Trojan horse has been installed on a target computer system, a hacker may have access to the computer remotely and perform various operations, limited by user privileges on the target computer system and the design of the Trojan horse. Trojan horses require interaction with a hacker to fulfil their purpose, though the hacker need not be the individual responsible for distributing the Trojan horse. It is possible for individual hackers to scan computers on a network using a port scanner in the hope of finding one with a malicious Trojan horse installed, which the hacker can then use to control the target computer. In our case Poison Ivy is a remote administration tool (a RAT), which is used to remotely connect and manage a single or multiple computers with a variety of software tools such as shell control (from command prompt), computer control (power off/on/log off if remote feature is supported), registry management (query/add/delete/modify) or file management (download/upload/execute/etc.). Its primary function is for one computer operator to gain access to remote PCs. One computer will run the "client" software application, while the other computer(s) operate as the "host(s)".

Figure 2.

For our demonstration we will analyze a known Trojan, Poison Ivy. A Trojan is software that appears to perform a desirable function for the user prior to run or install, but (perhaps in addition to the expected function) steals information or harms the system. The term is derived from the Trojan horse story in Greek mythology. Unlike viruses, Trojan horses do not replicate themselves but they can be just as destructive. One of the most insidious types of Trojan horse is a program that claims to rid a computer of viruses but instead introduces viruses onto the computer.

Figure 3.

Since running viruses on a computer isn’t advised, we will be running Poison Ivy in an isolated environment such as Sandboxie. Sandboxie is a proprietary sandbox-based isolation program developed for 32- and 64-bit Windows NT-based operating systems. It creates a sandbox-like isolated operating environment in which applications can be run or installed without permanently modifying the local or mapped drive.


An isolated virtual environment allows controlled testing of untrusted programs and web surfing. Sandboxie runs your programs in an isolated space which prevents them from making permanent changes to other programs and data in your computer. In computer security, a sandbox is a security mechanism for separating running programs. It is often used to execute untested code, or untrusted programs from unverified third-parties, suppliers and untrusted users. The sandbox typically provides a tightly-controlled set of resources for guest programs to run in, such as scratch space on disk and memory. Network access, the ability to inspect the host system or read from input devices are usually disallowed or heavily restricted. In this sense, sandboxes are a specific example of virtualization.

In order to analyze the virus we must enter the "!MalwareAnalyzer -d" command to open all the monitoring windows. If the virus acts suspiciously, the corresponding window will display the called API functions:

Figure 4.

Figure 6.

As you can see, the Malware Analyzer can easily show which API calls the virus is using; in this case the virus is using send/receive and process-injection API calls. By further using the Malware Analyzer we can observe that the virus can do harm to the computer by modifying or deleting computer data, or even keylogging computer information.

Figure 5.

II. SOURCE CODE DESCRIPTION

For interceptions we use a hooking method for each function as shown in the following example:


III. CONCLUSIONS

The next step will be an antivirus program with an engine that can perform this type of analysis. This type of analysis is more efficient because the discovery of viruses can be made without using a signature database, and it does not require an internet connection to update the database.

REFERENCES

[1] D. Maynor, "Metasploit Toolkit for Penetration Testing, Exploit Development, and Vulnerability Research", ISBN-10: 1597490741, pp. 237-238, October 2, 2007.
[2] E. Eilam, "Reversing: Secrets of Reverse Engineering", ISBN-13: 978-0764574818, pp. 114-116, April 15, 2005.
[3] J. C. Foster, M. Price, S. McClure, "Sockets, Shellcode, Porting & Coding: Reverse Engineering Exploits and Tool Coding for Security Professionals", ISBN-13: 978-1597490054, pp. 237-238, April 26, 2005.
[4] J. C. Foster, "Buffer Overflow Attacks: Detect, Exploit, Prevent", ISBN: 1-932266-67-4, pp. 219-222, December 2004.
[5] J. Koziol, "The Shellcoder's Handbook: Discovering and Exploiting Security Holes", ISBN-13: 978-0764544682, pp. 420-421, April 2, 2004.
[6] J. Seitz, "Gray Hat Python: Python Programming for Hackers and Reverse Engineers", ISBN-10: 1593271921, pp. 216-220, April 30, 2009.
[7] M. I. Neamtu, "Programare distribuita", Edit. Alma Mater, ISBN 973-632-192-4, Sibiu, 2005.
[8] M. Ligh, S. Adair, B. Harstein, M. Richard, "Malware Analyst's Cookbook and DVD: Tools and Techniques for Fighting Malicious Code", ISBN-13: 978-0-470-61303-0, pp. 641-655, November 2, 2010.
[9] P. Szor, "The Art of Computer Virus Research and Defense", Addison-Wesley Professional, ISBN-13: 978-0-321-30454-4, pp. 400-403, February 13, 2005.
[10] T. Wilhelm, "Professional Penetration Testing: Creating and Operating a Formal Hacking Lab", ISBN-10: 1597494259, pp. 412-415, August 28, 2009.


Aspects of Cache Memory Simulation using Programs under Windows and UNIX Operating Systems NOVAC Ovidiu1, NOVAC Mihaela2, VARI-KAKAS Stefan3, VLADU Ecaterina3 University of Oradea, Romania, Department of Decorative Arts and Design, Faculty of Visual Arts, 2 Department of Electrical Engineering, Faculty of Electrical Engineering and Information Technology, 3 Department of Computer Science, Faculty of Electrical Engineering and Information Technology, 410087,1 , Universităţii Str., Oradea, Romania, E-Mail: [email protected] 1

Abstract – We have built a simulator named CDLR SPEC 2000 and a simulator named Cache for simulating the operation of a memory hierarchy. The CDLR SPEC 2000 simulator is trace-based, for systems with cache memories. With the CDLR SPEC 2000 program we can study, for a memory hierarchy, the following parameters: mapping function, block size, writing strategy, replacement algorithm, size of cache memory, number of cache sets (for set-associative caches), and number of words in a block. The Cache simulator can also be used for the study of cache memory behaviour. In addition, the CDLR SPEC 2000 program introduces the calculation of the CDLR of a memory hierarchy.

Keywords: cache memories; memory hierarchy; replacement algorithm; writing policy.

I. INTRODUCTION

Memory hierarchy is very important in a computer system. Practically, even if a computer system has a fast and efficient processor, if the memory hierarchy is slow and inefficient, memory access bottlenecks will arise and the overall system performance will be low. For this reason, in any system design adequate attention should be paid to the design of the memory hierarchy subsystems, such as the memory interface, cache, paging mechanism, TLB, and CPU memory-hierarchy support registers. The memory devices of a computer system are organized in a hierarchy in order to achieve a good ratio between performance (i.e. low access time) and cost per bit. Proceeding down the memory hierarchy, reliability improves owing to the storage technology used in the different levels. There is of course a tradeoff between high reliability and performance, which is influenced, besides the construction, by the transfer policy used among levels [1]. The transfer policy is important because it directly affects the reliability of the overall hierarchy. A straightforward possibility is to write through to the most reliable, lowest level every time a data item is transmitted from the CPU. This policy offers good reliability, but bad performance (high overhead for transferring data). At the other extreme, it is possible to write back data to the lower level only when needed (for instance, based on the amount of data in a level). This method yields lower reliability (as data stays longer in a less reliable level), but better performance (less overhead for transferring data). Finally, the third possibility is the delayed write, where data is written from level L to level L+1 after a time denoted delayL. So delayL is the age of the data before it leaves level L and is written to level L+1. We can observe that delayL increases monotonically with L.

II. EXPERIMENTATION ENVIRONMENTS REPRESENTED BY CACHE SIMULATORS

II.1 Construction objectives of cache simulators

II.1.1 Objective of experimental environment Cache

The objective of the experimental environment Cache is simulating the operation of a memory hierarchy. Taking into consideration what is written in the literature of this area [2], [3], [12], we considered that the most important parameters used in the construction of a simulator are: main memory size, number of cache levels and size of each level of cache, cache block size, mapping function, writing policy and replacement algorithm. Another important parameter used in the simulation is the trace file. To realize this simulation we have to configure the memory hierarchy: we determine the main memory size, the number of cache levels, the cache block size, the type of cache mapping function, the writing policy and the replacement algorithm. Depending on the selected number of cache levels we determine the size of each cache level, and we also select the trace file used in the simulation. A program objective is to draw graphics that represent the hit rate and miss rate for each cache level, as well as the global hit rate and miss rate. We used as a model for our simulator the SMP Cache 2.0 simulator [12]. Our Cache simulator has a disadvantage, namely that it does not take into account the performance and reliability indices of caches [8], [9], [11].

56 Volume 4, Number 2, October 2011 __________________________________________________________________________________________________________

II.1.2 Objective of the experimental environment CDLR SPEC 2000

CDLR SPEC 2000 is also a trace-based simulator for cache memory hierarchies. The experimental environment does not have a graphical interface, so the programs are executed in a DOS window. The objective of the experimental environment CDLR SPEC 2000 is the study of cache memory behaviour by simulating the operation of a memory hierarchy and calculating the global CDLR (the CDLR parameter of the memory hierarchy). Unlike the CACHE simulator, the experimental environment CDLR SPEC 2000 takes the performance and reliability indices into consideration by calculating the CDLR parameter. To carry out the simulation, we first set up the memory hierarchy: the mapping function, the block size, the writing policy, the replacement algorithm, the cache size, the number of cache sets (for set-associative caches) and the number of words in a block. Next we specify, for each level of our hierarchy, the following parameters: memory size, MTTF, fetch time, reading time, writing time and miss penalty. After all parameters have been specified, we load a trace file, chosen from the trace files of the SPEC 2000 benchmarks. Our final objective is the calculation of the global CDLR parameter (the CDLR parameter of the memory hierarchy) and of the overall hierarchy's MTTF. Like the SMP Cache 2.0 program, the CDLR SPEC 2000 program works with traces; the difference between them is that the CDLR SPEC 2000 program uses traces of the SPEC 2000 benchmarks. The parameters of the CDLR SPEC 2000 program can be modified by the user [6], [7].

Another parameter to be configured is the number of levels of the memory hierarchy: the hierarchy can have one, two, three or four cache levels. The mapping function is another parameter to be set; it can be direct, set associative or fully associative. The next parameter to be set is the replacement algorithm, for which we can choose between LRU, LFU, FIFO and Random. The writing policy is another parameter to be configured; we can choose either write through or write back. If the chosen number of cache levels is, for example, four, we must press the Cache Setup button.
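For the set-associative case, an LRU replacement policy like the one selectable above can be tracked with a simple per-set age counter. The sketch below is a generic illustration of the technique, with made-up names and set size; it is not code from the simulator.

```c
#include <assert.h>

/* LRU bookkeeping for one cache set of WAYS lines (illustrative sketch). */
#define WAYS 4

typedef struct {
    long tag[WAYS];     /* -1 = empty way */
    unsigned age[WAYS]; /* larger = more recently used */
    unsigned clock;
} Set;

void set_init(Set *s)
{
    int i;
    for (i = 0; i < WAYS; i++) { s->tag[i] = -1; s->age[i] = 0; }
    s->clock = 0;
}

/* Returns 1 on a hit; on a miss, evicts the least recently used way. */
int set_access(Set *s, long tag)
{
    int i, victim = 0;
    s->clock++;
    for (i = 0; i < WAYS; i++) {
        if (s->tag[i] == tag) { s->age[i] = s->clock; return 1; }
        if (s->age[i] < s->age[victim]) victim = i; /* track oldest way */
    }
    s->tag[victim] = tag; /* miss: replace the least recently used way */
    s->age[victim] = s->clock;
    return 0;
}
```

FIFO differs only in that the age is set once at insertion and never refreshed on a hit; Random picks the victim with a pseudo-random index instead.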

Figure 2.

If the Cache Setup button is pressed, a window opens for setting the sizes of the four cache levels, shown in figure 1. The size of each level can be chosen between 8 KB and 32768 KB, noting that the sum of the four cache level sizes must not exceed the size of main memory. After finishing this selection, the Simulation button appears in the main window, as shown in figure 3.

III. CONSTRUCTION DETAILS

III.1 Constructive details of the CACHE simulator

The experimental environment Cache is written in the Microsoft Visual C++ programming environment. The main program interface of the experimental environment Cache is presented in figure 2. This program allows the configuration of the memory hierarchy. The first setting we can make is the size of the main memory, which can range from 256 KB to 1,048,576 KB. The next parameter of the program to be configured is the block size, which can be: 1 B, 2 B, 4 B or 8 B.

Figure 1

Figure 3.

After pressing the Simulare (Simulation) button, another window appears in which we must select one of the available traces; this window is shown in figure 4. In figure 4, trace selection is made by activating the Browse button. After pressing this button, a window named Open opens, where the trace files are listed. In our case, this list includes the following files: CEXP.PRG, COMP.PRG, EAR.PRG, HYDRO.PRG, MDLJD.PRG, NASA7.PRG, SWM.PRG, UCOMP.PRG and WAVE.PRG [BYUC01]. The source codes of the programs Cache.cpp, CacheDlg.cpp, CacheDimensiune.cpp, MemCache.cpp and Simulare.cpp are not presented in this paper.

Journal of Computer Science and Control Systems 57 __________________________________________________________________________________________________________

Figure 4.

III.2 Constructive details of the CDLR SPEC 2000 simulator

CDLR SPEC 2000 is an experimental environment for evaluating the performance/reliability parameters of caches. The experimental environment was initially developed for the Windows operating system. When running it under Windows, a number of issues emerged. One problem is the very large computing time, because the CDLR SPEC 2000 environment uses traces of the SPEC 2000 benchmarks, which have very large sizes: hundreds of MB or even gigabytes. Another problem is that for certain cache configurations the simulation environment blocks and the system must be rebooted. Because of these issues we moved to the UNIX operating system, resulting in the UNIX version of CDLR SPEC 2000 [4], [5]. The environment consists of the following programs: main.c, read.h, run.h, data.h, write.h. We have created input files, and the output files are obtained with the CDLR SPEC 2000 simulator. As an example we have the input file spec1win.in, which describes the structure of a two-level memory hierarchy. In this example, the first four lines set four parameters: the mapping function (associative, type 1), the block size (4), the writing policy (write back) and the replacement algorithm (Random). The next two lines specify that our memory hierarchy has two levels, and give the parameters of each level. The first level has a size of 1024 bytes, MTTF = 10000000 hours, fetch time = 1 ns, reading time = 3 ns, writing time = 5 ns and miss penalty = 10 ns. The second level has a size of 4096 bytes, MTTF = 900000000 hours, fetch time = 10 ns, reading time = 20 ns, writing time = 50 ns and miss penalty = 150 ns. On the last line we load the trace file, and we can also specify the trace archives for which we run the simulation. In our case the loaded trace file is fft1.prg. The output file is spec1win.out. The parameters of the CDLR SPEC 2000 program can be modified by the user.
In a run of the CDLR SPEC 2000 program we must specify both the simulation parameters and the context; these are specified in an input file (a file with the .in extension). First we establish the mapping function: the memory can be direct mapped (-1), fully associative (1) or set associative (0). The next parameter that must be specified is the block size. The third parameter is the writing policy, which may be write back (1) or write through (2). The fourth parameter is the replacement algorithm: LRU (1), LFU (2), Random (3) or FIFO (4). We must also specify, for each level of the memory hierarchy, the following parameters: memory size, MTTF, fetch time, read time, write time and miss penalty. After all the parameters have been specified, we load the trace file, selected from the trace files of the SPEC 2000 benchmark [9]. The structure of this input file is presented below:

- asociativitatea 4
- dimensiune_bloc 16
- politica_scriere back
- algoritm_inlocuire fifo
- nivel 4096 1000000 1 3 5 10
- nivel 16384 90000000 10 20 50 150
- nivel 32768 900000000 20 40 90 300
- sbcrun swim_m2b 1 21

The above example shows the structure of the spec1.in input file. On the first four lines, four parameters are set: the mapping function (asociativitatea), the block size (dimensiune_bloc), the writing policy (politica_scriere) and the replacement algorithm (algoritm_inlocuire). The next three lines specify that our memory hierarchy has three levels, and give the parameters of each level. The first level has a size of 4096 bytes, MTTF = 1000000 hours, fetch time = 1 ns, reading time = 3 ns, writing time = 5 ns and miss penalty = 10 ns. The second level has a size of 16384 bytes, MTTF = 90000000 hours, fetch time = 10 ns, reading time = 20 ns, writing time = 50 ns and miss penalty = 150 ns. The third level has a size of 32768 bytes, MTTF = 900000000 hours, fetch time = 20 ns, reading time = 40 ns, writing time = 90 ns and miss penalty = 300 ns. On the last line we load the trace file, and we can also specify the trace archives for which we run the simulation.
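A parser for the per-level lines of this file format can be written in a few lines of C. The struct and function below are hypothetical illustrations of the described "nivel" line (size, MTTF, fetch, read, write, miss penalty), not the simulator's own source; the parser tolerates the leading "-" shown in the listing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One memory-hierarchy level as described by a "nivel" line of the .in file.
 * Field order follows the paper; names are our own. */
typedef struct {
    long size_bytes, mttf_hours;
    int fetch_ns, read_ns, write_ns, penalty_ns;
} Level;

/* Parses e.g. "- nivel 4096 1000000 1 3 5 10"; returns 1 on success. */
int parse_level(const char *line, Level *lv)
{
    const char *p = strstr(line, "nivel"); /* skip any "- " prefix */
    if (!p)
        return 0;
    return sscanf(p, "nivel %ld %ld %d %d %d %d",
                  &lv->size_bytes, &lv->mttf_hours, &lv->fetch_ns,
                  &lv->read_ns, &lv->write_ns, &lv->penalty_ns) == 6;
}
```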
The program is launched with the command:

main input_file output_file

The simulator loads the data from the input file and, after the computation, produces an output file. For example, if we use the input file presented above, spec1.in, we obtain the output file spec1.out, presented below:

----------------[ LEVEL 1 ]--------------------------
MTTF Level : 1000000
Delay Average Level: 33516
Times Level: 519963
----------------[ LEVEL 2 ]--------------------------
MTTF Level : 90000000
Delay Average Level: 3034236
Times Level: 22642
----------------[ LEVEL 3 ]--------------------------
MTTF Level : 900000000
Delay Average Level: 31605791
Times Level: 1680
MTTF Ierarhie: 987925
CDLR ierarhie: 6.764735
CDLR statisic: 10.349626

The CDLR SPEC 2000 simulator uses the traces of the SPEC 2000 benchmarks, which are very large. Because of the very large size of these traces (e.g. 10 GB), it is necessary to use a specific trace compression algorithm, which must reduce both the trace size and the simulation time. Therefore we have identified two requirements:
- significant reduction of the trace size;
- reduction of the simulation time, to obtain results faster.
The simulator uses the SBC (Stream-Based Compression) algorithm [13]. This algorithm achieves a very good compression ratio and decompression time for both instruction and data address traces, and it can easily be implemented in high-level programming languages. In our case we have implemented it in the C language, and we have combined it with general-purpose compression algorithms to further reduce the size of the traces [13]. The algorithm is based on the use of the following three files: the STF file (Stream Table File), the SBIT file (Stream Based Instruction Trace) and the SBDT file (Stream Based Data Trace) [13].

IV. CONCLUSIONS

We have built the experimental environment Cache, which can be used for the study of cache memory behaviour. Our CACHE experimental environment has a major disadvantage: it does not take into account the performance and reliability indices of the caches, so we have built another, more powerful simulator called CDLR SPEC 2000, which takes these indices into account. CDLR SPEC 2000 is also a trace-based simulation environment for systems with cache memories. It can be installed on personal computers running Windows or Linux operating systems. With this experimental environment we can study the following parameters: mapping function, block size, writing policy, replacement algorithm, cache size, number of cache sets (for set-associative caches) and the number of words in a block. This simulator can be used to study the behavior of cache memories.
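Returning to the spec1.out listing above, the reported hierarchy MTTF (987925 hours) is consistent with treating the levels as a series system of independently, exponentially failing components, whose failure rates add. The sketch below cross-checks this reading; the formula is our assumption, since the paper does not list the simulator's internal computation.

```c
#include <assert.h>
#include <math.h>

/* Series-system MTTF: assuming independent exponential failures, the
 * hierarchy fails when any level fails, so the per-level failure rates
 * (1/MTTF) add, and the hierarchy MTTF is the reciprocal of the sum. */
double hierarchy_mttf(const double *mttf, int levels)
{
    double rate = 0.0;
    int i;
    for (i = 0; i < levels; i++)
        rate += 1.0 / mttf[i]; /* add each level's failure rate */
    return 1.0 / rate;
}
```

With the spec1.in values (1000000, 90000000 and 900000000 hours) this evaluates to approximately 987925 hours, matching the "MTTF Ierarhie" line of spec1.out.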
Also, the CDLR SPEC 2000 program introduces the calculation of the CDLR parameter of a memory hierarchy. The experimental environment does not have a graphical interface; the programs are executed in a DOS window. This environment is designed to work with trace files, but the major difference is that the CDLR SPEC 2000 program uses traces of the SPEC 2000 benchmarks, which are very large: hundreds of MB or even gigabytes [10].

ACKNOWLEDGMENTS

This work was cofinanced from the European Social Fund through Sectoral Operational Programme Human

Resources Development 2007-2013, project number POSDRU/89/1.5/S/56287 „Postdoctoral research programs at the forefront of excellence in Information Society technologies and developing products and innovative processes", partner University of Oradea.

REFERENCES

[1] A. Avizienis, J.-C. Laprie, B. Randell, C. Landwehr, Basic Concepts and Taxonomy of Dependable and Secure Computing, IEEE Transactions on Dependable and Secure Computing, Vol. 1, No. 1, pp. 11-33, January-March 2004.
[2] J. L. Hennessy, D. A. Patterson, Computer Architecture: A Quantitative Approach, 2nd Edition, San Mateo, CA, Morgan-Kaufmann Publishing Co., 1996.
[3] J. L. Hennessy, D. A. Patterson, Computer Architecture: A Quantitative Approach, 3rd Edition, San Mateo, CA, Morgan-Kaufmann Publishing Co., 2003.
[4] Ovidiu Novac, M. Gordan, M. Novac, Data loss rate versus mean time to failure in memory hierarchies, Advances in Systems, Computing Sciences and Software Engineering, Proceedings of CISSE'05, Springer, University of Bridgeport, USA, pp. 305-307, 2005.
[5] Ovidiu Novac, M. Vladutiu, St. Vari-Kakas, M. Novac, M. Gordan, A Comparative Study Regarding a Memory Hierarchy with the CDLR SPEC 2000 Simulator, Innovations and Information Sciences and Engineering, Proceedings of CISSE'06, Springer, University of Bridgeport, USA, pp. 369-372, 2006.
[6] Ovidiu Novac, Şt. Vari-Kakas, O. Poszet, Aspects Regarding the Use of Error Detecting and Error Correcting Codes in Cache Memories, International Conference on Engineering of Modern Electric Systems, EMES'07, University of Oradea, 2007.
[7] Ovidiu Novac, M. Vlăduţiu, Şt. Vari-Kakas, F. I. Hathazi, M. Novac, Aspects Regarding the Use of SEC-DED Codes to the Cache Level of a Memory Hierarchy, AIKED'08, University of Cambridge, 20-22 Feb. 2008.
[8] Ovidiu Novac, M. Novac, Aspects regarding cache memory simulators, International Conference on Renewable Sources and Environmental Electro-Technologies, RSEE 2002, University of Oradea, pp. 199-203, 2002.
[9] Ovidiu Novac, M. Novac, V. Paşca, Theoretical and experimental study regarding the simulation of a memory hierarchy, EPE, Buletinul Institutului Politehnic Iaşi, Tomul LII (LVI), fascicola 5B, Electrotehnică, Energetică, Electronică, pp. 915-920, 2006.
[10] Standard Performance Evaluation Corporation, SPEC CPU 2000 benchmarks, http://www.specbench.org/osg/cpu2000.
[11] T. R. N. Rao, E. Fujiwara, Error-Control Coding for Computer Systems, Prentice Hall International Inc., Englewood Cliffs, New Jersey, USA, 1989.
[12] M. A. Vega-Rodríguez, J. M. Sánchez-Pérez, J. A. Gómez-Pulido, An Educational Tool for Testing Caches on Symmetric Multiprocessors, Microprocessors and Microsystems, Elsevier Science, Vol. 25, No. 4, pp. 187-194, June 2001.
[13] A. Milenkovic, M. Milenkovic, Exploiting Streams in Instruction and Data Address Trace Compression, Electrical and Computer Engineering Department, University of Alabama, 2003.


Reliability Increasing Method Using a SEC-DED Hsiao Code to Cache Memories, Implemented with FPGA Circuits

NOVAC Ovidiu 1, SZTRIK Janos 2, VARI-KAKAS Stefan 3, KIM Che-Soong 4

1 University of Oradea, Romania, Department of Decorative Arts and Design, Faculty of Visual Arts,
3 Department of Electrical Engineering, Faculty of Electrical Engineering and Information Technology, 1, Universităţii Str., 410087 Oradea, Romania, E-Mail: [email protected]

2 University of Debrecen, Hungary, Department of Informatics Systems and Networks, Faculty of Informatics, Egyetem tér 1., 4032 Debrecen, Hungary, E-Mail: [email protected]

4 Sangji University, South Korea, Department of Industrial Engineering, Kangwon 220-7021, Wonju, South Korea, E-Mail: [email protected]

Abstract – In this paper we apply a Hsiao code at the cache level of a memory hierarchy in order to increase the reliability of the memory. We have selected the Hsiao code from the category of SEC-DED (Single Error Correction, Double Error Detection) codes. For the correction of a single-bit error we use a check-bit generator circuit, a syndrome generator and a syndrome decoder. The implementation of the SEC-DED code in the cache is made with Xilinx FPGA circuits.

Keywords: SEC-DED; cache; FPGA circuits; Hsiao code.

I. INTRODUCTION

In the design of computer systems, an issue that raises particular problems is the slow increase of memory speed compared with the increase of processor speed [1], [2]. During execution, the processor spends an increasing fraction of time waiting for data to be brought from main memory. To reduce the gap between processor speed and memory speed, current processors allocate most of their hardware resources to the cache levels. For example, the Intel Itanium 2 processor allocates 86% of its transistors to the L3 cache level [3], [5]. Cache memory is the fastest storage buffer for the central processing unit of a PC. In this paper we use both names: cache and cache memory. When analysing the efficiency of methods for increasing the dependability of a memory hierarchy, the part most critical in terms of reliability is the cache memory. A memory hierarchy is the solution to the programmers' need for a large and fast memory. This hierarchy is organized on several levels, each with less storage capacity, higher speed and higher cost per bit than the previous level. The objective of a memory hierarchy is to obtain a memory system with a cost almost as low as that of the cheapest level and a speed almost as high as that of the fastest level [6]. The memory hierarchy is based on several fundamental properties of information storage technology. Different storage technologies have different access times; faster technologies have a higher cost per bit than slower technologies, but the slower ones have a greater capacity to store information. Figure 1 represents such a memory hierarchy [7]. At the bottom level, the hierarchy has the slowest, cheapest (per bit) and highest-capacity storage. As we move towards the top of the pyramid, we have increasingly faster levels, with greater cost per bit and less storage capacity.

Figure 1. Memory hierarchy


In the level situated at the top of the pyramid (L0), we have a small number of CPU registers, with low access time, because they are accessed by the CPU in one clock cycle. On the next one or two levels there is a medium-size SRAM cache, which can be accessed in a few CPU clock cycles. On the next level there is a DRAM main memory, with large storage capacity; this memory can be accessed in tens or hundreds of clock cycles. Below are local disks, with very large capacities but with the disadvantage of being very slow. On the last level, some systems include an additional level of disks or remote servers, which can be accessed via a network.

A possible solution for increasing the reliability of the cache level is the use of a fault-tolerant approach in the design. Traditionally this is realized by introducing information redundancy based on data coding. The most widely used code for fault-tolerant design is the Hamming code, based on multiple parity bit generation. We present and implement a more efficient design, with a modified version of this code. The resulting design was implemented and tested in an FPGA circuit.

II. APPLYING THE HSIAO CODE TO CACHE MEMORY

In modern computer systems, at the cache level of the memory hierarchy, we can successfully apply error correcting codes. Such codes for the detection and correction of errors are added to memories to obtain better reliability. In high-speed memories, the most used codes are Single-bit Error Correcting and Double-bit Error Detecting (SEC-DED) codes [7]. These codes can be implemented in parallel as linear codes for this type of memory. We have chosen the Hsiao code because of its properties: it is a SEC-DED code preferred in computer engineering due to its favourable recovery capability from multiple errors. The Hsiao code is a modified Hamming code with odd-weight columns, i.e. every column of its check matrix contains an odd number of 1's [4], [9].
In the check matrix of the Hsiao code, another property is that no two columns are the same [8], [9], [10]. For the cache memory we use a (22,16,6) Hsiao code. For this code there are k = 6 control bits, u = 16 useful (data) bits, and the total number of code bits is t = 22. For correcting a single-bit error, the condition 2^k ≥ u + k + 1 must be satisfied. A number of k = 5 control bits would be enough for single error correction, but we use k = 6 control bits in order to also achieve double-bit error detection. The parity check matrix of the Hsiao code is the matrix H presented below:

    H = [ 1 0 0 0 0 0 | 1 1 1 1 0 1 1 0 0 0 1 1 1 1 1 0 ]
        [ 0 1 0 0 0 0 | 0 0 0 1 1 1 1 1 0 0 1 1 1 1 0 1 ]
        [ 0 0 1 0 0 0 | 1 0 0 0 1 0 1 1 1 0 1 1 1 0 1 1 ]   (1)
        [ 0 0 0 1 0 0 | 1 1 0 0 0 1 0 1 1 1 1 1 0 1 1 1 ]
        [ 0 0 0 0 1 0 | 0 1 1 0 0 0 0 0 1 1 1 0 1 1 1 1 ]
        [ 0 0 0 0 0 1 | 0 0 1 1 1 0 0 0 0 1 0 1 1 1 1 1 ]

The first six columns correspond to the control bits c0...c5 and the remaining sixteen columns to the data bits u0...u15. We have generated the Hsiao matrix so that the column vectors corresponding to the useful information bits are all different from one another. A typical codeword of this code has the following form:

u = (c0 c1 c2 c3 c4 c5 u0 u1 u2 u3 u4 u5 u6 u7 u8 u9 u10 u11 u12 u13 u14 u15)

It has parity bits in positions 1 to 6 and data bits in positions 7 to 22. The control bits are calculated with the parity equations (2):

c0 = u0⊕u1⊕u2⊕u3⊕u5⊕u6⊕u10⊕u11⊕u12⊕u13⊕u14
c1 = u3⊕u4⊕u5⊕u6⊕u7⊕u10⊕u11⊕u12⊕u13⊕u15
c2 = u0⊕u4⊕u6⊕u7⊕u8⊕u10⊕u11⊕u12⊕u14⊕u15
c3 = u0⊕u1⊕u5⊕u7⊕u8⊕u9⊕u10⊕u11⊕u13⊕u14⊕u15          (2)
c4 = u1⊕u2⊕u8⊕u9⊕u10⊕u12⊕u13⊕u14⊕u15
c5 = u2⊕u3⊕u4⊕u9⊕u11⊕u12⊕u13⊕u14⊕u15

Decoding of a received vector uses the syndrome equations (3):

s0 = c0⊕u0⊕u1⊕u2⊕u3⊕u5⊕u6⊕u10⊕u11⊕u12⊕u13⊕u14
s1 = c1⊕u3⊕u4⊕u5⊕u6⊕u7⊕u10⊕u11⊕u12⊕u13⊕u15
s2 = c2⊕u0⊕u4⊕u6⊕u7⊕u8⊕u10⊕u11⊕u12⊕u14⊕u15          (3)
s3 = c3⊕u0⊕u1⊕u5⊕u7⊕u8⊕u9⊕u10⊕u11⊕u13⊕u14⊕u15
s4 = c4⊕u1⊕u2⊕u8⊕u9⊕u10⊕u12⊕u13⊕u14⊕u15
s5 = c5⊕u2⊕u3⊕u4⊕u9⊕u11⊕u12⊕u13⊕u14⊕u15

We apply this SEC-DED code to the design of the error control part of a 64K x 16 bit cache memory. When the information is retrieved from the cache, we read the useful data bits (u0 ... u15) together with the control bits (c0 ... c5). We implement equations (2) with XOR gates and regenerate the control bits c0' ... c5' from the data bits read from the cache. For example, to generate the control bit c0', we use equation (4):

c0' = u0⊕u1⊕u2⊕u3⊕u5⊕u6⊕u10⊕u11⊕u12⊕u13⊕u14          (4)

To implement this equation we use 10 two-input XOR gates, arranged on four levels, as presented in figure 2. All the other control bits c1', c2', c3', c4', c5' are generated in the same way.

The regenerated control bits (c0' ... c5') are compared with the control bits read from the cache (c0 ... c5), also with two-input XOR gates, and we obtain the syndrome bits: s0 = c0⊕c0', s1 = c1⊕c1', s2 = c2⊕c2', s3 = c3⊕c3', s4 = c4⊕c4', s5 = c5⊕c5'. We connect one NOT gate on each syndrome line, and we construct the syndrome decoder with 16 six-input AND gates. Each decoder output ui' is the AND of the six syndrome lines, taken directly (sj) where column i of H contains a 1 and negated (¬sj) where it contains a 0. Equations (5) are used to build the syndrome decoder:

u0'  = s0·¬s1·s2·s3·¬s4·¬s5
u1'  = s0·¬s1·¬s2·s3·s4·¬s5
u2'  = s0·¬s1·¬s2·¬s3·s4·s5
u3'  = s0·s1·¬s2·¬s3·¬s4·s5
u4'  = ¬s0·s1·s2·¬s3·¬s4·s5
u5'  = s0·s1·¬s2·s3·¬s4·¬s5
u6'  = s0·s1·s2·¬s3·¬s4·¬s5
u7'  = ¬s0·s1·s2·s3·¬s4·¬s5
u8'  = ¬s0·¬s1·s2·s3·s4·¬s5                              (5)
u9'  = ¬s0·¬s1·¬s2·s3·s4·s5
u10' = s0·s1·s2·s3·s4·¬s5
u11' = s0·s1·s2·s3·¬s4·s5
u12' = s0·s1·s2·¬s3·s4·s5
u13' = s0·s1·¬s2·s3·s4·s5
u14' = s0·¬s1·s2·s3·s4·s5
u15' = ¬s0·s1·s2·s3·s4·s5

In figure 2 we present the error correction scheme, based on the Hsiao (22,16,6) code, used for single error correction; it can be implemented with Xilinx FPGA circuits.

III. IMPLEMENTATION OF THE HSIAO CODE TO THE CACHE MEMORY USING XILINX FPGA CIRCUITS

The design process with Xilinx FPGA circuits is fast and efficient. The internal structure of an FPGA circuit contains a matrix composed of Configurable Logic Blocks (CLB) and Programmable Switch Matrices (PSM), surrounded by I/O pins. The programmable internal structure includes two configurable elements: the Configurable Logic Blocks, with functional elements that implement the designed logical structure, and the Input Output Blocks (IOB), which realise the interface between internal signals and the outside of the circuit through the pins. The logical function realised by a CLB is implemented by a static configuration memory [7].

Figure 2. Design of the Hsiao (22,16,6) code circuit with the XILINX software tool

To correct the data bits we use 16 two-input XOR gates, following equations (6):

u0cor = u0⊕u0'      u8cor  = u8⊕u8'
u1cor = u1⊕u1'      u9cor  = u9⊕u9'
u2cor = u2⊕u2'      u10cor = u10⊕u10'
u3cor = u3⊕u3'      u11cor = u11⊕u11'          (6)
u4cor = u4⊕u4'      u12cor = u12⊕u12'
u5cor = u5⊕u5'      u13cor = u13⊕u13'
u6cor = u6⊕u6'      u14cor = u14⊕u14'
u7cor = u7⊕u7'      u15cor = u15⊕u15'

Figure 3. Simulation with the Logic Simulator of XILINX

Figure 3 shows the simulation that we have made with the Logic Simulator module of the XILINX program.

Figure 4. Implementation of the Hsiao matrix (1) with FPGA XILINX XC4000XL circuits
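The encoding and decoding circuits described above can also be checked in software. The following C sketch derives the check bits from equations (2), forms the syndrome as in equations (3), and corrects a single flipped data bit by matching the syndrome against the data columns of H. The lookup table and function names are our own; only the equations come from the paper.

```c
#include <assert.h>
#include <stdint.h>

/* Hsiao (22,16,6) SEC-DED code in software form. COLS[i] is the
 * parity-check column of data bit u_i, with bit j set iff u_i appears
 * in check equation c_j of equations (2). */
static const uint8_t COLS[16] = {
    0x0D, 0x19, 0x31, 0x23, 0x26, 0x0B, 0x07, 0x0E,
    0x1C, 0x38, 0x1F, 0x2F, 0x37, 0x3B, 0x3D, 0x3E
};

/* Compute the six check bits c0..c5 for a 16-bit data word. */
uint8_t hsiao_checkbits(uint16_t data)
{
    uint8_t c = 0;
    int i;
    for (i = 0; i < 16; i++)
        if (data & (1u << i))
            c ^= COLS[i]; /* XOR in this bit's column */
    return c;
}

/* Recompute the check bits, form the syndrome (equations (3)) and
 * correct a single flipped data bit. Returns the corrected data word. */
uint16_t hsiao_correct(uint16_t data, uint8_t stored_check)
{
    uint8_t syndrome = hsiao_checkbits(data) ^ stored_check;
    int i;
    if (syndrome == 0)
        return data; /* no error detected */
    for (i = 0; i < 16; i++)
        if (syndrome == COLS[i])
            return data ^ (uint16_t)(1u << i); /* single data-bit error */
    /* weight-1 syndrome: check-bit error; even weight: double error (DED) */
    return data;
}
```

Because all sixteen columns are distinct and of odd weight, any single data-bit flip produces a syndrome that matches exactly one column, while any double-bit error yields an even-weight syndrome that matches none, which is precisely the SEC-DED property.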


To carry out the simulation we have introduced: the input signals (u0-u15), the control signals (c0-c5), the output signals (u0cor-u15cor), and the ERROR and DED (double error detected) signals. We have injected errors and checked the behaviour of the design. Figure 4 presents the implementation of the Hsiao matrix (1) with FPGA XILINX XC4000XL circuits. Analyzing the Map Report file, we can conclude that only 44 CLB circuits have been used out of a total of 64, meaning about 69% of the total CLB circuits.

IV. CONCLUSIONS

We have applied this Hsiao code to error detection and correction in the cache, and we have implemented it in a cache memory using programmable Xilinx FPGA circuits. We have determined the overhead due to the additional circuits for error correction. This Hsiao code has the minimum number of 1's in the matrix, which makes the hardware cost and the speed of the encoding/decoding circuit optimal. The Hsiao (22,16,6) code that we have used at the cache level of a memory hierarchy permits single error correction and double error detection. With this implementation we have reduced the size of the syndrome generator and the cost of the error correcting scheme compared to the traditional Hamming code based solution. Another advantage is that if we increase the number of data bits, the proportion of overhead decreases. This solution using a SEC-DED Hsiao code increases reliability through fault tolerance, leading to low cost and small memory chip dimensions, because this method solves the problem of faults by testing and correcting errors inside the chip. The results of this research were supported by Domus Hungarica.

REFERENCES

[1] John L. Hennessy, David A. Patterson, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publishers, Inc., 1990-1996.
[2] John L. Hennessy, David A. Patterson, "Computer Architecture: A Quantitative Approach", 3rd Edition, San Mateo, CA, Morgan-Kaufmann Publishing Co., 2003.
[3] J. Chang, Şt. Rusu, J. Shoemaker, S. Tam, M. Huang, M. Haque, et al., "A 130-nm Triple-Vt 9-MB Third-Level On-Die Cache for the 1.7-GHz Itanium 2 Processor", Journal on Solid State Circuits, vol. 40, no. 1, pp. 195-203, 2006.
[4] T. R. N. Rao, E. Fujiwara, "Error-Control Coding for Computer Systems", Prentice Hall International Inc., Englewood Cliffs, New Jersey, USA, 1989.
[5] L. D. Hung, "Soft Error Tolerant Cache Architectures", PhD Thesis, Department of Information Science and Technology, University of Tokyo, December 2006.
[6] H. R. Zarandi, S. G. Miremadi, "A Highly Fault Detectable Cache Architecture for Dependable Computing", M. Heisel et al. (Eds.), SAFECOMP 2004, LNCS 3219, pp. 45-59, 2004.
[7] Ovidiu Novac, "Cercetări ale eficienţei metodelor de creştere a dependabilităţii la treapta cache a unei ierarhii de memorii" (Research on the efficiency of methods for increasing dependability at the cache level of a memory hierarchy), PhD Thesis, ISBN 978-973-625-593-9, Editura Politehnica, Timişoara, 2008.
[8] P. L. Howard, "The Design Book: Techniques and Solutions for Digital Computer Systems", Prentice-Hall Inc., Englewood Cliffs, N.J., 1990.
[9] A. Avizienis, J.-C. Laprie, B. Randell, C. Landwehr, "Basic Concepts and Taxonomy of Dependable and Secure Computing", IEEE Transactions on Dependable and Secure Computing, Vol. 1, No. 1, pp. 11-33, January-March 2004.
[10] W. Huffman, V. Pless, Fundamentals of Error-Correcting Codes, Cambridge University Press, ISBN 9780521782807, 2003.


A Study on Parallel RKAM for Raster FCNN Simulation SUKUMAR Senthilkumar*, MT PIAH Abd Rahni School of Mathematical Sciences, Universiti Sains Malaysia, 11800 USM Pulau Pinang, Malaysia. E-Mail:[email protected]* [email protected]* [email protected] *Corresponding author

Abstract – This paper addresses raster fuzzy cellular neural network simulation, for any kind and any size of input image, using parallel arithmetic and geometric mean algorithms of type-1 and type-2 (2-parallel 2-processor 2-stage 3-order), by means of an efficient program fragment, in order to exploit their latency properties effectively. Significant simulation outputs are displayed for the methods and paradigm considered.

Keywords: Parallel numerical integration techniques, Fuzzy cellular neural network, Raster scheme, Edge detection, Simulation, Ordinary differential equations.

I. INTRODUCTION

The increasing availability of parallel computers has recently spurred a substantial amount of research concerned with the possibilities of exploiting parallelism in the numerical solution of initial value problems (IVPs) for ordinary differential equations. The reason is that parallel numerical algorithms offer tremendous computational speed and are more versatile in nature. The cellular neural network (CNN) is a simple analog circuit with real-time information processing ability, proposed by Chua [1] and Chua and Yang [2-3]. Zadeh [11] introduced the concept of fuzzy set (FS) theory, and different notions of higher-order FSs have been proposed by various researchers. The fuzzy CNN integrates fuzzy logic into the logical structure of the traditional CNN and maintains local connectedness among cells. The FCNN has fuzzy logic between its template and input and/or output, besides the sum-of-product operation [12, 13]. The FCNN is an extension of the classical CNN from classical to fuzzy sets. This is unlike fuzzy neural networks [14], in which the fuzzy block is placed in series with the CNN block, or the fuzzifier, fuzzy inference engine and defuzzifier are embedded into three different layers. The fuzzy block in the FCNN is placed in parallel with the normal CNN block by introducing fuzzy feedback templates and fuzzy feed

forward templates. Furthermore, the FCNN [15-19] uses a planar structure to embed the fuzzifier, fuzzy inference engine and defuzzifier, for the purpose of easy VLSI implementation. It is understood that cellular neural networks (CNNs) are analog, time-continuous, non-linear dynamical systems and formally belong to the class of recurrent neural networks. Lee and de Gyvez [4] introduced the Euler, improved Euler, predictor-corrector and fourth-order Runge-Kutta algorithms in raster CNN simulation. The first widely used simulation system which allows the simulation of a large class of CNNs, especially suited for image processing applications, was proposed by Chua and Roska [5]; see also Roska et al. [6], Roska [7] and Gonzalez et al. [8]. Runge-Kutta (RK) techniques have become very popular for computational purposes [9]. RK algorithms solve differential equations efficiently and are equivalent to approximating the exact solution by matching the first n terms of the Taylor series expansion. Parallel Runge-Kutta integration formulas have been obtained by Evans and Sanugi [10]. Nishizono and Nishio [26] studied the processing of gray-scale images by fuzzy cellular neural networks. Bader [20, 21] introduced the RK-Butcher algorithm to find truncation error estimates, intrinsic accuracies and early detection of stiffness in coupled differential equations that arise in theoretical chemistry problems. Oliveira [22] introduced the popular RK-Gill algorithm for the evaluation of the effectiveness factor of immobilized enzymes. Senthilkumar and Piah [23] implemented a parallel Runge-Kutta arithmetic mean algorithm in order to obtain the solution of a second-order robot arm system. Using the existing RK-Butcher fifth-order method, the problem of raster CNN simulation has been analyzed by Murugesh and Murugesan [24].
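For reference, the classical sequential fourth-order Runge-Kutta step that these RK variants build on can be sketched in C as follows. This is the textbook RK4, not the parallel arithmetic/geometric mean algorithms of the paper; the names and the test equation are ours.

```c
#include <assert.h>
#include <math.h>

/* Classical fourth-order Runge-Kutta step for a scalar ODE y' = f(t, y),
 * the kind of explicit integrator on which raster CNN/FCNN simulation
 * schemes are built. */
typedef double (*Rhs)(double t, double y);

double rk4_step(Rhs f, double t, double y, double h)
{
    double k1 = f(t, y);
    double k2 = f(t + h / 2, y + h * k1 / 2);
    double k3 = f(t + h / 2, y + h * k2 / 2);
    double k4 = f(t + h, y + h * k3);
    /* weighted average of the four slope samples */
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6;
}

/* Test equation: y' = y with y(0) = 1, whose exact solution is e^t. */
static double expo(double t, double y) { (void)t; return y; }
```

Stepping this over [0, 1] with a small step size reproduces e to high accuracy, reflecting the method's fourth-order convergence; the parallel arithmetic and geometric mean variants rearrange the stage evaluations so they can be computed on separate processors.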
A detailed discussion of the single-layer / raster scheme for edge detection under the cellular neural network paradigm, using a new fourth-order four-stage algorithm, is given by Senthilkumar [27]. In this article, simulation of the raster scheme is employed in the FCNN environment

64 Volume 4, Number 2, October 2011 __________________________________________________________________________________________________________

as an alternative approach, using the parallel RK arithmetic mean algorithm and the parallel RK geometric mean algorithms of type-1 and type-2, respectively. The rest of the paper is organized as follows. Section 2 gives a brief overview of fuzzy cellular neural networks (FCNNs). Section 3 deals with the raster scheme and the edge detection template. Section 4 presents the parallel numerical integration techniques. Section 5 deals with the FCNN raster performance simulation outputs. A brief conclusion and discussion is presented in Section 6.

Fig. 1. Circuit of cell C_ij in the fuzzy CNN.

II. FUZZY CELLULAR NON-LINEAR NETWORK

Fuzzy set theory offers the mathematical power to capture uncertainties in every phase of image processing. Here, fuzzy set theory is integrated into the CNN model to form a novel image processing framework. The FCNN is a powerful tool for image processing problems, as found in the literature, and all structures and algorithms have been closely concerned with realization in state-of-the-art VLSI technology. The design of FCNNs retains local connectedness between neurons (cells) and simple cell structures. The fuzzy cellular neural network (FCNN) is a generalization of cellular neural networks (CNNs) obtained by using fuzzy operations in the synaptic law computation, which allows us to combine the low-level information processing capability of CNNs with the high-level information processing capability of fuzzy systems, such as image understanding.

Definition 1. Let X be a nonempty set. A fuzzy set A in X is characterized by its membership function \mu_A : X \to [0,1], where \mu_A(x) is interpreted as the degree of membership of element x in the fuzzy set A for each x \in X, i.e.

A = \{ (x, \mu_A(x)) : x \in X \}.   (1)

Definition 2. The r-neighbourhood of a cell C_{ij} in an M \times N FCNN is defined by

N_r(i,j) = \{ C_{kl} : \max(|k-i|, |l-j|) \le r,\ 1 \le k \le M,\ 1 \le l \le N \},   (2)

where r denotes a positive integer. N_r(i,j) has a symmetry property in the sense that if C_{ij} \in N_r(k,l) then C_{kl} \in N_r(i,j).

The architecture of the fuzzy CNN is almost identical to that of the CNN. The cell in the i-th row and j-th column is denoted C_{ij}. The circuit of C_{ij} is shown in Fig. 1, where the suffixes u, x and y denote input, state and output, respectively. The state equation of the cell C_{ij} is defined by

C \frac{dv_{xij}(t)}{dt} = -\frac{1}{R_x} v_{xij}(t)
 + \sum_{C(k,l)\in N_r(i,j)} A(i,j;k,l)\, v_{ykl}(t)
 + \sum_{C(k,l)\in N_r(i,j)} B(i,j;k,l)\, v_{ukl}(t)
 + I_{ij}
 + \tilde{F}_{A,\,C_{kl}\in N_r(i,j)}\big( A_f(i,j;k,l)\, v_{ykl} \big)
 + \tilde{F}_{B,\,C_{kl}\in N_r(i,j)}\big( B_f(i,j;k,l)\, v_{ukl} \big).   (3)

The output and input equations are identical to those of the CNN. A_f(i,j;k,l) and B_f(i,j;k,l) are the synaptic weights employed by the fuzzy operations \tilde{F}_A(\cdot) and \tilde{F}_B(\cdot); A_f and B_f are selected according to the needs of the application. \tilde{F}_A(\cdot) and \tilde{F}_B(\cdot) denote the two fuzzy local operators defined on N_r(i,j), which may be any fuzzy logical expression combining fuzzy OR (\vee) and fuzzy AND (\wedge). Hence, equation (3) can be recast as

C \frac{dx_{ij}(t)}{dt} = -\frac{1}{R_x} x_{ij}(t)
 + \sum_{C(k,l)\in N_r(i,j)} A(i,j;k,l)\, y_{kl}(t)
 + \sum_{C(k,l)\in N_r(i,j)} B(i,j;k,l)\, u_{kl}(t)
 + I_{ij}
 + \bigvee_{C(k,l)\in N_r(i,j)} \big( A_{f\max}(i,j;k,l)\, y_{kl}(t) \big)
 + \bigwedge_{C(k,l)\in N_r(i,j)} \big( A_{f\min}(i,j;k,l)\, y_{kl}(t) \big)
 + \bigwedge_{C(k,l)\in N_r(i,j)} \big( B_{f\min}(i,j;k,l)\, u_{kl}(t) \big)
 + \bigvee_{C(k,l)\in N_r(i,j)} \big( B_{f\max}(i,j;k,l)\, u_{kl}(t) \big).   (4)

Journal of Computer Science and Control Systems 65 __________________________________________________________________________________________________________

where A_{f\min}, A_{f\max}, B_{f\min}, B_{f\max} are the feedback MIN, feedback MAX, feedforward MIN and feedforward MAX templates, respectively.

The input equation of C_{ij} is given by

u_{ij} = E_{ij},\quad 1 \le i \le M,\ 1 \le j \le N.   (5)

The output equation of C_{ij} is given by

y_{ij} = f(x_{ij}) = \frac{1}{2}\big( |x_{ij} + 1| - |x_{ij} - 1| \big),\quad 1 \le i \le M,\ 1 \le j \le N.   (6)

The constraints / conditions are given by

A_{f\min}(i,j;k,l) = A_{f\min}(k,l;i,j);\quad A_{f\max}(i,j;k,l) = A_{f\max}(k,l;i,j);
B_{f\min}(i,j;k,l) = B_{f\min}(k,l;i,j);\quad B_{f\max}(i,j;k,l) = B_{f\max}(k,l;i,j);   (7)

|x_{ij}(0)| \le 1;\quad |u_{ij}| \le 1;\quad A(i,j;k,l) = A(k,l;i,j),   (8)

for 1 \le i \le M, 1 \le j \le N, where A_{f\min}(i,j;k,l), A_{f\max}(i,j;k,l), B_{f\min}(i,j;k,l) and B_{f\max}(i,j;k,l) are elements of the fuzzy feedback MIN, fuzzy feedback MAX, fuzzy feedforward MIN and fuzzy feedforward MAX templates, respectively, and A(i,j;k,l) and B(i,j;k,l) are elements of the feedback and feedforward templates. The set of controlling parameters in the FCNN is therefore {A, B, A_{f\min}, A_{f\max}, B_{f\min}, B_{f\max}, C, R, I_{ij} = I}.

III. RASTER FCNN SCHEME

Raster FCNN simulation is an image scanning-processing technique for solving the system of difference equations of the FCNN. Equation (3) is space invariant, which means that A(i,j;k,l) = A(i-k, j-l) and B(i,j;k,l) = B(i-k, j-l) for all i, j, k, l, the same as in the CNN. Therefore, the solution of the system of difference equations can be seen as a convolution process between the image and the FCNN processors. The fundamental approach is to imagine a square subimage area centered at (x,y), with the subimage being the same size as the templates involved in the simulation. The center of this subimage is then moved from pixel to pixel, starting, say, at the top left corner, and the A and B templates are applied at each location (x,y) to solve the differential equation. This procedure is repeated for each time step, for all the pixels in the image; one pass of this image scanning process is referred to as an "iteration". The processing stops when the states of all FCNN processors have converged to steady-state values and the outputs of their neighbor cells are saturated [2,3]. This whole simulation approach is referred to as raster simulation, and it implies that each pixel of the image is mapped onto an FCNN processor. The image processing function in the spatial domain is expressed as

g(x,y) = T(f(x,y)),   (9)

where g(\cdot) is the processed image, f(\cdot) is the input image, and T is an operator on f(\cdot) defined over the neighborhood of (x,y). From the point of view of hardware implementation, this is an exhaustive process: for practical applications on the order of 250,000 pixels, the hardware would require such a large number of processors that its implementation would be unfeasible. An alternative is to multiplex the image processing operator.
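To make the raster scheme concrete, one explicit-Euler time step of the recast state equation (4), applied at every pixel, can be sketched in Python/NumPy as follows. This is an illustrative sketch, not the authors' implementation: it assumes 3x3 templates, zero padding at the image boundary, C = R_x = 1 by default, and the piecewise-linear output function of eq. (6).

```python
import numpy as np

def fcnn_step(x, u, A, B, Af_min, Af_max, Bf_min, Bf_max, I, dt, Rx=1.0, C=1.0):
    """One explicit-Euler step of the FCNN state equation (4).

    x, u : M x N state and input images; all templates are 3 x 3 arrays.
    Boundary cells use zero padding (an assumption of this sketch).
    """
    M, N = x.shape
    y = 0.5 * (np.abs(x + 1) - np.abs(x - 1))   # output equation (6)
    yp = np.pad(y, 1)                           # zero-padded output neighbourhoods
    up = np.pad(u, 1)                           # zero-padded input neighbourhoods
    dx = np.empty_like(x)
    for i in range(M):
        for j in range(N):
            ny = yp[i:i + 3, j:j + 3]           # 3x3 neighbourhood of outputs
            nu = up[i:i + 3, j:j + 3]           # 3x3 neighbourhood of inputs
            dx[i, j] = (-x[i, j] / Rx
                        + np.sum(A * ny) + np.sum(B * nu) + I
                        + np.max(Af_max * ny) + np.min(Af_min * ny)   # fuzzy OR / AND feedback
                        + np.min(Bf_min * nu) + np.max(Bf_max * nu))  # fuzzy AND / OR feedforward
    return x + dt * dx / C
```

Iterating this step over the image until the states settle reproduces one raster "iteration" per pass; the fuzzy terms are simple neighbourhood MIN/MAX reductions, which is what makes the FCNN VLSI-friendly.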

A. Edge Detection Template

From the literature, edge detection is one of the most significant and complicated steps in image processing and in pattern recognition systems [25]. Its value arises from the fact that an edge frequently indicates the physical extent of an object within the image. Edges provide enough information about the image that the image data can be reduced to a size more suitable for image analysis. The performance of the tasks that follow edge detection, such as image segmentation, boundary detection, object recognition and classification, and image registration, depends on the information obtained at the edges. On the other hand, noise is a common problem in the acquisition, transmission and processing of images, and it seriously decreases image quality; it also leads to unexpected outputs when noisy images are processed with classical edge detection operators such as the Roberts, Sobel, Prewitt and LoG operators. An edge appears as high-frequency content (more acute and clear) in space [8]. The general input matrix format for edge detection in the FCNN is

u_{ij} = \begin{pmatrix} x_{i-1,j-1} & x_{i-1,j} & x_{i-1,j+1} \\ x_{i,j-1} & x_{i,j} & x_{i,j+1} \\ x_{i+1,j-1} & x_{i+1,j} & x_{i+1,j+1} \end{pmatrix}

The control template B is given by

B_{ij} = \begin{pmatrix} b_{i-1,j-1} & b_{i-1,j} & b_{i-1,j+1} \\ b_{i,j-1} & b_{i,j} & b_{i,j+1} \\ b_{i+1,j-1} & b_{i+1,j} & b_{i+1,j+1} \end{pmatrix}


The feedback matrix A is given by

A_{ij} = \begin{pmatrix} a_{i-1,j-1} & a_{i-1,j} & a_{i-1,j+1} \\ a_{i,j-1} & a_{i,j} & a_{i,j+1} \\ a_{i+1,j-1} & a_{i+1,j} & a_{i+1,j+1} \end{pmatrix}

and the threshold is set to I = I_{ij}.

IV. PARALLEL NUMERICAL INTEGRATION TECHNIQUES

The FCNN is described by a system of nonlinear differential equations. Hence, it is necessary to discretize the differential equation in order to perform behavioral simulation.

A. Parallel Runge-Kutta 2-stage 3rd-order arithmetic mean algorithm

The parallel Runge-Kutta 2-stage 3rd-order arithmetic mean technique is one of the simplest techniques for solving ordinary differential equations. It is an explicit formula that adapts the Taylor series expansion to obtain the approximation. The formula is

k_1 = h f(x_n, y_n),
k_2 = h f(x_n + h/2,\ y_n + k_1/2) = k_2^*,
k_3 = h f(x_n + h,\ y_n + k_1) = k_3^*.   (10)

Its final integration is a weighted sum of the three calculated derivatives per time step:

y_{n+1} = y_n + \frac{1}{6}\,[\,k_1 + 4k_2 + k_3\,].   (11)

B. Parallel Runge-Kutta 2-stage 3rd-order geometric mean algorithm of type-I

The parallel Runge-Kutta 2-stage 3rd-order geometric mean formula of type-I is

k_1 = h f(x_n, y_n),
k_2 = h f(x_n + 2h/3,\ y_n + 2k_1/3) = k_2^*,
k_3 = h f(x_n + h,\ y_n + k_1) = k_3^*.   (12)

Its final integration is a weighted geometric mean of two calculated derivatives per time step:

y_{n+1} = y_n + k_1^{1/4}\, k_2^{3/4}.   (13)

C. Parallel Runge-Kutta 2-stage 3rd-order geometric mean algorithm of type-II

The parallel Runge-Kutta 2-stage 3rd-order geometric mean formula of type-II is

k_1 = h f(x_n, y_n),
k_3 = h f(x_n + h/6,\ y_n + k_1/6).   (14)

Its final integration is a weighted geometric mean of two calculated derivatives per time step:

y_{n+1} = y_n + k_1^{1/4}\, k_3^{3/4}.   (15)
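Under the reconstruction of eqs. (10)-(15) above, the three one-step integrators can be sketched as follows. This is a hedged sketch: `f` is the right-hand side of a scalar ODE y' = f(x, y), and the geometric-mean steps assume the stage values k_i are positive so that the fractional powers are real.

```python
def rk_am(f, x, y, h):
    """One 2-stage 3rd-order arithmetic-mean step, eqs. (10)-(11)."""
    k1 = h * f(x, y)
    k2 = h * f(x + h / 2, y + k1 / 2)   # k2* evaluated on the second processor
    k3 = h * f(x + h, y + k1)           # k3* evaluated on the first processor
    return y + (k1 + 4 * k2 + k3) / 6

def rk_gm1(f, x, y, h):
    """Geometric-mean type-I step, eqs. (12)-(13); assumes k1, k2 > 0."""
    k1 = h * f(x, y)
    k2 = h * f(x + 2 * h / 3, y + 2 * k1 / 3)
    k3 = h * f(x + h, y + k1)   # evaluated in parallel in (12); unused by update (13)
    return y + k1 ** 0.25 * k2 ** 0.75

def rk_gm2(f, x, y, h):
    """Geometric-mean type-II step, eqs. (14)-(15); assumes k1, k3 > 0."""
    k1 = h * f(x, y)
    k3 = h * f(x + h / 6, y + k1 / 6)
    return y + k1 ** 0.25 * k3 ** 0.75
```

On the test problem y' = y, y(0) = 1, one step with h = 0.1 of all three integrators stays close to exp(0.1), with the arithmetic-mean step giving the smallest error of the three, consistent with the comparisons reported in Section V.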

V. FCNN PERFORMANCE EVALUATION AND COMPARISONS

We present examples in which an averaging template followed by an edge detection template is applied to the original images (see Figs. 2(a), 3(a) and 4(a), respectively). The same procedure was used to obtain the results shown in Figs. 3(b)-3(c) and 4(b)-4(c), respectively. It can be observed from Figs. 2, 3 and 4 that the edges obtained by the parallel arithmetic mean RK algorithm are better than those obtained by the parallel geometric mean RK algorithms of type-1 and type-2, respectively.

Since speed is one of the major concerns in the simulation, determining the maximum step size (Δt) that still yields convergence for a template can help speed up the system; the speed-up is achieved by selecting an appropriate Δt for that particular template. Even though the maximum step size may vary slightly from one image to another, the values in Fig. 5 still serve as good references for three different templates under the three parallel numerical integration techniques. These results were obtained by trial and error over more than 100 simulations on a cameraman figure.
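The trial-and-error search for the maximum convergent step size described above can be mechanized as a bisection, sketched below. `converges` is a hypothetical caller-supplied predicate (not part of the paper) that runs the raster simulation at a given Δt and reports whether the network settled; the sketch assumes convergence is monotone in Δt, as the paper's trial-and-error experiments suggest.

```python
def max_stable_dt(converges, dt_lo=1e-3, dt_hi=8.0, tol=1e-2):
    """Largest step size for which the simulation still converges.

    Bisects on the (assumed monotone) predicate `converges(dt)`;
    returns a dt within `tol` of the stability boundary.
    """
    if not converges(dt_lo):
        raise ValueError("even the smallest step size diverges")
    while dt_hi - dt_lo > tol:
        mid = (dt_lo + dt_hi) / 2
        if converges(mid):
            dt_lo = mid   # still stable: push the lower bound up
        else:
            dt_hi = mid   # unstable: pull the upper bound down
    return dt_lo
```

Compared with the 100+ blind trials mentioned above, bisection needs only about log2((dt_hi - dt_lo) / tol) simulation runs per template.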


Fig. 2. (a) Original cameraman image, (b) after the averaging template, (c) after the averaging and edge detection templates, using the parallel 2-stage, 3rd-order RK-GM (type-1) algorithm.

Fig. 3. (a) Original cameraman image, (b) after the averaging template, (c) after the averaging and edge detection templates, using the parallel 2-stage, 3rd-order RK-GM (type-2) algorithm.

Fig. 4. (a) Original cameraman image, (b) after the averaging template, (c) after the averaging and edge detection templates, using the parallel 2-stage, 3rd-order RK-AM algorithm.

Fig. 5. Maximum step size (Δt) that yields convergence for three different templates (edge detection, averaging, connected component) under the three parallel numerical integration techniques.

Fig. 6 shows the importance of selecting an appropriate time step size (Δt). If the chosen step size is too small, convergence may take many iterations, and hence a long time; on the other hand, if the step size is too large, the simulation might not converge at all, or it might converge to erroneous steady-state values (the latter can be observed in the case of the Euler algorithm). The results in Fig. 6 were obtained by simulating a small 256 x 256-pixel image, using the averaging template on a cameraman figure. For a given step size (Δt), the parallel RK arithmetic mean algorithm takes less simulation time than the other parallel RK numerical integration algorithms, i.e. the parallel RK geometric mean algorithms of type-2 and type-1, respectively.

Fig. 6. Comparison of the three parallel numerical integration techniques using the averaging template (simulation time in seconds versus step size Δt).


VI. CONCLUSION

The 2-parallel, 2-processor, 2-stage, 3rd-order Runge-Kutta arithmetic mean algorithm gives better results than the 2-parallel, 2-processor, 2-stage, 3rd-order Runge-Kutta geometric mean algorithms of type-I and type-II under the fuzzy cellular neural network paradigm. Raster FCNN simulation can be performed for input images of any kind and any size, and it makes it possible to explore the potential behavior of the paradigm.

VII. ACKNOWLEDGEMENT

The first author extends his sincere gratitude to Universiti Sains Malaysia for supporting this work under its post-doctoral fellowship scheme. Much of this work was carried out during his stay at Universiti Sains Malaysia in 2011, and he wishes to acknowledge the university's financial support.

REFERENCES

[1] L. O. Chua, CNN: A Paradigm for Complexity, World Scientific, Singapore, 1998.
[2] L. O. Chua and L. Yang, "Cellular neural networks: theory", IEEE Transactions on Circuits and Systems, vol. 35, pp. 1257-1272, 1988.
[3] L. O. Chua and L. Yang, "Cellular neural networks: applications", IEEE Transactions on Circuits and Systems, vol. 35, pp. 1273-1290, 1988.
[4] C. C. Lee and J. P. de Gyvez, "Single-layer CNN simulator", IEEE International Symposium on Circuits and Systems, vol. 6, pp. 217-220, 1994.
[5] L. O. Chua and T. Roska, "The CNN universal machine part 1: the architecture", Int. Workshop on Cellular Neural Networks and their Applications (CNNA), pp. 1-10, 1992.
[6] T. Roska et al., CNNM Users Guide, Version 5.3x, Budapest, Hungary, 1994.
[7] T. Roska, "CNN software library", Hungarian Academy of Sciences, Analogical and Neural Computing Laboratory, 2000. [Online]. Available: http://lab.analogic.sztaki.hu/Candy/csl.html
[8] R. C. Gonzalez, R. E. Woods and S. L. Eddins, Digital Image Processing using MATLAB, Pearson Education Asia, Upper Saddle River, NJ, 2009.
[9] J. C. Butcher, The Numerical Analysis of Ordinary Differential Equations, John Wiley & Sons, UK, 2003.
[10] D. J. Evans and B. B. Sanugi, "A parallel Runge-Kutta integration method", Parallel Computing, vol. 11, pp. 245-251, 1989.
[11] L. A. Zadeh, "Fuzzy sets", Information and Control, vol. 8, pp. 338-353, 1965.
[12] L. A. Zadeh, K. S. Fu, K. Tanaka and M. Shimura, Fuzzy Sets and their Applications to Cognitive and Decision Processes, Academic Press, London, 1975.
[13] R. R. Yager and L. A. Zadeh (eds.), An Introduction to Fuzzy Logic in Intelligent Systems, Kluwer, Boston, MA, 1992.
[14] A. Kandel, Fuzzy Techniques in Pattern Recognition, Wiley, New York, 1982.
[15] T. Yang, L. B. Yang, C. W. Wu and L. O. Chua, "Fuzzy cellular neural networks: theory", Proceedings of the IEEE International Workshop on Cellular Neural Networks and Applications, pp. 181-186, 1996.
[16] T. Yang and L. B. Yang, "The global stability of fuzzy cellular neural networks", IEEE Transactions on Circuits and Systems I, vol. 43, pp. 880-883, 1996.
[17] T. Yang and L. B. Yang, "Fuzzy cellular neural network: a new paradigm for image processing", International Journal of Circuit Theory and Applications, vol. 25, pp. 469-481, 1997.
[18] T. Yang and L. B. Yang, "Application of fuzzy cellular neural networks to Euclidean distance transformation", IEEE Transactions on Circuits and Systems I, vol. CAS-44, pp. 242-246, 1997.
[19] T. Yang and L. B. Yang, "Application of fuzzy cellular neural network to morphological grey-scale reconstruction", International Journal of Circuit Theory and Applications, vol. 25, pp. 153-165, 1997.
[20] M. Bader, "A comparative study of new truncation error estimates and intrinsic accuracies of some higher order Runge-Kutta algorithms", Computers and Chemistry, vol. 11, pp. 121-124, 1987.
[21] M. Bader, "A new technique for the early detection of stiffness in coupled differential equations and application to standard Runge-Kutta algorithms", Theoretical Chemistry Accounts, vol. 99, pp. 215-219, 1998.
[22] S. C. Oliveira, "Evaluation of effectiveness factor of immobilized enzymes using Runge-Kutta-Gill method: how to solve mathematical undetermination at particle center point?", Bioprocess Engineering, vol. 20, pp. 185-187, 1999.
[23] S. Senthilkumar and A. R. M. Piah, "Solution to a system of second order robot arm by parallel Runge-Kutta arithmetic mean algorithm", InTechOpen, pp. 39-50, 2011.
[24] V. Murugesh and K. Murugesan, "Comparison of numerical integration algorithms in raster CNN simulation", LNCS 3285, pp. 115-122, 2004.
[25] H. Li, X. Liao, C. Li, H. Huang and C. Li, "Edge detection of noisy images based on cellular neural networks", Communications in Nonlinear Science and Numerical Simulation, vol. 16, no. 9, pp. 3746-3759, 2011.
[26] K. Nishizono and Y. Nishio, "Image processing of gray scale images by fuzzy cellular neural network", 2006 RISP International Workshop on Nonlinear Circuits and Signal Processing (NCSP'06), pp. 90-93, 2006.
[27] S. Senthilkumar, New Embedded Runge-Kutta Fourth Order Four Stage Algorithms for Raster and Time-Multiplexing Cellular Neural Networks Simulation, Ph.D. Thesis, Department of Mathematics, National Institute of Technology [REC], Tiruchirappalli, Tamil Nadu, India, 2009.


A CNN Based Algorithm for Medical Images Correlation ŢEPELEA Laviniu, GAVRILUŢ Ioan and GACSÁDI Alexandru University of Oradea, Romania, Department of Electronics and Telecommunications, Faculty of Electrical Engineering and Information Technology, Universităţii Str. 1, 410087, Oradea, Romania, E-Mail: {ltepelea, gavrilut, agacsadi}@uoradea.ro

Abstract – This paper proposes a CNN algorithm for processing medical images for diagnosis. The algorithm is based on image correlation in the CNN manner. Image correlation generally needs a lot of time to produce a result, but in the CNN domain most operations are achieved by parallel processing, so the correlation coefficients between two images can be computed in parallel. Such an algorithm can be used in a medical diagnosis assistance system, to process and analyze computer tomography images, or in a system developed to assist visually impaired people in moving about. An analog chip for CNN computing is very expensive, but FPGA hardware is less expensive and more popular, so the correlation algorithm can now be implemented in the CNN manner on an FPGA.

Keywords: image correlation; medical images; Field Programmable Gate Array (FPGA); cellular neural networks (CNN).

I. INTRODUCTION

Lately, more and more people suffer from cancer due to stress, food, polluted air, electromagnetic radiation, nuclear radiation, etc. Over 8% of women develop breast cancer during their lifetime [1][2]. Medical image processing has been used for the diagnosis and treatment of cancer over the past few decades. This research field became interdisciplinary, attracting expertise from applied mathematics, computer science, engineering, statistics, physics, biology and medicine.

Since the discovery of X-rays, several imaging methods have been developed to visualize the anatomy and tissue morphology of the human body: computer tomography (CT), magnetic resonance imaging (MRI), Doppler ultrasound, and various imaging techniques based on nuclear emission, PET (positron emission tomography) and SPECT (single photon emission computed tomography), for the detection and diagnosis of disease [6].

A technique with a very important role in medical image processing is segmentation. A good segmentation has a great effect on subsequent processing tasks, such as image analysis and extraction.
Other image processing techniques, such as denoising, edge detection and pattern recognition, are used to improve the readability of medical images [3].

The presence of noise in medical images makes the analysis more difficult. A series of algorithms is available to eliminate the noise, some of them quite sophisticated [4][5]. The performance of segmentation techniques is difficult to evaluate, and there is no general segmentation method that produces acceptable results for all types of medical images; each method has its advantages and disadvantages [7]. There is therefore a major need for new mathematical techniques for medical image processing. All of these image processing techniques are very useful for better analysis and diagnosis of diseases, and such methods are already included in sophisticated medical systems that assist the doctor in diagnosis.

To determine the stage of a disease, it is useful to measure precisely the area and volume of the malign region in a radiographic image. In a tomographic analysis, many images are used as slices of a volumetric radiography, and it is very important to compare the malign region across the same slices.

Several CNN algorithms have proved useful for image processing: noise extraction, image segmentation, contour determination [9][10]. For image segmentation, the latest technique uses the active contour method, which is usually classified as energy-based. CNN processing methods in pattern recognition are fast because of parallel processing, but an analog CNN chip is very expensive. Another way to compute in parallel is to use FPGA hardware, which is less expensive and more popular; on an FPGA board a CNN algorithm can be emulated, or the computation can even be done in the FPGA manner [8][11].

This paper proposes an image processing technique to determine whether an object from a small template image is included in another, larger image.

II. IMAGE CORRELATION METHOD

Different metrics can be used to analyze the degree of match between two grayscale images: the Euclidean distance, the Sum of Absolute Differences (SAD) (eq. 1), the Mean Absolute Difference (MAD) (eq. 2) and the Normalized Cross-Correlation (NCC) (eq. 3):

SAD(i,j) = \sum_{p=1}^{P} \sum_{q=1}^{Q} \big| K(p,q) - \Phi(p,q) \big|   (1)

MAD(i,j) = \frac{1}{PQ} \sum_{p=1}^{P} \sum_{q=1}^{Q} \big| K(p,q) - \Phi(p,q) \big|   (2)

where K(p,q) represents the template image or correlation kernel, K: R^2 \to R with K = \{(p,q): p \in [1,P],\ q \in [1,Q]\}, and \Phi(p,q) represents the current image region compared with the template image, \Phi: R^2 \to R with \Phi = \{(p,q): p \in [1,P],\ q \in [1,Q]\}.

Some of these metrics are relatively easy to implement in hardware, but they require high processing time. The first of the above metrics are often preferred over computing the correlation coefficients, mostly because of the reduced computational cost, but sometimes to the detriment of accuracy. CNNs (Cellular Neural Networks) have proved very useful for real-time image processing [12]. For the SAD and MAD metrics, algorithms were developed and implemented using CNNs, either on an analog CNN chip or on an FPGA (Field Programmable Gate Array) as an emulated digital CNN-UM (CNN Universal Machine) [11][13].

An assistance system called the Bionic Eyeglass was developed [14], which uses the CNN implementation of image processing. Compared with a regular camera, the Bi-i is a completely independent system. Its central processing element is a Digital Signal Processor (DSP) produced by Texas Instruments, and the Bi-i V301F version integrates a Xilinx FPGA module used for fast image preprocessing. The visual sensor is a high-resolution CMOS sensor with a 40-MHz pixel rate (grayscale or color). This integrated environment takes into account the recommendations of visually impaired persons and offers the following functions: recognition of clothes based on color and texture detection, recognition of pedestrian crossings, bill recognition, recognition of public transport signs and route numbers, recognition of elevator signs, and recognition of escalator movement directions. The development of the Bionic Eyeglass assistance system considered three types of environments in which assistance is offered: at home, at work, and traveling between the two. One of the most difficult detection tasks is assistance at pedestrian crossings, due to conditions that interfere with their detection, such as partial covering by shadows, overly illuminated areas, and especially the presence of other pedestrians or passing vehicles. Testing also used images containing no crossings but similar textures, images with shadows left by a fence, and images with stone slabs, to observe the algorithm's effectiveness. The algorithms proved effective, although there were some wrong decisions due to disturbance conditions, such as shadows left by a fence, and the authors are not fully satisfied with the result.

The correlation coefficient is a metric that expresses the similarity (the matching) between two images (the template image and a region of the test image). For image correlation, the normalized cross-correlation (NCC) is used on a large scale:

Corr(i,j) = \frac{ \sum_{p=1}^{P} \sum_{q=1}^{Q} \big( K(p,q) - \bar{K} \big)\big( \Phi(p,q) - \bar{\Phi} \big) }{ \sqrt{ \sum_{p=1}^{P} \sum_{q=1}^{Q} \big( K(p,q) - \bar{K} \big)^2 \; \sum_{p=1}^{P} \sum_{q=1}^{Q} \big( \Phi(p,q) - \bar{\Phi} \big)^2 } }   (3)

where K(p,q) represents the template image and \Phi(p,q) represents the current image region compared with the template, with central coordinates (i,j); \bar{K} is the mean intensity of the template image and \bar{\Phi} is the mean intensity of the test image in the region centered at (i,j). The correlation coefficient has the value Corr(i,j) = 1 if the two images are absolutely identical, Corr(i,j) = 0 if they are completely uncorrelated, and Corr(i,j) = -1 if they are completely anti-correlated, for example if one image is the negative of the other. The big image is processed pixel by pixel against the template image to test the degree of matching, as in Fig. 1.

Fig. 1. The image correlation principle: the template image (correlation kernel) K(p,q) is compared with the region \Phi(p,q) of the test image centered at pixel (i,j), producing the correlation image Corr(i,j).
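As a reference point for the CNN formulation that follows, eq. (3) can be evaluated directly (and slowly) as a sliding-window loop. This is an unoptimized sketch in Python/NumPy, computing the correlation map only where the template fits entirely inside the test image:

```python
import numpy as np

def ncc(test, K):
    """Normalized cross-correlation map of eq. (3).

    test : M x N test image; K : P x Q template (correlation kernel).
    Returns an (M-P+1) x (N-Q+1) map; windows with zero variance get 0.
    """
    P, Q = K.shape
    M, N = test.shape
    Kc = K - K.mean()                       # K(p,q) - K_bar
    denomK = np.sqrt(np.sum(Kc ** 2))
    out = np.zeros((M - P + 1, N - Q + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            phi = test[i:i + P, j:j + Q]    # region compared with the template
            pc = phi - phi.mean()           # Phi(p,q) - Phi_bar
            denom = denomK * np.sqrt(np.sum(pc ** 2))
            out[i, j] = np.sum(Kc * pc) / denom if denom > 0 else 0.0
    return out
```

A perfect match yields 1 at the matching position and the negative of the template yields -1 there, exactly the value range discussed above; the nested loops are what the CNN parallelizes.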

III. CNN ALGORITHM USED FOR IMAGE CORRELATION

The computing time for obtaining a correlation coefficient depends on the template image's dimensions, increasing proportionally with them; yet the template image must be large enough to contain relevant information. The computing time for the correlation coefficient between two images can be reduced by using a CNN parallel processing algorithm, and the time will then no longer increase proportionally with the template size. Relation (3) can be rewritten compactly as

Corr(i,j) = \frac{ \sum \big( K(p,q) - \bar{K} \big)\big( \Phi(p,q) - \bar{\Phi} \big) }{ \sqrt{ \sum \big( K(p,q) - \bar{K} \big)^2 \; \sum \big( \Phi(p,q) - \bar{\Phi} \big)^2 } }   (4)

Calculating the correlation coefficient between two images requires several operations: averaging, addition (subtraction) and multiplication, which are achievable in the CNN domain, and two operations that cannot be performed directly in the CNN domain: division and square-root extraction. Given the significance and the value range of the resulting correlation coefficients, there are applications where eq. (4) can be written in the form

Corr^2(i,j) = \frac{ \Big[ \sum \big( K(p,q) - \bar{K} \big)\big( \Phi(p,q) - \bar{\Phi} \big) \Big]^2 }{ \sum \big( K(p,q) - \bar{K} \big)^2 \; \sum \big( \Phi(p,q) - \bar{\Phi} \big)^2 }   (5)

Using equation (5), the square-root operation is eliminated, replaced by a multiplication of two images, an operation that can be done in the CNN domain. In this case, however, high correlation values will also result when the two images are completely anti-correlated; such regions are then considered correlated as well. Initially we used synthetic images for correlation, and afterwards a set of real CT images.
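The caveat in the last paragraph, that eq. (5) cannot distinguish correlation from anti-correlation, is easy to verify numerically. This is an illustrative sketch for a single image pair, not the paper's CNN implementation:

```python
import numpy as np

def corr2(K, phi):
    """Squared correlation coefficient of eq. (5) for one image pair.

    Squaring removes the square root (and the sign), so a perfectly
    anti-correlated region scores as high as a perfect match.
    """
    Kc = K - K.mean()
    pc = phi - phi.mean()
    num = np.sum(Kc * pc) ** 2
    den = np.sum(Kc ** 2) * np.sum(pc ** 2)
    return num / den if den > 0 else 0.0

K = np.array([[0.0, 1.0], [1.0, 0.0]])
print(corr2(K, K))    # identical images: 1.0
print(corr2(K, -K))   # anti-correlated images: also 1.0
```

Both calls return 1.0, which is why a sign check (or the adaptive threshold discussed in Section IV) is needed when anti-correlated matches must be rejected.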


IV. TESTING THE CNN ALGORITHM FOR IMAGE CORRELATION

We used the following simulation environments to test the proposed CNN algorithm for image correlation: the InstantVision Integrated Software Environment [16], CadetWin (CNN Application Development Environment and Toolkit under Windows) [15] and the Matlab Development Environment. This section presents the simulation results of the proposed algorithm, using relation (5) in the computation.

Fig. 2. Detection of a brain malign region: (a1, a2, a3) test images; (b1, b2, b3) template images; (c1, c2, c3) correlation images.

Fig. 3. Detection of a malign region: (a4) test image; (a5) preprocessed image; (b4, b5) template images; (c4, c5) correlation images.

Fig. 4. Detection of a malign region in abdominal slices: (a6, a7, a8, a9) test images; (b6, b7, b8, b9) template images; (c6, c7, c8, c9) correlation images.


Processing the algorithm it could be several regions of interest characterized by correlation coefficients that pass the sensitivity threshold. In these cases the correlation coefficients are calculated for each of these regions by using the equation (5) and we can use an adaptive threshold to determine the biggest correlation coefficient and there is more accurate location of correlation. Using an FPGA board we can reduce the time of computing significantly and can make a few correlations in a second which can be considered a real time processing [8]. It is necessary to be a fast image processing because is need to analyze hundred of slices from a volumetric tomography to get the opportune slice to compare a malign region and to determine the evolution of a tumor in time. V. CONCLUSIONS Medical image processing is very used today to determine the diagnostic of disease by doctor or with help of an assisting system. Such a system extracts the noise from the image, applies a filter and makes segmentation or detects the edges to make more visible the region of interest for diagnostic. There are a lot of images to be fast processed in computer tomography. In this paper the authors presents a set of tests made on real CT images, with the proposed CNN algorithm for computing the correlation coefficients between two images. The image correlation technique is very useful to determine the stages of disease. The system can measure precisely the area and the volume of a malign region by comparing them with the same slice used before, detected by the correlation. Generally the computing time for correlation coefficient calculation is high and dependent on the size of the template image, increasing proportionally with it. But is necessary a large enough template image to contain relevant information. 
When the correlation coefficient is computed with the proposed equation (5), the majority of the operations included in the proposed algorithm can be performed in parallel in the CNN manner, and on an FPGA board all the operations of equation (5) can be implemented. The test results are promising. Problems appear when several regions of interest have coefficients that pass the sensitivity threshold; in these cases an adaptive threshold can be used to select the largest coefficient, which makes the localization of the correlation between the images more accurate. The CNN correlation algorithm proposed in this paper and tested in simulation will be integrated into a complex system to assist medical diagnosis in computed tomography. The algorithm will also be used in a system, based on different types of sensors, to assist visually impaired persons.

ACKNOWLEDGMENT

This work and the publishing of the paper were supported by CNCSIS – UEFISCSU, project number PNII – IDEI 668/2008.

REFERENCES

[1] Fact sheet No. 297: "Cancer", World Health Organization, February 2006, http://www.who.int/mediacentre/factsheets/fs297/en/index.html, retrieved 2009-03-26.
[2] E. Supriyanto et al., "Abnormal Tissue Detection of Breast Ultrasound Image using Combination of Morphological Technique", Proceedings of the 15th WSEAS International Conference on Computers, Corfu Island, Greece, pp. 234-239, July 15-17, 2011.
[3] D. L. Pham, C. Xu, J. L. Prince, "A Survey of Current Methods in Medical Image Segmentation", Annual Review of Biomedical Engineering, Vol. 2, pp. 315-338, 2000.
[4] J. Suri, D. Wilson, S. Laxminarayan, "Handbook of Biomedical Image Analysis. Volume I: Segmentation Models Part A", Kluwer Academic / Plenum Publishers, New York, ISBN 0-306-48550-8, 2005.
[5] S. Satheesh, K. Prasad, "Medical image denoising using adaptive threshold based on contourlet transform", Advanced Computing: An International Journal (ACIJ), Vol. 2, No. 2, pp. 52-58, March 2011.
[6] H. Maras et al., "An overview of medical image processing methods", African Journal of Biotechnology, Vol. 9(24), pp. 3666-3675, 14 June 2010.
[7] B. Alhadidi, M. Zu'bi, H. Suleiman, "Mammogram Breast Cancer Image Detection Using Image Processing Functions", Information Technology Journal, Vol. 6, No. 2, pp. 217-221, ISSN 1812-5638, 2007.
[8] T. Yokota, M. Nagafuchi, Y. Mekada, T. Yoshinaga, K. Ootsu, T. Baba, "A Scalable FPGA-based Custom Computing Machine for a Medical Image Processing", Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'02), 2002.
[9] Z. Shi, L. He, "Application of Neural Networks in Medical Image Processing", Proceedings of the Second International Symposium on Networking and Network Security (ISNNS '10), Jinggangshan, P. R. China, pp. 23-26, ISBN 978-952-5726-09-1, April 2010.
[10] M. Moghaddam, H. Zadeh, "Medical Image Segmentation Using Artificial Neural Networks", in: Artificial Neural Networks – Methodological Advances and Biomedical Applications, K. Suzuki (Ed.), InTech, ISBN 978-953-307-243-2, 2011.
[11] Z. Nagy, Zs. Vörösházi, P. Szolgay, "Emulated Digital CNN-UM Solution of Partial Differential Equations", International Journal of Circuit Theory and Applications, Vol. 34, Issue 4, pp. 445-470, 2006.
[12] T. Roska, L. O. Chua, "The CNN Universal Machine: An Analogic Array Computer", IEEE Transactions on Circuits and Systems, Vol. 40, pp. 163-173, 1993.
[13] Z. Kincses, L. Orzó, Z. Nagy, G. Mező, P. Szolgay, "High-Speed, SAD Based Wavefront Sensor Architecture Implementation on FPGA", Journal of Signal Processing Systems, Springer, New York, 6 May 2010.
[14] T. Roska, D. Balya, A. Lazar, K. Karacs, R. Wagner, "System aspects of a bionic eyeglass", Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS'06), 2006.
[15] *** CadetWin – CNN application development environment and toolkit under Windows, Version 3.0, Analogical and Neural Computing Laboratory, Computer and Automation Institute, Hungarian Academy of Sciences, Budapest, 1999.
[16] *** Bi-i V301F Vision System, InstantVision Integrated Software Environment v3.1, User's Manual, AnaLogic Computers Ltd.

Journal of Computer Science and Control Systems 73 __________________________________________________________________________________________________________

Genetic Algorithm Based High Performance Control for Rigid Robot Manipulators

ZERROUKI Nadjib 1, GOLÉA Noureddine 2 and BENOUDJIT Nabil 1

1 University of Batna, Algeria, Department of Electronics, Faculty of Engineering Sciences, 05000 Batna, Algeria, E-mail: [email protected], E-mail: [email protected]
2 University of Oum El Bouaghi, Algeria, Department of Electronics, Faculty of Sciences and Technology, 04000 Oum El Bouaghi, Algeria, E-mail: [email protected]

Abstract – In this paper we address an evolutionary computed-torque control for rigid robot manipulators. A genetic algorithm approach is proposed for solving the formulated multi-objective problem: the objectives are the minimization of the tracking error of each joint, with constraints imposed on the control input torque. The resulting problem is solved with the NSGA-II algorithm, which provides a large set of nondominated solutions from which the user can select the most suitable one. Finally, simulation results demonstrate the effectiveness of the proposed approach.

Keywords: Robot control, Computed torque, Genetic Algorithms, Multi-Objective optimization.

I. INTRODUCTION

Numerous control approaches in automatic control are based on the resolution of optimization problems. For high-dimensional nonlinear systems, these constrained optimization problems become difficult to solve in a reasonable time. Moreover, most performance criteria (generally in conflict with each other), such as overshoot, response time, accuracy and stability margins, are non-convex, non-differentiable, even non-analytical functions, so their optimization leads to solving a multi-objective problem. Consequently, the use of traditional optimization methods for this task is problematic. A class of approximate, often stochastic methods, the metaheuristics, based on a random exploration of the solution space using probabilistic transition rules (evolutionary algorithms of any sort, i.e. genetic algorithms [1], evolution strategies [2], evolutionary programming [3] or genetic programming [4], particle swarm optimization [5], ant colony optimization [6], ...), is an interesting way of solving these difficult optimization problems, and thus a track for the determination of control laws.

A wide variety of MOEAs is available in the literature for solving multi-objective problems. These techniques range from aggregated forms of single-objective evolutionary algorithms to true multi-objective approaches based on Pareto ranking, such as MOGA [7], NPGA [8], NSGA [9], PAES [10], NSGA-II [11], SPEA [12], SPEA2 [13] and their extensions. Such approaches have been employed in the field of robot control, for example by Ortmann and Weber [14], who used an evolution strategy with a linear combination of weights to optimize the trajectory of a robot arm. Another category of approaches for robot control, applying artificial intelligence methods, has also been successfully tested; it includes both neural networks and fuzzy logic control [15, 16].

Although the control of most current industrial robots is still designed with linear techniques, more advanced methods that take into account the nonlinear character of the articulated structures have been suggested for applications requiring high dynamic performance (speed, accuracy). Most of them can be seen as variants of the computed-torque controller, which achieves both linearization and decoupling by including the dynamic model in the feedback loop.

In this paper, a class of derived computed-torque control is proposed. First, a PD-plus-gravity controller is adopted and its parameters are optimized based on a function of the tracking error. Second, a filtered tracking error replaces the standard tracking error in the control law, which yields a sliding-mode control strategy. In both control strategies, multiple objectives with constraints are considered, which can be formulated as a multi-objective problem. To deal with the above problem, a genetic algorithm approach is proposed. The


objectives are the minimization of the tracking errors of the robot manipulator.

The organization of this paper is as follows. Section 2 defines multi-objective optimization problems. Section 3 formulates the robot arm dynamics. Section 4 discusses the non-linear control of robots. Section 5 presents the proposed algorithm and Section 6 its implementation. Section 7 presents the simulation results and discussion showing the effectiveness of the proposed algorithm. Finally, conclusions are given in Section 8.

II. MULTI-OBJECTIVE OPTIMIZATION PROBLEM

A general multi-objective optimization problem is defined [17] as minimizing (or maximizing)

F(x) = (f_1(x), ..., f_k(x))    (1)

subject to g_i(x) ≤ 0, i = 1, ..., m, h_j(x) = 0, j = 1, ..., p, x ∈ Ω.

A solution of a multi-objective optimization problem minimizes (or maximizes) the components of the vector F(x), where x is an n-dimensional decision variable vector x = (x_1, ..., x_n) from some universe Ω. The constraints g_i(x) ≤ 0 and h_j(x) = 0 must be fulfilled while minimizing (or maximizing) F(x), and Ω contains all possible x that can be used in an evaluation of F(x).

According to this definition, the optimum is no longer a single value, as in single-objective problems, but a set of solutions (decision vectors), called the nondominated or Pareto-optimal solutions, for which no objective function can be improved without degrading at least one of the others.

Later on we adopt the following formulation, equivalent to the preceding one but more often used in current work:

min f_m(x),            m = 1, ..., M;
g_j(x) ≥ 0,            j = 1, ..., K;
h_j(x) = 0,            j = K + 1, ..., L;
x_i^l ≤ x_i ≤ x_i^u,   i = 1, ..., n.    (2)

III. ROBOT ARM DYNAMICS

A robotic arm with two revolute joints (RR kinematics), implemented as shoulder and elbow, is our test bed for modeling and control. The kinematics of this arm is very common in industry: it corresponds to the second and third joints of the first three degrees of freedom of the majority of industrial robots. We assume that the masses m_1 and m_2 of the links are represented by point masses at the ends of the links. The link lengths are l_1 and l_2, respectively, and the variables q_1 and q_2 are the joint angles.

The dynamics of an n-link rigid robot manipulator can be expressed in the Lagrange form [18]. Applying the Euler-Lagrange equation to the robot manipulator with two revolute joints, we obtain the following dynamic equation:

M(q)q̈ + N(q, q̇) + G(q) + H(q̇) = Γ

where q = [q_1 q_2]^T is the joint variable vector, q̇ = [q̇_1 q̇_2]^T the joint velocity, q̈ = [q̈_1 q̈_2]^T the joint acceleration, and Γ = [Γ_1 Γ_2]^T the vector of control input torques. The inertia matrix is given by

M(q) = [ a_1 + a_2 cos(q_2)        a_3 + (a_2/2) cos(q_2) ]
       [ a_3 + (a_2/2) cos(q_2)    a_3                    ]

The Coriolis and centrifugal vector is given by

N(q, q̇) = [ -a_2 sin(q_2) (q̇_1 q̇_2 + q̇_2²/2) ]
           [  a_2 sin(q_2) q̇_1²/2             ]

The gravity vector is given by

G(q) = [ a_4 cos(q_1) + a_5 cos(q_1 + q_2) ]
       [ a_5 cos(q_1 + q_2)                ]

and the effects of Coulomb and viscous friction are given by

H(q̇) = [ V_1 q̇_1 + V_2 sign(q̇_1) ]
        [ V_3 q̇_2 + V_4 sign(q̇_2) ]

where a_1 = l_1²(m_1 + m_2) + l_2² m_2, a_2 = 2 l_1 l_2 m_2, a_3 = l_2² m_2, a_4 = (m_1 + m_2) l_1 g and a_5 = m_2 l_2 g denote inertia and gravity parameters, V_1, V_3 denote viscous friction parameters and V_2, V_4 Coulomb friction parameters.

IV. ROBOTS NON-LINEAR CONTROL

Since the field of non-linear control theory is large, we limit our attention to a method that is quite applicable to robot manipulators. The principal focus of this section is therefore the computed-torque method, which is employed to cope with the difficulty caused by the complexity of the dynamic model of robot manipulators. The mathematical dynamic model that characterizes the behavior of robot manipulators is, in general, composed of nonlinear functions of the state variables (joint positions and velocities) and has strong couplings between joints; it may be considered a nonlinear multivariable system with n degrees of freedom that cannot be evaluated using only a single criterion. To begin our discussion of the non-linear techniques applied to the control of robot manipulators, we return to the robot manipulator modeled above.

A. Control laws

Several control algorithms that incorporate the dynamics have been developed. Many of them are variations of the computed-torque method, which is similar to the feedback linearization employed for the control of non-linear systems [19]. In this section, the PD-plus-gravity controller is introduced first, followed by the PID computed-torque controller.

A.1 PD-Plus-Gravity Controller

A useful controller in the computed-torque family was first proposed by Takegaki and Arimoto [20] and is now called the PD-plus-gravity controller. This scheme requires only knowledge of the gravity term of Lagrange's equation of motion at the desired position. Selecting PD feedback yields

Γ = K_v ė + K_p e + G(q)    (3)

with the tracking error defined as

e = q_d − q    (4)

where q_d is the vector of reference signals. The system control block diagram can be drawn from these equations (Fig. 1).

Fig. 1 PD-plus-gravity controller (block diagram).

Thus, the two objectives to be minimized are

min f_1(x) = e_1^T e_1,  min f_2(x) = e_2^T e_2    (5)

where e_1 is the position-tracking error of the first joint and e_2 that of the second joint. Because each joint actuator has a torque limit, the input torque is bounded by its maximum value Γ_max. Therefore, we can define the multi-objective problem as

min f_1(x) = e_1^T e_1
min f_2(x) = e_2^T e_2
|Γ_1| ≤ Γ_1max
|Γ_2| ≤ Γ_2max    (6)

A.2 PID Computed-Torque Control

From classical control theory we know that, in the presence of unknown disturbances, PD control gives a nonzero steady-state error. Consequently, we extend the PD-plus-gravity controller to a PID computed-torque controller. The outer-loop control is proportional-integral-derivative (PID) feedback:

u = −K_v ė − K_p e − K_i ε,  with ε̇ = e    (7)

which yields the arm control input

Γ = M(q)(q̈_d + K_v ė + K_p e + K_i ε) + N(q, q̇) + G(q)    (8)

with ε(t) the integral of the tracking error e(t). Thus, additional dynamics have been added to the linear outer-loop compensator. A block diagram of the PID computed-torque controller appears in Fig. 2.

Fig. 2 PID computed-torque controller (block diagram).
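As a sanity check on control laws (3) and (8), the sketch below evaluates both torque commands for given gain matrices. The dynamics terms are passed in as precomputed arrays; the gain values and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pd_plus_gravity(e, e_dot, Kp, Kv, G_q):
    """Eq. (3): Γ = Kv·ė + Kp·e + G(q)."""
    return Kv @ e_dot + Kp @ e + G_q

def pid_computed_torque(qdd_d, e, e_dot, e_int, Kp, Kv, Ki, M_q, N_q, G_q):
    """Eq. (8): Γ = M(q)(q̈_d + Kv·ė + Kp·e + Ki·ε) + N(q, q̇) + G(q)."""
    return M_q @ (qdd_d + Kv @ e_dot + Kp @ e + Ki @ e_int) + N_q + G_q

# Illustrative diagonal gains and a small position error for a two-joint arm
Kp, Kv, Ki = np.diag([120.0, 90.0]), np.diag([20.0, 15.0]), np.diag([5.0, 5.0])
e, e_dot, e_int = np.array([0.1, -0.05]), np.zeros(2), np.zeros(2)

# With zero velocity error and zero gravity term, only the P action remains
tau = pd_plus_gravity(e, e_dot, Kp, Kv, np.zeros(2))  # -> [12.0, -4.5]
```

Note that with M(q) = I and vanishing N and G, equation (8) collapses to the same PID outer loop, which is a convenient degenerate case for unit-testing an implementation.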

A.3 PID Computed-Torque Control Using a Sliding Surface

Modifying the tracking error by means of a sliding surface defined as

S = Λe + ė    (9)

where Λ is a positive-definite gain matrix, we obtain the torque control law

Γ = M(q)(q̈_d + K_v S + K_p e + K_i ξ) + N(q, q̇) + G(q)    (10)

with ξ(t) the integral of the sliding surface S(t). The system control block diagram can be drawn from these equations (Fig. 3).

Fig. 3 PID computed-torque controller using a sliding surface (block diagram).

Thus, the two new objectives to be minimized are

min f_3(x) = S_1^T S_1,  min f_4(x) = S_2^T S_2    (11)

where S_1 is the sliding surface of the first joint and S_2 that of the second joint. Therefore, we can define the multi-objective problem as

min f_3(x) = S_1^T S_1
min f_4(x) = S_2^T S_2
|Γ_1| ≤ Γ_1max
|Γ_2| ≤ Γ_2max    (12)

V. NON-DOMINATED SORTING GA (NSGA II)

Deb et al. [21] suggested a nondominated-sorting-based approach, called the non-dominated sorting genetic algorithm II (NSGA II), which alleviates three difficulties: computational complexity, the non-elitist approach, and the need to specify a sharing parameter [11]. NSGA II evolved from its origin, NSGA. As shown in the procedure below, a nondominated sorting approach is used to assign each individual a Pareto rank, and a crowding-distance assignment method is applied for density estimation. In a fitness comparison between two individuals, NSGA II prefers the point with the lower rank value or, if both points belong to the same front, the point located in the region with fewer points. By combining a fast nondominated sorting approach, an elitism scheme and a parameterless sharing method with the original NSGA, NSGA II is claimed to produce a better spread of solutions on a number of test problems.

After calculating the fitness of each individual i, NSGA II adopts a special selection process: between two individuals with nondomination ranks r_i and r_j and crowding distances d_i and d_j, the solution with the lower (better) rank is preferred; otherwise the better one is selected according to the condition

(r_i < r_j) or ((r_i = r_j) and (d_i > d_j))

Procedure: NSGA II
Input: the objectives f_k(v_i) of each chromosome v_i, k = 1, 2, ..., q, ∀i ∈ popSize
Output: fitness value eval(v_i), ∀i ∈ popSize
Begin
  // nondominated-sort (r)
  set P ← {v_i}; rank ← 1;
  while P ≠ ∅ do
    for i = 1 to popSize
      if v_i is a nondominated solution in P then r_i ← rank;
    P ← P \ {v_i | r_i = rank}; rank ← rank + 1;
  // distance-assignment (d)
  d_i ← 0, ∀i ∈ popSize
  for k = 1 to q
    {j} ← sort{P by z_k^i, v_i ∈ P};
    d_k^1 = d_k^popSize ← ∞;
    d_k^j ← (z_k^(j+1) − z_k^(j−1)) / (max_i{z_k^i} − min_i{z_k^i}), j = 2, ..., popSize − 1;
  d_i ← Σ_(k=1..q) d_k^i, ∀i ∈ popSize
Output: eval(v_i) = {r_i, d_i}, ∀i
End
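The ranking and crowding steps of the procedure above can be sketched in Python as follows. This is a straightforward O(n²)-per-front rendering for minimization, not the fast book-keeping variant of the original paper, and the function names are ours:

```python
import numpy as np

def dominates(a, b):
    """Pareto dominance for minimization: a dominates b."""
    return np.all(a <= b) and np.any(a < b)

def nondominated_rank(F):
    """Assign Pareto ranks 1, 2, ... by repeatedly peeling off fronts."""
    n = len(F)
    rank = np.zeros(n, dtype=int)
    remaining = set(range(n))
    r = 1
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(F[j], F[i]) for j in remaining if j != i)]
        for i in front:
            rank[i] = r
        remaining -= set(front)
        r += 1
    return rank

def crowding_distance(F):
    """Crowding distance within one front of objective vectors F (n x q)."""
    n, q = F.shape
    d = np.zeros(n)
    for k in range(q):
        order = np.argsort(F[:, k])
        d[order[0]] = d[order[-1]] = np.inf   # boundary points kept forever
        span = F[order[-1], k] - F[order[0], k]
        if span == 0:
            continue
        for idx in range(1, n - 1):
            d[order[idx]] += (F[order[idx + 1], k] - F[order[idx - 1], k]) / span
    return d
```

Selection then simply prefers lower rank and, within a front, larger crowding distance, exactly as the comparison condition above states.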

VI. IMPLEMENTATION OF NSGA II

This section describes the implementation of the proposed multiobjective genetic algorithm, NSGA-II, for solving the optimization problems defined previously. The decision variables of the problems, which represent the controller parameter vector (k_p1, k_v1, k_p2, k_v2) for the PD-plus-gravity controller and (k_p1, k_v1, k_i1, k_p2, k_v2, k_i2) for the PID computed-torque controller, are encoded in binary within their specified ranges (Table 1).

Table 1: Parameter setting for the optimization algorithm

Decision variable | Lower bound | Upper bound
K_p               | 0.01        | 250
K_v               | 0.01        | 250
K_i               | 0.01        | 250

A. Initial population

The genetic algorithm starts with a group of randomly generated individuals known as the population P(0). Initially the population contains a set of 100 individuals, called chromosomes, which are potential solutions of the problem. Each chromosome consists of a set of genes coded in binary. The mathematical formulas for binary encoding and decoding are as follows.

Encoding: to each real variable x coded on l bits one associates an integer y such that

y = Σ_(i=0..l−1) b_i 2^i    (13)

where b_i is the i-th bit of the binary encoding and l is the length of the chromosome. The encoding formula is

y = (x − x_min) / (x_max − x_min) · y_max    (14)

where x_min and x_max are the minimal and maximal values of x, and y_max is the maximum value representable on l bits.

Decoding:

x = x_min + y (x_max − x_min) / y_max    (15)

If x is encoded on 10 bits, x_min = 0.01 and x_max = 250, then the binary number 0011100111 = 231 is decoded as x = 0.01 + 231 (250 − 0.01) / 1023 ≈ 56.46.

B. Fitness functions

For the multi-objective optimization problems, the algorithms are designed with two objectives, defined by equation (5) for the PD-plus-gravity controller problem and by equation (11) for the PID computed-torque controller problem. In this study, the cost functions to be minimized correspond to the position-tracking errors and the sliding surfaces.

C. Constraints handling

Both control problems have two constraints that must be satisfied during the control period. We therefore implemented a quadratic penalty method: a weighted sum of the squares of the constraint violation values of an individual is added to its objective value to produce the new cost function

min F_m(x) = f_m(x) + F_penalty^(m)(x),  m = 1, 2    (16)

Parameters of the considered two-link manipulator: actual masses m_1 = 15.91 kg, m_2 = 11.36 kg; estimated masses m̂_1 = 15.81 kg, m̂_2 = 11.26 kg; actual lengths l_1 = 0.432 m, l_2 = 0.432 m. The initial conditions are q(0) = [1.5  −0.5]^T rad. The constraints impose the following restrictions on the torque ranges (Table 2).

Table 2: Joint torque limits

Link | Joint range (Nm)
1    | −250 ≤ Γ_1 ≤ 250
2    | −120 ≤ Γ_2 ≤ 120

In the simulation, the parameters of the proposed genetic-algorithm-based approach are given in Table 3.

Table 3: NSGA II parameters

NSGA II parameter     | Value
Population size       | 100
Number of generations | 100
Crossover rate        | 0.7
Mutation rate         | 0.02
Replace proportion    | 0.1
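The binary encoding and decoding of equations (13)-(15), together with the worked example, can be reproduced with a few lines of Python (a sketch; the helper names are ours):

```python
def encode(x, x_min=0.01, x_max=250.0, l=10):
    """Eq. (14): map a real x in [x_min, x_max] to an l-bit integer y."""
    y_max = 2 ** l - 1
    return round((x - x_min) / (x_max - x_min) * y_max)

def decode(y, x_min=0.01, x_max=250.0, l=10):
    """Eq. (15): map an l-bit integer y back to the real range."""
    y_max = 2 ** l - 1
    return x_min + y * (x_max - x_min) / y_max

# The worked example from the text: 10-bit string 0011100111
y = int("0011100111", 2)   # = 231, which is eq. (13) applied to the bit string
x = decode(y)              # 0.01 + 231*(250 - 0.01)/1023 ≈ 56.46
```

Round-tripping (`encode(decode(y)) == y`) is a cheap invariant to assert when wiring this into the chromosome representation.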


Here, the penalty function F_penalty^(m) is defined in its general form by

F_penalty^(m)(x) = c_m(t) Σ_(j=1..L) P(δ_j(x))    (17)

where δ_j(x) measures the degree of violation of the j-th constraint. The coefficients c_m(t) differ between objectives in order to keep F_penalty^(m) of the same order of magnitude as the corresponding objective.

D. Evaluation

After initialization, the next step is to calculate the objective functions and constraint violations for all individuals in the population P(0); if there is any violation, a weighted sum of the squares of the constraint violation values of an individual is added to its objective value to produce the new cost function. After this calculation, the population P(0) is sorted using the nondominated sorting approach to create the Pareto ranks. The following steps are then applied to the population P(0) to create a child population Q(0).

E. Selection

We decided to use binary tournament selection because it is easy to implement, requires very little computing time, and is controlled by only a few parameters. In this approach two individuals are selected at random and only one of them is added to the mating pool, which has the size of the number of parents to be selected. Selection is based on two criteria. First and foremost is the rank, or front, in which the solutions reside: the individual with the lower rank is selected. Second, if the ranks of the two individuals are equal, the crowding distances are compared and the individual with the greater crowding distance is selected. Tournament selection is carried out until the pool is filled.

F. Crossover

Two-point crossover with probability 0.7 is used: two individuals are selected at random from the mating pool, two crossover points are selected at random, and the substrings lying between the two points are swapped to produce the two new offspring.

G. Mutation

This operator is applied to each chromosome resulting from the crossover process. We used a procedure that iterates over all genes: the bit in a gene is flipped from 1 to 0, and vice versa, whenever a generated random number is smaller than the single-bit mutation probability, which is equal to 0.02.

H. Elitism Replacement

Crossover and mutation operators can destroy the best individuals of a generation. Elitism removes the possibility of losing those individuals by copying the best individuals of each generation into the population of the next generation, although this model can accelerate the domination exerted by super individuals over the population [22]. In our case a specified proportion p_e = 0.1 of top individuals is retained, and the remaining N(1 − p_e) individuals are replaced by the top individuals of the new population, where N is the population size.

VII. RESULTS & DISCUSSION

In this section we present simulation results focusing on the computation time and on the position-tracking performance of the two-link robot manipulator. For each controller, two different cases are considered to demonstrate the effectiveness of the proposed algorithm.

A. Computation time

The proposed NSGA II algorithm was coded in Matlab 7.0 and run on an Intel Pentium IV Dual Core CPU at 3.6 GHz with 1 GB of RAM under Windows XP. We ran the proposed genetic-algorithm-based approach with the parameters of Tables 1, 2 and 3: population size 100, crossover rate 0.7, mutation rate 0.02, replace proportion 0.1, and the range [0.01, 250] for each decision variable. We then varied the maximum number of generations from 50 to 200 and ran the algorithm several times. The best results were obtained with 100 generations: fewer than 100 gave results of lower quality, while beyond 100 the algorithm showed no further improvement. We therefore used 100 as the maximum number of generations and exploited the main advantage of the proposed algorithm, namely generating a set of optimal solutions in a single run. Parameter optimization required 43 minutes for the PD-plus-gravity controller and 45 minutes for the PID computed-torque controller.

B. PD-plus-gravity controller

Figures 4 to 7 show the results of the PD-plus-gravity controller. On the one hand, when the constraints are taken into account, we observe rather long response times in the transient regime and a small tracking error, especially at the points where the direction changes. This behavior is due to the nonlinearities of the robot, namely its inertia and the centrifugal and Coriolis forces, which are not taken into account in the control law, and to the offset from the initial position. On the other hand, without taking the constraints into account, the torques are amplified, yet the system behavior remains the same in spite of this increase because the inputs are saturated.
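The three genetic operators described in subsections E-G (binary tournament, two-point crossover at rate 0.7, bit-flip mutation at rate 0.02) can be sketched as follows; chromosomes are bit lists, and the function names are ours:

```python
import random

def binary_tournament(indices, rank, dist):
    """Pick two candidates at random; prefer lower rank, then larger crowding distance."""
    i, j = random.sample(indices, 2)
    if rank[i] != rank[j]:
        return i if rank[i] < rank[j] else j
    return i if dist[i] >= dist[j] else j

def two_point_crossover(p1, p2, pc=0.7):
    """With probability pc, swap the substring between two random cut points."""
    if random.random() >= pc:
        return p1[:], p2[:]                      # no crossover: copy parents
    a, b = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def mutate(chrom, pm=0.02):
    """Flip each bit independently with probability pm."""
    return [1 - bit if random.random() < pm else bit for bit in chrom]
```

Since crossover only exchanges material between parents, the total number of set bits across the two offspring equals that of the two parents, which is a handy invariant for testing an implementation.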

C. PID computed-torque controller using a sliding surface

Analysis of the results of the PID computed-torque controller in sliding mode (shown in Figs. 8, 9, 10 and 11) allows us to judge the benefit of the sliding surface. With constraints imposed on the input torques, we observe a very interesting improvement in rapidity and accuracy compared with the results of the previously applied control law: a very small response time in the transient regime and nearly perfect position tracking in the steady state. This behavior is due to a control law that reacts to filtered errors and takes into account the various nonlinearities of the robot, such as the inertia and the centrifugal and Coriolis forces. When the constraints are not enforced, we obtain high input torques, limited only by the saturation setup; the behavior of the system remains the same as with constraints, but the initial torques display a very fast variation that could destroy the motor. This control law gives better results in terms of position-tracking errors and avoids the initial instability, in spite of the complexity of the reference signal in combination with the offset from the initial position.

The controller parameters obtained and the resulting torques for both control laws are summarized in Table 4 (the within-limit torque values correspond to the constrained runs).

Table 4: Obtained results for non-saturated torques

                                                    K_v      K_p      K_i      Γ_min     Γ_max
PD-plus-gravity controller, with constraints
  q_1                                               116.57   123.91   -        -113.92   247.89
  q_2                                               89.69    71.61    -        -109.04   119.82
PD-plus-gravity controller, without constraints
  q_1                                               243.158  249.267  -        -114.37   468.72
  q_2                                               135.391  111.687  -        -182.25   222.58
PID computed-torque controller using sliding surfaces, without constraints
  q_1                                               0.01     140.52   250.00   -324.15   1559.2
  q_2                                               234.12   238.76   214.81   -143.55   584.63
PID computed-torque controller using sliding surfaces, with constraints
  q_1                                               4.90     54.50    119.02   -105.44   243.56
  q_2                                               246.34   69.66    247.80   -91.74    106.97

θ1(rad)

1 0 -1 -2 Actual Desired

-3 0

0.4

0.8

1.2

1.6

2

2.4

2.8

3.2

3.6

4

0

0.4

0.8

1.2

1.6

2 Time(Sec)

2.4

2.8

3.2

3.6

4

300 200

Γ1(Nm)

100 0 -100 -200 -300

Fig. 4 Tracking Performance –joint 1 with taking account of the constraint.


Fig. 5 Tracking performance – joint 2, with the constraints taken into account (θ2 (rad) and Γ2 (Nm) versus time; actual vs. desired trajectories).

Fig. 6 Tracking performance – joint 1, without taking the constraints into account (θ1 (rad) and Γ1 (Nm) versus time; actual vs. desired trajectories).


Fig. 7 Tracking performance – joint 2, without taking the constraints into account (θ2 (rad) and Γ2 (Nm) versus time; actual vs. desired trajectories).

Fig. 8 Tracking performance – joint 1, with the constraints taken into account (θ1 (rad) and Γ1 (Nm) versus time; actual vs. desired trajectories).


Fig. 9 Tracking performance – joint 2, with the constraints taken into account (θ2 (rad) and Γ2 (Nm) versus time; actual vs. desired trajectories).

Fig. 10 Tracking performance – joint 1, without taking the constraints into account (θ1 (rad) and Γ1 (Nm) versus time; actual vs. desired trajectories).


[Plots omitted: joint angle θ2 (rad), actual vs. desired, and control torque Γ2 (Nm) vs. time (s), 0–4 s.]
Fig. 11 Tracking performance of joint 2, without taking the constraint into account.

VIII. CONCLUSION

This paper proposed the NSGA-II algorithm as a method for solving multi-objective problems in the control field. To improve the performance of the robot manipulator under study, we presented two control laws that are well known in robotics and require precise knowledge of the robot's dynamic model; the parameters of the controllers were determined by solving a multi-objective problem with NSGA-II. The performance of the PID computed-torque controller using sliding surfaces was tested against a complex reference signal and inaccurate placement of the manipulator at the ready position, and was also compared with a PD-plus-gravity controller on the same reference signal. The simulation results demonstrate that a slight modification of the control law, accounting for model uncertainties, can significantly improve the tracking performance, and they indicate that the NSGA-II algorithm employed in this work is very efficient for this type of control problem.
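The gain-tuning scheme summarized above rests on NSGA-II's fast non-dominated sort. A minimal sketch of that sort is given below, applied to a toy two-objective gain-selection problem; the objectives `f_track` and `f_effort` are hypothetical stand-ins for tracking error and control effort (not the cost functions actually used in this paper), and the crowding-distance selection and genetic operators of the full NSGA-II are omitted:

```python
import random

def dominates(a, b):
    # a dominates b: no worse in every objective, strictly better in at least one
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(objs):
    # NSGA-II fast non-dominated sort: returns fronts as lists of indices
    n = len(objs)
    dominated = [[] for _ in range(n)]  # indices each solution dominates
    count = [0] * n                     # how many solutions dominate each index
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominated[i].append(j)
            elif dominates(objs[j], objs[i]):
                count[i] += 1
        if count[i] == 0:
            fronts[0].append(i)         # non-dominated: first Pareto front
    k = 0
    while fronts[k]:
        nxt = []
        for i in fronts[k]:
            for j in dominated[i]:
                count[j] -= 1
                if count[j] == 0:       # only members of front k+1 remain
                    nxt.append(j)
        fronts.append(nxt)
        k += 1
    return fronts[:-1]                  # drop trailing empty front

# Hypothetical objectives: higher gains improve tracking but cost more torque.
def f_track(kp, kd):
    return 1.0 / (1.0 + kp) + 0.2 / (1.0 + kd)

def f_effort(kp, kd):
    return 0.01 * (kp ** 2 + kd ** 2)

random.seed(1)
pop = [(random.uniform(0, 50), random.uniform(0, 10)) for _ in range(20)]  # (Kp, Kd)
objs = [(f_track(kp, kd), f_effort(kp, kd)) for kp, kd in pop]
fronts = non_dominated_sort(objs)
print(len(fronts[0]), "non-dominated gain pairs, e.g.", pop[fronts[0][0]])
```

In the full algorithm, `fronts[0]` would seed the next generation together with crowding-distance ranking; here it simply exposes the Pareto trade-off between tracking accuracy and torque demand.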

REFERENCES

[1] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, Massachusetts, 1989.

[2] H.-P. Schwefel. Evolution and Optimum Seeking. John Wiley & Sons, New York, 1995.

[3] L. J. Fogel. Artificial Intelligence through Simulated Evolution. John Wiley, New York, 1966.

[4] J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. The MIT Press, Cambridge, Massachusetts, 1992.

[5] J. Kennedy and R. C. Eberhart. Swarm Intelligence. Morgan Kaufmann Publishers, San Francisco, California, 2001.

[6] M. Dorigo and T. Stützle. Ant Colony Optimization. The MIT Press, 2004. ISBN 0-262-04219-3.

[7] C. M. Fonseca and P. J. Fleming. Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization. In S. Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms, pages 416–423, San Mateo, California, 1993. University of Illinois at Urbana-Champaign, Morgan Kaufmann Publishers.

[8] J. Horn, N. Nafpliotis, and D. E. Goldberg. A Niched Pareto Genetic Algorithm for Multiobjective Optimization. In Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, volume 1, pages 82–87, Piscataway, New Jersey, June 1994. IEEE Service Center.

[9] N. Srinivas and K. Deb. Multiobjective Optimization Using Nondominated Sorting in Genetic Algorithms. Evolutionary Computation, 2(3):221–248, Fall 1994.

[10] J. D. Knowles and D. W. Corne. Approximating the Nondominated Front Using the Pareto Archived Evolution Strategy. Evolutionary Computation, 8(2):149–172, 2000.
[11] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, April 2002.


[12] E. Zitzler and L. Thiele. Multiobjective Evolutionary Algorithms: A Comparative Case Study and the Strength Pareto Approach. IEEE Transactions on Evolutionary Computation, 3(4):257–271, November 1999.
[13] E. Zitzler, M. Laumanns, and L. Thiele. SPEA2: Improving the Strength Pareto Evolutionary Algorithm. In K. Giannakoglou, D. Tsahalis, J. Periaux, P. Papailou, and T. Fogarty, editors, EUROGEN 2001: Evolutionary Methods for Design, Optimization and Control with Applications to Industrial Problems, pages 95–100, Athens, Greece, 2001.
[14] M. Ortmann and W. Weber. Multi-Criterion Optimization of Robot Trajectories with Evolutionary Strategies. In Proceedings of the 2001 Genetic and Evolutionary Computation Conference, Late-Breaking Papers, pages 310–316, San Francisco, California, July 2001.
[15] H. A. Talbi, R. V. Patel, and K. Khorasani. "A neural network controller for a class of nonlinear non-minimum phase systems with applications to the flexible-link manipulator". ASME Journal of Dynamic Systems, Measurement, and Control, Vol. 127, pp. 289–294, 2005.

[16] A. Jnifene and W. Andrews. "Experimental study on active control of a single-link manipulator using fuzzy logic and neural networks". IEEE Transactions on Instrumentation and Measurement, Vol. 54, pp. 1200–1208, 2005.
[17] C. A. Coello Coello and G. B. Lamont, editors. Applications of Multi-Objective Evolutionary Algorithms. World Scientific, Singapore, 2004. ISBN 981-256-106-4.
[18] F. L. Lewis, D. M. Dawson, and C. T. Abdallah. Robot Manipulator Control: Theory and Practice. Marcel Dekker, 2004.
[19] J. J. Craig. Introduction to Robotics. Addison-Wesley, Reading, Massachusetts, 1989.
[20] M. Takegaki and S. Arimoto. A new feedback method for dynamic control of manipulators. ASME Journal of Dynamic Systems, Measurement, and Control, 103:119–125, 1981.
[21] K. Deb. Multiobjective Optimization Using Evolutionary Algorithms. Wiley, Chichester, 2001.
[22] M. Cerrolaza and W. Annicchiarico. "Genetic algorithms in shape optimization: finite and boundary element applications", pages 283–323, 1999.
