International Review on Computers and Software (IRECOS)

Contents

Privacy-Preserving Distributed Collaborative Filtering Using Secure Set Operations by Chongjing Sun, Yan Fu, Hui Gao, Junlin Zhou .... 2309
A Technique to Mine Clusters Using Privacy Preserving Data Mining by J. Anitha, R. Rangarajan .... 2316
Secured and Encrypted Data Aggregation with Message Authentication Code in Wireless Sensor Networks by A. Latha, S. Jayashri .... 2327
Hybrid Approach for Energy Optimization in Wireless Sensor Networks Using ABC and Firefly Algorithms by T. Shankar, S. Shanmugavel .... 2335
Distributed Relay Node Selection and Assignment Technique for Cooperative Wireless Networks by S. Sadasivam, G. Athisha .... 2342
Improving Network Life Time of Wireless Sensor Network Using LT Codes Under Erasure Environment by V. Nithya, B. Ramachandran .... 2349
On the Performance of MANET Using QoS Protocol by B. Nancharaiah, B. Chandra Mohan .... 2356
Performance Analysis of Modulation and Coding to Maximize the Lifetime of Wireless Sensor Network by M. Sheik Dawood, G. Athisha .... 2363
Energy Aware Zone Routing Protocol Using Power Save Technique AFECA by Ravi G., K. R. Kashwan .... 2373
Design of Vertical Handoff Initiation and Decision Algorithm in Heterogeneous Wireless Networks by S. Aghalya, A. Sivasubramanian .... 2379
Analysis of Depth Based Routing Protocols in Under Water Sensor Networks by J. V. Anand, S. Titus .... 2389
Study of Energy Efficient Protocols Using Data Aggregation in Wireless Sensor Network by Nagendra Nath Giri, G. Mahadevan .... 2403
EECPS-WSN: Energy Efficient Cumulative Protocol Suite for Wireless Sensor Network by Nagendra Nath Giri, G. Mahadevan .... 2414
A Case Study of Using RM-ODP in Mobile Cloud Computing Applications by M. Jebbar, A. Sekkaki, O. Benammar .... 2428
Spam Detection and Elimination of Messages from Twitter by Sajin S. Chandran, Murugappan S. .... 2438
Improving Search Results Through Reducing Replica in User Profile by P. Srinivasan, K. Batri .... 2444
An Evaluation of the Movie Song Browser System Among IT and Non-IT Users by Munauwarah, Nazlena Mohamad Ali, Hyowon Lee .... 2453
An Access Control Model of Web Services Based on Multifactor Trust Management by R. Joseph Manoj, A. Chandrasekhar .... 2460
Performance Evaluation of the Hearing Impaired Speech Recognition in Noisy Environment by C. Jeyalakshmi, V. Krishnamurthi, A. Revathy .... 2467
SEVALERPS a New EX-ANTE Multi-Criteria Method for ERP Selection by Abdelilah Khaled, Mohammed Abdou Janati-Idrissi .... 2477
A Novel Expert System in Hospital Location Analysis with the Aid of Adaptive Artificial Bee Colony (AABC) by K. Janaki, N. Radhakrishnan .... 2486
Design of High Speed Serial-Serial Multiplier for OFDM Applications by N. Saravanakumar, A. Nirmal Kumar, K. N. Vijeyakumar, M. K. Ananda Moorthy .... 2495
Feature Based Image Retrieval Using Fused Sift and Surf Features by V. Vijayarajan, M. Dinakaran .... 2500
A New Multibiometric Identification Method Based on a Decision Tree and a Parallel Processing Strategy by Kamel Aizi, Mohamed Ouslim .... 2507
Computed Tomography Images Restoration Using Anisotropic Diffusion Regularization by Faouzi Benzarti, Hamid Amiri .... 2515
Secure Medical Image Retrieval Using Dynamic Binary Encoded Watermark by A. Umaamaheshvari, K. Thanushkodi .... 2521
Microarray Gene Expression and Multiclass Cancer Classification Using Improved PSO Based Evolutionary Fuzzy ELM Classifier with ICGA Gene Selection by T. Karthikeyan, R. Balakrishnan .... 2532
Comparative Analysis of Intrusion Detection System with Mining by S. Vinila Jinny, J. Jayakumari .... 2540
Enhanced Distributed Text Document Clustering Based on Semantics by J. E. Judith, J. Jayakumari .... 2545

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

Privacy-Preserving Distributed Collaborative Filtering Using Secure Set Operations
Chongjing Sun, Yan Fu, Hui Gao, Junlin Zhou

Abstract – At present, collaborative filtering is widely used in many fields such as e-commerce and search engines. To produce better recommendations, many data owners want to collaborate with each other to build a shared model. Because of the privacy problem, however, each data owner is reluctant to reveal its data to others. To solve this problem, we present a privacy-preserving approach using secure set operations and encryption methods. In our method, the private set intersection cardinality protocol is first adopted to compute the user similarities. Our method then uses homomorphic encryption to compute the predicted rating values for the unrated items. Finally, the model recommends the top-k unrated items to each user. We show that distributed collaborative filtering based on our approach provides zero loss of accuracy in the recommendation while preserving the privacy of the different data owners. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Privacy Preserving, Set Operations, Collaborative Filtering

Nomenclature

U: The set of all users
I: The set of all items
I_u: The set of items that user u rated
r_ui: Rating value given by user u to item i
s_uv: Similarity of users u and v
O^(t): The t-th participant in the distributed system
u_i^(t): The i-th user of participant O^(t)
I_i^(t): The item set rated by u_i^(t)
Z_q: The integers from 0 to q - 1
π(·): Random permutation function
H(·): Hash function
E_pk(·): Encryption of a plaintext with public key pk
D_sk(·): Decryption of a ciphertext with private key sk

I. Introduction

Nowadays, the explosive growth of information on the Web leads to the information overload problem, which leaves people lost in massive amounts of information. To provide better services to users and to increase the benefit from product sales, many information filtering and recommendation techniques have been proposed [1], such as classic collaborative filtering, location-based recommendation services [2], and context-dependent recommendation [3].

Manuscript received and revised September 2013, accepted October 2013


These techniques help people filter out redundant information, shorten search time, and find the personalized items they are most interested in. Recommender systems play an important role in filtering information, and many works have been proposed. Among them, Collaborative Filtering (CF) is a classic technique that is widely used in many e-commerce sites. Usually, some small or start-up companies do not have enough data to provide satisfying recommendations to their customers. They want to collaborate with other companies to build a shared recommender system that can provide better recommendations. The problem is that the other companies are reluctant to share their data out of concern for the privacy of their customers. For example, some customers buy private products and do not want others to know; sharing the data may violate the privacy of these users. Under this condition, we consider how to build a shared recommender system without disclosing privacy. In this paper, we focus on binary user-item ratings, such as buying or not buying a product; hence the privacy is defined as whether a user bought or rated an item. Based on secure set operations and homomorphic encryption, we propose an algorithm that can build the shared recommender model without disclosing privacy and with zero loss of accuracy. User-based CF has two main steps, similarity computation and rating score prediction. In the first step, we adopt the private set intersection cardinality protocol to compute the similarity between users without revealing the true ratings of each user. In the second step, we design an approach based on homomorphic encryption to generate the predicted ratings for the unrated items.

C. J. Sun, Y. Fu, H. Gao, J. L. Zhou

Finally, the model selects the top-k unrated items for each user based on the predicted ratings. Techniques for privacy and security problems have developed very fast, especially in the mobile [4] and RFID [5] areas. In this paper, we focus on the privacy problem in distributed CF, which runs the algorithms on rating data stored in multiple repositories. Polat and Du [6] proposed privacy-preserving algorithms for collaborative filtering recommendation on horizontally or vertically partitioned data. Yakut and Polat [7] presented privacy-preserving schemes to make item-based predictions on arbitrarily distributed data. Both works incur accuracy loss when recommending items to users. Polat and Du [8] designed privacy-preserving CF methods for vertically distributed data that select all users as the targeted user's neighbours; in our work, we select only the top-k users as neighbours. Concerning distributed CF techniques, Kaleli and Polat [9] achieved privacy preservation for a model-based CF (naive Bayesian classifier-based) recommendation, and Yakut and Polat [10] gave a solution for privacy-preserving model-based CF (SVD-based) recommendation. In our work, we solve the privacy-preserving problem for memory-based CF: the secure set operations and homomorphic encryption techniques are combined to design a new privacy-preserving scheme that can produce the shared user-based CF recommendation without any loss of accuracy. The rest of this paper is organized as follows. In Section 2, we introduce the preliminaries and define the research problem. In Section 3, we devise the privacy-preserving distributed collaborative filtering approach, and we evaluate it in Section 4. Finally, we conclude the paper in Section 5.

II. Preliminaries and Problem Definition

In this paper, we focus on recommender systems based on the collaborative filtering technique, one of the most successful technologies. Specifically, this work solves the privacy-preserving problem for the user-based CF model, in which there is a list of users U = {u_1, u_2, ..., u_n} and a list of items I = {i_1, i_2, ..., i_m}. All the binary ratings can be summarized in a user-item table containing the rating scores r_ui provided by user u for item i: r_ui is set to 1 if u has rated item i, and 0 otherwise. Each user u has a list of rated items I_u = { i | i ∈ I, r_ui ≠ 0 }. Many metrics have been proposed to compute the similarity between two users. Suppose the rating vectors of users u and v are r_u and r_v, respectively. Similarity measures for binary ratings are listed in Table I; for binary ratings, the Cosine measure is equivalent to Salton's measure. As our work adopts secure set operations, we put the emphasis on the measures based on set

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

operations. After the similarities between any two users are obtained, the predicted rating score for an item can be calculated by Formula (1):

    r_ui = ( Σ_{v ∈ N_u} s_uv · r_vi ) / ( Σ_{v ∈ N_u} s_uv )          (1)

where N_u denotes the top-k most similar users of the target user u.

TABLE I. SIMILARITY MEASURES [11]

    Cosine:   s_uv = (r_u · r_v) / ( ‖r_u‖_2 · ‖r_v‖_2 )
    Salton:   s_uv = |I_u ∩ I_v| / sqrt( |I_u| · |I_v| )
    Jaccard:  s_uv = |I_u ∩ I_v| / |I_u ∪ I_v|
    Dice:     s_uv = 2 |I_u ∩ I_v| / ( |I_u| + |I_v| )
    LHN-I:    s_uv = |I_u ∩ I_v| / ( |I_u| · |I_v| )
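As a concrete illustration (plain Python with hypothetical users u1-u3 and items i1-i4; the paper itself gives no code), the set-based measures of Table I and the neighborhood prediction of Formula (1) can be sketched as:

```python
from math import sqrt

# Binary ratings as item sets I_u per user (hypothetical toy data).
ratings = {
    "u1": {"i1", "i2", "i3"},
    "u2": {"i2", "i3", "i4"},
    "u3": {"i1", "i4"},
}

def jaccard(Iu, Iv):
    """Jaccard measure of Table I: |Iu ∩ Iv| / |Iu ∪ Iv|."""
    return len(Iu & Iv) / len(Iu | Iv)

def salton(Iu, Iv):
    """Salton measure of Table I: |Iu ∩ Iv| / sqrt(|Iu| |Iv|)."""
    return len(Iu & Iv) / sqrt(len(Iu) * len(Iv))

def predict(u, item, k=2, sim=jaccard):
    """Formula (1): similarity-weighted average over the top-k neighbors N_u."""
    others = [v for v in ratings if v != u]
    neighbors = sorted(others, key=lambda v: sim(ratings[u], ratings[v]),
                       reverse=True)[:k]
    num = sum(sim(ratings[u], ratings[v]) * (item in ratings[v]) for v in neighbors)
    den = sum(sim(ratings[u], ratings[v]) for v in neighbors)
    return num / den if den else 0.0

print(predict("u1", "i4"))   # both neighbors of u1 rated i4, so the prediction is 1.0
```

With binary ratings, r_vi is just the membership test `item in ratings[v]`, which is why set operations alone suffice for the whole computation.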

Secure computation techniques [12] are designed for computation between partners without leaking information. In our work, we adopt secure set operations and homomorphic encryption to attain our task, privacy-preserving collaborative filtering. Secure set operations allow one party to compute the result of a set operation with another party such that neither learns anything about the other's inputs beyond the result of the operation. From the previous analysis, we only need the cardinalities of the set intersection and the set union. Works on Private Set Intersection Cardinality (PSI-CA) and Private Set Union Cardinality (PSU-CA) ensure that the parties learn only the magnitude of the set intersection or union; De Cristofaro et al. [13] proposed solutions with complexity linear in the size of the input sets. Homomorphic encryption is a technique that allows certain operations on ciphertexts. Given two messages m_1 and m_2, an additively homomorphic encryption scheme satisfies the following properties:

    D_sk( E_pk(m_1) · E_pk(m_2) ) = m_1 + m_2          (2)

    D_sk( E_pk(m_1)^{m_2} ) = m_1 · m_2                (3)

We adopt the Paillier cryptosystem [14], a classic additively homomorphic encryption scheme, in our method.
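Properties (2) and (3) can be checked with a toy Paillier instance (a sketch only: the key primes 17 and 19 and the helper names `encrypt`/`decrypt` are illustrative assumptions; real keys use primes of 1024 bits or more):

```python
import math
import secrets

# Toy Paillier key generation (illustrative parameters only).
p, q = 17, 19
n = p * q                       # public modulus
n2 = n * n
lam = math.lcm(p - 1, q - 1)    # λ = lcm(p-1, q-1), part of the secret key
g = n + 1                       # standard choice of generator
mu = pow(lam, -1, n)            # μ = λ^{-1} mod n (valid because g = n + 1)

def encrypt(m):
    """E_pk(m) = g^m * r^n mod n^2 with a fresh random r coprime to n."""
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return pow(g, m, n2) * pow(r, n, n2) % n2

def decrypt(c):
    """D_sk(c) = L(c^λ mod n^2) * μ mod n, where L(u) = (u - 1) / n."""
    return (pow(c, lam, n2) - 1) // n * mu % n

m1, m2 = 5, 7
# Property (2): the product of two ciphertexts decrypts to the sum m1 + m2.
assert decrypt(encrypt(m1) * encrypt(m2) % n2) == m1 + m2
# Property (3): a ciphertext raised to m2 decrypts to the product m1 * m2.
assert decrypt(pow(encrypt(m1), m2, n2)) == m1 * m2
```

Property (2) is exactly what the recommendation protocol later exploits: multiplying encrypted per-party aggregates yields an encryption of their sum.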


Problem definition. Suppose that there are p parties and m items in the distributed system, and the data is horizontally partitioned. Each party O^(t) has a number of users who rate the items, U^(t) = { u_1^(t), u_2^(t), ..., u_{n_t}^(t) }. User u_i^(t) has an item rating vector R_i^(t), and the item set rated by u_i^(t) is represented by I_i^(t). The overall architecture of the privacy-preserving distributed CF system is depicted in Fig. 1.

Fig. 1. The distributed collaborative filtering infrastructure

The p parties cooperate with each other to establish a better shared CF model while preserving the private information about their preferences, i.e., the ratings for items. We design the protocols under the semi-honest model [15], which means that each participant follows the protocol strictly but may keep the intermediate data to try to infer more information. It is reasonable for small or medium companies to build a shared collaborative filtering model under the semi-honest model, since they want to gain more benefit from data sharing without the invasion of each other's privacy.

III. New Privacy-Preserving Distributed Collaborative Filtering

In this section, we combine the secure set operations with the encryption technique to design the privacy-preserving CF schemes for the distributed system.

Private Similarity Computation. In order to recommend items to a target user, we need to compute the similarities between this user and all the others. As the other users are distributed over different parties, we design the Private Similarity Computation (PSC), which computes the similarity between users without leaking the personal ratings. To explain our method clearly, we give the PSC for data distributed over two parties; it can easily be extended to multiple parties. Suppose that two parties, Alice and Bob, have n_a and n_b users respectively. Then we write the similarity matrix of these users as:

    S = [ A    C  ]
        [ C^T  B  ]                                    (4)

Protocol 1. Private Set Intersection Cardinality

Alice's input: I_1^(a), I_2^(a), ..., I_{n_a}^(a). Bob's input: I_1^(b), I_2^(b), ..., I_{n_b}^(b).

1. Alice picks random r_1^(a), r_2^(a) ∈ Z_q and sets x = g^{r_1^(a)}; Bob picks random r_1^(b), r_2^(b) ∈ Z_q and sets y = g^{r_1^(b)}.
2. Both parties hash their items: HI_i^(a) = { H(c) | c ∈ I_i^(a) } for 1 ≤ i ≤ n_a, and HI_j^(b) = { H(c) | c ∈ I_j^(b) } for 1 ≤ j ≤ n_b.
3. Alice blinds and shuffles each of her hashed sets, RH_i^(a) = π( { h^{r_2^(a)} | h ∈ HI_i^(a) } ), and sends x, RH_1^(a), ..., RH_{n_a}^(a) to Bob.
4. Bob re-blinds and shuffles each received set, DR_i^(a) = π( { z^{r_2^(b)} | z ∈ RH_i^(a) } ), computes his own tag sets DH_j^(b) = { H'( x^{r_1^(b)} · h^{r_2^(b)} ) | h ∈ HI_j^(b) }, and sends y, DR_1^(a), ..., DR_{n_a}^(a) and DH_1^(b), ..., DH_{n_b}^(b) to Alice.
5. Alice removes her blinding, TR_i^(a) = { d^{1/r_2^(a) mod q} | d ∈ DR_i^(a) }, and computes her tag sets DH_i^(a) = { H'( y^{r_1^(a)} · t ) | t ∈ TR_i^(a) }.
6. Output: | DH_i^(a) ∩ DH_j^(b) | for each 1 ≤ i ≤ n_a and 1 ≤ j ≤ n_b.


Here A and B are n_a × n_a and n_b × n_b matrices representing the similarities among users belonging to Alice and to Bob, respectively. C is an n_a × n_b matrix representing the similarities between pairs of users in which one user is from Alice and the other is from Bob. To select the top-k most similar users to a targeted user, Alice only needs to know the matrices A and C, while Bob only needs to know B and C. Therefore, the problem reduces to computing the matrix C in a secure manner without revealing the detailed rating scores. Based on the PSI-CA [13], we propose the private similarity computation shown in Protocol 1. Alice has n_a users, each with a rated item set I_i^(a). Alice and Bob share common primes p and q with q | p - 1, where p can be 1024 or 2048 bits and q 160 or 224 bits. The protocol is conducted in a subgroup of order q with generator g, using two hash functions H: {0,1}* → Z_p* and H': {0,1}* → {0,1}^k, where k is a security parameter.

From Protocol 1, we can see that Alice learns nothing about which items are in the intersection, because Bob shuffles Alice's sets; a privacy proof similar to that of [13] applies.

Correctness. For the i-th user of Alice and the j-th user of Bob, the rated item sets I_i^(a) and I_j^(b) are processed by Protocol 1 so that, for each item c:

    DH_i^(a) contains  H'( y^{r_1^(a)} · ( H(c)^{r_2^(a) r_2^(b)} )^{1/r_2^(a) mod q} ) = H'( g^{r_1^(a) r_1^(b)} · H(c)^{r_2^(b)} )

    DH_j^(b) contains  H'( x^{r_1^(b)} · H(c)^{r_2^(b)} ) = H'( g^{r_1^(a) r_1^(b)} · H(c)^{r_2^(b)} )

Protocol 2. Secure Top-N Recommendation: O^(t) recommends the top-n items to u_i^(t)

Party O^(t):
1. (S, Id^(P), Id^(U)) ← top_neigs(u_i^(t), k): select the top-k neighbors, obtaining their party indices Id^(P) and the per-party user indices Id^(U).
2. (pk, sk) ← Paillier_crypt(): generate a Paillier key pair.
3. (C, f) ← reorder(Id^(P)): fix an order f over the cooperating parties C.
4. a^(t) ← aggregate(Id^(U), O^(t)); p^(t) ← E_pk(a^(t)); send pk, p^(t), Id^(P) and Id^(U) to the other parties; set s^(0) = p^(t).

For j = 1, ..., |C| - 1, party O^(x) with x = f^{-1}(j):
5. a^(x) ← aggregate(Id^(U), O^(x)); s^(j) ← E_pk(a^(x)) · s^(j-1); send s^(j) to O^{f^{-1}(j+1)}.

Party O^{f^{-1}(|C|)}:
6. a^{f^{-1}(|C|)} ← aggregate(Id^(U), O^{f^{-1}(|C|)}); s^{|C|} ← E_pk(a^{f^{-1}(|C|)}) · s^{|C|-1}; return s^{|C|} to O^(t).

Party O^(t):
7. r_{u_i^(t)} ← D_sk(s^{|C|}) / sum(S); return the top-n unrated items.


Therefore, if two items i^(a) ∈ I_i^(a) and i^(b) ∈ I_j^(b) satisfy i^(a) = i^(b), then there must exist two values d^(a) ∈ DH_i^(a) and d^(b) ∈ DH_j^(b) with d^(a) = d^(b). Alice thus learns the set intersection cardinality by counting the number of matching pairs. Suppose that the number of items rated by each user can be shared with all parties. Then the similarities in Table I can be computed directly once the set intersection cardinality is securely learned. For the Jaccard measure, matrix C in Formula (4) can be computed as Formula (5):

    C_ij = |I_i^(a) ∩ I_j^(b)| / |I_i^(a) ∪ I_j^(b)|
         = |DH_i^(a) ∩ DH_j^(b)| / ( |I_i^(a)| + |I_j^(b)| - |DH_i^(a) ∩ DH_j^(b)| )          (5)
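The PSI-CA exchange of Protocol 1 and the Jaccard computation of Formula (5) can be sketched for a single user pair as follows. This is a sketch under stated assumptions: the item names are hypothetical, and for simplicity it works in the full group Z_p* for the Mersenne prime p = 2^127 - 1 rather than in the order-q subgroup the paper uses.

```python
import hashlib
import math
import secrets

p = 2**127 - 1        # Mersenne prime modulus (illustrative choice)
N = p - 1             # order of the multiplicative group Z_p*
g = 3

def H(item):
    """First hash: map an item into Z_p*."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % (p - 1) + 1

def H_prime(z):
    """Second hash H': turn a group element into a comparable tag."""
    return hashlib.sha256(z.to_bytes(16, "big")).hexdigest()

def rand_exp(invertible=False):
    while True:
        r = secrets.randbelow(N - 2) + 2
        if not invertible or math.gcd(r, N) == 1:
            return r

I_a = {"item1", "item2", "item3"}   # Alice's rated item set I_i^(a)
I_b = {"item2", "item3", "item4"}   # Bob's rated item set I_j^(b)

r1a, r2a = rand_exp(), rand_exp(invertible=True)
r1b, r2b = rand_exp(), rand_exp()
x, y = pow(g, r1a, p), pow(g, r1b, p)

# Alice blinds and shuffles her hashed items, sending x and RH to Bob.
RH = [pow(H(c), r2a, p) for c in I_a]
secrets.SystemRandom().shuffle(RH)

# Bob re-blinds and shuffles Alice's values, and tags his own items.
DR = [pow(z, r2b, p) for z in RH]
secrets.SystemRandom().shuffle(DR)
DH_b = {H_prime(pow(x, r1b, p) * pow(H(c), r2b, p) % p) for c in I_b}

# Alice strips her blinding (exponent 1/r2a mod N) and forms her tags.
TR = [pow(d, pow(r2a, -1, N), p) for d in DR]
DH_a = {H_prime(pow(y, r1a, p) * t % p) for t in TR}

# Output: intersection cardinality, from which Formula (5) gives Jaccard.
card = len(DH_a & DH_b)
jaccard = card / (len(I_a) + len(I_b) - card)
print(card, jaccard)   # 2 items in common out of 4 distinct: 2, 0.5
```

Both tags reduce to H'(g^{r1a·r1b} · H(c)^{r2b}), so they match exactly when the underlying items match, while neither side sees the other's raw item hashes.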

Other similarity measures can be computed similarly. Finally, Alice shares the matrix C with Bob. If there are p parties in the distributed system, each pair of parties cooperates to securely compute its matrix C.

III.1. Secure Top-N Item Recommendation

For a targeted user, the top-k neighbors are selected according to their similarities, and the rating predictions are produced by aggregating the ratings of these neighbors. Suppose that the party O^(t) wants to recommend items to its customer u_i^(t). Protocol 2 shows the secure top-n item recommendation. The aggregate function in the protocol is defined in Formula (6):

    aggregate( Id^(U), O^(x) ) = Σ_{v = u_j^(x), j ∈ Id^(U)} s_{u_i^(t) v} · r_v          (6)

The party O^(t) first selects the top-k most similar users, who are usually distributed over different parties. The protocol records the party indices Id^(P) and the selected user indices Id^(U) within each party. Then O^(t) generates a pair of public and secret keys using the Paillier cryptosystem. By Formula (2), the product of the encrypted ratings decrypts to the sum of the ratings, so the sum can be calculated without disclosing the rating value of any single user. Finally, O^(t) decrypts the ciphertext, obtains the predicted rating for each item, and recommends the top-n items to the targeted user.
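The ring aggregation at the heart of Protocol 2 can be sketched with a self-contained toy Paillier instance (the key primes, the number of parties, and the integer aggregate values are hypothetical; real aggregates are the scaled sums of Formula (6)):

```python
import math
import secrets

# Toy Paillier keys held by the initiating party O^(t) (illustrative primes).
p, q = 17, 19
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g, mu = n + 1, pow(math.lcm(p - 1, q - 1), -1, p * q)

def enc(m):                     # E_pk(m) = g^m * r^n mod n^2
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return pow(g, m, n2) * pow(r, n, n2) % n2

def dec(c):                     # D_sk(c) = L(c^λ mod n^2) * μ mod n
    return (pow(c, lam, n2) - 1) // n * mu % n

# Each party's local aggregate for one item: the sum of s_uv * r_vi over its
# selected neighbors (Formula (6)), scaled to an integer. Values hypothetical.
local_aggregates = [3, 5, 2]    # a^(t) for O^(t), then a^(x) for the ring parties

s = enc(local_aggregates[0])    # s^(0) = E_pk(a^(t)), formed by the initiator
for a_x in local_aggregates[1:]:
    s = s * enc(a_x) % n2       # s^(j) = E_pk(a^(x)) * s^(j-1), forwarded along the ring

total = dec(s)                  # only O^(t), which holds sk, can decrypt the sum
assert total == sum(local_aggregates)
# O^(t) then divides by sum(S) to obtain the predicted rating, as in Formula (1).
```

Intermediate parties see only ciphertexts of the running product, which is exactly why the per-party aggregates stay hidden from one another.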

IV. Experimental Evaluation

By analyzing our privacy-preserving protocols, we can conclude that the protocols incur no loss of accuracy. Therefore, in this section we show the improvement in the recommendation when the parties cooperate with each other. The datasets are Epinions [16] and Friendfeed [17]. We sample the original data: Epinions contains 4726 users, 3907 items and 164221 ratings in total, while Friendfeed contains 3133 users who collected 4956 items with 92351 ratings. The metrics used to evaluate the recommendation are precision, recall, F1 and HD; their definitions can be found in [18]. The first experiment illustrates the improvement when each party cooperates with all the others. We divide each dataset into 5 parties. For the users in each party, we recommend the top-n items by analyzing the k most similar users of this party alone (isolated) versus the users of all parties (cooperated). Figs. 2 and 3 show the results of the experiments on the two datasets; the x-axis represents the i-th party. Clearly, on both datasets the recommendation results on the cooperated data are better than those on the isolated data. Next, we measure how much improvement can be obtained when a party cooperates with different numbers of parties. We recommend the top-n items to the users of the first party by analyzing the k most similar users of this party alone versus the users from p cooperated parties, where p ranges from 2 to 5. Figs. 4 and 5 show the results on the two datasets. The trend is that the values of the measures increase as the number of cooperated parties increases. But when the number is 4 on Epinions, the values decrease slightly, which suggests that this party contains users that are noisy with respect to the users of the first party. We will focus on this problem in future work to avoid the decrease.

V. Conclusion

In this paper, we focus on the privacy problem of building a shared collaborative filtering model without disclosing any user's privacy. For this problem, we designed a solution under the semi-honest model. Theoretical analysis shows that our scheme, combining secure set operations with the encryption technique, preserves privacy while maintaining the accuracy of the rating prediction. The experimental results show that the recommendation accuracy can be improved by cooperating with other parties, although a noisy party may decrease the accuracy; we will put our emphasis on this problem next.

Acknowledgements

This research work was supported by the National Natural Science Foundation of China under Grant No. 61003231.


Figs. 2. Recommendation results on Epinions: (a) Precision, (b) Recall, (c) F1, (d) HD over the different parties (Isolated vs. Cooperated)

Figs. 3. Recommendation results on Friendfeed: (a) Precision, (b) Recall, (c) F1, (d) HD over the different parties (Isolated vs. Cooperated)

Figs. 4. Cooperation with different number of parties on Friendfeed: (a) Precision, (b) Recall, (c) F1, (d) HD versus the number of cooperated parties

Figs. 5. Cooperation with different number of parties on Epinions: (a) Precision, (b) Recall, (c) F1, (d) HD versus the number of cooperated parties

References

[1] Sneha, Y.S., Mahadevan, G., Parvathi, R.M.S., Recommender system based on user ratings: A comprehensive study and future challenges, (2013) International Review on Computers and Software (IRECOS), 8 (7), pp. 1624-1635.
[2] Wu, J., Wu, Z., Mobile location-aware personalized recommendation with clustering-based collaborative filtering, (2012) International Review on Computers and Software (IRECOS), 7 (5), pp. 2231-2238.
[3] Yao, L., Yang, W., A context-aware recommender for trustworthy service, (2012) International Review on Computers and Software (IRECOS), 7 (6), pp. 3354-3359.
[4] Tripathy, P.K., Biswal, D., Multiple server indirect security authentication protocol for mobile networks using elliptic curve cryptography (ECC), (2013) International Review on Computers and Software (IRECOS), 8 (7), pp. 1571-1577.
[5] M. Eslamnezhad Namin, F. Badihiyeh Aghdam, M. Hosseinzadeh, A Secure and Efficient RFID Mutual Authentication Protocol, (2011) International Journal on Communications Antenna and Propagation (IRECAP), 1 (5), pp. 429-433.
[6] H. Polat, W. Du, Privacy-preserving Top-N Recommendation on Distributed Data, Journal of the American Society for Information Science and Technology, Vol. 59, pp. 1093-1108, 2008.
[7] I. Yakut, H. Polat, Arbitrarily Distributed Data-based Recommendations with Privacy, Data and Knowledge Engineering, Vol. 72, pp. 239-256, 2012.
[8] H. Polat, W. Du, Privacy-preserving Collaborative Filtering on Vertically Partitioned Data, Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases (Page: 651-658 Year of Publication: 2005 ISBN: 3-540-29244-6).
[9] C. Kaleli, H. Polat, Providing Naive Bayesian Classifier-based Private Recommendations on Partitioned Data, Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases (Page: 515-522 Year of Publication: 2007 ISBN: 978-3-540-74975-2).
[10] I. Yakut, H. Polat, Privacy-preserving SVD-based Collaborative Filtering on Partitioned Data, International Journal of Information Technology and Decision Making, Vol. 9, pp. 473-502, 2010.
[11] L. Egghe, New Relations between Similarity Measures for Vectors Based on Vector Norms, Journal of the American Society for Information Science and Technology, Vol. 60, n. 2, pp. 232-239, 2009.
[12] Novel Secure Code Encryption Techniques Using Crypto Based Indexed Table for Highly Secured Software.
[13] E. De Cristofaro, P. Gasti, G. Tsudik, Fast and Private Computation of Cardinality of Set Intersection and Union, Proceedings of the 11th Int. Conf. on Cryptology and Network Security (Page: 218-231 Year of Publication: 2012 ISBN: 978-3-642-35403-8).
[14] P. Paillier, Public Key Cryptosystem Based on Composite Degree Residuosity Classes, Proceedings of the 17th Int. Conf. on Theory and Application of Cryptographic Techniques (Page: 223-238 Year of Publication: 1999 ISBN: 3-540-65889-0).
[15] O. Goldreich, Foundations of Cryptography: Volume II, Basic Applications (Cambridge: Cambridge University Press, 2004).
[16] P. Massa, P. Avesani, Trust Aware Bootstrapping of Recommender Systems, Proceedings of the 17th European Conference on Artificial Intelligence Workshop on Recommender Systems (Page: 29-33 Year of Publication: 2006 ISBN: 1-58603-642-4).
[17] F. Celli, F. L. Lascio, M. Magnani, et al., Social Network Data and Practices: the Case of Friendfeed, Proceedings of the 3rd International Conference on Social Computing, Behavioral Modeling and Prediction (Page: 346-353 Year of Publication: 2010 ISBN: 978-3-642-12078-7).
[18] D.C. Nie, M.J. Ding, et al., Social interest for user selecting items in recommender systems, International Journal of Modern Physics C, Vol. 24, n. 4, 1350022, 2013.

Authors' information

Web Science Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China.

Chongjing Sun was born in Shandong, China, on April 24, 1986, and received the B.S. degree in Mathematics and Information Science from Yantai University in 2008. He is currently a Ph.D. student in the School of Computer Science and Engineering, University of Electronic Science and Technology of China. His main research interests include data mining, privacy preserving, and complex networks.

Yan Fu received the M.E. in Computer Science from the University of Electronic Science and Technology of China in 1988. She is now a professor and Ph.D. supervisor at the University of Electronic Science and Technology of China. She has published more than 50 research papers in international conferences and journals. Her research interests include data mining and intelligence computing.

Hui Gao received the Ph.D. degree in computing science from the University of Groningen (the Netherlands) in 2005. He is now a professor and Ph.D. supervisor in the School of Computer Science and Engineering, University of Electronic Science and Technology of China. He has published more than 30 papers in international conferences and journals. His research interests include data mining, privacy preserving and parallel programming.

Junlin Zhou received the Ph.D. degree in Computer Science from the University of Electronic Science and Technology of China in 2010. He is now an associate professor at the University of Electronic Science and Technology of China. He received the CSC scholarship in 2007 and visited the University of Minnesota in 2008 for one year. His main research interests include data mining and recommender systems.

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

International Review on Computers and Software, Vol. 8, N. 10


International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

A Technique to Mine Clusters Using Privacy Preserving Data Mining

J. Anitha, R. Rangarajan

Abstract – In recent years the privacy preserving data mining problem has gained considerable importance, owing to the vast amount of personal data about individuals stored by different commercial vendors and organizations. Privacy preserving clustering has not been studied as intensively as other data mining techniques, such as rule mining and sequence mining. In this paper, we obtain privacy by anonymization: the original data is encrypted with a secure key, which is obtained through the Diffie Hellman key exchange algorithm. To cluster the anonymized data, the Fuzzy C Means clustering algorithm is used, since it is well suited to data whose cluster boundaries are ambiguous. A distance matrix is first calculated, from which a similarity matrix and a dissimilarity matrix are formed. The similarity matrix holds the similarity of each data point to the cluster centroids, and the dissimilarity matrix holds the corresponding dissimilarity. The membership matrix constructed from these two matrices is then used to cluster the anonymized data. The experimental results of the proposed algorithm are compared with the K-Means algorithm in terms of running time, memory usage, and accuracy, and show that the proposed algorithm is more efficient in terms of accuracy and execution time. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Privacy Preserving, Similarity Matrix, Anonymization, Dissimilarity Matrix, Ciphertext, Data Containers, Third Party

Nomenclature

n: Total number of data containers
Dmi: Data matrix
r: Total number of objects
q: Total number of attributes
DCj: Data container
TP: Third party
p: Large prime number
g: Primitive root of the prime number p
a: Secret number
km: Secret key of data container DCj for Dmi
cpti: Cipher text
Ccpt: Concatenated cipher text
v: Number of cluster centroids
Cj: Cluster centroid
dm: Distance matrix
μxy: Membership matrix
Sm: Similarity matrix
Dssm: Dissimilarity matrix
Sxy: Similarity value, where x denotes the data point and y the cluster centroid

I. Introduction

Data mining has faced several new challenges in recent years [1]-[30]. Data mining research deals with the extraction of potentially useful information from large collections of data, with a variety of application areas such as market basket analysis, customer relationship management and bioinformatics [1]. Data mining has clearly shown how efficient these tools are at revealing the knowledge locked up within huge databases. But data mining now requires methods that restrain the power of these tools in order to guard the privacy of individuals [2]. The original purpose of data mining is not to reveal information about individuals but to generalize across populations. Operating on individual data, with the privacy concerns this raises, is the main drawback of data mining. So the real problem is not data mining itself but the way data mining is done [3], [4], [30]. The concept of privacy preserving data mining (PPDM), aimed at alleviating the conflict between data mining and privacy, has been proposed by several researchers. The first PPDM proposal was to perturb individual data values [5]. Examining the details of perturbation techniques, the two most important phases are introducing noise and reconstructing the original distribution [3], [6], [7].
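As an illustration of the perturbation idea, the following sketch masks individual values with additive noise while leaving aggregates roughly estimable. The noise scale and the sample data (the age column of Fig. 2) are our assumptions for illustration, not the method of [5]:

```python
import random

def perturb(values, scale=10.0):
    """Release each private value masked with additive Gaussian noise.

    Only the randomized values are exposed; given a known noise
    distribution, aggregate statistics can still be reconstructed.
    """
    return [v + random.gauss(0.0, scale) for v in values]

random.seed(42)  # fixed seed so the example is reproducible
ages = [70, 67, 57, 64, 74, 65, 56, 59, 60]  # age column of Fig. 2
released = perturb(ages)

# Individual ages are hidden, but the mean survives approximately,
# since the added noise has zero mean.
true_mean = sum(ages) / len(ages)
released_mean = sum(released) / len(released)
```

Because the noise is zero-mean, the released mean stays close to the true mean while no single released value equals the corresponding private value.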

Manuscript received and revised September 2013, accepted October 2013


Random noise from a known distribution is added to privacy sensitive data during the noise addition phase [8]. Through the perturbation, only the randomized values are exposed, hiding the original data [5]. Clustering groups similar items in a given data set, with the goal of minimizing an objective function. The sum of squares of the distances between the points in the database and their nearest cluster centers defines the error-sum-of-squares (ESS) objective function [9]. Vaidya and Clifton proposed privacy preserving techniques for clustering over vertically partitioned data [10]. The K-means algorithm is widely used for clustering because of its simplicity and its ability to converge very quickly in practice [11]. Privacy preserving data mining has also been developed using cryptographic techniques. This branch became enormously popular [12], mainly for two reasons. First, cryptography offers a well-defined model for privacy, which includes methodologies for proving and quantifying it. Second, cryptography provides implementations for privacy-preserving data mining algorithms, since a vast toolset of cryptographic algorithms exists [2]. In [12], [13] the authors built privacy-preserving clustering protocols on cryptographic techniques based on the K-means algorithm [9]. In [12], [13], the authors attempt to solve the clustering problem in a two-party setting using techniques based on secret sharing. However, the clustering algorithm of [12] suffers from a problem in which a division operation is misinterpreted as multiplication by an inverse, which is not correct [11]. Some researchers have pointed out that cryptography does not protect the output of a computation; it only protects against privacy leaks during the computation itself. This falls short of providing a complete solution to the problem of privacy preserving data mining [2].
A genuine solution to guard the privacy sensitive data of users is deploying cryptographic protocols. k-anonymity gives the assurance that no information can be linked to groups of fewer than k individuals. The k-anonymity model of privacy was studied intensively in the context of public data releases [16], [17], [18], [19], where the database owner wishes to guarantee that no one will be able to link information gleaned from the database to the individuals from whom the data has been collected [2]. k-anonymity has therefore been proposed to reduce the risk of this type of attack [20]. The main objective of k-anonymization is thus to guard the privacy of the individuals to whom the data pertains. Conversely, it is also essential that the released data remain as "useful" as possible subject to this constraint [21], [22]. Many recoding models for k-anonymization have been proposed in the literature [19]. In a distributed architecture, a number of data containers are connected to a single third party that knows the clustering procedure. To have their data clustered, the data containers need to send it to the third party while at the same time keeping the data private.
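The k-anonymity guarantee above can be stated operationally: every combination of quasi-identifier values must occur in at least k records. A minimal sketch follows; the table, the column names and the helper function are illustrative, not taken from the cited papers:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at
    least k records, so no record can be linked to fewer than k people."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Toy table: after generalising exact ages into ranges, each
# (age_range, sex) group contains at least 2 records, so the
# release is 2-anonymous but not 3-anonymous.
table = [
    {"age_range": "50-59", "sex": 1, "disease": "A"},
    {"age_range": "50-59", "sex": 1, "disease": "B"},
    {"age_range": "60-69", "sex": 0, "disease": "A"},
    {"age_range": "60-69", "sex": 0, "disease": "C"},
]
```

Generalization (age ranges instead of exact ages) and suppression are the usual recoding operations used to make such a check pass.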


We anonymize the data by encrypting the original data with a secure key before clustering, in order to achieve privacy. The secure key is established between each data container and the third party, because both are only semi-trusted; it is obtained from the Diffie Hellman key exchange algorithm. The third party generates the public key and sends it openly over the network while hiding some important values. After receiving the public key from the third party, each data container generates its private key and encrypts its data by XOR-ing it with that key. This produces the cipher text (the anonymized data), which is sent to the third party for clustering. The third party never learns the original data; it receives only the cipher text, and its sole task is to cluster all the received data. For this purpose it combines all the cipher texts into one. The third party then selects data points as cluster centroids and builds the distance matrix, from which the similarity matrix and dissimilarity matrix are derived; computing these requires the distances between the cluster centroids and the remaining data points. The third party groups the data using the membership matrix, which is derived from the similarity and dissimilarity matrices. After every grouping the cluster centroids are updated and the distance matrix is recomputed. This process continues until the groups contain the same data points. The organization of the paper is as follows. The review of related research is given in section 2. The problem statement is described in section 3 and the contribution of this paper is given in section 4. The proposed technique to mine clusters using privacy preserving data mining is described in section 5 and obtaining the privacy is discussed in section 6. The experimental results and their discussion are presented in section 7 and the conclusions are summed up in section 8.

II. Related Works

Ali Inan et al. [1] proposed methods for constructing the dissimilarity matrix of objects in privacy preserving clustering over horizontally partitioned data held at different sites; these methods can be used for privacy preserving clustering, record linkage, database joins, and other operations that require pairwise comparison of individual private data objects horizontally distributed across multiple sites. Through experiments over synthetically generated and real datasets, they showed the communication and computation complexity of their protocol. They also ran a baseline protocol with no privacy concern, and by comparing it with their protocol showed the overhead that comes with security and privacy. Slava Kisilevich et al. [23] proposed k-Anonymity of Classification Trees Using Suppression


(kACTUS), which performs efficient multi-dimensional suppression for k-anonymity. Instead of manually producing domain hierarchy trees, kACTUS applies efficient multi-dimensional suppression, in which values are suppressed only on certain records depending on their other attribute values. kACTUS thus identifies the attributes that have less influence on the classification of the data records and suppresses them in order to fulfil k-anonymity. The kACTUS technique was evaluated on ten separate datasets to compare its accuracy with other k-anonymity generalization- and suppression-based methods. Anna Monreale et al. [24] studied movement data anonymity through generalization and proposed a method for achieving true anonymity in a dataset of published trajectories, by defining a transformation of the original GPS trajectories based on spatial generalization and k-anonymity. The proposed technique offers a quantified, theoretical upper bound on the probability of re-identification, which is a formal data protection safeguard. They provided strong empirical evidence that their anonymity techniques achieve the conflicting goals of data utility and data privacy, by conducting a systematic study on a real-life GPS trajectory dataset. Geetha Jagannathan et al. [25] proposed communication-efficient privacy-preserving clustering, making a number of contributions. First, they designed a simple, deterministic, I/O-efficient k-clustering algorithm, with the goal of enabling an efficient privacy-preserving version of the algorithm. The presented algorithm uses only sequential access to the data and examines each item in the database only once. Their experiments show that this algorithm produces cluster centers that are, on average, more accurate than the ones produced by the well known iterative k-means algorithm, and compares well against BIRCH.
Secondly, they present a new distributed privacy-preserving protocol for k-clustering. This protocol applies to databases that are horizontally partitioned between two parties; it is efficient in terms of communication and does not depend on the size of the database. Jinfei Liu et al. [26] addressed the problem of two-party privacy preserving DBSCAN clustering, providing an analysis of the performance and a proof of privacy of their solution. They presented two protocols for privacy preserving DBSCAN clustering over horizontally and vertically partitioned data respectively, and then extended them to arbitrarily partitioned data.

III. Problem Description

There are n data containers, where n ≥ 2, each of which holds a horizontal partition of the data, represented as a data matrix Dmi with 1 ≤ i ≤ n. Each Dmi consists of a set of attributes, Dmi[r × q], where q corresponds to the maximum number of attributes and r to the maximum number of objects. These data containers DCj wish to have their data matrices Dmi clustered within the distributed network. The data containers need a third party to cluster their data matrices; the third party is denoted TP. In the distributed network, each DCj and TP is only semi-trusted, so data sent in the clear would become known to all. In the process of clustering the data matrices, we therefore anonymize the data before clustering, in order to protect the data matrix Dmi of every data container. One standard method to anonymize the data is encryption with a highly secure key. Here, the key is a very important aspect of the encryption process, so key generation is handled by both TP and DCj, because both are semi-trusted: TP participates partially in key generation and fully in the construction of the dissimilarity matrix and the clusters, while each DCj participates partially in key generation and fully in anonymizing its data matrix. In this paper, the secure key is one of the main means of achieving data privacy, so the key needs strong protection. The well known Diffie Hellman algorithm is used for key generation. The third party TP selects a large prime number p, a primitive root g of p, and a secret number a, where a lies between 0 and p - 1. With p, g and a, TP calculates the public value u = g^a mod p. The values of p and u are sent publicly over the network by TP, while the value of a is kept secret. Each data container DCj in the network receives the values of u and p and generates its secret key as km = u^b mod p, where the value of b is selected by the data container DCj, lies between 0 and p - 1, and is kept confidential. After the key is obtained, the attribute values of each data matrix are encrypted with it as cpti = km ⊕ Dmi[r × q]. The result of the encryption process is called the cipher text cpti. Each data matrix is thus converted into a cipher text, and the collection of cipher texts is sent to the third party for clustering. The third party only receives the cipher texts sent by the data containers and cannot cluster the data in the form received, so it combines all the cipher texts into one, namely the concatenated cipher text Ccpt. This Ccpt allows the third party to form the clusters easily: TP clusters the Ccpt data with the Fuzzy C-Means clustering algorithm.


IV. Contribution of the Paper

- Public key: suggests to the data containers the prime number to use, and constrains the choice of the secret number and of the primitive root of the prime number.
- Concatenated ciphertext: the third party combines all the received cipher texts into one, to make the clustering process easy.
- Distance matrix: finds the distance between the data points and the cluster centroids.
- Similarity matrix: finds the attractive force of the cluster centroids on the data points.
- Dissimilarity matrix: finds the repulsive force of the cluster centroids on the data points.
- Membership matrix: groups the data based on the highest membership value for a cluster centroid.

V. Proposed Technique to Mine Clusters Using Privacy Preserving Data Mining

With the aim of achieving privacy preserving data clustering in a distributed architecture, we need to protect the data before clustering it; for that, we anonymize the original data using an encryption technique. The encryption requires a secure key, which we obtain with the Diffie Hellman key generation algorithm. In this paper, we attain private clustering in two steps:
1. Anonymize the original data with the secure key.
2. Cluster the anonymized data with the Fuzzy C Means clustering algorithm.

V.1. Distributed Architecture

The privacy preserving clustering process takes place in a distributed architecture. It consists of n data containers DC and a single third party TP. Each data container DCj holds a data matrix Dmi consisting of q attributes and r objects, i.e. [r × q]. The distributed architecture is given in Fig. 1.

Fig. 1. The distributed architecture of data container and third party

Consider two data containers DC1 and DC2, connected to the third party TP, that wish to cluster their data matrices Dm1 and Dm2. The data matrices Dm1 and Dm2 are 9 × 8, with r = 9 objects and q = 8 attributes, and contain medical data of patients, as shown in Fig. 2.

Data matrix Dm1:
ID   Age  Sex  BP   Chol. mg/dl  Sugar  Heart rate  Heart patient
101  70   1    130  322          1      109         1
102  67   0    115  564          0      160         2
103  57   1    124  261          1      141         1
104  64   1    128  263          0      105         2
105  74   0    120  269          0      121         2
106  65   1    120  177          0      140         2
107  56   0    130  256          1      142         1
108  59   1    110  239          0      142         1
109  60   1    140  293          0      170         1

Data matrix Dm2:
ID   Age  Sex  BP   Chol. mg/dl  Sugar  Heart rate  Heart patient
201  60   1    140  293          0      170         2
202  63   0    150  407          0      154         2
203  59   1    135  234          0      161         1
204  53   1    142  226          0      111         1
205  44   1    140  235          0      180         1
206  61   1    134  234          0      145         2
207  57   0    128  303          0      159         1
208  71   0    112  149          0      125         1
209  46   1    140  311          0      120         2

Fig. 2. The data matrix of the data container

V.1.1. Anonymize the Original Data with the Secure Key

Each data container needs to cluster its data. The clustering algorithm is available at the third party, but the third party and the other data containers are only semi-trusted: if the data were sent directly to the third party, the whole data could become known to all other data containers and to the third party. It is therefore necessary to anonymize the original data before sending it to the third party. Here, we encrypt the entire original data with a secure key to anonymize it; that secret key is an important aspect of achieving data privacy. We obtain the secure key with the help of the Diffie Hellman key generation algorithm.

V.2. Diffie Hellman Key Generation

Here, we anonymize the data using an encryption technique with a secure key obtained from the Diffie Hellman key generation algorithm [27], [29]. The construction of the key is carried out by both the third party and the data containers, since both are only semi-trusted.


The third party generates one public key for all data containers and sends it over the network publicly, hiding some important values. Every data container calculates a new private key from the public value received from the third party. Using this private key, we anonymize the original data matrix. The generation of the secure key has two main steps:
1. Public key generation at the third party.
2. Private key generation at the data containers.

V.2.1. Public Key Generation

The third party selects a large prime number p and a primitive root g of the selected prime number p. The third party also selects a secret number a for calculating the public key; the value of a should lie between 0 and p - 1. The following equation (1) gives the public key u of the third party:

u = g^a mod p    (1)

Suppose the third party selects the prime number p as 1117. This prime has 360 primitive roots, lying between 2 and 1115, from which the third party selects a single value as the primitive root g for calculating the public key u. Here the third party selects the primitive root g as 577 and the secret number a as 729, between 0 and 1116. The third party generates the public key by substituting the values of p, g, a into the equation:

u = 577^729 mod 1117    (2)

Eq. (2) evaluates to 577^729 mod 1117 = 157. The public key u (157) and the prime number p (1117) are both sent over the network publicly. There is no need to send the primitive root g (577) or the secret number a (729).

V.2.2. Private Key Generation

By sending the values of u and p, the third party effectively suggests to the data containers which prime number and primitive root to use. If the third party did not generate the public value, each data container would select its own prime number, primitive root and secret number, which would take more time for calculating the private keys and would reduce the efficiency of the system. After receiving u and p, all data containers share the same prime number and primitive root. Each data container selects its secret value b between 0 and p - 1 and keeps it very confidential, then calculates its private key by the following Eq. (3):

km = u^b mod p    (3)

Consider the two data containers that received u (157) and p (1117). They select their secret numbers b1 = 369 and b2 = 850, between 0 and p - 1, and calculate their secret keys by substituting b1, b2, u, p into Eq. (3):

k1 = 157^369 mod 1117    (4)

k2 = 157^850 mod 1117    (5)

The value k1 (1093) is the secret key of data container DC1 and k2 (1031) is the secret key of data container DC2. These keys are used to anonymize the original data of the data matrices.

V.2.3. Anonymization of the Data

The data containers need a third party to cluster their data, but while clustering the data matrices the data could easily be read by the third party, and the privacy of the data would be lost. To prevent that, we anonymize every attribute value in the data matrix before clustering, so that the third party cannot identify the original values of the data matrix. One standard way to anonymize the data is to encrypt it with a secure key, obtained here from the Diffie Hellman key generation algorithm. Each value in the data matrix is XOR-ed with the corresponding key km, and the resulting value is called the cipher text. After the encryption process, each data matrix is turned into a cipher text. Fig. 3 shows the cipher text of the data matrix given in Fig. 2. Consider the data container DC1 with private key k1: each value in the data matrix Dm1 is XOR-ed with the private key k1, giving the cipher text cpt1 = 1093 ⊕ Dm1. After the encryption of all data matrices we obtain the corresponding cipher texts, in which all values of the data matrices are changed. This collection of cipher texts is sent to the third party for clustering, and the third party can no longer recognize the underlying data.

V.3. Cluster the Anonymized Data by the Fuzzy C Means Cluster Algorithm

The third party receives only the cipher texts from the data containers in the network. It uses the Fuzzy C-Means algorithm to cluster the data. In order to cluster the data with FCM, the third party needs to find the membership matrix; doing so for each cipher text separately would take more computation time, so to reduce its computational complexity the third party combines all the cipher texts into a single matrix called the concatenated cipher text. The concatenated cipher text is also a matrix, consisting of n objects and q attributes [n × q].
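The key exchange of Sections V.2.1 and V.2.2 can be sketched with Python's built-in modular exponentiation, using the parameters of the running example (p = 1117, g = 577, a = 729, b1 = 369, b2 = 850); the variable names are ours:

```python
# Diffie-Hellman key setup between the third party (TP) and two
# data containers, with the parameters of the running example.
p, g = 1117, 577      # public prime and its primitive root, chosen by TP
a = 729               # TP's secret number

u = pow(g, a, p)      # Eq. (1): public key broadcast by TP together with p

b1, b2 = 369, 850     # secret numbers chosen by DC1 and DC2
k1 = pow(u, b1, p)    # Eq. (4): private key of DC1
k2 = pow(u, b2, p)    # Eq. (5): private key of DC2

# Each private key equals g^(a*b) mod p, the standard DH shared value,
# so TP could reconstruct it if a container revealed g^b mod p.
assert k1 == pow(g, a * b1, p)
assert k2 == pow(g, a * b2, p)
print(u, k1, k2)
```

For these parameters the paper reports u = 157, k1 = 1093 and k2 = 1031.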


Cipher text cpt1:
A1   A2    A3    A4    A5    A6    A7    A8
101  1027  1092  1223  1287  1092  1064  1092
102  1030  1093  1078  1649  1093  1253  1095
103  1148  1092  1081  1344  1092  1224  1092
104  1029  1092  1221  1344  1092  1224  1092
105  1039  1093  1085  1352  1093  1084  1095
106  1028  1092  1085  1268  1093  1225  1095
107  1149  1093  1223  1349  1092  1227  1092
108  1150  1092  1067  1194  1093  1227  1092
109  1149  1092  1225  1376  1093  1263  1092

Cipher text cpt2:
A1   A2    A3    A4    A5    A6    A7    A8
201  1083  1030  1163  1314  1031  1197  1029
202  1080  1031  1169  1424  1031  1181  1029
203  1084  1030  1152  1261  1031  1190  1030
204  1074  1030  1161  1253  1031  1128  1030
205  1067  1030  1163  1260  1031  1203  1030
206  1082  1030  1153  1261  1031  1174  1029
207  1086  1031  1159  1320  1031  1176  1030
208  1088  1031  1143  1170  1031  1146  1030
209  1065  1030  1163  1328  1031  1151  1029

Fig. 3. The ciphertext of the data matrix

Consider the two cipher texts cpt1 and cpt2 shown in Fig. 3. The third party receives only these cipher texts, which in their received form are unsuitable for clustering, so the third party combines all the cipher texts into a single matrix called the concatenated cipher text: the objects of the two cipher texts cpt1 and cpt2 are merged together as shown in Fig. 4.

Fig. 4. The concatenated cipher text (rows 101-109 of cpt1 followed by rows 201-209 of cpt2, with columns A1-A8 as in Fig. 3)

In the FCM algorithm, the data are clustered based on the membership matrix. The third party selects the number of clusters into which the concatenated cipher text is to be grouped, and then selects data points as centroids. Here the data matrix contains medical data used to group patients according to heart disease, so the number of clusters is taken as 2: one cluster holds the patients affected by heart disease, the other the patients who are not affected. The underlying data of the data matrix is never shown to the third party. To cluster the concatenated cipher text, the third party performs three main steps:
1. Find the membership matrix for the selected centroid data points.
2. Cluster based on the membership values.
3. Update the centroids.
These three steps are repeated until the clusters contain the same data points.

Pseudo code

Input: collection of data matrices
Output: clustered data

Assumptions:
DCj = data container
TP = third party
Dmi[r × q] = data matrix with r objects and q attributes
p = prime number
g = primitive root of the selected prime number
a = secret number selected by TP
u = public key
km = private key of DCj for Dmi[r × q]
cpti = cipher text
Ccpt = concatenated cipher text
v = number of cluster centroids
Cj = cluster centroid
dm = distance matrix
μxy = membership matrix

Begin
1. Set of data containers DCj, where 2 ≤ j ≤ n
2. DCj → TP
3. Dmi[r × q] ∈ DCj, where i = j
4. TP selects p, g, a
5. TP calculates u = g^a mod p
6. TP sends the values of u and p to DCj
7. DCj selects b
8. DCj calculates km = u^b mod p for Dmi[r × q], where j = m = i
9. cpti = km ⊕ Dmi[r × q]
10. DCj sends cpti to TP
11. TP calculates Ccpt = Σ (i = 1..n) cpti, the concatenation of the cipher texts
12. Select v
13. Select Cj
14. TP calculates dm
15. Calculate Sm
16. Calculate Dssm
17. Derive μxy
18. Group based on the μxy values
19. Update Cj
20. Repeat steps 12 to 19 until Cj does not change
End
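Step 9 of the pseudo code, cpti = km ⊕ Dmi, can be checked against the running example: encrypting patient 101 of Fig. 2 with the key k1 = 1093 from Section V.2 reproduces the first row of Fig. 3, and XOR-ing again with the same key restores the plaintext.

```python
k1 = 1093  # private key of data container DC1 (Section V.2)

row_101 = [70, 1, 130, 322, 1, 109, 1]  # patient 101 in Fig. 2 (ID kept in clear)
cipher = [v ^ k1 for v in row_101]      # step 9: XOR each attribute with the key
plain = [c ^ k1 for c in cipher]        # XOR with the same key is its own inverse

print(cipher)  # [1027, 1092, 1223, 1287, 1092, 1064, 1092], the 101 row of Fig. 3
assert plain == row_101
```

Because XOR is self-inverse, the data container can recover its original values from the cipher text at any time, while the third party, which lacks k1, works only on the masked values.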


V.3.1. Construction of Membership Matrix The third party selects the v number of centroid first consequently the third party selects centroid data cj randomly from the concatenated cipher text and find the membership value for each data point with the selected centroids. In order to find the membership function, the third party calculates the three matrices, the first one is Similarity matrix and second one is Dissimilarity matrix. Distance matrix The distance matrix is used to find the distance among all data points with selected cluster centroids. This distance matrix helps the third party in calculating the similarity matrix and dissimilarity matrix in an easy way.

d11





d1 j





d1n

. .

. . …

. . …

. .

. . …

. . …

. .

. . …

. . …

. . …

. . …

d i1 . .

d m1

di j . .

dm j

d jn . .

dm n

Fig. 5. The distance matrix

The dissimilarity matrix consists of distance among all the data points to the selected cluster centroid since the third party can fetch the value of the distance at any time during the calculation of similarity matrix and dissimilarity matrix. This leads to consume the execution time of the third party while clustering. The well known Euclidian distance is used here to find the distance between the data point and the cluster centroid.

centroid. The data point moves to the cluster centroid which has the highest similarity value among them.

s11





s1 j





s1n

. .

. . …

. . …

. .

. . …

. . …

. .

. . …

. . …

. . …

. . …

si1 . .

sm1

si j . .

sm j

sjn . .

sm n

Fig. 6. Illustrates the similarity matrix

Dissimilarity matrix
The dissimilarity matrix is also an [n×v] matrix, holding the dissimilarity value of each data point with each cluster centroid. The dissimilarity value describes how far the data point can move away from the cluster centroid. Eq. (7) gives the dissimilarity value of a data point with each cluster:

    D_ij = max_i(d_ij) - d_ij    (7)

In the above equation i corresponds to the data point and j to the cluster centroid: the maximum distance of each cluster is found and the data point's distance is subtracted from it, the result being the dissimilarity value of the data point. With the help of these values, the third party calculates the dissimilarity matrix by Eq. (8):

    Dss_xy = D_ij / Σ_{j=1..n} D_ij    (8)

Similarity matrix
With the help of the distance matrix we can find the similarity matrix. The similarity matrix is an [n×v] matrix, where n is the number of data points and v is the number of selected cluster centroids; it holds the similarity value of each data point with each cluster centroid. Eq. (6) gives the similarity value of a data point with each cluster:

    S_xy = d_ij / Σ_{j=1..n} d_ij    (6)

[ Dss11  ...  Dss1j  ...  Dss1n ]
[   .           .           .   ]
[ Dssi1  ...  Dssij  ...  Dssin ]
[   .           .           .   ]
[ Dssm1  ...  Dssmj  ...  Dssmn ]

Fig. 7. The dissimilarity matrix

In the above equation, the x in S_xy denotes the data point and y denotes the cluster centroid. Based on the distances from the cluster centroids to the data points, the third party calculates the similarity value of each data point; the similarity value states how close the data point is to the corresponding cluster centroid.
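The similarity matrix of Eq. (6) row-normalises the distance matrix; a sketch (illustrative Python; names are assumptions):

```python
def similarity_matrix(dist):
    """Eq. (6): S_xy = d_xy / sum_j d_xj, so each row sums to 1."""
    S = []
    for row in dist:
        s = sum(row)
        S.append([d / s if s else 0.0 for d in row])
    return S
```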

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

Membership matrix
The third party calculates the membership matrix with the help of the similarity matrix and the dissimilarity matrix. Each data point has a similarity value and a dissimilarity value. The similarity value describes the attractive force of the data point towards the cluster centroid

International Review on Computers and Software, Vol. 8, N. 10


J. Anitha, R. Rangarajan

and the dissimilarity value describes the repulsive force of the data point with respect to the cluster centroid. The third party cannot tell which of the two is more important for clustering the data points, so it gives equal importance to both values. The membership function is found by Eq. (9), where μ_dpi,cj is the membership function of the i-th data point with the j-th centroid. The memberships are then arranged as the membership matrix, an [n×v] matrix of n data points and v cluster centroids:

    μ_xy = (S_xy + Dss_xy) / 2    (9)

where μ_xy is the membership function of data point x with centroid y and v is the total number of centroids.
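Combining the two matrices with equal weight, as Eq. (9) does, and assigning each point to its best centroid can be sketched as follows (illustrative Python; assumes S and Dss are the row-normalised matrices described above):

```python
def membership_matrix(S, Dss):
    """Eq. (9): mu_xy = (S_xy + Dss_xy) / 2 -- equal importance to the
    attractive (S) and repulsive (Dss) values."""
    return [[(s + d) / 2 for s, d in zip(srow, drow)]
            for srow, drow in zip(S, Dss)]

def assign_clusters(M):
    """Each data point joins the centroid with the highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in M]
```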

V.3.2. Cluster Based on the Membership Function

Each data point has a membership value for every centroid and moves to the centroid with the highest membership value. Consider the data points dp1, dp2, dp3, ..., dpn and the centroids C1, C2, ..., Cn selected by the third party; for every data point there are memberships mm1, mm2, ..., mmn, and the data point joins the centroid with the highest of them.

[ μ1,1    ...  μ1,cj    ...  μ1,cv   ]
[   .            .             .     ]
[ μdpi,1  ...  μdpi,cj  ...  μdpi,cv ]
[   .            .             .     ]
[ μdpn,1  ...  μdpn,cj  ...  μdpn,cv ]

Fig. 8. Membership matrix

This first pass clusters the data using the assumed centroids, but the exact centroids still need to be found, because only with the exact centroids can the data be clustered accurately. Eq. (10) helps to find the centroids:

    C_j = Σ_{i=1..N} (μ_dpi,cj · dp_i) / Σ_{i=1..N} μ_dpi,cj    (10)

V.3.3. Update Centroid

Update the value of the centroid and recluster with the membership function. This iteration process is complete when every cluster contains the same data points in the next iteration. Considering the concatenated cipher text illustrated in Fig. 4, the third party builds two clusters: one cluster holds the ids of patients not affected by heart disease, the other the ids of patients affected by heart disease.

    Cluster 1: 101, 103, 107, 108, 109, 203, 204, 205, 207, 208
    Cluster 2: 102, 104, 105, 106, 201, 202, 206, 209

VI. Obtaining Privacy

The third party only receives the concatenated cipher text of the data containers, so the only way for it to recover the data would be to find the secret key. The privacy of the data is obtained in two ways: first through the secret key and second through concatenation of the cipher text.

VI.1. Secret Key

The third party sends the public key u and the prime number p to all data containers. The prime p helps each data container choose its secret number b easily. Since each data container selects a different secret number b, the secret key ki also varies across data containers. This is the main aspect of constructing privacy: finding the value of p can only be done by trial and error, and the key differs from one data container to another.

VI.2. Concatenation of Cipher Text

After the encryption process, each data matrix is turned into cipher text. Working on the separate cipher texts, the third party would not be able to build the dissimilarity matrix, since computing it over all of them takes more time. Hence the cipher texts are combined into one, called the concatenated cipher text, which is sent to the third party over the network for clustering. The third party cannot tell which data is encrypted with which key, yet the concatenated cipher text lets it build the dissimilarity matrix easily. At the same time, the third party cannot identify the original data of any data container; hence the privacy of the data is obtained.


VII. Results and Discussion

This section describes the experimental results of the proposed technique to mine clusters using privacy preserving data mining. A comparative analysis of the clustering against the previous K-Means algorithm is presented on real world datasets.

VII.1. Experimental Design

The proposed approach is programmed using Java (jdk 1.6). The experimentation has been carried out on synthetic datasets as well as real datasets, on an i3 processor PC with 4 GB main memory running a 32-bit version of Windows XP. We used the real world EMG Physical Action Dataset from the UCI machine learning repository [28]. The dataset comprises 8 attributes and consists of 10 normal and 10 aggressive physical actions. The normal physical actions are Bowing, Clapping, Handshaking, Hugging, Jumping, Running, Seating, Standing, Walking and Waving; the aggressive ones are Elbowing, Front kicking, Hamering, Headering, Kneeing, Pulling, Punching, Pushing, Side kicking and Slapping. Each action has nearly 1000 instances, and each physical action is treated as a cluster. The EMG Physical Action Dataset is given as input to our algorithm so that the data is clustered without the third party knowing it, and the accuracy of the algorithm is evaluated by comparing its result with the input data.

VII.2. Evaluation Metrics

The performance of the proposed technique to mine clusters using privacy preserving data mining is evaluated by means of three measures: 1) Running time - the time taken to execute the algorithm, which typically grows with the input size; 2) Memory usage - the memory utilized by the algorithm to finish the clustering process; and 3) Accuracy - the result of the approach compared with the original dataset for different input sizes.

VII.3. Performance Evaluation Based on Running Time

The proposed privacy preserving clustering approach is compared with the K-Means algorithm in terms of the time needed to cluster the input data. We gave the same amount of data to both algorithms and evaluated the time each took to cluster it, repeating the process for different input sizes. Fig. 9 shows the time taken by both algorithms; the running time of the proposed algorithm is less than that of K-Means, even though the proposed approach additionally computes the similarity matrix and the dissimilarity matrix to calculate the exact membership function of each data point, whereas K-Means uses only the distance measure for clustering the data.

Fig. 9. The time taken by K-Means and the proposed algorithm while clustering

VII.4. Performance Evaluation Based on Memory Usage

The memory usage of both algorithms is evaluated for different sizes of input data. The memory usage of the proposed approach is higher than that of K-Means because the proposed algorithm needs space to store the similarity matrix, the dissimilarity matrix and the membership matrix, which the K-Means algorithm does not store. Based on these three matrices, the proposed approach clusters the input data with high accuracy.


Fig. 10. Memory usage of the K-Means and proposed algorithms
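The two system metrics above can be measured with a small harness like the following (illustrative Python; the paper's measurements were made in Java, and the names are assumptions):

```python
import time
import tracemalloc

def profile(cluster_fn, data):
    """Return (result, seconds, peak_bytes) for one clustering call,
    matching the running-time and memory-usage metrics of VII.3/VII.4."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = cluster_fn(data)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak
```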

VII.5. Performance Evaluation Based on Accuracy

The proposed approach is evaluated by comparing its result and that of the K-Means algorithm against the real world EMG Physical Action Dataset. The graph in Fig. 11 shows that the accuracy of the proposed approach is better than that of the K-Means algorithm.


The proposed approach clusters the input data by means of the similarity matrix, the dissimilarity matrix and the membership matrix, whereas the K-Means algorithm uses only the distance measure; hence the accuracy is improved in the proposed approach.
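The accuracy comparison against the known action labels can be sketched as follows (illustrative Python; it assumes cluster labels are already aligned with the ground-truth labels, which in practice needs a label-matching step):

```python
def clustering_accuracy(predicted, actual):
    """Fraction of points whose cluster label matches the ground truth."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)
```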

Fig. 11. Comparing the accuracy of the K-Means and proposed algorithms

VIII. Conclusion

In this paper we have presented a technique to mine clusters using privacy preserving data mining. Our ultimate aim is to obtain privacy while clustering. The method is based on constructing the membership matrix from the similarity matrix and the dissimilarity matrix. We give equal importance to both matrices, since the dissimilarity matrix captures the repulsive force and the similarity matrix the attractive force with respect to the cluster centroid. The main advantage of this method is that privacy is established before the clustering process: the third party cannot identify the original data, because we anonymize the data with the secret key obtained through the Diffie-Hellman algorithm. As we deal with a distributed architecture, each data container has a different key to anonymize its data, which makes it very difficult for the third party to identify either the key or the original data. Using the proposed approach, we enforce privacy on the original database and can build the clusters without the third party knowing the original data. We have modified the Fuzzy C-Means algorithm to improve the accuracy of clustering and have shown that the accuracy of the proposed approach is better than that of the K-Means algorithm.

Future enhancement: At present we enforce privacy on the entire database, which increases the running time of the anonymization process. To reduce it, we plan to select only the sensitive attributes from the original medical database and anonymize just those; however, the selection of sensitive attributes is not an easy task, because each patient has different sensitive attributes.


References

[1] A. İnan, S. V. Kaya, Y. Saygin, E. Savas, A. A. Hintoglu, A. Levi, "Privacy Preserving Clustering on Horizontally Partitioned Data," Data and Knowledge Engineering, vol. 63, no. 3, pp. 646-666, 2007.
[2] A. Friedman, R. Wolff, A. Schuster, "Providing k-anonymity in data mining," The International Journal on Very Large Data Bases, vol. 17, no. 4, 2008.
[3] R. Agrawal, R. Srikant, "Privacy-preserving data mining," in Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 439-450, 2000.
[4] Y. Lindell, B. Pinkas, "Privacy preserving data mining," in Proceedings of CRYPTO, pp. 36-54, 2000.
[5] J. Wang, C. Xu, Y. Pan, "An Incremental Algorithm for Mining Privacy-Preserving Frequent Itemsets," in Proceedings of the International Conference on Machine Learning and Cybernetics, pp. 1132-1137, 2006.
[6] D. Agrawal, C. C. Aggarwal, "On the Design and Quantification of Privacy Preserving Data Mining Algorithms," in Proceedings of the 20th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 247-255, 2001.
[7] H. Kargupta, S. Datta, Q. Wang, K. Sivakumar, "On the Privacy Preserving Properties of Random Data Perturbation Techniques," in Proceedings of the Third IEEE International Conference on Data Mining, pp. 99-106, 2003.
[8] L. Liu, M. Kantarcioglu, B. Thuraisingham, "The applicability of the perturbation based privacy preserving data mining for real-world data," Data & Knowledge Engineering, vol. 65, no. 1, pp. 5-21, 2008.
[9] G. Jagannathan, K. Pillaipakkamnatt, R. N. Wright, "A New Privacy-Preserving Distributed k-Clustering Algorithm," in Proceedings of the Sixth SIAM International Conference on Data Mining, 2006.
[10] J. Vaidya, C. Clifton, "Privacy Preserving Association Rule Mining in Vertically Partitioned Data," in Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 639-644, 2002.
[11] Z. Erkin, T. Veugen, T. Toft, R. L. Lagendijk, "Privacy-preserving user clustering in a social network," in Proceedings of the IEEE International Workshop on Information Forensics and Security, pp. 96-100, 2009.
[12] G. Jagannathan, R. N. Wright, "Privacy-preserving distributed k-means clustering over arbitrarily partitioned data," in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 593-599, 2005.
[13] P. Bunn, R. Ostrovsky, "Secure two-party k-means clustering," in Proceedings of the 14th ACM Conference on Computer and Communications Security, pp. 486-497, 2007.
[14] W. Du, Z. Zhan, "Building Decision Tree Classifier on Private Data," in Proceedings of the IEEE ICDM Workshop on Privacy, Security and Data Mining, pp. 1-8, 2002.
[15] M. Kantarcioglu, J. Jin, C. Clifton, "When do data mining results violate privacy?" in Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 599-604, 2004.
[16] G. Aggarwal, T. Feder, K. Kenthapadi, R. Motwani, R. Panigrahy, D. Thomas, A. Zhu, "Approximation algorithms for k-anonymity," Journal of Privacy Technology (JOPT), 2005.
[17] R. J. Bayardo Jr., R. Agrawal, "Data privacy through optimal k-anonymization," in Proceedings of ICDE, pp. 217-228, 2005.
[18] E. Bertino, B. C. Ooi, Y. Yang, R. H. Deng, "Privacy and ownership preserving of outsourced medical data," in Proceedings of ICDE, pp. 521-532, 2005.
[19] K. LeFevre, D. J. DeWitt, R. Ramakrishnan, "Mondrian Multidimensional K-Anonymity," in Proceedings of the 22nd International Conference on Data Engineering, 2006.
[20] L. Sweeney, "k-anonymity: A model for protecting privacy," International Journal on Uncertainty, Fuzziness, and Knowledge-based Systems, vol. 10, no. 5, pp. 557-570, 2002.
[21] K. LeFevre, D. DeWitt, R. Ramakrishnan, "Incognito: Efficient full domain k-anonymity," in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 49-60, 2005.


[22] A. Meyerson, R. Williams, "On the complexity of optimal k-anonymity," in Proceedings of the 23rd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 223-228, 2004.
[23] S. Kisilevich, L. Rokach, Y. Elovici, B. Shapira, "Efficient Multi-Dimensional Suppression for K-Anonymity," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 3, pp. 334-347, 2010.
[24] A. Monreale, G. Andrienko, N. Andrienko, F. Giannotti, D. Pedreschi, S. Rinzivillo, S. Wrobel, "Movement Data Anonymity through Generalization," Transactions on Data Privacy, vol. 3, no. 2, 2010.
[25] G. Jagannathan, K. Pillaipakkamnatt, R. N. Wright, D. Umano, "Communication-Efficient Privacy-Preserving Clustering," Transactions on Data Privacy, vol. 3, no. 1, 2010.
[26] J. Liu, J. Luo, J. Z. Huang, L. Xiong, "Privacy Preserving Distributed DBSCAN Clustering," in Proceedings of the Joint EDBT/ICDT Conference, 2012.
[27] W. Diffie, M. E. Hellman, "New Directions in Cryptography," IEEE Transactions on Information Theory, vol. IT-22, pp. 644-654, 1976.
[28] EMG Physical Action Data Set, UCI machine learning repository, http://archive.ics.uci.edu/ml/datasets/EMG+Physical+Action+Data+Set
[29] F. Zhang, Z. Li, L. Gan, "Computational soundness of Diffie-Hellman key exchange against active attackers," International Review on Computers and Software (IRECOS), vol. 7, no. 7, pp. 3507-3512, 2012.
[30] P. Qian, M. Wu, "Privacy preserving in data aggregation of WSN," International Review on Computers and Software (IRECOS), vol. 7, no. 5, pp. 2489-2494, 2012.


Authors’ information

J. Anitha obtained her Bachelor's degree in Computer Science and Engineering from Bharathiar University in the year 2000. She also obtained her Master's degree in Software Engineering from Anna University, Chennai. She is currently working as an Assistant Professor in the Department of Information Technology at Sri Ramakrishna Engineering College. She is pursuing her research in the field of data mining, specializing in privacy preserving data mining. Her research interests also include data security and distributed data mining.

Dr. R. Rangarajan obtained his Bachelor's degree in Electronics and Communication Engineering in the year 1972 and his Master's degree in Power Systems from Madras University in the year 1982. He received his doctoral degree from Bharathiar University in 2006. From 1975 to 2007 he held various positions as Assistant Professor, Professor and Head of the Department of Electronics and Communication Engineering at Coimbatore Institute of Technology, Coimbatore, and from 2007 to 2010 he was the Principal of Sri Ramakrishna Engineering College, Coimbatore. He is currently the Principal of Indus College of Engineering, Coimbatore. His research interests include VHDL and Verilog, Low Power VLSI Design and High Performance Communication Networks. He is a life member of ISTE, a Fellow of the Institution of Engineers, and a member of the Bio-Medical Engineering Society.


International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

Secured and Encrypted Data Aggregation with Message Authentication Code in Wireless Sensor Networks

A. Latha, S. Jayashri

Abstract – In wireless sensor networks, most applications require data to be transmitted to the base station in a secured manner. The existing techniques necessitate multi-hop forwarding, which increases cost and power use, and malicious nodes have a chance to inject false data during data aggregation and forwarding. To overcome these issues, this paper proposes a secured and encrypted data aggregation scheme with a message authentication code for wireless sensor networks. Initially the aggregator nodes are chosen based on node connectivity. During the aggregation phase, an encryption key and a verification key are assigned to the nodes transmitting to the data aggregator, and the MAC is calculated by the aggregator and the monitoring nodes; this helps verify data integrity and detect false data. Monitoring node selection is done by the aggregators. Simulation results show that the proposed approach improves data privacy and also reduces load and power consumption. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Wireless Sensor Networks (WSN), Data Aggregation, Message Authentication, Power Consumption

Nomenclature

EKi     Encryption key
VKi     Verification key
Ac      Aggregator node
Di      Data
MNi     Monitoring nodes
α1, α2  Random numbers
Fi      Modulus function
Q       Group of data aggregators
BS      Base station
CH      Cluster head
RSS     Received signal strength

Manuscript received and revised September 2013, accepted October 2013.

I. Introduction

Wireless sensor networks (WSNs) are an emerging technology with great potential for deployment in critical situations. Basically, sensor networks are application dependent. They are primarily designed for real-time collection and analysis of low-level data in hostile environments, using low-power, low-cost, small-size devices whose sensors cooperatively collect information over an infrastructure-less ad-hoc wireless network. The sensor nodes are deployed in large numbers and collaborate to form an ad-hoc network capable of reporting to a data collection sink. A factor that affects WSN performance is secure-communication instability: if two sensors that share the same aggregator node start sending packets at the


same time, conflicts will occur near the aggregator node and the transfer process will fail. One of the major challenges wireless sensor networks face today is security in data aggregation; security plays a fundamental role in many WSN applications, and under these limitations data aggregation is an important consideration. Data gathering is a basic capability expected of any wireless sensor network. The usual means of data gathering is to have all nodes send their measurements to a particular node, i.e. an aggregator; aggregators can be either special nodes or regular sensor nodes. The challenge is to facilitate per-hop as well as end-to-end security, since data aggregation has been put forward as an essential paradigm for routing in wireless sensor networks. After data gathering, and during transmission to the base station, each node along the routing path cooperatively integrates and should secure the data. Given the distributed nature of WSNs, data integrity is a concern: integrity implies that any aggregate result is made up only of legitimate data, without injected values, and that corrupted sensors cannot interfere with aggregation operations. Without a security scheme, messages can easily be injected into the network or modified along routing paths [1]-[9].

I.1. Advantages

- By data aggregation, the efficiency and effectiveness


of the sensor network increase significantly [6].
- Data aggregation provides availability, confidentiality and flexibility of the sensor network [10].
- The self-configurability, autonomy and ease of deployment of sensor networks make them extremely useful for a variety of applications in environmental monitoring, home automation, medical applications, wildfire detection, traffic regulation and many others [10]-[13].

I.2. Areas to be Concentrated in Sensor Networks [1][2][3][20]

- Data Integrity: Data integrity in sensor networks is needed to ensure the reliability of the data; it refers to the ability to confirm that a message has not been tampered with, altered or changed. Even if the network has confidentiality measures, its integrity is in trouble when a malicious node present in the network injects false data, and unstable wireless-channel conditions can cause damage or loss of data.
- Data Confidentiality: Confidentiality is the ability to conceal messages from an attacker so that any message communicated via the sensor network remains confidential. This is the most important issue in network security; a sensor node should not reveal its data to its neighbors.
- Data Authentication: Authentication ensures the reliability of a message by identifying its origin. Data authentication is achieved through symmetric or asymmetric mechanisms where the sending and receiving nodes share secret keys. Due to the wireless nature of the medium and the unattended nature of sensor networks, it is extremely challenging to ensure authentication.
- Data Availability: Availability determines whether a node is able to use the resources and whether the network is available for messages to be communicated. Failure of the base station or of a cluster leader's availability eventually threatens the entire sensor network, so availability is of primary importance for maintaining an operational network.
- Data Accuracy: The major outcome of any aggregation scheme is to provide aggregated data as accurately as possible, since it is worth nothing to reduce the number of bits in the aggregated data at the cost of very low accuracy. A trade-off between data accuracy and aggregated-data size should be considered at the design stage, because higher accuracy requires sending more bits and thus more power.

I.3. Kinds of Attacks on WSN Aggregation [1][3][21][23][24]

- Denial of Service Attack (DoS): a standard attack on the WSN, transmitting radio signals that interfere with the radio frequencies used by the WSN, normally called jamming. In the aggregation context, the DoS can be an aggregator that refuses to aggregate and prevents data from traveling to the higher levels.
- Node Compromise: the adversary is able to reach any deployed sensor and extract the information stored on it; this is sometimes called a supervision attack. In the data aggregation scenario, once a node has been taken over, all the secret information stored on it can be extracted.
- Sybil Attack: the attacker is able to present more than one identity within the network. An adversary may create multiple identities to generate additional votes in the aggregator election phase and get a malicious node selected as the aggregator; the aggregated result may also be affected if the adversary can generate multiple entries with different readings.
- Selective Forwarding Attack: the WSN assumes that each node will accurately forward received messages, but it is up to the adversary controlling a compromised node whether to forward them or not. In the aggregation context, any compromised intermediate node can launch the selective forwarding attack and thereby affect the aggregation results.
- Replay Attack: an attacker records some traffic from the network, without even understanding its content, and replays it later to mislead the aggregator; consequently the aggregation results are affected.
- Stealthy Attack: the adversary aims to inject false data into the network without revealing its existence. In a data aggregation scenario, the injected false data leads to a false aggregation result. A compromised node can report significantly biased or fictitious values, and perform a Sybil attack to affect the aggregation result.

II. Literature Review

Prakash G. L. et al. [4] have proposed a privacy-preserving data aggregation scheme for additive aggregation functions. The Cluster-based Private Data Aggregation (CPDA) scheme leverages a clustering protocol and algebraic properties of polynomials. Its goal is to bridge the gap between collaborative data collection by wireless sensor networks and data privacy, and it has the advantage of incurring less communication overhead. As future work, the authors intend to design privacy-preserving data aggregation schemes for general aggregation functions. Jacques M. Bahi et al. [9] have proposed a secure end-to-end encrypted-data aggregation scheme based on elliptic curve cryptography that exploits a smaller key size. Additionally, it allows a higher number of operations on ciphertexts and prevents the


distinction between two identical texts from their cryptograms. These properties allow the proposed approach to achieve higher security levels than existing cryptosystems in sensor networks, and it permits the generation of shorter asymmetric encryption keys, which is important in the case of sensor networks. The advantage of this approach is that it reduces computation and communication overhead compared to other works and can be practically implemented on off-the-shelf sensor platforms. Claude Castelluccia et al. [14] have proposed a simple and provably secure encryption scheme that allows efficient additive aggregation of encrypted data. The security of the scheme rests on the indistinguishability property of a pseudorandom function (PRF), a standard cryptographic primitive. To protect the integrity of the aggregated data, the authors construct an end-to-end aggregate authentication scheme that is secure against outsider-only attacks, also based on the indistinguishability property of PRFs. The advantage of this approach is that aggregation based on the scheme can efficiently compute statistical values, such as the mean, variance and standard deviation of sensed data, while achieving significant bandwidth savings. Shih-I Huang et al. [15] have proposed a secure encrypted-data aggregation scheme for wireless sensor networks that eliminates redundant sensor readings without using encryption and maintains data secrecy and privacy during transmission. Conventional aggregation functions operate on readings received in plaintext; if readings are encrypted, aggregation requires decryption, creating extra overhead and key management issues. The proposed scheme provides security and privacy, and duplicate instances of original readings are aggregated into a single packet. The scheme is also resilient to known-plaintext attacks, chosen-plaintext attacks, ciphertext-only attacks and man-in-the-middle attacks. Suat Ozdemir et al. [16] have proposed a Data Aggregation and Authentication protocol, called DAA, to integrate false data detection with data aggregation and confidentiality. To support data aggregation along with false data detection, the monitoring nodes of every data aggregator also conduct data aggregation and compute the corresponding small-size message authentication codes for data verification at their pairmates. To support confidential data transmission, the sensor nodes between two consecutive data aggregators verify data integrity on the encrypted data rather than the plain data. The advantage of DAA is that it detects any false data injected by up to a certain number of compromised nodes, and the detected false data are not forwarded beyond the next data aggregator on the path. Xiaodong Lin et al. [17] have proposed a multidimensional privacy-preserving data aggregation scheme for improving security and saving energy


consumption in wireless sensor networks (WSNs). The proposed scheme integrates the super-increasing sequence and perturbation techniques into compressed data aggregation and is able to combine more than one aggregated datum into one. Compared with traditional data aggregation schemes, it not only enhances privacy preservation in data aggregation but is also more efficient in terms of energy costs thanks to its unique multidimensional aggregation. Zhijun Li et al. [18] have proposed a succinct and practical secure aggregation protocol by combining HMAC (associated with a cryptographic hash function) with a Bloom filter, which is then defined as a secure Bloom filter. The result is an effective aggregation protocol suitable for a specific but popular class of aggregation in wireless sensor networks. Thanks to the secure Bloom filter, the protocol, without any unrealistic assumptions, fulfills the fundamental security objective of preventing outside adversaries and compromised inside nodes from harming the overall network result.

III. Secured and Encrypted Data Aggregation

III.1. Overview

In this paper, we propose a secured and encrypted data aggregation technique with message authentication code in wireless sensor networks. Initially, the nodes deployed in the network are grouped into clusters, and the cluster heads are selected based on node connectivity. The cluster heads act as aggregator nodes. When a cluster member wants to transmit data to the aggregator, it encrypts the data using the data encryption technique. During the data aggregation process, a MAC value is computed by the aggregator node and by monitoring nodes, which helps in verifying data integrity and detecting false data. The monitoring nodes are selected by the aggregators.

III.2. Proposed Architecture

Fig. 1 shows the proposed architecture of the clustered sensor network. C1 and C2 represent clusters, and CH1 and CH2 represent their respective cluster heads. These cluster heads act as aggregator nodes, collecting information from the sensor nodes and transmitting it to the base station (BS). The clustering process is described below.

The sensor nodes in the network perform cluster head selection based on node connectivity. The nodes with higher connectivity than their 2-hop neighbors are initially chosen as cluster heads CHi. The selected cluster heads then broadcast an advertisement message to all surrounding nodes. The advertisement message includes the cluster-head ID and the location information of the cluster head.
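The connectivity-based cluster head election described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the toy topology, node indices and helper names are our own. A node is elected when its degree strictly exceeds that of every node in its 2-hop neighborhood:

```python
from collections import defaultdict

def two_hop_neighbors(adj, n):
    """Nodes reachable from n in one or two hops (excluding n itself)."""
    hop1 = set(adj[n])
    hop2 = {m for v in hop1 for m in adj[v]}
    return (hop1 | hop2) - {n}

def elect_cluster_heads(adj):
    """A node becomes CH if its connectivity (degree) exceeds that of
    every node in its 2-hop neighborhood."""
    degree = {n: len(vs) for n, vs in adj.items()}
    return [n for n in adj
            if all(degree[n] > degree[m] for m in two_hop_neighbors(adj, n))]

# Toy topology: two well-separated clusters headed by nodes 0 and 5.
adj = defaultdict(set)
for a, b in [(0, 1), (0, 2), (0, 3), (0, 4), (5, 6), (5, 7), (5, 8)]:
    adj[a].add(b)
    adj[b].add(a)

print(elect_cluster_heads(adj))  # → [0, 5]
```

Ties are excluded by the strict comparison; in a real deployment the advertisement/member-message exchange described next would follow this election.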

International Review on Computers and Software, Vol. 8, N. 10


A. Latha, S. Jayashri

Fig. 1. Architecture of Sensor Network

The non-cluster-head nodes first record all the information from cluster heads within their communication range. Each non-cluster-head node chooses the cluster head whose advertisement has the strongest Received Signal Strength (RSS) and transmits a member message back to the chosen cluster head. Information about the node's capability of being a cooperative node, i.e., its current energy status, is added to the message. The message also includes the consistency value, consistent sensing count and inconsistent sensing count of the node. If an advertisement message is received at a cluster head CHi from another cluster head CHj with an RSS value greater than a threshold, then CHj is considered a neighbor cluster head and the ID of j is stored [19].

III.3. Data Encryption Technique

Our proposed lightweight encryption algorithm provides secrecy and privacy during data transmission, and it supports data aggregation. Let:
- Ni represent any node in the sensor network;
- Di represent the data;
- EKi represent the encryption key;
- VKi represent the verification key;
- Ac represent the aggregator node.

The steps involved in the encryption of data are as follows:
1) Initially, Ni is assigned a one-way hash function f and a verification key VKi. Ac is assigned a one-way hash function h and the aggregator verification keys VKc and VKc+1. The BS stores both hash functions f and h and the verification keys VKi for all i.
2) When Ni wants to transmit Di to Ac, it randomly generates EKi and encrypts the data with EKi, f and VKi. The encrypted data Ei(Di) is represented by the following equation:

Ei(Di) = (Di ⊕ f(EKi)) || (EKi ⊕ VKi)   (1)

where || indicates data concatenation.
3) Ni then transmits Ei(Di) to Ac:

Ni → Ac : Ei(Di)

4) When Ac wants to transmit the data to the BS, it encrypts the data with its verification key VKc ⊕ VKc+1 and sends it to the BS.
5) The BS decrypts the data using the hash functions f and h and the respective verification keys.

Following the data encryption, the data aggregation and authentication protocol (DAA) is executed as described in the following section.

III.4. Data Aggregation and Authentication (DAA) Protocol

The DAA protocol helps in detecting false data, securing data aggregation and providing data confidentiality at the data aggregators and their respective neighboring nodes, and it also performs data verification while forwarding. DAA involves the following two phases, explained in the following sub-sections:
1) selection of monitoring nodes;
2) secure data aggregation and detection of false data.

III.4.1. Phase 1: Selection of Monitoring Nodes

Let Q = {A0, A1, ..., An} represent the group of data aggregators on a path Z from a sensor node to the BS, and let Ac represent the data aggregator currently in use, where 1 ≤ c ≤ n − 1. In this phase, M of the n neighboring nodes of Ac are selected as monitoring nodes (MNi), where n ≥ M. Every Ac is then monitored by the chosen MNi. The steps involved in the selection of monitoring nodes are as follows:
1) Initially, Ac transmits a request message (REQ) to its neighboring nodes Nnei to obtain their node IDs and random numbers:

Ac → Nnei : REQ

2) On receiving the REQ message, each Nnei generates two random numbers α1 and α2 using a pseudo-random number generator (PRNG) and computes the message authentication code MAC(α1 | α2). Nnei then transmits its identity IDi, α1, α2 and MAC(α1 | α2):

Nnei → Ac : IDi, α1, α2, MAC(α1 | α2)
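The encryption and recovery steps of the scheme in Section III.3 can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the paper does not fix the hash function, key length or padding, so SHA-256, 32-byte keys and null padding are our choices, and the exact composition of Eq. (1) is reconstructed from the surrounding text:

```python
import hashlib
import secrets

BLOCK = 32  # bytes; sensor readings are assumed to fit in one block

def f(key: bytes) -> bytes:
    """One-way hash f of Section III.3 (SHA-256 chosen for illustration)."""
    return hashlib.sha256(key).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(D: bytes, VK: bytes) -> bytes:
    """Eq. (1): E(D) = (D xor f(EK)) || (EK xor VK), with a fresh random EK."""
    EK = secrets.token_bytes(BLOCK)
    return xor(D.ljust(BLOCK, b"\0"), f(EK)) + xor(EK, VK)

def decrypt(E: bytes, VK: bytes) -> bytes:
    """BS side: recover EK using VK, then strip the keystream f(EK)."""
    body, masked = E[:BLOCK], E[BLOCK:]
    EK = xor(masked, VK)
    return xor(body, f(EK)).rstrip(b"\0")

VK = secrets.token_bytes(BLOCK)
ct = encrypt(b"23.5C", VK)
assert decrypt(ct, VK) == b"23.5C"
```

Because EK is drawn fresh per packet, two encryptions of the same reading yield different ciphertexts, while the BS, which holds VKi, can always recover EK from the second half of the packet.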


3) On receiving the random numbers and node IDs from its n neighboring nodes, Ac labels them Ni in the order in which their random numbers were received, where 1 ≤ i ≤ n.
4) Ac then sorts all 2n random numbers (α1, α2, ..., α2n) in ascending order and computes MAC_{VKc ⊕ VKc+1}(α1 | α2 | ... | α2n).
5) Ac broadcasts the sorted random numbers and the MAC, along with the node IDs and the newly assigned labels:

Ac → Ni : IDi, (α1, α2, ..., α2n), MAC_{VKc ⊕ VKc+1}(α1 | α2 | ... | α2n)

6) On receiving the broadcast message from Ac, each Ni verifies that the random numbers it transmitted previously appear unchanged in the received list.
If verification is successful:

Ni → Ac : E_{VKi}(MAC(α1 | α2 | ... | α2n))

Else:

Ni → Ac : Requesting_Reselection_of_monitoring_nodes

End if.
That is, when verification succeeds, Ni encrypts the MAC with a key shared with Ac and transmits it to Ac; otherwise it informs its neighboring nodes of the verification failure and asks Ac to reselect the monitoring nodes.
7) Ni evaluates the modulus function Fi to determine the indices of the M monitoring nodes; through this computation, the monitoring nodes are selected:

Fi = ((Σ_{k=i}^{n−1+i} Rk) ⊕ VKi) mod n + 1   (2)

If the index i of any Ni equals Fi, then Ni is chosen as a monitoring node (Nmi).
8) If duplicate Fi values exist, Fi is computed again after incrementing the value of k by 1.

III.4.2. Phase 2: Secure Data Aggregation

This phase verifies data integrity and detects false data. It is realized by computing a full-size MAC (MACfull) using both the monitoring nodes and the aggregator node: one subMAC is computed by the monitoring nodes and the other subMAC is computed by the aggregator node. The steps involved in securing the data aggregation and detecting false data are as follows:
1) Initially, Ac and Nmi gather the encrypted data to generate the aggregated data DAG.
2) Ac computes subMAC1, namely subMAC_{VKc ⊕ VKc+1}(DAG).
3) Nmi computes subMAC2, namely subMAC_{VKi}(DAG).
4) Ac concatenates the subMAC of Nmi with its own subMAC to generate MACfull(DAG).
5) Ac then transmits MACfull(DAG) to Nmi:

Ac → Nmi : MACfull(DAG)

6) On receiving MACfull(DAG), Nmi verifies its subMAC.
If verification is successful:
the data is authenticated and transmitted to the BS.
Else:

Nmi → Ac : verification_failure

and Ac discards the false data.
End if.

Fig. 2 shows the packet format of secure data aggregation. Each full-size MAC (MACfull) field on the right includes two subMACs. The size of each field (in bytes) is shown in parentheses. From the left, the fields are the destination address, active message type, message length, source address and packet sequence number.

Fig. 2. Packet structure of secure data aggregation

Advantages of the Proposed Approach
1) The DAA protocol helps drop redundant and false packets from the network through the MACs, which proportionally decreases the load and power consumption.
2) Secrecy, privacy and encryption are provided to the data packets during aggregation.
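The two-subMAC construction of the secure data aggregation phase can be sketched as follows. This is an illustrative sketch, not the paper's implementation: HMAC-SHA256, the 4-byte truncation and the key values are our assumptions:

```python
import hashlib
import hmac

def sub_mac(key: bytes, data: bytes) -> bytes:
    """A subMAC: HMAC-SHA256 truncated to 4 bytes (truncation length assumed)."""
    return hmac.new(key, data, hashlib.sha256).digest()[:4]

def mac_full(agg_key: bytes, mon_key: bytes, dag: bytes) -> bytes:
    """MACfull(DAG) = subMAC1 (aggregator key) || subMAC2 (monitoring-node key)."""
    return sub_mac(agg_key, dag) + sub_mac(mon_key, dag)

def monitor_verifies(mon_key: bytes, dag: bytes, mac: bytes) -> bool:
    """Nmi checks only its own subMAC half before the data moves toward the BS."""
    return hmac.compare_digest(mac[4:], sub_mac(mon_key, dag))

agg_key, mon_key = b"VKc^VKc+1", b"VKi"   # placeholder key material
dag = b"aggregated-readings"
mac = mac_full(agg_key, mon_key, dag)
assert monitor_verifies(mon_key, dag, mac)            # authentic data passes
assert not monitor_verifies(mon_key, b"forged", mac)  # injected data is caught
```

Splitting the MAC this way means a compromised aggregator cannot forge a full MAC alone, since it does not hold the monitoring node's key, which is the property the protocol relies on for false data detection.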

IV. Simulation Results

The performance of Secured and Encrypted Data Aggregation with Message Authentication (SEDAA) is


evaluated through NS-2 [22] simulation. A random network deployed over a 750 m × 750 m area is considered, with the sink assumed to be situated 100 m away from this area. In the simulation, the channel capacity of the mobile hosts is set to the same value, 2 Mbps. The simulated traffic is CBR over UDP. The number of attackers is varied from 1 to 5.

TABLE I
SIMULATION PARAMETERS
No. of nodes: 46
Area size: 750 m × 750 m
MAC: 802.11
Routing protocol: AODV
Simulation time: 100 s
Traffic source: CBR
Packet size: 512 bytes
Flows: 1, 2, 3 and 4
Transmission range: 250 m
Transmit power: 0.395 W
Receiving power: 0.660 W
Idle power: 0.035 W
Initial energy: 14.3 J

IV.1. Performance Metrics

The performance of the SEDAA technique is compared with that of the DAA technique [16]. The performance is evaluated mainly according to the following metrics:
- Average packet delivery ratio: the ratio of the number of packets received successfully to the total number of packets transmitted.
- Throughput: the number of packets received successfully by the sink.
- Packet drop: the number of valid packets dropped due to malicious nodes.
- Energy: the average energy consumed for data transmission.

IV.2. Result Section

A. Based on Attackers

In our first experiment we vary the number of attackers as 1, 2, 3, 4 and 5. From Fig. 3, the delivery ratio of the proposed SEDAA is 0.19% higher than that of the existing DAA method. From Fig. 4, the packet drop of SEDAA is 36% less than that of DAA. From Fig. 5, the energy consumption of SEDAA is 2% less than that of DAA. From Fig. 6, the throughput of SEDAA is 0.03% higher than that of DAA.

Fig. 3. Attackers Vs Delivery Ratio
Fig. 4. Attackers Vs Drop
Fig. 5. Attackers Vs Energy
Fig. 6. Attackers Vs Throughput

B. Based on Flows

In our second experiment we vary the number of flows as 1, 2, 3 and 4. From Fig. 7, the delivery ratio of SEDAA is 0.1% higher than that of DAA. From Fig. 8, the packet drop of SEDAA is 38% less than that of DAA. From Fig. 9, the energy consumption of SEDAA is 4% less than that of DAA. From Fig. 10, the throughput of SEDAA is 0.03% higher than that of DAA.
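The four metrics above reduce to simple counter arithmetic over the simulation trace; a minimal sketch (the counter names and example figures are ours, not values from the paper):

```python
def delivery_ratio(received: int, sent: int) -> float:
    """Average packet delivery ratio = packets received / packets transmitted."""
    return received / sent

def avg_energy(initial_j: float, residual_j: list) -> float:
    """Average energy consumed per node, from per-node residual energies."""
    return sum(initial_j - e for e in residual_j) / len(residual_j)

# e.g. 46 nodes starting at 14.3 J (Table I); sink received 9600 of 10000 packets
assert delivery_ratio(9600, 10000) == 0.96
assert abs(avg_energy(14.3, [13.8] * 46) - 0.5) < 1e-9
```

Throughput is simply the `received` counter at the sink, and packet drop is the count of valid packets discarded along the path.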


Fig. 7. Flows Vs Delivery Ratio
Fig. 8. Flows Vs Drop
Fig. 9. Flows Vs Energy
Fig. 10. Flows Vs Throughput

V. Conclusion

In this paper, we have proposed a secured and encrypted data aggregation technique with message authentication code for wireless sensor networks. Initially, the aggregator nodes are chosen based on node connectivity. During the aggregation phase, in which an encryption key and a verification key are assigned to the nodes transmitting to the data aggregator, the MAC is also calculated by the aggregator and the monitoring nodes. This helps in verifying data integrity and detecting false data. The monitoring node selection is done by the aggregators. Simulation results show that the proposed approach improves data privacy and also reduces the load and power consumption.

References
[1] G. Padmavathi, D. Shanmugapriya, "A Survey of Attacks, Security Mechanisms and Challenges in Wireless Sensor Networks", International Journal of Computer Science and Information Security (IJCSIS), Vol. 4, No. 1 & 2, 2009.
[2] Zhijun Li and Guang Gong, "A Survey on Security in Wireless Sensor Networks", 2008.
[3] Hani Alzaid, Ernest Foo and Juan Gonzalez Nieto, "Secure Data Aggregation in Wireless Sensor Network: a Survey", Australasian Information Security Conference, Research and Practice in Information Technology (CRPIT), Vol. 81, 2008.
[4] G. L. Prakash, M. Thejaswini, S. H. Manjula, K. R. Venugopal and L. M. Patnaik, "Secure Data Aggregation Using Clusters in Sensor Networks", World Academy of Science, Engineering and Technology, 51, 2009.
[5] Baljeet Malhotra, Ioanis Nikolaidis and Mario A. Nascimento, "Aggregation Convergecast Scheduling in Wireless Sensor Networks", Wireless Networks, Vol. 17, No. 2, pp. 319-335, February 2011.
[6] A. Bartoli, J. Hernández-Serrano, M. Soriano, M. Dohler, A. Kountouris and D. Barthel, "Secure Lossless Aggregation for Smart Grid M2M Networks", First IEEE International Conference on Smart Grid Communications (SmartGridComm), 4-6 Oct. 2010.
[7] Sanjay Madria and Sriram Chellappan, "Secure Data Aggregation in Sensor Networks", 2007.
[8] Lu Su, Yan Gao, Yong Yang and Guohong Cao, "Towards Optimal Rate Allocation for Data Aggregation in Wireless Sensor Networks", Proceedings of the Twelfth ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), Article No. 19, Paris, France, May 16-19, 2011.
[9] Jacques M. Bahi, Christophe Guyeux and Abdallah Makhoul, "Efficient and Robust Secure Aggregation of Encrypted Data in Sensor Networks", 4th International Conference on Sensor Technologies and Applications (SENSORCOMM), Italy, 2010.
[10] Tamer AbuHmed and DaeHun Nyang, "A Dynamic Level-based Secure Data Aggregation in Wireless Sensor Network", 2009.
[11] Rodrigo Roman and Javier Lopez, "Integrating Wireless Sensor Networks and the Internet: A Security Analysis", Internet Research, Vol. 19, No. 2, pp. 246-259, 2009.
[12] Jacques M. Bahi, Christophe Guyeux and Abdallah Makhoul, "Secure Data Aggregation in Wireless Sensor Networks: Homomorphism versus Watermarking Approach", 2nd International Conference on Ad Hoc Networks (ADHOCNETS), Canada, 2010.
[13] Suat Ozdemir and Yang Xiao, "Secure Data Aggregation in Wireless Sensor Networks: A Comprehensive Overview", Elsevier, 2009.
[14] Claude Castelluccia, Aldar C.-F. Chan and Einar Mykletun, "Efficient and Provably Secure Aggregation of Encrypted Data in Wireless Sensor Networks", ACM Transactions on Sensor Networks, Vol. 5, No. 3, Article 20, May 2009.
[15] Shih-I Huang, Shiuhpyng Shieh and J. D. Tygar, "Secure Encrypted-Data Aggregation for Wireless Sensor Networks", Springer, 2009.
[16] Suat Ozdemir and Hasan Çam, "Integration of False Data Detection with Data Aggregation and Confidential Transmission in Wireless Sensor Networks", IEEE/ACM Transactions on Networking, Vol. 18, No. 3, June 2010.
Rodrigo Roman and Javier Lopez, “Integrating Wireless Sensor Networks and the Internet: A Security Analysis”, Internet Research, Vol. 19 Iss: 2, pp.246 – 259, 2009. Jacques M. Bahi, Christophe Guyeux and Abdallah Makhoul, “Secure Data Aggregation in Wireless Sensor Networks Homomorphism versus Watermarking Approach”, ADHOCNETS, 2nd Int. Conf. on Ad Hoc Networks, Canada, 2010. Suat Ozdemir a and Yang Xiao, “Secure data aggregation in wireless sensor networks: A comprehensive overview”, Elsevier, 2009. Claude Castelluccia, Inria, Aldar C-F.Chan and Einar Mykletun, “Efficient and Provably Secure Aggregation of Encrypted Data in Wireless Sensor Networks”, ACM Transactions on Sensor Networks, Vol. 5, No. 3, Article 20, May 2009. Shih-I Huang, Shiuhpyng Shieh and J. D. Tygar, “Secure encrypted-data aggregation for wireless sensor networks”, Science+Business Media, LLC, Springer, 2009. Suat Ozdemir and Hasan Çam, “Integration of False Data Detection with Data Aggregation and Confidential Transmission in Wireless Sensor Networks”, IEEE/ACM Transactions on networking, VOL. 18, NO. 3, JUNE 2010.


[17] Xiaodong Lin, Rongxing Lu and Xuemin (Sherman) Shen, "MDPA: Multidimensional Privacy-Preserving Aggregation Scheme for Wireless Sensor Networks", Wireless Communications and Mobile Computing, pp. 843-856, 2010.
[18] Zhijun Li and Guang Gong, "On Data Aggregation with Secure Bloom Filter in Wireless Sensor Networks", 2010.
[19] D. HevinRajesh and B. Paramasivan, "Fuzzy Based Secure Data Aggregation Technique in Wireless Sensor Networks", Journal of Computer Science, pp. 899-907, 2012.
[20] Lu, C., Xiong, H., Liu, Z., "An Asymmetric Encryption Algorithm for Wireless Sensor Networks Based on Elliptic Curve Cryptosystem", (2012) International Review on Computers and Software (IRECOS), 7 (5), pp. 2290-2297.
[21] Qian, P., Wu, M., "Privacy Preserving in Data Aggregation of WSN", (2012) International Review on Computers and Software (IRECOS), 7 (5), pp. 2489-2494.
[22] The Network Simulator ns-2: http://www.isi.edu/nsnam/ns
[23] Said Ben Alla, Abdellah Ezzati, "A QoS-Guaranteed Coverage and Connectivity Preservation Routing Protocol for Heterogeneous Wireless Sensor Network", (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (6), pp. 363-371.
[24] Reza Mohammadi, Reza Javidan, "Adaptive Quiet Time Underwater Wireless MAC: AQT-UWMA", (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (4), pp. 236-243.

Authors' information

A. Latha obtained her Bachelor's degree in Electronics & Communication Engineering from Bharathiyar University and her Master's degree in Medical Electronics, and she is pursuing a PhD in Electronics and Communication Engineering, majoring in Wireless Sensor Networks, at Anna University, Chennai, India. Currently, she is an Assistant Professor in Computer Science and Engineering, Adhiparasakthi Engineering College, Melmaruvathur, Tamilnadu, India.

Dr. S. Jayashri received the B.E. degree from Madurai Kamaraj University, India, in 1982, the M.E. degree from Anna University, Chennai, India, in 1992, and the Ph.D. degree from Anna University, Chennai, India, in 2004. From 1994 to 2001, she worked as an Assistant Professor at SRM University, India. She is currently a Professor in the Department of Electronics and Communication Engineering, Adhiparasakthi Engineering College, Melmaruvathur, Tamilnadu, India. Her research interests are in optical communication, wireless communication and networks, cloud computing and nano network communication. She has served as technical chairperson at various international conferences. She is a member of the IEEE, IETE, IACSIT and IAENG.


International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

Hybrid Approach for Energy Optimization in Wireless Sensor Networks Using ABC and Firefly Algorithms

T. Shankar1, S. Shanmugavel2

Abstract – Wireless sensor networks are made up of sensor nodes, which are usually battery-operated devices; hence the energy saving of sensor nodes is a major design issue. Network lifetime is the main concern, so energy optimization is required to increase the network's lifetime and improve its performance. Optimization algorithms are among the best tools for energy optimization in wireless sensor networks (WSNs). The objective of this paper is to analyze the lifetime and residual energy of the network under such optimization algorithms. Clustering is one of the best approaches used in many WSN routing algorithms, where an appropriate cluster head must be selected for energy optimization. The proposed energy optimization algorithms are a firefly algorithm and a hybrid algorithm, which are seen to provide better performance than traditional schemes such as direct transmission and the LEACH (Low Energy Adaptive Clustering Hierarchy) protocol. The hybrid algorithm is formed by combining the Artificial Bee Colony (ABC) and firefly optimization algorithms. The proposed technique improves the lifetime, residual energy and throughput of the wireless sensor network. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Clustering, Firefly Algorithm, Energy Optimization, Hybrid

I. Introduction

Wireless sensor networks consist of tiny battery-operated sensor nodes distributed randomly in the field, which sense physical quantities [1]-[18]. The sensed data is collected at the base station or sink node, which takes all necessary action. Typical applications [2] of wireless sensor networks include temperature, pressure and humidity monitoring of a landscape, military applications sensing motion in hostile environments, and many more. These sensor nodes can be configured easily, are ready to install, and work on a specific battery power. A sensor node usually has confined processing capability, depending on the application, and works on limited bandwidth. The main challenge while designing a routing protocol is dealing with the limited battery power of the sensor devices. The limited battery power of the sensor nodes is the main factor that affects the network lifetime. Early death of nodes may defeat the purpose of the application; hence an energy optimization technique is required when routing [4] the sensor data to the base station. Clustering [3] is one of the best approaches, as it can significantly improve network lifetime and throughput. A cluster head is chosen within each cluster and is responsible for collecting and aggregating all the data and transmitting it to the base station. In clustering, one round of communication has two steps: first, the sensed data is transmitted to the cluster heads by the member nodes within their clusters; second, the data collected by each cluster head is aggregated and transmitted to the sink node.

Manuscript received and revised September 2013, accepted October 2013


The main advantages of clustering are better resource utilization and good management of the sensor nodes in the network. The frequency band can be reused within the clusters, and clustering also provides load sharing, which conserves battery power effectively [14]-[16]. The LEACH protocol [1] was the first effort in this direction to use clustering to improve the lifetime of wireless sensor networks. It is a dynamic clustering method in which the cluster heads change for each round of communication. LEACH is a single-hierarchy protocol in which nodes first send their data to the cluster heads, and the cluster heads then transmit it to the base station. The main task of the LEACH protocol is to select appropriate cluster heads according to an a priori threshold equation [1]. Later, some variants of LEACH were introduced, e.g., LEACH-C and ALEACH [6]. LEACH provides a good communication technique, but it also has several drawbacks.

II. First Order Radio Model

Currently there is a great deal of research in the area of low-energy radios. Different assumptions about the radio characteristics, including energy dissipation in transmit and receive modes, will change the relative advantages of different protocols. This work assumes a simple model in which the radio dissipates Eelec = 70 nJ/bit to run the transmitter or receiver circuitry and ∈amp = 120 pJ/bit/m² for the transmit amplifier to achieve an acceptable Eb/N0 (see Fig. 1 and Table I).


Fig. 1. First Order Radio Model

TABLE I
PARAMETERS OF FIRST ORDER RADIO MODEL
Transmitter electronics (ETx-elec) = Receiver electronics (ERx-elec) = Eelec: 70 nJ/bit
Transmit amplifier (∈amp): 120 pJ/bit/m²

These parameters are slightly better than the current state of the art in radio design. The model also assumes an r² energy loss due to channel transmission. Thus, to transmit a k-bit message over a distance d using this radio model, the radio expends:

ETx(k, d) = ETx-elec(k) + ETx-amp(k, d) = Eelec × k + ∈amp × k × d²   (1)

and to receive this message, the radio expends:

ERx(k) = ERx-elec(k) = Eelec × k   (2)

Eqs. (1) and (2) describe the cost of communication: the transmitter electronics ETx(k, d) and the receiver electronics ERx(k) depend on the number of bits k and the communication distance d.

III. Energy Optimization Algorithms

The sensor nodes rely on each other and send data to the base station cooperatively. The nodes in a WSN may also have to behave as routers; hence the clustering algorithm should be dynamic, to avoid redundant transmissions. The LEACH protocol was the initial method used in wireless sensor networks to optimize the energy of the network.

III.1. LEACH: Low-Energy Adaptive Clustering Hierarchy

The LEACH protocol [1] was the initial optimization technique that improved network performance by optimizing energy. Clustering in LEACH is dynamic and adaptive: the cluster heads are elected based on the threshold of Eq. (3) before every round of communication. An elected cluster head takes the responsibility of collecting the data from its member nodes and also performs data compression. The base station collects the sensed data from all cluster heads in each round. Cluster head selection is probabilistic. Each node joins the nearest cluster head and becomes a member node to save energy, as energy consumption is directly proportional to the square of the distance. The cluster head creates a TDMA schedule and grants the entire bandwidth to one member at a time to send its data to the cluster head. While any member node transmits to the cluster head, the other member nodes wait for their turn as per the time-division schedule:

T(n) = p / (1 − p (r mod (1/p)))   if n ∈ G
T(n) = 0                           otherwise      (3)

Every node determines whether it becomes a cluster head or a member node at the beginning of each round r, based on the percentage p, where p is the predefined percentage of cluster heads. Each node in the network generates a number between 0 and 1, which is compared with the threshold T(n) to decide whether the node becomes a cluster head. If the generated number is less than the threshold, the node becomes a cluster head for the current round. Eq. (3) depends on the round r and the probability p. As the rounds proceed, the threshold value increases up to round 1/p, after which the value of T(n) repeats every 1/p rounds. G represents the set of nodes that have not been cluster heads within the last 1/p rounds. The main advantage of the LEACH protocol is that proper load sharing can be achieved, but the number of cluster heads is not necessarily fixed in each round. This method considers neither the residual energy of a node nor its location and distance while electing cluster heads, which may lead to early death of nodes and reduce the lifetime of the network. The TDMA schedule gives better resource utilization, but nodes near the boundary may die early.
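The radio model of Eqs. (1)-(2) and the LEACH threshold of Eq. (3) can be sketched together as follows, using the constants of Table I (the packet size and distance in the example are illustrative values):

```python
E_ELEC = 70e-9   # J/bit, transmitter/receiver electronics (Table I)
E_AMP = 120e-12  # J/bit/m^2, transmit amplifier (Table I)

def e_tx(k_bits: int, d_m: float) -> float:
    """Eq. (1): energy to transmit k bits over distance d."""
    return E_ELEC * k_bits + E_AMP * k_bits * d_m ** 2

def e_rx(k_bits: int) -> float:
    """Eq. (2): energy to receive k bits."""
    return E_ELEC * k_bits

def leach_threshold(p: float, r: int, in_G: bool) -> float:
    """Eq. (3): a node in G becomes CH this round if rand() < T(n)."""
    if not in_G:
        return 0.0
    return p / (1 - p * (r % round(1 / p)))

# Sending one 512-byte packet over 100 m costs about 5.2 mJ under this model:
print(e_tx(512 * 8, 100.0))

# With p = 0.05, T(n) reaches 1 in the last round of each 1/p = 20-round cycle,
# so every remaining node in G is guaranteed a turn as cluster head:
assert abs(leach_threshold(0.05, 19, True) - 1.0) < 1e-9
```

The d² amplifier term dominates at long range, which is why LEACH's member-to-CH short hops followed by a single CH-to-BS transmission save energy overall.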


optimization is based on intelligent foraging behavior of honey bee swarm. The main task of this algorithm is to elect cluster head in each round. Unlike the LEACH protocol the selection is based on iterative optimization method. It is widely used in many other application to find the non linear solution for the problems. Here the energy consumption in WSN is based on non linear first order radio model, hence the ABC algorithm is best fit for the cluster head selection problem. The ABC algorithm is implemented by the base station which is assumed to have unlimited power supply. The main difference in LEACH and ABC algorithm is in selection method of cluster head. ABC has better and effective method of clustering than LEACH. The fitness function is used in ABC algorithm to evaluate the eligibility of cluster head for the selection process. The fitness function given by Eq. (4) for each cluster head. The fitness function is defined as the fitness value which is inversely proportional to the energy consumed in each round:

=

·

+

(4)

insect to eat also they use it to attract opposite sex for mating. This behaviors of firefly [10] [11] is used to solve non linear optimization problems. This unique feature of firefly is used in signaling purpose to communicate each other. There are few assumptions made by Yang [8] to solve the optimization problem as follow: [a] All firefly are unisex so that any Firefly can be attracted towards any other Firefly. [b] The attractiveness of Firefly is proportional to the brightness of Firefly. The less brighter Firefly attract towards the more brighter Firefly. [c] If the fireflies have equal brightness then they all attract randomly. The brightness is calculated using the objective function f(x), which is to be optimize. All the fireflies are considered to be dispersed randomly in the search space and they achieve brightness as per the objective function to be optimized. There are two important aspect of the algorithm; (a) evaluate the brightness value (b) Movement of the firefly. Each firefly achieve the relative brightness β(r) which depend on the Eucledian distance ‘r’ between them, which is given below in Eq. (5): ( )=

Above Eq. (4) represent the fitness value for each cluster head, where ‘d’ is the distance from member node to cluster head and i is the index for number of member nodes. Distance from cluster head to base station is represted by variable ‘b’ and q is the radio constant. In the beginning of the round the fixed number of cluster head is chosen randomly and the fitness value is evaluated for them. The next set of cluster heads is chosen again randomly in next optimization, which is then compaired and find the best set of cluster head. Likewise the iterative comparision is done upto the certain optimization untile the maximum cycle reaches. Finally the best set of cluster head is chosen at the end of optimization in the round. All other non cluster head nodes select nearby cluster head to send their data. Once the data is collected by the cluster heads, it performs the data aggregation and send it to the base station or sink node. Each round has certain optimization cycles for comparing and find the best set of cluster heads. It is clear from the Eq. (4) that the distance is optimized for selecting appropriate cluster heads, which is main factor of energy dissipation. The ABC optimization method doesn’t consider the residual energy of the nodes during the selection process of cluster heads, hence there may be the chance of early death of the elected nodes if they are selected repeatedly.

IV. Firefly Algorithm

The firefly algorithm is based on the behavior of the social insect known as the "firefly", which produces flashes of light during the night, mainly to attract other fireflies. The attractiveness β of a firefly varies with the distance r between two fireflies as:

β(r) = β0 e^(−γr²)    (5)

where β0 is the attractiveness at r = 0 and γ is the light absorption coefficient, which ranges over [0, ∞). The movement of firefly i located at xi towards the brighter firefly j located at xj is given as:

xi(t + 1) = xi(t) + β0 e^(−γ rij²) (xj − xi) + α εi    (6)

where εi is a random vector and α its scaling factor. The optimization problem can be solved as per the pseudo code below:

Objective function f(x), where x = (x1, x2, …, xd)
Generate the initial population of fireflies xi (i = 1, 2, …, n)
Determine the light intensity Ii at xi by f(xi)
Define the light absorption coefficient γ
while (t < MaxGeneration)
    for i = 1 : n
        for j = 1 : i
            if (Ij > Ii) move firefly i towards j end if
            vary attractiveness with distance r
            evaluate new solution and update light intensity
        end for j
    end for i
    rank the fireflies and find the current best
end while

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

International Review on Computers and Software, Vol. 8, N. 10

T. Shankar, S. Shanmugavel
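Eqs. (5) and (6) above can be illustrated with a short Python sketch. The parameter values (β0 = 1, γ = 1, α = 0.2) and the uniform random step standing in for εi are illustrative assumptions, not taken from the paper:

```python
import math
import random

def attractiveness(beta0, gamma, r):
    # Eq. (5): beta(r) = beta0 * exp(-gamma * r^2)
    return beta0 * math.exp(-gamma * r * r)

def move_firefly(xi, xj, beta0=1.0, gamma=1.0, alpha=0.2):
    """Eq. (6): move firefly i at xi towards the brighter firefly j at xj.
    alpha scales a random perturbation, drawn here uniformly from
    [-0.5, 0.5] per dimension (an assumed choice)."""
    r = math.dist(xi, xj)
    beta = attractiveness(beta0, gamma, r)
    return [x + beta * (y - x) + alpha * (random.random() - 0.5)
            for x, y in zip(xi, xj)]
```

With γ = 0 and α = 0 the move lands exactly on the brighter firefly (β = β0 = 1); with large γ the attraction decays and the step becomes small, which is the behavior the absorption coefficient is meant to model.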

V. Cluster Head Selection Using Firefly Algorithm

Cluster head selection is the main optimization problem in wireless sensor networks. The firefly algorithm is used to find the best set of cluster heads so as to reduce the energy consumption in each round and increase the lifetime of the network. Cluster head selection using the firefly algorithm is based on the LEACH protocol, where the clustering is done and the fitness function is evaluated as in the ABC algorithm. The proposed method for energy optimization differs from ABC and LEACH only in the way cluster heads are selected; the rest of the communication process remains the same. This method considers the distance as well as the residual energy of the nodes while selecting cluster heads. The algorithm is implemented at the base station, where a table is maintained to keep the inter-node distances and the current residual energies. Let n be the total number of nodes distributed randomly in the area under surveillance. For k clusters, the cluster head selection algorithm is implemented as follows:
Step 1: Generate the population with a randomly selected set of k cluster heads. Create the clusters by assigning the non-cluster-head nodes to the selected cluster heads according to minimum distance.
Step 2: Calculate the fitness function of each cluster head by Eq. (4).
Step 3: Perform the cluster head update according to the firefly algorithm: (i) in each cluster, select the member node having the highest energy as the new cluster head, discarding the previous one; (ii) create the new clusters according to the newly elected cluster heads.
Step 4: Calculate the fitness of the newly elected cluster heads and determine the best cluster heads by comparison with Step 2.
Step 5: Repeat Steps 2 to 4 until the maximum number of cycles is reached.

VI. Proposed Hybrid Algorithm

The proposed hybrid algorithm is based on the ABC routing technique combined with the firefly optimization algorithm. The purpose of creating the hybrid is to take the benefits of both routing methods and produce a better result than either technique alone. As discussed in the previous sections, the cluster head is the node that consumes more energy than the other member nodes. In order to protect the cluster head from early death, we propose a criterion that avoids it: the cluster head should not die out during communication, which would lead to data loss. The current residual energy (ECH)current, i.e. the energy left in the node, must be greater than the energy (ECH)required needed to complete the task:

(ECH)current ≥ (ECH)required    (7)

In the proposed method we introduce this criterion while selecting the cluster head, and the fitness of the cluster head is calculated based on the ABC routing method, as discussed earlier. The cluster head update is done using the firefly algorithm, where we select as cluster head the node having the highest residual energy within the cluster. The proposed algorithm can be summarized as below.
Step 1: Initialize the population with a randomly selected set of k cluster heads.
Step 2: Create the clusters by assigning the non-cluster-head nodes to the selected cluster heads according to minimum distance.
Step 3: Calculate the fitness function of each cluster head by Eq. (7).
Step 4: Perform the cluster head update: (i) in each cluster, select the member node having the highest energy as the new cluster head, discarding the previous one; (ii) while selecting the cluster head, apply the criterion of Eq. (7); (iii) create the new clusters according to the newly elected cluster heads.
Step 5: Calculate the fitness of the newly elected cluster heads and determine the best cluster heads by comparison with Step 2.
Step 6: Repeat Steps 2 to 5 until the maximum number of cycles is reached.
Step 7: Calculate the energy consumption based on the radio model described in Section II and update the residual energy of each node for the current round with the selected cluster heads.
Step 8: If the first node death occurs, switch to the ABC routing method.

All the algorithms discussed above are simulated in MATLAB, and the results are compared and explained in the next section.
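The residual-energy criterion of Eq. (7) and the cluster head update can be sketched as follows. The required-energy estimate uses the first-order radio parameters listed in Table II (Eelec, Eamp, EDA); treating one frame's transmit-plus-aggregation cost as the "required energy" is our simplifying assumption, since the exact radio model appears in Section II of the paper and is not reproduced in this excerpt.

```python
def required_energy(bits, d, E_elec=70e-9, E_amp=120e-12, E_da=5e-9):
    """Assumed first-order radio cost (per Table II constants) for a
    cluster head to aggregate and forward one frame over distance d."""
    return bits * (E_elec + E_da + E_amp * d * d)

def eligible_heads(cluster, energies, bits, d_to_bs):
    """Eq. (7): a node may serve as cluster head only if its current
    residual energy is at least the energy the task requires."""
    need = required_energy(bits, d_to_bs)
    return [n for n in cluster if energies[n] >= need]

def update_head(cluster, energies, bits, d_to_bs):
    """Hybrid update rule sketched above: among eligible members,
    promote the node with the highest residual energy."""
    ok = eligible_heads(cluster, energies, bits, d_to_bs)
    return max(ok, key=lambda n: energies[n]) if ok else None
```

Returning `None` when no member satisfies Eq. (7) corresponds to the point where, per Step 8, the scheme would fall back to ABC routing.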

VII. Simulations and Results

The MATLAB simulation of the proposed hybrid routing technique is carried out with the initialization parameters given in Table II and compared with all the algorithms discussed earlier. The sensor nodes are distributed in an area of 100×150 m² as shown in Fig. 2, and the base station is located at (50, 150). All the sensor nodes are assumed to be within communication range and are TDMA-synchronized within their clusters. The probability of a node being a cluster head is taken as p = 10%.


Fig. 2. Random distribution of wireless sensor network

Fig. 3. Death of nodes

For the purpose of simulation, a discrete model is assumed in which each node sends a fixed amount of data, 4096 bits, in each round. There are 100 sensor nodes in the simulation. Fig. 3 shows the death of nodes for the hybrid routing technique together with all the other protocols. The first node death (FND) occurs slightly earlier in the proposed hybrid routing technique than in the proposed firefly routing, but as the rounds go on, node deaths slow down under the hybrid algorithm. Fig. 4 shows the comparison of the residual energy of all the algorithms; the residual energy is better in the hybrid routing method than in the firefly algorithm (FA) and the other protocols as well. Fig. 5 shows the network throughput, where the proposed hybrid routing method is dominant compared with all the other routing techniques.

Fig. 4. Residual energy of the network

VIII. Bar Graph Analysis

The first node death (FND) and last node death (LND) are two important parameters for analyzing the performance of a wireless sensor network.

TABLE II
NETWORK PARAMETERS
Parameter                        Value
Network Area                     100 x 150 m²
Base station location            (50, 150) m
Total number of nodes            100
Percentage of cluster heads      10%
Initial energy of each node      0.5 Joule
Data frame size                  4096 bits
Eelec                            70 nJ/bit
Eamp                             120 pJ/bit/m²
EDA (data aggregation energy)    5 nJ

Fig. 5. Throughput of the network

In many applications the lifetime of the network is taken as the span from the beginning of communication until the FND, while a few other applications count it from the beginning until the LND. Fig. 6 shows the first node death comparison of all the simulated algorithms, where direct transmission is not significant compared with the cluster-based routing techniques. The firefly algorithm is found to have the latest first node death, with FND occurring at around the 350th round. The last node death (LND) comparison is shown in Fig. 7. The proposed hybrid algorithm for routing in WSN lasts the longest, its last node dying at the 542nd round; the ABC algorithm also lasts long, its last node dying at the 512th round.

TABLE III
FIRST NODE DEATH & LAST NODE DEATH
Algorithms            FND (in rounds)    LND (in rounds)
Direct Transmission   43                 320
LEACH Protocol        256                490
ABC Routing           210                512
Firefly Routing       351                440
Proposed Hybrid       317                542
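The relative improvements implied by Table III can be computed directly; a small sketch, with the FND/LND pairs transcribed from the table (the helper name `gain_over` is ours):

```python
# FND/LND values transcribed from Table III (in rounds)
results = {
    "Direct Transmission": (43, 320),
    "LEACH Protocol": (256, 490),
    "ABC Routing": (210, 512),
    "Firefly Routing": (351, 440),
    "Proposed Hybrid": (317, 542),
}

def gain_over(base, other):
    """Percentage change of FND and LND of `other` relative to `base`."""
    (f0, l0), (f1, l1) = results[base], results[other]
    return round(100 * (f1 - f0) / f0, 1), round(100 * (l1 - l0) / l0, 1)
```

For example, `gain_over("ABC Routing", "Proposed Hybrid")` shows the hybrid delaying FND by about 51% and LND by about 5.9% relative to ABC routing, consistent with the discussion above.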


Fig. 6. First Node Death for various algorithms

Fig. 7. Last Node Death for various algorithms

IX. Conclusion

LEACH was the first protocol to perform clustering in wireless sensor networks, and it shows great improvement over conventional approaches such as the direct transmission technique. The main cause of early node death is the communication distance between nodes and the base station; hence the election of cluster heads is needed. Optimal selection of cluster head nodes helps to avoid early node deaths and increases the lifetime of the WSN. The hybrid optimization technique is simulated by taking advantage of both the ABC and firefly optimization techniques. The simulation results show that the proposed technique gives the maximum lifetime, better residual energy and higher throughput for the wireless sensor network.


Authors' information

1 Assistant Professor (Sr.), School of Electronics Engineering, VIT University, Vellore.

2 Professor, Department of ECE, College of Engineering, Guindy, Anna University, Chennai.
E-mail: [email protected]


T. Shankar received the B.E. degree in Electronics and Communication Engineering from the University of Madras, Tamil Nadu, India in 1999, the M.E. in Applied Electronics from the College of Engineering Guindy, Anna University, Chennai, Tamil Nadu, India in 2005, and is pursuing the Ph.D. at Anna University, Chennai, Tamil Nadu, India. His research interests are in the areas of mobile ad-hoc networks, software router design and systems security. Currently he is an Assistant Professor (SG). He is a Life Member of ISTE (Indian Society for Technical Education).

Dr. S. Shanmugavel graduated from Madras Institute of Technology in electronics and communication engineering in 1978. He obtained his Ph.D. degree in the area of coded communication and spread spectrum techniques from the Indian Institute of Technology (IIT), Kharagpur, in 1989. He joined the faculty of the Department of Electronics and Communication Engineering at IIT, Kharagpur, as a Lecturer in 1987 and became an Assistant Professor in 1991. Presently, he is a Professor in the Department of Electronics and Communication Engineering, College of Engineering, Anna University, Chennai, India. He has published more than 68 research papers in national and international conferences and 15 research papers in journals. He was awarded the IETE-CDIL Award in September 2000 for his research paper. His areas of interest include mobile ad hoc networks, ATM networks, and CDMA engineering.


International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

Distributed Relay Node Selection and Assignment Technique for Cooperative Wireless Networks

S. Sadasivam, G. Athisha

Abstract – In cooperative wireless networks, relay selection and communication greatly depend on three factors, namely Signal to Noise Ratio (SNR), channel capacity and bandwidth. Most of the existing works have employed solutions considering only one of the above issues; however, handling all three factors at once offers an efficient solution for relay selection and communication. Further, many works have considered either relay node selection or relay node assignment, but not both at the same time. In order to meet these requirements, in this paper we put forward a distributed relay node selection and assignment technique for cooperative wireless networks. The technique selects a set of reliable relays considering SNR, channel capacity and available bandwidth. Once reliable relays are established, cooperative relays (CRs) are assigned using the capacity-flow-ratio (CFR), which is the ratio of node capacity to overlapping sessions. Relay nodes with minimum CFR are assigned first. Further, we exploit the Decode-and-Forward (DF) method as the channel coding scheme. The proposed technique is simulated in MATLAB, and the simulation results demonstrate its efficiency. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Node Selection, Cooperative Wireless Networks, Cooperative Relay, CFR, DF

Nomenclature

Pr(γ)   Bit error probability
p       Number of neighboring nodes with minimum distance
q       Minimum distance constant
γ       Signal to Noise Ratio (SNR)
U       Average utilization
U(x)    Immediate available bandwidth of a link
τ       Averaging time scale variable
Chi     Total capacity of hop hi
AvB     Available bandwidth
TP      Average transmission power of all relay nodes
C       Channel coefficient
DT      Total amount of data to transmit
Cc      Channel capacity
RN      Relay node
CFR     Capacity-flow-ratio

I. Introduction

I.1. Cooperative Wireless Networks

In the present-day wireless network scenario, diversity has become a prominent technology for protecting the communication channel [1]-[18]. Typically, spatial diversity is made possible through multiple-input multiple-output (MIMO) techniques, in which multiple antennas are deployed at the transmitter and/or the receiver. Many wireless standards have incorporated the MIMO technique, and it therefore has applications ranging from mobile communication to satellite communication. Owing to limitations in the size and hardware implementation of future networks, cellular and sensor networks may not be able to support multiple antennas. To prevail over these limitations, user cooperation diversity, or distributed spatial diversity, has been put forward. In cooperative transmission mode, the signal of each user is transmitted by other nodes over different paths, which achieves spatial diversity. Extended coverage with reduced transmission power is thus made possible with cooperative communications. Because of these advancements, this technique has been adopted in wireless standards such as WiMAX 802.16m [1], [2]. A good trade-off between code rates and transmit power is achieved in cooperative communication, and the transmit power of both users is lessened relative to the baseline due to diversity. The assignment of relay nodes to the source assists it in forwarding data to the destination, thereby forming a virtual antenna array [3], [4].

Manuscript received and revised September 2013, accepted October 2013


I.2. Cooperative Relay Communication

In networks such as relay-enhanced cellular systems and ad hoc networks, cooperative relaying is widely regarded as a viable way to accomplish performance improvement. The relay channel is the fundamental unit of the cooperative relaying technique [15], [16].


In this technique, a message is forwarded to the destination by the source node while a third node, called the relay node, overhears the transmission and relays the message to the destination. Finally, the destination makes use of both received messages to enhance the decoding process. Spatial diversity is thus accomplished through cooperative relaying without the need for multiple antennas, and the technique combats signal fading effectively. The maximum attainable throughput through the cooperative relay channel is higher than for both direct (source-destination) transmission and non-cooperative source-relay-destination communication [5], [6].

I.3. Need for Cooperative Relaying in Wireless Networks

Two or more nodes in a cooperative network share their information and transmit together as a virtual antenna array to accomplish cooperative diversity. Through this, the network attains higher data rates than with individual transmission. Here, cooperative diversity is achieved by a node tuning into another node's transmitted signal and processing the information it overhears. By maintaining time sharing among the cooperative nodes, every node transmits and receives on various channels [7].

I.4. Issues of Cooperative Relaying in Wireless Communication Networks

- Owing to the superposition of multiple reflected and refracted copies of an incoming signal arriving from various directions, the received signal experiences channel fading.
- In the wireless channel, inherent unreliability is brought about by fading, which consequently limits achievable data rates [8], [16].
- Because of the size and cost limitations of some wireless applications, implementing multiple transmit and/or receive antennas becomes impossible.
- Half-duplex signaling is the key cause of the loss in spectral efficiency in a one-way relay network [9].
- In cooperative communication, deciding how to assign relay nodes to each user session, either for Cooperative Communication (CC) or as multi-hop relays, is a daunting task [10].
- A major issue of cooperative communication is the coupling between multi-hop flow routing and relay node assignment [10].
- Achieving cooperative diversity by assigning strategically selected relays is a demanding issue [11].

I.5. Problem Identification

The general issues that arise in relay selection and communication in a cooperative wireless network depend on the SNR range, channel coding and bandwidth.

These issues have so far been resolved separately to obtain efficient performance. In [6], an SNR metric is used to select relay nodes in a cooperative wireless network, but channel capacity and bandwidth efficiency are not considered. In [12], although relay node selection is performed based on channel capacity and bandwidth, SNR is not considered, and there is no relay node assignment. In [10], using the capacity-flow-ratio (CFR), the assignment of cooperative relays (CRs) starts with the hop with the minimum CFR and proceeds in increasing order; however, this method does not consider the SNR, channel capacity and bandwidth metrics for relay selection. To overcome these drawbacks, in this paper we propose a solution for relay node selection and assignment in cooperative wireless networks that considers all of these parameters. The paper is organized as follows: related work is given in Section II, Section III presents the proposed solution, Section IV presents simulation results, and Section V concludes the paper.

II. Related Work

Ahmed S. Ibrahim et al. [4] have presented a novel multi-node relay selection decode-and-forward cooperative scenario. Their approach uses the partial Channel State Information (CSI) that is available at the source and the relays. Achieving higher bandwidth efficiency and assuring full diversity order are the main objectives of their work. The relay with the maximum instantaneous scaled harmonic mean function of its source-relay and relay-destination channel gains among the N helping relays is termed the optimal relay. For the symmetric scenario, they derive an approximate expression for the achievable bandwidth efficiency, which decreases as the number of employed relays increases. A threshold-based relay selection protocol for two-hop, multi-relay cooperative communication is proposed in [6] by Furuzan Atay Onat et al. Their protocol requires minimal information about the signal to noise ratios (SNRs) of the source-relay links. The threshold value is selected such that it increases logarithmically with SNR and linearly with the number of relays, and full diversity is accomplished in their protocol with the help of the threshold. Sushant Sharma et al. [10] have proposed an efficient solution procedure based on a branch-and-cut framework along with several novel components to speed up the computation. They have also introduced a Feasible Solution Construction (FSC) algorithm. Their FSC algorithm is a local search algorithm that establishes a feasible solution; it is an efficient polynomial-time algorithm whose solution construction process comprises three phases: Path Determination, Cooperative Relay (CR) Assignment, and Flow Recalculation. Yifan Li et al. [11] have put forward a dynamic relay selection scheme that takes user mobility into account.


Further, their model considers the energy consumed during relaying with respect to the cost associated with each relay, and the user has to pay the selected relays for cooperative transmission. A constrained Markov decision process (CMDP) is formulated and solved by the LP technique to attain the optimal relay selection policy and to lessen the average cost while satisfying the long-term QoS requirement. Seunghoon Nam et al. [12] have introduced two selection methods, namely Best expectation and Best-m. The former method adaptively chooses the relays, and the latter selects an optimally pre-determined number of relays. Their methods are implemented with a simple and optimal algorithm. Additionally, they have provided closed-form analytical approximations of their algorithms' performance, which help simplify the process of finding the optimal number of cooperating relays. Beibei Wang et al. [13] have proposed a distributed buyer/seller game theoretic framework over multi-user cooperative communication networks. Their main objective is to stimulate cooperation and enhance system performance by employing a two-level game to jointly consider the benefits of source nodes as buyers and relay nodes as sellers. Their proposed approach not only helps the source smartly find the relays at relatively better locations and buy the optimal amount of power from them, but also helps the competing relays maximize their own utilities by asking reasonable prices. The game is proved to converge to a unique optimal equilibrium.

III. Proposed Solution

III.1. Overview

In this paper, we put forward a distributed relay node selection and assignment technique for cooperative wireless networks. The technique uses two relay selection schemes, namely an SNR based relay node selection scheme and an optimal relay selection scheme. The first scheme selects a set of reliable relays considering SNR; the latter scheme chooses a set of reliable relays based on channel capacity and available bandwidth. Finally, the relays common to both schemes are elected as reliable relays. As soon as the reliable relays are established, cooperative relays are assigned through the capacity-flow-ratio (CFR), which is the ratio of node capacity to overlapping sessions. The nodes are sorted in non-decreasing order of CFR. Reliable nodes are marked with a binary variable R-R (Reliable Relay): the value 1 denotes that the relay is reliable and 0 represents unreliability. Relay nodes with minimum CFR are assigned first. Further, we exploit the Decode-and-Forward (DF) method as the channel coding scheme.

III.2. Network Model

Consider a wireless network with N + 2 terminals. The proposed cooperative network comprises the source node S, the destination node D and a set of relay nodes ℜ. The relay nodes are marked as R1, R2, …, Rn ∈ ℜ. The network model is given in Fig. 1.

Fig. 1. Cooperative Wireless Networks

III.3. Computation of Metrics

III.3.1. SNR Estimation

Let us assume that all links that connect the source and the destination nodes undergo independent Rayleigh fading. For a typical modulation scheme, the bit error probability can be described as follows [6]:

Pr(γ) ≈ q erfc(√(pγ)), where p, q > 0    (1)

here, p is the total number of neighboring nodes with minimum distance and q is contingent on the minimum distance in the group. The selection of p and q approximates the bit error probability of many modulation schemes from a practical point of view. The Binary Phase Shift Keying (BPSK) modulation scheme chooses (q, p) as (0.5, 1), which gives the approximate BER value, and in the same way the M-ary Phase Shift Keying (M-PSK) method takes (q, p) as (1/log2(M), log2(M) sin²(π/M)) to obtain the approximate BER value. Thus, the average bit error probability under Rayleigh fading, with γ̄ denoting the average SNR, is given as:

Pr(γ̄) = q [1 − √( pγ̄ / (1 + pγ̄) )]    (2)

Let S-Ri and Ri-D be the links that connect source and destination S-D. Then the SNRs of the S-Ri, Ri-D and S-D links are represented as γSR,i, γRD,i and γSD.
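Eqs. (1) and (2) can be evaluated directly with Python's `math.erfc`. The closed form used for the Rayleigh average is the standard erfc-averaging result, which our reconstruction of Eq. (2) assumes; the (q, p) defaults below are the BPSK pair (0.5, 1):

```python
import math

def ber_inst(snr, q=0.5, p=1.0):
    # Eq. (1): Pr(gamma) ~ q * erfc(sqrt(p * gamma)); (q, p) = (0.5, 1) for BPSK
    return q * math.erfc(math.sqrt(p * snr))

def ber_rayleigh(avg_snr, q=0.5, p=1.0):
    # Eq. (2): Eq. (1) averaged over Rayleigh fading at average SNR avg_snr
    return q * (1.0 - math.sqrt(p * avg_snr / (1.0 + p * avg_snr)))
```

Both expressions decrease monotonically in SNR, and at zero SNR the instantaneous BER reduces to q, as expected for a hard guess.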

To simplify the proposed technique, in this paper we suppose that all the relay nodes possess the same average SNRs to the source and to the destination. Thus:

γSR,i = γSR, for i = 1, 2, …, N    (3)

γRD,i = γRD, for i = 1, 2, …, N    (4)

Therefore, the SNR values of the links are represented as γSD, γSR and γRD.

III.3.2. Available Bandwidth Calculation

Bandwidth is a time-varying quantity and also depends on the traffic load of the link. To obtain the available bandwidth of a link, we first calculate its average utilization over a time period (T − τ, T) as [14]:

U(T − τ, T) = (1/τ) ∫[T−τ, T] U(x) dx    (5)

In the above equation, U(x) denotes the immediate available bandwidth of the link and τ is the averaging time scale variable of available bandwidth. Let Chi be the total capacity of hop hi and Uhi stand for the average utilization of hop hi; then the available bandwidth AvBi can be derived as:

AvBi = (1 − Uhi) Chi    (6)

Thus, the available bandwidth of the end-to-end path can be written as:

AvB = min over i = 1, 2, …, H of AvBi    (7)

where H is the total number of hops along the path.

III.3.3. Computation of Channel Capacity

In cooperative communication, transmission is accomplished by the source and the relay nodes, which cooperate to forward data to the destination. Assume TP is the average transmission power of all relay nodes. Let CSD, CSR,i and CRD,i (i ∈ ℜ) be the channel coefficients, considering flat fading, from the source to the destination, from the source to relay i, and from relay i to the destination, respectively. The channel coefficients capture the effects of path loss and Rayleigh fading. The proposed technique supposes that the source avails itself of instant Channel State Information (CSI) of the channel coefficients. Typically, a cooperative transmission comprises two phases, namely the listening phase and the cooperating phase. In view of this, the total transmission time of a cooperative transmission is given as [12]:

TTotal = Tlisten + Tcoop    (8)

TTotal = DT / (min_i CC(SR,i) · AvBi) + DT / (CC(SGD) · AvBi)    (9)

where DT is the total amount of data the source has to transmit to the destination, CC(SR,i) denotes the channel capacity between the source and relay i, and CC(SGD) represents the channel capacity among the source, the group of selected relays and the destination. The group of selected relay nodes (cooperative group) is symbolized as G. Thus:

CC(SR,i) = log2(1 + TP |CSR,i|² / σ²)    (10)

CC(SGD) = log2(1 + (TP/σ²) C†SD,G CSD,G)    (11)

here, CSD,G represents the channel coefficient matrix between the cooperative group G and the destination. It can be denoted as:

CSD,G = [CSD  Ci,D  Ci+1,D  Ci+2,D  …  Ci+k,D]^T, where i, i+1, …, i+k ∈ G    (12)
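Eqs. (6), (7) and (10) can be sketched numerically. The noise power σ² and the example values are illustrative assumptions, and the helper names are ours:

```python
import math

def available_bw(caps, utils):
    """Eqs. (6)-(7): per-hop available bandwidth AvB_i = (1 - U_hi) * C_hi,
    and end-to-end AvB as the minimum over all hops."""
    return min((1 - u) * c for c, u in zip(caps, utils))

def cap_sr(tp, c_sr, noise=1.0):
    # Eq. (10): CC(SR,i) = log2(1 + TP * |C_SR,i|^2 / sigma^2)
    return math.log2(1 + tp * abs(c_sr) ** 2 / noise)
```

With hop capacities [10, 8] and utilizations [0.5, 0.25], the bottleneck hop yields AvB = 5, and a unit-power, unit-gain source-relay link gives CC(SR,i) = 1 bit/s/Hz; these per-link quantities are exactly the inputs the transmission-time expression of Eq. (9) consumes.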

III.4. Relay Node Selection

III.4.1. SNR Based Relay Node Selection Scheme

As described in Section III.3.3, cooperative communication consists of two phases, namely the listening phase and the cooperating phase. During the listening phase, the source transmits while the relay and destination nodes listen. In the second phase, each relay node Ri ∈ ℜ measures the SNR as given in Section III.3.1 and compares it against the threshold value ThSNR. Relay nodes whose SNR is greater than ThSNR are termed reliable relays (R-relays). Every R-relay node informs the destination by forwarding an ACK (acknowledgement) message. Let RN be the total number of R-relays. On receiving the ACKs, the destination selects the relay considering the SNRs of the relays and of the source-to-destination link, that is, γSD and γRD,1, γRD,2, …, γRD,RN. Finally, the destination chooses the relay path with the highest SNR value. The relays selected in this scheme are denoted RN-1 relays.

III.4.2. Optimal Relay Selection Scheme

The optimal relay selection scheme exploits the Best expectation method given in [12]. The scheme chooses the relays that minimize the total transmission time (Eq. (8)). Here the relays are selected optimally considering the distribution of Rayleigh fading. Let G be the set of optimal relays. The optimal relay selection scheme decides on relays as a function of the channel capacity and bandwidth between the source and the relays; consequently, the selection relies on the channels between the source and the relays.

International Review on Computers and Software, Vol. 8, N. 10

2345

S. Sadasivam, G. Athisha

The relay selection scheme is described as follows:

G* = arg min_G E[ 1 / min_{i∈G} C_C(SR,i) + 1 / C_C(S_G→D) ]    (13)

In the set of cooperative relays G, if the relay terminal, say x, limits the source-to-relay capacity C_C(SR,x), then the channel capacity equation can be given as:

C_C(SR,x(G)) = min_{x∈G} C_C(SR,x)    (14)

Subsequently, the relay selection scheme can be rewritten as:

G* = arg min_G E[ 1 / C_C(SR,x(G)) + 1 / C_C(S_G→D) ]    (15)

In the above equation, the term x(G) symbolizes that x is a function of G. The channel capacity value given in Eq. (11) can then be modified as:

C_C(S_G→D) = log2( 1 + (TP/σ²) ( |C_SD|² + Σ_{x∈G} |C_RD,x|² ) )    (16)

The relay terminal x is added into the group G on the assumption that it lessens the source-to-relay capacity C_C(SR,x). Another relay terminal x+1 can then be added into the cooperative group on condition that C_C(SR,x+1) < C_C(SR,x). This scheme collects into G* the optimal relays, i.e., those with the highest source-to-relay channel capacities in the relay set ℜ. Thus, the total number of optimal relays varies with the particular network realization. The relays selected in this scheme are noted as RN-2 relays. The procedure for the optimal relay selection scheme is given in Algorithm-1.

Algorithm-1
1. Consider C_SR,x as the channel coefficient, where x ∈ ℜ
2. Let x1, x2, ..., xn be the relay terminals in ℜ
3. Let G be the cooperative transmission group, where G ⊆ ℜ
4. Sort the C_SR,x in decreasing order, so that C_Sx1 ≥ C_Sx2 ≥ ... ≥ C_Sxn, where xn ∈ ℜ
5. Assign T_Total = ∞ and x = 1
6. Include terminal x into G
7. Estimate T_x = E[ 1/C_C(Sx_i) + 1/C_C(S_G→D) ], where C_C(S_G→D) is computed as per Eq. (16)
8. Compare the transmission time T_x:
   8.1 If (T_x < T_Total) then
       8.1.1 T_Total = T_x
       8.1.2 The corresponding relay is added into G
   8.2 End if
9. x = x + 1
10. Iterate steps (6) to (9) until the algorithm reaches the last node

III.4.3. Cooperative Relay Node Selection

The cooperative relay node selection phase combines the two schemes, namely the SNR based relay node selection scheme and the optimal relay selection scheme. In the SNR based scheme, relay nodes are selected with SNR as the main metric; the selected relays are noted as RN-1 relays. The optimal relay selection scheme, on the other hand, chooses relays taking the channel capacity and available bandwidth metrics into account; the relays it selects are termed RN-2 relays. The cooperative relay node selection phase then selects the relays common to RN-1 and RN-2. It can be represented as:

RN = RN-1 ∩ RN-2    (17)
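The two selection passes and their intersection (Eq. (17)) can be sketched as below; the SNR threshold, the per-relay capacities, and the group-capacity model passed to the greedy pass are illustrative assumptions.

```python
def snr_based_relays(snr, thr):
    """Section III.4.1: relays whose measured SNR exceeds the threshold (RN-1)."""
    return {i for i, s in snr.items() if s > thr}

def optimal_relays(cc_sr, group_capacity):
    """Algorithm-1 sketch: greedily grow G in decreasing C_SR order while the
    estimated total transmission time of Eq. (13) keeps decreasing (RN-2)."""
    order = sorted(cc_sr, key=cc_sr.get, reverse=True)  # step 4: decreasing C_SR
    g, t_total = [], float("inf")                       # step 5
    for x in order:                                     # steps 6-10
        cand = g + [x]
        # 1/min C_C(SR,i) + 1/C_C(S_G->D); group capacity grows with |G| (assumed model)
        t_x = 1.0 / min(cc_sr[i] for i in cand) + 1.0 / group_capacity(cand)
        if t_x < t_total:                               # step 8
            t_total, g = t_x, cand
    return set(g)

snr = {1: 12.0, 2: 3.0, 3: 15.0, 4: 9.0}   # measured SNR per relay (assumed)
cc_sr = {1: 2.0, 2: 2.5, 3: 1.8, 4: 0.2}   # source->relay capacities (assumed)
rn1 = snr_based_relays(snr, thr=8.0)
rn2 = optimal_relays(cc_sr, lambda g: 1.0 + 0.5 * len(g))
print(rn1 & rn2)                            # Eq. (17): RN = RN-1 intersect RN-2
```

With these assumed numbers, relay 4 is rejected by both passes (low capacity and SNR below threshold), while relay 2 survives the capacity pass but fails the SNR pass, so only relays in both sets are finally kept.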

Thus, the selected relays will possess high channel capacity, SNR and available bandwidth values.

III.5. Relay Node Assignment

Once relay node selection is established, the cooperative relay (CR) node assignment for a node is accomplished using the Capacity-flow-ratio (CFR) [10]. The CFR of a hop denotes the ratio of the hop capacity to the total number of overlapping sessions:

CFR = C_C(H_i) / OS    (18)

where C_C(H_i) denotes the capacity of hop i and OS is the total number of overlapping sessions during the listening phase. An overlapping session indicates overlapping of path i with the path of another session; nodes in the path, such as an intermediate node or the destination node, could cause overlapping with another path. OS is obtained through the widest-pipe approach of [10]. Relays that have been selected as reliable are marked with a binary variable called R-R (reliable relay binary variable), where the value 1 denotes that the relay is reliable and 0 denotes unreliability:

R-R_i = { 1, if relay R_i is a reliable relay; 0, otherwise }    (19)

where R_i ∈ ℜ.


Now the selected reliable nodes are sorted in non-decreasing order of CFR. The node with the minimum CFR value is assigned a CR first; CR assignment then proceeds gradually towards the maximum CFR value. This process continues until the last reliable node has been reached.
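The CFR computation (Eq. (18)) and the resulting assignment order can be sketched as follows; the relay capacities, session counts and reliability flags are illustrative assumptions.

```python
def cfr(hop_capacity, overlapping_sessions):
    """Eq. (18): capacity-flow-ratio of a hop."""
    return hop_capacity / overlapping_sessions

# Reliable relays marked per Eq. (19); capacities and session counts are assumed.
relays = {
    "R1": {"reliable": 1, "cap": 4.0, "os": 2},
    "R2": {"reliable": 1, "cap": 3.0, "os": 3},
    "R3": {"reliable": 0, "cap": 5.0, "os": 1},   # unreliable: excluded
}
reliable = {k: v for k, v in relays.items() if v["reliable"] == 1}
# Sort reliable relays in non-decreasing CFR; the minimum-CFR node is assigned first.
assignment_order = sorted(reliable, key=lambda k: cfr(reliable[k]["cap"], reliable[k]["os"]))
print(assignment_order)  # R2 (CFR = 1.0) before R1 (CFR = 2.0)
```

Sorting by CFR favours hops whose capacity is least diluted by overlapping sessions, which is why the minimum-CFR node receives a cooperative relay first.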

III.6. Channel Coding

In this paper, we exploit the Decode-and-Forward (DF) technique as the channel coding scheme. According to this scheme, the relay node R_i first decodes the data to be transmitted and then forwards the decoded data to the destination D. The DF technique is illustrated in Fig. 2.

Fig. 2. Decode and Forward (DF) technique

By using DF, the attainable rate can be given as [10]:

AR_DF(S,R,D) = AvB · I_DF(S,R,D)    (20)

Here, I_DF(S,R,D) denotes the maximum average mutual information required for the repetition-coded decode-and-forward coding scheme. It takes the value:

I_DF(S,R,D) = min { log2(1 + SNR_SR), log2(1 + SNR_SD + SNR_RD) }    (21)

IV. Simulation Results

The proposed solution is implemented in MATLAB. In the cooperative wireless network we implement relay selection based on 1) Threshold based Relay Selection Cooperation [6] and 2) the Best Expectation method [12]. Using these two methods, the relay node through which the source transmits its data to the destination is selected, so that the error rate is reduced and better output performance is obtained. Table I lists the parameters used in the cooperative wireless network.

TABLE I
PARAMETERS USED IN THE SYSTEM MODEL
No  Parameter          Values
1   Modulation         2
2   Channel length     400000
3   Frames             20
4   Frame length       64, 256
5   SNR                0:2:30
6   Relaying scheme    decode-and-forward (DF) relay cooperative network
7   Number of relays   2, 4

Fig. 3 compares 1) direct transmission without relay selection; 2) Threshold based Relay Selection Cooperation (TRSC), in which the relay is selected in two phases (in the first phase, the received SNR is calculated at each relay, the reliable relays are selected based on the threshold value, and the data are transmitted to the destination; in the second phase, the destination calculates the SNR values and selects the relay with the highest SNR); and 3) the Best expectation method (in which the relays are selected based on capacity and bandwidth) combined with TRSC.

Fig. 3. SNR Vs BER when N=2 for direct, TRSC and TRSC plus Best expectation method

Fig. 4. SNR Vs BER when N=4 for direct, TRSC and TRSC plus Best expectation method

When the relays are selected using these two schemes together (TRSC plus the Best expectation method), the network gives better performance than the previous method. The simulation work is done in MATLAB version 7.12, a high-level technical computing language. First we consider N=2 relays for the cooperative wireless network; the relays are selected based on SNR, capacity and bandwidth, and the data is transmitted through the selected relays. Fig. 3 shows that the bit error rate decreases as the signal-to-noise ratio increases. Fig. 4 shows that increasing the number of relays to 4 yields higher performance than 2 relays: at 15 dB the bit error rate is reduced to about 10^-7 when N=4, versus about 10^-4 when N=2.
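The attainable DF rate of Eqs. (20) and (21) can be illustrated numerically; the SNR values and the unit bandwidth below are arbitrary assumptions.

```python
import math

def i_df(snr_sr, snr_sd, snr_rd):
    """Eq. (21): mutual information of repetition-coded decode-and-forward,
    limited by the weaker of the S->R link and the combined S->D plus R->D branch."""
    return min(math.log2(1 + snr_sr), math.log2(1 + snr_sd + snr_rd))

def ar_df(avb, snr_sr, snr_sd, snr_rd):
    """Eq. (20): attainable rate is the available bandwidth times I_DF."""
    return avb * i_df(snr_sr, snr_sd, snr_rd)

# With a strong S->R link, the combined S->D plus R->D branch is the bottleneck.
print(ar_df(avb=1.0, snr_sr=15.0, snr_sd=1.0, snr_rd=2.0))  # log2(4) = 2.0
```

The min in Eq. (21) captures the DF constraint: the relay must first decode reliably (first term) before its retransmission can help the destination (second term).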

V. Conclusion

In this paper, we have proposed a distributed relay node selection and assignment technique for cooperative wireless networks. The technique uses two relay selection schemes, namely an SNR based relay node selection scheme and an optimal relay selection scheme. The first scheme selects a set of reliable relays considering SNR; the latter chooses a set of reliable relays based on channel capacity and available bandwidth. Finally, the relays common to both schemes are elected as reliable relays. Once reliable relays are established, cooperative relays (CR) are assigned using the Capacity-flow-ratio (CFR), which is the ratio of node capacity to overlapping sessions; relay nodes with minimum CFR are assigned first. Further, we exploit the Decode-and-Forward (DF) method as the channel coding scheme. The proposed technique is simulated in MATLAB, and simulation results have demonstrated the efficiency of our technique.

References
[1] Yonghui Li, "Distributed Coding for Cooperative Wireless Networks: An Overview and Recent Advances", IEEE Communications Magazine, August 2009.
[2] Marjan Baghaie and Bhaskar Krishnamachari, "Delay Constrained Minimum Energy Broadcast in Cooperative Wireless Networks", Proceedings IEEE INFOCOM, 2011.
[3] Aria Nosratinia and Todd E. Hunter, "Cooperative Communication in Wireless Networks", IEEE Communications Magazine, October 2004.
[4] Ahmed S. Ibrahim, Ahmed K. Sadek, Weifeng Su and K. J. Ray Liu, "Cooperative Communications with Relay-Selection: When to Cooperate and Whom to Cooperate With?", IEEE Transactions on Wireless Communications, vol. 7, no. 7, July 2008.
[5] Helmut Adam, Christian Bettstetter and Sidi Mohammed Senouci, "Adaptive Relay Selection in Cooperative Wireless Networks", IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, September 2008.
[6] Furuzan Atay Onat, Yijia Fan, Halim Yanikomeroglu and H. Vincent Poor, "Threshold Based Relay Selection in Cooperative Wireless Networks", IEEE GLOBECOM, 2008.
[7] Andrej Stefanov and Elza Erkip, "Cooperative Coding for Wireless Networks", IEEE Transactions on Communications, vol. 52, no. 9, September 2004.
[8] Aggelos Bletsas, Andrew Lippman and David P. Reed, "A Simple Distributed Method for Relay Selection in Cooperative Diversity Wireless Networks, Based on Reciprocity and Channel Measurements", Proceedings of IEEE Vehicular Technology Conference, 2005.
[9] Ha X. Nguyen, Ha H. Nguyen and Tho Le-Ngoc, "Diversity Analysis of Relay Selection Schemes for Two-Way Wireless Relay Networks", Wireless Personal Communications, vol. 59, no. 2, pp. 173-189, 2011.
[10] Sushant Sharma, Yi Shi, Y. Thomas Hou, Hanif D. Sherali and Sastry Kompella, "Cooperative Communications in Multi-hop Wireless Networks: Joint Flow Routing and Relay Node Assignment", Proceedings IEEE INFOCOM, pp. 2016-2024, 2010.
[11] Yifan Li, Ping Wang, Dusit Niyato and Weihua Zhuang, "A Dynamic Relay Selection Scheme for Mobile Users in Wireless Relay Networks", Proceedings IEEE INFOCOM, pp. 256-260, April 2011.
[12] Seunghoon Nam, Mai Vu and Vahid Tarokh, "Relay Selection Methods for Wireless Cooperative Communications", 42nd Annual Conference on Information Sciences and Systems (CISS), March 2008.
[13] Beibei Wang, Zhu Han and K. J. Ray Liu, "Distributed Relay Selection and Power Control for Multiuser Cooperative Communication Networks Using Buyer/Seller Game", Proceedings IEEE INFOCOM, 2007.
[14] R. S. Prasad, M. Murray, C. Dovrolis and K. Claffy, "Bandwidth Estimation: Metrics, Measurement Techniques, and Tools", IEEE Network, pp. 27-35, 2003.
[15] Zhong, X., Zhou, B., "An Inter-Cluster Cooperative Nodes Selection Scheme Based on Blind Channel Estimation", (2011) International Review on Computers and Software (IRECOS), 6 (6), pp. 960-964.
[16] Zhong, X., "Blind Channel Estimation of Relaying Cooperative Communication in IoT Systems", (2012) International Review on Computers and Software (IRECOS), 7 (1), pp. 450-455.
[17] Seyed Mohammad-Sajad Sadough, Ardalan Alizadeh, "Optimal Beamforming for Spectrum Leasing in Cognitive Radio Network", (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (5), pp. 307-311.
[18] Radosveta Sokullu, Engin Karatepe, "Dual Packet Selection for Improved Bluetooth Performance", International Journal on Communications Antenna and Propagation (IRECAP), 1 (6), pp. 488-494.

Authors' information

S. Sadasivam obtained his Bachelor's degree (AMIE) in Electronics and Communication Engineering from the Institution of Engineers (India), Calcutta. He then obtained his Master's degree in Computer and Information Technology from Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu, India. Currently, he is an Associate Professor in the Department of Information Technology of Sethu Institute of Technology, Pulloor, Kariapatti, Virudhunagar District, Tamil Nadu, India. His specializations include wireless networks, networking and sensor networks.

Dr. G. Athisha received the B.E. degree in Electronics and Communication Engineering from the P.S.N.A. College of Engineering and Technology, Dindigul, of Madurai Kamaraj University, Madurai, Tamil Nadu, India, in 1997; the M.E. degree in Applied Electronics from the Coimbatore Institute of Technology, Coimbatore, of Bharathiar University, Tamil Nadu, India, in 1998; and the Ph.D. degree in Information and Communication Engineering from Anna University, Chennai, Tamil Nadu, India, in 2006. She is currently a Professor and Head of the Department of Electronics and Communication Engineering at P.S.N.A. College of Engineering and Technology. Her research interests include intellectual property protection of electronic products, nanotechnology based QCA design, reconfigurable/self-evolving architectures, digital information content protection, network processors and their applications, security using RFID tags, development of cryptoprocessor modules for specific applications, trends and challenges in optical networking, security in ad hoc networks, watermarking for content protection, routing security in wireless networks, and network security methodologies.
She received the "Young Engineers Award" in recognition of contributions in the field of Electronics and Telecommunication Engineering on the occasion of the Twenty-fourth National Convention of Electronics and Telecommunication Engineers, held at the Jharkhand State Centre, Ranchi, on October 18-19, 2008, from The Institution of Engineers (India).


International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

Improving Network Life Time of Wireless Sensor Network Using LT Codes Under Erasure Environment V. Nithya, B. Ramachandran Abstract – In recent years, the Wireless Sensor Network (WSN) has found application in Multimedia Broadcast and Multicast Service (MBMS). Reliability is a prime concern, as some of the transmitted information gets lost in the erasure channel. As the application involves broadcast or multicast delivery of multimedia data through an erasure channel, Automatic Repeat request (ARQ) is not a good choice for improving reliability, as it leads to more energy wastage. Therefore, in this paper, we investigate the use of a forward error correction scheme for energy conservation in WSN under erasure channel conditions. We found that the use of Luby Transform (LT) codes for an IEEE 802.15.4 compliant WSN conserves more energy when compared to a retransmission strategy. Other performance measures, such as Bit Error Rate (BER) and throughput, also showed improved results. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Bit Error Rate, Energy Conservation, Erasure Channel, Luby Transform Codes, Reliability, Wireless Sensor Network

I. Introduction

Recent developments in wireless communication technologies such as Bluetooth and Zigbee have led to great interest in Wireless Sensor Networks (WSN). Sensor nodes are constructed using sensor devices with wireless communication facilities [1]-[21]. Energy conservation is one of the most important issues in WSN, where nodes are likely to rely on limited battery power. The multi-hop communication of WSN necessitates error control schemes to achieve reliable data transmission. Automatic Repeat request (ARQ) and Forward Error Correction (FEC) are the key error control schemes for achieving reliable data transmission [2]. The IEEE 802.15.4 standard, targeting low-power low-rate radios, does not provide any advanced error-control mechanism; instead, it combines error detection by Cyclic Redundancy Check (CRC) with ARQ. Another approach to improve transmission reliability is FEC. The maximum fraction of errors or of missing bits that can be corrected is determined by the design of the FEC code, so different FEC codes are suitable for different conditions [3].

The communication channel considered for our study is the Erasure Channel (EC). Here, the transmitter sends information and the receiver either receives the information or receives a message that the information was erased. For such channel conditions, ARQ is not suitable for achieving reliable communication; however, FEC can be the choice of implementation. The FEC codes used for erasure channels are called erasure codes; they transform a message of k symbols into a longer message, called a codeword, of n symbols, such that the original message can be recovered from a subset of the n symbols [4].

Some of the earlier work in WSN aiming to prolong network lifetime is as follows. A novel clustering algorithm called Hybrid Distributed, Energy Efficient, and Dual Homed Clustering was proposed, and a comparative study with the distributed, energy efficient, dual homed clustering algorithm showed improvement in network lifetime and throughput [5]. The evaluation of hierarchical routing protocols by their sensitivity to energy heterogeneity and their effects on lifetime and other performance measures is carried out in [6]. The authors of [7] have proposed a data mining based approach that reduces storage space, energy and communication cost for energy efficient clustering sensor networks. By integrating localization and clustering to balance the energy consumption over the whole network and to meet real-time Quality of Service (QoS) requirements, a dynamic routing protocol is introduced in [8]. The problem of finding the positions at which to deploy sensors and relays so as to reduce the energy consumption for transmission and sensing was considered, and network optimization performed, in [9].

Most of the related work focuses on achieving energy efficiency by working on the higher layers of the network protocol stack and has not considered the erasure channel environment. The authors of [10] have studied the performance of the Luby Transform (LT) code to increase energy conservation in WSN operated in a fading environment; however, their results show no significant improvement in the performance metrics, and therefore LT codes are not found suitable under fading channel conditions. This paper attempts to incorporate the LT code to ensure reliable data delivery, which in turn conserves energy under erasure channel conditions.

The rest of the paper is organized as follows. The LT Coded IEEE 802.15.4 Zigbee RF transceiver model for WSN is detailed in Section II. Section III describes the simulation model. Results and discussions are presented in Section IV. Finally, Section V presents the conclusions.

II. LT Coded IEEE 802.15.4 Transceiver

We consider a wireless sensor network consisting of sensor nodes that use an IEEE 802.15.4 Zigbee transceiver in the 2.4 GHz frequency band, e.g., TelosB, MicaZ. The IEEE 802.15.4 standard has been adopted by Zigbee for WSN technology. The salient features of Zigbee include low cost, very low power consumption, reliable data transfer and ease of implementation. The PHY layer of the IEEE 802.15.4 standard can be operated in three unlicensed frequency bands, namely 868 MHz, 915 MHz and 2.4 GHz. Accordingly, the standard specifies three different physical media: (i) Direct Sequence Spread Spectrum using BPSK operating in the 868 MHz band at a data rate of 20 Kbps; (ii) Direct Sequence Spread Spectrum using BPSK operating in the 915 MHz band at a data rate of 40 Kbps; (iii) Direct Sequence Spread Spectrum using O-QPSK (DSSS-OQPSK) operating in the 2.4 GHz band at a data rate of 250 Kbps. For analysis purposes in this work, we have considered the higher data rate physical medium (2.4 GHz/250 Kbps), which is an internationally used license-free ISM frequency band. The specifications of the IEEE 802.15.4 Zigbee transceiver operated in the 2.4 GHz band are as follows.

The data modulation scheme used here is DSSS-OQPSK. The complete block diagram of the coded IEEE 802.15.4 Zigbee RF transceiver system is shown in Fig. 1. This is our proposed modified IEEE 802.15.4 Zigbee RF transceiver block diagram, where FEC encoder and decoder blocks are added to the conventional functional block diagram. It involves spreading and modulation of the input bits. In the first stage, incoming bits are grouped into fours, so as to represent a Zigbee symbol. These four bits are used to select one of 16 nearly orthogonal Pseudo-random Noise (PN) sequences to be transmitted. The mapping of symbols to chips is achieved through the 32-chip PN sequences shown in Table I. The PN sequences are related to each other through cyclic shifts, and the successive selected PN sequences are concatenated and sent to the OQPSK modulator, where the incoming chip sequences are modulated onto the carrier with half-sine pulse shaping. The half-sine pulse shaping [11] used to represent each baseband chip is given by:

p(t) = sin( πt / (2T_c) ),  0 ≤ t ≤ 2T_c
p(t) = 0,  otherwise    (1)

The modulated signal transmitted through the channel is subjected to information loss depending upon the channel conditions. In this paper, we consider an erasure channel characterized by an erasure probability to study the network performance. The receiver section of Zigbee consists of blocks that perform the reverse operations of the transmitter: demodulation, chip to Zigbee symbol remapping, and finally Zigbee symbol to bit regrouping followed by decoding. In this paper we use LT codes to achieve forward error correction and thereby aim to improve network lifetime.

Fig. 1. Block Diagram of the LT Coded Zigbee IEEE 802.15.4 RF Transceiver (signal flow: information bits i → LT Encoder → coded bits a → bit-to-Zigbee-symbol grouping s → symbol-to-PN-chip mapping c → OQPSK Modulator y → Erasure Channel with AWGN noise n → received signal r → OQPSK Demodulator ĉ → chip-to-symbol remapping ŝ → symbol-to-bit regrouping â → LT Decoder → î)
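The half-sine chip pulse of Eq. (1) can be generated as below; the number of samples per chip is an illustrative assumption, while T_c = 0.5 µs follows from the 2 Mchip/s chip rate of the 2.4 GHz PHY.

```python
import numpy as np

def half_sine_pulse(tc, samples_per_chip=8):
    """Eq. (1): p(t) = sin(pi*t / (2*Tc)) for 0 <= t <= 2*Tc, else 0."""
    t = np.linspace(0.0, 2.0 * tc, samples_per_chip, endpoint=False)
    return np.sin(np.pi * t / (2.0 * tc))

# For the 2.4 GHz PHY the chip rate is 2 Mchip/s, i.e. Tc = 0.5 us.
pulse = half_sine_pulse(tc=0.5e-6)
print(pulse[0], pulse.max())  # starts at 0, peaks at 1 mid-pulse
```

Each chip spans 2T_c because OQPSK staggers the I and Q streams by one chip period, so the half-sine shapes overlap to give a constant-envelope (MSK-like) waveform.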


TABLE I
ZIGBEE SYMBOL TO CHIP MAPPING [11]
Zigbee Symbol    Chip Values (c0c1,...,c30c31)
0000    11011001110000110101001000101110
1000    11101101100111000011010100100010
0100    00101110110110011100001101010010
1100    00100010111011011001110000110101
0010    01010010001011101101100111000011
1010    00110101001000101110110110011100
0110    11000011010100100010111011011001
1110    10011100001101010010001011101101
0001    10001100100101100000011101111011
1001    10111000110010010110000001110111
0101    01111011100011001001011000000111
1101    01110111101110001100100101100000
0011    00000111011110111000110010010110
1011    01100000011101111011100011001001
0111    10010110000001110111101110001100
1111    11001001011000000111011110111000

III. Simulation Model

LT is the first practical implementation of Fountain codes [12], [13]. Fountain codes belong to the class of codes called rateless codes. Any number of encoding symbols can be generated independently from k information symbols by the following encoding process.

LT Encoding
1) Determine the degree d of an encoding symbol. The degree is chosen at random from a given node degree distribution P(x).
2) Choose d distinct information symbols uniformly at random. They will be the neighbours of the encoding symbol.
3) Assign the XOR of the chosen d information symbols to the encoding symbol.

The design of LT codes is mainly influenced by the degree distribution function, which governs the choice of the combination of information symbols to be XORed; successful decoding also depends on the distribution function [12], [13]. The basic property required of a good degree distribution is that input symbols are added to the ripple at the same rate as they are processed. This property is the inspiration for the name Soliton distribution, as a soliton wave is one where dispersion balances refraction perfectly.

For decoding of LT codes, the decoder needs to know the neighbours of each encoding symbol. This information can be transferred in several ways. For example, the transmitter can send a packet consisting of an encoding symbol and the list of its neighbours. An alternative method is that the encoder and decoder share a random number generator seed, and the decoder finds the neighbours of each encoding symbol by generating random linear combinations synchronized with the encoder. With the encoding symbols and the indices of their neighbours, the decoder can recover the information symbols with the following three-step process [14].

LT Decoding
1) (Release) All encoding symbols of degree one, i.e., those connected to one information symbol, are released to cover their unique neighbour.
2) (Cover) The released encoding symbols cover their unique neighbour information symbols. In this step, the covered but not yet processed input symbols are sent to the ripple, the set of covered unprocessed information symbols gathered through the previous iterations.
3) (Process) One information symbol in the ripple is chosen to be processed: the edges connecting the information symbol to its neighbour encoding symbols are removed, and the value of each such encoding symbol changes according to the information symbol. The processed information symbol is removed from the ripple.

The step-wise procedure for simulating the LT coded IEEE 802.15.4 Zigbee transceiver (Fig. 1) in MATLAB is given below. The basic blocks for simulating the IEEE 802.15.4 Zigbee transceiver are taken from the uncoded system [15]:
1. The information bits i to be transmitted are given to an LT encoder to generate the coded bits a.
2. Every four bits of the binary data stream are grouped to form a Zigbee symbol s.
3. Each of the 16 Zigbee symbols s is mapped to a 32-chip PN sequence c as shown in Table I.
4. The chip sequence is sent as input to the OQPSK modulator, where half-sine pulse shaping of the incoming chips is performed.
5. The modulated signal y is transmitted through an erasure channel, where the transmitted information is erased depending on the erasure probability of the channel, and channel noise n is added to the transmitted signal.
6. The decision about the transmitted signal is made from the received signal r by computing the minimum Euclidean distance between the received and reference signals.
7. The estimate of the transmitted symbols ŝ is obtained from the estimated chip sequence ĉ available at the demodulator output.
8. The reverse processes of chip to Zigbee symbol remapping and symbol to bit regrouping yield â.
9. Finally, the lost bits are recovered by passing the information stream through a belief propagation decoder to recover the source information î.

The bit error rate is obtained by dividing the bit error count by the total number of information bits transmitted. The energy efficiency of the wireless sensor network can be calculated in terms of energy spent per bit. The parameters used in the calculation are based on the CC2420 IEEE 802.15.4 Zigbee transceiver chip and Texas Instruments' MSP430 microcontroller with 10 KB [16]. The network parameters used in our simulation are taken from the IEEE 802.15.4 standard values as specified in Table II.
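The LT encoding steps and the peeling (belief propagation) decoding steps above can be sketched as follows; the uniform degree draw (in place of a Soliton distribution) and the symbol values are simplifying assumptions made for brevity.

```python
import random

def lt_encode(data, n, seed=7):
    """LT encoding steps 1-3; the degree is drawn uniformly here, where a
    (robust) Soliton distribution would be used in practice."""
    rng = random.Random(seed)
    k = len(data)
    coded = []
    for _ in range(n):
        d = rng.randint(1, k)                       # 1) choose a degree d
        nbrs = frozenset(rng.sample(range(k), d))   # 2) d distinct neighbours
        val = 0
        for j in nbrs:                              # 3) XOR the chosen symbols
            val ^= data[j]
        coded.append((nbrs, val))
    return coded

def lt_decode(coded, k):
    """Peeling decoder: release degree-one encoding symbols, cover their
    unique neighbour, then process by removing edges to covered symbols."""
    syms = [[set(nbrs), val] for nbrs, val in coded]
    recovered = {}
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for nbrs, val in syms:                      # release and cover
            if len(nbrs) == 1:
                (j,) = nbrs
                if j not in recovered:
                    recovered[j] = val
                    progress = True
        for sym in syms:                            # process: peel covered symbols
            peel = [x for x in sym[0] if x in recovered] if len(sym[0]) > 1 else []
            for j in peel:
                sym[0].discard(j)
                sym[1] ^= recovered[j]
                progress = True
    return [recovered.get(i) for i in range(k)]

data = [3, 1, 4, 1, 5, 9, 2, 6]
decoded = lt_decode(lt_encode(data, n=24), k=len(data))
print(sum(v is not None for v in decoded), "of", len(data), "symbols recovered")
```

Because XOR is its own inverse, every symbol the peeling process recovers is exact; decoding fails only when no degree-one symbol remains, which a well-chosen degree distribution makes unlikely for n modestly larger than k.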


21

I r (mA)

23

Vr (v)

3.6

l Rb (kbps)

256

Overhead H+FCS (bytes)

11

IV.

250

Simulation Results

The performance of IEEE 802.15.4 compliant WSN is studied for specified network parameters as a function of transmitted SNR and erasure probability. The wireless channel modeled for our study is the Erasure channel. The erasure probability of the channel is chosen between 0 and 0.5. Here, the error control schemes considered for simulation are ARQ, which is an optional reliability mechanism available in IEEE 802.15.4 Zigbee RF transceiver with maximum number of retries as 7 and default value is set to as 3 and LT codes for different values of k and n. The bit energy (Eb) is assumed as 0.3J. Simulations are carried out for 106 information bits. Fig. 2 shows the BER analysis of LT coded communication with reference to uncoded and ARQ of 3 retransmissions. The k and n value of LT code considered for analysis is 104 and 204 respectively. The erasure probability of the channel is considered as 0.1. It is found that the ARQ strategy does not provide any improvement in BER as the retransmissions occur over the same channel conditions. However, LT codes shows small improvement in BER for low SNR values from -5 to -1and the error rate reduces to a greater extent for SNR values above 0 dB. Fig. 3 depicts the energy spent per bit for LT coded network over uncoded network. Energy spent per bit is calculated as the ratio of total energy spent by the network over number of successfully received bits without any error. As before ARQ scheme of three retransmissions is considered for comparative study. It is observed that ARQ scheme does not provide any improvement in the BER and also consumes more energy for doing 3 retransmissions. But as the error rate reduces as SNR increases, LT coded network consumes only slightly higher energy compared to uncoded network. This additional energy is due to the redundant bits introduced by LT code. Fig. 4 shows the throughput efficiency of LT codes over an uncoded network. 
Throughput efficiency is defined as the ratio of total number of bits received successfully to the total number of bits transmitted. It is clear from the figure that the number of bits successfully received by the LT coded network is higher than the uncoded network where the loss of information is more. From the above results we can conclude that the ARQ strategy cannot improve the performance of the network in terms of BER, Energy efficiency, throughput under

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

BER Vs SNR

0

10 uncoded LT code(n,k)-(204,104) ARQ-3 Retxn

BER

Values

erasure channel environment. Similar kind of analysis is also carried out for a given SNR value of 0 dB and by varying the erasure probability of the channel. This is performed to study the behavior of the network for different channel conditions of varying erasure probabilities. Figs. 5, 6 and 7 shows the performance of the network as a function of erasure probability of the channel in terms of BER, Energy spent per bit and throughput efficiency respectively. It is clear that as the erasure probability of the communication channel increases the overall performance of the network degrades. The BER of the LT coded network increases from 0.01 to 0.8 for erasure probability of range 0.05 to 0.5 as shown in Fig. 5. This follows the fact that higher value of erasure probability indicates more erasures and low value denote minimum number of erasures introduced by the channel. Fig. 6 shows the energy spent per bit by varying erasure probability of the channel.

Fig. 2. Comparative analysis of BER of LT coded WSN with uncoded and ARQ scheme by varying SNR (N = 10^6, Eb = 0.3 J, erasure probability 0.1)

TABLE II. Network Parameters [16] (including the transmit current It, in mA)

Fig. 3. Comparative analysis of Energy Spent per bit by LT coded WSN with uncoded and ARQ scheme by varying SNR

International Review on Computers and Software, Vol. 8, N.10

As mentioned before, the energy spent per bit under the ARQ scheme is higher than for the uncoded and LT-coded networks. The energy spent per bit of the LT-coded network closely follows that of the uncoded network, because the BER of the LT-coded network shows only a small improvement over the uncoded network across the entire range of erasure probability, as shown in Fig. 5; measured in microjoules, the energy spent by the coded network therefore overlaps with that of the uncoded network. The throughput efficiency of the network as the erasure probability varies, shown in Fig. 7, mirrors Fig. 5: throughput efficiency decreases as the number of erasures introduced by the channel increases. Figs. 8 and 9 depict the error-correcting capability of the LT code chosen for the study, i.e., k = 104 and n = 204. The parameters used for this analysis are information BER and transmission BER. Information BER is defined as the ratio of the total number of error bits obtained after decoding to the total number of information bits transmitted.
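As a small illustration (with made-up bit vectors, not the paper's data), the two BER figures are computed in the same way, only on different bit streams: transmission BER on the n coded bits before decoding, information BER on the k information bits after decoding:

```python
def bit_error_rate(reference, observed):
    """Fraction of positions where the observed bits differ from the
    reference bits."""
    errors = sum(1 for r, o in zip(reference, observed) if r != o)
    return errors / len(reference)

# transmission BER: errors among the n coded bits before decoding
tx_ber = bit_error_rate([0, 1, 1, 0, 1, 0, 1, 0],
                        [0, 1, 0, 0, 1, 0, 0, 0])   # 2 errors in 8 bits
# information BER: errors among the k information bits after decoding
info_ber = bit_error_rate([0, 1, 1, 0], [0, 1, 1, 0])  # no errors
```

For an efficient code the second figure should always come out below the first, since the decoder corrects some of the channel's errors.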

Fig. 4. Comparative analysis of throughput efficiency of LT coded WSN with uncoded and ARQ scheme by varying SNR

Fig. 5. Comparative analysis of BER of LT coded WSN with uncoded and ARQ scheme by varying erasure probability of the channel (N = 10^6, Eb = 0.3 J, SNR = 0 dB)

Fig. 7. Comparative analysis of throughput efficiency of LT coded WSN with uncoded and ARQ scheme by varying erasure probability of the channel

Fig. 6. Comparative analysis of Energy spent per bit by LT coded WSN with uncoded and ARQ scheme by varying erasure probability of the channel

Fig. 8. Comparison of Information BER and Transmission BER of LT coded WSN by varying SNR


Fig. 9. Comparison of Information BER and Transmission BER of LT coded WSN by varying erasure probability of the channel

Transmission BER is defined as the ratio of the number of error bits obtained before decoding to the number of transmitted encoded bits. For a code to be efficient, the information BER must always be lower than the transmission BER. It is found from Fig. 8 that the information BER is very low compared with the transmission BER across the range of SNR values, which reflects the decoding capability of the LT decoder used; the decoding method in our simulation study is belief propagation decoding. A similar study is performed by varying the erasure probability of the channel, as shown in Fig. 9. It is observed that the decoder performs well even under severe erasure conditions, as can be seen from the BER values at the higher erasure probabilities.
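Over an erasure channel, belief propagation decoding of an LT code reduces to the classic peeling decoder of Luby [13]. The sketch below is illustrative only: the degree distribution is a toy stand-in for the robust soliton distribution, not the one used in the paper's simulations:

```python
import random

def lt_encode(data_bits, n, seed=0):
    """Produce n LT-coded symbols from k data bits: each symbol is the
    XOR of a randomly chosen subset of the data bits."""
    rng = random.Random(seed)
    k = len(data_bits)
    coded = []
    for _ in range(n):
        # toy degree distribution: degree 1 with prob 0.1, else 2..4
        d = 1 if rng.random() < 0.1 else rng.randint(2, 4)
        neighbours = set(rng.sample(range(k), d))
        value = 0
        for j in neighbours:
            value ^= data_bits[j]
        coded.append((neighbours, value))
    return coded

def bp_peel_decode(coded, k):
    """Peeling (belief-propagation) decoder over an erasure channel:
    release degree-1 symbols, subtract recovered bits, repeat."""
    decoded = [None] * k
    symbols = [(set(nb), v) for nb, v in coded]
    progress = True
    while progress:
        progress = False
        for nb, v in symbols:
            if len(nb) == 1:
                j = next(iter(nb))
                if decoded[j] is None:
                    decoded[j] = v
                    progress = True
        remaining = []
        for nb, v in symbols:
            nb = set(nb)
            for j in list(nb):
                if decoded[j] is not None:  # subtract known bit
                    nb.discard(j)
                    v ^= decoded[j]
            if nb:
                remaining.append((nb, v))
        symbols = remaining
    return decoded  # entries still None were not recovered
```

Symbols erased by the channel are simply dropped from `coded` before decoding; decoding succeeds once enough unerased symbols remain, which is why the information BER in Figs. 8 and 9 lies below the transmission BER.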

V. Conclusion


This paper provides a solution for the reliability issue present in WSNs under an erasure environment. The use of LT codes in an IEEE 802.15.4 compliant WSN showed improved performance in terms of BER, energy spent per bit and throughput efficiency, and the LT code with (n, k) = (204, 104) was found to perform well across the range of channel erasure probabilities. The decoding method used in our simulation is belief propagation decoding; however, the efficiency of any forward error correction technique depends on the decoding efficiency. Therefore, as future work, the performance of LT codes with different decoding structures, such as the incremental Gaussian elimination method, can be evaluated. A similar analysis can also be carried out with other codes, such as Low Density Parity Check (LDPC) codes, to find an optimum coding method for WSNs under an erasure environment.


References
[1] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "A Survey on Sensor Networks," IEEE Communications Magazine, vol. 40, no. 8, pp. 102-114, Aug. 2002.
[2] H. She, Z. Lu, A. Jantsch, D. Zhou, and L. R. Zheng, "Analytical Evaluation of Retransmission Schemes in Wireless Sensor Networks," Proceedings of the IEEE 69th Vehicular Technology Conference, Barcelona, Spain, pp. 1-5, April 2009.
[3] A. Willig, "Recent and emerging topics in wireless industrial communications: A selection," IEEE Transactions on Industrial Informatics, vol. 4, no. 2, pp. 102-124, May 2008.
[4] J. G. Proakis, Digital Communications, 4th edition, McGraw-Hill, Inc., NY, 2002.
[5] A. Maizate, N. El Kamoun, "A new metric based cluster head selection technique for prolonged lifetime in wireless sensor networks," (2013) International Review on Computers and Software (IRECOS), 8 (6), pp. 1346-1355.
[6] A. Al-Hilal, S. Dowaji, "Evaluation of WSN hierarchical routing protocols according to energy efficiency heterogeneity levels," (2013) International Review on Computers and Software (IRECOS), 8 (5), pp. 1170-1179.
[7] S. S. Rizvi, T.-S. Chung, "Investigation of in-network data mining approach for energy efficient data centric wireless sensor networks," (2013) International Review on Computers and Software (IRECOS), 8 (2), pp. 443-447.
[8] P. Tan, M. Ju, "An energy-efficient real-time routing protocol for wireless sensor networks," (2012) International Review on Computers and Software (IRECOS), 7 (5), pp. 2285-2289.
[9] B. Zeng, L. Yao, R. Wang, "An energy efficient deployment scheme for ocean sensor networks," (2013) International Review on Computers and Software (IRECOS), 8 (2), pp. 507-513.
[10] V. Nithya, B. Ramachandran, and K. Muruganand, "Energy Conservation in IEEE 802.15.4 Compliant Wireless Sensor Network using LT codes," International Journal of Computer Applications, vol. 79, no. 12, pp. 11-16, Oct. 2013.
[11] IEEE 802.15.4 version 2006, IEEE Standards Association, http://standards.ieee.org/getieee802/download/802.15.4-2003.pdf
[12] Z. Zhiliang et al., "Performance analysis of LT codes with different degree distribution," Fifth International Workshop on Chaos Fractal Theories and Applications, IEEE, 2012.
[13] M. Luby, "LT Codes," Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science, pp. 271-282, 16-19 November 2002.
[14] C. CongZhe, F. ZeSong, X. Ming, H. GaiShi, X. ChengWen, K. JingMing, "An extended packetization-aware mapping algorithm for scalable video coding in finite-length fountain codes," Science China Information Sciences, vol. 56, no. 4, pp. 1-10, April 2013.
[15] V. Nithya, B. Ramachandran and Vidhyacharan Bhaskar, "Energy and Error analysis of IEEE 802.15.4 Zigbee RF transceiver under various fading channels in Wireless Sensor Network," International Conference on Advanced Computing, MIT, Anna University, Chennai, India, pp. 1-5, 13-15 Dec. 2012.
[16] CC2420 datasheet, http://foi.com/analog/docs/enggressdetail.tsp?familyld=367&genContentId=3573
[17] Munsif Ali Jatoi, "Forward Error Correction Using Reed-Solomon Coding and Euclid Decoding in Wireless Infrared Communications," (2013) International Journal on Communications Antenna and Propagation (IRECAP), 3 (2), pp. 97-101.
[18] Ghandi Manasra, Osama Najajri, Samer Rabah, Hashem Abu Arram, "DWT Based on OFDM Multicarrier Modulation Using Multiple Input Output Antennas System," (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (5), pp. 312-320.
[19] Said Ben Alla, Abdellah Ezzati, "A QoS-Guaranteed Coverage and Connectivity Preservation Routing Protocol for Heterogeneous Wireless Sensor Network," (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (6), pp. 363-371.
[20] Reza Mohammadi, Reza Javidan, "Adaptive Quiet Time Underwater Wireless MAC: AQT-UWMA," (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (4), pp. 236-243.
[21] D. David Neels Pon Kumar, K. Murugesan, K. Arun Kumar, Jithin Raj, "Performance Analysis of Fuzzy Neural based QoS Scheduler for Mobile WiMAX," (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (6), pp. 377-385.


Authors' information

Department of Electronics and Communication Engineering, SRM University, Kattankulathur 603203, Tamilnadu, India.

V. Nithya received her Bachelor's degree in Electronics and Communication Engineering from Adhiparasakthi Engineering College, Melmaruvathur (affiliated to the University of Madras, Chennai) in 2002 and her Master's degree in Communication Systems from S.R.M. Engineering College, Chennai (affiliated to Anna University, Chennai) in 2004. She is pursuing her research in the area of Wireless Sensor Networks. She works in the Department of Electronics and Communication Engineering at S.R.M. University, Chennai as an Assistant Professor. She has published 5 refereed journal papers and 8 conference papers. Her teaching and research interests include Wireless Communication, High Performance Networks, Wireless Sensor Networks and Mobile Ad hoc Networks. She is a Life Member of the Institution of Electronics and Telecommunication Engineers.
E-mail: [email protected]

B. Ramachandran was born in Avaraikulam, near Kanyakumari in Tamilnadu, India, in 1969. He received his Bachelor's degree in Electronics and Communication Engineering from Thiagarajar College of Engineering, Madurai in 1990 and his Master's degree in Satellite Communications from Regional Engineering College (presently known as the National Institute of Technology), Trichy in 1992. He obtained his Ph.D. in the area of Wireless Mobile Networks from Anna University, Chennai in 2009. He joined the Department of Electronics and Communication Engineering at S.R.M. Engineering College, Chennai as a Lecturer in 1993 and became Assistant Professor in 2000. At present he is a Professor in the Faculty of Engineering and Technology of SRM University in Chennai. He has authored a textbook on Digital Signal Processing. His teaching and research interests include Digital Communication, Wireless Networks, Network Security, and Mobile Computing. He has published 25 research papers in national and international conferences and journals. He is a member of ISTE and a fellow of IE(I) and IETE.
E-mail: [email protected]


International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10, ISSN 1828-6003, October 2013

On the Performance of MANET Using QoS Protocol

B. Nancharaiah, B. Chandra Mohan

Abstract – A mobile adhoc network (MANET) is a collection of wireless nodes without any central administrator. Because of node mobility, a routing protocol is required to adapt to frequent changes in the network topology. A Quality of Service (QoS) protocol that supports real-time applications is adopted in MANETs to find a feasible path that satisfies the QoS constraints. The most important phase of a QoS routing protocol is route discovery, in which a heuristic search algorithm searches the route cache for available routes to the destination; this motivates the use of an optimization algorithm such as the Cuckoo Search Algorithm (CSA). In this work, we propose an approach that uses CSA to find feasible paths during route discovery in a MANET using a QoS protocol. The proposed QoS routing protocol performs better than an existing hybrid of Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO) in terms of throughput, packet delivery ratio, end-to-end delay and efficiency. Copyright © 2013 Praise Worthy Prize S.r.l. All rights reserved.

Keywords: Cuckoo Search Algorithm (CSA), ACO_PSO, QoS Protocol, MANET

Nomenclature

s_i         Solution
t           Step length
X_i^(t+1)   New solution
α           Step size
⊗           Entry-wise multiplication
Levy(λ)     Lévy flight
p_a         Fitness probability rate
F_i         Fitness
F_j         New fitness

I. Introduction

Routing protocols in adhoc networks of wireless hosts presently face many issues. In a mobile node environment the changing topology triggers frequent re-computation of routes, and overall convergence to stable routes is mostly impossible because of the high level of mobility. Routing in mobile adhoc networks (MANETs) should therefore consider their important characteristics, such as node mobility. Numerous routing protocols have been proposed for adhoc networks [1]. An adhoc network is a collection of nodes that are dynamically and arbitrarily located, such that the interconnections between nodes can change continually [2]. The process of selecting a path and directing packets from a network source node to the destination node is routing, which is an active area of research in adhoc networks [3].

The main advantages of an adhoc network are: i) independence from central network administration; ii) self-configuring, since nodes are also routers; iii) self-healing, through continuous re-configuration; iv) scalable, accommodating the addition of more nodes; and v) flexible, allowing Internet access from varied locations [4]. A metric is a standard of measurement, such as path bandwidth, reliability, delay, or current load on a path, used by routing algorithms to determine the optimal path to a destination [5]. Mobile hosts communicate with each other using multi-hop wireless links and the nodes are free to move arbitrarily; thus the network topology, which is typically multi-hop, may change randomly and rapidly at unpredictable times [6]. Each node in the network also acts as a router, as it forwards data packets for other nodes [7], [8]; each consecutive node forwards a packet to its neighbours in turn until the packet reaches the destination [8], [9]. A wireless adhoc network therefore does not have a clear line of defence, and every node must be prepared for encounters with a direct or indirect adversary [10]. Mostly, a destination/next-hop relation informs a router that a particular destination can be reached optimally by sending the packet to a specific node, the "next hop" on the way to the final destination [11]. When it receives an incoming packet, a router checks the destination address and attempts to associate this address with a next hop.

Manuscript received and revised September 2013, accepted October 2013


These networks comprise a combination of fixed wireless services and mobile networking [12]. As community networks are absent in hierarchically organized networks, a number of challenges are presented [13]. When a route is found, the sender uses it to transmit the packet; if no route is found, the sender may attempt to discover one using the route discovery protocol.

II. Related Works

Ali et al. [14] have surveyed routing protocols for adhoc and sensor wireless networks based on genetic programming (GP), neural networks, evolutionary programming (EP), Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO). Those protocols faced constraints arising from the mobility and infrastructure-less nature of adhoc and sensor networks. The paper included a probabilistic performance evaluation framework and Swarm Intelligence approaches (PSO, ACO) for routing protocols; the performance evaluation metrics employed for wireless and adhoc routing algorithms were (a) routing overhead, (b) route optimality, and (c) energy consumption. The work critically analysed PSO- and ACO-based algorithms against other approaches applied to the optimization of adhoc and wireless sensor network routing protocols.

Mahmood et al. [15] introduced a new adaptive and dynamic routing algorithm for MANETs based on ACO algorithms with network delay analysis. An ACO algorithm helps in finding, if not the shortest, at least a very good path connecting the colony's nest with a source of food. Their experimental evaluation of MANETs was based on estimating the mean end-to-end delay to send a packet from the source to the destination node through a MANET, one of the most important performance evaluation metrics in computer networks. They showed that the algorithm offers good results under certain conditions, such as increasing the pause time and decreasing the node density.

Guimarães et al. [16] presented a short overview of some existing quality of service (QoS) routing mechanisms and protocols in MANETs. Yang and Deb [17] developed an optimization mechanism using the cuckoo search algorithm (CSA), a meta-heuristic algorithm imitating animal behaviour; the optimal solutions obtained by CS are much better than the best solutions obtained by efficient particle swarm optimizers and genetic algorithms. Santhi et al. [18] provided a thorough overview of the more widely accepted MAC and routing solutions for providing better QoS in MANETs.

Nancharaiah and Chandra Mohan [19] addressed the routing problem by employing Ant Colony Optimization and fuzzy logic techniques in the routing algorithm. The path information gathered by the ants is given to a Fuzzy Inference System (FIS) to compute score values for the available paths; based on these scores, the optimal paths are selected. The routing problem can thus be solved more effectively, achieving a higher successful path delivery rate than conventional routing algorithms.

III. Cuckoo Search Algorithm (CSA)

Cuckoo Search (CS) is based on the obligate brood parasitism of some cuckoo species, which lay their eggs in the nests of host birds. Cuckoos employ an aggressive reproduction strategy, occupying the freshly built nests of other birds for their own eggs. Occasionally a cuckoo egg in the nest is discovered, and the host birds either throw it away or abandon the nest and rear their offspring elsewhere. CS models such breeding behavior and can thus be applied to various optimization problems, based on the following three idealized rules:
- Every cuckoo lays one egg at a time, and deposits it in a randomly chosen nest;
- The best nests with high-quality eggs (solutions) carry over to the next generations;
- The number of available host nests is fixed, and a host can detect an alien egg with a probability p_a ∈ [0, 1]. In this case the host bird either throws the egg away or abandons the nest and builds an entirely new nest in a new place.

An essential advantage of this algorithm is its simplicity: apart from the population size, there is basically only a single parameter in CS, compared with other population- or agent-based metaheuristic algorithms such as PSO and harmony search. The pseudo code of CS is given below, following [20]. The steps involved in CSA are:

Step 1: Generate random solutions of the form:

Sol: (S_1, S_2, S_3, ..., S_n)

Step 2: Evaluate the fitness of the solutions using Eq. (1):

f(S_1, S_2, S_3, ..., S_n) = Σ_{i=1}^{n} S_i^2    (1)

Step 3: Determine the better fitness and generate a new solution using Eq. (2):

X_i^(t+1) = X_i^t + α ⊗ Levy(λ)    (2)

where α > 0 is a step size that should be related to the scale of the problem of interest. In this work we consider a Lévy flight in which the step lengths are distributed according to the probability distribution in Eq. (3):

Levy(λ): u = t^(-λ), 1 < λ ≤ 3    (3)

Step 4: Find the fitness probability rate using p_a ∈ [0, 1].

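The update and selection steps above can be sketched in runnable form. This is an illustrative minimal implementation, not the authors' code: the Lévy step uses a simple inverse-power draw rather than Mantegna's algorithm, and the sphere function of Eq. (1) serves as the fitness:

```python
import random

def levy_step(rng, lam=1.5):
    """Heavy-tailed step length in the spirit of Eq. (3), u ~ t^(-lam);
    a simple inverse-CDF draw, lam in (1, 3]."""
    u = 1.0 - rng.random()          # uniform in (0, 1]
    return u ** (-1.0 / lam)        # Pareto-like heavy tail

def cuckoo_search(fitness, dim, n_nests=15, pa=0.25, alpha=0.01,
                  max_gen=200, seed=0):
    """Minimal cuckoo search minimising `fitness`: Levy-flight moves
    (Eq. (2)) plus abandoning a fraction pa of the worst nests."""
    rng = random.Random(seed)
    nests = [[rng.uniform(-1, 1) for _ in range(dim)]
             for _ in range(n_nests)]
    for _ in range(max_gen):
        # get a cuckoo randomly by a Levy flight (Eq. (2))
        i = rng.randrange(n_nests)
        step = alpha * levy_step(rng)
        cand = [x + step * rng.gauss(0, 1) for x in nests[i]]
        # choose a nest j randomly; better fitness replaces it
        j = rng.randrange(n_nests)
        if fitness(cand) < fitness(nests[j]):
            nests[j] = cand
        # abandon a fraction pa of the worst nests, build new ones
        nests.sort(key=fitness)
        n_drop = int(pa * n_nests)
        for k in range(n_nests - n_drop, n_nests):
            nests[k] = [rng.uniform(-1, 1) for _ in range(dim)]
    return min(nests, key=fitness)

# minimise the sphere fitness of Eq. (1) in three dimensions
best = cuckoo_search(lambda s: sum(x * x for x in s), dim=3)
```

Because a nest is only ever replaced by a better candidate and only the worst nests are abandoned, the best solution never degrades from one generation to the next.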

The best solution is then determined using Eq. (1).

Step 5: Terminate the process.

Pseudo code of the Cuckoo Search algorithm [20]:

begin
  Objective function f(x), x = (x_1, x_2, ..., x_d)
  Generate an initial population of n host nests X_i (i = 1, 2, ..., n)
  while (t < MaxGeneration) or (stop criterion)
    Get a cuckoo randomly by Levy flights
    Evaluate its quality/fitness F_i
    Choose a nest among n (say, j) randomly
    if F_i > F_j
      Replace j by the new solution
    end
    A fraction (p_a) of worse nests are abandoned and new ones are built
    Keep the best solutions (or nests with quality solutions)
    Rank the solutions and find the current best
  end while
  Post-process results and visualization
end

IV. MANET Using QoS Protocol

Routing is the exchange of information from one station of the network to another. The major goals of routing are to find and maintain routes between nodes in a dynamic topology, possibly with uni-directional links, using minimum resources. A protocol is a set of standards or rules that allows data exchange between two devices. Routing protocols in wired networks are based on either distance vector or link state routing algorithms [21]; both require periodic routing advertisements to be broadcast by each router. These conventional routing algorithms are clearly not efficient for the type of dynamic changes that may occur in an adhoc network: routers in conventional networks do not generally move around and only rarely leave or join the network [22].

QoS has been defined by the Consultative Committee for International Telephony and Telegraphy (CCITT) as "the collective effect of service performance which determines the degree of satisfaction of a user of the service". It has seen rapid development in both wired and wireless network communication. QoS in MANETs depends not only on the available resources but also on the mobility rates of those resources; it means providing a set of parameters to adapt applications to the quality of the network while routing through it [23], [24]. The three main constraints related to QoS are bandwidth, the dynamic topology of a MANET, and the limited processing and storage capacity of mobile nodes. This has led to the development of several routing protocols that emphasize effective techniques to improve QoS, thereby significantly increasing performance [25]. Hence, the task of QoS routing is to optimize network resource utilization while satisfying application requirements [26], using QoS parameters such as throughput, packet delivery ratio, end-to-end delay and efficiency. The QoS protocol has its main impact on the route discovery phase of routing.

IV.1. Parameters Description

Throughput, or network throughput, is the average rate of successful message delivery over a communication channel. It can be measured in bits per second (bit/s or bps), data packets per second, or data packets per time slot, as given in Eq. (4). This data may be delivered over a physical or logical link, or pass through a certain network node. A greater throughput value means better performance of the protocol:

throughput = (total no. of data packets delivered) / (time taken)    (4)

Packet delivery ratio is defined as the ratio of the number of data packets delivered to the destination to the number of packets sent. It indicates the level of data delivered to the destination; a greater value indicates better performance of the protocol:

packet delivery ratio = Σ(no. of packets received) / Σ(no. of packets sent)    (5)

End-to-end delay is defined as the average time taken by a data packet to arrive at the destination, including the delay caused by the route discovery process and queuing during data packet transmission. Only data packets that are successfully delivered to their destinations are counted; unsuccessful delivery can result from delays caused by buffering during route discovery latency and queuing at the interface queue. A lower end-to-end delay indicates better performance of the protocol:

end-to-end delay = Σ(arrive time − send time) / (no. of connections)    (6)

Efficiency here reflects the energy consumed for the transmission and reception of data packets:

efficiency = (total power consumption) / (total number of hops)    (7)

V. CS Algorithm in MANET Routing Using QoS

A CSA-based source-routing protocol for MANETs is proposed, which uses QoS metrics to find an optimal solution. Route discovery is the process of finding a route between two nodes, as shown in Fig. 1. For node S to send data to node D, it must first discover a route to node D. Node S discovers a route to node D via node B and sets up the route; once the route is established, node S can begin sending data to node D along it.

Fig. 1. Route discovery in MANET

The QoS parameters throughput, packet delivery ratio, end-to-end delay and efficiency are used as metrics, and the heuristic search algorithm uses their fitness values to find the optimal path.

VI. Experimental Results

This section details the experimentation and performance evaluation of the proposed approach. Route discovery with CSA is evaluated using the QoS performance metrics, and the results of CSA in a MANET using the QoS protocol are compared with ACO-PSO. The comparisons for the different parameters are shown in Figs. 2-5, and the MANET routing paths between the source and destination nodes are shown in Fig. 6. The simulation parameters were as follows: number of nodes = 100 and 200; data rate = 5 packets/s and 10 packets/s; packet size = 512.

Figs. 2-5 show the performance comparison for nodes = 100 with data rate = 5 packets/s. It is known that throughput increases when the connectivity is better; Fig. 2 demonstrates that evaluation using CSA yields better throughput compared with ACO-PSO.

Fig. 2. Throughput (bps) for CS and ACO-PSO under different iterations for nodes = 100

Fig. 3. Packet delivery ratio for CS and ACO-PSO under different iterations for nodes = 100

Fig. 4. End-to-end delay (s) for CS and ACO-PSO under different iterations for nodes = 100
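The four metrics of Eqs. (4)-(7) can be computed directly from a packet trace. The helper below is illustrative only; the record format and names are my own, not from the paper's simulator:

```python
def qos_metrics(delivered_records, total_sent, n_connections,
                total_power_w, total_hops):
    """QoS metrics in the sense of Eqs. (4)-(7). Each delivered record
    is a (send_time_s, arrive_time_s) pair for a packet that reached
    its destination; field names and units are assumed."""
    delivered = len(delivered_records)
    duration = (max(a for _, a in delivered_records)
                - min(s for s, _ in delivered_records))
    throughput = delivered / duration                  # Eq. (4), packets/s
    pdr = delivered / total_sent                       # Eq. (5)
    delay = (sum(a - s for s, a in delivered_records)
             / n_connections)                          # Eq. (6)
    efficiency = total_power_w / total_hops            # Eq. (7)
    return throughput, pdr, delay, efficiency
```

Note that, as defined in Eq. (7), a lower value of this efficiency figure corresponds to lower power consumption and hence better performance.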


In Fig. 3 the packet delivery ratio using CSA is much better than with the conventional ACO-PSO algorithm. End-to-end delay values should be lowest for the better connectivity: even though CS has poorer delay values up to 70 iterations, it becomes better thereafter, as shown in Fig. 4. Here the power consumption was taken to indicate the efficiency (the lower the power consumption, the higher the efficiency), and CSA was found to have better efficiency than ACO-PSO, as shown in Fig. 5.

The communication network graph shown in Fig. 6 consists of the source node, the destination node and all intermediate nodes. In this approach the source node is 12 and the destination node is 45. The network shows two paths: the blue line indicates the path found by the CS algorithm and the dotted magenta line the path found by ACO-PSO. From the graph in Fig. 6 it can therefore be concluded that, considering the QoS parameters, the CS algorithm finds a better path than the conventional ACO-PSO algorithm.

Figs. 7 to 10 show the performance comparison for nodes = 200 with data rate = 5 packets/s. From Figs. 7, 8, 9 and 10 it is clear that CSA achieves better throughput, delivery ratio, end-to-end delay and efficiency than ACO-PSO. Upon considering the above QoS parameters, CSA finds the optimal path for route discovery, as shown in Fig. 11, where the blue line again indicates the path found by the CS algorithm and the dotted magenta line the path found by ACO-PSO.

Fig. 5. Efficiency for CS and ACO-PSO under different iterations for nodes = 100

Fig. 6. Communication network with routing paths for nodes = 100 with data rate = 5 packets/s

Fig. 7. Throughput (bps) for CS and ACO-PSO under different iterations for nodes = 200

Fig. 8. Packet delivery ratio for CS and ACO-PSO under different iterations for nodes = 200

Fig. 9. End-to-end delay (s) for CS and ACO-PSO under different iterations for nodes = 200


Fig. 10. Efficiency for CS and ACO-PSO under different iterations for nodes = 200

Fig. 13. Communication network with routing paths for nodes = 200 with data rate = 10 packets/s

Fig. 11. Communication network with routing paths for nodes = 200 with data rate = 5 packets/s Fig. 14. Communication network with routing paths for nodes = 300 with data rate = 10 packets/s

Therefore, the optimal path finding communication network is plotted in Fig. 12, Fig. 13 and Fig. 14. The blue colour line in Fig. 12, Fig. 13 and Fig. 14 indicate the path through the CS algorithm and dotted magenta indicates the path through ACO-PSO. The results suggest that CSA finds the optimal path as compared to hybrid ACO-PSO.

VII.

Fig. 12. Communication network with routing paths for nodes = 100 with data rate = 10 packets/s

For further analysis of the efficiency of the algorithm data rate for evaluation is changed to 10 packets/s. The parameters comparisons results are similar to that of initial data rate = 5 packets/s.

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

Conclusion

In this paper, we employed a recent heuristic search algorithm, the Cuckoo Search (CS) algorithm, to find optimal solutions to routing problems in MANETs. Based on the QoS routing protocol, performance is evaluated using QoS metrics such as throughput, packet delivery ratio, end-to-end delay and efficiency. The proposed approach is compared with a conventional hybrid of ACO and PSO. From the QoS-parameter comparison graphs and the communication network graphs, it can be concluded that the CS algorithm performs better than conventional ACO-PSO in terms of throughput, packet delivery ratio, end-to-end delay and efficiency.
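The CS algorithm referred to above can be sketched in its generic form, Yang and Deb's Cuckoo Search via Lévy flights, on a continuous cost function. This is an illustrative implementation, not the routing code used in the paper; all function names and parameter values are assumptions:

```python
import math
import random

def levy_step(beta=1.5):
    # Mantegna's algorithm for a Levy-distributed step length
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = random.gauss(0, sigma)
    v = random.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def cuckoo_search(cost, dim, n_nests=15, pa=0.25, iters=100, lo=-5.0, hi=5.0):
    """Minimize `cost` over [lo, hi]^dim with Cuckoo Search via Levy flights."""
    nests = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_nests)]
    best = min(nests, key=cost)
    for _ in range(iters):
        for i in range(n_nests):
            # generate a new solution by a Levy flight biased toward the best nest
            new = [min(hi, max(lo, x + 0.01 * levy_step() * (x - b)))
                   for x, b in zip(nests[i], best)]
            if cost(new) < cost(nests[i]):
                nests[i] = new
        # abandon a fraction pa of the worst nests and rebuild them at random
        nests.sort(key=cost)
        for i in range(int((1 - pa) * n_nests), n_nests):
            nests[i] = [random.uniform(lo, hi) for _ in range(dim)]
        best = min(nests + [best], key=cost)
    return best, cost(best)
```

In the routing setting, the cost function would encode the QoS metrics (delay, delivery ratio, energy) of a candidate path rather than a continuous test function.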


Authors' information

B. Nancharaiah received the bachelor's degree (B.E.) in Electronics & Communication Engineering from S R K R Engineering College, Bhimavaram, affiliated to Andhra University, in 1999, and the master's degree (M.Tech) in Electronics & Communication Engineering from Pondicherry Engineering College, Pondicherry Central University, in 2003. He is pursuing a PhD at JNTU, Hyderabad, and has twelve years of teaching experience. Currently he is working as faculty in the ECE Department of NRI Institute of Technology, Guntur. His research interests are in the areas of wireless communications, mobile computing and networks.

Chandra Mohan B. received the bachelor's degree in Electronics & Communication Engineering from Bapatla Engineering College, Bapatla, in 1990, the master's degree in Microwave Engineering from Cochin University of Science and Technology in 1992, and the PhD degree from JNT University, Hyderabad, in 2009. Presently he is Professor and Head of the Department of ECE, Bapatla Engineering College. He has twenty-one years of experience teaching undergraduate and postgraduate students. His research interests are in the areas of image watermarking, image compression and communications.

International Review on Computers and Software, Vol. 8, N.10

2362

International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

Performance Analysis of Modulation and Coding to Maximize the Lifetime of Wireless Sensor Network

M. Sheik Dawood1, G. Athisha2

Abstract – A wireless sensor network (WSN) consists of several sensor nodes that monitor physical or environmental conditions. The development of WSNs is motivated by applications such as military surveillance and industrial and consumer monitoring. In this paper the transmission mechanism of a sensor node is analyzed with different modulation schemes and error control codes, which are furthermore compared under different channel conditions in a sensor network. The modulation schemes considered are 16-PSK, 16-PAM, 16-QAM and 16-FSK, combined with convolutional, Golay and RS codes, under both AWGN (Additive White Gaussian Noise) and Rayleigh fading channels. To maximize the lifetime of the WSN, an appropriate combination of modulation scheme and error control code is chosen for sensor data transmission. The results show that 16-FSK with Golay codes in the AWGN channel and 16-QAM with Golay codes in the Rayleigh channel are more energy efficient than the other combinations of modulation and coding techniques for energy-efficient sensor data transmission. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Sensor Network, Energy Efficiency, Lifetime, Modulation, Error Control Code

I. Introduction

In the recent past, sensor networks and their applications have played an important role in research and industry. Sensor devices have evolved over the last decade to support various applications, such as asset monitoring, surveillance, structural health monitoring, habitat monitoring, and even underwater sensing. A WSN can be defined as a network of devices, denoted as nodes, which sense the environment and transmit the information gathered from the monitored field through wireless links [1]-[32]. The data are transmitted via multiple hops to a sink and then to other networks through a gateway node [1]-[3]. The nodes can be homogeneous or heterogeneous. In a sensor network, the transceiver selects between two modulation strategies based on the sensing range of the sensor nodes: when a single energy-efficient modulation technique is used for all sensor nodes, this is known as homogeneous modulation, while the use of different modulation techniques for different sensor nodes is known as heterogeneous modulation. In this paper, for transmitting information among sensor nodes, modulations such as 16-PSK, 16-PAM, 16-QAM and 16-FSK, along with convolutional, Golay and RS codes, are employed to improve the energy efficiency and lifetime of sensor nodes in different channels. The QoS Enhanced Base Station Controlled Dynamic Clustering architecture is considered as the clustering structure of the WSN.

Manuscript received and revised September 2013, accepted October 2013

II. Existing Work

The performance of various modulation schemes such as BPSK, MSK and QAM has been analyzed in an AWGN channel environment. In particular, an energy-efficient sensor data transmission approach was proposed to improve lifetime in fault-tolerant WSNs for landslide area monitoring. The modulation schemes were compared based on the energy consumption of the transceiver node, and distance-appropriate modulation schemes were identified to improve the energy efficiency of the WSN [4].

The role of BPSK and QPSK modulation in a MIMO-OFDM system has been studied with and without adaptive beamforming in AWGN, Rician and Rayleigh fading channels. The simulation results show that the BER (bit error rate) performance of the system is improved by combining BPSK with adaptive beamforming in the AWGN channel [5], [9].

The effect of communication channels on the IEEE 802.16 OFDM-based WiMAX physical layer has been investigated using BER versus the ratio of bit energy to noise power spectral density (Eb/No) as the performance measure. That work furthermore demonstrated that the AWGN channel performs better than the Rayleigh and Rician fading channels [6].

The performance of different transmission modes has also been evaluated by calculating BER versus signal-to-noise ratio (SNR) for modulations such as 16QAM, 64QAM, 16DPSK and 64DPSK under the AWGN, Rayleigh and Rician channel models.


That research further concluded that 16-QAM performs better than 64-QAM [7].

Various adaptive modulations such as BPSK, QPSK, 16-QAM and 64-QAM have been evaluated with convolutional coding for all-participate amplify-and-forward relay networks, where the two source nodes choose the appropriate modulation and coding scheme based on information from the feedback channels so as to control the block error rate (BLER) of the relay system; expressions for the system performance over Rayleigh channels were derived [8].

In digital communication the Reed-Solomon code is used to encode the data stream. Its performance has been evaluated with binary PSK modulation in a symmetric AWGN channel, concluding that the BER performance improves as the code rate is decreased and also improves for large block lengths [7].

An approach has been developed for finding suitable error control codes for WSNs. Several simulations considering different error control codes (RS codes and Bose-Chaudhuri-Hocquenghem (BCH) codes) showed that RS (31, 21) satisfies both the BER and the power consumption criteria [11].

Optimal hop distance estimation, which finds the minimum number of hops required to relay a packet from one node to another in a random network by a statistical method, has been proposed; the energy consumption and latency are calculated from the minimum number of hops [12], [13].

The energy efficiency of LT codes with non-coherent M-ary FSK (NC-MFSK), known as green modulation, has been measured in a proactive WSN over Rayleigh flat-fading channels with path loss. The results proved that LT codes are beneficial in practical low-power WSNs with dynamically positioned sensor nodes [15].

Improving the lifetime of each cluster of sensors in a hierarchical WSN using physical-layer optimization techniques, and how location-aware selection of modulation schemes affects their energy efficiency, has also been studied. That work analyzed how energy can be distributed more evenly across the network by proper selection of the modulation schemes for different sensors, and how certain physical-layer attributes affect both the lifetime and the end-to-end delay in a hierarchical WSN. A heterogeneous modulation scheme was presented and its impact on the spatial distribution of energy dissipation and the resulting network lifetime was reported, together with how the need for heterogeneous modulation affects the power and bandwidth efficiency of the different modulation schemes [12], [16], [17].

The performance of various error control codes has been analyzed in terms of BER, together with the power consumption on different platforms when transmitting randomly generated data through a Gaussian channel. Based on the comparison of three error control codes (BCH, RS and convolutional), binary BCH codes with an ASIC implementation were identified as best suited for WSNs [18].

The energy performance of uncoded MPSK, MQAM and MFSK modulations has been evaluated in both AWGN and Rayleigh fading channels for very short-range communication, showing that M-QAM is more energy efficient than the other modulation schemes [19].

A general formula has been derived for the lifetime of WSNs that captures the network model, including network architecture and protocol, data collection initiation, lifetime definition, channel fading characteristics, and energy consumption model. It also refers to an approach that enables the sensor network protocol to maximize the minimum residual energy across the network in each data collection [20].

The application of AMC to 3rd Generation (3G) wireless systems has been studied, with a new method for selecting the appropriate modulation and coding scheme (including 16QAM, 8PSK and BPSK with turbo codes) according to the estimated channel condition, taking a statistical decision-making approach to maximize the average throughput of the communication system [21].

III. About Proposed Work

Energy is a scarce resource that must be used carefully: since it is impractical to recharge each node, the network must be as energy efficient as possible. This paper extends earlier research to improve the energy efficiency of clustered WSNs by studying the performance of homogeneous and heterogeneous modulations along with error control codes. The work considers four modulation types frequently employed in wireless communications, 16PSK, 16PAM, 16QAM and 16FSK, along with convolutional, Golay and RS codes under AWGN and Rayleigh channels, and derives an energy minimization scheme for node communications in a sensor network. Among these combinations, 16FSK with Golay codes in the AWGN channel and 16QAM with Golay codes in the Rayleigh channel are preferable choices for a WSN due to the constant peak-to-average power ratio of the transmitted signal and the low complexity that allows incoherent detection at the receiver. A numerical analysis of the minimum energy spent per information bit, considering both the transmit signal and the circuit, is carried out in this work. Our aim is to derive a suitable modulation and coding format and the optimum parameters that achieve minimum energy consumption for a given distance between nodes. The QoS Enhanced Base Station Controlled Dynamic Clustering Protocol (QBCDCP) WSN architecture is adapted to analyze the performance of modulation and coding. An example WSN model is shown in Fig. 1. The network model considered in this paper is as follows:


i. The sensor network consists of several sensor nodes spread over an error-prone sensor field. The operational situation is illustrated in Fig. 1: the sensor field is a square area and the base station is located at a distance from the sensor field.
ii. The sensor nodes have limited battery power, processing power and memory space.
iii. The sensor nodes are stationary and are organized into clusters. The nodes in a cluster may carry out one of two functions: cluster head or sensing. Every cluster head carries out intra-cluster and inter-cluster communication, data collection and data forwarding to the base station with the help of multihop routing; a cluster head node does not carry out sensing. The role of cluster head is rotated among the non-sensing nodes in a cluster. A sensing node either actively senses events or stays in the inactive mode; the nodes to be used for sensing events are decided by the base station.
iv. The data sensed by the sensing nodes in a cluster are transmitted directly to their cluster head, which then aggregates and/or passes the data to another cluster head that directs it to the base station. Unlike the sensor nodes, the base station is not resource-constrained, so communication from the base station to the sensor nodes can be performed directly.
v. The base station knows the position of every node situated within the sensor field [11], [29].

III.1. Energy Efficient Clustering Architecture for Performance Analysis of Modulation and Coding

The QoS Enhanced Base Station Controlled Dynamic Clustering Protocol (QBCDCP) architecture is adapted for this work. The clustering methods used here to achieve satisfactory QoS and lower energy depletion are also applicable to the construction of protocols handling homogeneous and heterogeneous networks. QBCDCP is an enhancement of BCDCP with the added functionality of QoS-based route selection: QoS is maintained in QBCDCP by including delay and bandwidth details in route selection. Every data round, specified by a fixed time interval, the base station groups the sensor nodes into balanced clusters with the help of LEACH [14], [29].

III.2. Sensor Node Energy Model

Fig. 2 shows the sensor energy model used for this work; it has been widely adopted in earlier studies. Only the energy spent by the transmitter is considered. The circuit has three modes of operation: on, transient, and sleep. The 'on' state is used for the transmission of information, while the sleep state saves energy and has very small power consumption compared to the other states.
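The three-mode transceiver model can be expressed as a simple energy account over one duty cycle. The sketch below assumes per-mode power levels and dwell times are known; all names and example values are illustrative, not taken from the paper's model:

```python
def node_energy(p_on, p_transient, p_sleep, t_on, t_transient, t_sleep):
    """Energy (J) spent over one duty cycle of the three-mode transceiver:
    power in each mode (W) multiplied by the time spent in that mode (s)."""
    return p_on * t_on + p_transient * t_transient + p_sleep * t_sleep

def cycles_until_depletion(e0, per_cycle_energy):
    """Number of duty cycles a node sustains on initial battery energy e0 (J)."""
    return e0 / per_cycle_energy
```

Because the 'on' power dominates, reducing the energy spent in the 'on' state (the focus of this work) directly increases the number of cycles a node survives.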


Fig. 1. WSN Model

Fig. 2. Sensor node Energy Model

The primary purpose of this work is, in particular, the minimization of the energy spent in the 'on' state [26], [27], [28]. Energy consumption in a WSN is mainly divided into two parts: energy consumed for processing and computation, and energy consumed for the transmission of collected data. The energy required for data transmission is larger than that required for data collection. The distance between sensor nodes and the cluster head, and the distance between the cluster head and the base station, play an important role in the energy-efficient transmission of sensor data: since the distance from the sensing field to the sink strongly influences energy consumption, sensor nodes that transmit data over a long distance will drain their energy soon, and reducing the node transmission radius leads to less energy consumption. The BSCDCP is taken as the energy-efficient clustering scheme for this scenario [11], [22], [29].

III.3. Calculation of Eb/No

The calculation of Eb/No is theoretically explained by T. S. Rappaport and Proakis for the various digital modulation schemes and their performance over different channel conditions [22], [23]. An energy-efficient modulation scheme for WSNs is chosen with the aid of a BER vs. Eb/No plot generated using MATLAB. In this paper, the BER is taken as 10^-1 and the corresponding Eb value is found from the graph of BER vs. Eb/No (dB).
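Since the modulation choice is driven by the Eb/No required to reach the target BER of 10^-1, the standard textbook AWGN approximations (as given in Proakis and Rappaport) can be evaluated directly instead of reading the value off a plot. The sketch below assumes Gray coding and the usual Q-function approximations for square 16-QAM and 16-PSK; the function names and the bisection helper are illustrative:

```python
import math

def q(x):
    # Gaussian tail probability Q(x)
    return 0.5 * math.erfc(x / math.sqrt(2))

def ber_mqam(ebno_db, M=16):
    # standard Gray-coded approximation for square M-QAM in AWGN
    k = math.log2(M)
    g = 10 ** (ebno_db / 10)  # Eb/No in linear scale
    return (4 / k) * (1 - 1 / math.sqrt(M)) * q(math.sqrt(3 * k * g / (M - 1)))

def ber_mpsk(ebno_db, M=16):
    # standard Gray-coded approximation for M-PSK in AWGN
    k = math.log2(M)
    g = 10 ** (ebno_db / 10)
    return (2 / k) * q(math.sqrt(2 * k * g) * math.sin(math.pi / M))

def required_ebno(ber_fn, target=1e-1, lo=0.0, hi=30.0):
    # bisect for the Eb/No (dB) at which the BER falls to the target
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if ber_fn(mid) > target else (lo, mid)
    return (lo + hi) / 2
```

At the paper's target BER of 10^-1, `required_ebno` returns a lower Eb/No for 16-QAM than for 16-PSK, consistent with the usual power-efficiency ordering of these schemes in AWGN.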

IV. Proposed Work

The network lifetime depends on the lifetime of each cluster. In this study we try to balance the distribution of energy consumption within each cluster. We assume that the cluster head has access to a larger energy source than the sensor nodes, so we focus on the energy consumed by the sensors for data transmission to the cluster head. To increase a cluster's lifetime, the energy consumption in each sensor should be reduced; energy dissipation due to data transmission is a large percentage of the overall energy consumption within the sensors. To preserve the sensor node energy, we need to choose the appropriate transmission model and implement it in the energy-efficient clustering protocol. Hence this work analyzes the performance of energy-efficient transmission with robust modulation and coding in QBCDCP, to reduce energy consumption and increase the sensor node lifetime.

In the process of transmitting sensor events, the modulation scheme is considered across the network and its levels are adjusted to achieve a lower required energy per bit; however, scaling to lower energies results in an increased BER. In scenario one, a common modulation scheme is considered for all sensor nodes and the target bit rate is adjusted to achieve lower energy per bit. Another approach is a heterogeneous, distance-aware modulation scheme, where different nodes may use different modulation schemes under the same BER constraint; in this scheme the energy consumption distribution within a cluster is balanced by using different modulation schemes for different nodes. Homogeneous and heterogeneous modulation schemes such as 16-PSK, 16-PAM, 16-QAM and 16-FSK, along with convolutional, Golay and RS codes under AWGN and Rayleigh channels, are analyzed to improve the energy efficiency and lifetime of the sensor nodes in a WSN.
The work is divided into the following components: the choice of the energy-efficient clustering architecture; finding the distance between the cluster head and the event detection node in this architecture; optimal selection of the modulation and coding technique with energy-efficient error control codes; and performance analysis of modulation and coding in AWGN and Rayleigh channel conditions with respect to QBCDCP.

IV.1. Choice of the Energy Efficient Clustering Architecture

The simulation results show that the QoS Enhanced Base Station Controlled Dynamic Clustering technique leaves more live nodes than the existing cluster-based routing technique for wireless sensor networks. To achieve higher energy efficiency in the sensor nodes and clusters, we deploy the adaptive energy-efficient transmission scheme with the modulation and coding technique in this clustering architecture, which furthermore reduces energy consumption and increases the cluster lifetime [11]. The BSCDCP has proven a sound choice for WSNs over the traditional cluster-based routing protocol LEACH.


IV.2. Finding Out the Distance between Cluster Head and Sensing Node

Consider a sensor field with several sensors, all equipped with software-enabled radios, divided into several clusters. It contains two types of nodes: event detection nodes and cluster heads (CH). The information collected by a CH is transferred to the base station. The base-station-controlled clustering protocol helps the cluster head identify the positions of the sensor nodes in a cluster; according to the information received from the base station, the cluster head's transceiver adaptively selects the modulation based on the distance between nodes and transmits the information to the base station and to other nodes.

IV.3. Selection of Modulation Technique with Energy Efficient Error Control Code

The modulation scheme is considered across the network and its levels are adjusted to achieve a lower required energy per bit. If one common modulation scheme is used and the target bit rate is adjusted to achieve lower energy per bit, much node lifetime is lost; hence another approach is a position-based modulation scheme where different nodes may use different modulation schemes under the desired BER value. In the position-aware modulation scheme, the energy consumption distribution within a cluster is balanced by using different modulation schemes for different nodes.

V. Performance Analysis of Modulation and Coding in Various Channel Conditions

Modulation schemes such as 16-PSK, 16-PAM, 16-QAM and 16-FSK, along with convolutional and block codes, are analyzed to improve the lifetime of the sensor nodes in a WSN; the convolutional and block codes are combined with the modulation schemes to study the performance of the WSN.

V.1. Performance Study in AWGN Channel

In this scenario, under the AWGN channel, 16QAM modulation is used for centrally located sensors that are within 40 meters of their assigned cluster head node, while the sensors located more than 40 meters from the cluster head use 16FSK modulation. This is an example implementation of our proposed energy-efficient modulation and coding selection.

Scenario one: a single power-efficient modulation and coding technique is used for all the sensor nodes, and the following steps are performed:


the same modulation technique is used for all deployed sensor nodes in QBCDCP with a desired BER value; convolutional, Golay and RS codes are applied with the modulation techniques; and the performance of 16-PSK, 16-PAM, 16-QAM and 16-FSK with coding is studied in an AWGN channel environment and compared with the other modulation techniques. In this scenario 16QAM modulation is used in all sensors within a specified distance of their assigned cluster head node. Regardless of its simplicity, using the same modulation scheme over the entire network unfavorably affects the energy efficiency of the network, and consequently the network lifetime is reduced. The performance of 16QAM with coding is studied for different distances between nodes (5 m, 10 m, 40 m and 60 m) and compared with the other modulation techniques.

Scenario two: different modulation techniques are used based on node position for all deployed sensor nodes in QBCDCP, with a desired BER value; error control codes are applied with the modulation techniques and the performance of the transmission scheme is studied in an AWGN channel environment.

V.2. Performance Study in Rayleigh Channel

In the Rayleigh channel, 16-FSK modulation is used for centrally located sensors that are within 40 meters of their assigned cluster head node, while the sensors located more than 40 meters from the cluster head use 16-QAM modulation.

Scenario one: a single power-efficient modulation and coding technique is used for all the sensor nodes, and the following steps are performed: the same modulation technique is used for all deployed sensor nodes in QBCDCP with a desired BER value; error control codes are applied with the modulation techniques; and the performance of 16-PSK, 16-PAM, 16-QAM and 16-FSK with coding is studied in the Rayleigh channel environment. In this scenario 16-FSK modulation is used in all sensors within a specified distance of their assigned cluster head node. The performance of 16-FSK with coding is studied for different distances between nodes (5 m, 10 m, 40 m and 60 m) and compared with the other modulation techniques.

Scenario two: different modulation and coding techniques are used for the sensor nodes, and the following steps are performed: different modulation techniques are used based on node position for all deployed sensor nodes in QBCDCP, with a constant desired BER value, and the performance is studied by applying error control codes with the modulation schemes in the Rayleigh channel environment.
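The position-aware selection described in the two channel studies amounts to a threshold rule. In the sketch below the 40 m threshold and the channel-to-scheme pairings follow the text (16-QAM near / 16-FSK far in AWGN, 16-FSK near / 16-QAM far in Rayleigh), while the function and dictionary names are illustrative:

```python
# Near/far modulation-and-coding pairs per channel, as described in the text.
SCHEMES = {
    "awgn":     {"near": "16-QAM + Golay", "far": "16-FSK + Golay"},
    "rayleigh": {"near": "16-FSK + Golay", "far": "16-QAM + Golay"},
}

def select_scheme(distance_m, channel, threshold_m=40.0):
    """Choose a node's modulation/coding scheme from its distance to the
    cluster head: within the threshold use the near-range scheme, else far."""
    pair = SCHEMES[channel]
    return pair["near"] if distance_m <= threshold_m else pair["far"]
```

In QBCDCP the base station knows every node's position, so the cluster head can apply this rule when it adaptively selects the modulation for each node.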


There is also a trade-off between modulation power efficiency and receiver complexity; therefore, using the most power-efficient modulation for all sensors in the network may not be desirable. Accordingly, different modulation techniques with the position-based scheme are used for the event detection nodes: in this scenario 16-QAM or 16-FSK modulation is used for centrally located sensors that are within 40 meters of their assigned cluster head node, while the sensors located more than 40 meters from the cluster head use another modulation technique. This is an example implementation of our proposed energy-efficient transmission scheme. The performance of 16-QAM, 16-FSK and the other modulation techniques applied to the sensor nodes with error control codes is analyzed for better performance of the WSN.

V.3. Calculation of Transmitting Energy per Bit

For any modulation system, the BER can be expressed as a function of Eb/No, the ratio of the energy per bit to the noise power spectral density; for a given Eb/No there can be a large difference between the resulting BERs of different modulation schemes, and vice versa [18]. The choice of digital modulation scheme significantly affects the characteristics, performance and resulting physical realization of a communication system. An energy-efficient modulation scheme for WSNs is chosen with the aid of a BER vs. Eb/No plot using MATLAB: the BER is taken as 10^-1, the corresponding Eb value is found, and this value is used to calculate the transmission energy per bit. The performance comparison of modulation with and without coding for the AWGN and Rayleigh channels is shown in the following tables. Using a log-distance path loss model, the required energy per transmitted bit in the ith sensor node may be written as:

E_Tx(i) = K_Tx · Eb · (4π/λ)^2 · d_{e(i),i}^{β_{e(i),i}}    (1)

where K_Tx is a constant coefficient, Eb is the energy per bit needed at the receiver, d_{e(i),i} and β_{e(i),i} denote the distance and the path loss exponent between sensor i and its assigned cluster head node (β_{e(i),i} depends on the environment), and λ denotes the signal wavelength.
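The path-loss relation above can be evaluated numerically. The sketch below assumes the free-space reference term (4π/λ)^2 together with the d^β distance term, matching the symbols defined in the text; the function and argument names are illustrative:

```python
import math

def tx_energy_per_bit(eb, d, beta, lam, k_tx=1.0):
    """Required transmit energy per bit under a log-distance path-loss model.

    eb:   receiver's required energy per bit (J),
    d:    distance to the assigned cluster head (m),
    beta: path-loss exponent (environment dependent),
    lam:  signal wavelength (m),
    k_tx: constant coefficient of the transmitter.
    """
    return k_tx * eb * (4 * math.pi / lam) ** 2 * d ** beta
```

Because the energy grows as d^β, nodes far from their cluster head pay a much higher per-bit cost, which is what motivates the distance-aware modulation selection.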

V.4. Calculation of Network Lifetime

Sensor Node lifetime is calculated using the formula shown in Eq. (2). The network lifetime is defined as the time span from the sensor deployment to the first loss of coverage [21]. It is also defined as the time span from the deployment to the instant when the network is considered nonfunctional, for example the instant when the first sensor

International Review on Computers and Software, Vol. 8, N.10


M. Sheik Dawood, G. Athisha

dies, a percentage of sensors die, the network partitions, or the loss of coverage occurs [17], and the average network lifetime is derived as:

E[L] = (S · E0 − E[Ew]) / E[Er]    (2), (3)

where S is the total number of sensors, E0 is the initial energy, E[Ew] is the expected wasted energy (i.e., the total unused energy in the network when it dies), and E[Er] is the expected reporting energy consumed by all sensors in a randomly chosen data collection.
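A minimal sketch of the lifetime computation above (the values for E[Ew] and E[Er] are illustrative assumptions; with E[Er] expressed as energy per data-collection round, the result counts rounds, not days):

```python
def average_network_lifetime(num_sensors, initial_energy, expected_wasted, expected_reporting):
    """E[L] = (S * E0 - E[Ew]) / E[Er]: expected number of data-collection
    rounds the network survives before it dies."""
    return (num_sensors * initial_energy - expected_wasted) / expected_reporting

# 500 sensors with 100 J each, as in the simulation study; Ew and Er assumed.
rounds = average_network_lifetime(500, 100.0, expected_wasted=5000.0, expected_reporting=0.5)
# (500 * 100 - 5000) / 0.5 = 90000 rounds
```

Converting rounds to days then only requires the reporting interval, which is why the tables can express lifetime in days once the per-round transmission energy of each modulation/code pair is known.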

VI. Simulation Study

In this study, 500 sensors are placed using a two-dimensional uniform distribution in a 100×100 m2 field. The sensors are grouped into clusters with a cluster head node placed at the center of each cluster. The packet size is set to 128 bytes, the initial battery energy of each sensor is 100 J, and the path loss exponent is set to 3. Using Eq. (3), the network lifetime is calculated for different modulation schemes with error control codes. Figs. 3, 4, 5 and 6 show the lifetime (days) at BER = 10^-1 for different modulations along with convolutional, Golay and RS codes at distances of 5 m, 10 m, 40 m and 60 m under the AWGN channel. The value of Eb/No (dB) is the same for convolutional and Golay codes under the Rayleigh channel. The results show that 16-FSK with Golay codes in the AWGN channel and 16-QAM with Golay codes in the Rayleigh channel are more energy efficient than the other combinations of modulation and coding techniques. It is also observed that the AWGN channel is more efficient than the Rayleigh channel. Figs. 7, 8, 9 and 10 show the lifetime (days) at BER = 10^-1 for the same modulations and codes at 5 m, 10 m, 40 m and 60 m under the Rayleigh channel, and Figs. 11 and 12 compare modulation and coding in the AWGN and Rayleigh channels. Tables I and II show the average remaining energy and the network lifetime with Golay and RS codes on the AWGN channel, and Tables III and IV show the same for the Rayleigh channel. Tables V, VI, VII and VIII show the average remaining energy and the network lifetime for both the homogeneous and heterogeneous modulations with RS and Golay codes in the AWGN and Rayleigh channels.

VII. Conclusion and Future Work

This paper presents how the energy efficient transmission method helps to improve the longevity of a WSN. The performance of various combinations of modulation schemes and error control codes was studied to identify a suitable energy efficient information transmission method in a clustered WSN. The simulation and mathematical results show that 16-FSK with Golay codes in the AWGN channel and 16-QAM with Golay codes in the Rayleigh channel are more energy efficient than the other combinations of modulation and coding techniques. Future work is to determine energy efficient approaches that further improve sensor node lifetime and increase the overall performance of transmission schemes for energy efficient clustered mobile WSNs.

Fig. 3. Lifetime for different modulation and coding at d=5m for AWGN channel

Fig. 4. Lifetime for different modulation and coding at d=10m for AWGN channel

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved


TABLE I
LIFETIME (DAYS) FOR DIFFERENT MODULATIONS AND CODES UNDER AWGN CHANNEL AT DISTANCE d = 5 m AND d = 10 m

Modulation | Code          | Eb/No (dB) | eTx (J), 5m | Lifetime (days), 5m | eTx (J), 10m | Lifetime (days), 10m
16PSK      | Convolutional | 4.5        | 1.11        | 44956               | 8.91         | 5601
16PSK      | Golay         | 4.5        | 1.11        | 44956               | 8.91         | 5601
16PSK      | RS            | 11         | 2.72        | 18347               | 21.8         | 2290
16PAM      | Convolutional | 8.5        | 2.11        | 23650               | 16.8         | 2971
16PAM      | Golay         | 8.5        | 2.11        | 23650               | 16.8         | 2971
16PAM      | RS            | 13         | 3.22        | 15498               | 25.8         | 1935
16QAM      | Convolutional | 3.4        | 0.84        | 59406               | 6.74         | 7405
16QAM      | Golay         | 3          | 0.74        | 67433               | 5.94         | 8402
16QAM      | RS            | 10         | 2.48        | 20122               | 19.8         | 2521
16FSK      | Convolutional | 2.9        | 0.72        | 69307               | 5.74         | 8694
16FSK      | Golay         | 2.5        | 0.62        | 80485               | 4.95         | 10082
16FSK      | RS            | 8          | 1.98        | 25203               | 15.8         | 3159

TABLE II
LIFETIME (DAYS) FOR DIFFERENT MODULATIONS AND CODES UNDER AWGN CHANNEL AT d = 40 m AND d = 60 m

Modulation | Code          | Eb/No (dB) | eTx (kJ), 40m | Lifetime (days), 40m | eTx (kJ), 60m | Lifetime (days), 60m
16PSK      | Convolutional | 4.5        | 0.57          | 89                   | 1.93          | 27
16PSK      | Golay         | 4.5        | 0.57          | 89                   | 1.93          | 27
16PSK      | RS            | 11         | 1.39          | 37                   | 4.71          | 12
16PAM      | Convolutional | 8.5        | 1.08          | 47                   | 3.64          | 15
16PAM      | Golay         | 8.5        | 1.08          | 47                   | 3.64          | 15
16PAM      | RS            | 13         | 1.65          | 31                   | 5.56          | 10
16QAM      | Convolutional | 3.4        | 0.43          | 117                  | 1.46          | 35
16QAM      | Golay         | 3          | 0.38          | 132                  | 1.28          | 40
16QAM      | RS            | 10         | 1.27          | 40                   | 4.28          | 13
16FSK      | Convolutional | 2.9        | 0.37          | 136                  | 1.24          | 41
16FSK      | Golay         | 2.5        | 0.32          | 157                  | 1.07          | 48
16FSK      | RS            | 8          | 1.01          | 51                   | 3.42          | 16

TABLE III
LIFETIME (DAYS) FOR DIFFERENT MODULATIONS AND CODES UNDER RAYLEIGH CHANNEL AT DISTANCE d = 5 m AND d = 10 m

Modulation | Code          | Eb/No (dB) | eTx (J), 5m | Lifetime (days), 5m | eTx (J), 10m | Lifetime (days), 10m
16PSK      | Convolutional | 7          | 1.73        | 28845               | 13.87        | 3599
16PSK      | Golay         | 7          | 1.73        | 28845               | 13.87        | 3599
16PSK      | RS            | 13         | 3.22        | 15498               | 25.8         | 1935
16PAM      | Convolutional | 12         | 2.98        | 16746               | 23.8         | 2098
16PAM      | Golay         | 12         | 2.98        | 16746               | 23.8         | 2098
16PAM      | RS            | 13.5       | 3.34        | 14941               | 26.7         | 1870
16QAM      | Convolutional | 5          | 1.24        | 40243               | 9.91         | 5036
16QAM      | Golay         | 5          | 1.24        | 40243               | 9.91         | 5036
16QAM      | RS            | 12         | 2.98        | 16746               | 23.8         | 2098
16FSK      | Convolutional | 6          | 1.49        | 33491               | 11.9         | 4194
16FSK      | Golay         | 6          | 1.49        | 33491               | 11.9         | 4194
16FSK      | RS            | 12         | 2.98        | 16746               | 23.8         | 2098

TABLE IV
LIFETIME (DAYS) FOR DIFFERENT MODULATIONS AND CODES UNDER RAYLEIGH CHANNEL AT DISTANCE d = 40 m AND d = 60 m

Modulation | Code          | Eb/No (dB) | eTx (kJ), 40m | Lifetime (days), 40m | eTx (kJ), 60m | Lifetime (days), 60m
16PSK      | Convolutional | 7          | 0.89          | 57                   | 2.99          | 18
16PSK      | Golay         | 7          | 0.89          | 57                   | 2.99          | 18
16PSK      | RS            | 13         | 1.65          | 31                   | 5.56          | 10
16PAM      | Convolutional | 12         | 1.52          | 34                   | 5.14          | 11
16PAM      | Golay         | 12         | 1.52          | 34                   | 5.14          | 11
16PAM      | RS            | 13.5       | 1.71          | 30                   | 5.78          | 10
16QAM      | Convolutional | 5          | 0.63          | 80                   | 2.14          | 24
16QAM      | Golay         | 5          | 0.63          | 80                   | 2.14          | 24
16QAM      | RS            | 12         | 1.52          | 34                   | 5.14          | 11
16FSK      | Convolutional | 6          | 0.76          | 67                   | 2.57          | 20
16FSK      | Golay         | 6          | 0.76          | 67                   | 2.57          | 20
16FSK      | RS            | 12         | 1.52          | 34                   | 5.14          | 11


TABLE V
AVERAGE REMAINING ENERGY AND NETWORK LIFETIME FOR HOMOGENEOUS AND HETEROGENEOUS MODULATIONS WITH GOLAY CODE IN AWGN CHANNEL

                                                        | Homogeneous (16-QAM) | Heterogeneous (16-QAM & 16-FSK)
Average remaining energy over the initial energy after 160 days | 25%           | 98.125%
Lifetime                                                | 40 days              | 157 days

TABLE VI
AVERAGE REMAINING ENERGY AND NETWORK LIFETIME FOR HOMOGENEOUS AND HETEROGENEOUS MODULATIONS WITH RS CODE IN AWGN CHANNEL

                                                        | Homogeneous (16-QAM) | Heterogeneous (16-QAM & 16-FSK)
Average remaining energy over the initial energy after 160 days | 8.125%        | 31.875%
Lifetime                                                | 13 days              | 51 days

TABLE VII
AVERAGE REMAINING ENERGY AND NETWORK LIFETIME FOR HOMOGENEOUS AND HETEROGENEOUS MODULATIONS WITH GOLAY CODE IN RAYLEIGH CHANNEL

                                                        | Homogeneous (16-FSK) | Heterogeneous (16-FSK & 16-QAM)
Average remaining energy over the initial energy after 160 days | 12.5%         | 50%
Lifetime                                                | 20 days              | 80 days

TABLE VIII
AVERAGE REMAINING ENERGY AND NETWORK LIFETIME FOR HOMOGENEOUS AND HETEROGENEOUS MODULATIONS WITH RS CODE IN RAYLEIGH CHANNEL

                                                        | Homogeneous (16-FSK) | Heterogeneous (16-FSK & 16-QAM)
Average remaining energy over the initial energy after 160 days | 6.875%        | 21.25%
Lifetime                                                | 11 days              | 34 days

Fig. 5. Lifetime for Different modulation and coding at d=40m for AWGN channel

Fig. 7. Lifetime for Different modulation and coding at d=5m for Rayleigh channel

Fig. 8. Lifetime for Different modulation and coding at d=10m for Rayleigh channel

Fig. 6. Lifetime for Different modulation and coding at d=60m for AWGN channel


Fig. 9. Lifetime for different modulation and coding at d=40m for Rayleigh channel

Fig. 10. Lifetime for different modulation and coding at d=60m for Rayleigh channel

Fig. 11. Lifetime (days) for different modulations with Golay code under AWGN & Rayleigh channel

Fig. 12. Lifetime for different modulations with RS code under AWGN & Rayleigh channel

References

[1] Narendrakumar, A., Thygarajah, K., Cooperative fuzzy based high quality link routing in wireless sensor networks, (2012) International Review on Computers and Software (IRECOS), 7 (6), pp. 2987-2992.
[2] Shivaprakasha, K.S., Kulkarni, M., Energy efficient routing protocols for wireless sensor networks: A survey, (2011) International Review on Computers and Software (IRECOS), 6 (6), pp. 929-943.
[3] Faheem, M., Din, Z.U., Shahid, M.A., Ali, S., Raza, B., Sakar, L., Energy based efficiency evaluation of cluster and tree based routing protocols for wireless sensor networks, (2013) International Review on Computers and Software (IRECOS), 8 (4), pp. 1012-1022.
[4] M. Sheik Dawood, Sajin Salim, S. Sadasivam and G. Athisha, Energy Efficient Modulation Techniques for Fault Tolerant Two-Tiered Wireless Sensor Networks, Journal of Asian Scientific Research, vol. 2, n. 3, pp. 124-131, 2012.
[5] Suchita Varade, Kishore Kulat, BER Comparison of Rayleigh Fading, Rician Fading and AWGN Channel uses Chaotic Communication based MIMO-OFDM System, International Journal of Soft Computing and Engineering (IJSCE), vol. 1, n. 6, January 2012.
[6] Nuzhat Tasneem Awon, Md. Mizanur Rahman, Md. Ashraful Islam and A.Z.M. Touhidul Islam, Effect of AWGN & Fading (Rayleigh & Rician) channels on the BER performance of a WiMAX communication system, International Journal of Computer Science and Information Security (IJCSIS), vol. 10, n. 8, 2012.
[7] Sudhir Babu and K.V. Sambasiva Rao, Evaluation of BER for AWGN, Rayleigh and Rician Fading Channels under various modulation schemes, International Journal of Computer Applications, vol. 26, n. 9, July 2011.
[8] Kun Yang and Lingyang Song, Adaptive Modulation and Coding for Two-Way Amplify-and-Forward Relay Networks, in Proceedings of the IEEE International Conference on Communications, China: CTS, 2012.
[9] M. Sheik Dawood, R. Aiswaryalakshmi, R. Abdul Sikkandhar and G. Athisha, A survey on energy efficient modulation and coding techniques for wireless sensor network, Journal of Global Research in Computer Science, vol. 4, n. 1, pp. 63-66, 2013.
[10] Sanjeev Kumar and Ragini Gupta, BER Analysis of Reed-Solomon Code for Efficient Communication System, International Journal of Computer Applications, vol. 30, n. 12, 2011.
[11] Mohammad Rakibul Islam, Error Correction Codes in Wireless Sensor Network: An Energy Aware Approach, International Journal of Computer and Information Engineering, vol. 4, n. 1, 2010.
[12] M. Sheik Dawood, R. Aiswaryalakshmi, R. Abdul Sikkandhar and G. Athisha, A Review on Energy Efficient Modulation and Coding Techniques for Clustered Wireless Sensor Networks, International Journal of Advanced Research in Computer Engineering & Technology, pp. 319-322, 2013.
[13] Padmavathy and Chitra, M., Performance Evaluation of Energy Efficient Modulation Scheme and Hop Distance Estimation for WSN, International Journal of Communication Networks and Information Security (IJCNIS), vol. 2, n. 1, pp. 44-49, 2010.
[14] Sheik Dawood, M., S. Sadasivam and G. Athisha, Energy Efficient Wireless Sensor Networks based on QoS Enhanced Base Station Controlled Dynamic Clustering Protocol, International Journal of Computer Applications, vol. 13, n. 4, 2011.
[15] Jamshid Abouei, J. David Brown, Konstantinos N. Plataniotis and Subbarayan Pasupathy, On the Energy Efficiency of LT Codes in Proactive Wireless Sensor Networks, IEEE Transactions on Wireless Communications, 2009.
[16] Maryam Soltan, Inkwon Hwang and Massoud Pedram, Heterogeneous Modulation for Trading-off Energy Balancing with Bandwidth Efficiency in Hierarchical Sensor Networks, in Proceedings of the International Symposium on a World of Wireless, Mobile and Multimedia Networks, 2008.
[17] Maryam Soltan, Inkwon Hwang and Massoud Pedram, Modulation-Aware Energy Balancing in Hierarchical Wireless Sensor Networks, in Proceedings of the 3rd International Symposium on Wireless Pervasive Computing, 2008.
[18] Gopinath Balakrishnan, M. Yang, Y. Jiang, and Y. Kim, Performance Analysis of Error Control Codes for Wireless Sensor Networks, in Proceedings of the International Conference on Information Technology (ITNG'07), pp. 876-879, 2007.
[19] S. Mukesh, M. Iqbal, Z. Jianhua, Z. Ping, and Inam-Ur-Rehman, Comparative Analysis of M-ary Modulation Techniques for Wireless Ad-hoc Networks, in Proceedings of the IEEE 2007 Sensors Applications Symposium, pp. 1-6, 2007.
[20] Yunxia Chen and Qing Zhao, On the Lifetime of Wireless Sensor Networks, IEEE Communications Letters, vol. 9, n. 11, 2005.
[21] James Yang, Amir K. Khandani and Noel Tin, Adaptive Modulation and Coding in 3G Wireless Systems, Canada, User Report UW-E&CE#2002-15, 2002.
[22] Proakis, J. G., Digital Communications (McGraw-Hill Inc., 2007).
[23] Rappaport, T. S., Wireless Communications: Principles and Practice (Prentice Hall, 1996).
[24] Bhardwaj, M. and Chandrakasan, A. P., Bounding the Lifetime of Sensor Networks via Optimal Role Assignments, in Proceedings of the 21st IEEE INFOCOM, 2002.
[25] Vivek Mhatre, Catherine Rosenberg, Design Guidelines for Sensor Networks: Communication, Clustering and Aggregation, International Journal of Ad Hoc Networks, vol. 2, pp. 45-63, 2004.
[26] Daniel F. Macedo, Luiz H. A. Correia, Aldri L. Dos Santos, Antonio A. F. Loureiro, José Marcos S. Nogueira and Guy Pujolle, Evaluating Fault Tolerance Aspects in Routing Protocols for Wireless Sensor Networks, in Fourth Annual Mediterranean Ad Hoc Networking Workshop, 2005.
[27] Liu, M., Cao, J., Chen, G., Wang, X., An Energy Aware Routing Protocol in Wireless Sensor Networks, Sensors Journal, vol. 9, pp. 445-462, 2009.
[28] Hussain, S., Martin, A. W., Hierarchical Cluster Based Routing in Wireless Sensor Networks, Journal of Networks, vol. 2, pp. 87-97, 2007.
[29] Abraham O. Fapojuwo and Alejandra Cano-Tinoco, Energy Consumption and Message Delay Analysis of QoS Enhanced Base Station Controlled Dynamic Clustering Protocol for Wireless Sensor Networks, IEEE Transactions on Wireless Communications, vol. 8, n. 10, 2009.
[30] Said Ben Alla, Abdellah Ezzati, A QoS-Guaranteed Coverage and Connectivity Preservation Routing Protocol for Heterogeneous Wireless Sensor Network, (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (6), pp. 363-371.
[31] Reza Mohammadi, Reza Javidan, Adaptive Quiet Time Underwater Wireless MAC: AQT-UWMA, (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (4), pp. 236-243.
[32] Ghandi Manasra, Osama Najajri, Samer Rabah, Hashem Abu Arram, DWT Based on OFDM Multicarrier Modulation Using Multiple Input Output Antennas System, (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (5), pp. 312-320.


Authors' information

1 Department of ECE, Sethu Institute of Technology.
2 Department of ECE, PSNA College of Engineering and Technology.

M. Sheik Dawood received the Master degree in Digital Communication and Network from Madurai Kamaraj University in 2002. He is a research scholar of Anna University. Currently, he is an Associate Professor at Sethu Institute of Technology. His interests are in wireless sensor network research. E-mail: [email protected]

Dr. G. Athisha received the Master Degree in Applied Electronics from Anna University in 1998 and the Ph.D. degree from Anna University. Currently, she is a Professor and Head of the ECE Department at PSNA College of Engineering & Technology. Her research interests are sensor networks, network security and network processors. E-mail: [email protected]


International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

Energy Aware Zone Routing Protocol Using Power Save Technique AFECA Ravi G., K. R. Kashwan Abstract – Mobile Ad-hoc Networks are characterized by random node movement and the absence of both a clock synchronization mechanism and infrastructure. Due to these unique characteristics, routing protocol design for MANET is a major challenge. Compact batteries are the power source of these mobile stations; hence the objective is to produce an energy efficient routing protocol without compromising the quality of service. In this paper a novel routing protocol is proposed that integrates the Adaptive Fidelity Energy Conserving Algorithm with the Zone Routing Protocol. The per-node energy consumption and packet delivery ratio of both protocols are compared using the Ns-2 simulation tool. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: MANET, ZRP, Energy Consumption, AFECA, RAS

I. Introduction

A MANET is an independent network of mobile nodes that does not depend on infrastructure for routing [1]. These networks are gaining popularity due to their wide range of applications, be it rescue missions during natural calamities, battlefields where establishing infrastructure is impossible, or civilian use during conferences and meetings for sharing important information. With the tremendous increase in the use of laptops, smart phones and other mobile devices, MANET is likely to be a successful means of network establishment in the near future. A MANET is characterized by random movement and the absence of centralized control units; hence each node has to cooperate with its neighbors to perform the functions needed for the very existence of the network [1]-[20].

I.1. Routing Protocols in MANET

Due to these unique features, the process of routing data packets to their destination is complex. Thus the routing protocol plays a major part in maintaining the quality of service, based on end-to-end delay and throughput. The nodes should have information about the dynamic changes of the network; to update this information and to establish a route to the sink node, the routing protocol transmits hello messages. These add to the energy consumption, so the routing protocol clearly affects the energy consumption of the network as well. The routing protocols [2] available for MANET are classified into three types based on the way they maintain routes to other nodes:

Manuscript received and revised September 2013, accepted October 2013


1. Proactive routing or table driven routing: this type always maintains path information to all nodes. Examples of such protocols are DSDV and WRP.
2. Reactive routing or on demand routing: path information is searched only when needed. Examples are AODV and DSR.
3. Hybrid routing: a combination of proactive and reactive protocols. Examples are ZRP and HARP.
One of the key challenges of MANET is the efficient use of energy resources in battery operated mobile nodes, so the routing protocols play an important role in energy consumption.

I.2. Related Work

Energy efficiency can be achieved either by making the idle nodes sleep or by controlling the transmit power dynamically. The former approach is called a power save technique and the latter a power control technique. The reactive routing protocols have been combined with power save techniques like SPAN, BECA and AFECA [3], [4] to obtain minimized per-node consumption. There are also many energy aware algorithms [5]-[11]; these are not routing protocols themselves, but are combined with existing routing protocols to obtain the desired result. Maximum lifetime routing [12] aims to increase the overall lifetime of nodes in the network by balancing the energy consumption across the nodes; the best route is decided based on the total energy consumed in transmitting the packets. The combination of AFECA/SPAN [13] with AODV resulted in low per-node consumption. In AFECA, more retransmissions occur when the destination node is in the sleep state, which consumes more bandwidth and causes collisions. In addition, SPAN elects a

Ravi G., K. R. Kashwan

coordinator from the network from time to time and then rotates it, so packet drops may occur during coordinator changes. To avoid this, the source node repeats the transmission to the sleeping node until an acknowledgement is received. This repeated transmission of data leads to more energy consumption, and in the case of a route request or route reply, the route establishment time is increased.


II. Proposed Routing

In this paper a hybrid routing protocol is applied instead of AODV. The choice is the Zone Routing Protocol (ZRP), which utilizes proactive routing within a zone and reactive routing between zones. The drawback of AFECA is overcome by utilizing simple circuitry based on RF technology that can activate a sleeping node.


II.1. Zone Routing Protocol

ZRP is a hybrid routing protocol [14], [15] wherein the whole process of routing is carried out by two protocols: 1. the Intrazone Routing Protocol (IARP), which uses proactive routing for data destined to nodes within the zone, and 2. the Interzone Routing Protocol (IERP), which uses reactive routing for forwarding data to destinations outside the zone. The zone is specified by a radius r, expressed in hop count, and every node has its own zone. For a radius of 2, the nodes reachable within a maximum of two hops constitute the zone of that particular node, and zones may overlap. The nodes are classified into interior and peripheral nodes with respect to the zone: nodes at less than r hops are interior nodes, and nodes reached in a minimum of r hops are peripheral nodes. Fig. 1 shows an example of the routing zone of node S with a radius of 2; all nodes reachable within a maximum of two hops are within the zone, marked by the dotted border. Nodes A to J belong to the routing zone of S, while nodes K and L fall outside it. Nodes F to J are peripheral nodes because their hop count equals 2, the zone radius, giving five peripheral nodes in total; the other nodes are outside the zone of the reference node. Bordercasting prevents the unwanted flooding of messages to all neighbor nodes by casting only to the peripheral nodes. The Bordercast Resolution Protocol constructs a tree to decide the route for a query request, and the query control mechanism directs queries to areas of the network not previously covered. The Neighbor Discovery Protocol (NDP), a part of IARP, is used to detect neighbor nodes: it transmits 'HELLO' messages at regular intervals and each node updates its neighbor table on receiving a message; neighbors from which no message is received for a specified time interval are deleted from the table. When a node is ready to transmit a packet, it first checks whether the destination is within its zone.
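The interior/peripheral classification described above can be sketched as a bounded breadth-first search (the function and the toy adjacency list below are our own illustration, not code from the paper):

```python
from collections import deque

def classify_zone(adj, reference, radius=2):
    """Breadth-first hop counts from the reference node: nodes at fewer than
    `radius` hops are interior, exactly `radius` hops are peripheral, and
    anything farther is outside the zone (and never expanded)."""
    hops = {reference: 0}
    queue = deque([reference])
    while queue:
        node = queue.popleft()
        if hops[node] == radius:
            continue  # peripheral nodes are not expanded past the boundary
        for neighbor in adj.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    interior = {n for n, h in hops.items() if 0 < h < radius}
    peripheral = {n for n, h in hops.items() if h == radius}
    return interior, peripheral

# Toy topology loosely echoing Fig. 1: K lies beyond two hops from S.
adj = {"S": ["A", "B"], "A": ["S", "F"], "B": ["S", "G"],
       "F": ["A", "K"], "G": ["B"], "K": ["F"]}
interior, peripheral = classify_zone(adj, "S", radius=2)
# interior == {"A", "B"}, peripheral == {"F", "G"}; K stays outside the zone
```

Because the search never expands past the boundary, a node only ever learns routes inside its own zone proactively; everything else is left to IERP's bordercast.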


Fig. 1. Reference node with its Zone

If it is in the zone, the route information is readily provided by IARP. If the destination is outside the zone, IERP starts the route discovery process, and the route request is flooded using bordercasting and the query control mechanism. If a node has a route to the destination, the source receives a route reply message from that node. The Adaptive Fidelity Energy Conserving Algorithm [13] is an improvement over the existing BECA: in BECA the sleep time is constant, whereas in AFECA it varies with the node density. AFECA has three states, 1. active, 2. listen and 3. sleep, and each node switches between them depending upon the successful forwarding of messages. If a node is idle for a particular interval, it goes to the sleep state. The node uses the time intervals Ta, Tl and Ts, which correspond to the active, listen and sleep states. The sleep time of AFECA is dynamically adapted based on the number of neighbors, and is given by: Tsa = Ts × Random(1, N), where Tsa is the adaptive sleep time and N is the number of neighbors; a larger sleep time consumes less energy. Based on RF technology [16], [17], we incorporate a remote activated switch (RAS) to activate a node that is in the sleeping state: instead of the node waking up periodically to check for pending traffic, it is woken by the RAS when necessary. Fig. 2 shows the block diagram of the remote activated switch. When the node becomes idle it enters the sleep state, so that the transceiver and electronics are turned off. The received signals are passed to a logic circuit, which recovers the wake-up sequence demodulated by the RAS and compares it with the node's own sequence; if they match, it turns on the transceiver of the corresponding node. The


power source for the RAS may be supplied by the battery residing in the electronic unit. The operation can be explained by comparing Fig. 4(a) and Fig. 4(b). Let the scenario be that node N1 has data to forward to node N2, which in turn has to forward it to node N3. All nodes transmit hello messages at a periodic interval equal to the beacon interval, and the hello messages are discarded after r hops. Fig. 4(a) represents the normal timeline of the network without the RF tag. To incorporate AFECA into the standard procedure, the duration of the Active Window (AW) is set equal to the Tl value of AFECA and the beacon interval is made equal to the sum of Tl and Ts, where Ta is independent. Separate timers are used to monitor the routines of the nodes; the beacon is monitored by the TSF timer, which is set at the initial stage of network establishment to synchronize the nodes in the network. Fig. 4(b) shows the novel protocol designed with the primary aim of reducing energy consumption. Node N1 sends an ATIM (Announcement Traffic Indication Message) to node N2 and receives an ACK in the ATIM window. Nodes N1 and N2 stay active as they need to take part in the data transfer, while node N3 goes to sleep mode. Once the entire data is received without error, node N2 sends an ACK. This entire cycle ends somewhere in the middle of the power saving state, as shown in Fig. 4(a). Now node N2 wants to transmit data to node N3, but it has to wait until the next TBTT because node N3 is currently sleeping. This condition leads to latency and a decrease in throughput, as node N2 retransmits the RTS; since AFECA is used, the probability of nodes sleeping, and hence of RTS retransmissions, is high. With the remote activated switch, node N3 sends an ACK to node N2 indicating its return to the active state, and node N2 starts to send the data to node N3. Thus, by using the RF switch the latency is avoided and the throughput is increased.
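The AFECA state handling and the adaptive sleep time Tsa = Ts × Random(1, N) described above can be sketched as follows (the class and method names are our own illustration, not the authors' implementation):

```python
import random

class AfecaNode:
    """Sketch of AFECA state handling with the adaptive sleep time
    Tsa = Ts * Random(1, N); Ts is the base sleep interval in seconds."""

    def __init__(self, ts=5.0):
        self.ts = ts
        self.state = "listen"

    def adaptive_sleep_time(self, num_neighbors):
        # More neighbors -> potentially longer sleep, since other nodes
        # can take over forwarding while this one saves energy.
        return self.ts * random.randint(1, max(1, num_neighbors))

    def on_idle_timeout(self, num_neighbors):
        """Idle past the listen interval: enter sleep for Tsa seconds."""
        self.state = "sleep"
        return self.adaptive_sleep_time(num_neighbors)

    def on_wakeup_signal(self):
        """RAS: the wake-up circuitry activates a sleeping node on demand."""
        self.state = "active"
```

The key difference from plain AFECA is the last method: rather than waking on a timer to poll for pending traffic, the node sleeps until the RAS sequence match turns its transceiver back on.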

The time saved, which is considerable, is clearly shown in Fig. 2. Fig. 3 gives a pictorial representation of this novel protocol, designed with the primary aim of reducing energy consumption. However, to validate the actual performance of this novel protocol, the energy consumption should be reduced without a drastic reduction in throughput or an increase in overhead; otherwise it is futile.

III. Algorithm

The routing protocol can be represented by the steps shown in Table I. In this algorithm, step numbers are used to indicate the looping of the functions. Step 5.1 is used for waking up the sleeping node by means of the RF circuit.

Fig. 2. Block diagram of remote activated switch (receiver, logic circuit, power status electronics, battery, transceiver)

Fig. 3. Detailed OSI layers: ZRP (routing layer), AFECA with wake-up signal (logical link layer), IEEE 802.11 with periodic wake-up (MAC/PHY layer)

Fig. 4(a). Without Remote activated switch (timeline in which no action is possible while N3 sleeps, leading to latency and decreased throughput)


Fig. 4(b). With Remote activated switch (timeline in which the wake-up signal activates N3 immediately)

TABLE I
ALGORITHM

1. NDP determines the neighbors and zone, and updates the routing information.
2. If traffic is available for a particular node:
   1) The node is in active mode.
   2) If the node is the destination:
      a) Accept and send ACK.
   3) Else:
      a) If the destination is inside the zone, use IARP to deliver the packet.
      b) Else use IERP and bordercasting to find the route to the destination and then deliver the packet.
   4) After Ta sec, go to Step 2.
3. Else, change the node state to listen mode after Tl sec.
4. If traffic is available for the node:
   1) Return to step 2.1.
5. Else change the node state to sleep:
   1) If traffic is available for a sleeping node, use the wake-up signal to activate it remotely [step 2.1].
   2) Else, return to the listen state [step 4] after Ta s.
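The dispatch decisions of the algorithm above can be sketched as a single function (a hypothetical helper for illustration; the real logic lives inside the Ns-2 routing agent):

```python
def next_action(node_id, dest, zone_members, sleeping_nodes):
    """Decide how a packet for `dest` is handled at `node_id`, following
    the steps of Table I. Returns the mechanism to invoke."""
    if dest == node_id:
        return "accept_and_ack"           # step 2.2: we are the destination
    if dest in sleeping_nodes:
        return "send_wakeup_signal"       # step 5.1: RAS remote activation
    if dest in zone_members:
        return "deliver_via_iarp"         # step 2.3a: proactive, intra-zone
    return "bordercast_via_ierp"          # step 2.3b: reactive, inter-zone
```

A usage example: `next_action("n1", "n9", {"n2"}, set())` returns `"bordercast_via_ierp"`, since n9 is neither this node, nor asleep, nor inside the zone.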

III.1. Simulation Setup

The simulations are done with the Ns-2 software, and the ZRP and EAZRP protocols are compared in terms of average consumed energy, PDR, throughput and normalized overhead. The simulation area is 1000×1000 m2, the transmission range covers up to 250 m, and the maximum number of hops to reach any node is 12. The measurements are taken by varying the number of nodes from 35 to 65 in steps of 10, with a zone radius of 2. Nodes move according to the random waypoint model at a speed of 5 m/s. The traffic is CBR with a packet size of 2000 kbytes and a packet interval of 0.1 s. Each simulation is run for 200 s and the results are averaged over 20 simulations. The listen, sleep and initial active times are 1 s, 5 s and 30 s respectively.

IV. Results

The main parameters to be examined for evaluating a routing protocol are energy efficiency and PDR. These two parameters have opposite effects: an increase in energy efficiency pulls down the PDR value and vice versa. In addition, throughput and overhead are also compared. Figs. 5, 6, 7 and 8 compare the average consumed energy, PDR, throughput and normalized overhead respectively.

IV.1. Energy Efficiency

From Fig. 5, we infer that the average consumed energy is lower for EAZRP than for ZRP alone. The desired reduction in energy is obtained through the combined effect of AFECA, and it is evident that as the zone radius is increased the energy efficiency decreases. This is due to the fact that the proactive part of the routing does not aid AFECA: as the radius increases, the proactive protocol dominates and consequently the variation in energy consumption decreases. From Fig. 9, it is evident that unmodified ZRP consumes 8% of the energy in both low and high density environments, whereas the modified EAZRP consumes only 5%, a difference of 3%. This shows that the introduction of AFECA into ZRP saves more energy than ZRP alone. The energy consumed by 35 to 65 nodes for both protocols is tabulated in Table II. The total average energy consumed in this simulation is 68.75% for ZRP and 34.78% for EAZRP.

IV.2. Packet Delivery Ratio (PDR)

It is evident from Fig. 6 that ZRP maintains delivery ratios of 96% and 73%, whereas EAZRP has delivery ratios of 92% for low node density and 70% for high density simulations respectively. The delivery ratio may drop for the following reasons: 1) when a packet is available for a sleeping node, more retransmissions are required to deliver it, and the increased number of retransmissions decreases the PDR; 2) collisions of the data packets. Moreover, the use of the proactive protocol increases the overhead, since topological changes must be updated periodically; these overheads in turn increase the possibility of data packet collisions.

International Review on Computers and Software, Vol. 8, N.10


Ravi G., K. R. Kashwan

TABLE II
ENERGY CONSUMED (IN JOULES)
No. of Nodes in network    ZRP    EAZRP
35                         3.9    3.45
45                         4.5    4.3
55                         5.5    5.2
65                         7.1    6.1
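The relative savings implied by Table II can be checked with a few lines of arithmetic. The figures below are derived directly from the tabulated values; the dictionary layout is just illustrative:

```python
# Per-density energy savings of EAZRP over ZRP, from Table II (Joules).
zrp   = {35: 3.9,  45: 4.5, 55: 5.5, 65: 7.1}
eazrp = {35: 3.45, 45: 4.3, 55: 5.2, 65: 6.1}

for n in sorted(zrp):
    saving = (zrp[n] - eazrp[n]) / zrp[n] * 100
    print(f"{n} nodes: EAZRP consumes {saving:.1f}% less energy than ZRP")
```

Per configuration, EAZRP saves roughly 4% to 14%, with the largest gain at the highest node density.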

Fig. 5. Comparison of Average consumed energy

Fig. 6. Comparison of PDR

Fig. 7. Comparison of Throughput

Fig. 8. Comparison of normalized overhead

All these reasons cause the throughput to decrease. Fig. 6 shows the PDR for the range of 35 to 65 nodes. The negative influence of sleeping nodes on the PDR is overcome by the use of the Remote Activated Switch (RAS), which eliminates unnecessary retransmissions to a node in the sleep state: it first wakes the node using a wake-up signal and then transmits the data. By reducing the number of retransmissions, RAS increases the PDR and thus plays a vital role in maintaining the delivery ratio of the network.

IV.3. Throughput

Fig. 7 shows the comparison of throughput for various numbers of nodes. The throughput of EAZRP is larger than that of ZRP, a result attributed to the effect of the remote activated switch. Fig. 8 shows the control overhead of both protocols.

IV.4. Performance Measure


The two most vital parameters of any network are energy efficiency and packet delivery ratio. These two parameters represent two completely different aspects of network performance: a protocol that is more energy efficient may have a lower PDR, and vice versa. Hence, for a factual comparison of the two protocols, a performance measure has to be calculated, which is a percentage of the product of the remaining energy and the PDR. The remaining energy is the difference between the initial node energy of 100 joules and the consumed energy. This calculated value gives a methodology to directly compare and evaluate the protocols. From Fig. 9, it is shown that the performance measure for EAZRP is better than for ZRP. Another observation is that for a small number of nodes the difference between the values of the two protocols is low; as the number of nodes increases, the difference also increases. The reason can be attributed to the same trend shown by the energy efficiency: though the variation in PDR was more or less the same for various node counts, the variation in energy efficiency increased. Thus the performance measure, which combines both parameters, follows the same pattern.
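A minimal sketch of the performance measure described above, assuming the remaining energy is normalized by the 100 J initial energy (the text only says the measure is a percentage of the product of remaining energy and PDR, so the exact normalization is an assumption):

```python
def performance_measure(consumed_energy_j, pdr, initial_energy_j=100.0):
    """Percentage product of normalized remaining energy and PDR.

    Normalizing by the initial energy is an assumption; the paper
    states only that the measure is a percentage of the product of
    the remaining energy and the PDR."""
    remaining = initial_energy_j - consumed_energy_j
    return (remaining / initial_energy_j) * pdr * 100.0

# Example with Table II / Fig. 6 values: ZRP at 35 nodes, PDR 96%
print(performance_measure(3.9, 0.96))  # ~92.3
```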


Fig. 9. Performance measure

V. Conclusion

We proposed an Energy Aware Zone Routing Protocol (EAZRP) by combining the zone routing protocol with AFECA. We analyzed the proposed routing algorithm against various parameters and evaluated its performance for different numbers of nodes. It can be concluded that EAZRP is a promising solution for increasing the energy efficiency and lifetime of the network.

References
[1] Stefano Basagni, Marco Conti, Silvia Giordano, Ivan Stojmenovic, Mobile Ad Hoc Networking (John Wiley & Sons, 2004).
[2] Elizabeth M. Royer, Chai-Keong Toh, A Review of Current Routing Protocols for Ad Hoc Mobile Wireless Networks, IEEE Personal Communications, pp. 46-55, April 1999.
[3] Benjie Chen, Kyle Jamieson, Hari Balakrishnan, Robert Morris, Span: An Energy-Efficient Coordination Algorithm for Topology Maintenance in Ad Hoc Wireless Networks, ACM Wireless Networks Journal, 8(5), pp. 481-494, September 2002.
[4] Ya Xu, John Heidemann, Deborah Estrin, Adaptive Energy-Conserving Routing for Multi-Hop Ad Hoc Networks, Technical Report 527, USC/Information Sciences Institute, October 2000.
[5] Ya Xu, John Heidemann, Deborah Estrin, Geography-Informed Energy Conservation for Ad Hoc Routing, in Proceedings of the 7th Annual International Conference on Mobile Computing and Networking, pp. 70-84, July 2001.
[6] S. D. Muruganathan, D. C. F. Ma, R. I. Bhasin, A. O. Fapojuwo, A Centralized Energy-Efficient Routing Protocol for Wireless Sensor Networks, IEEE Communications Magazine, Vol. 43, pp. S8-S13, March 2005.
[7] X. Hou, D. Tipper, Gossip-Based Sleep Protocol for Energy Efficient Routing in Wireless Ad Hoc Networks, Proceedings of the IEEE Wireless Communications and Networking Conference, pp. 1305-1310, March 2004.
[8] Qing Zhao, Lang Tong, David Counsil, Energy-Aware Adaptive Routing for Large-Scale Ad Hoc Networks: Protocol and Performance Analysis, IEEE Transactions on Mobile Computing, Vol. 6, No. 9, pp. 1048-1059, September 2007.
[9] Shih-Lin Wu, Pao-Chu Tseng, Jheng-Yu Yang, An Efficient Power Saving MAC Protocol for IEEE 802.11 Ad Hoc Wireless Networks, Journal of Information Science and Engineering, Vol. 23, 2007.
[10] Neeraj Tantubay, Dinesh Ratan Gautam, Mukesh Kumar Dhariwal, A Review of Power Conservation in Wireless Mobile Adhoc Network (MANET), IJCSI International Journal of Computer Science Issues, Vol. 8, No. 1, pp. 378-383, July 2011.
[11] Ren-Hung Hwang, Chiung-Ying Wang, Chi-Jen Wu, Guan-Nan Chen, A Novel Efficient Power-Saving MAC Protocol for Multi-Hop MANETs, International Journal of Communication Systems, 2011.
[12] Arvind Sankar, Zhen Liu, Maximum Lifetime Routing in Wireless Ad-Hoc Networks, IEEE INFOCOM, Vol. 2, pp. 1089-1097, 2004.
[13] M. Kristensen, N. Bouvin, Energy Efficient MANET Routing Using a Combination of Span and BECA/AFECA, Journal of Networks, Vol. 3, No. 3, March 2008.
[14] L. Wang, S. Olariu, A Two-Zone Hybrid Routing Protocol for Mobile Ad Hoc Networks, IEEE Transactions on Parallel and Distributed Systems, Vol. 15, No. 12, pp. 1105-1116, 2004.
[15] N. Beijar, Zone Routing Protocol (ZRP). [Online]. Available: http://citeseer.ist.psu.edu/538611.html, May 2002.
[16] Carla F. Chiasserini, Ramesh R. Rao, Combining Paging with Dynamic Power Management, in Proceedings of IEEE INFOCOM, pp. 996-1004, 2001.
[17] C. F. Chiasserini, R. R. Rao, A Distributed Power Management Policy for Wireless Ad Hoc Networks, IEEE Wireless Communications and Networking Conference, Vol. 3, pp. 1209-1213, 2000.
[18] S. Rajeswari, Y. Venkataramani, Adaptive Energy Conserve Routing Protocol for Mobile Ad Hoc Networks, WSEAS Transactions on Communications, Vol. 11, pp. 463-475, December 2012.
[19] Eduardo da Silva, Renan Fischer e Silva, Luiz Carlos Pessoa Albini, Security Through Virtualization on Mobile Ad Hoc Networks, (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (4), pp. 270-275.
[20] Bechir Ben Gouissem, Sofiene Dridi, Salem Hasnaoui, Wireless Flexible TDMA for Industrial Networks, (2011) International Journal on Communications Antenna and Propagation (IRECAP), 1 (2), pp. 189-195.


Authors' information

Department of ECE-PG, Sona College of Technology, Salem, Tamil Nadu, India - 636005.

G. Ravi obtained his B.E. in Electronics and Communication Engineering (Bharathiyar University, Coimbatore, 2002) and his M.E. in Communication Systems (Sona College of Technology, Salem, 2007). Presently he is an Assistant Professor in the Department of Electronics and Communication Engineering (Post Graduate Studies), Sona College of Technology, TPT Road, Salem - 636005, Tamil Nadu, India. He has been a member of IEEE since 2009. He is currently doing his research work on ad hoc networks, including energy efficient routing protocols, power management, and reliability.
E-mail: [email protected]

Kishana R. Kashwan received the degrees of M.Tech. in Electronics Design and Technology and Ph.D. in Electronics and Communication Engineering from Tezpur University (a central university of India), Tezpur, India, in 2002 and 2007 respectively. Presently he is a Professor and Dean of Post Graduate Studies in the Department of Electronics and Communication Engineering (Post Graduate Studies), Sona College of Technology (an autonomous institution affiliated to Anna University), TPT Road, Salem - 636005, Tamil Nadu, India. He has published extensively at international and national level and has travelled to many countries. His research areas are VLSI Design, Communication Systems, Circuits and Systems, and SoC/PSoC. He is also director of the Centre of Excellence in VLSI Design and Embedded SoC at Sona College of Technology. He is a member of the Academic Council, Research Committee, and Board of Studies of Electronics and Communication Engineering at Sona College of Technology. He has successfully guided many scholars for their master's and doctoral theses and has completed many funded research projects; currently, he is working on a few projects funded by the Government of India. Dr. Kashwan is a member of IEEE and IASTED, a Senior Member of IACSIT, a life member of ISTE, and a Member of IE (India).


International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

Design of Vertical Handoff Initiation and Decision Algorithm in Heterogeneous Wireless Networks S. Aghalya1, A. Sivasubramanian2 Abstract – Vertical handoff decision (VHD) algorithms are essential components of the architecture of the forthcoming Fourth Generation (4G) heterogeneous wireless networks. These algorithms need to be designed to provide the required Quality of Service (QoS) to a wide range of applications while allowing seamless roaming among a multitude of access network technologies. This work mainly focuses on vertical handoff initiation and decision. The handoff initiation is critically important to keep the unnecessary handoffs and their failures at a low level. On the other hand, to maximize the end-user’s satisfaction levels, the decision to select the best network among other available candidates also plays an important role. The proposed algorithm provides a complete framework to perform vertical handoffs in a heterogeneous wireless network. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Heterogeneous Wireless Networks, Vertical Handoff, Quality of Service, Handoff Initiation, Handoff Decision, Prediction

Nomenclature

S(x)   Score function
N(x)   Normalized function of the parameter x
W_x    Weight assigned to the parameter x
D      Delay
J      Jitter
P      Packet loss ratio
T      Throughput

I. Introduction

So far, significant research has been done to achieve seamless mobility while a Mobile Station (MS) moves across different heterogeneous wireless networks. This work mainly focuses on vertical handoff initiation and decision. Horizontal handoff decisions between homogeneous wireless networks are made mainly on the basis of Received Signal Strength (RSS). Vertical handoff decisions are based on more than one network's parameters, including, but not limited to, RSS, MS-Velocity, security, cost, and QoS parameters. These decisions often incorporate network-operators' policies and end-users' preferences as well [1]-[22]. Many of the existing handoff algorithms do not exploit the benefits of multiple criteria and the inherent knowledge about the sensitivities of these handoff parameters in a heterogeneous wireless environment. In addition, while performing vertical handoffs, these algorithms do not take into account the QoS of an ongoing session to maximize the end user's satisfaction based on their preferences.

Manuscript received and revised September 2013, accepted October 2013


In nearly all the existing multi-criteria handoff schemes, assigning different weights helps prioritize network parameters. Most of the time, the assignment of these weights is done manually, without considering how much weight a certain network parameter actually needs. This could lead to degraded handoff performance if one parameter is given a higher weight value than another, especially during an ongoing user-session, such as a Voice over IP (VoIP) conversation, where achieving a minimum level of quality of service is essential. Thus, calculating the correct weights for network parameters is an important task when operating in a heterogeneous wireless environment. Furthermore, nearly all handoff schemes utilize crisp values for these weights, ignoring the fact that typical values of parameters in a wireless network are not precise and are characterized by inherent uncertainty. Therefore, in order to guarantee the quality of the currently utilized service, proper weight assignment, especially for QoS related parameters, is of utmost importance and should be done very carefully. The handoff can be divided into handoff initiation and target network selection. Most of the research work deals with the target network selection, ignoring handoff initiation and necessity estimation. To support seamless mobility while an MS roams in a heterogeneous wireless network, vertical handoff necessity estimation and the decision to select the best target network are two important aspects of the overall mobility framework. Handoff initiation and necessity estimation are critically important to keep unnecessary handoffs and their failures at a low level. On the other hand, to maximize the end-user's


satisfaction levels, the decision to select the best network among other available candidates also plays an important role. The proposed algorithm provides a complete framework to perform vertical handoffs in a heterogeneous wireless network.

II. Related Work

In this section, a candid discussion is provided to highlight some of the closely related schemes. The scheme in [2] utilizes fuzzy logic with the Multiple Attribute Decision Making (MADM) approach to select the best network. The associated weights of the network parameters are obtained using the Analytical Hierarchy Process (AHP). This scheme does not take advantage of all network parameters and does not provide any solution to the handoff necessity estimation problem; it is evaluated through numerical examples only, without any simulation. The authors in [3] proposed a handoff initiation scheme that combines multiple parameters of all available networks in a cost function. AHP is then used to rank all the available networks, including the current Point of Attachment (PoA), Base Station (BS), or Access Point (AP). The target network selection is done with velocity and available bandwidth as the only two inputs, which might not produce optimal results. The authors in [4] proposed a modular handoff decision scheme that mainly concentrates on QoS related parameters. Since RSS, MS-Velocity, and other important parameters are ignored, the scheme might not provide optimal selection decisions; in addition, no handoff necessity estimation scheme is provided in this work. The research work in [5] uses parallel Fuzzy Logic Controllers (FLCs) to normalize a subset of important network parameters and rank the network alternatives. No attention is given to the handoff initiation process or the QoS parameters. A QoS-aware fuzzy-logic based multi-criteria algorithm is proposed in [6]. AHP is utilized to calculate the priority weights of the network attributes. Only QoS related parameters are considered to create four FLCs, one for each of the four traffic types. The absence of RSS, MS-Velocity, and other important parameters may result in non-optimal handoff decisions.
A vertical handoff decision algorithm based on fuzzy logic, in conjunction with Grey Prediction Theory, is presented in [7]. Since the scheme uses only the predicted values of RSS to estimate the necessity of handoffs, unnecessary handoffs are likely to occur, and no consideration is given to any other important network parameters, including QoS. A fuzzy based MADM scheme is provided in [8] to efficiently deal with the uncertainty that is inherent in wireless networks. Parallel FLCs are utilized with two different ranking algorithms, SAW and TOPSIS. However, the scheme pays no attention to the weight elicitation process and arbitrary weights are assigned to


conduct simulations for VoIP and Web-based traffic classes. Although the scheme considers QoS related parameters, RSS and other important network parameters are ignored while ranking the candidate networks. The authors in [9] utilize AHP for both the weight elicitation and network selection processes; RSS is the only criterion used to trigger the handoff. In [10], an algorithm is proposed in which the predicted RSS plus a hysteresis margin (PRSS+hys) is the only criterion for deciding whether to start a handoff, while the comparison of quantitative decision values is used to select a target network. This algorithm considers the predicted RSS of the serving network only, not of the neighbor networks. This makes the algorithm impractical: while attached to its current network, a mobile station needs to know how strong the predicted RSS of the neighbor networks is in order to decide whether to hand off earlier.
All the above-mentioned schemes have certain deficiencies, and no single scheme provides a complete solution to the vertical handoff problem. Some schemes do not utilize important parameters when making handoff decisions, and some give no importance to handoff necessity estimation. The proposed algorithm provides a complete framework to perform vertical handoffs in a heterogeneous wireless network by incorporating the following:
• A polynomial regression based RSS prediction module is created to minimize handoff failures and call dropping probabilities.
• A handoff necessity estimation module is created. This module calculates the handoff necessity based on the predicted RSS (PRSS), MS-Velocity, and the MS's distance from the current PoA.
• The AHP method is used to assign weights to the various parameters considered in this scheme.
• A MADM based algorithm is utilized to perform network selection. This algorithm utilizes a rich set of network parameters to make network selection decisions.
• Simulated scenarios are provided in addition to numerical examples to demonstrate the utility of the proposed scheme.
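The nomenclature's score function S(x), normalized function N(x), and weights W_x suggest a simple additive (SAW-style) ranking for the MADM selection step. The sketch below is illustrative only: the parameter names, the candidate's normalized values, and the larger-is-better normalization are assumptions, while the weights are the Conversational-class values from Eq. (2):

```python
# SAW-style network score: S = sum of W_x * N(x) over all parameters.
def score(params, weights):
    """params: per-parameter values already normalized to [0, 1],
    larger-is-better; weights: priority weights summing to ~1."""
    return sum(weights[k] * params[k] for k in weights)

# Conversational-class weights from Eq. (2)
w = {"rss": 0.324, "delay": 0.128, "jitter": 0.128, "plr": 0.053,
     "throughput": 0.016, "velocity": 0.146, "load": 0.112,
     "power": 0.062, "cost": 0.033}

# Hypothetical normalized readings for one WLAN candidate
wlan = {"rss": 0.7, "delay": 0.9, "jitter": 0.8, "plr": 0.9,
        "throughput": 0.9, "velocity": 0.4, "load": 0.6,
        "power": 0.8, "cost": 0.9}

print(round(score(wlan, w), 3))
```

The candidate with the highest score would be selected as the handoff target.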

III. Vertical Handoff Necessity Estimation Module

A multi-attribute vertical handoff decision is more complex than a simple RSS based horizontal handoff, as the former involves attributes from different wireless technologies. In addition, the MS in a heterogeneous wireless environment can establish and maintain connectivity with many overlay networks that offer varying QoS. Hence, estimating the necessity of a vertical handoff and choosing the right initiation time reduces the number of handoffs and improves the overall QoS. The vertical handoff process should be triggered when any of the following conditions becomes true [11]:
• When the MS detects the availability of a new


wireless network or exits the coverage area of the serving network.
• When the MS detects a change in user-preferences.
• When the MS detects a request for a new service, or when the required QoS of an existing session degrades.
• When there is severe signal degradation or complete signal loss from the current wireless network.
The proposed vertical handoff necessity estimation module is capable of performing handoffs when most of the conditions discussed above become true. In the following, details of each block supporting the Vertical Handoff Necessity Estimation module are provided.

III.1. System Parameters

The proposed scheme utilizes a few carefully chosen parameters that are critical to maximizing end-users' satisfaction while performing efficient handoffs [12]. These parameters include network RSS, MS-Velocity, distance between the BS/AP and the MS, network loading conditions, power consumption imposed by the network, service cost, and QoS parameters including network throughput, delay, jitter, and Packet Loss Ratio (PLR).
Received Signal Strength: This parameter is used in both horizontal and vertical handoffs and is related to the QoS of an application. RSS is inversely proportional to the distance between the mobile station and the base station, and relying on it alone could result in excessive and unnecessary handoffs.
QoS: Handing off to a network with better conditions and higher performance usually provides improved service levels. Network related parameters such as throughput, delay, jitter, and packet loss may need to be considered for effective network usage.
Network loading: The loading conditions are determined from the available bandwidth, which is a good measure of the communication resources available at the base station.
MS Velocity: Velocity is an important factor for the handoff decision since it relates to the network connection duration and the location of the mobile station. A mobile station traveling at very high speed may experience excessive handoffs between wireless networks.
Power consumption: This may be a significant factor for handoff since wireless devices operate on limited battery power. When the battery level decreases, handing off to a network with lower power requirements would be the better decision.
Cost: The cost of the offered services is a major consideration for users. Different network operators and service providers may employ different billing strategies, which can affect the user's choice of access network and consequently the handoff decision.
The Vertical Handoff Necessity Estimation module utilizes the MS-Velocity, the distance between the BS/AP and the MS, and the RSS as inputs. It is assumed that these values are available to the


MS through some mechanism; for example, the GPS module installed in most modern MSs can estimate the MS's velocity. These parameters are monitored and evaluated by the handoff necessity estimation scheme to determine whether any of the vertical handoff triggering conditions mentioned above is true. For simplicity, we assume that the MS is equipped with multiple wireless interfaces and can connect to different types of networks, but at any given instant it is connected to only one network type. The network types include Wireless Local Area Network (WLAN), Wireless Metropolitan Area Network (WMAN), and Wireless Wide Area Network (WWAN). With the exception of the distance between the MS and the PoA, these parameters are also utilized in the Target Network Selection module to determine the best network among a list of candidate networks.

III.2. Weight Calculation for System Parameters

From a decision-making perspective, end-users can specify their needs and preferences by assigning priority weights to each system parameter. To maximize end-users' satisfaction, higher weights are assigned to network RSS and QoS. Furthermore, since QoS requirements vary across traffic classes, different weights with respect to traffic type need to be calculated and assigned for the QoS related parameters. The proposed scheme considers four different traffic classes with different characteristics and QoS demands. These traffic classes are defined by the 3GPP TS-23.107 specification [13] and summarized in Table I.
TABLE I
TRAFFIC CLASSES WITH VARYING QOS REQUIREMENTS
Traffic class     Comments
Streaming         One-way transport. Example: a user watching a video clip from YouTube or listening to his favorite radio channel over the Web. End-to-end delay is not important; jitter and throughput play an important role.
Interactive       Two-way transport that relies on request/response mechanisms. Example: a user chatting with another user using Yahoo Messenger or performing a financial transaction over the Web. Delay and PLR are important; jitter and throughput are relatively less important.
Conversational    Two-way transport. Example: VoIP and video conferencing between end-users. Delay and jitter are critically important; PLR and throughput are relatively less important.
Background        One-way transport. Example: a user sending SMSs or emails. PLR is very important; delay, jitter, and throughput are relatively less important.

The proposed scheme utilizes the AHP algorithm to calculate weights for the different system parameters. The AHP method was introduced by Saaty [14] to solve complicated problems by dividing them into a hierarchy of easy-to-analyze decision


factors and alternatives. AHP performs pair-wise comparisons between the attributes, transforms these comparison scores into weights of the decision criteria, and prioritizes all alternatives on each criterion to obtain the overall ranking of alternatives. The order of preference for the system parameters is: RSS, QoS, Velocity, Network Loading, Power Consumption, and Cost, where RSS and QoS are given equal importance to maximize end-users' satisfaction. The details of the weight calculation process for the four traffic classes are given as follows.

Weights for Conversational Traffic Class
Table III shows the AHP decision matrix for the system parameters. Note that these values are a sample of end-users' subjective assignments of relative importance and may be changed based on their preferences. The values are assigned using the AHP scale of importance given in Table II.

TABLE II
AHP FUNDAMENTAL SCALE OF IMPORTANCE
Intensity of importance    Definition
1                          Equal Importance
3                          Moderate Importance
5                          Strong Importance
7                          Very Strong Importance
9                          Extreme Importance
2, 4, 6, 8                 Intermediate values

TABLE III
AHP DECISION MATRIX FOR SYSTEM PARAMETERS
Parameters           RSS    QoS    Velocity   Network loading   Power Consumption   Cost
RSS                  1      1      3          4                 5                   7
QoS                  1      1      3          4                 5                   7
Velocity             1/3    1/3    1          2                 3                   5
Network loading      1/4    1/4    1/2        1                 3                   5
Power Consumption    1/5    1/5    1/3        1/3               1                   3
Cost                 1/7    1/7    1/5        1/5               1/3                 1

Table IV shows the AHP decision matrix for the relative importance of the QoS parameters, based upon the characteristics of the Conversational traffic class. As mentioned in Table I, delay and jitter are critical for the Conversational traffic class, and higher values of these two parameters could result in an unacceptable quality of service. PLR is relatively less important compared with delay and jitter, since humans can make out the content of an ongoing conversation despite moderate packet loss. On the other hand, the throughput requirement of Conversational traffic is relatively low and can be supported by all types of networks.

TABLE IV
AHP DECISION MATRIX FOR QOS PARAMETERS
Parameters    Delay   Jitter   PLR    Throughput
Delay         1       1        3      7
Jitter        1       1        3      7
PLR           1/3     1/3      1      5
Throughput    1/7     1/7      1/5    1

The weights for the four QoS parameters are calculated, as shown in Eq. (1):

$$
W_{QoS\text{-}CONV} =
\begin{bmatrix} W_D \\ W_J \\ W_P \\ W_T \end{bmatrix} =
\begin{bmatrix} 0.394 \\ 0.394 \\ 0.164 \\ 0.049 \end{bmatrix}
\tag{1}
$$

Finally, the overall weights for all the parameters for the Conversational traffic class are shown in Eq. (2):

$$
W =
\begin{bmatrix}
W_{RSS} \\ W_{QoS} W_D \\ W_{QoS} W_J \\ W_{QoS} W_P \\ W_{QoS} W_T \\ W_{VELO} \\ W_{LOAD} \\ W_{PC} \\ W_{COST}
\end{bmatrix} =
\begin{bmatrix}
0.324 \\ 0.324 \times 0.394 \\ 0.324 \times 0.394 \\ 0.324 \times 0.164 \\ 0.324 \times 0.049 \\ 0.146 \\ 0.112 \\ 0.062 \\ 0.033
\end{bmatrix} =
\begin{bmatrix}
0.324 \\ 0.128 \\ 0.128 \\ 0.053 \\ 0.016 \\ 0.146 \\ 0.112 \\ 0.062 \\ 0.033
\end{bmatrix}
\tag{2}
$$

Weights for Interactive Traffic Class
Table V shows the AHP decision matrix for the relative importance of the QoS sub-parameters, based on the characteristics of the Interactive traffic class and the QoS requirements mentioned in Table I.

TABLE V
AHP DECISION MATRIX FOR QOS PARAMETERS
Parameters    Delay   Jitter   PLR    Throughput
Delay         1       7        1/2    3
Jitter        1/7     1        1/5    1/3
PLR           2       8        1      5
Throughput    1/3     3        1/5    1

The weights for the four QoS parameters are calculated, as shown in Eq. (3):

$$
W_{QoS\text{-}INTER} =
\begin{bmatrix} W_D \\ W_J \\ W_P \\ W_T \end{bmatrix} =
\begin{bmatrix} 0.313 \\ 0.050 \\ 0.520 \\ 0.118 \end{bmatrix}
\tag{3}
$$

Finally, the overall weights for all the parameters for the Interactive traffic class are shown in Eq. (4):

$$
W =
\begin{bmatrix}
W_{RSS} \\ W_{QoS} W_D \\ W_{QoS} W_J \\ W_{QoS} W_P \\ W_{QoS} W_T \\ W_{VELO} \\ W_{LOAD} \\ W_{PC} \\ W_{COST}
\end{bmatrix} =
\begin{bmatrix}
0.324 \\ 0.324 \times 0.313 \\ 0.324 \times 0.050 \\ 0.324 \times 0.520 \\ 0.324 \times 0.118 \\ 0.146 \\ 0.112 \\ 0.062 \\ 0.033
\end{bmatrix} =
\begin{bmatrix}
0.324 \\ 0.101 \\ 0.016 \\ 0.168 \\ 0.038 \\ 0.146 \\ 0.112 \\ 0.062 \\ 0.033
\end{bmatrix}
\tag{4}
$$
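The AHP sub-weights in Eq. (1) can be approximated directly from the Table IV pairwise matrix. The sketch below uses the geometric-mean row method, which is an assumption (the paper does not state which AHP prioritization variant it uses), so the result is close to, but not exactly, the Eq. (1) values:

```python
import math

# Table IV: pairwise comparison matrix for the Conversational class
# (rows/columns: Delay, Jitter, PLR, Throughput).
table_iv = [
    [1,     1,     3,     7],
    [1,     1,     3,     7],
    [1 / 3, 1 / 3, 1,     5],
    [1 / 7, 1 / 7, 1 / 5, 1],
]

# Geometric mean of each row, normalized so the weights sum to 1
geo = [math.prod(row) ** (1 / len(row)) for row in table_iv]
weights = [g / sum(geo) for g in geo]
print([round(w, 3) for w in weights])  # close to Eq. (1): 0.394, 0.394, 0.164, 0.049
```

The eigenvector method used in classical AHP gives slightly different numbers, which likely explains the small gap from the published weights.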

Weights for Background Traffic Class
Table VI shows the AHP decision matrix for the relative importance of the QoS sub-parameters, based on the characteristics of the Background traffic class and the QoS requirements mentioned in Table I.

TABLE VI
AHP DECISION MATRIX FOR QOS PARAMETERS
Parameters    Delay   Jitter   PLR    Throughput
Delay         1       1        1/5    1/9
Jitter        1       1        1/5    1/9
PLR           9       9        1      3
Throughput    5       5        1/3    1

The weights for the four QoS parameters are calculated, as shown in Eq. (5):

$$
W_{QoS\text{-}BACK} =
\begin{bmatrix} W_D \\ W_J \\ W_P \\ W_T \end{bmatrix} =
\begin{bmatrix} 0.067 \\ 0.067 \\ 0.604 \\ 0.264 \end{bmatrix}
\tag{5}
$$

Finally, the overall weights for all the parameters for the Background traffic class are shown in Eq. (6):

$$
W =
\begin{bmatrix}
W_{RSS} \\ W_{QoS} W_D \\ W_{QoS} W_J \\ W_{QoS} W_P \\ W_{QoS} W_T \\ W_{VELO} \\ W_{LOAD} \\ W_{PC} \\ W_{COST}
\end{bmatrix} =
\begin{bmatrix}
0.324 \\ 0.324 \times 0.067 \\ 0.324 \times 0.067 \\ 0.324 \times 0.604 \\ 0.324 \times 0.264 \\ 0.146 \\ 0.112 \\ 0.062 \\ 0.033
\end{bmatrix} =
\begin{bmatrix}
0.324 \\ 0.022 \\ 0.022 \\ 0.196 \\ 0.086 \\ 0.146 \\ 0.112 \\ 0.062 \\ 0.033
\end{bmatrix}
\tag{6}
$$

Weights for Streaming Traffic Class
Table VII shows the AHP decision matrix for the relative importance of the QoS sub-parameters, based on the characteristics of the Streaming traffic class and the QoS requirements mentioned in Table I.

TABLE VII
AHP DECISION MATRIX FOR QOS PARAMETERS
Parameters    Delay   Jitter   PLR    Throughput
Delay         1       1/5      1/6    1/7
Jitter        5       1        1/2    1/2.5
PLR           6       2        1      1/2
Throughput    7       2.5      2      1

The weights for the four QoS parameters are calculated, as shown in Eq. (7):

$$
W_{QoS\text{-}STREAM} =
\begin{bmatrix} W_D \\ W_J \\ W_P \\ W_T \end{bmatrix} =
\begin{bmatrix} 0.051 \\ 0.193 \\ 0.296 \\ 0.460 \end{bmatrix}
\tag{7}
$$

Finally, the overall weights for all the parameters for the Streaming traffic class are shown in Eq. (8):

$$
W =
\begin{bmatrix}
W_{RSS} \\ W_{QoS} W_D \\ W_{QoS} W_J \\ W_{QoS} W_P \\ W_{QoS} W_T \\ W_{VELO} \\ W_{LOAD} \\ W_{PC} \\ W_{COST}
\end{bmatrix} =
\begin{bmatrix}
0.324 \\ 0.324 \times 0.051 \\ 0.324 \times 0.193 \\ 0.324 \times 0.296 \\ 0.324 \times 0.460 \\ 0.146 \\ 0.112 \\ 0.062 \\ 0.033
\end{bmatrix} =
\begin{bmatrix}
0.324 \\ 0.017 \\ 0.063 \\ 0.096 \\ 0.149 \\ 0.146 \\ 0.112 \\ 0.062 \\ 0.033
\end{bmatrix}
\tag{8}
$$

III.3. RSS Prediction Using Polynomial Regression
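The overall weight vectors in Eqs. (2), (4), (6) and (8) are all built the same way: the QoS block weight (0.324) is distributed over the per-class QoS sub-weights from Eqs. (1), (3), (5) and (7). This composition can be checked numerically:

```python
# Distribute the QoS block weight over the per-class QoS sub-weights
# (values taken from Eqs. (1), (3), (5), (7) in the text).
W_QOS = 0.324
qos_subweights = {
    "conversational": [0.394, 0.394, 0.164, 0.049],  # Eq. (1)
    "interactive":    [0.313, 0.050, 0.520, 0.118],  # Eq. (3)
    "background":     [0.067, 0.067, 0.604, 0.264],  # Eq. (5)
    "streaming":      [0.051, 0.193, 0.296, 0.460],  # Eq. (7)
}
for cls, sub in qos_subweights.items():
    print(cls, [round(W_QOS * w, 3) for w in sub])
```

This reproduces the QoS rows of Eqs. (2), (4), (6) and (8); for example, 0.324 × 0.394 ≈ 0.128 for the Conversational class.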

The proposed scheme utilizes predicted RSS values measured from the current PoA as well as from the target networks. These predicted values are obtained using polynomial regression and are used to determine whether a future handoff is necessary. The polynomial regression based predictive RSS approach consists of two steps: the preprocess step and the RSS prediction step.

Step 1: The preprocess step
In prediction, the previous RSS samples are important for determining the next predicted value. To strengthen the polynomial regression based curve fitting, a preprocess of accumulated generating operation, based on [15], is adopted to improve the accuracy of the prediction results. The preprocess generates a new data sequence by summing the previous n data points:

$$
S'(n) = \sum_{p=1}^{n} S(p)
\tag{9}
$$

where S(p) denotes the original data sequence and S'(n) is the new sequence after executing the preprocess. The purpose of the preprocess is to smooth the fitting curve.

Step 2: The RSS prediction step
The new data sequence is used as the input data for curve fitting in the RSS prediction step. The predicted RSS in the new sequence is denoted as $RSS'_{prediction}$.

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

International Review on Computers and Software, Vol. 8, N.10


S. Aghalya, A. Sivasubramanian

It is computed by:

RSS'_prediction = F_pre(t+1)    (10)

where F_pre is the polynomial function fitted to the preprocessed sequence. By finding this polynomial function, the prediction of the preprocessed RSS, RSS'_prediction, that is, S'(n+1), is determined. The predicted RSS of the original sequence, RSS_prediction, i.e., S(n+1), can then be recovered by the reverse transformation of (9):

S  n  1  S '  n  1  S '  n  Table VIII shows RSS samples measured from different types of networks and their predicted values using Polynomial Regression. While a continuous drop pattern for RSS can be observed for both WLAN and WMAN networks, results calculated using prediction method could help in reducing the unnecessary call drops due to the predicted value of weak RSS.

TABLE VIII
RSS SAMPLES
Network Type   RSS Samples (dBm)         PRSS (dBm)
WLAN           -110, -110, -112, -113    -114.69
WMAN           -140, -150, -151, -155    -157.08
WWAN           -110, -111, -100, -95     -86.84
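A minimal sketch of the two prediction steps, assuming a degree-2 polynomial fit (the paper does not state the polynomial order); applied to the WLAN samples of Table VIII it yields a value close to the reported -114.69 dBm:

```python
import numpy as np

def predict_rss(samples, degree=2):
    """Predict the next RSS value: accumulated-generating preprocess (Eq. 9),
    polynomial fit (Eq. 10), then the reverse transform S(n+1)=S'(n+1)-S'(n)."""
    s_prime = np.cumsum(samples)                   # S'(n) = sum of S(1..n)
    t = np.arange(1, len(samples) + 1)
    coeffs = np.polyfit(t, s_prime, degree)        # F_pre fitted to S'
    s_next = np.polyval(coeffs, len(samples) + 1)  # S'(n+1)
    return s_next - s_prime[-1]                    # S(n+1)

wlan_prss = predict_rss([-110, -110, -112, -113])
```

The predicted value continues the observed downward trend, which is what lets the scheme anticipate a weak-RSS condition before it occurs.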

III.4. VHO Factor Calculation Using Fuzzy Logic

In the proposed scheme, a Mamdani Fuzzy logic Inference System (FIS) is utilized to calculate the value of the vertical handoff factor and determine the necessity of a handoff based on the current conditions of the serving PoA. A Mamdani FIS is composed of the following functional blocks [16]:
- a fuzzifier, which transforms the crisp inputs into degrees of match with linguistic values;
- a fuzzy rule base, which contains a number of fuzzy IF-THEN rules;
- a database, which defines the membership functions of the fuzzy sets used in the fuzzy rules;
- a fuzzy inference engine, which performs the inference operations on the fuzzy rules;
- a defuzzifier, which transforms the fuzzy results of the inference into a crisp output.

PRSS, velocity and the distance between the MS and the PoA are used as the input parameters. The crisp values of the input parameters are fed into the fuzzifier, which transforms them into fuzzy sets by determining the degree to which they belong to each of the appropriate fuzzy sets via membership functions (MFs). Next, the fuzzy sets are fed into the fuzzy inference engine, where a set of fuzzy IF-THEN rules is applied to obtain fuzzy decision sets. The output fuzzy decision sets


are aggregated into a single set and passed to the defuzzifier, where they are converted into a precise quantity, the vertical handoff factor, which determines whether a handoff is necessary. Fuzzy sets for each parameter are created for the different network types. The universes of discourse (UoDs) for the input parameters are selected based on the published standards for the different network types (IEEE 802.11, IEEE 802.16, UMTS) [17] and include the lowest and highest values of each parameter that can be measured at the MS. Linguistic variables such as low, medium and high are created to partition these UoDs. Typical operating ranges of the attributes for the three types of networks utilized in this work are given in Table IX. For power consumption and cost, a range of [1, 10] is used, 10 denoting the highest power consumption and the most expensive network.

TABLE IX
OPERATING RANGES OF THE PARAMETERS
Parameter                 WLAN          WMAN          WWAN
RSS (dBm)                 -110 – -55    -160 – -100   -150 – -90
Delay (ms)                100-150       10-50         10-75
Jitter (ms)               10-30         3-12          5-15
PLR per 10^6 bytes (%)    3-7           1-8           1-5
Throughput (Mbps)         50-150        20-100        0.1-3
Network range (m)         0-100         0-350         0-750
Velocity (m/s)            0-10
Traffic load (%)          0-100
Power consumption         1-10
Cost                      1-10
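The Mamdani pipeline described above can be sketched with generic triangular membership functions on universes normalized to [0, 1]; the breakpoints and the choice of four rules below are illustrative stand-ins for the paper's exact fuzzy sets, using min implication, max aggregation and centroid defuzzification:

```python
# Minimal Mamdani FIS sketch for the vertical handoff factor.
# All universes are normalized to [0, 1]; MF breakpoints and the rule
# subset are illustrative assumptions, not the paper's exact definitions.

def tri(a, b, c):
    """Triangular MF with feet a, c and peak b; shouldered at the edges."""
    def mf(x):
        if x <= a or x >= c:
            if a == b and x <= a:   # left shoulder
                return 1.0
            if b == c and x >= c:   # right shoulder
                return 1.0
            return 0.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)
    return mf

# Input fuzzy sets
weak, strong = tri(0, 0, 0.5), tri(0.5, 1, 1)                                 # PRSS
low_v, med_v, high_v = tri(0, 0, 0.5), tri(0.25, 0.5, 0.75), tri(0.5, 1, 1)   # velocity
near, med_d, far = tri(0, 0, 0.5), tri(0.25, 0.5, 0.75), tri(0.5, 1, 1)       # distance

# Output fuzzy sets for the handoff factor
out = {"lower": tri(0, 0, 0.25), "low": tri(0, 0.25, 0.5),
       "high": tri(0.5, 0.75, 1), "higher": tri(0.75, 1, 1)}

# Four of the IF-THEN rules from the rule base (antecedents ANDed with min)
rules = [((weak, high_v, far), "higher"),
         ((weak, high_v, med_d), "high"),
         ((strong, low_v, near), "lower"),
         ((strong, med_v, med_d), "low")]

def handoff_factor(prss, velocity, distance, steps=1000):
    """Mamdani inference with max aggregation and centroid defuzzification."""
    num = den = 0.0
    for i in range(steps + 1):
        z = i / steps
        mu = 0.0
        for (m_p, m_v, m_d), label in rules:
            strength = min(m_p(prss), m_v(velocity), m_d(distance))
            mu = max(mu, min(strength, out[label](z)))
        num += z * mu
        den += mu
    return num / den if den else 0.0
```

With these illustrative sets, a weak PRSS with high velocity and far distance yields a handoff factor above the 0.75 threshold, while a strong PRSS with low velocity and near distance yields a factor close to zero.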

For the WLAN, the fuzzy set "Low" is defined from -120 dBm to -90 dBm, "Medium" from -100 dBm to -60 dBm, and "High" from -70 dBm to -55 dBm. The universe of discourse for the WLAN network coverage, i.e., the distance between the WLAN and the MS, is defined from 0 to 100 m, partitioned as 0-40 (Near), 20-80 (Medium) and 60-100 (Far). The universe of discourse for the MS velocity is defined from 0 to 10 m/s, with "Low" from 0-4, "Medium" from 2-8 and "High" from 5-10. The fuzzy set values for the output decision variable, the handoff factor, are Higher, High, Medium, Low and Lower, with a universe of discourse from 0 to 1. Since there are three fuzzy input variables and three fuzzy sets for each variable, the maximum possible number of rules in the rule base is 3^3 = 27. The fuzzy rule base contains IF-THEN rules such as:
- IF PRSS is weak, velocity is high and distance is far, THEN the handoff factor is higher.
- IF PRSS is weak, velocity is high and distance is medium, THEN the handoff factor is high.
- IF PRSS is strong, velocity is low and distance is near, THEN the handoff factor is lower.
- IF PRSS is strong, velocity is medium and distance is medium, THEN the handoff factor is low.

The crisp handoff factor computed after defuzzification is compared against a threshold to determine whether a handoff from the serving PoA is required. This threshold can be adjusted according to the sensitivity of the network types. A higher value of this



threshold will prevent necessary handoffs, resulting in a high probability of call drops. On the other hand, a low value will result in frequent handoffs and an unnecessary wastage of system resources. Thus, a balanced value for this threshold is required; in this work a threshold of 0.75 is utilized, and a handoff is initiated when the handoff factor exceeds 0.75. Since the WLAN has a smaller coverage range, an accurate and timely handoff decision is needed to maintain connectivity when the mobile user is moving out of the WLAN coverage area.

IV. Network Selection Algorithm

A suitable access network has to be selected once the handoff necessity has been estimated. We formulate the network selection decision process as a MADM problem that deals with the evaluation of a set of alternative access networks using a multiple attribute target network selection function (MNSF) defined on a set of parameters [18]. The MNSF is an objective function that measures the efficiency and the improvement in quality of service obtained by handing off to a particular network. It is calculated for all alternative target access networks that cover the service area of the mobile station, and the network that provides the highest MNSF value is selected as the best handoff target from the serving network, based on certain user and system parameters. The main purpose of the algorithm is to determine and select an optimum wireless access network for a particular high-quality service. The proposed scheme utilizes carefully chosen parameters that are critical to maximizing the end user's satisfaction while performing handoffs: network RSS, QoS, MN velocity, network load, power consumption and cost, where the QoS parameters include network throughput, delay, jitter and packet loss ratio (PLR). The optimum wireless network must maximize Si(x), where Si(x) is the score function evaluated for network i and x is the vector of input parameters. The score function can be expressed as:

Si(x) = f(RSSi, QoSi, Velocityi, loadi, 1/poweri, 1/costi) = Σ_{i=1}^{4} wx N(xi) + Σ_{i=1}^{2} wy N(yi)

where N(x) is the normalized function of the parameter x and wx is the weight assigned to that parameter; the xi are RSS, QoS, velocity and network load, and the yi are power consumption and cost. Normalization is needed so that the sum of values expressed in different units is meaningful. The parameters are normalized with respect to the minimum or maximum values of the parameters:

Si(x) = Σ_{i=1}^{4} wx (xi / xmax,i) + Σ_{i=1}^{2} wy (ymin,i / yi)

The MS calculates the handoff necessity information in the handoff initiation algorithm when it detects a new network or the user changes his preferences. If the algorithm indicates the need for a handoff from the current network to a target network, the MNSF, that is, the score function, is calculated for the current network and the other candidate target networks. A vertical handoff takes place if a target network receives a higher score function.

V. Simulation and Results

The performance evaluation of the proposed scheme is presented here. The two modules, vertical handoff necessity estimation and network selection, are simulated and evaluated using four traffic classes (Conversational, Streaming, Background and Interactive) and three types of wireless networks (WLAN, WMAN and WWAN). Fig. 1 shows the AHP-based weight assignments for the different QoS parameters based on the characteristics of the traffic classes: the weighting scheme assigns a higher weight to throughput for the Streaming class, higher weights to delay and jitter for Conversational traffic, and a higher weight to PLR for the Interactive and Background traffic classes.

Fig. 1. Weight assignments for QoS parameters

Scenario 1: For the vertical handoff necessity estimation, it is assumed that the end user is currently watching a recorded video (streaming) using his/her own WLAN. The parameters for this scenario are shown in Table X.

TABLE X
DIFFERENT PARAMETERS FOR CURRENT POA
Parameters                     Values
Current PoA                    WLAN
Traffic class                  Streaming
Weight scheme                  AHP
MS-Velocity (m/s)              0 (Low)
MS-PoA Distance (m)            10 (Near)
RSS Samples (dBm)              -58.5, -55.3, -57.6, -59.8
PRSS using poly. reg. (dBm)    -62.21 (High)
Delay (ms)                     100 (Low)
Jitter (ms)                    10 (Low)
PLR (loss per 10^6 bytes)      3 (Low)
Throughput (Mbps)              130 (High)
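The normalized score function can be sketched as follows. The attribute grouping is illustrative: the RSS and velocity terms are omitted for brevity, network load is treated as a cost-style attribute, and the QoS sub-weights reuse the values from (6); the candidate values follow Table XII:

```python
# Sketch of the MNSF score: benefit attributes normalized as x / x_max,
# cost attributes as y_min / y, each weighted and summed.
# Grouping, weights and the omission of RSS/velocity are simplifying
# assumptions; per-network values are taken from Table XII.

networks = {
    "WLAN": {"throughput": 70, "delay": 130, "jitter": 27, "plr": 3,
             "load": 20, "power": 1, "cost": 3},
    "WMAN": {"throughput": 60, "delay": 20, "jitter": 5, "plr": 4,
             "load": 30, "power": 5, "cost": 4},
    "WWAN": {"throughput": 1.5, "delay": 10, "jitter": 4, "plr": 3,
             "load": 40, "power": 7, "cost": 7},
}
benefit = {"throughput": 0.086}                          # larger is better
cost = {"delay": 0.022, "jitter": 0.022, "plr": 0.196,   # smaller is better
        "load": 0.112, "power": 0.062, "cost": 0.033}

def mnsf(net):
    """Score one candidate network against the best values on offer."""
    x_max = {k: max(n[k] for n in networks.values()) for k in benefit}
    y_min = {k: min(n[k] for n in networks.values()) for k in cost}
    s = sum(w * net[k] / x_max[k] for k, w in benefit.items())
    s += sum(w * y_min[k] / net[k] for k, w in cost.items())
    return s

scores = {name: mnsf(net) for name, net in networks.items()}
best = max(scores, key=scores.get)
```

Under these illustrative weights the WLAN candidate scores highest, consistent with the scheme's preference for WLAN for streaming traffic at low velocity.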




Based on the parameter values, the overall handoff factor is calculated as 0.25. Since this value is less than the handoff threshold (0.75) set for the WLAN, the MS does not perform a handoff and remains connected to its current PoA.

Scenario 2: Assume the end user starts walking away from his current PoA while watching the same video, so that the distance between the WLAN AP and the MS increases and the RSS becomes weaker. The new parameter set is presented in Table XI. Based on the RSS samples, the predicted RSS value cannot be sensed by the MS. In this situation, the overall handoff factor is calculated as 0.85. Since this value is greater than the handoff threshold, the scheme triggers the handoff and executes the target network selection algorithm to find the best available network that can support the continuity and the quality of the current service.

TABLE XI
PARAMETER VALUES
Parameters                     Values
Current PoA                    WLAN
Traffic class                  Streaming
Weight scheme                  AHP
MS-Velocity (m/s)              2 (Low)
MS-PoA Distance (m)            85 (Far)
RSS Samples (dBm)              -90.5, -93.7, -98, -99.2
PRSS using poly. reg. (dBm)    -108.05 (undetectable)
Delay (ms)                     120 (High)
Jitter (ms)                    20 (High)
PLR (loss per 10^6 bytes)      4 (Medium)
Throughput (Mbps)              30 (Low)

As the user is walking, the vertical handoff target network selection scheme senses the availability of three different networks. The parameter values for these three networks are presented in Table XII.

TABLE XII
NETWORK PARAMETER VALUES
Parameters           WLAN      WMAN      WWAN
PRSS (dBm)           -114.05   -137.40   -116.10
Delay (ms)           130       20        10
Jitter (ms)          27        5         4
PLR                  3         4         3
Throughput (Mbps)    70        60        1.5
Network load         20        30        40
Power consumption    1         5         7
Cost                 3         4         7
MS-Velocity (m/s)    2

The network selection algorithm finds the best available network, in terms of score values, that can support the continuity and quality of the current service. Fig. 2 shows the score values produced by the algorithm for all available networks and all four traffic classes. Based on the network parameters and their corresponding weights, the mobile station moving with a velocity of 2 m/s prefers WLAN for streaming and background traffic and WMAN for interactive and conversational traffic. Although the RSS is weaker for WLAN, other factors such as velocity, network load and cost lead the algorithm to select WLAN as the preferred network.

Fig. 2. Scoring of available networks based on traffic classes (velocity 2 m/s)

Scenario 3: Assume the same user steps into a vehicle that moves at a velocity of 5 m/s. Although the RSS and some other parameters change rapidly due to the dynamic structure of wireless networks, these values are kept constant here in order to focus only on the effect of velocity on the network selection process. As the velocity increases, the probability of rejecting WLAN increases. The algorithm prefers WMAN for streaming, interactive and background traffic and WWAN for conversational traffic, since the latter does not tolerate higher values of delay and jitter; this can be observed in Fig. 3. The score values of the networks for the four traffic classes at a higher velocity of 10 m/s are given in Fig. 4.

Fig. 3. Scoring of available networks based on traffic classes (velocity 5 m/s)

Fig. 4. Scoring of available networks based on traffic classes (velocity 10 m/s)

As the velocity of the mobile station increases, the algorithm prefers WWAN since it provides a larger coverage area than WLAN and WMAN. It avoids



frequent handoffs. The streaming traffic, however, prefers WMAN, since its throughput requirement is high. To illustrate the performance of the proposed algorithm, two metrics are considered: the number of handoffs and the handoff call blocking probability.

Number of handoffs: the number of handoffs that the mobile node performs during a call connection. A lower value indicates guaranteed continuity and quality of service.

Handoff call blocking probability: the probability that a requested handoff to the target network is blocked. This occurs for many reasons, such as a sudden drop of the signal or the non-availability of a channel or other resources. It is defined as the ratio of rejected handoffs to the total requested handoffs in the system. This metric is very important, since the loss of a connection reduces the overall satisfaction of the end user.

The performance of the proposed algorithm is evaluated at velocities ranging from 0 to 10 m/s, with the average rate of new calls fixed at 10 calls/sec for each velocity. From Fig. 5 it is observed that the proposed algorithm results in the fewest vertical handoffs compared with the PRSS+HYS algorithm, although the number of handoffs increases with speed. The handoff blocking probability is also reduced for the proposed algorithm: it is low at lower speeds and increases for medium- and high-speed users, as shown in Fig. 6.

Fig. 5. Number of handoffs under various velocities

Fig. 6. Handoff blocking probability under various velocities
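The handoff call blocking probability metric defined above can be written directly as:

```python
# Handoff call blocking probability: the ratio of rejected handoff
# requests to the total handoff requests observed in the system.

def handoff_blocking_probability(rejected, requested):
    """Return rejected/requested, or 0.0 when no handoff was requested."""
    if requested == 0:
        return 0.0
    return rejected / requested
```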

VI. Conclusion

In this paper, two modules, handoff initiation and target network selection, are developed. The first module determines whether a handoff is necessary by considering the PRSS values, the velocity of the MS and the distance between the MS and the PoA. The second module determines the best target network by considering various system parameters. It is observed that, for a slowly moving MS, WLAN is the preferred network for the background and streaming traffic classes and WMAN for the conversational and interactive classes; for a medium-speed MS, WMAN is preferred for the interactive, background and streaming classes; and for a fast-moving MS, WWAN is preferred for the conversational, interactive and background classes. It is also observed that the proposed algorithm outperforms the PRSS+HYS algorithm in reducing both the number of handoffs and the handoff blocking probability.

References

[1] E. Stevens-Navarro, V. W. S. Wong and Y. Lin, "A Vertical Handoff Decision Algorithm for Heterogeneous Wireless Networks," in Proc. IEEE Wireless Communications and Networking Conference (WCNC'07), Hong Kong, China, March 2007.
[2] M. L. Chan, Y. F. Hu and R. E. Sheriff, "Implementation of fuzzy multiple objective decision making algorithm in a heterogeneous mobile environment," in Proc. IEEE Wireless Communications and Networking Conference (WCNC 2002), 2002, pp. 332-336, vol. 1.
[3] Z. Yan, H. Luo, Y. Qin, H. Zhou, J. Guan and S. Zhang, "An adaptive multi-criteria vertical handover framework for heterogeneous networks," in Proc. International Conference on Mobile Technology, Applications, and Systems, Yilan, Taiwan, 2008, pp. 14:1-14:7.
[4] T. Thumthawatworn, A. Pervez and P. Santiprabhob, "Modular handover decision system based on fuzzy logic for wireless networks," in Proc. 8th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2011, pp. 385-388.
[5] S. J. Wu, "An intelligent handover decision mechanism for heterogeneous wireless networks," in Proc. 6th International Conference on Networked Computing and Advanced Information Management (NCM), 2010, pp. 688-693.
[6] K. Vasu, S. Maheshwari, S. Mahapatra and C. S. Kumar, "QoS aware fuzzy rule based vertical handoff decision algorithm for wireless heterogeneous networks," in Proc. National Conference on Communications (NCC), 2011, pp. 1-5.
[7] X. Liu and L. Jiang, "A novel vertical handoff algorithm based on fuzzy logic in aid of grey prediction theory in wireless heterogeneous networks," Journal of Shanghai Jiaotong University (Science), vol. 17, pp. 25-30, 2012.
[8] T. Oliveira, S. Mahadevan and D. P. Agrawal, "Handling network uncertainty in heterogeneous wireless networks," in Proc. IEEE INFOCOM 2011, pp. 2390-2398.
[9] S. Dhar, A. Ray and R. Bera, "Design and Simulation of Vertical Handover Algorithm for Vehicular Communication," International Journal of Engineering Science and Technology, vol. 2, pp. 5509-5525, 2010.
[10] L. Xia, J. Ling-ge, H. Chen and L. Hong-wei, "An Intelligent Vertical Handoff Algorithm in Heterogeneous Wireless Networks," in Proc. International Conference on Neural Networks and Signal Processing, 2008, pp. 550-555.
[11] Y. Nkansah-Gyekye and J. I. Agbinya, "A vertical handoff decision algorithm for next generation wireless networks," in Proc. Third International Conference on Broadband Communications, Information Technology & Biomedical Applications, 2008, pp. 358-364.
[12] N. Nasser, A. Hasswa and H. Hassanein, "Handoffs in fourth generation heterogeneous networks," IEEE Communications Magazine, vol. 44, pp. 96-103, 2006.
[13] J. Rinne, "3GPP Specification Details, TS 23.107."
[14] T. L. Saaty and L. G. Vargas, Models, Methods, Concepts & Applications of the Analytic Hierarchy Process, Boston: Kluwer Academic Publishers, 2001.
[15] T.-C. Chang, K.-L. Wen and M.-L. You, "The Study of Regression Based on Grey System Theory," in Proc. IEEE International Conference on Systems, Man, and Cybernetics, vol. 5, pp. 4307-4311, Oct. 1998.
[16] J.-S. R. Jang and C.-T. Sun, "Neuro-Fuzzy Modeling and Control," Proceedings of the IEEE, March 1995.
[17] J. D. Martínez-Morales, U. Pineda-Rico and E. Stevens-Navarro, "Performance comparison between MADM algorithms for vertical handoff in 4G networks," in Proc. 7th International Conference on Electrical Engineering Computing Science and Automatic Control (CCE), 2010, pp. 309-314.
[18] E. Triantaphyllou, Multi-Criteria Decision Making Methods: A Comparative Study, 2000.
[19] N. Akçam and M. Oğul, "Optimizing Data Transport by Using MPLS-TE-FRR and QoS," (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (5), pp. 283-289.
[20] D. David Neels Pon Kumar, K. Murugesan, K. Arun Kumar and J. Raj, "Performance Analysis of Fuzzy Neural based QoS Scheduler for Mobile WiMAX," (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (6), pp. 377-385.
[21] S. Ben Alla and A. Ezzati, "A QoS-Guaranteed Coverage and Connectivity Preservation Routing Protocol for Heterogeneous Wireless Sensor Networks," (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (6), pp. 363-371.
[22] A. El Fallahi, "Implementation of the UMTS Technology in the GSM Existing Network: Capacity/Interference Optimization," (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (6), pp. 372-376.


Authors' information

1 Associate Professor, Department of Electronics and Communication Engineering, St. Joseph's College of Engineering, Anna University, India.
E-mail: [email protected]

2 Professor, Department of Electronics and Communication Engineering, St. Joseph's College of Engineering, Anna University, India.
E-mail: [email protected]

S. Aghalya received the B.E. degree in ECE from the University of Madras in 1991 and the M.E. in Optical Communication from Anna University in 2002. Currently she is working as an Associate Professor in the Department of Electronics and Communication Engineering at St. Joseph's College of Engineering, Chennai, India. She has 19 years of experience in teaching. Her area of research interest is heterogeneous wireless communication networks.

A. Sivasubramanian received the B.E. degree in ECE from the University of Madras in 1990, the M.E. in Applied Electronics from Bharathiar University in 1995 and the Ph.D. degree in Optical Communication from Anna University, Chennai, in 2008. Currently he is working as Professor and Head of the Department of Electronics and Communication Engineering at St. Joseph's College of Engineering, Chennai, India. He has 20 years of experience in teaching and in guiding projects for undergraduate and postgraduate students, and has many international and national publications to his credit. His areas of interest include optical communication, optical networks, bio-optical engineering, and wireless sensor and computer networks.



International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

Analysis of Depth Based Routing Protocols in Under Water Sensor Networks J. V. Anand, S. Titus Abstract – Underwater Acoustic Sensor Networks (UW-ASNs) have the potential to enable many applications, such as environmental monitoring, undersea exploration and distributed tactical surveillance. Underwater acoustic communication is preferred over electromagnetic communication (radio and optical), because radio communication suffers high attenuation and optical communication suffers strong scattering under water, even though acoustic links incur a much higher propagation delay. Fundamental differences between underwater acoustic propagation and terrestrial radio propagation call for new criteria in the design of networking protocols. Underwater acoustic communication is difficult because of the energy consumption, the delay and the propagation speed, together with node mobility in the underwater environment. The prime objective is to find good performance metrics considering the problems associated with Depth Based Routing, and to analyze Energy Efficient Depth Based Routing, with a priority value based on the distance between the source node and the forwarding node, and Novel Energy Efficient Depth Based Routing, which considers the depth, the residual energy and a fitness value between the source node, the forwarding node and the sink. An NS2-based Aquasim simulation in a 500*500 environment is considered, with a maximum of 100 nodes positioned according to the random waypoint model. Simulation results have been taken with varying number of nodes, report interval, packet size, pause time and transmission range; they show that Novel Energy Efficient Depth Based Routing and Energy Efficient Depth Based Routing outperform Depth Based Routing in various situations, since the conventional routing considers only depth. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Depth Based Routing, Energy Efficient Depth Based Routing, Holding Time, Novel Energy Efficient Depth Based Routing, Residual Energy

I. Introduction

I.1. Basics of Acoustic Communications

The process of sending and receiving messages below water by relying on sound is known as underwater acoustic communication. Acoustic waves replace radio waves, propagating at a speed of about 1.5×10³ m/s. The drop in communication speed, from the speed of light to the speed of sound, results in propagation delays roughly five orders of magnitude higher, which can be problematic for real-time applications [1]. In the oceanic literature, shallow water refers to water with a depth lower than 100 m, while deep water refers to deeper oceans. There are two common kinds of geometric spreading in the underwater environment: spherical (omni-directional point source), which characterizes deep water communications, and cylindrical (horizontal radiation only), which characterizes shallow water communications. In acoustic networks, the power required for transmitting is typically about 100 times the power required for receiving [2], and nodes move at 1-3 m/s with the water current [3].

Manuscript received and revised September 2013, accepted October 2013


V. Hadidi et al. proposed a new multipath scheme for shallow water which can guarantee a certain end-to-end packet error rate while achieving a good balance between overall energy efficiency and end-to-end packet delay, together with a localization algorithm [4]. The deployment of ocean sensor networks has previously been discussed with location-based strategies and heuristic algorithms [12]. The objective of this paper is to find a routing protocol with a feasible solution, incorporating the transmission power, the receiving power and the spherical spreading type for energy efficiency, and considering node mobility through the random waypoint model, for optimal routing in underwater flooding-based networks. The evaluation framework for effectiveness is based on location-free techniques: Depth Based Routing (DBR), Energy Efficient Depth Based Routing (EEDBR) and Novel Energy Efficient Depth Based Routing (NEEDBR). The depth analysis considers the packet delivery ratio, dropping ratio, delay, jitter, throughput, energy consumption, packet size, report interval, pause time and transmission range. NEEDBR, by its design, outperforms EEDBR and DBR.



II. Related Works

II.1. Architectural Overview

In the two-dimensional architecture, a group of sensor nodes is anchored to the bottom of the ocean with deep-ocean anchors. The three-dimensional architecture is used to measure the ocean column at preset depths, which paves the way for sensing coverage and communication coverage in a wireless scenario [1], [5]. Underwater sensor nodes are interconnected to one or more underwater sinks (uw-sinks) by means of wireless acoustic links. The network devices in charge of relaying data from the ocean-bottom network to a surface station use vertical and horizontal links, which are further used to monitor the surface at the bottom of the ocean [1].

II.2. Taxonomy of Routing Protocols in Terrestrial Networks

The routing protocols proposed for terrestrial mobile and ad hoc networks usually fall into two categories: proactive and reactive. Unfortunately, protocols belonging to both of these extremes are not suitable for underwater sensor networks. Proactive (table-driven) protocols require a large signalling overhead to establish end-to-end routes, especially for the first time and whenever any change occurs in the topology. Reactive protocols incur a higher latency and still require source-initiated flooding of control packets to establish paths. GPS is inapplicable to the underwater environment due to the rapid absorption of high frequencies in water [5].

II.3. Underwater Protocols Based on Routing

Abdul Wahid and Kim Dongkyun classified routing protocols into flooding based, multipath based, cluster based and miscellaneous protocols [6]. Flooding-based protocols transmit to all the nodes in a region, so duplicate packets are transmitted and more energy is consumed. In the multipath-based approach, more than one path is established from a source node to a destination node, which improves metrics such as robustness and packet delivery ratio, but at the cost of contention. In cluster-based protocols, sensor nodes are grouped into clusters and classified as cluster heads and cluster members. Miscellaneous approaches assign priorities to packets based on the characteristics of the sensed data [6].

Underwater Protocols Based on Localization

Mohammad Taghi Kheirabadi and Mohd Murdha Mohamad classified the routing protocols for underwater acoustic sensor networks based on localization. Location-based (geographical) routing assumes that each node knows geographical information about itself and the sink. Location-free (non-geographical) routing protocols do not employ full geographical information for routing; they are further classified as pressure based, where nodes are equipped with pressure sensors, and beacon based, where beacon messages are exchanged between sensors and beacon nodes [7].

II.4. Evaluation of Flooding Based Protocol for Effectiveness in Location Free Routing

Fig. 1. Schematic work flow (challenges: propagation speed of acoustic communication, node mobility due to oceanic currents; performance metrics: packet delivery ratio, dropping ratio, delay, jitter, throughput, energy consumption, network lifetime; location-free depth-based protocols: DBR, EEDBR, NEEDBR)

III. Existing Depth Based Routing Protocols

Depth Based Routing (DBR)
Forwarder node criteria: depth.





Flooding criteria: a neighbouring node that receives the packet calculates its own depth via its pressure sensor and compares it with the depth embedded in the data packet. If its depth is smaller than the depth in the packet, the node is located in the positive progress area and is a candidate for packet forwarding; otherwise, it simply discards the packet [8].

Depth Based Multihop Routing (DBMR)
Forwarder node criteria: depth/energy.
Flooding criteria: only one node is selected as the next-hop node, to reduce the communication overhead. A sleep/wake duty cycle is used for the rest of the nodes [9].

Energy Efficient Depth Based Routing (EEDBR)
Forwarder node criteria: depth/energy.
Flooding criteria: the nodes on the candidate list are sorted by their residual energy, which determines their priorities. In order to prevent redundant forwarding of the same data packet, each candidate node holds the packet for a holding time set according to its residual energy and priority: a shorter holding time is assigned to a node with more residual energy [10].

III.1. Problem of the Previous Work

In DBR, the nodes chosen as next hops influence the energy consumption of the network as well as its lifetime. DBR faces problems in both directions: as the number of nodes increases, energy consumption becomes high, and as the number of nodes decreases, the delivery ratio suffers from the resulting sparseness [8]. In depth-based multihop routing, energy saving for time-critical data cannot be achieved until sensors go to the sleep state, because of the duty cycle [9]. The priority value used in Energy Efficient Depth Based Routing considers only the residual energy and not the fitness of the sending and forwarding nodes [10]. Moreover, the priority value is assigned for the cases of same depth and different energy, different depth and same energy, different depth and different energy, and same depth and same energy; this cannot be considered energy efficient, since a pressure sensor moves more in the horizontal direction than in the vertical direction. Energy Efficient Fitness Based Routing has been proposed, but it does not incorporate node mobility or criteria for the transmission and reception power.

IV. Routing Protocols for Depth Analysis in Underwater Acoustic Sensor Networks

IV.1. Depth Based Routing

Depth-based routing (DBR) [8] was the first pressure routing protocol proposed for the underwater environment. In this protocol, each node is equipped with a


pressure sensor to calculate its depth locally. DBR employs only depth information to perform greedy routing in UWASNs. In the DBR architecture, multiple stationary sinks are deployed on the water surface, while ordinary nodes are randomly scattered at different depths and can move freely with the water flow.

Algorithm used for Routing in DBR
Every packet carries an SID (the identifier of the source node) and a PSN (Packet Sequence Number), a unique sequence number assigned by the source node to the packet. The PSN together with the SID is used to differentiate packets during data forwarding. These values are stored in a packet history buffer Q2. To keep its neighbour list valid under node mobility, the sender broadcasts a message, and upon receiving the broadcast each node updates its neighbour list. The sender also has an expiry time: if it cannot find a neighbour within a certain time, its clock expires. After updating its neighbour list, each node updates its depth information, and the current node forwards the packet only if its depth is less than that of the previous hop and the packet's unique ID is not in Q2. If the unique ID is already in Q2, the node checks whether the hop count is less than the previous hop count between the current node and the previous node and whether the depth criterion is still satisfied. At a node, an incoming packet is inserted into the priority queue Q1 if its unique ID is not in the packet history buffer Q2 and it was sent from a node whose previous-hop depth is greater than that of the current forwarder. If a packet currently in Q1 is received again during the holding time, the packet is removed from Q1 when the new copy comes from a node with a smaller depth, that is, when the new previous-hop depth dp is less than the depth dc of the current node; its scheduled sending time is updated instead when the new copy comes from a node whose depth dp is greater than dc.

After a node sends out a packet as scheduled, the packet is removed from Q1 and its unique ID is inserted into Q2. For the Holding Time Calculation, a node uses a holding time to schedule packet forwarding. At a node, the holding time for a packet is calculated from the difference between the depth of the packet's previous hop and the depth of the current node, so nodes at different depths have different holding times even for the same packet. To reduce the number of hops along the forwarding paths to the water surface, DBR tries to select the neighbouring node with the minimal depth as the first one to forward a packet. It also tries to prevent other neighbouring nodes from forwarding the same packet, to reduce energy consumption. Based on this analysis, the holding time must satisfy two conditions: (1) the holding time should decrease as the depth difference increases, and (2) the difference between the holding times of two neighbouring nodes should be long enough so that
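The queue discipline above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: the class name, the holding-time form (2τ/δ)(R − d), and the parameter values are illustrative assumptions chosen to satisfy the two holding-time conditions just stated.

```python
import heapq

class DBRNode:
    """Sketch of DBR per-node packet handling (hypothetical helper).
    Q1 is the priority queue of packets awaiting their scheduled sending
    time; Q2 is the packet history buffer of already-forwarded IDs."""

    def __init__(self, depth, tx_range=100.0, sound_speed=1500.0, delta=10.0):
        self.depth = depth                    # from the pressure sensor (m)
        self.R = tx_range                     # maximal transmission range (m)
        self.tau = tx_range / sound_speed     # max one-hop propagation delay (s)
        self.delta = delta                    # tuning parameter (assumed)
        self.q1 = []                          # (scheduled send time, packet id)
        self.q2 = set()                       # forwarded (SID, PSN) pairs

    def holding_time(self, d_prev):
        # Decreases as the depth advance grows, so the shallowest
        # neighbour forwards first (assumed form).
        d = d_prev - self.depth
        return (2.0 * self.tau / self.delta) * (self.R - d)

    def on_receive(self, pkt_id, d_prev, now, d_th=0.0):
        """pkt_id is the (SID, PSN) pair embedded in the packet; d_th is the
        depth threshold: forward only if the depth advance exceeds it."""
        if pkt_id in self.q2 or d_prev - self.depth <= d_th:
            return                            # already forwarded or no progress
        heapq.heappush(self.q1, (now + self.holding_time(d_prev), pkt_id))

    def on_duplicate(self, pkt_id, d_prev_new):
        # Overhearing the packet from a shallower node makes our copy redundant.
        if d_prev_new < self.depth:
            self.q1 = [(t, p) for t, p in self.q1 if p != pkt_id]
            heapq.heapify(self.q1)

    def on_send(self, pkt_id):
        self.q1 = [(t, p) for t, p in self.q1 if p != pkt_id]
        heapq.heapify(self.q1)
        self.q2.add(pkt_id)                   # never forward this packet again
```

With these assumed parameters, a node at 120 m receiving from a previous hop at 150 m (a 30 m advance) schedules its forwarding after roughly 0.93 s, and cancels it if a duplicate is overheard from a shallower node.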


the forwarding by the node with the smaller depth can be heard by the other node in time (before the deeper node starts its own packet forwarding).

Depth Threshold
To further control the number of nodes involved in packet forwarding, DBR introduces another global parameter, the depth threshold dth. A node forwards a packet only if the difference between the depth of the packet's previous hop, dp, and the depth of the current node, dc, is larger than the threshold dth. dth can be positive, zero, or negative. If dth is set to 0, all nodes with smaller depths than the current node are qualified forwarding candidates. If dth is set to −R, where R is the maximal transmission range of a sensor node, DBR becomes a flooding protocol. Clearly, the depth threshold represents the trade-off between packet delivery ratio and energy consumption. With a larger threshold, fewer nodes are involved in packet forwarding, and thus less energy is consumed, but the packet delivery ratio is lower. Conversely, with a smaller threshold, more nodes participate in packet delivery, and thus a higher packet delivery ratio can be achieved, but the energy consumption is higher.

IV.2. Energy Efficient Localization-Free Routing (EEDBR) for UWASNs

EEDBR (Energy Efficient Depth Based Routing) [10] is a sender-based routing protocol in which the sender node selects a set of next-hop nodes based on their depth and residual energy.
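This sender-side selection can be sketched as follows. The function names, the neighbour-table fields, and the exact holding-time form are illustrative assumptions, not the paper's code; the rule itself (shallower neighbours only, ordered by residual energy, with more energy giving a shorter hold) is taken from the text.

```python
def select_forwarders(sender_depth, neighbours):
    """EEDBR-style sender-side selection: keep only neighbours shallower than
    the sender and order them by residual energy, highest first. `neighbours`
    is a list of dicts with 'id', 'depth', and 'energy' keys (a hypothetical
    neighbour-table format)."""
    shallower = [n for n in neighbours if n["depth"] < sender_depth]
    return sorted(shallower, key=lambda n: n["energy"], reverse=True)

def holding_time(residual, initial, t_max, priority):
    """Assumed form of the EEDBR holding time: a node with more residual
    energy holds the packet for less time, and `priority` separates nodes
    of equal energy."""
    return (1.0 - residual / initial) * t_max + priority

neighbours = [
    {"id": "a", "depth": 80.0, "energy": 900.0},
    {"id": "b", "depth": 95.0, "energy": 400.0},
    {"id": "c", "depth": 130.0, "energy": 990.0},  # deeper than sender: excluded
]
forwarders = select_forwarders(sender_depth=120.0, neighbours=neighbours)
# forwarders -> node "a" then node "b"; "a" also gets the shorter holding time
```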

Algorithm used for Routing in EEDBR
Every packet carries an SID (the identifier of the source node), a PSN (Packet Sequence Number, a unique sequence number assigned by the source node to the packet), and the residual energy, calculated as the initial energy minus the energy already consumed. Operations such as transmitting, receiving, processing, and idle listening consume different amounts of energy, leaving the sensor nodes at different residual energy levels. The residual energy information of the sensor nodes therefore needs to be updated frequently, which is done in the Knowledge Acquisition Phase. To keep its neighbour list valid under node mobility, the sender broadcasts a message, and upon receiving the broadcast each node updates its neighbour list. The sender also has an expiry time: if it cannot find a neighbour within a certain time, its clock expires. In the Data Forwarding Phase, the data packets are forwarded from a source node towards a destination/sink node on the basis of the depth and the residual energy information of the sensor nodes. The depth information allows the selection of those forwarding nodes which are closer to the sink than the sender of the data packet. In addition, the residual energy information is used to select the node having the highest residual energy among the neighbours; the selection of nodes having high energy attempts to balance the energy consumption among the sensor nodes. Since each sensor node in EEDBR has information about its neighbours' depth and residual energy, a sending node can select the most suitable next-hop forwarding nodes: it selects a set of forwarding nodes among those neighbours having a smaller depth than itself, and this set is included as a list of IDs in the data packet. Upon receiving the data packet, the forwarding nodes hold the packet for a certain time based on their residual energy; a sensor node having more residual energy has a shorter holding time. The holding time is:

T = (1 − P) × Tmax + p

where Tmax is the maximum holding time, p is a small priority offset, and P is the priority value:

P = (Residual energy × Depthdiff) / dsf

with:
Residual energy: the remaining energy of the forwarding node;
Depthdiff: the depth difference (the current node forwards the packet only if its depth is less than that of the previous node);
dsf: the distance between the sending and forwarding nodes.
Nodes having the same residual energy will have different holding times even for the same packet.

IV.3. Novel Energy Efficient Depth Based Routing


Energy Efficient Fitness Based Routing [11] is a sender-based routing protocol in which the sender node forwards the packet considering its depth, its residual energy, and the distances between the sending node, the forwarding nodes, and the sink. However, it incorporates neither the mobility of nodes nor criteria for transmission power and reception power, even though the residual energy depends on the transmission and reception power. The proposed Novel Energy Efficient Depth Based Routing (NEEDBR) is similar to Energy Efficient Fitness Based Routing, but our simulation incorporates node mobility, transmission power, and reception power.

Algorithm used for Routing in NEEDBR
Every packet carries an SID (the identifier of the source node), a PSN (Packet Sequence Number, a unique sequence number assigned by the source node to the packet), and the residual energy, calculated as the initial energy minus the energy already consumed. Operations such as transmitting, receiving, processing, and idle listening consume different amounts of energy, leaving the sensor nodes at different residual energy levels.
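Because the forwarding decision hinges on these per-node quantities, the NEEDBR fitness and waiting rule described in this section can be sketched as follows. The function names and the exact combination in `waiting_time` are assumptions reconstructed from the text, not the authors' code.

```python
def fitness(residual_energy, depth_diff, d_sf, d_fd):
    """NEEDBR fitness of a candidate forwarder as described in the text:
    g(nf) = residual_energy * depth_diff / d_sf, weighted by h(nf) = 1/d_fd,
    the inverse of the forwarder-to-sink distance."""
    g = residual_energy * depth_diff / d_sf
    h = 1.0 / d_fd
    return g * h

def waiting_time(fit, d_max, sound_speed=1500.0):
    """Assumed waiting rule: the priority value (inverse of the fitness)
    scaled by the propagation time D/V, where D is the maximal forwarder
    distance within the upper half of the transmission range and V is the
    speed of sound. Higher fitness gives a shorter wait."""
    priority = 1.0 / fit
    return priority * (d_max / sound_speed)

# Two candidates with the same depth advance: the one with more residual
# energy and closer to the sink is fitter, so it waits less and forwards first.
f_near = fitness(900.0, 30.0, d_sf=40.0, d_fd=200.0)
f_far = fitness(500.0, 30.0, d_sf=40.0, d_fd=400.0)
```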


Therefore, the residual energy information of the sensor nodes needs to be updated frequently. To keep its neighbour list valid under node mobility, the sender broadcasts a message, and upon receiving the broadcast each node updates its neighbour list. The sender also has an expiry time: if it cannot find a neighbour within a certain time, its clock expires. In the process of Data Forwarding, the fitness is calculated by considering the depth, the residual energy, and the distances between the sending node, the forwarding nodes, and the sink:

fitness(nf) = g(nf) × h(nf)

where:

g(nf) = (Residual energy × Depthdiff) / dsf

h(nf) = 1 / dfd

with:
Residual energy: the remaining energy of the forwarding node;
Depthdiff: the depth difference (the current node forwards the packet only if its depth is less than that of the previous node);
dsf: the distance between the sending and forwarding nodes;
dfd: the distance between the forwarding node and the sink.
h(nf) is the inverse of the distance between the forwarding node and the sink, because the smaller the distance between two nodes, the more appropriate the node is to forward the packet.

IV.4. Holding Time Calculation

The priority value for the holding time is:

P = 1 / g(nf)

Nodes with a higher depth wait for more time, and nodes with a higher residual energy wait for less time: the higher the fitness, the less time the forwarding node waits. The ultimate waiting time is calculated as:

T = P × τ, with τ = D / V

where D is the maximal distance of a forwarding node within the upper half of the node's transmission range and V is the velocity of sound.

V. Simulation Results and Discussion

The simulations in our scenario are based on ns-2 with the Aqua-Sim patch [13], in which transmission relies on sound, using the random waypoint mobility model.

TABLE I
SIMULATION SPECIFIC PARAMETERS
Protocol: DBR, EEDBR, NEEDBR
Area: 500 × 500
Maximum no. of nodes: 100
Node speed: 1-3 m/s
Initial energy: 1000 J
Transmission power: 1 W
Receiving power: 0.01 W
Idle power: 0.08 W

Depth Based Routing: Constant Energy, Different Depth

Packet delivery ratio = (Number of packets received / Number of packets sent) × 100

Fig. 2. The graph has been plotted with the number of nodes along the X-axis and the packet delivery ratio along the Y-axis. The simulation has been performed varying the depth over 100 m, 150 m, 200 m, and 250 m with the same energy level of 1000 J. The packet delivery ratio increases as the depth increases


Fig. 3. The graph has been plotted with the number of nodes along the X-axis and the dropping ratio along the Y-axis. The simulation has been performed varying the depth with the same energy level of 1000 J. The dropping ratio is the least as the depth increases

Fig. 4. The graph has been plotted with the number of nodes along the X-axis and the throughput along the Y-axis in bits per second. The simulation has been performed varying the depth over 100 m, 150 m, 200 m, and 250 m with the same energy level of 1000 J

Throughput = (Average packet size × 8) / Data duration

Fig. 5. The graph has been plotted with the number of nodes along the X-axis and the delay along the Y-axis in seconds. The simulation has been performed varying the depth over 100 m, 150 m, 200 m, and 250 m with the same energy level of 1000 J

Delay = Receiving time(j) − Sending time(j)

Fig. 6. The graph has been plotted with the number of nodes along the X-axis and the jitter along the Y-axis in seconds. The simulation has been performed varying the depth over 100 m, 150 m, 200 m, and 250 m with the same energy level of 1000 J. There is a minimum value of jitter at 150 m, 200 m, and 250 m compared with the depth of 100 m

Jitter = (sending time − last received packet time) / receiver count

Fig. 7. The graph has been plotted with the number of nodes along the X-axis and the average energy in joules along the Y-axis. The simulation has been performed varying the depth over 100 m, 150 m, 200 m, and 250 m with the same energy level of 1000 J

Average energy consumption = Total consumed energy / Number of nodes

Different Depth, Different Energy Levels
The diverse values of depth have been analysed along with a subsequent increase in the energy levels.

Fig. 8. The graph has been plotted with the number of nodes along the X-axis and the packet delivery ratio along the Y-axis. The simulation has been performed varying the depth over 100 m, 200 m, and 300 m with the energy level varying over 100 J, 200 J, and 300 J. The packet delivery ratio increases as the depth increases along with the energy

Fig. 9. The graph has been plotted with the number of nodes along the X-axis and the dropping ratio along the Y-axis. The simulation has been performed varying the depth over 100 m, 200 m, and 300 m with the energy level varying over 100 J, 200 J, and 300 J. The dropping ratio is the least as the depth increases, that is, at a depth of 300 m with an energy of 300 J

Fig. 10. The graph has been plotted with the number of nodes along the X-axis and the delay along the Y-axis in seconds. The simulation has been performed varying the depth over 100 m, 200 m, and 300 m with the energy level varying over 100 J, 200 J, and 300 J. The delay value is the least at a depth of 100 m and increases at 300 m and 200 m

Fig. 11. The graph has been plotted with the number of nodes along the X-axis and the jitter along the Y-axis in seconds. The simulation has been performed varying the depth over 100 m, 200 m, and 300 m with the energy level varying over 100 J, 200 J, and 300 J. There is a minimum value of jitter at a depth of 300 m compared with 200 m and 100 m

Fig. 12. The graph has been plotted with the number of nodes along the X-axis and the throughput along the Y-axis in bits per second. The simulation has been performed varying the depth with the energy level varying from 100 J. Throughput increases as the depth increases along with the energy

Fig. 13. The graph has been plotted with the number of nodes along the X-axis and the average energy along the Y-axis in joules. The simulation has been performed varying the depth with the energy level varying over 100 J, 200 J, and 300 J. There is a minimum value of average energy consumption at a depth of 300 m compared with 100 m and 200 m

Energy Efficient Depth Based Routing: Constant Energy, Different Depth

Fig. 14. The graph has been plotted with the number of nodes along the X-axis and the packet delivery ratio along the Y-axis. The simulation has been performed varying the depth over 100 m, 150 m, 200 m, and 250 m with the same energy level of 1000 J. The packet delivery ratio increases as the depth increases

Fig. 15. The graph has been plotted with the number of nodes along the X-axis and the dropping ratio along the Y-axis. The simulation has been performed varying the depth over 100 m, 150 m, 200 m, and 250 m with the same energy level of 1000 J. The dropping ratio is the least as the depth increases

Fig. 16. The graph has been plotted with the number of nodes along the X-axis and the throughput along the Y-axis in bits per second. The simulation has been performed varying the depth over 100 m, 150 m, 200 m, and 250 m with the same energy level of 1000 J. Throughput increases as the depth increases

Fig. 17. The graph has been plotted with the number of nodes along the X-axis and the delay along the Y-axis in seconds. The simulation has been performed varying the depth with the same energy level of 1000 J. The delay value is the least at a depth of 100 m compared with 150 m, 200 m, and 250 m

Fig. 18. The graph has been plotted with the number of nodes along the X-axis and the jitter along the Y-axis in seconds. The simulation has been performed varying the depth over 100 m, 150 m, 200 m, and 250 m with the same energy level of 1000 J

Fig. 19. The graph has been plotted with the number of nodes along the X-axis and the average energy along the Y-axis in joules. The simulation has been performed varying the depth with the same energy level of 1000 J. There is a minimum value of average energy consumption at a depth of 100 m compared with 250 m, 150 m, and 200 m

Different Depth, Different Energy Levels

Fig. 20. The graph has been plotted with the number of nodes along the X-axis and the packet delivery ratio along the Y-axis. The simulation has been performed varying the depth with the energy level varying from 100 J. The packet delivery ratio increases as the depth increases along with the energy

Fig. 21. The graph has been plotted with the number of nodes along the X-axis and the dropping ratio along the Y-axis in packets. The simulation has been performed varying the depth with the energy level varying from 100 J. The dropping ratio is the least as the depth increases, that is, at a depth of 300 m with an energy of 300 J

Fig. 22. The graph has been plotted with the number of nodes along the X-axis and the delay along the Y-axis. The simulation has been performed varying the depth over 100 m, 200 m, and 300 m with the energy level varying over 100 J, 200 J, and 300 J. The delay value is the least at a depth of 300 m and an energy of 300 J

Fig. 23. The graph has been plotted with the number of nodes along the X-axis and the jitter along the Y-axis. The simulation has been performed varying the depth over 100 m, 200 m, and 300 m with the energy level varying over 100 J, 200 J, and 300 J. There is a minimum value of jitter at a depth of 300 m and an energy of 300 J

Fig. 24. The graph has been plotted with the number of nodes along the X-axis and the throughput along the Y-axis in bits per second. The simulation has been performed varying the depth over 100 m, 200 m, and 300 m with the energy level varying over 100 J, 200 J, and 300 J. Throughput increases as the depth increases

Fig. 25. The graph has been plotted with the number of nodes along the X-axis and the average energy along the Y-axis in joules. The simulation has been performed varying the depth with the energy level varying from 100 J. There is a minimum value of average energy consumption at a depth of 100 m compared with 200 m and 300 m

Novel Energy Efficient Depth Based Routing: Constant Energy, Different Depth

Fig. 26. The graph has been plotted with the number of nodes along the X-axis and the packet delivery ratio along the Y-axis. The simulation has been performed varying the depth with the same energy level of 1000 J. The packet delivery ratio increases as the depth increases

Fig. 27. The graph has been plotted with the number of nodes along the X-axis and the dropping ratio along the Y-axis. The simulation has been performed varying the depth over 100 m, 150 m, 200 m, and 250 m with the same energy level of 1000 J. The dropping ratio is the least as the depth increases

Fig. 28. The graph has been plotted with the number of nodes along the X-axis and the delay in seconds along the Y-axis. The simulation has been performed varying the depth over 100 m, 150 m, 200 m, and 250 m with the same energy level of 1000 J. The delay value is the least at depth

Fig. 29. The graph has been plotted with the number of nodes along the X-axis and the jitter along the Y-axis in seconds. The simulation has been performed varying the depth over 100 m, 150 m, 200 m, and 250 m with the same energy level of 1000 J. There is a minimum value of jitter at depths of 250 m, 200 m, and 150 m compared with 100 m

Fig. 30. The graph has been plotted with the number of nodes along the X-axis and the throughput along the Y-axis in bits per second. The simulation has been performed varying the depth over 100 m, 150 m, 200 m, and 250 m with the same energy level of 1000 J. Throughput increases as the depth increases

Fig. 31. The graph has been plotted with the number of nodes along the X-axis and the average energy along the Y-axis in joules. The simulation has been performed varying the depth over 100 m, 150 m, 200 m, and 250 m with the same energy level of 1000 J. There is a minimum value of average energy consumption at a depth of 100 m compared with 150 m, 200 m, and 250 m

Different Depth, Different Energy

Fig. 32. The graph has been plotted with the number of nodes along the X-axis and the packet delivery ratio along the Y-axis in packets. The simulation has been performed varying the depth over 100 m, 200 m, and 300 m with the energy level varying over 100 J, 200 J, and 300 J. The packet delivery ratio increases as the depth increases

Fig. 33. The graph has been plotted with the number of nodes along the X-axis and the dropping ratio along the Y-axis in packets. The simulation has been performed varying the depth at 100 m, 200 m, and 300 m with the energy level varying from 100 J. The dropping ratio is the least as the depth increases

Fig. 34. The graph has been plotted with the number of nodes along the X-axis and the delay along the Y-axis in seconds. The simulation has been performed varying the depth with the energy level varying from 100 J. The delay value is the least at a depth of 100 m compared with 300 m and 200 m

Fig. 35. The graph has been plotted with the number of nodes along the X-axis and the jitter along the Y-axis in seconds. The simulation has been performed varying the depth over 100 m, 200 m, and 300 m with the energy level varying over 100 J, 200 J, and 300 J. There is a minimum value of jitter at depths of 300 m and 200 m compared with 100 m

Fig. 36. The graph has been plotted with the number of nodes along the X-axis and the throughput along the Y-axis in bits per second. The simulation has been performed varying the depth with the energy level varying from 100 J. Throughput increases as the depth increases

Fig. 37. The graph has been plotted with the number of nodes along the X-axis and the average energy along the Y-axis in joules. The simulation has been performed varying the depth with the energy level varying from 100 J. There is a minimum value of average energy consumption at a depth of 300 m compared with 100 m and 200 m

Overall Performance Metrics: Energy 1000 J, Depth 150 m

Fig. 38. The graph has been plotted with the number of nodes along the X-axis and the packet delivery ratio along the Y-axis. The simulation has been performed with a constant depth of 150 m and the same energy level of 1000 J. The packet delivery ratio is higher for NEEDBR than for EEDBR and DBR

Fig. 39. The graph has been plotted with the number of nodes along the X-axis and the dropping ratio along the Y-axis. The simulation has been performed with a constant depth and a constant energy level of 1000 J. The dropping ratio is the least for NEEDBR, compared with DBR and EEDBR

Fig. 40. The graph has been plotted with the number of nodes along the X-axis and the delay along the Y-axis in seconds. The simulation has been performed with a constant depth of 150 m and a constant energy level of 1000 J. The delay value is the least for EEDBR and NEEDBR compared with DBR

Fig. 41. The graph has been plotted with the number of nodes along the X-axis and the jitter along the Y-axis in seconds. The simulation has been performed with a constant depth of 150 m and a constant energy level of 1000 J. There is a minimum value of jitter for NEEDBR and EEDBR compared with DBR

Fig. 42. The graph has been plotted with the number of nodes along the X-axis and the throughput along the Y-axis in bits per second. The simulation has been performed with a constant depth of 150 m and a constant energy level of 1000 J. Throughput is the maximum for NEEDBR compared with EEDBR and DBR

Fig. 43. The graph has been plotted with the number of nodes along the X-axis and the average energy along the Y-axis in joules. The simulation has been performed with the same depth of 150 m and the same energy level of 1000 J. The average energy consumption of EEDBR is lower than that of DBR and NEEDBR

Overall Comparisons

Fig. 44. The graph has been plotted with the transmission range along the X-axis and the throughput along the Y-axis in bits per second. Throughput is the maximum for NEEDBR compared with EEDBR and DBR

Fig. 45. The graph has been plotted with the transmission range along the X-axis in metres and the delay along the Y-axis in seconds. The delay value is the least for EEDBR and NEEDBR compared with DBR

Report Interval

Fig. 46. The graph has been plotted with report intervals of 2, 3, 4, 5, and 6 s along the X-axis and the packet delivery ratio along the Y-axis. The packet delivery ratio for NEEDBR is much higher than for EEDBR and DBR at the end of the simulation time for 100 nodes

Fig. 47. The graph has been plotted with report intervals of 2, 3, 4, 5, and 6 s along the X-axis and the jitter along the Y-axis in seconds. As the report interval increases the jitter increases, but the jitter (the variance of delay, measured as the time gap between packets) for DBR is higher than for EEDBR and NEEDBR

Fig. 48. The graph has been plotted with report intervals of 2, 3, 4, 5, and 6 s along the X-axis and the delay along the Y-axis in seconds. The delay for DBR is higher than for EEDBR and NEEDBR. NEEDBR takes into account the residual energy and the distances between the sender, the forwarder, and the sink. NEEDBR is considered the best

Packet Size

Fig. 49. The graph has been plotted with packet sizes of 50, 75, 100, 125, and 150 bytes along the X-axis and the jitter along the Y-axis in seconds. As the packet size increases the jitter increases, but the jitter (variance of delay) for DBR is higher than for EEDBR and NEEDBR

Pause Time
Pause time is the time at which node mobility is stopped.

Fig. 50. The graph has been plotted with the number of nodes (50, 60, 70, 80, 90, 100) along the X-axis and the average energy in joules along the Y-axis. The average energy consumed by NEEDBR is lower than that of EEDBR and DBR at the end of our simulation time

Fig. 51. The graph has been plotted with the number of nodes (50, 60, 70, 80, 90, 100) along the X-axis and the delay in seconds along the Y-axis. The delay for DBR is higher than for EEDBR and NEEDBR. NEEDBR takes into account the residual energy and the distances between the sender, the forwarder, and the sink. NEEDBR outperforms the others
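The metrics reported throughout this section can be reproduced from a packet trace. A minimal sketch, assuming a hypothetical trace format of (send time, receive time) pairs for delivered packets and taking jitter as the mean gap between consecutive arrivals:

```python
def metrics(sent, delivered, total_energy, num_nodes, avg_pkt_bytes, duration):
    """Computes the Section V metrics from a toy trace. `delivered` holds
    (send_time, receive_time) pairs for packets that reached a sink
    (hypothetical format); `sent` is the total number of packets sent."""
    pdr = 100.0 * len(delivered) / sent                    # delivery ratio, %
    throughput = (avg_pkt_bytes * 8) / duration            # bits per second
    delays = [rt - st for st, rt in delivered]
    avg_delay = sum(delays) / len(delays)                  # end-to-end delay
    # Jitter taken as the mean gap between consecutive packet arrivals.
    arrivals = sorted(rt for _, rt in delivered)
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    jitter = sum(gaps) / len(gaps) if gaps else 0.0
    avg_energy = total_energy / num_nodes                  # joules per node
    return {"pdr": pdr, "throughput": throughput, "delay": avg_delay,
            "jitter": jitter, "avg_energy": avg_energy}

m = metrics(sent=100, delivered=[(0.0, 0.5), (1.0, 1.6), (2.0, 2.4)],
            total_energy=3000.0, num_nodes=50, avg_pkt_bytes=100, duration=10.0)
# m["pdr"] == 3.0, m["throughput"] == 80.0, m["avg_energy"] == 60.0
```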


VI. Conclusion

In this paper, simulations are compared against Depth Based Routing, the most cited protocol in depth-based analysis, whose greedy approach uses depth as the only criterion for forwarding packets to the sink. Energy Efficient Depth Based Routing considers the depth, the residual energy, and a priority value based on the distance between the forwarding node and the sending node, whereas the Novel Energy Efficient Depth Based Routing considers the depth, the residual energy, and a priority value based on the distances between the sending node, the forwarding node, and the sink node, so that the ultimate aim of delivering the packet to the nearest sink is satisfied. The decision rules for the holding time of the packet that incorporate these criteria are found to be better for NEEDBR and EEDBR than for DBR.

References
[1] I. F. Akyildiz, D. Pompili, T. Melodia, "Underwater acoustic sensor networks: research challenges," Ad Hoc Networks, Vol. 3, 2005, pp. 257-279.
[2] J. Partan, J. Kurose, B. N. Levine, "A survey of practical issues in underwater networks," Proceedings of the ACM International Workshop on Underwater Networks (WUWNet), September 2006, pp. 17-24.
[3] M. Ayaz, I. Baig, A. Abdullah, I. Faye, "A survey on routing techniques in underwater wireless sensor networks," Journal of Network and Computer Applications, Vol. 34, No. 6, pp. 1908-1927, 2011.
[4] V. Hadidi, R. Javidan, M. Keshtgary, A. Hadidi, "A suitable architecture and deployment considerations for shallow water acoustic sensor networks," (2011) International Review on Computers and Software (IRECOS), 6 (6), pp. 965-978.
[5] I. F. Akyildiz, D. Pompili, T. Melodia, "State-of-the-art in protocol research for underwater acoustic sensor networks," Proceedings of the 1st ACM International Workshop on Underwater Networks, pp. 7-16, September 2006.
[6] A. Wahid, D. Kim, "Analyzing routing protocols for underwater wireless sensor networks," International Journal of Communication Networks and Information Security (IJCNIS), Vol. 2, No. 3, December 2010.
[7] M. T. Kheirabadi, M. M. Mohamad, "Greedy routing in underwater acoustic sensor networks: a survey," International Journal of Distributed Sensor Networks, Vol. 2013, Article ID 701834, 21 pages.
[8] H. Yan, Z. J. Shi, J. H. Cui, "DBR: depth-based routing for underwater sensor networks," Proceedings of the 7th International IFIP-TC6 Networking Conference on Ad Hoc and Sensor Networks, Wireless Networks, Next Generation Internet, Vol. 4982, pp. 72-86, 2008.
[9] G. Liu, Z. Li, "Depth-based multi-hop routing protocol for underwater sensor network," Proceedings of the 2nd International Conference on Industrial Mechatronics and Automation (ICIMA '10), pp. 268-270, May 2010.
[10] A. Wahid, D. Kim, "An energy efficient localization-free routing protocol for underwater wireless sensor networks," International Journal of Distributed Sensor Networks, Vol. 2012, Article ID 307246, 11 pages, 2012.
[11] Md. Ashrafuddin, Md. M. Islam, Md. Mamun-or-Rashid, "Energy efficient fitness based routing protocol for underwater sensor network," I.J. Intelligent Systems and Applications, Vol. 5, No. 6, pp. 61-69, May 2013.

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

[12] B. Zeng, L. Yao, R. Wang, "An energy efficient deployment scheme for ocean sensor networks," International Review on Computers and Software (IRECOS), Vol. 8, No. 2, pp. 507-513, 2013.
[13] P. Xie, Z. Zhou, Z. Peng, H. Yan, T. Hu, J.-H. Cui, Z. Shi, Y. Fei, S. Zhou, "Aqua-Sim: an NS-2 based simulator for underwater sensor networks," OCEANS 2009, MTS/IEEE Biloxi - Marine Technology for Our Future: Global and Local Challenges, IEEE, 2009.

Authors’ information

J. V. Anand is a full-time research scholar at M.A.M College of Engineering, Tiruchy. He completed his Bachelor's degree in Electronics and Communication at Karunya University, Coimbatore, India, in 2009, and his post-graduation in Communication Systems at Hindustan University, Chennai, in 2011. He joined his research programme at Anna University, with M.A.M College as his study centre, in 2013. His areas of interest are wireless sensor networks and ad-hoc networks.
E-mail: [email protected]

Dr. S. Titus is Professor and Head of the Department of Electrical and Electronics Engineering at M.A.M College of Engineering, Tiruchy. He received his Bachelor's degree in Electrical and Electronics Engineering at Sathyabama Engineering College, Chennai, and his Master's degree in Power Electronics and Drives at Shanmuga Engineering College, Tanjore. He pursued his Doctor of Philosophy in power system optimization at Anna University, Chennai, through Government College of Engineering, Coimbatore. His research interests are power electronics and drives, power system optimization, and wireless sensor networks. He has extensive teaching and research experience, and has conducted several AICTE-funded research programmes in the fields of power electronics drives, wind energy and solar photovoltaic cells. Dr. S. Titus is a Life Member of the Indian Society for Technical Education (ISTE-LM37286), a Member of the Institute of Electrical and Electronics Engineers (IEEE-92025358) and a Member of the Institute of Engineers (MIEM-143709-1).
E-mail: [email protected]

International Review on Computers and Software, Vol. 8, N.10


International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

Study of Energy Efficient Protocols Using Data Aggregation in Wireless Sensor Network

Nagendra Nath Giri1, 2, G. Mahadevan3

Abstract – With its wide and still-expanding range of applications, the wireless sensor network has attracted many researchers to design remotely operated sensing devices for areas that humans cannot reach. However, one of the most troublesome issues in WSN is its power constraint, as the nodes are backed by batteries that cannot be changed and are difficult to recharge in a wireless environment. Over the past decade there has been a considerable amount of research addressing the power consumption and depletion issues in WSN. Hence, this paper discusses the standard techniques that have evolved in the past with a claim of efficient power preservation, along with some recently identified techniques that have the potential to reduce energy depletion. Finally, the trade-offs between the prior work and current needs are discussed. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Battery, Energy Conservation, Network Lifetime, Wireless Sensor Network

I. Introduction

The areas of Wireless Sensor Networks (WSN) have witnessed a far-reaching abundance of applications and interest in current research and industry. Such networks are usually deployed over a diverse geographic area, ranging from meters to several hundreds of kilometers, by deploying small, low-cost devices that can observe and influence the physical world around them by gathering status information and transforming it into radio signals. This information is then transmitted to a local sink (also called a base station) that may be connected to a gateway to send the information to an external network such as the Internet [1]-[70]. The information thus received may be analyzed and an appropriate decision or action may be taken depending on the type of application. Typically, a sensor mote is a tiny device that includes three basic components: i) a sensing subsystem for data acquisition from the physical surrounding environment, ii) a processing subsystem for local data processing and storage, and iii) a wireless communication subsystem for data transmission. In addition, a power source supplies the energy needed by the sensing device to perform the programmed tasks. This power source often consists of a battery with a limited energy budget. Unfortunately, these sensors badly experience resource constraints and power limitations, as they are usually deployed in remote places that are not easy to reach [2]. Inevitably, there is a finite network lifetime for such devices, and new sensor motes have to be deployed to substitute the old ones. It is some of these limitations that have raised interest in the scientific community in researching devices that would enhance the durability and coverage

Manuscript received and revised September 2013, accepted October 2013

of the devices by using new technological developments in this area. The main emphasis is on enhancing the lifetime of sensor motes and deploying the restricted resources efficiently by adopting mechanisms, algorithms and protocols that treat these limited resources as the main priorities and challenges in producing efficient and reliable networks. Moreover, it can be highly impractical to recharge the battery, as the sensor motes may be deployed in a hostile or inaccessible environment. At the network layer, the purpose is to explore ways for energy-efficient route setup and reliable relaying of data from the sensor motes to the sink, in order to maximize the lifetime of the network. A major difference between wireless sensor networks and traditional wireless networks is that sensors are very sensitive to energy consumption, and the performance of sensor network applications depends highly on the lifetime of the network. In a well-designed network, the sensors in a certain area exhibit similar behaviors to achieve energy balance. In other words, when one sensor dies, it can be expected that the neighbors of this sensor mote will run out of energy very soon, since they will have to take over its responsibilities; yet lifetimes of several months to several years are expected. Thus, energy saving is crucial in designing long-lived wireless sensor networks. In this paper, we present an extensive review of existing systems and mechanisms for energy conservation. Wireless sensor networks make use of a competent form of technology that has no fixed structures or protocols adhering to a specific standard. This makes it an interesting topic for research, and thus considerable resources are being devoted to its study by research scholars and manufacturers alike.
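To put the "several months to several years" lifetime claim in perspective, a back-of-the-envelope estimate can be sketched as follows; all figures (battery capacity, current draws, duty cycle) are illustrative assumptions, not values from the cited works.

```python
# Rough mote lifetime: usable battery capacity divided by the average
# current under a given duty cycle. All numbers are assumed.
capacity_mAh = 2 * 1200   # two AA cells, assumed usable capacity
i_active_mA = 20.0        # current with radio on
i_sleep_mA = 0.005        # current in deep sleep
duty = 0.01               # 1% duty cycle

i_avg = duty * i_active_mA + (1 - duty) * i_sleep_mA
lifetime_days = capacity_mAh / i_avg / 24

print(f"average current: {i_avg:.3f} mA")
print(f"estimated lifetime: {lifetime_days:.0f} days")
```

Even under these optimistic assumptions the lifetime is on the order of a year, which is why shaving the active duty cycle is the dominant lever in WSN energy design.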

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

Nagendra Nath Giri, G. Mahadevan

There are a number of applications and utilities for such devices and networks, such as military, health monitoring, indoor and outdoor firefighting, security, environmental, agricultural and climate-change applications, and studying animal behavior. Section II gives an overview of generic issues in WSN. Section III discusses congestion issues, and Section IV highlights the power issues in WSN, describing some of the prominent attributes considered in previous research work. Techniques for energy conservation are discussed in Section V, followed by an illustration of the research gap and some concluding remarks in the final sections.

II. Visualizing Issues in WSN

A wireless sensor network can be described as a network of devices, denoted as sensor motes, which can perceive the environment and communicate the information collected from the monitored field (e.g., an area or volume) through wireless links. The data is forwarded, possibly via multiple hops, to a sink (sometimes denoted as controller or monitor) that can use it locally or is connected to other networks (e.g., the Internet) through a gateway. The sensor motes can be stationary or mobile, and they may or may not be aware of their location. The typical schema of a wireless sensor network is exhibited in Fig. 1. The sensor motes can be homogeneous or heterogeneous, as required by different applications [3]. Sensor networks are distributed networks of small sensing devices provided with short-range wireless communications, memory and processors. This kind of network differs from conventional ad-hoc networks in the following ways:
 The number of sensor motes deployed in a WSN is higher
 Sensor motes are densely deployed, usually in harsh environments
 Sensor motes have a finite and limited life span
 The topology of the network may change frequently
 WSN works in a broadcast fashion, while ad-hoc is point to point
 WSN has limited power and range resources
 Motes may not have a global ID
Several supporting attributes influence the design of a WSN [3], [4]:
1. Fault tolerance: the ability to tolerate sensor mote failures without affecting the cumulative network function. Fault tolerance can be calculated through the following equation:

Rk(t) = exp(-λk t)  (1)

where Rk is the reliability (fault tolerance), λk is the fault rate of sensor mote k, and t is the time period.
2. Scalability: the cumulative network ability to grow the dimension of the network or add new sensor motes. Scalability, i.e., increasing the number of sensor motes, has to consider network density as a factor in determining the required number of sensors to cover a certain area, which also depends on the nature of the application. The density can be calculated by [5]:

μ(R) = N π R² / A  (2)

where N is the number of sensors, R is the sensor range, μ(R) is the density function giving the number of sensors within sensor range, and A is the area.


Fig. 1. Typical schema of the wireless sensor network

3. Product cost: although the core sensing unit is cost effective, the cumulative cost of the entire device (sensor mote) is high, and the cost increases with the number of features included.
4. Hardware constraints: essentially, a sensor consists of a sensing unit (sensor, ADC), a processing unit (simple micro-controller, small memory), a transceiver unit with short-range communication capability, and a power unit (usually two AA batteries). Some applications have extra components such as a location-finding system (e.g., a GPS device), a power generator (e.g., solar panels) and a mobilizer.
5. Power consumption: WSNs consume power in three parts:
i) Sensing: this is an almost fixed power.
ii) Data communication: the major energy is used in this part. A sensor transceiver comprises:
 a transmitter and a receiver, which consume approximately the same power;
 a mixer, frequency synthesizer and voltage-controlled oscillator;
 a power amplifier.
All of these consume sensor mote power in addition to the start-up power. The consumed power can be calculated by Eq. (3) [5]:

PC = NT [PT (TON + TST) + POUT TON] + NR [PR (RON + RST)]  (3)




where PT and PR are the power consumed by the transmitter and receiver respectively, POUT is the power at the transmit antenna, TON/RON are the transmitter/receiver on-times, TST/RST are the transmitter/receiver start-up times, and NT/NR are the number of times the transmitter/receiver is switched on per unit time, which depends on the task and the medium access control (MAC) scheme.
iii) Data processing: the power consumed in data processing is much less than that consumed in data communication. Due to the low cost and size requirements of sensor manufacturing, CMOS technology is usually used for the micro-processor, which limits the expended power and gives greater efficiency.
Other factors that influence the design are security, network type, quality of service, self-organizing capability, data rate and throughput, routing, modeling, size and application. Support for very large numbers of unattended autonomous nodes and adaptability to environment and task dynamics are the fundamental challenges of WSNs, as they face dynamic network topologies, limited battery power, and constrained wireless channel capacity. The configuration of sensor nodes regularly changes in terms of position, reachability, power availability, and even task details. Because these sensor nodes interact with the physical surroundings, they experience a significant range of task dynamics. Node mobility, node failures, and environmental obstructions cause a high degree of dynamics in WSN, including frequent network topology changes and network partitions. Partitioned sub-networks need to keep running independently, and the management protocol must be robust enough to adapt to this situation. Sensor motes are energy constrained and subject to inhospitable environments; they can store or harvest very limited energy from the surroundings. That is why they fail due to depleted batteries or environmental influences.
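Eqs. (1)-(3) can be evaluated numerically; the sketch below does so with hypothetical parameters chosen only to exercise the formulas (none of the numbers come from the paper or the cited works).

```python
import math

# Eq. (1): fault tolerance R_k(t) = exp(-lambda_k * t)
lam = 0.001        # fault rate of mote k (failures/hour, assumed)
t = 24.0           # observation period (hours)
reliability = math.exp(-lam * t)

# Eq. (2): density mu(R) = N * pi * R^2 / A
N, R, A = 200, 30.0, 100_000.0    # motes, range (m), area (m^2)
density = N * math.pi * R**2 / A  # expected motes within one radio range

# Eq. (3): transceiver power consumption
NT = NR = 10            # switch-on events per unit time (assumed)
PT, PR = 0.081, 0.030   # transmit / receive power (W, assumed)
P_OUT = 0.010           # power at the transmit antenna (W)
T_ON = R_ON = 0.005     # on-times (s)
T_ST = R_ST = 0.002     # start-up times (s)
PC = NT * (PT * (T_ON + T_ST) + P_OUT * T_ON) + NR * PR * (R_ON + R_ST)

print(f"R_k(t) = {reliability:.4f}")
print(f"mu(R)  = {density:.2f} motes per range")
print(f"P_C    = {PC * 1000:.3f} mW")
```

Note how the start-up terms TST/RST inflate PC whenever NT/NR is large, which is exactly the overhead that duty-cycling MAC schemes try to amortize.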
Restricted size and energy typically mean restricted resources (CPU performance, memory, wireless communication bandwidth and range). Thus, there is a need to guarantee that network protocol overhead is kept to a minimum so that energy is conserved. The number of packets transmitted, received and processed at each node should be reduced, since energy is consumed in these operations. Another issue in WSN is that the transmission distance of micro sensor nodes can be very short compared to conventional macro sensors and other handheld devices. The transmitted power is therefore low, which requires significantly different architectures for intelligent resource efficiency. While some applications, such as image sensors, demand a high transmission data rate, most sensing applications require very low data rates compared to conventional multimedia traffic. Existing radio architectures are not suitable for these very low data rates since they have significant energy overhead in powering on and off.

Wireless sensor networks will exist with plenty of nodes per user (or more). At such quantities, it is impossible to pay attention to any individual node. Furthermore, even if it were possible to consider each node, sensors may be inaccessible, since they are incorporated in physical structures or thrown into hostile terrain. Thus, for such a system to be effective, it must provide unattended operation and self-configuration functionality. Many large-scale unattended systems exist nowadays; for example, automated process plants and pharmaceutical companies may contain hundreds of largely unsupervised computers as part of SCADA systems, which can still monitor the different process variables according to the system design. In our case, a WSN is even bigger and wireless, so it requires more consideration. WSN middleware should support the implementation and basic operation of WSN as discussed above. However, this is not a trivial task, as WSNs have some exceptional properties different from ad-hoc networks. To illustrate this point, the differences between sensor networks and ad-hoc networks should be noted: i) sensor motes are more densely deployed in a WSN than ad-hoc network nodes, ii) a WSN has a considerably larger number of nodes than an ad-hoc network, iii) WSN network topology alters more frequently than in an ad-hoc network, iv) WSN nodes are more inclined to fail than ad-hoc network nodes, and v) WSN nodes are constrained in resources such as power and memory.

III. Congestion Issues in WSN

Congestion also gives rise to energy issues in wireless sensor networks: the more the network is congested, the more retransmissions happen for the purpose of establishing communication with the destination node, leading to a drain of unwanted power. First of all, it is significant to split traffic into downstream (from the base station/server to the sensor motes) and upstream (from the sensor motes to the base station/server). It is quite obvious that the downstream traffic has a one-to-many nature while upstream is many-to-one. The upstream traffic can be further categorized into four classes: event-based, continuous, query-based and hybrid [6]. It is easy to imagine a sensor network spread through a large deployment area, with sensor motes distributed randomly through the distribution zone. The sensing devices are highly energy constrained, and it is very difficult to maintain them. Because of the random deployment, it is sometimes quite difficult to forecast the exact number of sensors that are supposed to be connected by stabilized links to one router or to some transitional networking device that plays the significant role of the sub-network coordinator. There can be numerous scenarios of network performance, resulting from hybrid data delivery including continuous, event-based, and query-based traffic [6].




Some of the challenging congestion scenarios are as follows:
 Sensors regularly exchange service information or data and sleep for most periods of time. If the network is idle most of the time, one intermediate device (proxy) can be considered to support some 10,000 nodes and process all information from them properly.
 Another challenging congestion environment arises when some event takes place and the majority of sensor nodes in one particular area have to send information to the server. In this situation, a congestion collapse can occur. All information will be important and should be delivered in time, since it supports not only minimizing the extent of damage but also investigating the incident. This scenario greatly affects the required number of intermediate devices in the network, as all sensor motes may become active at once; the limits of the intermediate devices' buffer size and processing capability then become critical parameters of the whole system.
 The server can query information at a specific time; nodes then start to send information, interrupt each other, and collisions occur in the channel. Even if there are enough intermediate devices and no buffer queues build up, this will definitely lead to congestion.
All sensors are divided between routers, proxies or other intermediate devices which play the role of sub-network coordinators [5]. At each moment a sensor can connect to some sub-network or disconnect from it (for example, one device can break down, while a new one can be sent to the territory).
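The event-burst scenario above can be caricatured with a toy queue at the intermediate device; buffer size, service rate and arrival probabilities are all illustrative assumptions, not measured values.

```python
import random

# Toy model: 500 motes report to one proxy with a finite buffer and a
# fixed forwarding rate. A quiet network loses almost nothing; an event
# burst overwhelms the buffer and most packets are dropped.
random.seed(7)

BUFFER = 64    # packets the proxy can queue (assumed)
SERVICE = 10   # packets forwarded per time step (assumed)
STEPS = 50

def run(arrival_rate):
    """Return the fraction of packets dropped at the proxy."""
    queue = dropped = total = 0
    for _ in range(STEPS):
        arrivals = sum(random.random() < arrival_rate for _ in range(500))
        total += arrivals
        accepted = min(arrivals, BUFFER - queue)
        dropped += arrivals - accepted
        queue = max(0, queue + accepted - SERVICE)
    return dropped / total if total else 0.0

print(f"quiet network : {run(0.01):.1%} dropped")
print(f"event burst   : {run(0.20):.1%} dropped")
```

Every dropped packet here would trigger a retransmission upstream, which is precisely how congestion translates into wasted energy.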

IV. Power Issues in WSN

Energy in WSN is a very scarce resource for such sensor systems and has to be managed wisely in order to extend the life of the sensor motes for the duration of a particular mission. Energy consumption in a sensor mote can be due to either "useful" or "wasteful" sources [7]. Useful energy consumption occurs when transmitting or receiving data, processing query requests, and forwarding queries and data to neighboring sensor motes. Wasteful energy consumption can occur due to one or more of the following causes. One of the major sources of energy waste is 'idle listening' [8], that is, listening to an idle channel in order to receive possible traffic. A second cause is 'collision' [9]: when a sensor mote receives more than one packet at the same time, these packets are termed collided, even when they coincide only partially; all packets involved in a collision have to be discarded, and their retransmission increases the energy consumption. The next cause of energy waste is 'overhearing' [10], in which a sensor mote receives packets that are destined for other sensor motes. The fourth occurs as a result of 'control-packet overhead' [7]: a minimal number of control packets should be used to make a data transmission. The final cause of energy waste is 'over-emitting' [11], which is caused by the transmission of a message when the destination sensor mote is not ready. Considering the above-mentioned facts, a correctly designed protocol must prevent these energy wastes.
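To see why idle listening is usually the dominant waste, consider a rough one-hour energy budget; the radio power figures are illustrative assumptions in the range of typical low-power transceivers, not values from the cited works.

```python
# One-hour energy budget for a mote that transmits rarely but keeps its
# radio on between packets. All power figures are assumed.
P = {"tx": 52.2e-3, "rx": 59.1e-3, "idle": 59.1e-3}  # W

t_tx, t_rx = 2.0, 5.0         # seconds of actual traffic per hour
t_idle = 3600 - t_tx - t_rx   # the rest is idle listening

E = {
    "transmit":       P["tx"] * t_tx,
    "receive":        P["rx"] * t_rx,
    "idle listening": P["idle"] * t_idle,
}
total = sum(E.values())
for k, v in sorted(E.items(), key=lambda kv: -kv[1]):
    print(f"{k:15s}: {v:8.3f} J ({v / total:5.1%})")
```

With idle power comparable to receive power, the hours of idle listening dwarf the seconds of useful traffic, which is why duty cycling (next section) attacks exactly this term.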

V. Approaches of Energy Conservation

Based on the issues and breakdowns caused by energy factors in WSN discussed above, several techniques [12], [13] can be exploited, even simultaneously, to reduce the power consumption in wireless sensor networks. At a very general level, two main enabling techniques are identified: duty cycling and data-driven techniques.
A. Duty cycling: normally, a sensor radio has four operating modes: transmission, reception, idle listening and sleep [14]. Measurements show that most power consumption is due to transmission, and in most cases the power consumption in idle mode is approximately similar to that in receiving mode. By contrast, the energy consumption in sleep mode is much lower. Duty cycling can be achieved through two different and complementary approaches. On one side, it is possible to exploit the sensor mote redundancy that is typical in sensor networks and adaptively select only a minimum subset of sensor motes to remain active for maintaining connectivity. Sensor motes that are not currently needed for ensuring connectivity can go to sleep and save energy. Finding the optimal subset of sensor motes that guarantees connectivity is called topology control [15]. On the other side, active sensor motes (i.e. those selected by the topology control protocol) do not need to keep their radio continuously on. They can switch the radio off (i.e. put it in the low-power sleep mode) when there is no network activity, thus alternating between sleep and wakeup periods; duty cycling operated on active sensor motes in this way is considered power management. Therefore, topology control and power management are complementary techniques that implement duty cycling at different granularities. The main issue associated with on-demand schemes, however, is how to inform a sleeping sensor mote that some other sensor mote is willing to communicate with it. To this end, such schemes typically use multiple radios with different energy/performance trade-offs (i.e. a low-rate, low-power radio for signaling and a high-rate but more power-hungry radio for data communication). Examples include Geographical Adaptive Fidelity (GAF) [15], Geographic Random Forwarding (GeRaF) [16], Span [17], and Adaptive Self-Configuring Sensor Networks Topologies (ASCENT) [18].
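The sleep/wakeup alternation only pays off when the idle period exceeds a break-even time set by the transition overhead; a minimal sketch with assumed power and overhead figures:

```python
# When is it worth sleeping? Only if the idle period is longer than the
# break-even time dictated by the sleep/wake transition overhead.
# All parameter values are illustrative assumptions.
p_idle = 13.8e-3       # W, radio idle listening
p_sleep = 15e-6        # W, deep sleep
e_transition = 0.5e-3  # J, energy to go to sleep and wake up again

t_breakeven = e_transition / (p_idle - p_sleep)

def energy(idle_s, go_to_sleep):
    """Energy spent over an idle period of idle_s seconds."""
    if go_to_sleep:
        return e_transition + p_sleep * idle_s
    return p_idle * idle_s

print(f"break-even idle time: {t_breakeven * 1000:.1f} ms")
for idle in (0.01, 0.1, 1.0):
    better = energy(idle, True) < energy(idle, False)
    print(f"idle {idle:5.2f} s: sleeping pays off -> {better}")
```

This is why power-management protocols try to cluster communication into short active windows: many tiny idle gaps below the break-even time save nothing, while one long sleep saves almost everything.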




B. MAC protocols for WSNs: the deployment of a MAC protocol in a WSN is subject to various constraints such as energy, topology, and network changes. Minimizing energy to extend the network lifetime is its primary goal. The design of the MAC protocol should prevent energy wastage due to packet collisions, overhearing, excessive retransmissions, control overheads, and idle listening. A wide range of MAC protocols have been proposed to achieve high channel utilization and energy efficiency. Both TRAMA [19] and Z-MAC [21] require a random access period and a schedule exchange period; in addition, time synchronization must be achieved in the network. In comparison with other contention-based protocols, TRAMA has higher delay and is suited for applications that are not time sensitive. B-MAC [20] and Z-MAC both adapt well to topology changes, while TRAMA does not. B-MAC has higher throughput in low-contention environments, while Z-MAC performs better in high-contention environments. Low power reservation-based MAC, low power distributed MAC, and TRAMA minimize energy with sleep cycles when sensor motes do not have data to transmit or receive. CC-MAC [24], on the other hand, filters correlated information and prioritizes packets. Although various MAC protocols have been proposed, there is room for future work on system performance optimization. Cross-layer optimization is an area that needs to be explored more extensively: cross-layer interaction can reduce packet overhead on each of the layers, thereby reducing energy consumption, and interaction with the MAC layer can provide other layers with congestion control information and enhance route selection. Many existing MAC protocols address performance studies of static sensor motes, but there is still a lack of literature comparing these protocols in a mobile network. By enhancing the MAC protocol, one can significantly improve communication reliability and energy efficiency.
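The low-power-listening idea behind B-MAC can be sketched as an energy trade-off: receivers sample the channel once per check interval, while senders must transmit a wake-up preamble spanning a full interval. The formula and all parameter values below are simplified assumptions for illustration, not figures from the protocol papers.

```python
# B-MAC-style low-power listening: longer check intervals cheapen the
# receiver's periodic sampling but make every transmission's preamble
# more expensive, giving a U-shaped total cost. All values assumed.
P_RX = 59.1e-3      # W, radio receiving / channel sampling
P_TX = 52.2e-3      # W, radio transmitting
T_SAMPLE = 3e-3     # s, duration of one channel sample
MSGS_PER_HOUR = 4   # traffic load (assumed)

def energy_per_hour(check_interval_s):
    """Listener sampling cost plus sender preamble cost, per hour."""
    n_samples = 3600 / check_interval_s
    e_listen = n_samples * T_SAMPLE * P_RX
    e_preamble = MSGS_PER_HOUR * check_interval_s * P_TX
    return e_listen + e_preamble

for ci in (0.01, 0.1, 1.0, 10.0):
    print(f"check interval {ci:5.2f} s -> {energy_per_hour(ci):7.3f} J/h")
```

The optimum check interval depends on the traffic load, which is why LPL parameters are typically tuned per application.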
The major MAC protocols are summarized in Table I; other recent approaches for enhancing the cumulative network lifetime of WSN are listed in Table II.
C. Data-driven approach: data-driven approaches can be used to enhance the energy efficiency even further. In fact, data sensing impacts sensor motes' energy

consumption in two ways. First, unneeded samples: sampled data generally have strong spatial and/or temporal correlations [25]; therefore, there is no need to communicate the redundant information to the sink, which decreases the power consumption of the sensing subsystem. Second, reducing communication is not enough when the sensor itself is power hungry.

TABLE I - MAJOR MAC PROTOCOLS USED

TRAMA [19]: channel access mode: time-slotted random and scheduled access; time synchronization: yes; protocol type: TDMA/CSMA; energy conservation: schedules sleep intervals and turns the radio off when idle, collision-avoidance scheduling.
B-MAC [20]: channel access mode: clear channel assessment (CCA); time synchronization: no; protocol type: CSMA; energy conservation: low power listening (LPL) for energy efficiency.
Z-MAC [21]: channel access mode: time-slotted random and scheduled access; time synchronization: yes; protocol type: TDMA/CSMA; energy conservation: low power listening (LPL) for energy efficiency.
Low power reservation-based MAC [22]: channel access mode: time-slotted contention-based slot reservation; time synchronization: no; protocol type: TDMA; energy conservation: sensor motes sleep and wake up based on the assigned data slot.
Low power distributed MAC [23]: channel access mode: multi-channel access; time synchronization: no; protocol type: CSMA/CA; energy conservation: power saving mode with a low-power wake-up radio for channel listening and the normal radio for data transmission.
CC-MAC [24]: channel access mode: time-slotted contention-based slot reservation; time synchronization: no; protocol type: CSMA/CA; energy conservation: dropping highly correlated information to reduce energy in packet transmission.

VI. Problems in Biological Utility

WSN-based utilities have made a tremendous impact on biological problems, including biological task mapping and scheduling and biomedical signal monitoring. A brief illustration of these applications is presented in this section.
 Biological task mapping: WSNs find widespread application in the area of biological sensing. Specifically, there is recent research on the concept of "labs on a chip", supported by the latest technologies such as nano-techniques. The use of WSNs for biological applications has been accelerated by advancements in Micro Electro-Mechanical Systems (MEMS), embedded systems, micro-controllers and various wireless communication technologies. The authors of [34] presented a biological task mapping and scheduling algorithm in which a group of nodes is used to execute an application. In this work, it was assumed that the application could be broken down into smaller tasks with different weights, and hence a general model was considered for complex applications. Assigning resources to tasks in order to achieve the desired performance objectives is known as task mapping, and determining the sequence of execution of the tasks is known as task scheduling; both are of great importance in high-performance computing. A near-optimal solution for task mapping can be obtained using heuristic techniques, but the constrained resources of WSNs require different design objectives. However, the simulation model that was built was applicable only if the nodes in the WSN were separated by a distance of 150 m.




TABLE II - RECENT RESEARCH WORK

Zhang, Li [26] (Springer, 2012): problem focused: evaluation of energy consumption; technique used: designed a new stochastic model; remarks: result is not fully optimized.
Clad, Gallais [27] (IEEE, 2012): problem focused: energy during data collection; technique used: maximum leaf spanning tree; remarks: QoS parameters are not evaluated.
Roslin, Gomathy [28] (European Journal of Scientific Research, 2012): problem focused: energy-efficient topology control; technique used: neural network; remarks: QoS parameters and resource variation not addressed.
Azimi [29] (Journal of Academic and Applied Studies, 2012): problem focused: energy consumption; technique used: self-organizing map; remarks: QoS parameters and resource variation not addressed.
Singh, Sharma [30] (International Journal of Advanced Smart Sensor Network Systems, 2012): problem focused: energy-efficient routing scheme; technique used: genetic algorithm; remarks: QoS parameters and resource variation not addressed.
Abreu, Arroyo [31] (International Conference of the Chilean Computer Science Society, 2011): problem focused: minimum energy network connectivity (MENC) problem; technique used: particle swarm optimization; remarks: QoS parameters and resource variation not addressed, FND not optimized.
Ray, De [32] (Int. Jr. of Advanced Computer Engineering & Architecture, 2012): problem focused: energy-efficient cluster head selection; technique used: enhanced version of LEACH; remarks: works only in a static sensor mote scenario; QoS parameters and resource variation not addressed.
Rani [33] (International Journal of Computer Applications, 2012): problem focused: energy optimization; technique used: fuzzy logic; remarks: good results but QoS parameters are not addressed.

 Biomedical signal monitoring: WSNs have revolutionized the field of medicine in many ways. Telemedicine involves the treatment and care of patients from a distance and also aids biomedical diagnosis; the application of WSNs has significantly improved this field. The basic principles and features required in developing a functional model for the monitoring of biological signals are presented in [35]. To develop modern equipment for monitoring patients in remote places using wireless technologies, the network topology and sensor-specific signal reception and analysis have been considered.

VII. Problems in Commercial Utilities

Some of the commercial applications of WSNs include vehicular monitoring, cultural property protection, event detection and structural health monitoring. These applications have a profound impact on ordinary day-to-day affairs:
 Smart parking: WSNs are widely used in applications like intelligent parking, for purposes such as effective usage of existing parking lots instead of expensive investments in new installations, and to make provision for coupling with cheap sensor nodes that can track vehicles effectively. Existing solutions for the parking application use magnetometers and video cameras. Magnetometer detections are not very accurate, as they are influenced by environmental factors. The video camera, the alternative, is expensive, and it is not feasible to transmit large amounts of data in a wireless environment through multiple hops. Another factor that affects the application of magnetometers and video cameras is that in a parking lot, apart from the entry and exit of vehicles, there may be other moving objects, which is a great challenge. The detection of vehicles in a parking lot using magnetic sensors together with ultrasonic sensors has been presented in

[36]. It was proved that accurate vehicular detection was possible with the combined use of ultrasonic sensors and magnetometers but it did not provide any solution for better parking management. A WSN based vehicle parking system has been presented in [37]. Monitoring of remote parking, mechanism for parking reservation and automated guidance are some of the latest features provided by the system. However, the system should be made fault tolerant, by incorporating mechanisms for identifying defaulters.  Vehicular Telemetric: A detailed overview of vehicular telemetric over heterogeneous wireless sensor networks has been presented in [38]. In this work, an advanced architecture that collaboratively uses multiple radios and access technologies, known as advance heterogeneous vehicular network architecture has been discussed. Sufficient light has been thrown on the various challenges and factors involved in the development of the functional components of advance heterogeneous vehicular network and its related protocols. These included radio link control and congestion control, routing, security and other application development. In order to realize advance heterogeneous vehicular network architecture for vehicular telemetric over heterogeneous wireless networks the research issues to be explored include enhanced multi-channel MAC protocols for dedicated short range communications, dynamic spectrum sharing between dedicated short range communications and WiMax, Heterogeneous wireless access for vehicular telemetric, multimedia transmission and QoS support and data congestion in vehicular telemetric.  Security of Intra-Car: Fuel efficiency and reduction in the weight of automotives can be achieved by replacing wired sensors and their cables with wireless sensors. However, the inherent vulnerability of the wireless platform makes the security issues of such a replacement, highly questionable. Security problems

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

International Review on Computers and Software, Vol. 8, N.10

2408

Nagendra Nath Giri, G. Mahadevan

for intra-car wireless sensor networks have been addressed in [39]. In this work, a systematic methodology for selecting appropriate security algorithms for WSNs, and the determination of the best combination with regard to execution time and security, have been presented.
 Event Detection: Tracking is a typical characteristic of wireless sensor networks, especially instant tracking of events. Much work has been done in WSNs with sensor nodes having identical sensing units; the utilization of different types of sensor nodes is an area yet to be explored. A fully distributed protocol for Collaborative Event detection and Tracking (CollECT) in wireless heterogeneous sensor networks has been presented in [40]. However, sensor node deployment, data dissemination and routing in Wireless Heterogeneous Sensor Networks (WHSNs) are issues yet to be addressed.
 Structural Health Monitoring: The process of detecting damage in civil, aerospace and other engineering systems is referred to as Structural Health Monitoring (SHM). Any change in the material or geometric properties of these systems due to internal factors (aging) or external factors (natural calamities, pollution) is termed damage. The normal operation of an SHM system involves low-power, long-term monitoring of a structure to provide periodic updates of its health condition; during critical events such as earthquakes and other natural disasters, an SHM system can also perform real-time rapid structural condition screening. A WSN based application for long-term, online SHM using an information processing approach is presented in [41]. A novel WSN based application for SHM is presented in [42]. To overcome the limitations of traditional sensing networks, both power and data interrogation commands are transmitted through a mobile agent, which is sent to each sensor node to perform individual functions. Prototype systems used to interrogate capacitive-based and impedance-based sensors for SHM applications have been discussed in that paper. The construction of a WSN platform with vibration sensing and Global Positioning System (GPS) positioning for SHM applications has been presented in [43]. The challenges in WSN based applications for SHM include rigid bandwidth requirements, extended network lifetime and limiting multi-hop data exchange.
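The magnetometer-plus-ultrasonic vehicle detection of [36], discussed under Smart Parking above, amounts to fusing two noisy cues. A minimal sketch follows, where the AND-fusion rule and every threshold value are illustrative assumptions rather than the published design:

```python
def vehicle_present(field_uT, echo_cm,
                    baseline_uT=50.0, field_delta_uT=8.0,
                    occupied_below_cm=150.0):
    """AND-fusion of two cues: a magnetic disturbance relative to the
    Earth-field baseline, and an ultrasonic echo returning from closer
    than an empty slot would allow. All thresholds are illustrative."""
    magnetic_hit = abs(field_uT - baseline_uT) > field_delta_uT
    ultrasonic_hit = echo_cm < occupied_below_cm
    # Requiring both cues suppresses false alarms from pedestrians
    # (ultrasonic only) and from passing traffic (magnetic only).
    return magnetic_hit and ultrasonic_hit
```

Requiring agreement of both sensors is one way to address the environmental sensitivity of magnetometers noted above; an OR rule would instead favor detection probability over false-alarm rate.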

VIII. Problems in Environmental Utilities

Environmental applications include the monitoring of atmospheric parameters, tracking of the movements of birds and animals, forest fire detection, habitat surveillance etc.:
 Greenhouse Monitoring: To ensure that the automation system in a greenhouse works properly, it is necessary to measure the local climate parameters at various points of observation in different parts of a big greenhouse. Doing this with a wired network would make the entire system clumsy and costly; a WSN based application for the same purpose, using several small sensor nodes equipped with radios, is a cost-effective solution. Such an application has been developed in [44]. Data analysis, DSP based control solutions and more complex network setups are areas yet to be explored.
 Habitat Surveillance: WSNs find widespread application in habitat surveillance compared to other monitoring methods, due to the high deployment density and self-organization of the sensor nodes. The advantage of a WSN is that the invisible placement of sensor nodes in the habitat does not leave any noticeable mark which might affect the behavior pattern of the inhabitants. A WSN based application in combination with General Packet Radio Service (GPRS) for habitat monitoring is introduced in [45]. The details of a sensor node that combined ARM technology and IEEE 802.15.4 have been given. This paper addressed the energy management issue and developed a lightweight, constant duty cycle policy for energy management. However, developing a WSN based application that will never affect the biological behavior of the inhabitant species is very important, and hence a challenge to be considered.

IX. Problems in Healthcare Utilities

WSNs are very efficient in supporting various day-to-day applications, and WSN based technologies have revolutionized home and elderly healthcare. Physiological parameters of patients can be monitored remotely by physicians and caretakers without affecting the patients' activities. This has resulted in a reduction of costs, improvement of equipment and better management of patients, reaping huge commercial benefits. These technologies have significantly minimized human error, allowed better understanding of the origin of diseases, and helped in devising methods for rehabilitation and recovery and in assessing the impact of drug therapy. The recent developments in the application of WSNs in healthcare are presented below. The implementation and analysis of a WSN based e-Health application has been described in [46]. The main research issue to be addressed is to increase the degree of awareness of home assistants, caregivers and primary healthcare centers, so that they understand the patient's health and activity status and can quickly discern and decide on the required action. A simple localization algorithm based on sensor data and the Received Signal Strength Indicator (RSSI) was presented, and was shown experimentally to work well in a home environment. However, the use of multi-sensor analysis, which is expected to give better accuracy, is an area yet to be explored.
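The RSSI-based localization algorithm of [46] is not reproduced here; a common building block in such schemes, shown purely as an illustrative sketch, is inverting the log-distance path-loss model (the reference power and exponent below are assumed indoor values):

```python
def rssi_to_distance(rssi_dbm, rssi_1m_dbm=-45.0, path_loss_exp=2.5):
    """Invert RSSI(d) = RSSI(1 m) - 10 * n * log10(d) to estimate the
    transmitter distance in metres. Both parameters are assumptions."""
    return 10.0 ** ((rssi_1m_dbm - rssi_dbm) / (10.0 * path_loss_exp))

def nearest_anchor(readings):
    """Coarse room-level localization: report the anchor whose RSSI
    implies the shortest estimated distance (strongest signal wins)."""
    return min(readings, key=lambda kv: rssi_to_distance(kv[1]))[0]

# Illustrative anchor readings from a home deployment (dBm).
room = nearest_anchor([("bedroom", -70.0), ("kitchen", -55.0),
                       ("living_room", -62.0)])
```

Multipath fading in homes makes single RSSI readings noisy, which is why the multi-sensor analysis mentioned above is expected to improve accuracy.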


A qualitative study of the perceptions and acceptance of elderly persons regarding the use of WSNs to assist their healthcare is done in [47]. A lightweight, low-cost WSN based home healthcare monitor has been developed in [48]. An attempt to integrate WSN technology and public communication networks, in order to develop a healthcare system for elderly people at home without disturbing their routine activities, has been presented in [49]. Improved performance with minimum decision delay and good accuracy using a Hidden Markov Model is yet to be addressed. A WSN based home healthcare application is developed in [50]. The main issue considered in this research is the development of a working model of a home healthcare monitoring system with efficient power, reliability and bandwidth. A WSN based prototype sensor network for health monitoring, with sensors for heart activity, using 802.15.4 compliant network nodes is described in [51], together with the issues regarding its implementation. The paper also describes the hardware and software organization of the presented system and provides solutions for time synchronization, power management and on-chip signal processing. However, the areas yet to be addressed are improvement in the QoS of wireless communication, standardization of interfaces, and interoperability. Specific limitations and new applications of the technology can be determined by in-depth study of different medical conditions in clinical and ambulatory settings. Reference [52] presents the micro Subscription Management System (μSMS) middleware, which uses an event-based service model. This novel approach meets the design constraints of limited resources, efficiency, scalability, dependability and low power consumption by implementing a dynamic memory kernel and a mechanism of variable-payload multiplexing for the information events, to provide better services. It was observed that this approach yielded the best results for e-health applications. For continuous and real-time health monitoring, the authors of [53] developed a smart shirt which measured ECG (electrocardiogram) and acceleration signals. The shirt was made of conductive fabrics that act as electrodes to obtain the body signals, and carried sensors for online health data monitoring. The observed and measured data are transmitted over an ad-hoc network for remote monitoring.

X. Problems in Industrial Utilities

Nowadays, industrial applications are built on distributed architectures, and they are required to be inexpensive, flexible and dependable. The system's performance can be improved by interfacing sensors and actuators directly to the industrial communication network, as data and diagnostics can be made accessible to many systems and also shared on the web.

A detailed survey of the latest developments in WSN based industrial applications is presented in [54]. The applications of WSNs in the petrochemical industry based on ZigBee technology have been discussed in [55]. The application of WSNs for safety monitoring of coal mines has been described in [56]; the development of anti-interference and explosion-proof hardware is the focus of further research. A WSN based remote online Automatic Meter Reading (AMR) system is presented in [57]. Technologies such as wireless RF, ZigBee modules and Code Division Multiple Access (CDMA) telecommunication are used in the remote metering technology. Direct access or physical reading of meters, in order to record the consumption of utilities such as electricity, gas and water, is not always possible; hence, the solutions provided by WSN based remote meters have remarkably changed the way companies, organizations and individuals monitor water, gas and other resources. WSN based real-time, remote energy monitoring and fault diagnosis for industrial motor systems has been presented in [58]. Here, electrical-signal based motor signature analysis has been integrated with WSNs for best results. Fusion of various sensor measurements, a comprehensive performance analysis of WSNs in industrial environments, and evaluation of the effect of network error-control mechanisms are areas yet to be explored. A WSN based approach for detecting faults in the metal cutting process is developed in [59]. Machine tools can be maintained in good condition, and the onset of tool wear can be delayed, by using appropriate monitoring systems. A WSN based digital system for the evaluation of energy usage, condition monitoring, diagnosis and supervisory control of electric systems with Dynamic Power Management (DPM) has been presented in [60]. Two hardware topologies used for signal acquisition, processing and transmission form the basis of this system: ISMs (Intelligent Sensor Modules) and RDAUs (Remote Data Acquisition Units). A dynamic power management protocol is implemented by the sensor nodes to extend the WSN lifetime. A WSN based security system for a power plant using human motion sensors has been presented in [61]; detection of trespassers and notification of the administrator are its important functions. A WSN based system for measuring and monitoring water quality is presented in [62]. The design of complex underwater acoustic sensor networks that can be used in deep waters is an area yet to be addressed.

XI. Problems in Military Utilities

WSNs play a vital role in military Command, Control, Communications, Computing, Intelligence, Surveillance, Reconnaissance and Targeting (C4ISRT) systems. A few challenges faced by WSNs on the battlefield are addressed in [63]. On the battlefield, WSNs are prone to attacks in which either the data or the control devices are corrupted, leading to large energy consumption and, finally, to nodes dropping out of the network.


The energy efficiency of sensor nodes and the correct modelling of energy consumption are research issues yet to be explored. WSN based collaborative target detection with reactive mobility has been presented in [64]. A sensor movement scheduling algorithm was developed and its effectiveness was proved using extensive simulations. WSNs are used in highly critical tasks such as object detection and tracking, which require a high detection probability, a low false alarm rate and bounded detection delay.
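The combination of high detection probability and low false alarm rate is often obtained by fusing several sensors' local decisions. A hedged sketch of k-of-n decision fusion follows; this is a textbook rule, not the scheduling algorithm of [64], and the binomial tail holds only under an independence assumption:

```python
from math import comb

def fused_prob(p_local, n_sensors, k_votes):
    """Probability that at least k of n independent sensors report a
    detection (binomial tail). Feeding in the per-sensor detection
    probability yields the fused detection probability; feeding in
    the per-sensor false-alarm rate yields the fused false-alarm rate."""
    return sum(comb(n_sensors, i) * p_local ** i * (1.0 - p_local) ** (n_sensors - i)
               for i in range(k_votes, n_sensors + 1))

# Illustrative numbers: 5 sensors, majority (3-of-5) voting.
pd = fused_prob(0.90, 5, 3)   # fused detection probability improves
pfa = fused_prob(0.05, 5, 3)  # fused false-alarm rate drops sharply
```

With these assumed per-sensor figures, majority voting pushes the fused detection probability above 0.99 while driving the fused false-alarm rate well below the single-sensor rate, which is why collaborative detection is attractive despite its communication cost.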

XII. Discovering Trade-Offs

Energy is one of the most critical resources for WSNs, and the majority of work in the literature on WSN routing has concentrated on energy conservation as an important optimization goal. However, merely saving energy is not enough to effectively prolong the network lifetime: uneven energy depletion often results in network partition and a low coverage ratio, which deteriorate performance. Energy saving in wireless sensor networks has attracted a lot of attention in recent years and introduces unique challenges compared to traditional wired networks. Extensive research has been conducted to address these limitations by developing schemes that improve resource efficiency. Hence, any proposed system should be planned so as to overcome the following trade-offs witnessed in past research work:
 The design of a WSN platform must deal with challenges in energy efficiency, cost, and application requirements, and requires the optimization of both hardware and software to make a WSN efficient. Hardware-related work includes using low-cost tiny sensor motes, while software addresses issues such as network lifetime, robustness, self-organization, security, fault tolerance and middleware. But in reality the application requirements vary in terms of computation, storage and user interface, and consequently there is no single platform that can be applied to all applications. Sensor motes can fail at any time due to hardware, software or communication faults, so it is important that there are services to handle these failures before and after they occur.
 Cross-layer designs improve performance and optimize the interaction between layers by sharing information across them. But very few works focus on collaboration between all the layers to achieve higher energy savings, better network performance and extended network lifetime.
 The physical layer in a WSN must be energy efficient. The physical-layer design starts with the design of the

radio. The design or selection of a radio is very important, because the radio can affect the performance of the other protocol layers. Minimizing energy consumption at the physical layer requires that both the circuitry energy and the transmission energy be optimized. But little work has actually focused on low-power radio design with emerging technologies, exploring ultra-wideband techniques as an alternative for communication, creating simple modulation schemes to reduce synchronization and energy cost, determining the optimal transmission power, and building more energy-efficient protocols and algorithms.
 The majority of research work in the area of WSNs relates to the design of routing protocols as part of a power conservation approach. Important considerations for these routing protocols are energy efficiency along with traffic flows. In earlier publications, two categories of routing approaches are explored: location-based routing and cluster-based routing. Location-based routing uses sensor mote locations to route data, while cluster-based routing employs cluster heads to aggregate data and relay the information to the base station. There is little research on QoS routing in sensor networks, where QoS guarantees end-to-end delay alongside energy-efficient routing. In applications where sensor motes are mobile, new routing protocols are needed to handle frequent topology changes and reliable delivery.
 Provisioning, management and control services are needed to sustain network connectivity and maintain operations. Efficient algorithms can reduce the cost of localization, while sensor motes are able to self-organize and identify themselves in some spatially coordinated system. But no benchmarked work is found that uses optimization techniques to extend coverage in an energy-efficient way while considering QoS parameters. The recent study conducted by Gilbert [65] has discussed many open research issues, e.g. biological applications, commercial applications, environmental applications, healthcare applications, industrial applications and military applications.
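The trade-off between circuitry energy and transmission energy discussed above is commonly analyzed with the first-order radio model used in many WSN papers. The constants below are the frequently quoted illustrative values, not measurements from the surveyed work:

```python
def tx_energy_nj(bits, dist_m, e_elec_nj=50.0, e_amp_pj=100.0):
    """Transmit cost E_tx(k, d) = E_elec*k + eps_amp*k*d^2: a fixed
    per-bit electronics term plus an amplifier term growing with the
    square of the distance (free-space exponent assumed)."""
    return e_elec_nj * bits + (e_amp_pj / 1000.0) * bits * dist_m ** 2

def rx_energy_nj(bits, e_elec_nj=50.0):
    """Receive cost is the electronics term only: E_rx(k) = E_elec*k."""
    return e_elec_nj * bits

# Why relaying can pay off: splitting a 100 m link at its midpoint
# halves each hop's d^2 term, at the price of one extra reception.
direct = tx_energy_nj(2000, 100.0)
relayed = 2 * tx_energy_nj(2000, 50.0) + rx_energy_nj(2000)
```

With these constants the relayed path costs less than the direct transmission, which is the arithmetic behind cluster-based and backbone-based schemes such as those surveyed above; at short distances the fixed electronics term dominates and extra hops stop paying off.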

XIII. Conclusion

This paper gives an overview of the broad spectrum of issues, especially power-related issues, in the area of WSNs. Applications of WSNs in multiple real-time areas have been seen to encounter various critical problems that are usually due to power depletion. Although the various techniques explored by previous researchers have yielded better solutions for reducing energy consumption in WSNs, the accuracy and reliability of the existing solutions cannot be guaranteed under the various dynamic external scenarios that arise in WSNs in real time.


Although a massive amount of research work has been done in the past, this paper has attempted to introduce the principal techniques usually adopted as standards by the majority of current researchers.

References

[1] D. Waltenegus, P. Christian, "Fundamentals of Wireless Sensor Networks: Theory and Practice", John Wiley & Sons, pp. 336, 2010.
[2] S. Christopher, "Capabilities and Limitations of Urban Remote Sensing", retrieved from ftp://ftp.ldeo.columbia.edu/pub/small/PUBS/SmallADurbanRS.pdf.
[3] B. Chiara, C. Andrea, D. Dardari, R. Verdone, "An Overview on Wireless Sensor Networks Technology and Evolution", Sensors, Vol. 9, pp. 6869-6896, 2009.
[4] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, "Wireless sensor networks: a survey", Computer Networks, Vol. 38, pp. 393-422, 2002.
[5] O. Younis, S. Fahmy, "Distributed Clustering for Scalable, Long-Lived Sensor Networks", Computer Science Technical Report, pp. 1575, 2003.
[6] V. Vijayaraja, R. R. Hemamalini, "Congestion in Wireless Sensor Networks and Various Techniques for Mitigating Congestion - A Review", IEEE International Conference on Computational Intelligence and Computing Research, 2010.
[7] V. Bhuse, A. Gupta, L. Lilien, "DPDSN: Detection of packet-dropping attacks for wireless sensor networks", Proc. Trusted Internet Workshop, 2006.
[8] X. Zhang, K. G. Shin, "E-MiLi: Energy-Minimizing Idle Listening in Wireless Networks", ACM MobiCom, 2011.
[9] A. Haslizan Ab. Halim, K. Zen, "MAC Protocol to Reduce Packet Collision in Wireless Sensor Network", Proceedings of the International Conference on Computer and Communication Engineering, 2008.
[10] C. Cano, B. Bellalta, A. Sfairopoulou, M. Oliver, J. Barceló, "Taking Advantage of Overhearing in Low Power Listening WSNs: A Performance Analysis of the LWT-MAC Protocol", Mobile Networks and Applications, DOI 10.1007/s11036-010-0280-4, Springer, 2010.
[11] K. T. Kim, H. Y. Youn, "An Energy-Efficient MAC Protocol Employing Dynamic Threshold for Wireless Sensor Networks", International Journal of Distributed Sensor Networks, Article ID 304329, pp. 12, 2012.
[12] I. Khemapech, A. Miller, I. Duncan, "A Survey of Transmission Power Control in Wireless Sensor Networks", Proceedings of the 8th Annual Postgraduate Symposium on the Convergence of Telecommunications, Networking and Broadcasting (PGNet '07), pp. 15-20, 2007.
[13] G. Anastasi, M. Conti, M. D. Francesco, A. Passarella, "Energy Conservation in Wireless Sensor Networks: a Survey", ScienceDirect, 2009.
[14] H.-C. Le, "OBMAC: an Overhearing Based MAC Protocol for Wireless Sensor Networks", International Conference, pp. 547-553, 2007.
[15] Y. Xu, J. Heidemann, "Geography-informed Energy Conservation for Ad Hoc Routing", ACM MobiCom, 2001.
[16] M. Zorzi, R. R. Rao, "Geographic Random Forwarding (GeRaF) for ad hoc and sensor networks: multihop performance", IEEE Transactions on Mobile Computing, Vol. 2, No. 4, pp. 337-348, 2003.
[17] B. Chen, K. Jamieson, H. Balakrishnan, R. Morris, "Span: An Energy-Efficient Coordination Algorithm for Topology Maintenance in Ad Hoc Wireless Networks", ACM, 2002.
[18] A. Capra, D. Estrin, "ASCENT: Adaptive Self-Configuring sEnsor Networks Topologies", Proceedings of the Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies, Vol. 3, pp. 1278-1287, 2002.
[19] V. Rajendran, K. Obraczka, J. J. Garcia-Luna-Aceves, "Energy-efficient, collision-free medium access control for wireless sensor networks", Proceedings of the First International Conference on Embedded Networked Sensor Systems (SenSys), Los Angeles, CA, 2003.

[20] J. Polastre, J. Hill, D. Culler, "Versatile low power media access for wireless sensor networks", Proceedings of SenSys'04, San Diego, CA, 2004.
[21] H. Dubois-Ferriere, D. Estrin, M. Vetterli, "Packet combining in sensor networks", Proceedings of SenSys'05, San Diego, CA, 2005.
[22] S. Mishra, A. Nasipuri, "An adaptive low power reservation based MAC protocol for wireless sensor networks", Proceedings of the IEEE International Conference on Performance Computing and Communications, pp. 316-329, 2004.
[23] C. Guo, L. C. Zhong, J. M. Rabaey, "Low power distributed MAC for ad hoc sensor radio networks", Proceedings of IEEE Globecom, pp. 2944-2948, 2001.
[24] M. C. Vuran, I. F. Akyildiz, "Spatial correlation-based collaborative medium access control in wireless sensor networks", IEEE/ACM Transactions on Networking, pp. 316-329, 2006.
[25] M. C. Vuran, O. B. Akan, I. F. Akyildiz, "Spatio-temporal correlation: theory and applications for wireless sensor networks", Computer Networks Journal, 2004.
[26] Y. Zhang, W. W. Li, "Modeling and energy consumption evaluation of a stochastic wireless sensor network", EURASIP Journal on Wireless Communications and Networking, 2012.
[27] F. Clad, A. Gallais, P. Mérindol, "Energy-Efficient Data Collection in WSN: A Sink-Oriented Dynamic Backbone", IEEE International Conference on Communications (ICC), pp. 276-280, 2012.
[28] S. E. Roslin, C. Gomathy, "IBPN: Intelligent Back Propagation Network Based Cluster Head Selection for Energy Efficient Topology Control in Wireless Sensor Network", European Journal of Scientific Research, ISSN 1450-216X, Vol. 79, No. 4, pp. 541-550, 2012.
[29] M. A. Azimi, M. Ramezanpor, "A Robust Algorithm for Management of Energy Consumption in Wireless Sensor Networks Using SOM Neural Networks", Journal of Academic and Applied Studies, Vol. 2(3), pp. 1-14, 2012.
[30] V. K. Singh, V. Sharma, "Elitist genetic algorithm based energy efficient routing scheme for wireless sensor networks", International Journal of Advanced Smart Sensor Network Systems (IJASSN), Vol. 2, No. 2, 2012.
[31] R. C. Abreu, J. E. C. Arroyo, "A Particle Swarm Optimization Algorithm for Topology Control in Wireless Sensor Networks", International Conference of the Chilean Computer Science Society, 2011.
[32] A. Ray, D. De, "P-EECHS: Parametric Energy Efficient Cluster Head Selection protocol for Wireless Sensor Network", Int. Jr. of Advanced Computer Engineering & Architecture, Vol. 2, No. 2, 2012.
[33] K. Sheela, S. Rani, N. Devarajan, "Fuzzy Based Optimization for Power Management in Wireless Sensor Networks", International Journal of Computer Applications (0975-888), Vol. 48, No. 4, 2012.
[34] I. Caliskanelli, J. Harbin, L. S. Indrusiak, "Bioinspired Load Balancing in Large-Scale WSNs Using Pheromone Signalling", International Journal of Distributed Sensor Networks, Article ID 172012, pp. 14, 2013.
[35] T. Camilo, R. Oscar, L. Carlos, "Biomedical signal monitoring using wireless sensor networks", IEEE Latin-American Conference on Communications, pp. 1-6, 2009.
[36] V. W. S. Tang, Y. Zheng, J. Cao, "An Intelligent Car Park Management System based on Wireless Sensor Networks", 1st International Symposium on Pervasive Computing and Applications, 2006.
[37] Tapas, S. V. Srikanth, K. P. Dileep, "Smart Parking Using Wireless Sensor Networks", IEEE, 2009.
[38] I. Cassias, "Project54 Vehicle Telematics for Remote Diagnostics, Fleet Management and Traffic Monitoring", ProQuest, pp. 125, 2008.
[39] H.-M. Tsai, "Intra-car Wireless Sensor Networks", Doctoral Thesis, Carnegie Mellon University, 2010.
[40] K. P. Shih, S. S. Wang, H. C. Chen, P. H. Yang, "CollECT: Collaborative Event detection and tracking in wireless heterogeneous sensor networks", Computer Communications, Vol. 31, pp. 3124-3126, September 2008.
[41] S. F. Tat, A. Ghosh, E. A. Johnson, B. Krishnamachari, "Energy-efficient deployment strategies in structural health monitoring using wireless sensor networks", Structural Control and Health Monitoring, pp. 1-14, 2012.
[42] D. D. L. Mascaranes, E. B. Flynn, M. D. Todd, T. G. Overly, K. M. Farinholt, G. Park, C. R. Farrar, "Development of capacitance based and impedance based wireless sensors and sensor nodes for structural health monitoring applications", Journal of Sound and Vibration, Vol. 329, pp. 2410-2420, 2010.
[43] R. Kim, T. Nagayama, H. Jo, B. F. Spencer, "Preliminary study of low-cost GPS receivers for time synchronization of wireless sensor networks", Proc. SPIE 8345, Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace Systems, 2012.
[44] T. Ahonen, R. Veirrankoski, M. Elmusrati, "Greenhouse monitoring with wireless sensor network", IEEE/ASME Intl. Conf. on Mechatronics and Embedded Systems and Applications, pp. 403-408, 2008.
[45] R. A. Roseline, M. Devapriya, P. Sumathi, "Pollution Monitoring using Sensors and Wireless Sensor Networks: A Survey", International Journal of Application or Innovation in Engineering & Management, Vol. 2, Issue 7, 2013.
[46] F. Giancarlo, G. Stefano, G. Raffaele, A. Guerrieri, "Agent-based Development of Wireless Sensor Network Applications", Proc. of the 12th Workshop on Objects and Agents (WOA 2011), Rende (CS), Italy, 2011.
[47] S. Chris, S. Robert, B. Wayne, "Perceptions of the Elderly on the Use of Wireless Sensor Networks for Health Monitoring", ACM Digital Library, 2006.
[48] S. Andreas, C. Darren, R. Peter, "SmartAssist - Wireless Sensor Networks for Unobtrusive Health Monitoring", BMI'10.
[49] A. Gaddam, "Wireless Sensor Network based Smart Home for Elder Care", Doctoral Thesis, Massey University, 2011.
[50] R. A. Rashid, S. H. S. Arifin, M. R. A. Rahim, M. A. Sarijari, N. H. Mahalin, "Home healthcare via wireless biomedical sensor network", International RF and Microwave Conference Proceedings, pp. 511-514, 2008.
[51] G. Krešimir, Ž. Drago, K. Višnja, "Medical applications of wireless sensor networks - current status and future directions", Medicinski Glasnik, Vol. 9, 2012.
[52] F. José, C. Iván, S. M. Familiar, "Research Experiences about Internetworking Mechanisms to Integrate Embedded Wireless Networks into Traditional Networks", retrieved from https://www.iab.org/wp-content/IAB-uploads/2011/03/Perez.pdf.
[53] F. Norsheila, R. R. Abd, S. M. Adib, H. M. Nasir, "ECG Monitoring System Using Wireless Sensor Network (WSN) for Home Care Environment", retrieved from trg.fke.utm.my/members/rozeha/publication2008/6.pdf.
[54] S. Ivanovitch, G. L. Affonso, P. Paulo, V. Francisco, "Reliability and Availability Evaluation of Wireless Sensor Networks for Industrial Applications", Sensors, pp. 806-838, 2012.
[55] P. R. Bhupal, M. Soujanya, "Real-time Monitoring and Preventive Measures of Leakage Accidents in Gas and Petrochemical Industries using ZigBee", International Journal of Engineering Research and Applications, Vol. 2, Issue 6, pp. 830-834, 2012.
[56] L. Mo, L. Yunhao, "Underground Coal Mine Monitoring with Wireless Sensor Networks", ACM Transactions on Sensor Networks, Vol. 5, No. 2, Article 10, 2009.
[57] K. Amruta, S. Hate, "Automatic Meter Reading System Using Wireless Sensor Network And GSM", International Journal of Electronics and Communication Engineering (IJECE), ISSN 2278-9901, Vol. 2, Issue 3, pp. 103-110, 2013.
[58] A. Gutiérrez, B. Durocher, B. Lu, "Applying Wireless Sensor Networks in Industrial Plant Energy Evaluation and Planning Systems", IEEE IAS Pulp and Paper Industry Conference, Appleton, 2006.
[59] A. Tanel, R. Serg, P. Jürgo, O. Tauno, "In-process determining of the working mode in CNC turning", Estonian Journal of Engineering, Vol. 17, pp. 1-14, 2011.
[60] S. Amit, "Dynamic Power Management in Wireless Sensor Networks", IEEE Design & Test of Computers, 2001.
[61] H. Absar-ul, A. Ghalib Shah, A. Ather, "Intrusion Detection System using Wireless Sensor Networks", EJSE Special Issue:

Wireless Sensor Networks and Practical Applications, 2010.
[62] Z. Marco, F. Athanasios, D. Gokhan, "On the design of a Water Quality Wireless Sensor Network (WQWSN): an Application to Water Quality Monitoring in Malawi", International Conference on Parallel Processing Workshops (ICPPW '09), pp. 330-336, 2009.
[63] B. Tatiana, H. Wen, K. Sali, R. Branko, G. Neil, B. Travis, R. Mark, J. Sanjay, "Wireless Sensor Networks for Battlefield Surveillance", Land Warfare Conference, 2006.
[64] T. Rui, X. Guoliang, W. Jianping, C. Hing, "Exploiting Reactive Mobility for Collaborative Target Detection in Wireless Sensor Networks", IEEE Transactions on Mobile Computing, Vol. 9, Issue 3, pp. 317-332, 2010.
[65] Gilbert, "Research Issues in Wireless Sensor Network Applications: A Survey", International Journal of Information and Electronics Engineering, Vol. 2, No. 5, 2012.
[66] M. A. Jatoi, "Forward Error Correction Using Reed-Solomon Coding and Euclid Decoding in Wireless Infrared Communications", (2013) International Journal on Communications Antenna and Propagation (IRECAP), 3 (2), pp. 97-101.
[67] G. Manasra, O. Najajri, S. Rabah, H. Abu Arram, "DWT Based on OFDM Multicarrier Modulation Using Multiple Input Output Antennas System", (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (5), pp. 312-320.
[68] S. Ben Alla, A. Ezzati, "A QoS-Guaranteed Coverage and Connectivity Preservation Routing Protocol for Heterogeneous Wireless Sensor Network", (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (6), pp. 363-371.
[69] R. Mohammadi, R. Javidan, "Adaptive Quiet Time Underwater Wireless MAC: AQT-UWMA", (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (4), pp. 236-243.
[70] D. David Neels Pon Kumar, K. Murugesan, K. Arun Kumar, Jithin Raj, "Performance Analysis of Fuzzy Neural based QoS Scheduler for Mobile WiMAX", (2012) International Journal on Communications Antenna and Propagation (IRECAP), 2 (6), pp. 377-385.

Authors’ information

1 Asst. Professor and Head, Dept. of CS, Govt. First Grade College for Women, Hassan, Karnataka, India.

2 Research Scholar, Anna University, Chennai, India.

3 Principal, Annai College of Engineering & Technology, Tamil Nadu, India.

Mr. Nagendra Nath Giri received the Master of Computer Applications (MCA) and M.Phil. in Computer Science degrees. He works as Asst. Professor and Head, Dept. of CS, Govt. First Grade College for Women, Hassan, Karnataka, India, and has 12 years of academic and research experience. He is pursuing a Ph.D. from Anna University, Chennai; his research area is wireless sensor networks. E-mail: [email protected]

Dr. G. Mahadevan, M.E., Ph.D. (CSE), has 24 years of teaching experience as a Professor and Principal. His contribution to academics has earned him two national awards. He is currently guiding 5 Ph.D. scholars at Anna University, Coimbatore, India, 3 Ph.D. scholars at PRIST University, Thanjavur, India, and 3 Ph.D. scholars at VTU, India. He has published more than 32 national and international papers and has conducted more than 6 international and 8 national conferences. He is now working as Principal of Annai College of Engineering and Technology, Tamil Nadu.

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

International Review on Computers and Software, Vol. 8, N.10


International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

EECPS-WSN: Energy Efficient Cumulative Protocol Suite for Wireless Sensor Network

Nagendra Nath Giri1,2, G. Mahadevan3

Abstract – With the proliferation of wireless sensor networks (WSNs), communication applications are rapidly exploring this massive wireless communication tool. However, owing to dependency on power and various other resources, the reliability and robustness of data collection in WSNs have not yet been brought to an optimal level. This paper presents a non-deterministic technique that addresses these issues at the micro level, at the data aggregator, where collisions between equivalent data packets from multiple sensors, or redundant packets, are always highly probable. Powered by a protocol suite, the proposed model can select unique incoming data packets and discard redundant ones by deploying dual ranking factors. Finally, a comparison with the widely used LEACH protocol shows that the proposed model provides an Energy Efficient Cumulative Protocol Suite for wireless sensor networks (EECPS-WSN). Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Data Aggregator, Energy, LEACH, Wireless Sensor Network

Nomenclature

Data Aggregation: acquiring the sensed data from the sensors to the gateway node in a wireless sensor network
LEACH: Low Energy Adaptive Clustering Hierarchy, a TDMA-based MAC protocol integrated with clustering and a simple routing protocol in wireless sensor networks (WSNs). The goal of LEACH is to lower the energy consumed in creating and maintaining clusters, in order to improve the lifetime of the network
Multihop: multi-hop, or ad hoc, wireless networks use two or more wireless hops to convey information from a source to a destination
ai: number of aggregator nodes in Fig. 1
d: distance between two aggregator nodes
T: time
NID: node ID
TX_Range: transmission range
Po: power
SIMarea: simulation area
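Since LEACH recurs throughout this paper as the baseline, its stochastic cluster-head election can be sketched briefly. The threshold T(n) = P / (1 − P · (r mod 1/P)) below is the standard formula from the LEACH literature, not from this paper; the parameter values are illustrative.

```python
import random

def leach_threshold(P, r):
    """Standard LEACH cluster-head election threshold T(n) for round r,
    where P is the desired fraction of cluster heads per round."""
    return P / (1 - P * (r % round(1 / P)))

def elect_cluster_heads(node_ids, P, r, eligible):
    """Each eligible node (one that has not served as cluster head in the
    current epoch of 1/P rounds) draws a uniform random number and becomes
    a cluster head if it falls below the threshold T(n)."""
    t = leach_threshold(P, r)
    return [n for n in node_ids if n in eligible and random.random() < t]

# With P = 0.05, about 5% of nodes become cluster heads in round 0; by
# round 19 the threshold reaches 1, forcing every remaining node to serve.
heads = elect_cluster_heads(range(100), P=0.05, r=0, eligible=set(range(100)))
```

The rotation of the cluster-head role is what spreads the energy burden across nodes, which is the property the E-LEACH and fuzzy-clustering variants surveyed below try to improve.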

I. Introduction

Due to their vast potential applications, wireless sensor networks have attracted significant attention in recent years [1]-[33]. Since sensor nodes are small devices with limited resources, in particular battery power (or energy), it is crucial to design wireless sensor networks that save energy and thus prolong the network lifetime. As sensors are usually deployed in large numbers with high density, data aggregation offers a key strategy to curtail the network load and hence reduce energy consumption [3]. Many routing algorithms with data aggregation have been proposed to achieve significant energy savings by allowing intermediate nodes to aggregate the data streams [4], [5]. These protocols focus mainly on energy-efficient routing and therefore assume that transmission reliability can be supported by the underlying link-layer protocol. However, wireless sensor nodes may often be deployed in harsh and inhospitable physical environments. All kinds of environmental influences, such as temporary obstacles, weather, or channel contention, affect the transmission quality. The packet error rate (including packet loss and packet errors due to bit errors) in sensor networks therefore changes much more dynamically, and is higher, than in other networks; in harsh situations it may reach 40 percent [6]. How to reliably deliver the sensory information to the sink in an efficient way is thus a major challenge in such networks. In particular, in a data aggregation tree, unreliable transmission can result in the momentary loss of the sensor readings of an entire sub-tree. Node or link failures closer to the sink degrade reliability drastically, since data packets sent out by nodes near the sink contain much more information than those around the leaves. In order to benefit from the energy saving due to data aggregation, we address the problem of offering the desired reliability for fused information with minimum energy consumption. To increase information reliability, the main idea of our scheme is to repeatedly transmit data along fusing routes without acknowledgments (ACKs). This scheme has two benefits. First, it can work together with any data-aggregation routing algorithm and any aggregation function. Second, it does not need any ACK control mechanism, and hence reduces the latency of data delivery. The key challenge is how to compute the optimal number of transmissions for each node so as to minimize the total energy consumption.

Energy-efficient and reliable transmission of sensory information is a key problem in wireless sensor networks. To save more energy, in-network processing such as data aggregation is a widely used technique, which, however, may often lead to unbalanced information among nodes in the data aggregation tree. Traditional schemes aim to provide reliable transmission of individual data packets from the source node to the sink, but seldom offer the desired reliability for a data aggregation tree. In this paper, we present a novel scheme called EECPS-WSN that aims to ensure an energy efficient data aggregation technique operating under reliable control conditions using a non-deterministic technique. Section II gives an overview of related work, identifying the major research in this area. Section III covers the fundamentals of data aggregation, and its associated issues are discussed in Section IV. The proposed system is presented in Section V, followed by the model design in Section VI. Section VII analyzes the performance results achieved, and Section VIII presents a comparative analysis. Finally, Section IX makes some concluding remarks.

Manuscript received and revised September 2013, accepted October 2013

II. Related Work

This section discusses the prominent research explored in the literature concerning issues of data aggregation in wireless sensor networks. Wimalajeewa [7] designed a framework that ensures optimal power scheduling for distributed detection in a Gaussian sensor network for both independent and correlated observations, assuming amplify-and-forward local processing at each node. Fan et al. [8] discussed data aggregation for multi-target tracking, using LEACH as the routing protocol and the FCM algorithm for association at the cluster heads. Their results show that mis-association, missing of new targets, and repeated tracking are avoided, and the tracking performance is good. Chen [9] proposed a new algorithm named FCM (Improving Life Time of Wireless Sensor Networks by Using Fuzzy c-means Induced Clustering). The author divides all nodes into Q clusters using the Fuzzy c-means algorithm and, in the first round, chooses the cluster head according to each node's energy and the distance between the center of each cluster and the base station. Xu et al. [10] proposed a revised cluster routing algorithm named E-LEACH that enhances the hierarchical routing protocol LEACH by considering the remnant power of the sensor nodes, in order to balance network loads, and changes the round time depending on the optimal cluster size. Their simulation results show that the protocol increases network lifetime by at least 40% compared with the LEACH algorithm. Tan et al. [11] developed an analytical framework to explore the fundamental limits of coverage of large-scale WSNs based on stochastic data aggregation models, in order to characterize the inherent stochastic nature of sensing. The main focus of that work is to investigate the fundamental scaling laws between coverage, network density, and signal-to-noise ratio (SNR). Bagci [12] introduced a fuzzy unequal clustering algorithm (EAUCF) that aims to prolong the lifetime of WSNs. EAUCF adjusts the cluster-head radius considering the residual energy and distance-to-base-station parameters of the sensor nodes. This helps decrease the intra-cluster work of sensor nodes that are closer to the base station or have a lower battery level. The authors utilize fuzzy logic to handle the uncertainties in cluster-head radius estimation. Hamzeh [13] proposed a fuzzy processor in charge of performing fuzzy instructions. This processor is applied to track the best path online for forwarding packets, instead of the traditional offline table-based forwarding process. Simulation results show the efficiency of the methodology not only in balancing the power dissipation through the network, but also in lifetime improvement, traffic management, and network availability. Baek et al. [14] proposed a stochastic geometric model to study the energy burdens seen in a large-scale hierarchical sensor network. The proposed model reveals how various aspects of the task at hand impact the characteristics of the energy burdens on the network and, in turn, the lifetime of the system. Luo et al. [15] proposed a routing algorithm called Minimum Fusion Steiner Tree (MFST) for energy-efficient data gathering with aggregation in wireless sensor networks. Analytical and experimental results show that MFST adapts well to varying network conditions, including network topology, aggregation cost, and the degree of correlation. MFST therefore provides a feasible general routing scheme for wireless sensor networks facing various applications, unpredictable environments, and time-evolving reconfigurations. Cheng et al. [16] proposed a delay-aware network structure for WSNs with in-network data aggregation, along with an optimization process for intra-cluster communication distance. Simulation results show that, compared with other existing aggregation structures, the proposed network structure can reduce


delays in the data aggregation processes and keep the total energy consumption low, provided that data are only partially fusible. Medhat et al. [17] presented novel clustering methods for multimodal WSNs: multimodal, multilayer, and monitored clustering. Chen and Hu [18] presented a new data aggregation algorithm for wireless sensor networks based on an improved LEACH protocol and the Gaussian membership function of fuzzy theory. Three types of data aggregation algorithms were compared, showing that the proposed algorithm is more suitable for environment detection applications in wireless sensor networks. Luo et al. [19] studied the problem of minimum-energy reliable information gathering with an unreliable data aggregation structure. Theoretical proofs and experimental results show that packages carrying more information should be delivered with higher reliability; furthermore, their simulation results indicate that the approximation solutions can guarantee the desired information reliability with high energy efficiency. Rahman [20] focused on finding the optimal sink position using a Particle Swarm Optimization (PSO) based algorithm, locating the optimal sink position with respect to the relay nodes to make the network more energy efficient. The relay nodes communicate with the sink instead of the sensor nodes. Tests show that this approach can save at least 40% of the energy and prolong the network lifetime. Wu and Cheng [21] proposed a mobile-agent tree-routes scheme with data aggregation. Results show that as the network scale increases, the number of mobile agents increases too, overcoming the disadvantage of having only one mobile agent. Choi et al. [22] proposed an efficient and accurate data aggregation mechanism exploiting temporal and spatial correlations for home automation networks. They use temperature data measured on real networks, extended using a Gaussian distribution to obtain more test data sets. The simulation results show that using both types of correlations is more efficient than using just one type for data aggregation, in terms of data transmission volume and accuracy.

III. Data Aggregation

Data aggregation is defined as the process of aggregating the data from multiple sensors to eliminate redundant transmission and provide fused information to the base station [23]. It usually involves the fusion of data from multiple sensors at intermediate nodes and the transmission of the aggregated data to the base station (sink). The algorithm takes the sensor data from the sensor nodes and aggregates it using aggregation algorithms such as the centralized approach, LEACH (Low Energy Adaptive Clustering Hierarchy) [24], TAG (Tiny AGgregation) [25], etc. Data aggregation protocols aim at eliminating redundant data transmission and thus improve the lifetime


of energy-constrained wireless sensor networks. In a wireless sensor network, data transmission takes place in a multi-hop fashion, where each node forwards its data to the neighbor node that is nearer to the sink. Since closely placed nodes may sense the same data, this approach cannot be considered energy efficient. An improvement over it is clustering, where each node sends data to a cluster head (CH), which performs aggregation on the received raw data and then sends the result to the sink. Performing the aggregation function at the cluster head still causes significant energy wastage: in a homogeneous sensor network, the cluster head will soon die out, and re-clustering has to be done, which again consumes energy.
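The energy argument for clustering can be made concrete by counting packet transmissions under the two schemes just described. The node counts and hop estimate below are illustrative assumptions, not figures from the paper.

```python
def transmissions_flat(n_nodes, avg_hops_to_sink):
    """Flat multi-hop forwarding: every node's reading is relayed over
    roughly avg_hops_to_sink links on its way to the sink."""
    return n_nodes * avg_hops_to_sink

def transmissions_clustered(n_nodes, n_clusters, ch_hops_to_sink=1):
    """Clustered aggregation: each member sends once to its cluster head,
    and each cluster head forwards one fused packet toward the sink."""
    members = n_nodes - n_clusters
    return members + n_clusters * ch_hops_to_sink

# 100 nodes, 5 hops average to the sink, vs. 5 clusters of 20 nodes each.
flat = transmissions_flat(100, 5)            # 500 transmissions per round
clustered = transmissions_clustered(100, 5)  # 95 + 5 = 100 transmissions
```

The gap widens as density grows, which is why re-clustering overhead, not per-round cost, becomes the dominant concern noted above.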

Fig. 1. Data Aggregation Phenomenon

Although data aggregation results in fewer transmissions, there is a tradeoff: potentially greater delay for some aggregation functions, because data from nearer sources may have to be held back at an intermediate node in order to be aggregated with data coming from sources that are farther away. In the worst case, the latency due to aggregation is proportional to the number of hops between the sink and the farthest source. While data aggregation in wireless sensor networks is clearly advantageous with respect to scalability and efficiency, it introduces some security issues. In particular, the designated aggregator nodes that collect and store aggregated sensor readings and communicate with the base station are attractive targets for physical node destruction and jamming attacks. Indeed, it is a good strategy for an attacker to locate those designated nodes and disable them, because doing so prevents the reception of data from the entire cluster served by the disabled node. Even if the aggregator role is changed periodically by some election process, some security issues remain, particularly when the base station is offline and the aggregator nodes must store the aggregated data temporarily until the base station comes back online and retrieves it. More specifically, in this case, the attacker can locate and attack the node that was the aggregator in a specific time epoch before the base station
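The worst-case latency claim above can be illustrated with a small sketch: an intermediate node can emit its fused packet only after its slowest child subtree has reported, so the sink's latency grows with the hop count to the farthest source. The tree shape and the unit per-hop delay here are hypothetical.

```python
def fused_packet_latency(tree, node, per_hop_delay=1.0):
    """Time until 'node' holds the fully aggregated packet of its subtree.
    A leaf generates its reading instantly; an internal node must wait for
    its slowest child, which needs its own subtree latency plus one hop to
    forward the fused packet upward."""
    children = tree.get(node, [])
    if not children:
        return 0.0
    return per_hop_delay + max(
        fused_packet_latency(tree, c, per_hop_delay) for c in children
    )

# Hypothetical tree: "c" is two hops from the sink, "b" only one.
tree = {"sink": ["a", "b"], "a": ["c"]}
latency = fused_packet_latency(tree, "sink")  # 2.0: depth of farthest source
```

The result equals the per-hop delay times the depth of the deepest source, matching the proportionality stated in the text.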


fetches its stored data, leading to permanent loss of data from the given cluster in the given epoch.

IV. Problem Description

Commonly adopted inference techniques in multisensor data aggregator applications can be roughly grouped into three main categories: estimation algorithms for entity parameters or attributes, identity estimation algorithms for recognition, and other hypothesis-testing criteria (data-entity association, situation analysis, etc.). Estimation algorithms generally return the values of some quantitative entity parameters or attributes that are particularly significant for the application considered. For instance, in car safety and driver assistance systems, estimations could be made for: kinematic parameters (e.g., the position and the relative velocity) of the objects observed outside the host vehicle or host parameters detected by monitoring the actions of the driver (e.g., the pressure on brake or clutch pedals). Identity estimation techniques rely on special classification algorithms that are used to recognize an object on the basis of some significant extracted features—the shape or patterns of various vehicles detected on a road, for example. In car safety applications, these kinds of algorithms can also be employed to construct a list of the objects (e.g., a tree, a motorcycle, a road sign, etc.) surrounding the host vehicle. Other inference problems are usually focused on testing different hypotheses to make a sensible choice among various alternatives. Whereas the attribute or entity estimation problems are usually only part of the JDL level 1 of processing [26], hypothesis testing techniques may affect multiple levels. Their complexity depends on the complexity of the corresponding inference problem. They can range from determining whether a certain measurement result is related to a given vehicle—a simple data-entity association problem at level 1—to predictions about possible collisions at level 3. 
Multisensor data aggregation has several qualitative benefits compared to the results achievable using a single sensor alone:
- Robust operational performance, improved system reliability, and improved detection: combining multiple, often redundant, measurement data increases the probability of detecting a certain event, even when, for whatever reason, some sensors do not work properly.
- Extended spatial and temporal coverage: with multiple sensors, one sensor is often able to detect an event or measure a quantity in a position, or at a time, at which another cannot.
- Increased confidence and reduced ambiguity: if several sensors contribute to a measurement result, the level of confidence in the fused values is higher than that of each individual output.
- Increased reliability: a system relying on different sensors is less vulnerable to disruption caused by human actions or natural phenomena.
- Enhanced spatial resolution: multiple sensors enable the system to create a synthetic aperture whose resolution can be better than that achievable using only one sensor.

In spite of these advantages, data aggregation also suffers from some problems regarding the accuracy of the achievable estimates or inferences. In fact, given that any data aggregation process combines many different sources of uncertainty through different types of algorithms, estimating and managing the overall uncertainty associated with the process is a challenging task. This is due to several reasons:
- The traditional uncertainty estimation techniques described in the well-known Guide for the Estimation of Uncertainty in Measurements cannot be applied when the relationships are not expressed by analytical functions [27]. Although several research activities are currently focused on solving this problem, there is no commonly accepted approach for uncertainty estimation in data aggregation systems.
- The total uncertainty depends not only on the contributions affecting the raw sensor data, but also on the models describing the input-output characteristic of each sensor.
- Data aggregation algorithms operating at a higher inference level cannot correct errors introduced at a lower level of processing. For instance, a wrong extraction of certain signal features cannot be compensated for by even the best pattern recognition algorithm.
- Finally, identification tasks based on ANNs, SVMs, or other similar learning-from-examples classification algorithms suffer from the uncertainty associated with the chosen training data set.
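The improved-detection benefit listed above follows from simple probability: assuming the sensors fail independently, the fused detection probability is 1 minus the product of the individual miss probabilities. A minimal sketch, with illustrative per-sensor figures:

```python
def fused_detection_probability(per_sensor_probs):
    """Probability that at least one of several independent, redundant
    sensors detects an event: 1 - prod(1 - p_i)."""
    miss = 1.0
    for p in per_sensor_probs:
        miss *= (1.0 - p)  # all sensors must miss for the event to be lost
    return 1.0 - miss

# Three mediocre 70% sensors jointly detect with probability 1 - 0.3^3 = 0.973.
p_fused = fused_detection_probability([0.7, 0.7, 0.7])
```

The independence assumption is the optimistic case; correlated sensor failures (shared weather, jamming) reduce the gain, which is one of the uncertainty sources discussed next.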

V. Proposed Framework

The proposed system presents a schematic framework that uses a minimal amount of energy in performing data aggregation in a wireless sensor network (WSN). The proposed data aggregation technique is highly flexible for the various topologies existing in wireless sensor networks. The framework can be mapped onto a cumulative data aggregation tree (DAT) as DAT = (SS ∪ {s}, E), where SS represents the group of sensor nodes in the data aggregation tree, s is the sink, and E is the set of edges, each edge highlighting the communication channel between two aggregated sensor nodes. Let the indices of the aggregation nodes in the DAT be DAT(i) = {1, 2, …, i, …, N}, where N is the total number of aggregation nodes in the tree. The framework assumes that each aggregation node has a uniform sensing and transmission range. For a sensor i ∈ DAT, the weight of the data aggregation tree (DFTW) denotes the outgoing data packet size of the aggregation node. An edge e ∈ E is represented as e = (i, j), where i is the child node and j is the parent node of i. All the aggregation nodes are assumed to adopt a uniform transmitting power in order to transmit the data packet, and the unit transmission cost per byte is denoted CT. Hence the transmission cost CT is scaled in terms of energy consumption during the data aggregation procedure. In the data aggregation tree, each sensor node aggregates all received data packets, together with its own, into a single data packet and sends it to its parent node. The rate of data aggregation differs greatly from application to application, and the same holds in our case. There is a high possibility that the size of an aggregated data packet is less than the total size of the two original data packets, though it may be greater than the buffer dimension of either original packet. According to Fig. 2, there are basically two feasible techniques for transmitting replica copies of a data packet from a sensor node to the sink in this DAT:
- Technique-1: data packet copies are forwarded without any acknowledgement from the sink.
- Technique-2: an acknowledgement is sent by the receiving aggregator node, and transmission of replica data packets halts once the acknowledgement for the correct data packet has been received.

Hence, before designing rate control of data generation for the data aggregation technique, it should be noted that both of the above techniques have almost similar overhead under a maximum channel error rate. The elementary reasons are as follows. Technique-1 consumes much energy on unwanted retransmissions. However, when either of two factors, hop count or channel error rate, is large, the likelihood of the acknowledgment packet being lost is also quite high. Therefore, the additional overhead of acknowledgment packets becomes a prominent part of the total energy consumption in the second approach, compared with the savings from minimizing redundant retransmission. Furthermore, retransmission after the loss of an acknowledgment also brings longer delivery latency, resulting in traffic congestion; avoiding acknowledgments thus helps control the congestion that may possibly exist. The model therefore cannot achieve prominent energy preservation by using acknowledgments. Consequently, in order to simplify the control mechanism and reduce the buffer size at nodes, the proposed system uses a non-deterministic technique and deploys multiple transmissions without any acknowledgments. In a sensor network, the sink needs abundant information to obtain an accurate result and make a decision. In reality, however, not all data packets can be successfully delivered to their parent nodes.
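The key question behind Technique-1, how many unacknowledged copies to send, can be sketched for a single link: if each copy is lost independently with probability e, then k copies deliver with probability 1 − e^k, and the smallest k meeting a reliability target follows directly. This per-link sketch is a simplification under an independence assumption; the paper's actual setting optimizes over a whole tree.

```python
import math

def min_transmissions(link_error_rate, target_reliability):
    """Smallest number of unacknowledged repeat transmissions k such that
    at least one copy survives: 1 - e^k >= target, i.e.
    k >= log(1 - target) / log(e), assuming independent losses."""
    return math.ceil(
        math.log(1.0 - target_reliability) / math.log(link_error_rate)
    )

# At the 40% error rate cited in the introduction, six copies are needed
# for a 99% per-link delivery probability (0.4^6 ≈ 0.004 residual loss).
k = min_transmissions(0.4, 0.99)
```

Nodes nearer the sink, whose packets carry more fused information, would rationally be assigned a larger k, which is the unbalanced-reliability effect the scheme exploits.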

Fig. 2. Schematic Illustration of Proposed Model


VI. Model Design

The proposed system targets enhanced energy efficiency as well as data reliability in the data aggregation process in wireless sensor networks by logically choosing the potential data aggregator node. For this purpose, the proposed technique introduces an algorithm termed EECPS-WSN (Energy Efficient Cumulative Protocol Suite for WSN) that illustrates the performance of the data aggregation technique for energy efficiency in wireless sensor networks. The term 'protocol suite' is used because the cumulative proposed protocol addresses the following issues:
- Rate control of data generation
- Congestion control / avoidance
- Efficient determination of the aggregator node
- Multi-hop and multi-path effective route selection
- A unique key distribution mechanism to withstand route attacks

The model is based on the optimization of multi-hop and multi-path route selection, using an aggregator node ranking process to evaluate the potential aggregator node, eliminate the possibility of data redundancy, and thereby promise an efficient aggregator node determination process. The evaluation of aggregator nodes is designed for a wireless sensor network in which every individual aggregator node transmits its respective aggregator node (AN) information, and the gathered information is forwarded towards the sink. The design considerations of the proposed model are:
- all the aggregator nodes have analogous physical characteristics;
- the distance of the aggregator nodes is measured with respect to each transmission area;
- all aggregator nodes are assumed to possess a definite optimal residual energy;
- the sink is located very far from the coordinates of the entire transmission zone, assumed for better data aggregation results in the worst-case scenario.

It has already been seen in the literature that a non-deterministic technique can be used [28] for better optimization of energy efficiency. We would like to extend the use of non-deterministic logic to the data aggregation process; hence, we design a multi-hop route optimization process that considers sensor energy usage and neighborhood density, and a multi-path route optimization stage that considers effective positioning, aggregator-node-to-sink distance, and inter-aggregator-node distance. The prime consideration of the design model is to use a reliable and promising mechanism for energy efficiency at the time of data aggregation. Although various topologies have been studied in the literature for wireless sensor networks, the proposed framework aims to exhibit a model that addresses data aggregation issues in the majority of these topologies.


Therefore, the proposed system adopts a tree-based (DAT) design, where the prime aim is to find a potential and reliable aggregator node when performing data aggregation, because an efficient aggregator node is highly likely to perform better, more reliable, and more secure data aggregation. This decision issue is resolved using a non-deterministic approach in which a protocol suite is considered for both the multi-hop and multi-path route selection processes. The non-deterministic technique applies an optimal decision-making system that perceives the conditions of the environment based on the specific characteristics of the aggregator nodes. One of these techniques selects potential aggregator nodes among the nodes of each simulation area through the decision-making process of the protocol suite. The potential aggregator node may be defined as the node holding the most data to be aggregated and the highest energy among the nodes accomplishing the final transmission to the sink. The main aim of the EECPS-WSN (Energy Efficient Cumulative Protocol Suite for WSN) technique is primarily to prompt the various aggregator nodes to participate effectively in the selection of non-redundant data to be aggregated and transmitted forward using the proposed protocol suite. The proposed EECPS-WSN schema is shown in Fig. 3.

Fig. 3. Schematic Diagram of proposed system
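The potential-aggregator definition above (most data to aggregate, highest residual energy) can be sketched as a weighted two-factor score. The weights and field names are illustrative assumptions, since the paper does not specify how the two ranking factors are combined.

```python
def rank_aggregator_candidates(nodes):
    """Rank candidate aggregator nodes by the two factors named in the
    text: residual energy (the prime criterion) and the volume of unique
    data held. The 0.7/0.3 weighting is an assumed example, with both
    factors normalized to [0, 1]."""
    W_ENERGY, W_DATA = 0.7, 0.3  # assumed weights, energy dominant

    def score(n):
        return W_ENERGY * n["residual_energy"] + W_DATA * n["unique_packets"]

    return sorted(nodes, key=score, reverse=True)

# Hypothetical candidates: node 1 is energy-rich, node 2 data-rich.
candidates = [
    {"id": 1, "residual_energy": 0.9, "unique_packets": 0.4},
    {"id": 2, "residual_energy": 0.5, "unique_packets": 0.9},
]
best = rank_aggregator_candidates(candidates)[0]  # node 1 wins on energy
```

Making energy the dominant weight reflects the paper's statement that retention of residual power is the prime criterion.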

Beyond QoS, the proposed system also considers providing a security system. In the initial phase of network formation, the sink gathers all the required data from the sensor motes, and the aggregator nodes choose non-redundant data to be aggregated without using any sort of acknowledgment. To make the data aggregation process compliant with congestion control and avoidance, the proposed framework keeps a small proportion of aggregator nodes relative to the sensor nodes in each iteration. The most demanding aspect of the suite is to create a non-deterministic reckoning procedure that can efficiently select a potential aggregator node and rule out the chances of other non-aggregator (or even malicious) nodes in every individual round of evaluation. The process thereby gives much clarity in selecting the potential aggregator node without any ambiguity. The prospective candidate aggregator nodes are those that have maximum residual energy and the best chances of being elected as a reliable aggregator node based on the data packet contents received for the purpose of performing data aggregation. However, the prime criterion is retention of residual power to the greatest extent. The design of the non-deterministic decision rule using the protocol suite for selection of the aggregator node aims to furnish conditions for optimal energy preservation as well as maintenance of non-redundant data packets at the aggregator node, because we believe that the data aggregation technique does consume energy to a small extent. Therefore, the complete consideration of the associated non-deterministic parameters in the data aggregation process in EECPS-WSN is based on a grading system: a) Uniform grading system: the aggregator node is considered to be rooted at the required centroid position of the simulation area, considering the energy factor; and b) Non-uniform grading system: this aggregator type maximizes the energy conservation scheme for both the multi-hop and multi-path route selection processes of data aggregation in the wireless sensor network.
The operational elaborations for the above two terms are as follows.
• Uniform Grading System: This is one of the most significant grading systems in the design of the connected non-deterministic parameters, where the non-deterministic sets evaluate multiple pieces of information about the aggregator node with respect to the distance among the candidate aggregator nodes and their residual energy. The main considerations in this type are the transmission of updated data packets regarding each aggregator node's residual energy and current position, as well as its transmission relation with other transmission zones. As the aggregator node initiates communication on behalf of every other member node, there is a risk of network congestion due to the high number of overhead messages. Moreover, another prominent functional requirement of the aggregator node is to collect all the precise and unique data with respect to energy, position, and neighborhood node density, and send it instantly to the sink without any delay. Therefore, this type of non-deterministic operator considers thwarting all the network overhead,

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

power issues, and ensuring smooth communication from the aggregator node members to other aggregator nodes and then to the sink.
• Non-Uniform Grading System: The main goal of this grading system is to increase the likelihood of energy preservation within all the aggregator nodes so that they can perform multi-hop route selection with the highest retention of energy. All the aggregator member nodes are sufficiently separated at spatial distances in the simulation area. Another significant condition of this aggregator type is to ensure the involvement of all member non-aggregator nodes in forwarding data over unique routes, avoiding data packet collisions and thereby maximizing the cumulative network lifetime of the WSN. This part of the design of the non-deterministic connected parameters considers an optimal position for the sensor motes promoted to aggregator nodes in the previous stage. The secondary priority is the unique distribution of data packets at instants of heavy communication traffic. The prominent parameters evaluated in this phase are the residual energy, the Congestion Factor (CF), the Position Factor (PF) with respect to the entire transmission zone, and the distance to the data sink. CF and PF are determined from the node density as well as localization factors of the nodes in the considered simulation area. To evaluate PF, the sink takes each wireless sensor node and computes the squared distances (d²) of the other aggregator nodes from that node. As the energy required for transmission is directly proportional to d², the lower the value of the Position Factor PF, the less energy other aggregator nodes need to forward the aggregated data through that aggregator node.
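The PF computation described above can be sketched in a few lines; the coordinates below are invented for illustration, and the simple planar distance model stands in for whatever localization scheme the deployment actually uses:

```python
# Position Factor (PF) as the sum of squared distances to the other candidate
# aggregators. Since transmit energy grows with d^2, the node with the lowest
# PF is the cheapest relay for the others to forward through.

def position_factor(node, others):
    return sum((node[0] - x) ** 2 + (node[1] - y) ** 2 for x, y in others)

candidates = [(0, 0), (4, 0), (2, 1)]
pf = {c: position_factor(c, [o for o in candidates if o != c]) for c in candidates}
best = min(pf, key=pf.get)  # lowest PF => least energy for others to reach it
print(best, pf[best])  # → (2, 1) 10
```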
The formulation of the non-uniform grading system also facilitates the scrutiny of the network to increase the gross network lifetime; to accomplish this, a relative distance from each aggregator node to another node (a cluster head or the sink) is estimated. The proposed protocol suite exhibits the design of a non-deterministic scheme that demonstrates a discrete membership function with values between 0 and 1. According to the proposed solution, the discrete function of a typical set S1, which is a subset of the universal set CF, is the characteristic function defined on the closed interval [0, 1] for set S1. Hence, the proposed non-deterministic connected factors can be written as follows:

S1 = {(x, ζS1(x)) | x ∈ CF, ζS1(x) ∈ [0, 1]} (1)

S2 = {(x, ζS2(x)) | x ∈ CF, ζS2(x) ∈ [0, 1]} (2)

here, S2 is the proposed non-deterministic connected factor in the universal set CF, and ζS1(x) is the degree of the non-deterministic decision parameters of x in S1. The ζS2(x) in the above formula will also consist of all feasible


real numbers in the definite interval [0, 1], and not just 0 and 1. Equations (1) and (2) assist in the formulation of the proposed non-deterministic technique targeted at designing the proposed protocol suite. Hence, the characteristic function can be represented as follows:

ζS1(x) = 1 if x ∈ S1, otherwise ζS1(x) = 0 (3)

Therefore, ζS1(x) is a characteristic function for set S1, and S1 is a typical set of the universe. The non-deterministic connected factors described above are an extension of the typical set; the constituents of a non-deterministic connected factor broaden the binary characteristic function of a typical set to multiple values on the continuous interval [0, 1], considering the entire wireless sensor network:

S2 = {(x, ζS2(x)) | x ∈ CF, ζS2(x) ∈ [0, 1]} (4)
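To make the step from the crisp characteristic function in (3) to the continuous membership grades in (2) and (4) concrete, here is a small illustrative sketch; the energy-ratio membership rule is an assumption made for the example, not the paper's exact formula:

```python
# Crisp characteristic function (Eq. (3)) vs. a continuous membership grade
# on [0, 1] (Eqs. (2) and (4)).

def crisp(x, s1):
    """Binary characteristic function: 1 if x is in the set, else 0."""
    return 1 if x in s1 else 0

def membership(residual_energy, max_energy):
    """A degree in [0, 1] rather than a binary yes/no (assumed energy-ratio rule)."""
    return max(0.0, min(1.0, residual_energy / max_energy))

s1 = {"n1", "n2"}
print(crisp("n1", s1), crisp("n3", s1))  # → 1 0
print(membership(0.6, 1.0))              # → 0.6
```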

Here, S2 is a non-deterministic connected factor in the universe CF, and ζS2(x) is the non-deterministic decision factor of x in S2. The adopted non-deterministic logic is designed to evaluate parameters on various scales, e.g. transmission distance, candidate nodes, non-aggregator nodes, aggregator nodes, and residual energy; it therefore assures the reliability of data packets while performing the aggregation technique. It can thus be expected that the proposed protocol suite technique will receive extensive usage and can be deployed frequently in future enhancements of the experimental model. The proposed protocol suite has two significant characteristics that are considered in aggregator node selection: a) the gross energy level of the aggregator node, extracted from every other sensor node and represented by the non-deterministic connected factors measured while depicting the non-deterministic decisive factors for extracting unique data packets. This deployment of the non-deterministic logic promises an accurate energy conservation scheme along with proper selection of the aggregator node. For this purpose, the system model considers a comparative analysis with a minimum score coefficient ρ. According to this consideration, once the non-deterministic connected factors have completed the analysis of the aggregator node selection criterion in the uniform grading system, any aggregator node whose tolerability criterion is greater than ρ is successively analyzed in the non-uniform grading system, depending on the locations of the aggregator nodes, the cluster head to sink distance, and the distance between cluster heads observed in every round. However, it can be seen that in this stage, the


evaluation for the best candidate aggregator node is done in order to select the set with the highest tolerability as the absolute aggregator node in the current round. Another prominent parameter is the Euclidean distance from each aggregator node to the other aggregator nodes; this is assisted by associating each (cumulative) Euclidean distance with the sink. A competency of the protocol suite is that it enables the sink to estimate the distance from itself to every selected aggregator node. In the uniform grading system, all the above essentials are again taken as input to the non-deterministic decision rule, with the expected output being an aggregator node capable of filtering the unique aggregated data with less power consumption. In the concluding phase, it is also important to estimate the lifetime of the aggregator node. The algorithms of the proposed suite are as follows.

i) Algorithm for rate control of data generation

The rate control algorithm is designed in a highly distributed manner and runs on every sensor in the network. Each node executes it at a periodic interval of T seconds in order to compute the permissible rate for flows originating at that node, and for data flows routed in the neighborhood of the node. The algorithm running at a node i at the nth time step consists of the following three key steps:
1. Estimate Ri, the per-flow available capacity.
2. Calculate Rmin_i, the minimum per-flow available capacity.
3. Use Rmin_i to compute the permissible source rate Ri for flows generated at node i, as well as for flows consuming the receiver capacity of node i.
The proposed rate control technique depends on the T-second interval to give sensor nodes time to calculate their rate updates.
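The three rate-control steps above can be sketched as a per-node update; the fixed-capacity model used to make the example runnable is an assumption, not the paper's exact capacity estimator:

```python
# Distributed rate-control update sketched from the three steps above,
# executed by each node every T seconds.

def rate_update(capacity_i, flows_i, neighbour_rates):
    # Step 1: estimate Ri, the per-flow available capacity at node i
    r_i = capacity_i / max(flows_i, 1)
    # Step 2: Rmin_i, the minimum per-flow capacity seen in the neighbourhood
    r_min = min([r_i] + neighbour_rates)
    # Step 3: Rmin_i becomes the permissible source rate for flows generated
    # at node i and for flows consuming node i's receiver capacity
    return r_min

rate = rate_update(capacity_i=12.0, flows_i=4, neighbour_rates=[2.5, 4.0])
print(rate)  # → 2.5
```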
In order to sustain a fair rate assignment and cumulative data aggregation stability, it is crucial that T be large enough to ensure that rate control information has propagated to all sensor nodes in the wireless network within this update interval.

ii) Algorithm for congestion control/avoidance

The proposed protocol suite also furnishes features that perform congestion control/avoidance under challenging traffic situations. The system starts by designing a transmission tree Ttree rooted at the sink Snode.
Input: N nodes and the position of the sink node Snode.
Output: Congestion control/avoidance.
Steps:
START
1. For each node N in Ttree do
2. Initialize parent, leaf and non-leaf nodes to 0
3. End for
4. Snode broadcasts a query msg with unit hop.


5. For each node N in Ttree that receives the query msg do
6. If length(n) > e Then // e = edge
7. Increment hop
8. Transmit the msg
9. Else if length(n)

0.05) indicate that there were no significant differences between the assessment results of the IT students (Mean Rank=20.08) and non-IT students (Mean Rank=20.93) with regards to navigation understanding. Based on frequency and percentage, it was found that 75% (n=15) of the IT students had positive opinions of “Agree” (55%, n=11) and “Strongly Agree” (20%, n=4). More non-IT students expressed positive opinions than the IT students, with 85% (n=17) stating “Agree” (70%, n=14) and “Strongly Agree” (15%, n=3). Based on these results, it can be concluded that both groups of participants could understand the system navigation (Median=4). The second question (Q2) was about the number of navigation steps. The results of the Mann Whitney U test (U=183, p=0.61>0.05) indicate that there were no significant differences between the assessment results of the IT students (Mean Rank=19.65) and non-IT students (Mean Rank=21.35) with regards to understanding the navigation steps. Based on the frequency and percentage, it was found that 80% (n=16) of the IT students had


positive opinions of “Agree” (55%, n=11) and “Strongly Agree” (25%, n=5). More non-IT students expressed positive opinions than the IT students, with 85% (n=17) stating “Agree” (55%, n=11) and “Strongly Agree” (30%, n=6). Based on these results, it can be concluded that both groups of participants could understand the number of steps for navigation (Median=4). More than 10% of participants from both groups expressed a neutral opinion, and 5% had a negative opinion, all of which came from the IT students. Based on their comments, these students had neutral and negative opinions because they were confused about the navigation steps in the browsing process using timeline visualization, which required them to click the clear icon to clear their selections every time they had selected three movies. In this section, the non-IT students suggested that the system should give users the option of whether they want to clear all the selected movies or only clear one or two movies, so that users do not need to repeatedly select the same movies they want to use for comparison. This suggestion from the non-IT students was very interesting and should be considered for the improvement of the MSB system. The third question (Q3) was about understanding the icons used in the MSB system. The results of the Mann Whitney U test (U=192, p=0.79>0.05) indicate that there were no significant differences between the assessment results of the IT students (Mean Rank=20.08) and non-IT students (Mean Rank=20.93) with regards to understanding the icons used in the MSB system. Based on frequency and percentage, it was found that 75% (n=15) of the IT students had positive opinions of “Agree” (60%, n=12) and “Strongly Agree” (15%, n=3). More non-IT students had positive opinions than IT students, with 80% (n=16) stating “Agree” (65%, n=13) and “Strongly Agree” (15%, n=3).
Based on these results, it can be concluded that both groups of participants could understand the icons used in the MSB system (Median=4). These results were also supported by the comments of participants, with both groups of participants saying that they could understand the icons used, but they would have liked it if the icons had been more visible so that they could be immediately noticed. The fourth question (Q4) was about the interface design consistency. The results of the Mann Whitney U test (U=183, p=0.57>0.05) indicate that there were no significant differences between the assessment results of the IT students (Mean Rank=19.65) and non-IT students (Mean Rank=21.35) with regards to interface consistency. Based on the frequency and percentage, it was found that 90% (n=18) of the IT students expressed positive opinions of “Agree” (70%, n=14) and “Strongly Agree” (20%, n=4). More non-IT students had positive opinions than IT students, with 95% (n=19) stating “Agree” (70%, n=14) and “Strongly Agree” (25%, n=5). Therefore, based on these results of the Mann Whitney U test, it can be concluded that more than 90% of all participants from both groups found the interface design of the MSB system consistent (Median=4.00).

International Review on Computers and Software, Vol. 8, N. 10

2456

Munauwarah, Nazlena Mohamad Ali, Hyowon Lee

The fifth question (Q5) was about interface attractiveness. The results of the Mann Whitney U test (U=194, p=0.85>0.05) indicate that there were no significant differences between the assessment results of the IT students (Mean Rank=20.83) and non-IT students (Mean Rank=20.18) with regards to interface attractiveness. Based on the frequency and percentage, it was found that 65% (n=13) of the IT students expressed positive opinions of “Agree” (40%, n=8) and “Strongly Agree” (25%, n=5). The non-IT students also had positive opinions, with 60% (n=12) stating “Agree” (35%, n=7) and “Strongly Agree” (25%, n=5). In this part, there were more than 10% of “Neutral” opinions and another 10% of negative opinions comprising “Disagree” from both groups of participants. Even though more than 50% of the participants had positive opinions, most of the participants commented that the Movie Song Browser needed more improvement in its interface design to make it more attractive, such as making it simpler and more presentable, and also having labels in the timeline to make it easier to understand. The last question (Q6) was about user understanding of the overall process in the system. The results of the Mann Whitney U test (U=155, p=0.16>0.05) indicate that there were no significant differences between the assessment results of the IT students and non-IT students with regards to the understanding of the overall process in the system. Based on frequency and percentage, it was found that 95% of the IT students and 100% of the non-IT students could understand the overall process of the system and, from the comments made, they actually felt that the way the system presented the contents of the songs using the timeline was interesting. A Mann Whitney U test was also conducted to compare the assessment results between the IT students and non-IT students for timeline visualization of the MSB system. The results of the Mann Whitney U test are shown in Fig. 5.
The first question was about the timeline visualization technique, as shown in Q1 in Fig. 5. The results of the Mann Whitney U test (U=196, p=0.89>0.05) indicate that there were no significant differences between the assessment results of the IT students (Mean Rank=20.70) and non-IT students (Mean Rank=20.30) with regards to the timeline visualization technique.

Fig. 5. The results of the Mann Whitney U test for timeline visualization


Based on frequency and percentage, it was found that 90% (n=18) of the IT students had positive opinions of “Agree” (15%, n=3) and “Strongly Agree” (75%, n=15). More non-IT students had positive opinions than IT students, with 100% (n=20) stating “Agree” (30%, n=6) and “Strongly Agree” (70%, n=14). Based on these results, it can be concluded that both groups of participants found that the visualization technique applied in the MSB system is helpful in the browsing of songs (Median=4). Based on comments from the IT participants, they found that the system was helpful and interesting because they could immediately know the position of the song in the movie and the song length without having to watch the whole movie. The non-IT students mentioned that it was very helpful for them and easy to use because they could compare movies based on song content, and could access the contents directly from the timeline visualization. Both groups of participants felt that the timeline visualization helped and supported the browsing process. The second question was about the perceived importance of the timeline visualization in presenting song location and information in the movies, as shown in Q2 in Fig. 5. The results of the Mann Whitney U test (U=196, p=0.90>0.05) indicate that there were no significant differences between the assessment results of the IT students (Mean Rank=20.33) and non-IT students (Mean Rank=20.68) with regards to the importance of the timeline visualization. Based on frequency and percentage, it was found that 85% (n=17) of the IT students expressed positive opinions of “Agree” (10%, n=2) and “Strongly Agree” (75%, n=15). More non-IT students had positive opinions than the IT students, with 95% (n=19) stating “Agree” (25%, n=5) and “Strongly Agree” (70%, n=14).
Based on these results, it can be concluded that both groups of participants found that the information displayed in the timeline visualization was important as it made them aware of the songs in the movie timeframe with just a glance. These results were also supported by the comments of participants, with both groups agreeing that the timeline visualization was important because it could help users when browsing for song information and making comparisons, and it could also shorten the time taken for the searching and browsing process. Based on observations made during the experiment, only 10% of the non-IT participants were able to complete the tasks given within 30 minutes when using a regular DVD player; in contrast, 75% of the IT students were able to complete the task within 30 minutes, using a regular DVD player. When doing the task, the majority of the non-IT students watched the video movie carefully and tried to get accurate answers for the task, whereas the IT students also tried to get the right answers for the tasks but they skipped much of the video movie content when they felt the contents were not related to the tasks. This difference resulted in the non-IT students needing more than 30 minutes to complete the tasks. A timeline visualization technique approach applied in the MSB


system can improve the performance of both groups of participants. Based on observations made during the experiment, all participants could complete the task within 5 minutes when using the MSB system, so it can be concluded that the MSB system has the potential to improve participants’ process in browsing video movie content. Another question was based on the experience of participants while using the MSB system. Participants were asked if the system was easy to use and easy to learn, because the interface design and timeline visualization may affect the usability of the system. The three questions regarding usability are whether the MSB system is (Q1) easy to use, (Q2) easy to learn and (Q3) interesting (see Fig. 6). Based on frequency and percentage, it was found that 95% (n=19) of the IT students and 85% (n=17) of the non-IT students said that the Movie Song Browser system was easy to use, 95% (n=19) of the IT students and 75% (n=15) of the non-IT students said that the Movie Song Browser system was easy to learn, and more than 90% of participants from both groups found the system interesting. Some of the participants said it was difficult to use because they found it a little difficult to understand the timeline, as the timeline did not have any labels and there were too many steps involved, and this made them feel confused. Furthermore, 20% of the non-IT students found the MSB system difficult to learn. This was because the content arrangement in the interface was a little messy between the timeline and video clips, so for future use the MSB system needs to improve its content arrangement and timeline visualization to make the system look simpler, and to make it easy to use and easy to learn. The MSB system helps users improve in terms of experience and also in the process of browsing temporal media content. Both groups of participants had positive responses about the interface, timeline visualization and usability of the MSB system.
The IT students gave feedback about the interface design and components such as navigation, use of icons and the scrolling function in the MSB system. Some of the comments given were: “use icon that's easy to understand and clear” and “arrange the content systematically”.

The non-IT students were majoring in media studies, and their feedback was on how the content of the songs was presented in the timeline visualization and the arrangement of the contents (video clips and timeline). Some useful feedback was obtained from the non-IT students: “put the labels in timeline so it can be more understandable” and “display the exact time for song length as labels at the timeline to give the users accurate song position and song length”. Based on overall comments, the IT students were more interested in the way the system worked and the system evaluation. On the other hand, the non-IT students were more interested in the media part, namely the way the song contents in the movie were presented and the way the contents were arranged. All comments given by users from different perspectives should be noted so that the MSB system can be improved for future use. Based on all the results, it can be concluded that both groups of participants found that the MSB system has a good interface design that is easy to understand, easy to learn and interesting. The MSB system also has a timeline visualization that has the ability to help and support the browsing process. 95% of all participants from both groups stated that they would like to use the MSB system again in the future. Overall, the MSB system was very useful and helpful for both groups of users.
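For readers wishing to reproduce the Mann Whitney U statistics reported in this section, the test reduces to a rank-sum computation with average ranks for ties; the Likert scores below are invented illustration data, not the study's raw responses:

```python
# Mann Whitney U via rank sums, with tied values assigned their average rank.

def mann_whitney_u(a, b):
    combined = sorted((v, g) for g, vals in ((0, a), (1, b)) for v in vals)
    values = [v for v, _ in combined]
    ranks, i = {}, 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        avg = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg
        i = j
    r1 = sum(ranks[k] for k, (_, g) in enumerate(combined) if g == 0)
    n1, n2 = len(a), len(b)
    u1 = r1 - n1 * (n1 + 1) / 2
    return min(u1, n1 * n2 - u1)  # report the smaller of U1, U2

it = [4, 4, 5, 3, 4]        # hypothetical Likert responses, group 1
non_it = [4, 5, 4, 4, 5]    # hypothetical Likert responses, group 2
print(mann_whitney_u(it, non_it))  # → 8.5
```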

VI. Conclusion

Based on previous studies and the results of this study, it is important to have an interface that users can readily use, and presenting the song contents of movies on a timeline is very helpful: it makes browsing easier and more interesting for users, as they are able to interact with the system. However, the way the interface is designed and the way the content is arranged are also very important; the more attractive, simple and easy to use a system is, the more interested in and attracted to it users will feel. The content must be arranged in the simplest but still orderly manner so as to improve user understanding, because if the content is arranged untidily, users may not only get the wrong idea about what the system is meant to provide, but may not even want to use the system, finding it confusing and boring. Based on the overall results, the Movie Song Browser system has the potential to help and support the process of browsing media, and users are able to understand what the system tries to provide. For future use, the system still needs further improvement so that it can be used more effectively and efficiently.

Acknowledgements
We would like to thank all the participants in the experiments. The project was supported by UKM-GGPM-ICT-113-2010.

Fig. 6. Frequency results on usability of the MSB system


References
[1] S. Kim, Interdisciplinary cooperation, in B. Laurel (Ed.) and S. J. Mountford, The Art of Human-Computer Interface Design (Addison-Wesley Publishing Company, 1990).
[2] C. E. Wills, User interface design for the engineer, Proceedings of the IEEE Electro 94 Conference (pp. 415-419, 1994, ISBN: 0-7803-2630-X).
[3] J. Spolsky, Controlling Your Environment Makes You Happy, http://www.joelonsoftware.com/ [accessed: 31 August 2013].
[4] G. Daniel, M. Chen, Video Visualization, Proceedings of the 14th IEEE Visualization Conference (pp. 409-416, 2003, ISBN: 0-7803-8120-3).
[5] M. M. Yeung, Y. Boon-Lock, Video visualization for compact presentation and fast browsing of pictorial content, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 7, n. 5, pp. 771-785, 1997.
[6] J. Graham, J. J. Hull, A paper-based interface for video browsing and retrieval, Proceedings of the 2003 International Conference on Multimedia and Expo (pp. 749-752, 2003, ISBN: 0-7803-7965-9).
[7] A. Kirk, Data Visualization: A Successful Design Process (Packt Publishing, 2012).
[8] Y. Chiricota, G. Melançon, Visually Mining Relational Data, (2007) International Review on Computers and Software (IRECOS), 2 (3), pp. 242-257.
[9] R. Ewerth, M. Mühling, T. Stadelmann, J. Gllavata, M. Grauer, B. Freisleben, Videana: a software toolkit for scientific film studies, in Digital Tools in Media Studies: Analysis and Research - An Overview (Transcript-Verlag, 2009, pp. 101-116).
[10] O. Hoeber, J. Gorner, BrowseLine: 2D Timeline Visualization of Web Browsing Histories, Proceedings of the 13th International Conference on Information Visualization (pp. 156-161, 2009, ISBN: 978-0-7695-3733-7).
[11] P. Craigh, N. Roa-Seiler, A Vertical Timeline Visualization for the Exploratory Analysis of Dialogue Data, Proceedings of the 16th International Conference on Information Visualization (pp. 68-73, 2012, ISBN: 978-1-4673-2260-7).
[12] C. Plaisant, B. Milash, A. Rose, S. Widoff, B. Shneiderman, LifeLines: Visualizing Personal Histories, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '96) (pp. 221-227, 1996, ISBN: 0-89791-777-4).
[13] R. B. Allen, S. Nalluru, Exploring History with Narrative Timelines, Proceedings of the Symposium on Human Interface 2009, Universal Access in Human-Computer Interaction (pp. 333-338, 2009, ISBN: 978-3-642-02555-6).
[14] Google Developers, Visualization: Timeline, https://developers.google.com/ [accessed: 31 August 2013].
[15] J. Preece, Y. Rogers, H. Sharp, Interaction Design: Beyond Human-Computer Interaction (John Wiley & Sons, Inc., 2002).
[16] E. Tufte, The Visual Display of Quantitative Information (Graphics Press, 1983).

Authors’ information


1 Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor Darul Ehsan, Malaysia. E-mail: [email protected]

2 Institute of Visual Informatics, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor Darul Ehsan, Malaysia. E-mail: [email protected]

3 Singapore University of Technology and Design, 20 Dover Drive, Singapore 138682. E-mail: [email protected]

Munauwarah is currently working toward the Master’s degree (Information Science) at the Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia. Her research interests include interface design, information visualization, and user evaluation and usability.

Nazlena Mohamad Ali received the Ph.D. degree from Dublin City University, Ireland, in 2009. She is currently a research fellow at the Institute of Visual Informatics (IVI), Universiti Kebangsaan Malaysia. Her research background is Human-Computer Interaction, in particular: interaction design, user evaluation and usability, information visualization, interactive multimedia systems and user engagement.

Hyowon Lee received the Ph.D. degree in Computer Science from Dublin City University, Ireland. He is currently a Visiting Scholar at MIT (2012-2013), and an Assistant Professor in the Pillar of Information Systems Technology and Design at the Singapore University of Technology and Design (SUTD).


International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

An Access Control Model of Web Services Based on Multifactor Trust Management

R. Joseph Manoj1, A. Chandrasekhar2

Abstract – Web services are services made available from a business web server for web users or other web-connected programs. A service requester accesses a web service from a provider in different manners, such as peer-to-peer management or a centralized server; when the service requester accesses the web service, the service provider may follow different access control policies to restrict malicious users or behavior. This paper proposes a dynamic access control model that manages the trust value of service requesters based on multiple factors such as network conditions, frequency of access, timeout, success rate, and failure rate. Based on the trust value, honest and active users will be allowed to avail the service; otherwise their trust value will be decreased and they will not be allowed to access the service. This kind of method controls malicious requesters’ access to web services and encourages requesters to take part in the access process honestly. This paper also verifies the performance and correctness of the proposed work based on simulation results from a prototype implementation. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Multifactor and Trust Management, Trust Based Access Control, Web Services Security

I. Introduction

Web services are becoming a popular technology, since they are at the heart of many e-business systems and bring great economic benefits. Web services are loosely coupled applications that use the eXtensible Markup Language (XML), the Simple Object Access Protocol (SOAP), the Web Services Description Language (WSDL), and Universal Description, Discovery and Integration (UDDI) for representing, communicating, describing and discovering services across the Internet [1]-[19].

I.1. Web Services Access Models

An access control model for a web service restricts the set of clients or subjects that can invoke the operations offered by the service. Because the clients are usually not known a priori, credentials are adopted to enforce access control. Credentials are assertions describing the properties that are used to establish trust between the clients and the web service. Access control policies define rules stating that only subjects with credentials satisfying specific conditions can interact with a web service. Hence, access control for web services is required to cross the borders of security domains and to address the movement of unknown users across those borders so that access to services can be granted [1]. Access control for web services has also become a hot topic in the field of web services security. A few such models are given as follows.

Manuscript received and revised September 2013, accepted October 2013


Attribute-Based Access Control (ABAC) models [2] make use of attributes owned by the clients, the providers, and other attributes related to the environment. Decisions to allow or deny a request are made based on all these attributes. Role-Based Access Control (RBAC) [3], [7] is a widely used web service access control scheme in which clients are assigned roles containing permissions in order to get secure access to specific web services. Location-Based Access Control (LBAC) takes the service requester's physical location into account when determining access privileges. In Session-Based Access Control (SBAC), the background of a transaction is limited to a session: access to resources is based on the attributes of the subjects and the properties of the objects, but the rights that can be applied at a given time are limited by the context defined by the access session.

I.2. Trust Based Access Control

Trust is a complicated issue that may be associated with other attributes such as security, honesty, reliability, accuracy, risk, utility, benefit, competence, belief, perception, and expertise. Trust is defined in [3] as: “Trust is the dense belief in the ability of an entity to act as expected such that this dense belief is not a fixed value associated with the entity but rather it is subject to the entity’s behavior and applies only within a specific context at a given time”. From this definition many important properties of trust can be captured.



They are: (i) trust is not a static value, it is dynamic; (ii) it is condition and time dependent; (iii) it depends on an entity's past and present behavior. Trust-based access control systems [4] differ from the previous access control schemes in that clients' trust levels are dynamically calculated based on statistical analysis of behaviors, activities, and previous access attempts. Thus, service violations and bad client behavior lead to a decrease of the trust level, whereas good behavior leads to an increase in the trust level.
The rest of the paper is organized as follows. Section 2 presents a related study. Section 3 proposes the dynamic access model based on multifactor trust. Section 4 analyzes the proposed system's performance. Section 5 concludes the proposed system, and Section 6 discusses future work.

II. Motivation and Related Study

Web services are at the heart of many e-business systems, so securing them is a critical process, and developing an effective framework for accessing web services is therefore essential. In many applications, such as e-commerce and Internet shopping, the interaction between subject and object is established on the basis of trust. Many examples show that interactions are more reliable when based on trust, so it is essential to implement the concept of trust in access control. In recent years many researchers have worked on trust-based access control models for web services. Some of this work is as follows.
The EigenTrust algorithm [5] uses the number of satisfactory transactions minus the number of unsatisfactory transactions as the rating of a peer for another peer. This reflects only the experience with its acquaintances; however, by asking friends' friends, the peer obtains a complete trust view of the network. Similarly, the TRELLIS system [6] uses distinct numbers to represent atomic credibility or consistency and combines them into a continuous trust value. Cesar Ali et al. (2010) [7] proposed a new trust model to access web services based on the context and role of the service requester; here too, they failed to handle new-user trust values effectively. Wang Meng et al. [8] proposed a dynamic trust model based on recommendation credibility; this model suggests a way to differentiate honest and dishonest recommendations and adjusts the weight of trust evaluation dynamically. Gao Ying et al. (2010) [9] proposed a layered trust model based on behavior to enhance web service security and extensibility. This model addresses the problem of establishing trust relationships among different domains in open service grids.
The authors proposed an algorithm to adjust trust relationships between domains based on entity interactions, and also a technique to process recommendation trust.


Kai Wei and Shaohua Tang [10] proposed a multi-level trust evaluation model based on direct search, in which service providers need to evaluate and manage the trust of all users. Bhanwar et al. (2009) [18] proposed a trust model that computes the reputation and trustworthiness of the transacting domain on the basis of the number of past transactions and rated feedback scores. Vivekananth et al. (2010) [11] proposed a behavior-based trust model that captures behavior conformity and concentrates on the behavior of entities in different domains and contexts. The total trust is calculated from direct trust and indirect trust, and behavior is tracked using a tracking module. Based on experiences with an entity, its trust level is increased or decreased, and a penalty factor, ranging between 0 and 1, is levied for malicious behavior. The trust factor between two entities may depend on penalty, context, and time. A threshold value is used, and if the total trust is greater than the required trust, the resource is allocated. Shangzhu Jin et al. (2010) [12] proposed a model in which the service requester's trust value is calculated based on feedback and time decay; it fails to calculate new-user trust values. Tie-Yan Li et al. (2010) [13] proposed a two-level trust model and corresponding trust-metric evaluation algorithms: the upper level defines the trust relationships among Virtual Organizations (VO) in a distributed manner, while the lower level justifies the trust values within a grid domain. This model provides an integrated trust evaluation mechanism to support secure and transparent services across security domains. Wu Xiaonian et al. (2009) [14] quantified entity trust according to the entity's behaviors; this behavior trust computation model is based on risk evaluation and includes asset identification, threat identification, and trust.
Wang et al. (2005) [15] proposed an access control model based on multi-factor trust. The model includes multi-factor trust computation, permission mapping, and a feedback module; simulation results show that it is suitable for access control in dynamic environments. Kamvar [16] implemented two basic ideas to combat malicious peers. One is that the current trust values of a peer must not be computed by, or stored at, the peer itself. The other is that malicious peers are assumed to also return wrong outcomes when they are supposed to calculate any peer's trust level. Thus the trust value of a peer in the network is computed by more than one other peer. Srivaramangai et al. (2010) [17] proposed a trust model to improve trustworthiness in grids. According to their model, reputation-based systems can be used in a grid to improve the trustworthiness of transactions, and trustworthiness is achieved by establishing mutual trust between the initiator and the provider. Indirect trust is taken from the reputation scores of other entities, and unreliable feedback is eliminated using Spearman's rank correlation method.

International Review on Computers and Software, Vol. 8, N. 10



The above models do not handle new users' trust levels effectively and consider very few factors when computing the trust of a service requester. Moreover, a requester's transaction is considered failed if server errors occur, so the requester's trust level is decreased unfairly. To avoid these issues, the proposed work computes the trust value using multiple factors such as network conditions, frequency of access, timeout, success rate, and failure rate.

III. Proposed System

To implement the proposed model for accessing web services based on multifactor trust management, a system architecture is designed and trust management components are developed to manage the trust value. The access control model calculates the trust value dynamically, each time the service requester accesses the service, based on various factors. In this system the access model is located at the service provider. The model calculates the trust value based on statistical analysis of the requester's behaviors, activities, and previous access attempts. Thus, service violations and bad client behavior lead to a decrease of the trust level, whereas good behavior leads to an increase in the trust level.

III.1. System Architecture

The proposed system architecture, shown in Fig. 1, consists of the service requester (SR), the service provider (SP), and trust management components: the Trust Decision Point (TDP), the Trust Management Point (TMP), and the Trust Negotiation Point (TNP). Since each trust management component is located separately at the service provider, it is easy to manage the trust values of requesters. Fig. 1 shows the proposed access control model components and their relationships. The architecture is divided into two parts: (1) the service requester, who accesses the service, and (2) the service provider, who provides the service and manages the trust value. The components work as follows: the system starts its task once a service requester sends a request to the provider to access a service through the web. The web service located at the provider first receives the request and forwards it to the trust management components, which calculate the trust value based on the factors described below.

III.2. Web Service (WS)

The granularity of resources provided in the infrastructure is at the level of web services. Web services enable each participating organization to provide access to its internal functionality and data to other participants.


Fig. 1. System Architecture

III.3. Trust Decision Point (TDP)

The Trust Decision Point is one of the trust management components. Based on the current trust value, the TDP takes the initial decision of whether the service requester is allowed or denied access to the web services. In this model trust values range from 0 to 10 and the threshold value is 0. If a registered requester's trust level is greater than the threshold, the requester is allowed to access the web services; otherwise the request is ignored. This initial trust verification makes the system more efficient by avoiding unnecessary trust calculations. A new requester is assigned an initial trust value of 0; in this case the TDP does not ask the TMP and TNP to calculate the trust value, but allows the requester to avail the service and stores the details of the transaction.

III.4. Trust Management Point (TMP)

The Trust Management Point (TMP) is another component of the trust management section. After the initial trust verification by the TDP, the TMP evaluates the trust value using the following factors:
1. Success rate (St): the number of successful transactions of a service requester within a time period specified by the service provider.
2. Failure rate (Ft): the number of failed transactions of a service requester within the specified time period.
3. Frequency of access (Af): how frequently the requester accesses the service. It has a threshold value assigned by the service provider; if a requester's Af is lower than the threshold, the requester is considered a lazy requester.
4. Time-out (To): the number of time-outs occurring during resource access, used to recognize the honesty of the requester.
5. Average time (At): the average time spent during service access, also used to recognize the honesty of the requester.
The model classifies service requesters into four types according to their history of behavior in the system: (1) honest and active, (2) honest and inactive, (3) dishonest and active, and (4) dishonest and inactive. For example, if time-out (To) and average time (At) are greater than their defined threshold value (Vo), the requester is considered honest, otherwise dishonest; if the frequency of access (Af) is greater than its threshold, the requester is considered active, otherwise inactive. The service provider can assign its own threshold values for To, At, and Af. The trust value calculation varies with the user type, as follows.
Type 1: If the requester is identified as honest and active, the requester's decayed trust value (dt) is increased as follows:

dt_{t+1} = (dt_t + (St + Af) * dt_t) / (St + Ft)    (1)

Type 2: If the requester is identified as honest and inactive, or dishonest and active, the requester's decayed trust value (dt) is decreased as follows:

dt_{t+1} = (dt_t - (St + Af) * dt_t) / (St + Ft)    (2)

dt(St, Ft, Af, dt, To, At) {
    If (To > Vo and At > Vo and Af > Vo)    // honest and active users
        Calculate dt_{t+1} using formula (1)
    Else if (To > Vo and At > Vo and Af < Vo)    // honest and inactive / dishonest and active users
        Calculate dt_{t+1} using formula (2)
    Else if (To
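The classification and the trust updates (1) and (2) above can be sketched as follows (illustrative Python, not the authors' implementation; the source pseudocode is truncated after the second branch, so the remaining branch here leaves the trust value unchanged — an assumption, since the corresponding formulas are not recoverable from this text):

```python
def classify(To, At, Af, Vo_time, Vo_freq):
    """Classify a requester from its behavior factors.

    To: time-outs, At: average time, Af: frequency of access;
    Vo_time / Vo_freq are provider-assigned thresholds (names are illustrative).
    """
    honest = To > Vo_time and At > Vo_time   # both time factors above threshold
    active = Af > Vo_freq                    # access frequency above threshold
    return honest, active

def update_trust(dt, St, Ft, Af, honest, active):
    """Update the decayed trust value dt per formulas (1) and (2)."""
    if honest and active:                    # Type 1: increase, formula (1)
        return (dt + (St + Af) * dt) / (St + Ft)
    if honest != active:                     # Type 2: decrease, formula (2)
        return (dt - (St + Af) * dt) / (St + Ft)
    # Dishonest and inactive: update rule truncated in the source text;
    # leave the trust value unchanged (assumption).
    return dt
```

For example, an honest and active requester with St = 8, Ft = 2, Af = 5 and current trust 1.0 gets (1.0 + 13·1.0)/10 = 1.4 under formula (1).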


The following steps are used to locate the center point of the fingerprint and to extract an 8×8 pixels matrix around the detected center point:

Fig. 2. Flow diagram of the iris recognition




Kamel Aizi, Mohamed Ouslim

- Estimate the orientation field.
- Calculate the field strength of the loop at each point in the orientation field, using the expanded field of the hidden orientation.
- Normalize the loop field strength into the range 0 to 1.
- Threshold the loop field to locate both the kernel and the center of the region.
In order to extract the relevant features of the fingerprint, the Gabor filter was applied to the framed part (8 × 8 pixels) of the fingerprint along 8 different directions. The results are complex values, which were encoded to obtain a binary vector of size 1024 representing the main features of the fingerprint image. As for the iris, the comparison of fingerprints was performed using the Hamming distance, in order to obtain a score that represents the degree of dissimilarity between the prototype fingerprint code and the code of the fingerprint under test.
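The Hamming-distance comparison described above can be sketched as follows (illustrative Python; the 1024-bit codes are assumed to be equal-length binary sequences):

```python
def hamming_distance(code_a, code_b):
    """Normalized Hamming distance between two equal-length binary codes.

    Returns a dissimilarity score in [0, 1]: 0 means the codes are
    identical, 1 means every bit differs.
    """
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    differing = sum(a != b for a, b in zip(code_a, code_b))
    return differing / len(code_a)
```

For the 1024-bit feature vectors used here, a low score indicates a likely match between the prototype code and the code under test.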

VI. Multibiometric Fusion Approach

The fusion is carried out at the score level of the classification of each unimodal biometric system. This operation was performed using a classification approach combined with a combination approach; Fig. 4 shows the general flow diagram of this fusion method. The N input scores consist of N iris scores and N fingerprint scores, where N is the total number of persons registered in the database used.
The fusion method consists of two steps, a fusion step and a decision step. We used a decision tree, developed after preliminary experiments carried out on the database used, as shown in Fig. 5. The main role of this tree is to classify the 2-D N-score vector into one of two classes, either Identical (if the identity of the person is verified) or Different (if it is not). The decision is then made according to the result of this classification. If there is no Identical class, the person is not identified (Different). If there is a single Identical class, the person is identified (Identical person i). If there is more than one Identical class, the combination of scores by the sum is used, only over the iris and fingerprint scores corresponding to the found Identical classes. The purpose of this sum combination is to find the smallest combined score (iris score + fingerprint score) among the combined scores of the Identical classes; this score represents the comparison score for the identified person (Identical person i).
Note that the scores generated by the two unimodal identification systems do not require a prior normalization step [7] before the sum combination, because these scores are homogeneous (dissimilarity distances) and their values lie within the same interval [0, 1].
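The decision step described above can be sketched as follows (illustrative Python; `is_identical` stands in for the decision-tree classifier of Fig. 5, whose exact structure is not reproduced here):

```python
def fuse_and_decide(iris_scores, finger_scores, is_identical):
    """Score-level fusion over N enrolled classes.

    iris_scores / finger_scores: one dissimilarity score per enrolled class.
    is_identical(iris_score, finger_score) -> bool: the per-class decision
    tree output. Returns the identified class index, or None (Different).
    """
    identical = [k for k, (i, f) in enumerate(zip(iris_scores, finger_scores))
                 if is_identical(i, f)]
    if not identical:
        return None                      # no Identical class: unidentified
    if len(identical) == 1:
        return identical[0]              # single Identical class
    # Several Identical classes: combine by the sum, keep the smallest
    return min(identical, key=lambda k: iris_scores[k] + finger_scores[k])
```

With a simplified stand-in classifier such as `lambda i, f: i <= 0.31 and f <= 0.30`, two competing Identical classes are resolved in favor of the one with the smaller summed score.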


Fig. 4. Flow diagram of the proposed fusion method

Fig. 5. The obtained decision tree (ISi: iris score, FSi: fingerprint score)




The classification method by decision tree is configured on the two databases used (iris and fingerprint) using a specific setting of the decision threshold. By running tests on the learning databases, we cut the space of scores into three zones: zone 1 is a certainty zone (Identical class), zone 2 is an uncertainty zone (Undefined class), and zone 3 is a certainty zone (Different class). The cutting of the score space is based on a double-thresholding principle, as represented in Fig. 6. The main advantage of this type of thresholding is that it allows the decision tree to resolve the Undefined class of zone 2 (the uncertainty zone) using both scores, the iris score IS and the fingerprint score FS, since the decision in this zone is not sufficiently reliable for unimodal biometric systems.
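The three-zone cut can be sketched as follows (illustrative Python; the exact boundary combinations are an assumption read off Fig. 6, using the double thresholds [0.31, 0.34] for the iris and [0.30, 0.41] for the fingerprint):

```python
IS_LOW, IS_HIGH = 0.31, 0.34   # iris score double thresholds
FS_LOW, FS_HIGH = 0.30, 0.41   # fingerprint score double thresholds

def zone(iris_score, finger_score):
    """Map an (iris, fingerprint) dissimilarity score pair to a zone."""
    if iris_score <= IS_LOW and finger_score <= FS_LOW:
        return "Identical"     # zone 1: certainty zone
    if iris_score > IS_HIGH or finger_score > FS_HIGH:
        return "Different"     # zone 3: certainty zone
    return "Undefined"         # zone 2: uncertainty zone, resolved jointly
```

Pairs falling in the Undefined band between the two thresholds are the ones the decision tree resolves using both modalities.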


[LISTING 1] Code of the parallelism:

SHELLEXECUTEINFO IrisInfo;
ZeroMemory(&IrisInfo, sizeof(IrisInfo));
IrisInfo.cbSize = sizeof(IrisInfo);
IrisInfo.hwnd = NULL;
IrisInfo.fMask = SEE_MASK_NOCLOSEPROCESS;   // keep the process handle for waiting
IrisInfo.lpVerb = NULL;
IrisInfo.lpFile = "ExecutableIris.exe";
IrisInfo.lpParameters = NULL;
IrisInfo.nShow = SW_SHOWNORMAL;
bool IrisReturn = ShellExecuteEx(&IrisInfo);
// WinExec returns a UINT: values greater than 31 indicate success
bool FingeReturn = WinExec("ExecutableFingerprint", SW_SHOWMINIMIZED) > 31;
if (IrisReturn && FingeReturn)
    WaitForSingleObject(IrisInfo.hProcess, INFINITE);

3) The graphical interface program wakes up as soon as both executable programs terminate. It reads the two score files and combines these scores to make the final decision. Note that the processing time for the iris (ExecutableIris.exe) is higher than that for the fingerprint (ExecutableFingerprint.exe); this is why, in the previous code, after launching the executables, the call to WaitForSingleObject() blocks the graphical interface program until the iris executable has completed.

VIII. Results and Comments

Fig. 6. Principle of the double thresholding

VII. Parallel Processing

It is obvious that if the processing is performed serially, the overall response time of the multibiometric system is degraded. To speed up the execution, we propose a parallel processing method that runs the two unimodal biometric systems simultaneously, exploiting the inherent parallelism of the hardware platform used [35], [36]. The system was implemented under the C++ Builder development environment. We created two executable programs, corresponding to the two unimodal identification systems, and a graphical user interface program that manages the processing task covering the fusion step of the unimodal systems. The program code used for this purpose is given in Listing 1. This code performs the following main steps: 1) The graphical user interface program starts running, launches the two executable programs, and enters a standby state. 2) These tasks are processed in parallel on a multicore processor; each executable ends by generating a corresponding scores file.
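A portable analog of this launch-both-then-wait scheme can be sketched as follows (illustrative Python; the two matcher callables stand in for the iris and fingerprint executables of Listing 1):

```python
from concurrent.futures import ThreadPoolExecutor

def run_matchers_in_parallel(iris_matcher, finger_matcher, probe):
    """Launch the two unimodal matchers concurrently and wait for both,
    mirroring the ShellExecuteEx/WinExec + WaitForSingleObject pattern.

    Each matcher takes the probe sample and returns its list of scores;
    total time is roughly the maximum of the two, not their sum.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        iris_future = pool.submit(iris_matcher, probe)      # longer task
        finger_future = pool.submit(finger_matcher, probe)  # shorter task
        # result() blocks until each task completes
        return iris_future.result(), finger_future.result()
```

As in Listing 1, the caller resumes only after the slower of the two matchers (the iris, in the paper's measurements) has finished.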


Several tests were undertaken using the two modalities separately and then applying the proposed fusion method. During the experiments we used two standard databases: (1) the iris database CASIA-IrisV4 [29], which contains 6 subsets of different types; we used 220 individuals with 5 samples each, giving 1100 iris images; (2) the fingerprint database CASIA-FingerprintV5 [30], which contains 20,000 fingerprint images (left and right thumb/index/middle/ring finger) from 500 individuals, each contributing 5 samples per finger; similarly, we used 220 individuals (left index finger), giving 1100 fingerprint images.
We then created a multimodal database containing 220 virtual individuals, each represented by iris and fingerprint signatures, by associating with each individual 5 images of each modality. We divided this database into two parts: the first contains 180 individuals registered in the database, and the second contains 40 individuals not registered in the database, i.e. 40 × 5 = 200 attacks by non-enrollees. Then, for each modality of each registered individual, we enrolled three extracted feature vectors (from 3 samples) in the database and left 2 samples for testing (180 × 2 = 360 accesses by enrollees). The objective is to fuse the two unimodal systems on this multimodal database and to compare the




performance of the multimodal system with those of the unimodal systems. To evaluate the performance of the proposed system, we used three metrics: the FAR (False Accept Rate) is the probability that an impostor is accepted as a genuine individual; the FRR (False Reject Rate) is the probability that a genuine individual is rejected as an impostor; and the EFAR (Enrollee False Accept Rate) is the probability that an enrollee is accepted as another enrollee [31], [32]. Furthermore, we calculated the recognition rate TR. In order to select the decision thresholds adequately, we carried out preliminary tests that allowed us to plot the three selected metrics empirically against the decision threshold. The three error rates are plotted against the decision threshold for the iris and fingerprint unimodal systems in Figs. 7 and 8, respectively. In Figs. 7 and 8 we can clearly see that when the decision threshold is low, the EFAR and FAR are both low and the FRR is relatively high for both unimodal systems; when the decision threshold rises above 0.5, the EFAR and FAR are relatively high and the FRR is relatively low.
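The three error rates can be estimated from labeled comparison scores as follows (illustrative Python; scores are dissimilarity distances in [0, 1], and a score at or below the decision threshold counts as an accept):

```python
def error_rates(impostor, enrollee_cross, genuine, threshold):
    """Estimate FAR, FRR and EFAR at a given decision threshold.

    impostor:       scores from non-enrollee attack attempts (FAR)
    enrollee_cross: scores of enrollees compared against *other* enrollees (EFAR)
    genuine:        scores of enrollees compared against their own templates (FRR)
    """
    far = sum(s <= threshold for s in impostor) / len(impostor)
    efar = sum(s <= threshold for s in enrollee_cross) / len(enrollee_cross)
    frr = sum(s > threshold for s in genuine) / len(genuine)
    return far, frr, efar
```

Sweeping `threshold` over [0, 1] and plotting the three rates reproduces the shape of curves such as those in Figs. 7 and 8: accept rates rise and the reject rate falls as the threshold grows.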

Fig. 7. Curve of changes in error rates according to the decision threshold for the unimodal iris system

TABLE I. REPRESENTATION OF THE PERFORMANCE METRICS

Model                          EFAR      FRR       FAR       TR
Unimodal System Iris           4.44 %    10.83 %   7.50 %    84.73 %
Unimodal System Fingerprint    7.78 %    16.67 %   13.5 %    75.55 %
Multimodal System              1.39 %    4.44 %    1.50 %    94.17 %

As illustrated in Table I, the final fusion system significantly improved the three error rates, yielding values below 1.5% for the EFAR and the FAR and below 4.5% for the FRR, and it also raised the recognition rate to 94.17%, compared with the best unimodal result of 84.73% given by the iris. These results confirm that the proposed fusion method provides a multibiometric system that outperforms both unimodal systems. All tests were performed on a personal computer with a multi-core processor running at 2.3 GHz and 2 GB of RAM. We measured the response time of each unimodal system taken alone, and the response time of the combined multibiometric system, for the identification of a person among 180 enrollees, i.e. a total of 1080 comparisons. The corresponding response times are reported in Table II.

Once the threshold values were selected for both unimodal systems and for the decision tree with the double-thresholding principle, corresponding respectively to the intervals [0.31, 0.34] for the iris and [0.30, 0.41] for the fingerprint, we computed the three performance metrics, as well as the recognition rate, over the database used, for both unimodal systems and for the multimodal system based on the proposed fusion method. Table I summarizes the obtained results.

TABLE II. PROCESSING TIMES

Model                          Time in seconds
Unimodal System Fingerprint    2.511 s
Unimodal System Iris           9.781 s
Scores Fusion                  0.001 s
Multibiometric System          9.782 s

Table II clearly shows that the response time of the overall multibiometric system is almost equal to that of the iris unimodal system, and is below 10 seconds. Consequently, the proposed parallel processing method enhances the overall response time of the multimodal biometric system compared with running the two unimodal systems serially.

Fig. 8. Curve of changes in error rates according to the decision threshold for the unimodal fingerprint system





IX. Conclusion

In this work we combined two biometric modalities, the fingerprint and the iris, to provide a new multibiometric identification system based on a statistical classification method. The results obtained from various tests indicate an enhancement of the performance metrics. We also implemented a parallel technique that helped improve the identification response time by exploiting the inherent parallel architecture of the hardware platform used during the experiments, thereby offering a good compromise between recognition rate and identification response time. It is worth noting that the successive comparisons based on the Hamming distance over a large database are time consuming, presenting the main bottleneck of such an identification system. Our next work continues in this direction, aiming to reduce the identification response time by applying different classifiers [33], such as neural networks, and selecting the one that guarantees the best compromise among the performance metrics.

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

References [1]

[2]

[3]

[4] [5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

A. K. Jain, A. Ross, S. Prabhakar, An introduction to biometric recognition, IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, n. 1, pp. 4-20, 2004. A. K. Jain, A. Ross, Multibiometric systems, Communications of the ACM, special issue on multimodal interfaces, Vol. 47, n. 1, pp. 34-40, 2004. L. I. Kuncheva, C. J. Whitaker, C. A. Shipp, R. P. W. Duin, Is independence good for combining classifiers?, Proceedings of International Conference on Pattern Recognition (ICPR), vol. 2, pp. 168-171, Barcelona, Spain, 2000. A. Ross, A. Jain, Information fusion in biometrics, Pattern Recognition Letters, Vol. 24, n. 13, pp. 2115-2125, 2003. A. Ross, A. K. Jain, Multimodal biometrics : An overview, Proceedings of 12th European Signal Processing Conference (EUSIPCO), pp. 1221-1224, Vienna, Austria, 2004. M.J. Sudhamani, M.K. Venkatesha, K.R. Radhika, Revisiting Feature level and Score level Fusion Techniques in Multimodal Biometrics System, Proceedings of International Conference on Multimedia Computing and Systems (ICMCS), pp. 881-885, 2012. A. Jain, K. Nandakumar, A. Ross, Score normalization in multimodal biometric systems, Pattern Recognition, Vol. 38, n. 12, pp. 2270-2285, 2005. J. Kittler, M. Hatef, R. Duin, J. Matas, On Combining Classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, n. 3, pp. 226-239, 1998. Y. Wang, T. Tan, A. Jain, Combining face and iris biometrics for identity verification, Proceedings of Fourth International Conference on Audio- and Video-Based Authentication (AVBPA), pp. 805-813, Guildford, U.K., 2003. A. Jain, A. Ross, Learning User-specific Parameters in a Multibiometric System, Proceedings of International Conference on Image Processing (ICIP), pp. 57-60, New York, USA, 2002. C. Sanderson, K. Paliwal, Information fusion and person verification using speech and face information, Tech. Rep. IDIAP-RR 02-33, IDAIP, 2002. Y. Tong, F.W. Wheeler, X. 
Liu, Improving Biometric Identification Through Quality based Face and Fingerprint Biometric Fusion, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 53-60, 2010. M. Vatsa, R. Singh, A. Noore, Integrating image quality in 2v-

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved


International Review on Computers and Software, Vol. 8, N. 10


Kamel Aizi, Mohamed Ouslim


Authors’ information

LMSE Lab., Electronics Dept., University USTO-MB, Oran, Algeria.

Kamel Aizi received a Licence degree in Industrial Informatics in 2009 and a Master's degree, also in Industrial Informatics, in 2011 from University USTO-MB, Oran (Algeria). He is now a Ph.D. student in Intelligent Systems at the University USTO. His current main research interests include multimodal biometric systems, parallel processing and multitasking systems.
E-mail: [email protected]

Mohamed Ouslim received an Engineer's degree in Electronics from USTO (Algeria) in 1985, an M.Sc. in Computer Engineering from Ohio State University (USA) in 1989, and a Ph.D. in Electrical and Electronic Engineering from the University of Nottingham (UK) in 1997. He has been an Associate Professor in the Electronics Dept. of USTO since 2006 and is a member of the Microsystems and Embedded Systems Lab. His current research interests cover image processing, wireless sensor networks, embedded systems and parallel processing.
E-mail: [email protected]




International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

Computed Tomography Images Restoration Using Anisotropic Diffusion Regularization

Faouzi Benzarti, Hamid Amiri

Abstract – The CT scan imaging system is one of the most interesting non-invasive radiological methods, allowing the generation of tomographic images of all parts of the human body. However, CT images are corrupted by noise and blur due to the imperfections and physical limitations of the imaging systems. Increasing the spatial resolution of these images leads to a better interpretation by the clinician. In this paper, we propose a new approach to improve the quality of CT images. Our method is based on anisotropic diffusion regularization, which incorporates an adaptive smoothness constraint in the deconvolution process: smoothing is encouraged in homogeneous regions and discouraged across boundaries, in order to preserve significant image details. The blur component is estimated by an iterative blind deconvolution approach and incorporated in the restoration process. Experimental results show good performance and are very promising for future research. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: CT Images, Image Restoration, Regularization, Blind Deconvolution, PDE, Anisotropic Diffusion

For this reason, there has been considerable effort aimed at developing methods and techniques to improve image quality so as to provide consistent aided diagnostics. Image restoration can produce high-quality images without modifying the imaging systems [19], [20], [24]. It refers to the task of recovering a good estimate of the true image from a degraded observation. In many imaging applications, the image degradation process can be adequately modeled by a two-dimensional convolution of the true image f(x, y) with a linear shift-invariant blur known as the Point Spread Function (PSF) h(x, y), plus a random additive noise n(x, y). That is:

Nomenclature

g(x,y)   Degraded image
f(x,y)   Original image
h(x,y)   Blur kernel
|∇f|     Gradient modulus of f(x,y)
Φ        Edge-preserving function
Ω        Image domain
∂Ω       Image boundary
n        Normal vector
MSE      Mean square error
σ²       Variance of the additive noise
Div      Divergence operator

I. Introduction

g(x, y) = (h ∗ f)(x, y) + n(x, y)    (1)

Medical imaging is an incontestably vital tool for diagnosis; it provides, in a non-invasive manner, a view of the internal structure of the body to detect eventual diseases or abnormal tissues. The Computed Tomography (CT) technique involves passing a series of X-rays through an object and measuring the change in intensity, or attenuation, of these X-rays by placing a series of detectors on the side of the object opposite the X-ray source [1]-[31]. The measurements of X-ray attenuation are called projections and are collected at a variety of angles to produce a 2D cross-sectional image [3]. Unfortunately, image quality is degraded by blur and noise due to the imperfections and physical limitations of the imaging systems.

Manuscript received and revised September 2013, accepted October 2013



where * denotes the convolution operator, g the degraded CT image, f the original image, h the blurring kernel known as the point spread function (PSF), and n a Gaussian white noise with zero mean. Most restoration techniques model the PSF h(x, y) and attempt to apply an inverse procedure to obtain an approximation of the true image. However, in many practical situations, and especially in medical imaging, the PSF is unknown and must be estimated from the blurred image itself. The process of estimating both the true image and the blur from the degraded image is called blind deconvolution. The problem of blind deconvolution has been extensively studied for its practical importance as well as


Faouzi Benzarti, Hamid Amiri

its theoretical interest, and several methods and algorithms have been proposed [23], [21], [26], [27]. However, the majority of these methods lead to a suboptimal solution and are not effective at preserving edges and discontinuities. This disadvantage is primarily due to the ill-posedness of the restoration problem, which requires an adequate and robust regularization [1], [2]. We recall that an ill-posed problem is one that does not satisfy the three conditions in the Hadamard sense, namely existence, uniqueness and stability. The stability condition causes the main problems, because if it is not fulfilled, small errors in the data may lead to large errors in the solution. Solving an inverse problem is difficult because many different images can appear nearly the same after they are distorted. To adapt effectively to the local structures of the image and to preserve discontinuities, the use of a non-linear regularization is recommended. Anisotropic regularization, based on the Partial Differential Equation (PDE) approach, is one of the most promising options [25], [16], [4], [5], [9]. The use of PDEs in image processing is fairly recent and has proven powerful and successful for a variety of problems, including image segmentation, mathematical morphology and image denoising. In our approach, we first estimate the blur, or PSF, of the imaging system by using an iterative blind deconvolution, and then we incorporate it into the anisotropic PDE scheme to estimate the original image. The paper is organized as follows: in section 2, we present the concept of anisotropic regularization; in section 3, we describe the proposed method; in section 4, we report numerical and experimental results.
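As a minimal illustration of the degradation model of Eq. (1), the sketch below blurs a toy image with a normalized Gaussian PSF and adds white Gaussian noise; the kernel size, the sigma values and the FFT-based circular convolution are illustrative assumptions, not details taken from this paper:

```python
import numpy as np

def gaussian_psf(size=7, sigma=1.5):
    """Normalized 2-D Gaussian blur kernel h(x, y)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    h = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return h / h.sum()

def degrade(f, h, noise_sigma=5.0, seed=0):
    """g = h * f + n : circular 2-D convolution (via FFT) plus additive Gaussian noise."""
    H = np.fft.fft2(h, s=f.shape)                        # PSF zero-padded to image size
    blurred = np.real(np.fft.ifft2(np.fft.fft2(f) * H))
    rng = np.random.default_rng(seed)
    return blurred + rng.normal(0.0, noise_sigma, f.shape)

# Toy "image": a bright square on a dark background
f = np.zeros((64, 64))
f[24:40, 24:40] = 255.0
g = degrade(f, gaussian_psf())
```

Since the kernel is normalized, blurring preserves the mean intensity; only the noise perturbs it.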

II. Anisotropic Regularization

As mentioned above, the regularization process has been introduced in order to overcome the instability of the inverse problem in the presence of noise. In the image restoration context, anisotropic regularization can be formulated as an optimization problem whose cost function to be minimized has the following form [4]:

Jλ(f, h) = ∫Ω (g − h ∗ f)² dΩ + λ ∫Ω Φ(|∇f|) dΩ    (2)

where Ω is an open bounded set in ℝ². The first term of the integral is the square error between the data and the candidate image, which ensures fidelity to the data. The second term uses a monotonically increasing function Φ(·) to enforce the smoothness of the deconvolved image. The regularization parameter λ balances the influence of the two terms. In the particular case λ = 0, the energy reduces to the data-attachment term, and the problem corresponds to the least-squares method, which leads to an unstable solution.

The solution of the variational problem of Eq. (2) must satisfy the Euler-Lagrange equation [13], [2], which gives a necessary condition to be verified by f to reach the minimum of Jλ, that is:

∂Jλ/∂f = ĥ ∗ (h ∗ f − g) − λ div( (Φ′(|∇f|)/|∇f|) ∇f ) = 0    (3)

where ĥ denotes the mirror kernel ĥ(x, y) := h(−x, −y), and div stands for the divergence operator. The Euler-Lagrange Eq. (3) can be solved by considering the steady state of the temporal equation:

∂f/∂t = ĥ ∗ (g − h ∗ f) + λ div( (Φ′(|∇f|)/|∇f|) ∇f )
∂f/∂n = 0 on ∂Ω,  f(t = 0) = f0    (4)

where n represents a normal vector to the boundary ∂Ω of Ω. We note that the nonlinear PDE (4) is closely related to the anisotropic diffusion of Perona and Malik [17], with an additional term that forces the solution to remain close to the data. This equation can be solved using a gradient descent method, substituting ∂f/∂t with the discrete difference (f^(n+1) − f^n)/∆t in (4). This leads to the following iterative equation [23]:

f^(n+1)(x, y) = f^n(x, y) + ∆t [ ĥ(−x, −y) ∗ (g(x, y) − f^n(x, y) ∗ h(x, y)) + λ div( (Φ′(|∇f^n|)/|∇f^n|) ∇f^n ) ]    (5)

The choice of the diffusivity function Φ(·) is very important to ensure a satisfactory solution. The case where Φ has the pure quadratic form Φ(|∇f|) = |∇f|², which corresponds to a quadratic or Tikhonov regularization [1], leads to an over-smoothed solution, because high gradients at the edges of the reconstructed image are penalized over-proportionally. Instead, Φ(·), as an edge-preserving function, must verify the following conditions, posing t = |∇f| [10], [16]:

i) lim(t→∞) Φ′(t)/(2t) = 0: preservation of discontinuities for high gradients;
ii) lim(t→0) Φ′(t)/(2t) = m, 0 < m < +∞: isotropic smoothing of homogeneous areas for low gradients;
iii) Φ′(t)/(2t) strictly decreasing: to avoid instabilities.
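One explicit update of the iterative scheme (5) can be sketched numerically as follows. The finite-difference discretization, the FFT circular convolution and the default values of k, λ and ∆t are illustrative assumptions; the Perona-Malik diffusivity c = exp(−|∇f|²/k²) stands in for Φ′(|∇f|)/|∇f| up to a constant factor:

```python
import numpy as np

def grad(u):
    """Forward differences with zero flux at the border (Neumann condition)."""
    ux = np.roll(u, -1, axis=1) - u
    ux[:, -1] = 0.0
    uy = np.roll(u, -1, axis=0) - u
    uy[-1, :] = 0.0
    return ux, uy

def div(px, py):
    """Backward-difference divergence, the adjoint construction of grad."""
    dx = px - np.roll(px, 1, axis=1)
    dx[:, 0] = px[:, 0]
    dy = py - np.roll(py, 1, axis=0)
    dy[0, :] = py[0, :]
    return dx + dy

def pde_step(f, g, h, k=0.22, lam=0.51, dt=0.2):
    """One explicit gradient-descent update of Eq. (5)."""
    H = np.fft.fft2(h, s=f.shape)
    residual = g - np.real(np.fft.ifft2(np.fft.fft2(f) * H))          # g - h*f
    data = np.real(np.fft.ifft2(np.fft.fft2(residual) * np.conj(H)))  # h(-x,-y) * (g - h*f)
    fx, fy = grad(f)
    c = np.exp(-(fx ** 2 + fy ** 2) / k ** 2)                         # Perona-Malik diffusivity
    return f + dt * (data + lam * div(c * fx, c * fy))
```

Iterating pde_step until the update norm stalls approximates the steady state of Eq. (4).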

Among the functions satisfying conditions i)-iii), we choose the exponential function of Perona [17]:

Φ(|∇f|) = (k²/2) (1 − exp(−|∇f|²/k²))    (6)

This function introduces a parameter k which acts as a threshold determining whether edges are preserved or not. Areas in which the gradient magnitude is lower than k will be blurred more strongly than areas with a higher gradient magnitude. This tends to smooth uniform regions while preserving the edges between different regions. However, the parameter k can lead to a backward diffusion when its value is smaller than |∇f|, and thus to altered edges. To avoid this unstable solution, we use a regularized (or smoothed) version of the image gradient; the gradient of f is then replaced by:

∇fR = ∇(Gδ ∗ f)    (7)

where Gδ is a Gaussian with standard deviation δ.

III. Proposed Method

The proposed method is summarized in Fig. 1. From the degraded image g(x, y) as input, we first estimate the blur model h(x, y) by using an iterative blind image deconvolution [12]. Once the PSF is estimated, it is incorporated in the anisotropic PDE Eq. (5) to restore the original image.

Fig. 1. Flowchart of the Proposed Method

In the following sub-sections, we give some details about the estimation of the blur and of the model parameters.

III.1. Blur Estimation

As mentioned above, the blur h(x, y) can be estimated by an iterative blind deconvolution based on the Richardson-Lucy algorithm [12], [15]. This technique is widely used for restoring astronomical images. It follows an iterative procedure, alternating between the estimate of the image f(x, y) and the estimate of the blur h(x, y). After an initial estimate of f and h, we have:

f̂^(n+1)(x, y) = f̂^n(x, y) [ ĥ^n(−x, −y) ∗ ( g(x, y) / (ĥ^n(x, y) ∗ f̂^n(x, y)) ) ]    (8a)

ĥ^(n+1)(x, y) = ĥ^n(x, y) [ f̂^n(−x, −y) ∗ ( g(x, y) / (ĥ^n(x, y) ∗ f̂^n(x, y)) ) ]    (8b)

The stopping criterion is determined by the relative norm error (RNE). Obviously, this method can also estimate the original image by itself, but the problem lies in the Gibbs oscillations near discontinuities in the restored image.

III.2. Parameters Estimation

The performance of the algorithm is largely bounded by the two parameters k and λ. These parameters depend on each other and may lead to different results if chosen inappropriately. The parameter k can be chosen by considering a robust statistical measure [18]:

k = 1.482 · MAD(∇f)    (9)

where MAD denotes the median absolute deviation, expressed by:

MAD = median( | ∇f − median(|∇f|) | )    (10)

The parameter λ can be estimated by considering the blur signal-to-noise ratio (BSNR) [15]:

λ = 1/BSNR,  BSNR = 10 log10( Var(y)/σ² )    (11)

where Var(y) is the variance of the blurred signal and σ² the variance of the additive noise. The variance of the noise can be estimated approximately using the Discrete Wavelet Transform (DWT) [11], which is an important tool for separating noise from the image signal.



The signal to be analyzed is passed through filters with different cutoff frequencies at different scales. The image is divided into four sub-bands, as shown in Fig. 2.

Fig. 2. Illustration of a two-level DWT decomposition (sub-bands LL2, HL2, LH2, HH2 at the second scale; HL1, LH1, HH1 at the first scale)

TABLE I
COMPARISON RESULTS

Method              PSNR (dB)
Quadratic method    18.01
IBD method          19.83
BA method           13.21
Proposed method     24.83

The output decomposition gives the detail coefficients labeled LH1, HL1 and HH1 (from the high-pass filter) and the approximation coefficients labeled LL1 (from the low-pass filter), which correspond to the coarse-level coefficients. The noise variance σ² can be estimated from the sub-band HH1 at the first scale, using the following equation [11]:

σ² = ( median(|Yij|) / 0.6745 )²    (12)

where Yij are the wavelet coefficients in the sub-band HH1.
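Eqs. (11) and (12) can be sketched as follows; the one-level orthonormal Haar decomposition used here to obtain HH1 is an illustrative stand-in for whatever wavelet the authors actually used:

```python
import numpy as np

def haar_hh1(img):
    """Diagonal detail coefficients (HH1) of a one-level orthonormal Haar DWT."""
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    return (a - b - c + d) / 2.0

def estimate_noise_sigma(img):
    """Robust estimator of Eq. (12): sigma = median(|Y_ij|) / 0.6745 on HH1."""
    return np.median(np.abs(haar_hh1(img))) / 0.6745

def estimate_lambda(g):
    """Eq. (11): lambda = 1 / BSNR with BSNR = 10 log10(Var(g) / sigma^2)."""
    sigma2 = estimate_noise_sigma(g) ** 2
    bsnr = 10.0 * np.log10(np.var(g) / sigma2)
    return 1.0 / bsnr
```

The median makes the estimator insensitive to the few large HH1 coefficients produced by image edges, so it captures the noise level rather than the structure.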

IV. Results and Discussion

Figs. 3: (a) original test image, (b) blurred and noisy image (SNR = 14.48 dB), (c) BA method, (d) quadratic method, (e) IBD method, (f) proposed method

The performance of the algorithm was first tested on an artificially generated blurred image of 128×128 pixels, Fig. 3(b), obtained by convolving with a 7×7 Gaussian blur and adding zero-mean Gaussian noise (SNR = 14.48 dB). The regularization parameter values used in this procedure are estimated as K = 0.22, λ = 0.51. The restored image in Fig. 3(f) shows a significant improvement: edges and discontinuities have been recovered and preserved, with a good suppression of noise. Compared to some blind deconvolution methods, namely the quadratic method [6], the Biggs-Andrews (BA) method [7] and the Iterative Blind Deconvolution (IBD) method [21], the proposed method has the best PSNR value, as shown in Table I, defined by:

PSNR = 10 log10( (Nmax)² / MSE )    (13)

where MSE is the mean square error between the original image and the restored one, and Nmax the maximum pixel value in the image.

In Fig. 4, we report a comparison of the PSNR versus SNR for the four methods. It is incontestable that the proposed method has the highest PSNR.

Fig. 4. Comparison results
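Eq. (13) amounts to the following few lines (a direct transcription, with a guard for identical images added as a convenience):

```python
import numpy as np

def psnr(original, restored):
    """PSNR = 10 log10(Nmax^2 / MSE), Eq. (13)."""
    diff = original.astype(float) - restored.astype(float)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")            # identical images
    nmax = float(np.max(original))
    return 10.0 * np.log10(nmax ** 2 / mse)
```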





The second test was performed on a real abdominal CT scan image from a Web database [28], Fig. 5. The regularization parameters are estimated as K = 0.092, λ = 0.48. The results are very promising and show good image restoration in terms of noise reduction, deconvolution and preservation of image details. Some fine structures have also been enhanced.

Fig. 6 shows the estimated blur component h(x, y) of the CT image, obtained using the iterative blind deconvolution scheme of Eq. (8). In a context of linearity, we have assumed that the PSF is space-invariant, which is not always true and could change the results significantly. However, this work opens up a wide range of opportunities for image quality improvement.
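The alternating Richardson-Lucy updates of Eqs. (8a)-(8b) that produce such a blur estimate can be sketched as follows; the flat initial guesses, the PSF support size, the iteration count and the FFT circular convolutions are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def conv2(u, v, shape):
    """Circular 2-D convolution via FFT, with v zero-padded to `shape`."""
    return np.real(np.fft.ifft2(np.fft.fft2(u, s=shape) * np.fft.fft2(v, s=shape)))

def corr2(u, v, shape):
    """Convolution with the mirror kernel v(-x, -y), i.e. correlation."""
    return np.real(np.fft.ifft2(np.fft.fft2(u, s=shape) * np.conj(np.fft.fft2(v, s=shape))))

def blind_rl(g, psf_size=7, n_iter=20, eps=1e-8):
    """Alternate the multiplicative updates (8a) for f and (8b) for h."""
    f = np.full_like(g, float(g.mean()))                    # flat image estimate
    h = np.full((psf_size, psf_size), 1.0 / psf_size ** 2)  # uniform PSF estimate
    for _ in range(n_iter):
        ratio = g / (conv2(f, h, g.shape) + eps)
        f = f * corr2(ratio, h, g.shape)                    # Eq. (8a)
        ratio = g / (conv2(f, h, g.shape) + eps)
        upd = corr2(ratio, f, g.shape)[:psf_size, :psf_size]
        h = np.clip(h * upd, 0.0, None)                     # Eq. (8b), kept on its support
        h /= h.sum()
    return f, h
```

Renormalizing h to unit sum at each iteration keeps the decomposition between image intensity and blur energy well defined.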

V. Conclusion

In this paper, we have proposed a new approach to blind image restoration using anisotropic regularization via Partial Differential Equations (PDE). The key idea behind this anisotropic approach is to incorporate an adaptive smoothness constraint: diffusion is encouraged in homogeneous regions and discouraged across boundaries. This approach, which associates deconvolution and regularization, contributes to eliminating blur, reducing noise and preserving the natural edges of the image. The clinical application of this approach is very promising and offers great processing flexibility, insofar as it can help radiologists in their diagnostic task. Future work will include automatic parameter estimation, deconvolution with a spatially variant PSF and 3D medical image deconvolution.

Fig. 5. Above: original abdominal CT scan image; below: restored image

References

[1] A. Tikhonov, V. Arsenin, Solutions of Ill-Posed Problems, Washington, DC: Winston and Sons, 1977.
[2] A. C. Likas, N. P. Galatsanos, A Variational Approach for Bayesian Blind Image Deconvolution, IEEE Transactions on Signal Processing, Vol. 52, n. 8, 2004.
[3] A. C. Kak, M. Slaney, Principles of Computerized Tomographic Imaging, IEEE Press, 1988.
[4] D. Tschumperlé, R. Dériche, Diffusion PDE's on Vector-Valued Images, Local Approach and Geometric Viewpoint, IEEE Signal Processing Magazine, Special Issue on Mathematical Methods in Imaging, Vol. 19, n. 5, pp. 15-25, 2002.
[5] D. Tschumperlé, R. Dériche, Diffusion PDE's on Vector-Valued Images, IEEE Signal Processing Magazine, Vol. 19, n. 5, pp. 15-25, 2002.
[6] R. Molina, J. Mateos, A. K. Katsaggelos, Blind Deconvolution Using a Variational Approach to Parameter, Image, and Blur Estimation, IEEE Transactions on Image Processing, pp. 3715-3727, 2006.
[7] D. S. C. Biggs, M. Andrews, Asymmetric Iterative Blind Deconvolution of Multiframe Images, Proceedings SPIE, Vol. 3461, 1998.
[8] D. P. K. Lun, T. C. L. Chan, T. C. Hsung, D. D. Feng, Y. H. Chan, Efficient Blind Image Restoration Using Discrete Periodic Radon Transform, IEEE Transactions on Image Processing, Vol. 13, n. 2, February 2004.
[9] H. C. Li, P. Z. Fan, M. K. Khan, Context-adaptive anisotropic diffusion for image denoising, Electronics Letters, Vol. 48, pp. 827-829, 2012.
[10] F. Benzarti, K. Hamrouni, H. Amiri, Mammographic Image Restoration using Anisotropic Regularization, IEEE International Conference on Machine Intelligence, ACIDCA-ICMI 2005, Tozeur, Tunisia, November 2005.
[11] S. Mallat, A Wavelet Tour of Signal Processing, 2nd ed., San Diego, CA: Academic, 1999.

Fig. 6. Mesh representation of the estimated blur component





[12] L. B. Lucy, An Iterative Technique for the Rectification of Observed Images, Astronomical Journal, Vol. 79, n. 6, pp. 28-37, 1974.
[13] Z. Huang, J. Zhang, C. Zhang, Ultrasound image reconstruction by two-dimensional blind total variation deconvolution, IEEE International Conference on Control and Automation (ICCA 2009), pp. 1801-1806, 2009.
[14] M. Mignotte, J. Meunier, S.-P. Soucy, C. Janicki, Comparison of Deconvolution Techniques using a Distribution Mixture Parameter Estimation: Application in Single Photon Emission Computed Tomography Imagery, Journal of Electronic Imaging, Vol. 11, n. 1, January 2002.
[15] M. R. Banham, A. K. Katsaggelos, Digital Image Restoration, IEEE Signal Processing Magazine, Vol. 14, n. 2, pp. 24-41, March 1997.
[16] M. Wang, S. Deng, Image Restoration Model of PDE Variation, Second International Conference on Information and Computing Science (ICIC '09), Vol. 2, pp. 184-187, 2009.
[17] P. Perona, J. Malik, Scale-space and edge detection using anisotropic diffusion, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, pp. 629-639, 1990.
[18] P. J. Rousseeuw, A. M. Leroy, Robust Regression and Outlier Detection, New York: Wiley, 1987.
[19] R. C. Gonzalez, Digital Image Processing, Second Edition, Prentice Hall, 2002.
[20] R. Molina, J. Nunez, F. J. Cortijo, J. Mateos, Image Restoration in Astronomy, IEEE Signal Processing Magazine, pp. 11-29, March 2001.
[21] G. R. Ayers, J. C. Dainty, Iterative Blind Deconvolution, Optics Letters, Vol. 13, n. 7, pp. 547-549, 1988.
[22] Y. L. You, M. Kaveh, Blind Image Restoration by Anisotropic Regularization, IEEE Transactions on Image Processing, Vol. 8, n. 3, pp. 396-407, March 1999.
[22] M. F. Fahmy, G. M. Abdel Raheem, U. S. Mohammed, O. F. Fahmy, A Fast Iterative Blind Image Restoration Algorithm, 28th National Radio Science Conference (NRSC), pp. 1-8, 2011.
[23] F. Benzarti, H. Amiri, Blind Photographic Images Restoration with Discontinuities Preservation, International Journal of Computer Information Systems and Industrial Management Applications (IJCISIM), Vol. 4, pp. 609-618, 2012.
[24] F. Benzarti, M. Askri, K. Hamrouni, CT Images Restoration, International Conference on E-Medical Systems (E-Medisys), Hammamet, Tunisia, October 2008.
[25] S. Xie, S. Rahardja, Alternating Direction Method for Balanced Image Restoration, IEEE Transactions on Image Processing, Vol. 21, pp. 4557-4567, 2012.
[26] C. Dong, M. Xie, A blind image restoration algorithm based on nonlocal means and EM algorithm, International Conference on Audio, Language and Image Processing (ICALIP), pp. 485-489, 2012.
[27] L. Chen, K.-H. Yap, Efficient discrete spatial techniques for blur support identification in blind image deconvolution, IEEE Transactions on Signal Processing, Vol. 54, n. 4, pp. 1557-1562, 2006.
[28] http://www.bmlweb.org/image.html
[29] H. Fan, H. Zhu, G. Zhu, X. Liu, Improvement of wood ultrasonic CT images by using time of flight data normalization, (2011) International Review on Computers and Software (IRECOS), 6 (6), pp. 1079-1083.
[30] W. Liu, An image restoration algorithm based on image fusion, (2012) International Review on Computers and Software (IRECOS), 7 (3), pp. 1245-1249.
[31] Q. Wu, K. Wang, W. Zuo, Total variation-based image restoration using I-divergence, (2013) International Review on Computers and Software (IRECOS), 8 (2), pp. 668-672.


Authors’ information

Faouzi Benzarti is an assistant professor at the High School of Sciences and Techniques of Tunis (ESSTT). He received an Engineer's degree in Electrical Engineering from the National Engineering School of Monastir and his M.S. degree in Biomedical Engineering from the Polytechnic School of Montreal, Canada (Ecole Polytechnique de Montréal). He obtained his Ph.D. degree in Electrical Engineering from the National Engineering School of Tunis (ENIT) in 2006. He is presently a member of the research group in the Image, Signal and Pattern Recognition TSRIF Laboratory. His current research includes image deconvolution, image inpainting, image retrieval, anisotropic diffusion, image segmentation, face recognition, 3D image reconstruction, image assessment and license plate recognition.

Hamid Amiri received the Diploma of Electro-technics and Information Technique in 1978 and the Ph.D. degree in 1983 at the TU Braunschweig, Germany. He obtained the Doctorate of Sciences in 1993. He is presently a Professor at the National Engineering School of Tunis (ENIT), Tunisia. From 2001 to 2009 he was at the Riyadh College of Telecom and Information. He is currently the head of the research group in the Image, Signal and Pattern Recognition Laboratory. His research is focused on image processing, speech processing, document processing and natural language processing.




Secure Medical Image Retrieval Using Dynamic Binary Encoded Watermark

A. Umaamaheshvari, K. Thanushkodi

Abstract – Technical advancement has increased the availability of medical images, which has made retrieval for a given query laborious. Therefore, an efficient and secure retrieval technique is required. To address this requirement, this paper proposes a new technique named Secure and Efficient Image Retrieval (SEIR). To avoid copyright violation as well as unauthorized user access, SEIR uses watermarking to provide authentication of the medical images. Watermarking is carried out by embedding the watermark into the query image using the Dynamic Binary Encoding (DBE) technique. Thereby, the proposed technique allows only authenticated people to access the images present in a database. In order to make the retrieval process efficient, SEIR uses the kNN classifier, which classifies the images in the database depending on their feature characteristics. This classification reduces the time needed to retrieve the pertinent document. Therefore, the SEIR technique is both secure and efficient, as the experimental results show. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Authentication, Copyright Protection, Image Retrieval, Watermarking

Nomenclature

mi      Threshold value
b(l,l)  Each pixel value of a block
R1      Correlation coefficient between m and n
R2      Luminance value
R3      Contrast coefficient
ρm      Contrast of m
ρn      Contrast of n
ρmn     Contrast between m and n

I. Introduction

Nowadays, doctors and research scholars depend heavily on medical images for diagnosis and research purposes. The images generated by a particular organization are collected and stored in databases and used only when required. Usually, organizations need to secure the images present in the database, so that retrieval can be performed only by legitimate users. To achieve this requirement, various methods have been proposed. One widely used and effective technique to secure the images of a database is digital watermarking. It is the most significant technique in electronically driven fields to work against piracy and malicious manipulation. Watermarking plays an important role in securing multimedia data by providing copyright and content verification.

Manuscript received and revised September 2013, accepted October 2013


The process of digital watermarking is carried out by embedding suitable information into the original image without affecting its perceptual quality. The effectiveness of watermarking usually depends on the following properties:
1. Imperceptibility
2. Readily extractable
3. Unambiguous
4. Robustness
Among these properties, imperceptibility and robustness have the greatest impact and are the most needed attributes for digital watermarking. It is always a challenging task to achieve a tradeoff between these two properties. Normally, digital watermarking can be classified into two categories, namely (1) invisible digital watermarking and (2) visible digital watermarking, as shown in Fig. 1. In perceptible (visible) watermarking, the watermark information is visible in the medical image; the data can be text, the logo of an organization, etc., and is used to identify the owner of the image.
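As a minimal, hypothetical illustration of invisible watermarking (a plain least-significant-bit scheme, not the DBE technique this paper proposes), the following sketch hides authentication bits without perceptibly changing the cover image:

```python
import numpy as np

def embed_lsb(cover, bits):
    """Hide watermark bits in the least-significant bits of the first pixels."""
    stego = cover.astype(np.uint8).copy()
    flat = stego.reshape(-1)
    flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | np.asarray(bits, dtype=np.uint8)
    return stego

def extract_lsb(stego, n_bits):
    """Recover the first n_bits hidden bits."""
    return (stego.reshape(-1)[:n_bits] & 1).astype(np.uint8)
```

Each watermarked pixel differs from the cover by at most one gray level, which is what makes the mark imperceptible.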

Fig. 1. Watermarking Schemes


A. Umaamaheshvari, K. Thanushkodi

The invisible watermarking technique is further classified and forms two different sub categories namely fragile and robust. The semi-fragile or fragile watermarks are applied to integrity verification and content authentication. The robust watermark algorithms are particularly designed to withstand against the attacks that occur frequently during image processing operations that attempt to destroy or remove the mark. It is used for copyright protection. This paper focuses on the authentication of users to access the images of the database. It is considered in this paper that only the authorized users can access the images of the database. To obtain this, authors have proposed SEIR technique that highly focuses on the authentication and the efficiency of the database for retrieval purpose. The authentication for accessing the details of the database is carried out using the watermarking technique. The input query image is converted into a stego image. A stego image is an image that is embedded in the authentication information. Authentication information can be text, image, or combination of these. This image is further processed by the database for finding the authenticity through extracting the watermarked image and compared with its corresponding original. If they are equal, then it is implicitly understood the user as a legitimate user and allows him/her to access the images of the database otherwise denies the access for the images presented in it. Thus, the authentication for the database is achieved through invisible digital image watermarking. Along with the secure access to the database, the retrieval process is made efficient through classification. The kNN classifier is used to classify the images in the database depending on the state of tumor. The state of tumor can be either (1) Normal, and (2) Abnormal. The search for retrieval process is carried based on the state of queried image. 
This classification scheme reduces the time required to process the required set of images. The overall flow of SEIR's authentication and retrieval process is represented in Fig. 2. The rest of this paper is organized as follows: Section 2 reviews related work. In Section 3, the authors describe the proposed SEIR technique for secure access to the database in detail. The efficiency of SEIR is analyzed and its results are presented in Section 4. Finally, Section 5 concludes the proposed work along with ideas to enhance the SEIR technique.

II. Related Works

This section presents a survey of watermarking techniques for multimedia data. A digital watermarking scheme can be implemented in the time, spatial, or transform domain. Usually, spatial-domain watermarking is used for still images, whereas time-domain watermarking is applied to audio signals. Transform-domain watermarking uses one of three transforms: (1) the Discrete Cosine Transform, (2) the Discrete Fourier Transform, or (3) the Discrete Wavelet Transform. In the beginning, hidden labels were used to detect the ownership and distribution of information in an image or video. The authors of [1] studied the decisive factors and difficulties in implementing hidden labels, and proposed a method named randomly sequenced pulse position modulated code (RSPPMC) for JPEG-based models, which is robust against low-pass filtering, lossy data compression, and/or color-space conversion. Watermark techniques depending on hidden labels were used for copyright protection of multimedia data. Some techniques also used a secret key to select the pixels of an image where the watermark is embedded; such methods were affected by attacks like compression and blurring. To overcome this drawback of the secret-key method, a novel approach was proposed in [2], which uses an amplitude model and resists the aforementioned attacks. Signature bits were multiplied with modified pixel values in the blue channel; depending on the luminance proportion, the multiplication can be either subtractive or additive, and the watermark can be extracted without the original image. Similar to [2], the authors of [3] used a key-based technique to frame a watermarking model, presenting only a general description of key models. In [4], blocks of the given image were selected for watermarking using a Gaussian classifier, and pixels in the selected blocks were modified without violating constraints on the Discrete Cosine Transform (DCT) coefficients. Two constraints were considered: (1) linear constraints embedded among the DCT coefficients, and (2) a circular detection region defined in the DCT domain.
This technique was resistant to compression. Continuing this line of work, [5] also used the DCT coefficients to insert a pseudo-randomly selected sequence of real numbers. The technique exploits masking properties to obtain invisibility; extraction of the watermark is carried out without the original image, and it is robust against most geometric distortions. In addition, [6], [7] and [8] focused on DCT-based digital watermarking. In [7], the original image is segregated into four sub-images using a sub-sampling technique. The secret data, in binary form, are embedded into the DCT coefficients of two sub-images, with the coefficients selected using a secret-key generator. The extracted watermark is compared among the DCT coefficients of the sub-images, and the comparison does not require the original image. In [6], the authors studied the technique of Least

International Review on Computers and Software, Vol. 8, N. 10


Significant Bit (LSB) watermarking based on DCT. They also covered Discrete Wavelet Transform based (wavelet) watermarking and multiple spatial watermarking in color images, and implemented and tested a slightly modified version of the multiple-spatial-watermarking technique for color images, which is suitable for invisible watermarking in the spatial domain. Singular value decomposition (SVD) techniques were also used for watermarking; these preserve the one-way and non-symmetric properties not attained through the DFT and DCT transforms, and were applied in [9], [10] and [11]. In [9], SVD was used to withstand a wide range of attacks without any compromise on transparency, aided by the differential evolution (DE) algorithm. To embed the watermark data, the host image's singular values were changed through multiple scaling factors, and the changes were optimized by DE to obtain robustness and efficient transparency. To satisfy the requirements of robustness and imperceptibility, the authors of [11] combined the DWT and SVD techniques into a hybrid image-watermarking mechanism: the watermark was embedded into the singular values of the original image's DWT sub-bands instead of into the wavelet coefficients. This technique withstands the classical attacks. Similarly, the Fractional Fourier Transform (FRFT) was used by the authors of [12] and [13] to obtain blind digital watermarking. In [12], the Hermite matrix was analyzed for direct Discrete Fourier Transform computation, and chirp signals were hidden in the low-frequency band of the host image in the wavelet domain; this technique outperforms the existing spatial algorithm. The authors of [13] discussed the energy distribution of a two-dimensional signal in different FRFT domains.
Multiple chirps were used to embed the watermark directly in the spatial domain, and the technique retains the quality of the image. The DWT technique was briefly discussed in [14], along with a comparison of two DWT-based watermarking schemes. In [15], blind image watermarking is obtained through a cryptography-based technique, which embeds a large number of watermark bits in a gray-scale image without compromising the imperceptibility and security of the watermark. A new technique to cast a watermark on digital images was framed in [16]: image pixels are randomly selected and small luminance values are added to them, and a statistical detection mechanism was also proposed. Another statistical watermark-detection technique was proposed in [17] for validation and detection of the invisible watermark; this detection technique can be made robust when combined with visual models for encoding the


watermark. The authors of [18] proposed a technique based on a fractal coding and decoding method. Fractal coding determines the spatial redundancy present inside an image by generating relationships among its different parts, and this is used as the means of embedding the watermark; the technique is robust against low-pass filtering and JPEG conversion attacks. In [19], a chaos- and Fresnel-transform-based watermarking algorithm was proposed: the original image is transformed by Fresnel diffraction, and the watermark is embedded into the amplitude spectrum obtained after scrambling with a chaotic sequence. The major challenge faced by any watermarking technique is that it should not reduce the quality of the original image after the embedding process. A pioneering method was proposed in [20] with the aim of embedding a watermark into an image without affecting its quality. The method consists of two watermarking mechanisms based on visual models; the visual models are used to identify the upper bounds on watermark insertion for an image, which yields maximum strength while preserving transparency. Watermarking schemes were proposed in two frameworks: (1) block-based discrete cosine transform, and (2) multi-resolution wavelet transform. This technique is robust against classical attacks and achieves good transparency. A DCT technique for digital watermarking of textured images based on the concept of the gray-level co-occurrence matrix (GLCM) was given in [21]. The behavior of the method is presented in terms of correlation as a function of the offset for textured images; the approach is compared with other spatial- and temporal-domain watermarking techniques, revealing its potential for robust watermarking of textured images. The techniques discussed so far focused mainly on the robustness of watermarking.
Conversely, fragile watermarks are used to detect slight modifications of an image. In [22], two different fragile watermarks were compared. The first method applies a hash function to both the original and modified images to obtain digests; if the digests differ, the image is concluded to be forged, and the hashing technique is used to spatially localize the changes. The second method uses the Variable-Watermark Two-Dimensional algorithm (VW2D), also referred to as a semi-fragile watermark; here the sensitivity to changes is user-defined, and images with little or no modification are accepted.

III. Proposed Method

This section deals with the secure retrieval of images from a database through a watermarking technique. The proposed work is divided into three phases, namely (1) stego image generation, (2) authentication, and (3) retrieval. Detailed explanations are presented in the corresponding subsections.


Fig. 2 shows the overall flow of the proposed mechanism for secure image retrieval. This paper uses invisible digital watermarking, whose process is shown in Fig. 3.

Fig. 4. Construction of Stego image

Most image-data compression techniques based on block truncation coding achieve a high data compression ratio. BTC is a one-bit adaptive moment-preserving quantizer that preserves certain statistical moments of small blocks of the input image in the quantized output. One of the main goals of image-data compression is to reduce the redundancy in the image block as much as possible; that is, it is important to characterize an image with as few bits as possible while maintaining good image quality. BTC is one of the simplest image-compression algorithms to implement. For the conversion, the given secret image is segmented into n × n non-overlapping blocks. Two quantizers, the mean and the standard deviation, are determined for each block using Eqs. (1) and (2) respectively:

$$\bar{m} = \frac{1}{x}\sum_{i=1}^{x} x_i \qquad (1)$$

$$\sigma = \sqrt{\frac{1}{x}\sum_{i=1}^{x} (x_i - \bar{m})^2} \qquad (2)$$

here, x represents the total number of pixels in the given image block and $x_i$ denotes the value of the i-th pixel present in the block. The value obtained for $\bar{m}$ is set as the threshold value, which is compared with each pixel value $x_i$. Based on the comparison, the binary message block is generated through Eq. (3):

$$b_i = \begin{cases} 1 & x_i \ge \bar{m} \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$
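As an illustration, the block binarization of Eqs. (1)–(3) can be sketched in a few lines; the function name and the sample block below are our own, not from the paper:

```python
import numpy as np

def btc_binarize(block):
    """Binarize one n x n image block as in block truncation coding:
    the block mean (Eq. (1)) is the threshold, the standard deviation
    (Eq. (2)) is kept for the later embedding step, and each pixel is
    mapped to 1 or 0 by Eq. (3)."""
    mean = block.mean()                      # Eq. (1)
    std = block.std()                        # Eq. (2), 1/x normalization
    bits = (block >= mean).astype(np.uint8)  # Eq. (3)
    return bits, mean, std

# A toy 2 x 2 block: dark pixels fall below the mean, bright ones above.
block = np.array([[10.0, 200.0], [15.0, 220.0]])
bits, m, s = btc_binarize(block)   # bits is [[0, 1], [0, 1]], m is 111.25
```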

Fig. 2. Overall flow of SEIR

Fig. 3. Invisible watermark’s mainframe


Generation of Stego Image

To access or retrieve images in the database, the user gives an image, named the query image, as input. Images related to the query image are retrieved from the database if the user is authorized for retrieval. To verify the legitimacy of a user, the database uses the invisible digital watermarking technique, implemented by embedding secret information into the cover image (this article treats the query image as the cover image). The output of the embedding phase is the stego image.

Embedding Process

The detailed embedding procedure used to generate the stego image is depicted in Fig. 4. Initially, the secret image is converted to a binary message through the block-adaptive technique, which is formulated from the block truncation coding mechanism.


When the binary bit to be embedded is one, the stego pixel S(i, j) is obtained from the cover pixel C(i, j) through the piecewise rule of Eq. (5), which adds multiples of σ/4 to C(i, j), with the case boundary at 2·σ/4. Similarly, Eq. (6) is used when the binary bit is zero: S(i, j) is decreased by multiples of σ/4 (up to 5·σ/4, with the case boundary at 3·σ/4), or left equal to C(i, j) otherwise. During extraction, Eq. (7) recovers the embedded bit: the extracted bit is set to 1 when the pixel deviation reaches σ/2, and to 0 otherwise.

Fig. 7. Binary image of original and extracted watermark image

Then both images are compared and analyzed using the following parameters.

1. Universal Image Quality Index (UIQI)

Consider the original and watermark images as m = {m_i | i = 1, 2, …, X} and n = {n_i | i = 1, 2, …, X}. With this, the UIQI can be determined from Eq. (8):


$$R = \frac{4\,\sigma_{mn}\,\bar{m}\,\bar{n}}{\left(\sigma_m^2 + \sigma_n^2\right)\left((\bar{m})^2 + (\bar{n})^2\right)} \qquad (8)$$

where $\bar{m}$, $\bar{n}$, $\sigma_m^2$, $\sigma_n^2$ and $\sigma_{mn}$ can be computed from Eqs. (9), (10), (11), (12) and (13):

$$\bar{m} = \frac{1}{X}\sum_{i=1}^{X} m_i \qquad (9)$$

$$\bar{n} = \frac{1}{X}\sum_{i=1}^{X} n_i \qquad (10)$$

$$\sigma_m^2 = \frac{1}{X-1}\sum_{i=1}^{X} (m_i - \bar{m})^2 \qquad (11)$$

$$\sigma_n^2 = \frac{1}{X-1}\sum_{i=1}^{X} (n_i - \bar{n})^2 \qquad (12)$$

$$\sigma_{mn} = \frac{1}{X-1}\sum_{i=1}^{X} (m_i - \bar{m})(n_i - \bar{n}) \qquad (13)$$

The value of R is dynamic and lies in the range [−1, 1]; R = 1 is the best value, obtained only when m_i = n_i for i = 1, 2, …, X. The distortion measured by this quality index combines three factors: (1) loss of correlation, (2) luminance distortion, and (3) contrast distortion. Therefore, R can be redefined as in Eq. (14):

$$R = R_1 \cdot R_2 \cdot R_3 \qquad (14)$$

that is,

$$R = \frac{\sigma_{mn}}{\sigma_m \sigma_n} \cdot \frac{2\,\bar{m}\,\bar{n}}{(\bar{m})^2 + (\bar{n})^2} \cdot \frac{2\,\sigma_m \sigma_n}{\sigma_m^2 + \sigma_n^2} \qquad (15)$$

with the components

$$R_1 = \frac{\sigma_{mn}}{\sigma_m \sigma_n} \qquad (16)$$

$$R_2 = \frac{2\,\bar{m}\,\bar{n}}{(\bar{m})^2 + (\bar{n})^2} \qquad (17)$$

$$R_3 = \frac{2\,\sigma_m \sigma_n}{\sigma_m^2 + \sigma_n^2} \qquad (18)$$

The first component R₁ measures the correlation coefficient between m and n. The second component R₂ determines the closeness of the luminance values of m and n, and the contrast between the original and watermark images is tested using R₃. Therefore, the UIQI value between the original and the extracted watermark image should be near one; if the value is too low, the user is regarded as not authorized to access the database.
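A direct transcription of Eqs. (8)–(13) can serve as a sanity check; the function and array names below are illustrative, not from the paper:

```python
import numpy as np

def uiqi(m, n):
    """Universal Image Quality Index between images m and n, Eq. (8)."""
    m = np.asarray(m, dtype=float).ravel()
    n = np.asarray(n, dtype=float).ravel()
    mb, nb = m.mean(), n.mean()                        # Eqs. (9), (10)
    vm = ((m - mb) ** 2).sum() / (m.size - 1)          # Eq. (11)
    vn = ((n - nb) ** 2).sum() / (n.size - 1)          # Eq. (12)
    cov = ((m - mb) * (n - nb)).sum() / (m.size - 1)   # Eq. (13)
    return (4 * cov * mb * nb) / ((vm + vn) * (mb ** 2 + nb ** 2))

# Identical images give the ideal value R = 1; any distortion lowers it.
a = np.array([10.0, 20.0, 30.0, 40.0])
r_same = uiqi(a, a)         # 1.0
r_diff = uiqi(a, a + 15.0)  # below 1.0 (luminance shift)
```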

2. Structural Similarity Index Metric (SSIM)

The SSIM value can be computed from Eq. (19):

$$\mathrm{SSIM} = \frac{(2\,\bar{m}\,\bar{n} + z_1)(2\,\sigma_{mn} + z_2)}{\left((\bar{m})^2 + (\bar{n})^2 + z_1\right)\left(\sigma_m^2 + \sigma_n^2 + z_2\right)} \qquad (19)$$
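Eq. (19), together with a simple bit-error-rate check, can be transcribed as below. The constants z₁ and z₂ and the sample arrays are illustrative placeholders, since the paper does not list their values:

```python
import numpy as np

def ssim_global(m, n, z1=1e-4, z2=9e-4):
    """Single-window SSIM of Eq. (19). z1 and z2 are small stabilizing
    constants; the defaults here are assumptions, not the paper's values."""
    m = np.asarray(m, dtype=float).ravel()
    n = np.asarray(n, dtype=float).ravel()
    mb, nb = m.mean(), n.mean()
    vm = ((m - mb) ** 2).sum() / (m.size - 1)
    vn = ((n - nb) ** 2).sum() / (n.size - 1)
    cov = ((m - mb) * (n - nb)).sum() / (m.size - 1)
    return ((2 * mb * nb + z1) * (2 * cov + z2)) / \
           ((mb ** 2 + nb ** 2 + z1) * (vm + vn + z2))

def bit_error_rate(a, b):
    """Fraction of differing bits between two binary images."""
    a = np.asarray(a).ravel()
    b = np.asarray(b).ravel()
    return np.count_nonzero(a != b) / a.size

# Identical binary watermarks: SSIM = 1 and BER = 0, so access is granted.
w = np.array([0, 1, 1, 0, 1, 0, 0, 1])
```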

here, $\bar{m}$, $\bar{n}$, $\sigma_m$, $\sigma_n$ and $\sigma_{mn}$ are determined as in the UIQI, and z₁ and z₂ denote constants. If the UIQI and SSIM values are high, the extracted watermark image and its corresponding original image have great similarity.

3. BER

The difference between the compared images is measured using the bit-error rate. If this value is lower than a predefined threshold, the user is allowed to access the database; otherwise, access is denied. Based on these three quality parameters, the user is classified as either authenticated or not. This procedure acts as a detection system at the database, allowing only legitimate users to access it. Once a user is identified as authenticated, he/she can retrieve the images present in the database.

Retrieval of Relevant Images

Once a user is identified as legitimate by the database, he/she is allowed to access the images present in it. To make access and retrieval efficient, the authors classify the images in the database using the kNN classification technique, which makes use of GLCM features to group similar images under a class. Therefore, for each image in the database, the GLCM features are extracted and the image is classified under a specified class.

Classification: GLCM Feature Extraction

The intensity variations at a pixel of an image can be measured using GLCM feature values. Such features are extracted from the image in two steps: (1) the pairwise spatial co-occurrences of pixels separated by a particular distance and angle are counted and tabulated in the co-occurrence matrix; (2) a set of scalar quantities is computed from it, characterizing various aspects of the underlying texture. The table prepared in step one records the various combinations of gray levels that co-occur in the given image.
The co-occurrence matrix is usually of size N × N, where N denotes the number of distinct gray-scale levels of the image. The relative frequency can be represented by an element P(x, j, d, θ), where x is the gray level at a pixel C at a particular location and j is the gray level at a pixel located at a distance d from C at an angle θ. A quantitative spatial-domain description can thus be derived from the GLCM.

kNN Classifier

The k-Nearest Neighbours algorithm is a non-parametric method in that no parameters are




estimated. Instead, the proximity of neighbouring observations in the training data set and their associated output values are used to predict the output values of cases in the validation data set. The kNN process is used here for classification: an image is assigned to a class through a majority vote of its k nearest neighbours. The classifier takes patterns that are close to each other in feature space to belong to the class having the similar pattern; the neighbours are images that have been correctly classified into known classes. The authors use the Euclidean distance as the distance measure. In this way, the images of the database are classified accurately under different classes, and the kNN classifier compares favourably with other classifiers.

Retrieval

The database processes the input cover image and finds its GLCM features. From the derived patterns, it predicts the class to which the image belongs. On predicting the class, similar images are retrieved from the database and presented to the user. Thus, images are retrieved efficiently and securely from the database.
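The GLCM tabulation and the kNN majority vote described above can be sketched as follows; the offset, k, and the toy data are illustrative choices, not the paper's parameters:

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Gray-level co-occurrence matrix: entry (x, j) counts pixel pairs
    where level x has level j at offset (dy, dx), i.e. distance 1 at
    angle 0 degrees by default."""
    P = np.zeros((levels, levels), dtype=int)
    h, w = img.shape
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < h and 0 <= c2 < w:
                P[img[r, c], img[r2, c2]] += 1
    return P

def knn_predict(features, labels, query, k=3):
    """Majority vote among the k training samples closest to the query
    in Euclidean distance."""
    feats = np.asarray(features, dtype=float)
    dist = np.linalg.norm(feats - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

img = np.array([[0, 0, 1], [0, 0, 1], [0, 1, 1]])
P = glcm(img, levels=2)   # horizontal co-occurrences of the 2-level image

feats = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
labs = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]
cls = knn_predict(feats, labs, query=[5.0, 5.0])   # "abnormal"
```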

IV. Experimental Results

The proposed SEIR technique is evaluated for its retrieval efficiency. For the analysis, the authors took a database containing 150 brain images, collected from the Internet Brain Segmentation Repository (IBSR) dataset. The database consists of two sets of images: one with brain tumors and one without. It is assumed that only the doctors and research scholars of the hospital are authorized to access the database images. For accessing the database, authorization details are provided through invisible digital watermarking: the photos of doctors and scholars act as the watermark images, which are embedded into the query image. Query images act as the keyword for retrieving similar images from the database. For the experiment, a brain image is taken as the query image and embedded with the photo of the user to generate the stego image. The query image, the watermark image taken for the experiment, and the resulting stego image are shown in Fig. 8. This work uses a 512 × 512 query image and a 64 × 64 watermark image. Once the query image is provided to the database, the SEIR technique initiates the extraction process to extract the watermark image from the stego image in the form of a binary image. The extracted binary watermark image, i.e. the photo of the user, is compared with the binary image of the corresponding original. Table I presents the distance between the query image and the retrieved images (for the first 10 images only). The efficiency of the GLCM-feature-based kNN classification is determined and represented in the


confusion matrix. Table II presents the comparison results. For the BER, the threshold value is taken as 10 in this experiment.

Fig. 8. Query image, watermark image and stego image

TABLE I
DISTANCE MEASURE
Image ID    Distance Measure between Query & Retrieved Image
4           6.936507968554296e-001
3           1.232947498754756e+000
2           1.884243209992982e+000
8           2.754879139944190e+000
9           3.255115342542945e+000
101         5.739929864217325e+000
22          6.720001565603548e+000
56          6.841970737718540e+000
57          9.676775789159622e+000
1           1.067667730101163e+000

TABLE II
QUANTIZATION ATTRIBUTES AND THEIR CORRESPONDING VALUES
Attribute   Value
UIQI        0.6314
SSIM        0.987
BER         0.457
MSE         0.0518
PSNR        60.9859

The quantized attribute results show that the UIQI and SSIM values are close to one, which implies that the binary watermark image extracted from the stego image and its corresponding original image are similar. In addition, the BER value is much smaller than the predefined threshold that bounds the difference between the two compared images. Therefore, these values indicate that the user is authorized, and he/she is allowed to access the database. The images of the dataset are classified prior to access by an authenticated user. The classification is carried out using the kNN classifier, which groups similar images into classes; here only two classes are present, namely normal and abnormal. Depending on the GLCM features of the images in the dataset, each image is classified as either normal or abnormal. This classification process helps for an efficient retrieval process.

International Review on Computers and Software, Vol. 8, N. 10

2527

A. Umaamaheshvari, K. Thanushkodi

The query image then searches only a subgroup of images, not all of them. During retrieval, the GLCM features of the query image (after extraction of the watermark image) are estimated to determine its class. If the query image is normal, then only the images under the normal class are accessed or retrieved from the database. Table III gives the confusion matrix for the kNN classifier. It is clear that only 10 among the 150 images are misclassified, so the kNN classifier used in this paper has a classification accuracy of 93.33%.

TABLE III
CONFUSION MATRIX FOR CLASSIFICATION
            Abnormal    Normal
Abnormal    7           10
Normal      0           133
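The figures quoted here follow directly from Table III; a quick check, taking "normal" as the positive class as in the ROC discussion that follows:

```python
# Confusion matrix from Table III: rows are actual classes,
# columns are predicted classes.
conf = {
    ("abnormal", "abnormal"): 7,  ("abnormal", "normal"): 10,
    ("normal",   "abnormal"): 0,  ("normal",   "normal"): 133,
}

total = sum(conf.values())                                    # 150 images
correct = conf[("abnormal", "abnormal")] + conf[("normal", "normal")]
accuracy = correct / total                                    # 140 / 150

# Taking "normal" as the positive class:
tpr = conf[("normal", "normal")] / (
    conf[("normal", "normal")] + conf[("normal", "abnormal")])
fpr = conf[("abnormal", "normal")] / (
    conf[("abnormal", "normal")] + conf[("abnormal", "abnormal")])
```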

Besides the confusion matrix, another way to compute the efficiency of the classifier is the ROC graph, which usually has the false positive rate (FPR) on the x-axis and the true positive rate (TPR) on the y-axis. A point at (0, 1) represents perfect classification: zero on the x-axis and one on the y-axis express that the false positive rate is 0 and the true positive rate is 1. Likewise, (1, 0), (0, 0) and (1, 1) represent completely incorrect classification, all classifications negative, and all classifications positive, respectively. Fig. 9 shows the first 10 images relevant to the query image (shown in Fig. 8, which is normal).

Fig. 9 shows only a sample of the images retrieved from the database (not all images that are normal). The kNN classifier uses the Euclidean distance for finding similar images. Figs. 10 and 11 give the ROC graphs for the normal-class and abnormal-class classification. It is clear from Fig. 10 that the images of the normal class are predicted accurately; no images are misclassified.

TABLE IV
ROC VALUES FOR NORMAL CLASS
False Positive Rate    True Positive Rate
0       0.01
0       0.05
0       0.1
0       0.15
0       0.18
0       0.2
0.09    0.2
0.09    0.22
0.09    0.25
0.09    0.28
0.1     0.29
0.18    0.32
0.2     0.35
0.23    0.6
0.3     0.65
0.32    0.67
0.4     0.68
0.44    0.69
0.5     0.75
0.53    0.9
0.6     0.92
0.7     1
0.8     1
0.9     1
1       1

TABLE V
ROC VALUES FOR ABNORMAL CLASS
False Positive Rate    True Positive Rate
0       0.05
0       0.12
0       0.19
0       0.25
0       0.3
0       0.34
0       0.4
0.1     0.4
0.2     0.4
0.3     0.4
0.4     0.4
0.4     0.5
0.46    0.55
0.6     0.59
0.7     0.65
0.8     0.8
0.9     0.95
0.92    1
0.94    1
0.95    1
0.98    1
1       1

Fig. 9. Retrieved image from dataset

Similar to the ROC graph, precision and recall graphs are used to estimate the efficiency of the retrieval process. Precision is the fraction of retrieved images that are relevant to the query image, while recall denotes the fraction of relevant images that are retrieved.

Fig. 10. ROC graph for normal class


Figs. 12 and 13 give the precision and recall graphs for normal-class and abnormal-class image retrieval, and they show that the images are retrieved efficiently. Tables VI and VII characterize the precision and recall for the normal and abnormal classes. For a message of size 64 × 64 (4096 bits), the time taken is 1.7677 s for embedding and 0.5498 s for extraction.

Fig. 11. ROC graph for abnormal class

Precision is a measure that expresses the fraction of returned (retrieved) images that are relevant, which is purely based on the measurement and understanding of relevance. Precision can be calculated using Eq. (20):

$$\mathrm{Precision} = \frac{\left|\{\text{relevant images}\} \cap \{\text{retrieved images}\}\right|}{\left|\{\text{retrieved images}\}\right|} \qquad (20)$$

Fig. 12. Retrieval efficiency for normal class

It can also be given as:

Precision = (No. of relevant images retrieved) / (Total no. of images retrieved)

The recall value can be estimated through Eq. (21):

$$\mathrm{Recall} = \frac{\left|\{\text{relevant images}\} \cap \{\text{retrieved images}\}\right|}{\left|\{\text{relevant images}\}\right|} \qquad (21)$$

Fig. 13. Retrieval efficiency for abnormal class

It can also be given as:

Recall = (No. of relevant images retrieved) / (Total no. of relevant images in the database)
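Eqs. (20) and (21) reduce to simple set arithmetic. In the sketch below, the retrieved IDs follow Table I, while the relevant set is a hypothetical ground truth chosen only for illustration:

```python
# Retrieved image IDs as listed in Table I.
retrieved = {4, 3, 2, 8, 9, 101, 22, 56, 57, 1}
# Hypothetical ground-truth set of images relevant to the query.
relevant = {1, 2, 3, 4, 9, 22, 56, 57, 101, 120}

hits = relevant & retrieved             # relevant AND retrieved
precision = len(hits) / len(retrieved)  # Eq. (20): 9 / 10
recall = len(hits) / len(relevant)      # Eq. (21): 9 / 10
```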

TABLE VII
PRECISION & RECALL FOR ABNORMAL CLASS
Precision  Recall  |  Precision  Recall
1      0.08   |  0.7    0.4
1      0.09   |  0.6    0.4
1      0.1    |  0.5    0.4
1      0.15   |  0.4    0.4
1      0.2    |  0.3    0.4
1      0.25   |  0.2    0.4
1      0.3    |  0.1    0.4
1      0.35   |  0.12   0.5
1      0.4    |  0.14   0.7
0.9    0.4    |  0.16   0.9
0.8    0.4    |  0.18   1



TABLE VI
PRECISION & RECALL FOR NORMAL CLASS
Precision  Recall  |  Precision  Recall
1      0.08   |  0.96   0.38
1      0.09   |  0.92   0.4
1      0.1    |  0.92   0.42
1      0.15   |  0.92   0.48
1      0.2    |  0.92   0.5
0.98   0.25   |  0.92   0.53
0.98   0.26   |  0.92   0.6
0.98   0.27   |  0.9    0.7
0.98   0.3    |  0.9    0.8
0.96   0.32   |  0.9    0.89
0.96   0.34   |  0.88   1
0.96   0.36   |  0.87   1



Initially, 1000 images were taken for the study, and 150 images are reported in this paper. In the previous work [23], which used a feature-based watermarking method, the PSNR achieved was 49.1447; the proposed dynamic binary encoding method achieves 60.9859.


Fig. 14. Simulation results

V. Conclusion

This paper deals with the issue of providing authentication to the users of a database containing images. To provide authentication, the authors have proposed the SEIR technique based on invisible digital watermarking. SEIR requires users to provide the query information along with their identity: the watermark image is embedded into the query image to build the stego image, with the embedding carried out using the DBE technique. The generated stego image is given as input to the database. On receiving the stego image, the database extracts the secret information (i.e., the watermark image) and compares the binary images of the watermark image and its corresponding original. If the two images are equal, the user is allowed to access or retrieve the images; otherwise, access is denied. To make the retrieval process efficient, the images in the database are classified into different classes, where the images in a class have similar patterns; the classification is carried out using the kNN classifier. The query image is also processed to find its features, based on which the class of the input image is identified; once the class is determined, the images under that class are retrieved. The experimental analysis shows that the proposed SEIR retrieves the images in the database efficiently, with the retrieval efficiency measured using precision and recall.

References

[1] Koch E, Zhao J (1999), Towards robust and hidden image copyright labeling, In Workshop on Nonlinear Signal and Image Processing, pp. 452-455.
[2] Kutter M, Jordan F, Bossen F (1997), Digital signature of color images using amplitude modulation, In Proceedings of SPIE Storage and Retrieval for Image and Video Databases, Vol. 3022, pp. 518-526.
[3] Nyeem H, Boles W, Boyd C (2011), Developing a digital image watermarking model, In International Conference on Digital Image Computing Techniques and Applications (DICTA), pp. 468-473.
[4] Bors AG, Pitas I (1996), Image watermarking using DCT domain constraints, In Proceedings of the International Conference on Image Processing, Vol. 3, pp. 231-234.
[5] Piva A, Barni M, Bartolini F, Cappellini V (1997), DCT-based watermark recovering without resorting to the uncorrupted original image, In Proceedings of the International Conference on Image Processing, Vol. 1, pp. 520-523.
[6] Chaudhry SH (2009), Digital Image Watermarking.
[7] Lu W, Lu H, Chung FL (2006), Robust digital image watermarking based on subsampling, Applied Mathematics and Computation, Vol. 181, issue 2, pp. 886-893.
[8] Saxena V, Gupta JP (2008), Digital Image Watermarking.
[9] Aslantas V (2009), An optimal robust digital image watermarking based on SVD using differential evolution algorithm, Optics Communications, Vol. 282, issue 5, pp. 769-777.
[10] Chang CC, Tsai P, Lin CC (2005), SVD-based digital image watermarking scheme, Pattern Recognition Letters, Vol. 26, issue 10, pp. 1577-1586.
[11] Lai CC, Tsai CC (2010), Digital image watermarking using discrete wavelet transform and singular value decomposition, IEEE Transactions on Instrumentation and Measurement, 59(11), pp. 3060-3063.
[12] Dian HW, Dong L, Jun Y, Fen-xiong C (2007), An improved chirp typed blind watermarking algorithm based on wavelet and fractional Fourier transform, In Proceedings of the Fourth International Conference on Image and Graphics, pp. 291-296.
[13] Feng Z, Xiaomin M, Shouyi Y (2005), Multiple-chirp typed blind watermarking algorithm based on fractional Fourier transform, In Proceedings of the International Symposium on Intelligent Signal Processing and Communication Systems, pp. 141-144.
[14] Hameed K, Mumtaz A, Gilani SAM (2006), Digital image watermarking in the wavelet transform domain, World Academy of Science, Engineering & Technology.
[15] Gupta P (2012), Cryptography based digital image watermarking algorithm to increase security of watermark data, International Journal of Scientific & Engineering Research, Vol. 3, pp. 1-4.
[16] Pitas I (1996), A method for signature casting on digital images, In Proceedings of the International Conference on Image Processing, Vol. 3, pp. 215-218.
[17] Zeng W, Liu B (1999), A statistical watermark detection technique without using original images for resolving rightful ownerships of digital images, IEEE Transactions on Image Processing, Vol. 8, issue 11, pp. 1534-1548.
[18] Puate J, Jordan F (1996), Using fractal compression scheme to embed a digital signature into an image, In Proceedings of SPIE Photonics East, Vol. 96, pp. 108-118.
[19] Wang Z, Lv S, Feng J, Sheng Y (2012), A digital image watermarking algorithm based on chaos and Fresnel transform, In Proceedings of the 4th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Vol. 2, pp. 144-148.
[20] Podilchuk CI, Zeng W (1998), Image-adaptive watermarking using visual models, IEEE Journal on Selected Areas in Communications, Vol. 16, issue 4, pp. 525-539.
[21] Kamble S, Agarwal S, Srivatsava VK, Maheshkar V (2010), DCT based texture watermarking using GLCM, IEEE 2nd International Conference on Advanced Computing, pp. 185-189.
[22] Wolfgang RB, Delp EJ (1999), Fragile watermarking using the VW2D watermark, In Proceedings of Electronic Imaging '99, International Society for Optics and Photonics, pp. 204-213.
[23] Umaamaheshvari A, Thanushkodi K (2013), A robust digital watermarking technique based on feature and transform method, Scientific Research and Essays, Vol. 8(32), pp. 1584-1593, August 2013.

Authors’ information

1 Assistant Professor, Dept. of ECE, SSEC, Coimbatore, India.

2 Director, Akshaya College of Engineering and Technology, Coimbatore, India.

International Review on Computers and Software, Vol. 8, N. 10


A. Umaamaheshvari was born in Coimbatore District, Tamilnadu State, India, in 1974. She received the BE in Electrical and Electronics Engineering from Madras University, Chennai, and the ME in Applied Electronics from Anna University, Chennai, and is currently pursuing a Ph.D. in Digital Image Processing at Anna University, Coimbatore. Her research interests are in the areas of computer networking, computer security and image processing. She has published several papers in international conferences and journals.

Dr. K. Thanushkodi was born in Theni District, Tamilnadu State, India, in 1948. He received the BE in Electrical and Electronics Engineering from Madras University, Chennai, the MSc (Engg) from Madras University, Chennai, and the PhD in Electrical and Electronics Engineering from Bharathiyar University, Coimbatore, in 1972, 1976 and 1991 respectively. His research interests lie in the areas of computer modeling and simulation, computer networking and power systems. He has published many technical papers in national and international journals.

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved


International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

Microarray Gene Expression and Multiclass Cancer Classification Using Improved PSO Based Evolutionary Fuzzy ELM Classifier with ICGA Gene Selection T. Karthikeyan, R. Balakrishnan Abstract – Cancer has become one of the most dreadful and most widely spreading diseases in recent years, and cancer diagnosis has become an active research area in the field of medical image processing. DNA microarrays are considered an effective tool in molecular biology and cancer diagnosis. As the value of this technology has been recognized, a variety of open questions have arisen about the appropriate analysis of microarray data. As the number of cancer victims increases tremendously, an efficient and accurate cancer classification system has become essential. To address these difficulties and obtain accurate results, a combination of an Integer-Coded Genetic Algorithm (ICGA) and Improved Particle Swarm Optimization (IPSO), coupled with an Evolutionary Fuzzy Extreme Learning Machine (E-FELM), is used for gene selection and cancer classification. ICGA is used with the IPSO based E-FELM classifier to choose an optimal set of genes, resulting in an efficient hybrid algorithm that can handle sparse data and sample imbalance. The performance of the proposed approach is evaluated and the results are compared with existing methods. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Biology and Genetics, Classifier Design and Evaluation, Feature Evaluation and Selection

Nomenclature

x    Input objects
mu   Membership function
a    Independent variable (reciprocal membership-function width)
v    Velocity of particle
f    Fitness value

I. Introduction

Cancer is a collection of diseases characterized by the uncontrolled division and growth of cells [1]. Localized tumors can be removed by surgery or treated with high-dose irradiation. As a cancer grows, however, it metastasizes: it overruns the surrounding tissues, enters the blood stream, and spreads to set up colonies in distant parts of the body. Only about one-third of patients with metastasized cancer survive more than five years. A DNA microarray is a collection of microscopic DNA spots attached to a solid surface. DNA microarrays are used to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome [2]. The growth of DNA microarray technology has produced large quantities of gene data and has made it simple to examine the expression patterns of ever larger numbers of genes concurrently under particular experimental environments and conditions [3].

Manuscript received and revised September 2013, accepted October 2013


In addition, gene information can be analyzed rapidly and accurately by organizing it in a single experiment [4]. Microarray technology has been applied to the accurate prediction and identification of cancer, and it is expected to continue to help in this task. Accurate classification of cancer is an especially significant problem for its treatment. Researchers have investigated various problems of cancer classification by means of gene expression profile data and have tried to propose the best possible classification methods for these problems [5]. These investigations show that some techniques give better results than others. Even so, for the best techniques there is no complete work comparing the feasible feature selection methods and classifiers, and a careful effort is needed to assess the possible methods for analyzing gene expression data. Gene expression data frequently consist of an enormous number of genes, so tools for analyzing the genes to obtain useful information are essential. One study analytically compares the results of tests with a variety of feature selection techniques and classifiers for choosing informative genes that facilitate the classification of cancer [6].


However, those results were not established sufficiently, since only one benchmark dataset was used. In this paper, a better gene selection and cancer classification technique is proposed for microarray data characterized by sample sparseness and imbalance. The microarray data include several classes of cancer that are classified simultaneously, in contrast to existing traditional classification methods, where one class is set against all the other classes. An Integer-Coded Genetic Algorithm (ICGA) [7] is used for robust gene selection. Then an Improved Particle Swarm Optimization (IPSO) [8] driven Evolutionary Fuzzy Extreme Learning Machine (E-FELM) [15] is proposed for managing the sparse/imbalanced data classification problem.

II. Related Works

The feature selection and classification approach is a significant problem in microarray gene expression analysis due to the huge number of variables and the small number of experimental conditions. For diagnosing the disease, classifier performance has a direct impact on the final results. In [9], a new method of gene selection and classification using a nonlinear kernel support vector machine (SVM) based on recursive feature elimination (RFE) is proposed. It is established experimentally that this method has better generalization performance than linear classification approaches such as linear-kernel SVM and Fisher Linear Discriminant Analysis (FLDA), and that it performs well compared with various non-linear classification approaches such as Least Squares SVM (LS-SVM) with a non-linear kernel. In the experiments, in addition to a test set, the leave-one-out algorithm is also used to test the classifiers' generalization performance [9].

Mohr et al. [10] described an efficient approach for gene selection and classification of microarray data based on the Potential Support Vector Machine (P-SVM) for feature selection and a nu-SVM for classification. The P-SVM approach expands the decision function through a sparse set of support features. A fully automated technique for gene selection based on hyper-parameter optimization and microarray classification has been presented [10].

Banka et al. [11] presented an evolutionary rough feature selection algorithm for classifying gene expression patterns. As the data typically comprise a huge number of redundant features, an initial redundancy reduction of the features is done to facilitate faster convergence. Rough set theory is used to produce the distinction table that enables PSO [26] to identify reducts, which denote the minimal sets of non-redundant attributes capable of discerning between all objects. The significance of the approach is illustrated on datasets such as Colon, Lymphoma and Leukemia using MOGA [11].

Dash and Patra [12] presented the supervised CFS-Quick Reduct algorithm, integrating Correlation-based Feature Selection (CFS) and rough-set attribute reduction for gene selection from gene expression data. Correlation-based Feature Selection is employed as a filter to remove the redundant features; then the minimal reduct of the filtered feature set is obtained by rough sets. Three different classification approaches are used to assess the performance of this approach, which improves the significance and reduces the complexity of the classical algorithm. Experiments on two public multi-class gene expression datasets show that this approach is very efficient in selecting highly discriminative genes for the classification task [12].

Recently, Saraswathi et al. [7] proposed a novel combination of an Integer-Coded Genetic Algorithm (ICGA) and Particle Swarm Optimization (PSO), coupled with the neural-network-based Extreme Learning Machine (ELM), for gene selection and cancer classification. ICGA is used with PSO-ELM to choose an optimal set of genes, which is then utilized to generate a classifier that handles sparse data and sample imbalance. This paper provides an extension of that work [7]: it aims to overcome the drawbacks of the algorithm used in [7] and to develop a more efficient approach for cancer classification.

III. Proposed Methodology for Accurate Classification and Gene Selection

III.1. Improved PSO based Evolutionary Fuzzy ELM Classifier for Accurate Classification with ICGA based Gene Selection

A novel Improved PSO based Evolutionary Fuzzy ELM (IPSO based E-FELM) with ICGA based gene selection is proposed, which can reduce the problem size through gene (feature) selection and use the chosen relevant genes for accurate classification of a sparse and imbalanced data set. The fundamental ELM classifier can quickly distinguish the cancer classes from the data denoting the chosen features, but its performance depends on the nature of the input data distribution. For a sparse and highly imbalanced data set, the random input weight selection in the ELM classifier affects the classification performance to a great extent [13], [14]. The IPSO based E-FELM classifier is therefore proposed in this paper, where the IPSO algorithm is utilized to identify the optimal input weights such that the E-FELM classifier can distinguish the cancer classes significantly better, i.e., the performance of the E-FELM classifier is enhanced. The block diagram of the proposed methodology is shown in Fig. 1. The performance of the E-FELM classifier [15] is mainly determined by the selected input genes. In order to reduce the computational effort, an Integer-Coded Genetic Algorithm is used to choose and minimize the number of genes that can discriminate the cancer classes efficiently. Based on these chosen genes, the E-FELM algorithm generates the classifier.
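The nested search just described, an outer ICGA loop that proposes gene subsets and an inner IPSO loop that tunes the classifier parameters, with validation performance as fitness, can be sketched as follows. This is a simplified illustration under assumed names (`icga_ipso_pipeline`, `toy_fitness` are mine, not the authors' code); a seeded random search stands in for IPSO, and a toy fitness stands in for the E-FELM validation accuracy.

```python
import random

def icga_ipso_pipeline(fitness, num_genes, subset_size,
                       generations=10, pop_size=8, ipso_iters=10, seed=0):
    """Two-level search sketch: an outer integer-coded GA proposes gene
    subsets; for each subset an inner parameter search (standing in for
    IPSO) tunes a parameter vector; validation fitness drives both loops."""
    rng = random.Random(seed)

    def inner_best(genes):
        # Inner loop: search classifier parameters for this fixed gene subset.
        best = float("-inf")
        for _ in range(ipso_iters):
            params = [rng.uniform(-1, 1) for _ in range(3)]  # stand-in parameters
            best = max(best, fitness(genes, params))
        return best

    def mutate(genes):
        # Integer-coded mutation: replace one gene index, repairing duplicates
        # so the chromosome always encodes distinct (independent) genes.
        child = list(genes)
        child[rng.randrange(subset_size)] = rng.randrange(num_genes)
        while len(set(child)) < subset_size:
            child[rng.randrange(subset_size)] = rng.randrange(num_genes)
        return child

    population = [rng.sample(range(num_genes), subset_size)
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=inner_best, reverse=True)   # rank by inner-tuned fitness
        elites = population[:pop_size // 2]             # keep elites, mutate copies
        population = elites + [mutate(g) for g in elites]
    return max(population, key=inner_best)
```

In the paper's setting, `fitness` would train the E-FELM on the chosen genes with the IPSO-proposed parameters and return the validation efficiency.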


Fig. 1. Proposed block diagram (training data feeds ICGA gene selection; the Evolutionary Fuzzy ELM classifier is built from the selected genes, with its parameters selected by IPSO)

The proposed research work for classification and optimal gene selection is shown in Fig. 1. Initially, ICGA selects n independent genes from the available gene set. For the selected genes, IPSO identifies the optimal parameters (number of hidden nodes and input weights) such that the performance of the E-FELM multiclass classifier is improved. The best validation performance is utilized as the fitness for the ICGA evolution, and the validation performance of the E-FELM classifier is used in IPSO for the selection of the E-FELM parameters.

III.2. Evolutionary Fuzzy Extreme Learning Machine (E-FELM)

Initially, the basic association between single-hidden-layer feed-forward neural networks and zero-order TSK fuzzy inference systems was investigated in [15]. Based on that investigation, an Evolutionary Fuzzy Extreme Learning Machine (E-FELM) is proposed in this paper. Modified Gaussian membership functions are applied to fuzzify the input objects x = (x_1, x_2, ..., x_n) [26]. Taking the reciprocals of the widths d_ij of the membership functions, rather than the widths themselves, as the independent variables:

  a_i = (a_i1, a_i2, ..., a_in) = (1/d_i1, 1/d_i2, ..., 1/d_in)    (1)

the fuzzified inputs become:

  mu_ij(x_j) = exp(-(a_ij (x_j - c_ij))^2),  i = 1, 2, ..., K;  j = 1, 2, ..., n

where the c_ij are the membership-function centres. In this way the constraint that the width d_ij cannot be zero is avoided during the ELM learning process. The outputs of such a zero-order TSK fuzzy inference system become:

  f(x) = sum_{i=1..K} beta_i prod_{j=1..n} exp(-(a_ij (x_j - c_ij))^2)    (2)

Based on the structure of ELM, for a dataset comprising N distinct objects (x_k, t_k), the linear system generated by a (non-linear) K-rule fuzzy system is:

  H beta = T    (3)

The elements of the hidden layer jointly produce the matrix H = (h_ki) (k = 1, ..., N; i = 1, ..., K) with h_ki = exp(-sum_{j=1..n} (a_ij (x_jk - c_ij))^2). Similarly, beta = [beta_1, ..., beta_K]^T and T = [t_1, ..., t_N]^T. Applying E-ELM to solve such a linear system results in E-FELM. In this approach, each individual in the population is composed of a set of parameters:

  theta = {c_11, ..., c_Kn, a_11, ..., a_Kn}

All c_ij and a_ij are randomly initialised within the range [-1, 1]. The fitness of an individual is the root mean squared error (RMSE) of the resulting system over the N training objects and the m output dimensions:

  RMSE = sqrt( sum_{k=1..N} || sum_{i=1..K} beta_i h_ki - t_k ||^2 / (m N) )

Once the fitness values of all individuals in the population are calculated, mutation, crossover and selection are applied to the population of zero-order TSK fuzzy inference systems.
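The E-FELM construction described in this section, Gaussian fuzzy memberships parameterized by centres and reciprocal widths, a hidden matrix H, and output weights obtained by solving the linear system (3) by least squares, can be sketched in NumPy. This is a minimal illustration under my own naming, not the authors' implementation.

```python
import numpy as np

def fuzzy_hidden_matrix(X, centers, inv_widths):
    """H[k, i] = exp(-sum_j (a_ij * (x_kj - c_ij))^2): the firing strength
    of fuzzy rule i (product of Gaussian memberships) for sample k."""
    diff = X[:, None, :] - centers[None, :, :]             # shape (N, K, n)
    return np.exp(-np.sum((inv_widths[None, :, :] * diff) ** 2, axis=2))

def fit_felm_individual(X, T, num_rules, rng):
    """One E-FELM individual: centres c and reciprocal widths a drawn from
    [-1, 1]; output weights beta solved from H beta = T by least squares.
    The training RMSE serves as the fitness used by the evolution."""
    n = X.shape[1]
    centers = rng.uniform(-1.0, 1.0, size=(num_rules, n))
    inv_widths = rng.uniform(-1.0, 1.0, size=(num_rules, n))
    H = fuzzy_hidden_matrix(X, centers, inv_widths)
    beta = np.linalg.pinv(H) @ T                           # Moore-Penrose solution
    rmse = np.sqrt(np.mean((H @ beta - T) ** 2))
    return centers, inv_widths, beta, rmse
```

An evolutionary wrapper would generate many such individuals, rank them by RMSE, and apply mutation, crossover and selection as described above.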


This procedure repeats until a maximum number of learning iterations is reached.

III.3. Improved PSO

The particle swarm optimization algorithm, designed for optimizing difficult numerical functions and based on a metaphor of human social interaction, is capable of mimicking the ability of human societies to process knowledge [16]. Its key constituents are artificial life and evolutionary computation. Candidate solutions flow through hyperspace and are accelerated towards better, more optimal solutions. The model can be implemented in simple computer code and is computationally inexpensive in terms of both memory requirements and speed. As in other evolutionary computation formulations, a fitness function is employed, and the candidate solutions are termed particles or individuals, each of which adjusts its flight according to its own flying experience and that of its companions. Particles are represented as vectors, since most optimization problems suit such variable representations. The computations of PSO in the high-dimensional space are performed over a series of time steps. The population responds to the quality factors of the previous best individual values and group values, and it changes its state if and only if the best group value changes. As reported in [17], [27], this optimization approach can be used to solve many optimization problems and works better than the GA. It has also been observed to be robust in solving problems featuring high dimensionality. PSO improves the speed of convergence and finds the global optimum of the fitness function. PSO starts with a population of random solutions, "particles", in a D-dimensional space. The ith particle is denoted by X_i = (x_i1, x_i2, ..., x_iD). Each particle keeps track of its coordinates in hyperspace associated with the fittest solution it has found; this personal best (pbest) is stored as P_i = (p_i1, p_i2, ..., p_iD).

The global version of PSO also tracks the overall best value (gbest), and its location, attained by any particle in the entire population. At each step, PSO changes the velocity of each particle toward its pbest and gbest according to Eq. (4). The velocity of particle i is represented as V_i = (v_i1, v_i2, ..., v_iD). Acceleration is weighted by random factors, with separate random numbers generated for the acceleration toward pbest and gbest. The position of the ith particle is then updated according to Eq. (5) [17]:

  v_id(t + 1) = w v_id(t) + c_1 r_1 (p_id - x_id(t)) + c_2 r_2 (p_gd - x_id(t))    (4)

  x_id(t + 1) = x_id(t) + v_id(t + 1)    (5)


where p_id and p_gd denote pbest and gbest, respectively. A number of modifications have been proposed to improve the performance of the PSO algorithm in terms of its speed and convergence toward the global minimum. A local-oriented paradigm (lbest) with different neighborhoods has been introduced. The gbest version performs best in terms of the median number of iterations to converge, but the lbest version with neighborhoods of two is most resistant to local minima. A thorough investigation of PSO shows that the inertia weight w was not considered at an early stage of the algorithm; however, w affects the number of iterations needed to identify an optimal solution. If w is low, convergence is fast but the solution may fall into a local minimum; if w is increased, the number of iterations also increases and convergence is slow. Classically, in fundamental PSO, the value of the inertia weight is adjusted during the training process. It has been shown that the PSO algorithm is further improved by a time-decreasing inertia weight, which leads to a decrease in the number of iterations [18]. In Eq. (4), the term (p_id - x_id(t)) represents the individual movement and the term (p_gd - x_id(t)) denotes the social behavior in identifying the global best solution. In this paper, in order to attain an exact solution and fast convergence, the parameters used in the IPSO algorithm are initialized according to Table I.

TABLE I
RATE OF PARAMETERS FOR IPSO ALGORITHM
Parameter            | Rate
Problem Dimension    | 11
Number of Particles  | 100
Number of Iterations | 100
Mutation Probability | 0.1
Inertia Weight Factor| w_min = 0.4, w_max = 0.9
r_1, r_2             | Selected randomly in (0, 1)
c_1, c_2             | 1.5
C                    | 0.9

According to Eq. (4), the velocity update of a particle consists of three segments. The first term is the particle's own current velocity. The second term is the cognitive segment, which denotes the particle's own experience, and the last term is the social part, which denotes the social interaction between the particles [19]. With respect to Eq. (4), it can be seen that when a particle's current position coincides with the global best position (gbest), the particle will only leave this point if its inertia weight and current velocity are different from zero. If the particles' current velocities in the swarm are close to zero, then these particles will not move once they reach the global best particle, and thus all the particles converge to the best position (gbest) identified so far by the swarm [20]. At this point, if the fitness of this position is not the expected global optimum, the result is the premature convergence phenomenon.


In order to overcome this limitation, an Improved Particle Swarm Optimization (IPSO) is proposed by introducing the mutation operator often used in genetic algorithms [21]. This process can make certain particles jump out of local optima and search other areas of the solution space. In this approach, the mutation probability (PM) is dynamically adjusted based on the diversity in the swarm. The aim of the mutation probability is to avoid premature convergence of PSO to local minima. In this study, PM is taken as 0.1.
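A minimal, self-contained sketch of the loop described in this section follows: Eq. (4) and Eq. (5) with a time-decreasing inertia weight and a per-dimension mutation applied with probability PM = 0.1. The sphere function in the usage below stands in for the (negated) E-FELM validation fitness; all names are illustrative, not the authors' code.

```python
import random

def ipso_minimize(f, dim, iters=100, swarm=30, w_max=0.9, w_min=0.4,
                  c1=1.5, c2=1.5, pm=0.1, seed=0):
    """PSO in minimization form (maximizing validation efficiency is
    equivalent to minimizing its negative), plus the IPSO mutation step."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(swarm)]
    V = [[0.0] * dim for _ in range(swarm)]
    pbest = [list(x) for x in X]
    pcost = [f(x) for x in X]
    g = min(range(swarm), key=lambda i: pcost[i])
    gbest, gcost = list(pbest[g]), pcost[g]
    for t in range(iters):
        w = w_max - (w_max - w_min) * t / iters       # time-decreasing inertia
        for i in range(swarm):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (w * V[i][d]                            # Eq. (4)
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                X[i][d] += V[i][d]                                # Eq. (5)
                if rng.random() < pm:          # IPSO mutation: escape local optima
                    X[i][d] = rng.uniform(-1, 1)
            cost = f(X[i])
            if cost < pcost[i]:
                pbest[i], pcost[i] = list(X[i]), cost
                if cost < gcost:
                    gbest, gcost = list(X[i]), cost
    return gbest, gcost
```

For example, `ipso_minimize(lambda x: sum(v * v for v in x), dim=3)` drives the cost of a 3-dimensional sphere function close to zero.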

Average Accuracy (a)

T. Karthikeyan, R. Balakrishnan

95 IPSOEF… 90

85 -10

10

30

50

70

The PSO searches for the best H, V, and b values that analytically computed weight in the ELM classifier which results in better generalization performance. The cross validation performance of best H, V, and b is . The parameters of the PSO algorithm are shown in Table I. The main factor in PSO-ELM classifier is to determine the amount of imbalanced data set that the classifier can handle without losing performance considerably [22]. III.5. Analysis on Imbalance Data The sample imbalance handling capacity of IPSO-EFELM classifier is based on the technique in [22]. In [22], the number of samples in one of the class was reduced and performance of the classifier was examined for different imbalance criteria. A similar examination was conducted for the proposed IPSO-E-FELM classifier and the average( ), overall ( )and individual ( )classification efficiencies obtained are shown in Fig. 2. It is observed that the average and overall classification efficiency of IPSO-EFELM classifier is almost constant up to 40% sample imbalance in class 2 data. By proper selection of the input weights and bias value, a better performance can be attained. If careful observation is not taken then the classification performance of E-FELM classifier falls drastically with sample imbalance. III.6. Integer-Coded Genetic Algorithm

95 IPS… 90

85

-10

10

30

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

70

-10

95 IPS… 90

85 10

30

50

70

Imbalance in C2 Fig. 2. Effects of the imbalances in data are depicted here, where the performance of the ELM classifier was analyzed for different imbalance conditions

In recent years, a number of techniques have been proposed for integrating genetic algorithms and neural networks. Genetic Algorithms are found to be effective in gene selection and classification. A study on selection function (ranking method) and genetic operators (hybrid crossover and mutation) of GA are described in [23]. Descriptions of string representation and Fitness are given below. III.7. String Representation In this paper, ICGA is used for selecting the N best independent features from the given set. The characteristic string, which denotes N independent features, is given as: =

Genetic algorithms are widely used to solve complex optimization problems, in which the number of parameters and constraints are large and analytical solutions are not easy to obtain [23].

50

Imbalance in C2 Individual Accuracy (2)

Thus, the best particle and the velocity of the equations are attained from the velocity update equation, the first term denotes the current velocity, the second term denotes the local search, and the third term is the global search. The fitness value of the particles is the validation efficiency of the E-FELM classifier, whose = , = and RMSE=b is initialized using the particle: =

Overall Accuracy (0)

Imbalance in C2

III.4. Improved PSO based E-FELM Classifier

, , , ,

where the selected features belong to the set S and they are independent.


III.8. Fitness

The main aim of feature selection is to determine the features (search nodes) that best illustrate the input-output characteristics of the data. The results of the IPSO-E-FELM fivefold cross-validation test are used as the fitness criterion; i.e., for the selected features, IPSO identifies the best hidden neurons, input weights, and bias values, and returns the validation efficiency obtained by the E-FELM algorithm along with the best E-FELM parameters. The features returning the best validation efficiency are eventually chosen as representative of the full data set. The best solution (the selected set of genes and E-FELM parameters) obtained after a given number of generations is used to develop a classifier using the complete training set. This classifier is then used to classify the testing samples.
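The fivefold cross-validation fitness just described can be sketched as follows; `train_and_score` is a hypothetical stand-in for one IPSO-tuned E-FELM training-and-validation run, and the fold layout is a simple interleaved split rather than the paper's exact procedure.

```python
def crossval_fitness(samples, labels, train_and_score, k=5):
    """Fitness of a candidate feature subset: mean validation accuracy
    over k folds. `train_and_score(Xtr, ytr, Xte, yte)` should train the
    classifier on the training split and return its accuracy on the
    held-out split."""
    n = len(samples)
    folds = [list(range(i, n, k)) for i in range(k)]   # interleaved folds
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(n) if i not in held]
        scores.append(train_and_score(
            [samples[i] for i in train_idx], [labels[i] for i in train_idx],
            [samples[i] for i in held_out], [labels[i] for i in held_out]))
    return sum(scores) / k
```

In the ICGA loop, this value would be computed once per candidate gene subset and used to rank chromosomes.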

IV. Experimental Results

In this section, the performance of the proposed approach is compared with other methods on the Global Cancer Map (GCM) data set [24], in two steps. First, the classification results on the GCM data set are compared with other classifiers; then the gene selection results are compared with existing gene selection results. The samples in each class are few, with high sample imbalance, in the GCM data set; that is, the large number of classes together with the high dimensionality demands careful selection of samples for training and testing. In these experiments, the data set is divided into training and testing data.

IV.1. Global Cancer Map Data

The GCM data set is a collection, from six different medical institutions, of samples covering 14 different types of malignant tumors. It consists of 190 primary tumor samples; 8 metastasis samples are not used here. Each sample contains the expression levels of 16,063 genes (assuming a one-to-one mapping from gene to probe set ID). Of the 190 samples, 144 are used for gene selection and classifier development, and the remaining 46 samples are used for assessment of the generalization performance. The number of training samples per class varies from 8 to 24, which is sparse and imbalanced.
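Because the classes are this imbalanced, the results below report overall, average, and individual (per-class) classification efficiencies rather than a single accuracy. A small sketch of how these differ follows (function and variable names are mine, not the paper's):

```python
def classification_efficiencies(y_true, y_pred):
    """Return (overall, average, per_class) accuracies for a multiclass
    result. Overall counts all samples equally; average is the mean of
    the per-class accuracies, so minority classes weigh in equally."""
    classes = sorted(set(y_true))
    per_class = {}
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        per_class[c] = sum(y_pred[i] == c for i in idx) / len(idx)
    overall = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)
    average = sum(per_class.values()) / len(classes)
    return overall, average, per_class
```

Under imbalance, a classifier that ignores a minority class can still post a high overall accuracy while its average accuracy collapses, which is why both are tracked.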

Based on these notes, the GCM data set is sparse in nature, with a high sample imbalance and a high-dimensional feature space owing to the huge number of genes. The main objective is to select sets of genes from the 16,063-dimensional space and identify the smallest number of genes needed to categorize all tumor types simultaneously with the greatest accuracy. In order to evaluate the classifier performance on this sparse and imbalanced data set, the results obtained by the proposed IPSO_E-FELM classifier for a given number of genes are compared with existing classifiers. Here, the 98 genes selected in [24] are used as the basis for the classifier performance comparison. The IPSO_E-FELM classifier is run to identify the best number of hidden neurons, input weights, and biases using the 144 training samples. Using the best E-FELM parameters, an E-FELM classifier is developed from the complete training data and the resulting classifier is tested on the remaining 46 samples. This study was repeated for a variety of random combinations of 144 training and 46 testing samples, and the results are reported in Table II. From Table II, it can be seen that the IPSO_E-FELM classifier gives better performance than the existing classifiers for the 98 genes selected in [24].

IV.2. ICGA_IPSO_E-FELM-Based Gene Selection and Classification Results

The proposed approach is used to select 14, 28, 42, 56, 70, 84, and 98 genes from the original 16,063 genes using a 10-fold cross-validation method on the 144 training samples. The unused testing set (46 samples) is employed to assess the generalization performance. ICGA_IPSO_E-FELM identified the best genes for each set size. In these experiments, it was found that the best gene sets chosen in different runs do not share any common genes. The overlap between the best gene sets (14-98) chosen by the proposed approach is insignificant, but their ability to differentiate the cancer classes is more or less similar. These results show that there exist several subsets of genes that can discriminate the cancer classes efficiently. The performance of the proposed classifier was evaluated over 100 random trials on the training and testing data sets using the best gene sets selected as above; this helps to estimate the classifier's sensitivity to data variation. The average, maximum, and standard deviations of the training and testing performances are given in Table III, and the selected genes are listed in Table IV.

TABLE II
COMPARATIVE ANALYSIS ON CLASSIFICATION METHODS IN LITERATURE FOR GCM DATA SET USING 98 GENES SELECTED AS EXPLAINED IN [24]
Method                | ns  | Training Mean | Training Std_Dev | Testing Mean | Testing Std_Dev
SVM [22]              | 106 | 96.50         | 1.85             | 73.78        | 5.10
ELM [25]              | 50  | 92.30         | 2.25             | 79.43        | 6.23
PSO_ELM [7]           | 36  | 94.91         | 1.42             | 85.13        | 4.88
Proposed IPSO_E-FELM  | 30  | 93.14         | 1.23             | 88.45        | 3.94


TABLE III
PERFORMANCE OF PROPOSED CLASSIFIER FOR THE BEST SET OF FEATURES SELECTED BY IPSO BASED E-FELM WITH ICGA GENE SELECTION APPROACH
Genes | Training Efficiency % (Avg / Max / Std_dev) | Testing Efficiency % (Avg / Max / Std_dev)
14    | 92 / 92 / 2                                 | 72 / 80 / 6
28    | 92 / 92 / 2                                 | 70 / 85 / 6
42    | 90 / 94 / 1                                 | 73 / 98 / 4
56    | 90 / 94 / 1                                 | 87 / 96 / 3
70    | 93 / 97 / 2                                 | 88 / 98 / 4
84    | 94 / 97 / 2                                 | 91 / 98 / 4
98    | 93 / 97 / 2                                 | 91 / 97 / 4

TABLE IV
GENES SELECTED FROM GCM DATA SET THAT WERE USED FOR CLASSIFICATION BY IPSO BASED E-FELM WITH ICGA (GCM, 42 GENES)
Gene # / Accession ID: 572 D79987_at; 1882 M27891_at; 7870 AA232836_at; 13781 RC_AA403162_at; 5836 HG3342-HT3519_s_at; 6868 M68519_rnal_at; 8034 AA278243_at; 13964 RC_AA416963_at; 917 HG3432-HT3618_at; 6765 M96132_at; 8107 AA287840_at; 14565 RC_AA446943_at; 5882 HG417-HT417_s_at; 3467 U59752_at; 8231 AA320369_s_at; 14793 RC_AA453437_at; 1119 J04611_at; 3804 U80017_rna2; 8975 AB002337_at; 11421 X05978_at; 1137 J05068_at; 6154 V00565_s_at; 9546 H44262_at; 476 D50678_at; 9731 L13738_at-2; 11443 X52056_at-2; 9833 M21121_s_at; 1383 L20320_at; 4629 X79510_at; 10322 R74226_at; 9781 L40904_at; 4781 X90872_at; 12020 RC_AA053660_at; 5319 L46353_at; 4944 Y00815_at; 12182 RC_AA100719_s_at; 1655 L77563_at; 11606 Z30425_at-2; 12717 RC_AA233126_at; 1791 M20530_at; 7284 AA036900_at; 13541 RC_AA347973_at

TABLE VI
RESULTS FOR GENE SELECTION AND CLASSIFICATION BY ICGA_IPSO_E-FELM FOR DIFFERENT DATA SETS
Data set        | #Classes | #Genes | Testing Accuracy % (Average / Best)
Lymphoma        | 6        | 12     | 98 / 100
CNS             | 2        | 12     | 100 / 100
Breast Cancer-B | 4        | 12     | 91 / 100

IV.3. Performance Comparison of Proposed IPSO based E-FELM with ICGA Classifier with Existing Methods

The results of the proposed approach on the GCM data set are compared with other existing methods. Table V shows the minimum number of genes needed by each method to attain maximum generalization performance. From Table V, the proposed ICGA_IPSO_E-FELM selects a minimum of 42 genes with a high average testing accuracy. GA/SVM selects a minimum of 26 genes, which gives results close to the ICGA_IPSO_E-FELM performance. It was seen that the genes chosen in various runs for any given subset size do not have major overlaps, nor is there any overlap of genes between any two subsets. Nevertheless, the classifiers developed using these sets of selected genes give similar classification performance and were observed to have the same discriminatory power in classifying the various cancer classes. The ICGA_IPSO_E-FELM gene selection and classifier was used to select the minimum number of genes necessary for accurate classification. The average classification accuracies are given in Table VI.

TABLE V
MINIMUM NUMBER OF GENES REQUIRED BY VARIOUS METHODS TO ACHIEVE MAXIMUM GENERALIZATION PERFORMANCE
Data Set | Gene selection method       | Genes | Avg. Testing Accuracy %
GCM      | Proposed ICGA_IPSO_E-FELM   | 42    | 90
GCM      | Proposed ICGA_IPSO_E-FELM   | 98    | 94
GCM      | ICGA_PSO_ELM                | 42    | 88
GCM      | ICGA_PSO_ELM                | 98    | 91
GCM      | GA/SVM [4]                  | 26    | 85

V. Conclusion

There has been significant growth in gene expression microarray technology, facilitating the simultaneous monitoring of a large number of genes in a single experiment. With such samples, a thorough study can be made of whether there are patterns or dissimilarities among samples of different types. In this paper, accurate gene selection and sparse-data classification for microarray data are achieved using an Improved PSO based E-FELM with ICGA gene selection for multiclass cancer classification. The ICGA-selected genes, together with the optimal input weights and bias values selected by IPSO, are used by the E-FELM classifier to deal efficiently with high sample imbalance and sparse data conditions. Hence, the ICGA gene selection approach is incorporated with the IPSO based E-FELM classifier to identify a compact set of genes that can discriminate cancer types efficiently, resulting in enhanced classification results.

References

[1] Boufera, H., Bendella, F., Intelligent decision based on multi agent system - To aid breast cancer diagnosis, (2010) International Review on Computers and Software (IRECOS), 5 (3), pp. 331-336.
[2] M. Ringner, C. Peterson, and J. Khan, "Analyzing Array Data Using Supervised Methods," Pharmacogenomics, vol. 3, no. 3, pp. 403-415, 2002.
[3] Harrington, C. A., Rosenow, C., and Retief, J., Monitoring gene expression using DNA microarrays, Curr. Opin. Microbiol., 3:285-291, 2000.
[4] Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D., Cluster analysis and display of genome-wide expression patterns, Proc. Natl. Acad. Sci. USA, 95:14863-14868, 1998.
[5] Dudoit, S., Fridlyand, J., and Speed, T. P., Comparison of discrimination methods for the classification of tumors using gene expression data, Technical Report 576, Department of Statistics, University of California, Berkeley, 2000.
[6] Ryu, J. and Cho, S. B., Towards optimal feature and classifier for gene expression classification of cancer, Lecture Notes in Artificial Intelligence, 2275:310-317, 2002.
[7] Saras Saraswathi, Suresh Sundaram, Narasimhan Sundararajan, Michael Zimmermann, and Marit Nilsen-Hamilton, "ICGA-PSO-ELM Approach for Accurate Multiclass Cancer Classification Resulting in Reduced Gene Sets in Which Genes Encoding Secreted Proteins Are Highly Represented," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 2, March/April 2011.
[8] V. Rashtchi, H. Shayeghi, M. Mahdavi, A. Kimiyaghalam, and E. Rahimpour, "Using an Improved PSO Algorithm for Parameter Identification of Transformer Detailed Model," International Journal of Electrical and Electronics Engineering, 2:11, 2008.
[9] Zhang Qizhong, "Gene Selection and Classification Using Non-linear Kernel Support Vector Machines Based on Gene Expression Data," IEEE, 2007.
[10] J. Mohr, Sambu Seo, and K. Obermayer, "Automated Microarray Classification Based on P-SVM Gene Selection," IEEE, 2008.
[11] H. Banka and S. Dara, "Feature Selection and Classification for Gene Expression Data Using Evolutionary Computation," IEEE, 2012.
[12] S. Dash and B. Patra, "Rough set aided gene selection for cancer classification," IEEE, 2012.
[13] S. Suresh, S. Saraswathi, and N. Sundararajan, "Performance Enhancement of Extreme Learning Machine for Multi-Category Sparse Cancer Classification," Eng. Applications of Artificial Intelligence, vol. 23, pp. 1149-1157, 2010.
[14] S. Suresh, R.V. Babu, and H.J. Kim, "No-Reference Image Quality Assessment Using Modified Extreme Learning Machine Classifier," Applied Soft Computing, vol. 9, no. 2, pp. 541-552, 2009.
[15] Yanpeng Qu, Changjing Shang, Wei Wu, and Qiang Shen, "Evolutionary Fuzzy Extreme Learning Machine for Mammographic Risk Analysis," International Journal of Fuzzy Systems, vol. 13, no. 4, December 2011.
[16] P.V. Naganjaneyulu and K. Satya Prasad, "An Adaptive Blind Channel Estimation of OFDM System by Worst Case H∞ Approach," International Journal of Hybrid Information Technology, vol. 2, no. 4, pp. 1-6, 2009.
[17] Ye (Geoffrey) Li, Leonard J. Cimini, and Nelson R. Sollenberger, "Robust Channel Estimation for OFDM Systems with Rapid Dispersive Fading Channels," IEEE Transactions on Communications, vol. 46, no. 7, pp. 902-915, 1998.
[18] Prasanta Kumar Pradhan, Oliver Faust, Sarat Kumar Patra, and Beng Koon Chua, "Channel Estimation Algorithms for OFDM Systems," International Conference on Electronics Systems, National Institute of Technology, Rourkela, India, 2011.
[19] Yonghong Zeng, W. H. Lam, and Tung Sang, "Semiblind Channel Estimation and Equalization for MIMO Space-Time Coded OFDM," IEEE Transactions on Circuits and Systems, vol. 53, no. 2, pp. 463-474, 2006.
[20] H. Bölcskei, R.W. Heath, Jr., and A. J. Paulraj, "Blind channel identification and equalization in OFDM-based multi-antenna systems," IEEE Trans. Signal Processing, vol. 50, pp. 96-109, 2002.
[21] L. Deneire, P. Vandenameele, L. Van der Perre, B. Gyselinckx, and M. Engels, "A low complexity ML channel estimator for OFDM," in Proc. IEEE Int. Conf. Commun., Helsinki, Finland, June 11-14, 2001.
[22] S. Suresh, N. Sundarajan, and P. Saratchandran, "A Sequential Multi-Category Classifier Using Radial Basis Function Networks," Neurocomputing, vol. 71, nos. 7-9, pp. 1345-1358, 2008.
[23] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, third ed., pp. 18-22, Springer-Verlag, 1994.
[24] S. Ramaswamy, P. Tamayo, R. Rifkin, S. Mukherjee, C.H. Yeang, M. Angelo, C. Ladd, M. Reich, E. Latulippe, J.P. Mesirov, T. Poggio, W. Gerald, M. Loda, E.S. Lander, and T.R. Golub, "Multiclass Cancer Diagnosis Using Tumor Gene Expression Signatures," Proc. Nat'l Academy of Sciences USA, vol. 98, no. 26, pp. 15149-15154, Dec. 2001.
[25] R. Zhang, G.-B. Huang, N. Sundararajan, and P. Saratchandran, "Multicategory Classification Using an Extreme Learning Machine for Microarray Gene Expression Cancer Diagnosis," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 4, no. 3, pp. 485-495, July-Sept. 2007.
[26] Zhao, Y., Xu, X., Wang, B., Bai, X., Two-dimensional fuzzy entropy image segmentation based on adaptive CPSO algorithm, (2012) International Review on Computers and Software (IRECOS), 7 (4), pp. 1767-1772.
[27] Ahmadi, A., Mashoufi, B., A new optimized approach for artificial neural networks training using genetic algorithms and parallel processing, (2012) International Review on Computers and Software (IRECOS), 7 (5), pp. 2195-2199.

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved
International Review on Computers and Software, Vol. 8, N. 10

Authors’ information 1

Associate Professor, P.S.G. College of Arts and Science, Coimbatore. E-mail: [email protected] 2

Assistant Professor, Dr.NGP Arts and Science College, Coimbatore. E-mail: [email protected] Mr. Thirunavu Karthikeyan received his graduate degree in Mathematics from Madras University in 1982. Post graduate degree in Applied Mathematics from Bharathidasan University in 1984. Received Ph.D., in Computer Science from Bharathiar University in 2009. Presently he is working as a Associate Professor in Computer Science Department of P.S.G. College of Arts and Science, Coimbatore. His research interests are Image Coding, Medical Image Processing and Datamining. He has published many papers in national and international conferences and journals. He has completed many funded projects with excellent comments. He has contributed as a program committee member for a number of international conferences. He is the review board member of various reputed journals. He is a board of studies member for various autonomous institutions and universities. E-mail [email protected] R. Balakrishnan MSc., M.Phil.,(Phd). He is a Phd Research Scholar in Manonmaniam Sundaranar University, Tirunelveli, Tamilnadu, India. He is working as a Assistant Professor in Dr.NGP Arts and Science College, Coimbatore. He has 15 years of experience in teaching line 9 years of experience in research. He conducted International, National Conference and he presented paper in International, National Conference and Journals His Interest areas are Data Mining, Image Processing, Current research project Genetic Algorithm using , Image Processing.

International Review on Computers and Software (I.RE.CO.S.), Vol. 8, N. 10 ISSN 1828-6003 October 2013

Comparative Analysis of Intrusion Detection System with Mining S. Vinila Jinny1, J. Jayakumari2 Abstract – An Intrusion Detection System (IDS) is a system that monitors network activities for suspicious events. Suspiciousness cannot be identified from a single activity; a set of activities that crosses the limits of normal behavior is considered an intrusion. Many methods provide security, such as authorization, authentication, and encryption, but all of them decide the security issue from a single activity, so a slow intrusion goes unidentified. In this paper, we review existing real-time IDS models that capture the slow poisoning of the network by using improved data mining algorithms. Conventional IDSs suffer from issues such as low accuracy and high false-negative rates; this study identifies an IDS approach with high accuracy and fewer false positives. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Anomaly based Intrusion Detection, Decision Tree, Intrusion Detection System (IDS), KDD Cup 99 Data Set, Misuse Detection, SVM

Nomenclature

N  Number of training examples for c = cj
nc  Number of examples for which c = cj
P  Prior estimate
R  Equivalent sample size
α  Lagrange multiplier
f  Classification function
φ  Kernel function

I. Introduction

A cryptographic mechanism secures computers and their communications from unauthorized use through access control, which is the basic provision of security; many techniques provide access control [1]. If a break-in is known openly, taking measures is easy, but an intrusion is similar to slow poisoning: if it goes unnoticed, it will result in the death of the network. Such intrusions can be identified only through statistical measurement of user behavior, so monitoring the network is very important. This leads to the importance of developing a monitoring system with the ability to detect intrusions [2]. This is possible only by training a system on known vulnerabilities against which new behavior can be compared; to obtain known vulnerabilities it is necessary to mine records, and different sources are available for developing them. The remainder of the paper first gives an overview of IDSs and the techniques available, then describes the steps in establishing a new classifier, presents the existing models, provides a comparative study of them, and finally suggests the best approach.

Manuscript received and revised September 2013, accepted October 2013


II. IDS Overview

The intrusion detection system as a whole can be split into three modules: a monitoring module, an analyzing module, and a re-establishing module; Fig. 1 depicts them. The monitoring module is the fully developed IDS placed in the network to identify intrusions, and it may take some preventive measures. The issue to be met in this module is where to place it in the network, which is best decided by network experts. The analyzing module is the important module and the ultimate place of our consideration: to develop a new IDS, we have to consider some sources to be analyzed and design a classifier that distinguishes intrusions from normal behavior. It is here that a suitable data mining algorithm is applied to construct the classifier [2]. Once the classifier is designed, it can be deployed as the monitoring system. The third module is the re-establishing module, which can be considered a re-engineering process: as the number and type of attacks increase daily, the classifier must be updated with new information so that it remains effective.

Fig. 1. IDS Modules (Monitor, Analyzer, Re-Analyzer)


II.1. IDS Techniques

An intrusion detection system basically uses two types of techniques [3]: misuse detection and anomaly-based detection. In misuse detection, already known attacks are maintained in a database; when a new intrusion is suspected it is matched against this database, and if it matches we can identify it as an intrusion [6]. In anomaly detection, the normal behavior of the system is maintained in the database; network behavior is compared with it, and if it deviates from that behavior we can consider it an intrusion [7]. In misuse detection new attacks go unidentified, resulting in false negatives, whereas in anomaly detection some genuine behavior is also flagged as intrusion, resulting in false positives. To overcome this difficulty, hybrid systems containing both misuse and anomaly detection are now used. An example of an intrusion pattern in misuse detection is "more than three consecutive failed logins"; an example in anomaly detection is "CPU usage by a program".

II.2. IDS Types

IDSs can be classified into three types: host-based, network-based, and hybrid [5]. In a host-based IDS the monitor module is placed on one of the nodes and the data in that local area, or distributed data, is analyzed; both anomaly and misuse detection techniques can be applied here. In a network-based IDS the monitor module is placed at a specific point of the network infrastructure [16], where it captures packets and analyzes them for intrusions. A hybrid IDS is the combination of the two, which requires the additional management of two systems.

III. IDS Model

The IDS model is created first; Fig. 2 shows the classification model. The KDD Cup 99 data set is used to develop it. The training data is first passed through a filter that pre-processes it, and the classification algorithm is then applied to build the classification model. The classification model is then tested with the test data for correct classification, and if a new anomalous character occurs the anomaly pool is updated. The input to the proposed system is a raw dataset, which is divided into a training dataset and a testing dataset. Initially, the training dataset is classified into five subsets, so that the four types of attacks (DoS (Denial of Service), R2L (Remote to Local), U2R (User to Root), and Probe) and normal data are separated [8], [9]. Regular updating of the patterns is very important so that recent and new intrusions can be detected, and fast detection is another important issue that must be tackled.

Fig. 2. Classification Model (training and test data from the KDD Cup 99 data set are filtered, the IDS/detection model is built and tested, and the normal and attack pools are updated)
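The five-way split of the training data can be sketched directly. In the sketch below, the attack-name-to-category mapping is a small illustrative subset of the KDD Cup 99 attack taxonomy, and the record format (a feature list paired with a label string) is an assumption for the example.

```python
# Illustrative subset of the KDD Cup 99 attack-name-to-category mapping.
CATEGORY = {
    "neptune": "DoS", "smurf": "DoS", "back": "DoS",
    "guess_passwd": "R2L", "ftp_write": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R",
    "portsweep": "Probe", "ipsweep": "Probe",
    "normal": "normal",
}

def partition(records):
    """records: iterable of (features, label) pairs -> dict of five pools.
    Labels missing from the sketch's mapping default to 'normal'."""
    pools = {"DoS": [], "R2L": [], "U2R": [], "Probe": [], "normal": []}
    for features, label in records:
        pools[CATEGORY.get(label, "normal")].append(features)
    return pools

sample = [([0, "tcp"], "smurf"), ([1, "udp"], "normal"), ([0, "tcp"], "rootkit")]
pools = partition(sample)
print(len(pools["DoS"]), len(pools["normal"]), len(pools["U2R"]))  # -> 1 1 1
```

The resulting pools correspond to the attack pool and normal pool of Fig. 2 and can each be fed to the classifier-building step separately.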

IV. Classification Model

The classification model of the system follows the Naive Bayes, decision tree, and support vector machine methods. The detection accuracy of the system is affected by these classifiers [6], so we analyze the use of each of them in the IDS model.

IV.1. Naïve Bayes

Naive Bayes follows the Bayesian model, which is a probabilistic model [12]. In this model the probability of each class and feature is estimated with the help of the training dataset, and the classification of new data is based on these probabilities. Naive Bayes is a supervised learning method and can be regarded as a statistical method for classification; the uncertainty of the model is captured by determining the probabilities of the outcomes. The Naive Bayes classifier selects the most likely classification cNB given the attribute values a1, a2, a3, ..., an, which results in:

cNB = argmax_{cj ∈ C} P(cj) ∏i P(ai | cj)    (1)
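As an illustration of this rule, combined with the r-estimate smoothing used for the conditional probabilities, a minimal Naive Bayes classifier over nominal attributes might look as follows. The tiny training set and the uniform choice of the prior estimate p are assumptions for the sketch, not values from the paper.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples, r=1.0):
    """samples: list of (attribute-tuple, class). Returns class counts and
    per-attribute value counts; r is the equivalent sample size."""
    classes = Counter(c for _, c in samples)
    cond = defaultdict(Counter)            # (attribute index, class) -> value counts
    for attrs, c in samples:
        for i, a in enumerate(attrs):
            cond[(i, c)][a] += 1
    return classes, cond, len(samples), r

def classify(attrs, model):
    """Pick the class maximizing log P(cj) + sum_i log P(ai | cj)."""
    classes, cond, total, r = model
    best, best_score = None, float("-inf")
    for c, n in classes.items():
        score = math.log(n / total)        # log prior P(cj)
        for i, a in enumerate(attrs):
            nc = cond[(i, c)][a]
            p = 1.0 / len(classes)         # uniform prior estimate (an assumption)
            score += math.log((nc + r * p) / (n + r))   # r-estimate smoothing
        if score > best_score:
            best, best_score = c, score
    return best

data = [(("tcp", "high"), "attack"), (("tcp", "high"), "attack"),
        (("udp", "low"), "normal"), (("tcp", "low"), "normal")]
model = train_nb(data)
print(classify(("tcp", "high"), model))    # -> attack
```

Working in log space avoids underflow when many attributes are multiplied, and the smoothing keeps a single unseen attribute value from zeroing out a class.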


P(ai | cj) is estimated with the r-estimate:

P(ai | cj) = (nc + r·p) / (n + r)    (2)

where:
n is the number of training examples for c = cj;
nc is the number of examples for which c = cj and a = ai;
p is a prior estimate for P(ai | cj);
r is the equivalent sample size.

IV.2. Decision Trees

Decision trees are insightful methods for classifying a model through a series of rules or questions, where the next question depends on the answer to the current one. They are mostly useful for categorical data, as the rules do not require any notion of measure [12]. A variety of decision tree algorithms exist, such as ID3, C4.5, CART and CHAID. The ID3 algorithm was developed by J. Ross Quinlan in 1979. It learns from past history, that is, from examples; this way of learning is called symbolic learning and rule induction. ID3 is a supervised learner: it looks for input/output matches in the given data to obtain results for new facts. A decision tree classifier classifies data based on its attributes, building the tree top-down. It has decision nodes and leaf nodes; a leaf node holds homogeneous data, meaning further classification is not necessary. ID3 grows decision trees until all leaf nodes are homogeneous. ID3 does not handle continuous numeric data, which therefore requires discretization.

IV.3. Support Vector Machine

SVM was introduced by Boser, Guyon and Vapnik and is developed on the basis of statistical learning theory [13], [15]. A support vector machine is much simpler than a neural network. SVM is a learning algorithm whose basic idea is to convert the problem into a linearly separable one by using a mapping function that transforms the given data from the input space to a feature space; making the problem linear helps to automatically discover the best separating hyperplane. SVM uses the following classification function in the case of linear problems:

f(x) = sign(w · x + b)    (3)

This is not suitable for non-linear problems, which leads to the use of kernel functions: kernel functions such as the polynomial and the radial basis function are used to separate the feature space by placing a hyperplane. Suppose we have N training data points {(x1, y1), (x2, y2), ..., (xN, yN)}, where xi ∈ R^d and yi ∈ {+1, −1}. For a hyperplane (u, c), where u is a weight vector and c is a bias, the classification of a new object x is done with:

f(x) = sign(u · x + c)    (4)

which, in terms of the training points and the kernel mapping φ, becomes:

f(x) = sign(Σi αi yi φ(xi) · φ(x) + c)    (5)

where α is a Lagrangian multiplier that gives the importance of each data point.

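The kernel decision function itself is only a few lines of code. The sketch below evaluates sign(Σi αi yi K(xi, x) + c) with a radial basis function kernel; the support vectors, multipliers, and bias are hand-picked illustrative values, not the result of actual training.

```python
import math

def rbf(u, v, gamma=0.5):
    """Radial basis function kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decision(x, support, kernel, bias=0.0):
    """support: list of (x_i, y_i, alpha_i) triples; returns +1 or -1."""
    s = sum(alpha * y * kernel(xi, x) for xi, y, alpha in support) + bias
    return 1 if s >= 0 else -1

# Two illustrative support vectors, one on either side of the origin.
support = [((2.0,), +1, 1.0), ((-2.0,), -1, 1.0)]
print(svm_decision((1.5,), support, rbf))    # -> 1
print(svm_decision((-0.5,), support, rbf))   # -> -1
```

Swapping `rbf` for a dot product recovers the linear case; this is exactly the sense in which the kernel replaces the inner product in the decision function.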

V. Data Set

In order to evaluate a system, we need a data set. A data set can be prepared by collecting traffic from an actual network, or an existing data set such as KDD'99 or NSL-KDD can be used. To evaluate the system we have used the NSL-KDD data set [14], an improved form of the KDD'99 data set. The numbers of records in the NSL-KDD train and test sets are reasonable, so evaluation results based on it are consistent. It consists of about eight files; Table I shows them.

TABLE I
FILES IN THE DATA SET
File name | ARFF format | TXT format
KDDTrain+ | Full train set with binary label | Full train set with attack-level label and difficulty level
KDDTrain+_20percent | 20% subset | 20% subset
KDDTest+ | Full test set with binary label | Full test set with attack-level label and difficulty level
KDDTest-21 | Subset of test set not including difficulty-level-21 records | Subset of test set not including difficulty-level-21 records

The relation taken for training is KDDTrain+. It contains about 125,973 records and 42 attributes. As it is pre-processed, it does not contain missing values. Table II lists the attributes in the data set, together with the type and uniqueness of each attribute.

VI. IDS Performance Evaluation

After building a classifier model, it is very much required to evaluate its performance, and various evaluation metrics exist for the predictive accuracy of a classifier. The accuracy of the IDS model is the percentage of intrusions that are correctly identified:

Accuracy (recognition rate) = (TP + TN) / (TP + TN + FP + FN)    (6)

The inability of the IDS model can be calculated using the error rate:

Error rate (misclassification rate) = (FP + FN) / (TP + TN + FP + FN)    (7)


The remaining measures follow the standard definitions:

Sensitivity = TP / (TP + FN)    (8)

Specificity = TN / (TN + FP)    (9)

Precision = TP / (TP + FP)    (10)

Recall = TP / (TP + FN)    (11)

F-measure = 2 × Precision × Recall / (Precision + Recall)    (12)

TABLE II
ATTRIBUTE LIST
Attribute | Distinct | Type | Unique (%)
Duration | 2981 | Numeric | 2
Protocol_type | 3 | Nominal | 0
Service | 70 | Nominal | 0
Flag | 11 | Nominal | 0
Src_bytes | 3341 | Numeric | 1
Dst_bytes | 9326 | Numeric | 3
Land | 2 | Nominal | 0
Wrong_fragment | 3 | Numeric | 0
Urgent | 4 | Numeric | 1
Hot | 28 | Numeric | 0
Num_failed_logins | 6 | Numeric | 0
Logged_in | 2 | Nominal | 0
Num_compromised | 88 | Numeric | 0
Root_shell | 2 | Numeric | 0
Su_attempted | 3 | Numeric | 0
Num_root | 82 | Numeric | 0
Num_file_creations | 35 | Numeric | 0
Num_shells | 2 | Numeric | 0
Num_access_files | 10 | Numeric | 0
Num_outbound_cmds | 1 | Numeric | 0
Is_host_login | 2 | Nominal | 0
Is_guest_login | 2 | Nominal | 0
Count | 512 | Numeric | 0
Srv_count | 509 | Numeric | 0
Serror_rate | 89 | Numeric | 0
Srv_serror_rate | 86 | Numeric | 0
Rerror_rate | 82 | Numeric | 0
Srv_rerror_rate | 62 | Numeric | 0
Same_srv_rate | 101 | Numeric | 0
Diff_srv_rate | 95 | Numeric | 0
Srv_diff_host_rate | 60 | Numeric | 0
Dst_host_count | 256 | Numeric | 0
Dst_host_srv_count | 256 | Numeric | 0
Dst_host_same_srv_rate | 101 | Numeric | 0
Dst_host_diff_srv_rate | 101 | Numeric | 0
Dst_host_same_src_port_rate | 101 | Numeric | 0
Dst_host_srv_diff_host_rate | 75 | Numeric | 0
Dst_host_Serror_rate | 101 | Numeric | 0
Dst_host_srv_serror_rate | 100 | Numeric | 0
Dst_host_rerror_rate | 101 | Numeric | 0
Dst_host_srv_rerror_rate | 101 | Numeric | 0

In (6)-(12):
TP refers to the positive tuples that were correctly labeled by the classifier (true positives);
TN refers to the negative tuples that were correctly labeled by the classifier (true negatives);
FP refers to the negative tuples that were incorrectly labeled as positive (false positives);
FN refers to the positive tuples that were mislabeled as negative (false negatives).

Precision is a measure of exactness and recall is a measure of completeness of intrusion detection. The performance of the IDS model is also evaluated by measures such as scalability, speed of detection, robustness, and interpretability [10], [11]. Table III gives the detection accuracy of the system and shows that the support vector machine has higher detection accuracy than the other two classifiers. Table IV gives the performance evaluation: the training and testing times for the support vector machine are higher than those of Naive Bayes and ID3, but so is its accuracy. Fig. 3 shows the performance evaluation of the three classifier models, and Fig. 4 their detection accuracy.

TABLE III
DETECTION ACCURACY (in %)
Classification algorithm | Correctly classified | Incorrectly classified
Naive Bayes | 90.4 | 9.6
ID3 | 99.8 | 0.2
SVM | 99.9 | 0.1

TABLE IV
PERFORMANCE EVALUATION
Classification algorithm | Training | Testing | Accuracy (%)
Naive Bayes | 2.73 | 0.06 | 90
ID3 | 5.4 | 0.54 | 99
SVM | 6.5 | 1.5 | 99.5

Fig. 3. Performance Evaluation

Fig. 4. Detection Accuracy
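These definitions translate directly into code. The sketch below computes the measures of (6)-(12) from confusion-matrix counts; the counts themselves are made-up illustrative values, not the paper's results.

```python
def ids_metrics(tp, tn, fp, fn):
    """Confusion-matrix based measures used to evaluate an IDS classifier."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity / detection rate
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
    }

# Illustrative counts: 100 actual intrusions, 100 actual normal connections.
m = ids_metrics(tp=90, tn=95, fp=5, fn=10)
print(round(m["accuracy"], 3), round(m["recall"], 2))   # -> 0.925 0.9
```

Note how a high accuracy can coexist with a noticeable false-negative count (fn=10 here), which is exactly why recall and the F-measure are reported alongside accuracy for an IDS.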


VII. Conclusion and Future Work

This paper illustrates the working of an intrusion detection system with selected classification models; by choosing the best classifier, the detection accuracy of the system is greatly improved. The comparative analysis shows that the support vector machine classifier takes more training and testing time but achieves higher accuracy than the others. An IDS requires a classifier that takes less time yet gives good detection accuracy, so the SVM should be improved through pre-processing or post-processing techniques, or the other two algorithms improved. Applying attribute selection methods would also improve the overall efficiency of the system, and it is planned to apply these algorithms in combination with feature selection algorithms.

References

[1] Emmanuel S. Pillai, R.C. Joshi, and Rajdeep Niyogi, Network forensic frameworks: survey and research challenges, Digital Investigation, vol. 7, pp. 14-27, 2010.
[2] D. Barbara and S. Jajodia, Applications of Data Mining in Computer Security (Norwell, MA: Kluwer, 2002).
[3] A. Sundaram, An introduction to intrusion detection (Crossroads: The ACM Student Magazine, 2(4), 1996).
[4] K. Anand, S. Ganapathy, K. Kulothungan, P. Yogesh, and A. Kannan, A rule based approach for attribute selection and intrusion detection in wireless sensor networks, in Proceedings of the International Conference on Modeling Optimisation and Computing, pp. 1658-1664, 2012.
[5] Marcos M. Campos and Boriana L. Milenova, Creation and deployment of data mining-based intrusion detection systems in Oracle Database 10g, in Proceedings of the Fourth International Conference on Machine Learning and Applications, 2005.
[6] W. Lee, S. Stolfo, and K. Mok, A data mining framework for building intrusion detection models, in Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, pp. 120-132, 1999.
[7] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo, A geometric framework for unsupervised anomaly detection: detecting intrusions in unlabeled data, in Applications of Data Mining in Computer Security (Norwell, MA: Kluwer, 2002).
[8] Srilatha Chebrolu, Ajith Abraham, and Johnson P. Thomas, Feature deduction and ensemble design of intrusion detection systems, Computers and Security, pp. 295-307, 2005.
[9] R. Agrawal, T. Imielinski, and A. Swami, Mining association rules between sets of items in large databases, in Proc. 1993 ACM SIGMOD International Conf. on Management of Data, Washington, DC, pp. 207-216, 1993.
[10] W. Lee and S. Stolfo, Data mining approaches for intrusion detection, in Proc. 7th USENIX Security Symposium (SECURITY '98), San Antonio, TX, pp. 79-94, 1998.
[11] M. Moorthy and S. Sathiyabama, A hybrid data mining based intrusion detection system for wireless local area networks, International Journal of Computer Applications, vol. 49, p. 10, July 2012.
[12] Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (Morgan Kaufmann, 2005).
[13] Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques (Elsevier, 2011).
[14] NSL-KDD data set for network-based intrusion detection systems. Available: http://nsl.cs.unb.ca/KDD/NSL-KDD.html, March 2009.
[15] S. Vinila Jinny and J. Jayakumari, Neuralised intrusion detection system, in Proceedings of the IEEE Conference on Signal Processing, Communication, Computing and Networking Technologies, 2011.


[16] Weihua, H., Qi, J., Yuge, D., Li, C., Zhao, W., Li, C., Anides: Agent-based network intrusion detection expert system, (2012) International Review on Computers and Software (IRECOS), 7 (4), pp. 1453-1457.

Authors’ information 1

Asst. Prof. Dept. Of CSE, Noorul Islam University.

2

Prof. Dept of ECE, Noorul Islam University.

S. Vinila Jinny is working as an Assistant Professor in the Department of Computer Science and Engineering, Noorul Islam University, Kumaracoil. She received her B.E. degree in Computer Science and Engineering from M.S. University, Tirunelveli, in 2004 and her M.E. degree in Computer Science and Engineering from Anna University, Chennai, in 2006, and is now pursuing a Ph.D. in Computer Science and Engineering at Noorul Islam University, Kumaracoil. Her research areas of interest are data mining and security, and she has seven years of teaching experience.

J. Jayakumari is working as Professor and Head of the Department of Electronics and Communication Engineering, Noorul Islam University, Kanyakumari District, Tamil Nadu. She received her B.E. degree in Electronics and Communication Engineering from M.S. University, Tirunelveli, in 1994, her M.Tech degree in Applied Electronics and Instrumentation from the University of Kerala in 1998, and her Ph.D. degree in Electronics and Communication Engineering from the University of Kerala in 2009. She has 17.5 years of teaching experience, 10 years of research experience and 13 years of administrative experience, and was Head of the Department of Electronics and Communication, C.S.I. Institute of Technology, Thovalai, Kanyakumari District (2000-2009). She has published several papers in international journals. Her research interests include wireless communication and networking, signal and image processing, detection and estimation theory, spread spectrum systems and error correcting codes. She is a life member of the Institution of Electronics and Telecommunication Engineers (IETE), the Indian Society for Technical Education (ISTE) and the Institution of Engineers (India), and a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE).


Enhanced Distributed Text Document Clustering Based on Semantics J. E. Judith, J. Jayakumari

Abstract – Distributed text document clustering is an emerging area that is used to improve quality in information retrieval and document organization in digital libraries. Enormous amounts of data are available in large-scale networks, so it is difficult to cluster the data from a centralized location. A wide variety of distributed text document clustering algorithms are available for analyzing data from distributed sources. An enhanced distributed text document clustering algorithm (DEKLSI) is proposed that uses an enhanced K-Means algorithm along with Latent Semantic Indexing (LSI) to increase the quality and accuracy of the clustering. Latent Semantic Indexing clusters the documents based on semantics, which deals with the problems of synonymy and polysemy. The results show improvement in clustering quality and execution time, and thereby in accuracy. The performance of the enhanced algorithm is compared and analyzed against the clustering algorithm without semantics. The experiment is evaluated using two different document datasets, 20NG and Reuters. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.

Keywords: Enhanced K-Means, Semantics, LSI, Distributed Document Clustering, Synonymy, Polysemy

Nomenclature

P(C)  Purity of cluster
E(C)  Entropy of cluster
P  Precision
R  Recall
F  F-measure
Σ  Summation

I. Introduction

The fundamental tasks in data mining are clustering and classification. Classification is a supervised learning method and clustering is an unsupervised learning method (in some cases used for both) [1]. Clustering is a descriptive task and classification is a predictive task. The main objective of clustering is to discover a new set of categories that are of interest in themselves, and their assessment is intrinsic. In classification tasks, the groups must reflect some reference set of classes, and their assessment is extrinsic.

I.1. Document Clustering

Document clustering groups a set of documents into classes or clusters. The documents within a cluster have high similarity compared to documents in other clusters [1]. Each document in a collection is assigned to a single cluster: initially there is one cluster, and after processing the documents are distributed among a number of clusters.
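Closeness between documents is typically computed on term vectors. As a concrete example, cosine similarity (one of the measures the paper goes on to list) over term-frequency dictionaries can be sketched as follows; the toy documents are illustrative.

```python
import math

def cosine_similarity(d1, d2):
    """Cosine of the angle between two term-frequency dictionaries."""
    common = set(d1) & set(d2)
    dot = sum(d1[t] * d2[t] for t in common)
    norm1 = math.sqrt(sum(v * v for v in d1.values()))
    norm2 = math.sqrt(sum(v * v for v in d2.values()))
    return dot / (norm1 * norm2)

doc_a = {"cluster": 2, "document": 3, "semantic": 1}
doc_b = {"cluster": 1, "document": 2, "index": 1}
doc_c = {"network": 4, "routing": 2}
# doc_a is closer to doc_b (shared vocabulary) than to doc_c (none).
print(cosine_similarity(doc_a, doc_b) > cosine_similarity(doc_a, doc_c))  # -> True
```

Because cosine similarity normalizes by vector length, a long and a short document about the same topic still score as similar, which is why it is a common choice for text.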

Manuscript received and revised September 2013, accepted October 2013


The documents in a cluster must have high intra-cluster similarity and low inter-cluster similarity, so to cluster documents the closeness between a pair of documents must be efficiently defined [2]. A number of similarity or distance measures are available, such as cosine similarity, the Jaccard coefficient, and Euclidean distance; the performance of these similarity measures is evaluated in [3]. There are many applications of document clustering, such as clustering web search engine results so that they are understandable to the user, clustering documents in digital libraries, automatic creation of document hierarchies or taxonomies, and efficient retrieval of relevant clusters through information retrieval techniques [4].

I.2. Distributed Document Clustering

A centralized, data-warehouse-based system for data mining is thought of as one large repository from which knowledge is mined, but mining knowledge from distributed resources is in great demand [3]-[5]. Conventional algorithms are based on the assumption that the data is memory resident, which makes them unable to cope with the increasing complexity of distributed settings. Most existing document clustering algorithms are designed for central execution: clustering is performed on a dedicated node, which is not suitable over large-scale distributed networks. Therefore, novel efficient algorithms for distributed clustering have to be developed [3]-[8].

Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved

J. E. Judith, J. Jayakumari

Distributed clustering algorithms must overcome several challenges in distributed document clustering: selecting a good similarity measure, improving the clustering quality, and improving the speedup. A number of metrics exist to evaluate the performance of an algorithm on these parameters [4].

I.3. Document Clustering Algorithms

Many methods and algorithms for clustering have been developed. The most widespread clustering algorithms fall into two categories [18]: hierarchical and partitional. Han and Kamber [1] suggest three additional main categories: density-based, model-based, and grid-based methods. Hierarchical clustering groups documents by constructing a tree of clusters, known as a dendrogram, either top-down (divisive) or bottom-up (agglomerative); however, the efficiency of these algorithms is limited [1], [7]. Partitional clustering initially splits the documents into a number of subsets and then measures the distance among the documents in each cluster. A number of relocation schemes are used to iteratively optimize the clusters; revisiting the clusters at each step is the main advantage, and these algorithms are more efficient for text document clustering.

Another difference between hierarchical and partitional methods is the way distances between clusters are calculated: the former computes distances between the individual points of a group, while the latter uses a single representative for each cluster [1]. Two techniques are used for constructing a cluster representative: taking the data point that represents the cluster best (k-medoids) or computing a centroid, usually by averaging all the points in the group (k-means). The objective of distributed data mining is to approximate, using distributed resources, the result that a centralized approach would produce [3]-[8].

The rest of the paper is organized as follows: Section II reviews related methods, Section III explains the system design, Section IV presents the experimental results and analysis, and Section V concludes the paper.
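The two representative-construction schemes mentioned above can be illustrated with a small sketch; the sample points and function names below are illustrative, not taken from the paper:

```python
import numpy as np

def centroid(points):
    """k-means style representative: the mean of all points in the cluster."""
    return points.mean(axis=0)

def medoid(points):
    """k-medoids style representative: the actual point with the smallest
    total distance to every other point in the cluster."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return points[dists.sum(axis=1).argmin()]

cluster = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0]])
print(centroid(cluster))  # the mean, which need not be a real document
print(medoid(cluster))    # always one of the original points
```

The medoid is useful when the mean of document vectors is not meaningful; the centroid is cheaper to compute.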

II. Related Methods

Distributed Data Mining (DDM) algorithms began to emerge in the 1990s. Data mining in distributed environments is known as DDM, also called Distributed Knowledge Discovery (DKD). Eisenhardt et al. [10] were the first to introduce a P2P clustering algorithm, in which centroid information is calculated and broadcast to all peers for the K-means computation; this centroid broadcasting limits the scalability of the algorithm. Hsiao and King [6] introduced an algorithm that avoids broadcasting by employing a DHT to index all clusters using manually selected terms; extensive human interaction is needed for selecting the terms, and the algorithm cannot adapt to new topics. Eshref Januzaj et al. [11] proposed an effective and efficient distributed clustering method in which a density-based clustering algorithm is used for both local and global clustering. This method improves clustering quality as well as efficiency, but the authors note that the performance, and thereby the accuracy, of the local clustering algorithm can still be improved. Hammouda and Kamel [4] proposed a distributed K-means algorithm for a hierarchical topology: local clustering solutions are aggregated from the lowest level of the hierarchy until the root peer is reached. The drawback of this method is that clustering quality decreases at each hierarchy level, significantly so for large networks. Datta et al. [5] proposed two P2P approximations of K-means. In LSP2P the centroids are distributed using gossiping; for text clustering LSP2P fails because it assumes data is uniformly distributed among the peers, an assumption that clearly does not hold for text collections in P2P networks. The second algorithm, USP2P, uses sampling to provide probabilistic guarantees, but its coordinating peer is easily overloaded, since it is responsible for exchanging centroids with a significant number of peers for sampling. Chang Liu et al. [12] introduced distributed document clustering for a search engine, using TF-IDF for document representation and initial cluster generation, together with the cosine similarity measure. Clustering quality is improved in this method, but it does not deal with the problems of synonymy and polysemy. Qing He et al. [13] proposed text clustering based on frequent term sets for peer-to-peer networks.
Zhongjun Deng et al. [14] introduced a hybrid clustering algorithm over a P2P network, in which a k-medoids algorithm is used within each node and a k-means algorithm between different nodes. Sridevi U. et al. [15] introduced a semantically enhanced document clustering based on PSO, where documents are clustered based on the semantic relationships between them; the performance of this algorithm is evaluated only on a centralized node. Odysseas Papapetrou et al. [8] proposed a probabilistic text clustering for peer-to-peer networks based on a distributed hash table (DHT); the semantic relationship between terms and documents is not evaluated. A. Amine, Z. Elberrichi, and M. Simonet [21] evaluated and compared three text clustering methods based on WordNet for the representation of texts; different similarity measures were analysed, but the relationship between terms and distributed data was not considered. An effective hybrid distributed document clustering based on K-means and K-medoids using the Jaccard coefficient similarity measure was proposed in our previous publication [9]. Its clustering quality and accuracy can be improved by considering semantics through the latent semantic indexing technique, which also improves the speed of execution of the algorithm.

International Review on Computers and Software, Vol. 8, N. 10

III. System Design

III.1. Text Document Preprocessing

Text document preprocessing represents the documents in a suitable format and optimizes the performance of the text mining algorithm by discarding irrelevant data. Preprocessing consists of steps that take a plain text document as input and output the set of terms [3]-[5], [9] to be included in the vector space model. Common words such as articles, prepositions, and conjunctions, called stop words, do not contribute to the retrieval task and are removed. Stemming reduces words to their base form, or stem; for example, the words "connected", "connection", and "connections" are all reduced to the stem "connect". Porter's algorithm [16], the de facto standard stemming algorithm, is used here. Stemming reduces the dimensionality of the text documents. Pruning, or filtering, removes words that appear with very low frequency throughout the corpus, using a pre-specified threshold; in some cases words that occur too frequently are also removed.

III.2. Text Document Representation Model

Documents have to be transformed from their full-text version into document vectors that describe their content. The document data model used here is the Vector Space Model [17]. Each document is represented by a vector d = (tf1, tf2, …, tfn), where tfi is the frequency of term i in the document (TF). In order to represent the documents in the same term space, all the terms from all the documents are first extracted and arranged in a term-document matrix. Term weighting is the process of calculating the degree of relationship between a term and a document. This results in a term space of thousands of dimensions; since each document usually contains only several hundred words, the representation is highly sparse. Therefore another factor, the inverse document frequency (IDF), is combined with the term frequency, so that each component of the vector d becomes tfi × idfi. The weight of a term t in a document d, over a collection of documents D, is described as:

w(t, d) = tf(t, d) × log(|D| / df(t))    (1)
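A minimal sketch of the preprocessing and tf×idf weighting of Eq. (1); the stop-word list and the crude suffix-stripping rule stand in for a full Porter stemmer [16], and the two sample documents are illustrative:

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "are"}

def preprocess(text):
    """Tokenize, drop stop words, and crudely stem (a toy stand-in for Porter)."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]
    return [re.sub(r"(ions|ion|ed|s)$", "", t) for t in tokens]

def tfidf_vectors(texts):
    """Represent each document as {term: tf * log(|D| / df)} per Eq. (1)."""
    docs = [Counter(preprocess(t)) for t in texts]
    df = Counter(term for d in docs for term in d)  # document frequency of each term
    n = len(docs)
    return [{t: tf * math.log(n / df[t]) for t, tf in d.items()} for d in docs]

vecs = tfidf_vectors(["the connected connections", "a connection of networks"])
```

Note that a term appearing in every document receives weight zero, which is the intended effect of the idf factor.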


where df(t) is the number of documents in which term t appears. The term-document matrix captures the corresponding relationship between terms and documents [19].

III.3. Determining the Similarity Measure

A similarity or distance measure must be determined before clustering is done. The Jaccard coefficient is used here as the similarity measure: it is effective for very sparse data [2], [3], [9] and outperforms other similarity measures. The Jaccard coefficient [2], [3], [9] measures the similarity by comparing the sum weight of shared terms to the sum weight of terms that are present in either of the two documents but are not shared:

SIM(ta, tb) = (ta · tb) / (|ta|² + |tb|² − ta · tb)    (2)

where ta and tb are m-dimensional vectors over the term set. The value ranges between 0 and 1: it is 1 when the two documents are identical and 0 when they are disjoint.

III.4. Enhanced Distributed K-Means (DEKMeans) Clustering

The general steps involved [9], [18] in a distributed clustering algorithm are:
o Computing a local model.
o Aggregating all the local models at a central node.
o Computing a global model, or sending the aggregated models back to all the nodes, where locally optimized clusters are produced.
The traditional K-means document clustering algorithm works by determining the centroids of the clusters, where each coordinate of a centroid is the mean of the corresponding coordinates of the documents in the cluster, and by assigning every document to the nearest centroid. In the proposed method a local model is first generated by applying the Enhanced Distributed K-means (DEKMeans) algorithm at every peer. Fig. 1 gives the overall system methodology followed in this paper. Each peer holds a set of documents of different classes that have to be clustered. The documents are represented as vectors by calculating the term frequency and inverse document frequency [17], giving a term-document matrix. A random document is selected as the initial centroid. To cluster a document in a peer, its centroid is compared with the centroids of the other documents using the Jaccard coefficient similarity measure; the document is assigned to the most similar cluster, and the centroid of every cluster is recalculated to generate the local document clusters. The dimensions of these local document clusters are reduced by distributed latent semantic indexing to identify the concepts contained in the text. The documents are categorized based on their conceptual meaning and are aggregated to generate global document clusters.
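The per-peer step described above can be sketched as a single-node K-means pass that uses the extended Jaccard coefficient of Eq. (2); the tiny document vectors, the random initialization, and the iteration count are illustrative assumptions:

```python
import numpy as np

def jaccard_sim(a, b):
    """Extended Jaccard coefficient of Eq. (2): a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = a @ b
    return dot / (a @ a + b @ b - dot)

def local_kmeans(docs, k, iters=10, seed=0):
    """One peer's local model: assign each tf-idf vector to the most similar
    centroid, then recompute centroids, as in the DEKMeans local step."""
    rng = np.random.default_rng(seed)
    centroids = docs[rng.choice(len(docs), size=k, replace=False)]
    for _ in range(iters):
        labels = np.array([max(range(k), key=lambda j: jaccard_sim(d, centroids[j]))
                           for d in docs])
        centroids = np.array([docs[labels == j].mean(axis=0) if (labels == j).any()
                              else centroids[j] for j in range(k)])
    return labels, centroids

docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels, cents = local_kmeans(docs, k=2)  # two well-separated groups of documents
```

In the full scheme each peer would run this pass on its own documents and forward only its local centroids for aggregation.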

Fig. 1. System Design

III.5. Distributed Latent Semantic Indexing (DLSI)

LSI has been widely used in text mining and plays an important role in information retrieval. It is effective in dealing with the problems of synonymy and polysemy within a proper matrix scale. LSI assumes that there are certain latent semantic relationships among the documents, hidden in the context due to polysemy and synonymy. Latent Semantic Indexing is a method for dimensionality reduction [19]; the dimension reduction can enhance the efficiency and improve the precision of clustering. The advantage of using LSI [19] is that it tackles the problems of synonymy and polysemy. Synonymy refers to the fact that the same underlying concept can be described using different terms [19]; polysemy describes words that have more than one meaning. By using a reduced representation, LSI removes some "noise" from the data, i.e., rare and less important usages of certain terms. LSI takes a set of documents that exist in a high-dimensional space and represents them in a low-dimensional space. It is the application of a particular mathematical technique, Singular Value Decomposition (SVD), to a word-by-document matrix. SVD is a least-squares method that yields a terms-documents matrix representing the original matrix approximately. This matrix not only reduces the scale of the original matrix but also reveals the relationships among documents using semantics [20]. The core of LSI is the SVD, which can decompose an m×n matrix X using two matrices U and V that satisfy:

X = U S V^T    (3)

where S is a diagonal matrix whose diagonal elements are the singular values of matrix X. The matrices U and V must satisfy:

U^T U = V^T V = I    (4)

If the first k singular values are larger than a certain threshold and the remaining (r−k) diagonal elements are zero, the first k columns of U and the first k rows of V^T are kept, forming two truncated matrices Uk and Vk. A new rank-k matrix Xk is generated from Uk, Sk and Vk:

Xk = Uk Sk Vk^T    (5)
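Eqs. (3)-(5) can be reproduced with an off-the-shelf SVD routine; the small term-document matrix here is illustrative:

```python
import numpy as np

def lsi_truncate(X, k):
    """Rank-k LSI approximation X_k = U_k S_k V_k^T of Eq. (5)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # X = U S V^T, Eq. (3)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Toy 4-term x 3-document matrix; the rank-2 approximation keeps the dominant
# semantic structure while discarding the smallest singular value ("noise").
X = np.array([[2.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [0.0, 2.0, 1.0]])
Xk = lsi_truncate(X, k=2)
```

With k equal to the full rank the original matrix is recovered exactly; smaller k trades reconstruction error for a more compact latent representation.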

Therefore Xk can be used to represent the original terms-documents matrix X approximately: LSI achieves an approximate matrix from the original terms-documents matrix using the SVD, and this new matrix reduces noise. In LSI, assigning a document to one or more predefined categories is based on its conceptual similarity to the categories. LSI also scales and highlights the semantic relationships of terms and documents [19]. For large document collections, however, the performance of LSI has to be improved.

III.6. Enhanced Distributed K-Means and LSI (DEKLSI)

A good characteristic of the K-means algorithm is that it can be distributed. The performance of executing Latent Semantic Indexing (LSI) on a standalone machine is poor, so in order to improve its performance it is used together with a clustering algorithm. In each node, the text documents are preprocessed and represented as a term-document matrix. This original term-document matrix is formed from the term and document vectors determined using tf×idf, which combines both local and global weighting, and serves as the input data set for K-means. The Jaccard similarity measure is used to determine the similarity between term-document vectors. Using K-means clustering, the large document data set is partitioned into several practically independent clusters; that is, the original term-document matrix is reduced to several smaller term-document matrices. LSI, i.e., the SVD, is applied to these term-document matrices, and the dimension of each term-document matrix is reduced according to the number of nodes. This technique reduces the loss of semantic relationships between terms and documents and produces the global document clusters. Combined with K-means, computing the Singular Value Decomposition is faster and easier, since it is applied to smaller matrices. Since distributed K-means is used with LSI, the execution time is reduced and the clustering quality is improved.
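The "cluster first, then decompose the smaller matrices" idea can be sketched as follows; the random matrix, the supplied cluster labels, and the projection into r latent dimensions are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def per_cluster_lsi(doc_term, labels, k):
    """Apply a rank-k SVD separately to each cluster's (smaller) sub-matrix,
    as in DEKLSI, instead of decomposing the full matrix at once."""
    reduced = {}
    for c in set(labels):
        sub = doc_term[labels == c]                      # documents of cluster c
        U, s, Vt = np.linalg.svd(sub, full_matrices=False)
        r = min(k, len(s))
        reduced[c] = sub @ Vt[:r].T                      # docs in r latent dims
    return reduced

doc_term = np.random.default_rng(1).random((6, 5))       # 6 docs x 5 terms
labels = np.array([0, 0, 0, 1, 1, 1])                    # from a k-means pass
latent = per_cluster_lsi(doc_term, labels, k=2)
```

Each SVD now runs on a 3×5 sub-matrix instead of the full 6×5 matrix, which is the source of the speedup the paper claims.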

IV. Experimental Results IV.1. Experimental Setup Each peer node is configured with Intel core i3 processor with a speed of 3.2 GHz and 4 Giga bytes of memory, Windows 7 operating system is loaded in the peer nodes. For performance evaluation, data is partitioned randomly over all the nodes of the network. The number of clusters corresponds to the actual number of classes in each data set. A set of clusters are formed in each and every node. The dimensionality of these clusters is reduced based on semantics.

The purity of a cluster Cj of size nj (Section IV.4) is defined as:

P(Cj) = (1 / nj) max_h (nj^h)    (6)

where max_h (nj^h) is the number of documents from the dominant category in cluster Cj, and nj^h represents the number of documents from cluster Cj assigned to category h. For an ideal cluster the purity value is 1, because it contains documents from a single category; the higher the purity value, the better [2], [9] the quality of the clusters. Fig. 2 shows the purity analysis of the proposed algorithm on the Reuters document dataset [23]: the purity of the DEKLSI algorithm is better than that of the DKMeans algorithm without semantics.
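Eq. (6) amounts to counting the dominant category in each cluster; a direct sketch with illustrative category labels:

```python
from collections import Counter

def purity(clusters):
    """Purity of each cluster per Eq. (6): fraction of its documents that
    belong to the dominant category."""
    return [Counter(c).most_common(1)[0][1] / len(c) for c in clusters]

# Each inner list holds the true category of every document in one cluster.
scores = purity([["acq", "acq", "earn"], ["earn", "earn", "earn"]])
```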

IV.2. Dataset Description

Two document data sets, listed in Table I, have been used in the evaluation. The 20NG document dataset [22] contains newsgroup articles from 20NewsGroups on a variety of topics including politics, computers, etc. The Reuters document dataset [23] contains newspaper articles and has been widely used for evaluating clustering algorithms. The experimental results for the performance analysis of the algorithm on the Reuters document dataset are given below.

TABLE I
DATASET DESCRIPTION

Data    | Source       | Description        | Documents | Classes | Terms | Average Class Size
20NG    | 20news18828  | Newsgroup posts    | 18828     | 20      | 28553 | 1217
Reuters | Reuters21578 | Newspaper articles | 1504      | 13      | 2886  | 131

Fig. 2. Purity Analysis of DEKLSI algorithm [Reuters]

IV.3. Evaluation Metrics

There are a variety of evaluation metrics for assessing the performance [2]-[5], [9] of clustering algorithms. The quality of the clustering result is evaluated using two measures, purity and F-measure; the accuracy of the clustering algorithm is assessed using entropy and speedup.

IV.4. Purity

This metric evaluates whether the documents in a cluster come from a single category, i.e., the degree of coherence of the documents in a cluster. The purity of cluster Cj is formally defined in Eq. (6) above.

IV.5. Entropy

This metric evaluates the distribution of categories or classes in a given cluster. The entropy of a cluster Ci of size ni is defined as:

E(Ci) = − Σh (ni^h / ni) log(ni^h / ni)    (7)

where the sum runs over the c categories in the dataset and ni^h is the number of documents from the h-th class that were assigned to cluster Ci. The average entropy of the overall solution is defined as the weighted sum of the individual entropy values of the clusters:

E = Σi (ni / n) E(Ci)    (8)

The smaller the entropy value, the better the quality [2], [8] of the clusters. Fig. 3 gives the entropy analysis of the algorithm on different numbers of nodes; the overall distribution of categories within the clusters is better for the proposed DEKLSI algorithm.
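Eqs. (7) and (8) can be checked with a short sketch; the cluster contents are illustrative:

```python
import math
from collections import Counter

def cluster_entropy(cluster):
    """Entropy of one cluster, Eq. (7): -sum over classes of p_h * log(p_h)."""
    n = len(cluster)
    return -sum((c / n) * math.log(c / n) for c in Counter(cluster).values())

def overall_entropy(clusters):
    """Weighted average over all clusters, Eq. (8)."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_entropy(c) for c in clusters)

E = overall_entropy([["acq", "acq", "earn", "earn"], ["crude", "crude", "crude"]])
```

A pure cluster contributes zero entropy, so lower values indicate better clusters, as the text states.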



Fig. 3. Entropy Analysis of DEKLSI algorithm [Reuters]

Fig. 4. F-measure Analysis of DEKLSI algorithm [Reuters]

IV.6. F-Measure

The F-measure is a harmonic combination of the precision and recall values [2] used in information retrieval. The precision P(i, j) and recall R(i, j) of each cluster j for each class i are calculated. If ni is the number of members of class i, nj the number of members of cluster j, and nij the number of members of class i in cluster j, then P(i, j) and R(i, j) can be defined as:

P(i, j) = nij / nj    (9)

R(i, j) = nij / ni    (10)

From the above equations, the corresponding F-measure is defined as:

F(i, j) = 2 × P(i, j) × R(i, j) / (P(i, j) + R(i, j))    (11)

In general, the larger the F-measure, the better [4], [9] the clustering result. Fig. 4 shows that the F-measure value of the DEKLSI algorithm is improved when compared with the document clustering algorithm without semantics, and therefore the accuracy of the algorithm is improved.

IV.7. Speedup

The relative increase in speed of one algorithm [4] over another is called speedup; here it is the ratio between the time taken by the clustering algorithm without semantics and the time taken using semantics. Fig. 5 shows that the speedup of the proposed algorithm using semantics is improved when compared with the algorithm without semantics.

Fig. 5. Speedup Analysis of DEKLSI algorithm [Reuters]

IV.8. Performance Analysis

Table II gives the performance of the proposed DEKLSI algorithm in terms of accuracy and clustering quality, compared with the Distributed K-means (DKMeans) algorithm without LSI. Table II shows that the clustering quality and accuracy are improved when the relationship between terms and documents is taken into account using the latent semantic indexing technique.

TABLE II
ACCURACY AND CLUSTERING QUALITY OF DEKLSI ALGORITHM [REUTERS]

      |       Clustering Quality        |       Clustering Accuracy
      |     Purity      |   F-measure   |     Entropy     |    Speedup
Nodes | DKMeans  DEKLSI | DKMeans DEKLSI | DKMeans  DEKLSI | DKMeans DEKLSI
1     | 0.55     0.57   | 0.25    0.3    | 0.5      0.45   | 1.4     2
5     | 0.53     0.58   | 0.23    0.29   | 0.53     0.46   | 5.2     5.5
10    | 0.51     0.55   | 0.2     0.32   | 0.54     0.47   | 7       10
15    | 0.43     0.53   | 0.19    0.31   | 0.59     0.52   | 10      14
20    | 0.41     0.59   | 0.16    0.33   | 0.65     0.56   | 13      20
25    | 0.39     0.61   | 0.18    0.34   | 0.71     0.67   | 16      26
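The bookkeeping of Eqs. (9)-(11) can be sketched as below; the overall score aggregates, for each class, the best-matching cluster's F value weighted by class size (a standard aggregation, assumed here since the paper does not spell it out), and the label arrays are illustrative:

```python
def f_measure(true_labels, cluster_labels):
    """Overall F-measure: for each class i take the best F(i, j) over
    clusters j (Eqs. (9)-(11)), weighted by the class size."""
    n = len(true_labels)
    total = 0.0
    for i in set(true_labels):
        ni = sum(1 for t in true_labels if t == i)
        best = 0.0
        for j in set(cluster_labels):
            nj = sum(1 for c in cluster_labels if c == j)
            nij = sum(1 for t, c in zip(true_labels, cluster_labels)
                      if t == i and c == j)
            if nij:
                p, r = nij / nj, nij / ni              # Eq. (9), Eq. (10)
                best = max(best, 2 * p * r / (p + r))  # Eq. (11)
        total += (ni / n) * best
    return total

score = f_measure(["a", "a", "b", "b"], [0, 0, 1, 1])  # a perfect clustering
```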




V. Conclusion

In this paper, an enhanced distributed text document clustering based on semantics is introduced. Documents are represented as vectors using the tf×idf scheme. Distributed documents are clustered using a distributed enhanced K-means clustering algorithm to generate document clusters; this algorithm is effective in generating lower-dimensional term-document matrices. In order to improve accuracy and cluster quality, the relationship between terms and documents is determined using the technique called Latent Semantic Indexing (LSI). The Distributed Enhanced K-means algorithm, when used along with LSI, improves both accuracy and clustering quality. In future work, the scalability of the algorithm can be evaluated as the number of nodes and the document size increase, and the speedup of the algorithm for larger dataset sizes has to be improved. In order to improve the speedup, this work can be extended by analyzing the performance of the algorithm based on the MapReduce programming model and the Hadoop framework.

References

[1] Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques (Morgan Kaufmann Publishers, 2006).
[2] Anna Huang, Similarity Measures for Text Document Clustering, Proceedings of the New Zealand Computer Science Research Student Conference, pp. 49-56, 2008.
[3] Neethi Narayanan, J.E. Judith, J. Jayakumari, Enhanced distributed document clustering algorithm using different similarity measures, IEEE Conference on Information & Communication Technologies (ICT), pp. 545-550, 2013.
[4] Khaled M. Hammouda, Mohamed S. Kamel, Hierarchically distributed peer-to-peer document clustering and cluster summarization, IEEE Transactions on Knowledge and Data Engineering, Vol. 21, n. 5, pp. 681-698, 2009.
[5] Souptik Datta, K. Bhaduri, Chris Giannella, Ran Wolff, Hillol Kargupta, Distributed Data Mining in Peer-to-Peer Networks, IEEE Internet Computing, pp. 1-8, 2006.
[6] H.-C. Hsiao, C.-T. King, Similarity Discovery in Structured P2P Overlays, International Conference on Parallel Processing, pp. 636-644, 2003.
[7] S. Datta, C. R. Giannella, H. Kargupta, Approximate distributed K-means clustering over a P2P network, IEEE Transactions on Knowledge and Data Engineering, Vol. 21, n. 10, pp. 1372-1388, October 2009.
[8] O. Papapetrou, W. Siberski, W. Nejdl, Decentralized Probabilistic Text Clustering, IEEE Transactions on Knowledge and Data Engineering, Vol. 24, n. 10, pp. 1848-1861, October 2012.
[9] J.E. Judith, J. Jayakumari, Performance Evaluation of an effective hybrid distributed document clustering algorithm, European Journal of Scientific Research, Vol. 86, n. 2, pp. 283-297, September 2012.
[10] M. Eisenhardt, W. Muller, A. Henrich, Classifying documents by distributed P2P clustering, in INFORMATIK, pp. 286-291, 2003.
[11] Eshref Januzaj, Hans-Peter Kriegel, Martin Pfeifle, Towards Effective and Efficient Distributed Clustering, Workshop on Clustering Large Data Sets, 2003.
[12] Chang Liu, Song-Nian Yu, Qiang Guo, Distributed document clustering for search engine, Proceedings of the International Conference on Wavelet Analysis and Pattern Recognition, pp. 454-459, 2009.
[13] Qing He, Tingting Li, Fuzhen Zhuang, Zhongzhi Shi, Frequent Term based Peer-to-Peer Text Clustering, International Symposium on Knowledge Acquisition and Modeling, pp. 352-355, 2010.
[14] Zhongjun Deng, Wei Song, Xuefeng Zheng, P2PKMM: A Hybrid Clustering Algorithm over P2P Network, Third International Symposium on Intelligent Information Technology and Security Informatics, pp. 450-454, 2010.
[15] Sridevi U.K., Nagaveni N., Semantically Enhanced Document Clustering Based on PSO Algorithm, European Journal of Scientific Research, Vol. 57, n. 3, pp. 485-493, 2011.
[16] M.F. Porter, An algorithm for suffix stripping, Program: electronic library and information systems, Vol. 14, n. 3, pp. 130-137, 1980.
[17] G. Salton, A. Wong, C. S. Yang, A vector space model for automatic indexing, Communications of the ACM, Vol. 18, pp. 613-620, 1975.
[18] B. King, Step-wise clustering procedures, Journal of the American Statistical Association, Vol. 62, n. 317, pp. 86-101, 1967.
[19] Barbara Rosario, Latent Semantic Indexing: An overview, Final Paper, INFOSYS 240, Spring 2000.
[20] Jianxiong Yang, J. Watada, Decomposition of term-document matrix for cluster analysis, IEEE International Conference on Fuzzy Systems, pp. 976-983, 2011.
[21] A. Amine, Z. Elberrichi, M. Simonet, WordNet-Based Text Clustering Methods: Evaluation and Comparative Study, International Review on Computers and Software (IRECOS), Vol. 8, n. 8, 2013.
[22] http://people.csail.mit.edu/jrennie/20Newsgroups/
[23] D.D. Lewis, Reuters-21578 text categorization test collection, Distribution 1.0, http://www.research.att.com/~lewis, 1999.

Authors' information


Judith J. E. received her Bachelor of Engineering (B.E.) degree in Computer Science and Engineering in 2003 from Manonmaniam Sundaranar University and her Master of Engineering (M.E.) degree in Computer Science and Engineering in 2006 from Karunya University. At present she is working as an Assistant Professor in the Department of Computer Science and Engineering at Noorul Islam University, Kumaracoil, and is pursuing her Doctorate in Data Mining at Noorul Islam University, Kumaracoil, India.

Dr. Jayakumari J. is working as a Professor and is the Head of the Department of Electronics & Communication Engineering, Noorul Islam University, Kanyakumari District, Tamil Nadu. She received her B.E. degree in Electronics & Communication Engineering from M.S. University, Tirunelveli in 1994, her M.Tech degree in Applied Electronics and Instrumentation from the University of Kerala in 1998, and her Ph.D. degree in Electronics and Communication Engineering from the University of Kerala in 2009. She has teaching experience of 16.6 years, research experience of 9 years and administrative experience of 12 years. She was the Head of the Dept. of Electronics & Communication, C.S.I. Institute of Technology, Thovalai, Kanyakumari district (2000-2009). She has published several papers in international journals. Her research interests include wireless communication and networking, signal and image processing, detection & estimation theory, spread spectrum systems and error correcting codes. She is a life member of the Institution of Electronics and Telecommunication Engineers (IETE), the Indian Society for Technical Education (ISTE) and the Institution of Engineers (India), and a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE).


