Proceedings of the 2nd International Conference on Networking and Advanced Systems May 6-7, 2015 | Badji Mokhtar University, Annaba, Algeria
http://lrs-annaba.net/ICNAS2015
Proceedings Editor Dr Djellali Hayet, University of Annaba
Proceedings Co-Editor Dr Mohamed Amine Ferrag, University of Guelma
Legal deposit: 2541-2013 ISBN: 978-9931-9142-0-4 (Distributed by National Library of Algeria) Copyright©2015 - Networks and Systems Laboratory www.lrs-annaba.net
Editor

ICNAS 2015, the second edition

The second edition of the International Conference on Networking and Advanced Systems (ICNAS 2015) is a scientific event that gives the academic community an opportunity to address new challenges, share solutions, and discuss future research directions. ICNAS 2015 is organized by the Networks and Systems Laboratory (LRS) and takes place at Badji Mokhtar University in Annaba on May 6-7, 2015.

The main objectives of ICNAS 2015 are to bring together members of the academic networking community, to discuss recent advances in the broad and quickly evolving fields of computer and communication networks, and to develop visions for the networking domain. Students will present state-of-the-art research results, address new challenges, and discuss trends in Computer Networks and Security, Artificial Intelligence, Pattern Recognition, and their applications in diverse fields.

We are proud to emphasize not only the number and origin of the papers submitted to the conference but also their exceptional quality. In this second edition, ICNAS has broadened its program and topics to encompass emerging research on networking, artificial intelligence, and pattern recognition.

We would like to take this opportunity to thank our sponsors for their valuable support: Badji Mokhtar University, EasyChair, and the LRS Laboratory. We also express our highest appreciation to all who contributed to the success of this conference: the reviewers, the conference chairs, and the members of the different committees, for their dedication and hard work.
Preface

Computer network security: a major challenge

Computer network security is a growing challenge because of rapid technological developments and the natural increase in the risks that result from them. It is therefore an important topic for promoting the development of trade in all areas. From a technical perspective, security covers access to information on desktops and servers as well as the data transmission network. The Internet is the tool that allows all computers to communicate regardless of their type. Security mechanisms can nevertheless inconvenience users, and the instructions and rules become increasingly complicated as the network expands. Computer network security is thus part of the overall security of a company's Information System (IS). More specifically, it consists of following procedures at the human, technical and organizational levels. IT security must therefore be designed so that it does not prevent users from developing the uses they need, and so that they can use the information system with complete confidence. The objective is to protect the company network and guard against all kinds of risks that can degrade its performance.

Today, companies are increasingly forced to open their computer networks to the outside in order to address several needs:
- Opening the Information System to partners and customers;
- Mobility of consultants;
- Network connection of employees' smartphones, laptops and personal USB flash drives.

By opening up, however, the company is exposed to more risks, which can come from:
- Misuse of the network by employees;
- A malicious person who intends to break into enterprise applications;
- A targeted attack to steal sensitive company data;
- A virus that spreads throughout the information system via the network, and whose mission may be to open access to hackers or to degrade the Information System;
- ...

That is why it is necessary to define a security policy whose implementation ensures that an organization's resources (hardware and/or software) are used only within the intended context. IT security generally aims at:
- The assurance that the data are indeed what they are believed to be: this is integrity, which consists of determining whether the data have been altered during communication;
- The assurance that only authorized individuals have access to the resources exchanged: this is confidentiality;
- Maintaining the proper functioning of the information system: this is availability, whose objective is to ensure access to a service or to resources;
- The guarantee that a transaction cannot be denied: this is non-repudiation;
- The assurance of a user's identity, so that access to resources is granted only to authorized persons: this is authentication.

The problems related to computer network security show that known and mastered technologies can today guarantee the safe transmission of information over public and private infrastructures. But to succeed, we must also earn the trust of users. A company's IT security relies on employees knowing the rules well, through user training and awareness, but it must go further and cover the following:
- A physical and logical security system adapted to business needs and user practices;
- An update management procedure;
- A disaster recovery plan;
- A well-planned backup strategy;
- An up-to-date documented system.

Several companies have decided to specialize in computer network security by offering secure internet solutions for businesses. If you are an individual or a company holding sensitive data, do not neglect the security of your network!

During this second edition, ICNAS'2015, organized by the Networks and Systems Laboratory (LRS) of Annaba University, researchers and young researchers have the opportunity to present their ideas and contributions on networking, management, and the systems that revolve around this technology. The talks also address the challenges of synchronization in wireless sensor networks.

Pr LASKRI Mohamed Tayeb
Research Topics

The areas of interest of ICNAS 2015 cut across several research themes. Through this conference, scientific frontiers are pushed forward, and multidisciplinary investigations allow researchers and young researchers to present their ideas and contributions. The conference deals with topics relating to networking, artificial intelligence, and pattern recognition, and with the technologies that revolve around them.
Committees
Honorary Chair
Pr. Ammar HAIAHEM (Rector of Badji Mokhtar University)
Honorary Co-Chair
Dr. Abdelaziz BAAZIZ (Head of Computer Science Department)
Conference Chair
Pr. N. Ghoualmi-Zine (Head of Networks and Systems Laboratory)
Conference Co-Chair
Dr. Hafidi Mohamed, University of Annaba
Program Committee Chair
Dr. Mahnane Lamia, University of Annaba
Proceedings Editor
Dr. Djellali Hayet, University of Annaba
Proceedings Co-Editor
Dr. Mohamed Amine Ferrag, University of Guelma
Sponsors Chair
Dr. Mehdi Nafaa, University of Annaba
M.A.A Hassina Bensefia, University of Annaba
Accommodation
Dr. Guessoum Souad, University of Annaba
M.A.A Radia Amirouche, University of Annaba
M.A.A Benchalel Amir, University of Annaba
M.A.A Debeche Feriel, University of Annaba
Organization Committee - Doctoral Students
Amara Korba Abdelaziz
Ahmim Marwa
Bendjeddou Amira
Hadji Houssem
Khelifi Meriem
Mechtri Leila
Naamane Sara
Tounsi Abdelkader
Web Chair
Dr. Mohamed Amine Ferrag, University of Guelma
Organization Committee
Pr. Ghoualmi-Zine Nacira, University of Annaba
Dr. Ahmed Ahmim, University of Algiers 3
Dr. Djamel Bektache, University of Souk Ahras
Dr. Derdour Makhlouf, University of Tebessa
Dr. Djellali Hayet, University of Annaba
Dr. Mohamed Amine Ferrag, University of Guelma
Dr. Guessoum Souad, University of Annaba
Dr. Hafidi Mohamed, University of Annaba
Dr. Mahnane Lamia, University of Annaba
Dr. Mehdi Nafaa, University of Annaba
M.A.A Radia Amirouche, University of Annaba
M.A.A Debeche Feriel, University of Annaba
M.A.A Benchalel Amir, University of Annaba
M.A.A Bensefia Hassina, University of Annaba
TECHNICAL PROGRAM COMMITTEE
Ghoualmi-Zine Nacira, University of Annaba
Aissani Nassima, University of Oran
Amine Abdelmalek, University of Saida
Araar Abdelaziz, University of Science and Technology (AUST), Ajman, UAE
Babahenini Mohamed Chaouki, University of Biskra
Babes Malika, University of Annaba
Baghdad Atmani, University of Oran
Barigou Fatiha, University of Oran
Beghdad Rachid, University of Bejaia
Bekrar Abdelghani, University of Valenciennes, France
Belabbas Yagoubi, University of Oran
Belbachir Hafida, University of Oran
Bellatreche Ladjel, LIAS/ENSMA, France
Ben Yahia Sadok, Faculty of Sciences, Tunis
Benblidia Nadjia, University of Blida
Benabderrahmane Sidahmed, INRIA Rennes, France
Benabderrahmane Sidahmed, University of Paris 8, France
Benaissa Moussa, University of Oran
Benferhat Salem, University of Artois, France
Benmohammed Mohammed, University of Constantine 2
Benslimane Sidi Mohamed, University of Sidi Bel Abbes
Bernardino Jorge, ISEC Polytechnic Institute of Coimbra, Portugal
Bilami Azeddine, University of Batna
Bouabdellah Kechar, University of Oran
Bouamrane Karim, University of Oran
Bouderah Brahim, University of M’sila
Bouhadada Tahar, University of Annaba
Boukerram Abdellah, University of Bejaia
Boussaid Omar, University of Lyon 2, France
Brahimi Abderrazak, University of Mostaganem
Challal Yacine, Université de Technologie de Compiègne, France
Chantal Soulé-Dupuy, Institut de Recherche en Informatique de Toulouse - Univ. Toulouse 1 Capitole, France
Chaoui Allaoua, University of Constantine
Chickh Mohammed Amine, University of Tlemcen
Chikhi Salim, University of Constantine 2
Congduc Pham, University of Pau, France
Cuzzocrea Alfredo, The Institute of High Performance Computing and Networking of the Italian National Research Council, Italy
Debbat Fati, University of Mascara
Derdour Makhlouf, University of Tebessa
Derhab Abdelouahid, King Saud University (KSU), Saudi Arabia
Djellali Hayet, University of Annaba
Djenouri Djamel, CERIST, Algiers
Djoudi Mahieddine, University of Poitiers, France
El Abbassia Deba, University of Oran
Elberrichi Zakaria, University of Sidi Bel Abbes
Ghalem Belalem, University of Oran
Gelareh Shahin, University of Artois, LGI2A, France
Ghanemi Salim, University of Annaba
Guessoum Souad, University of Annaba
Guillot Philippe, University of Paris 8
Haffaf Hafid, University of Oran
Hafidi Mohamed, University of Annaba
Hai Wang, University of Saint Mary’s, Canada
Hamdadou Djamila, University of Oran
Hamou Reda Mohamed, University of Saida
Harous Saad, University of UAE
Kaddour Mejdi, University of Oran
Kamel Nadjet, University of Sétif 1
Kazar Okba, University of Biskra
Khelfi Mohamed Fayçal, University of Oran
Kholladi Kheiredine, University of Oued Souf
Kouahla Mohamed Nadjib, University of Guelma
Lafifi Yacine, University of Guelma
Lafourcade Pascal, University of Grenoble, France
Laouar Reda, University of Tebessa
Laskri Mohamed Tayeb, University of Annaba
Lebbah Yahia, University of Oran
Loukil Lakhdar, University of Oran
Mahmoudi Saïd, UMONS Faculté Polytechnique, Belgium
Mahnane Lamia, University of Annaba
Merouani Hayet Farida, University of Annaba
Meziane Hassina, University of Oran
Meziane Abdelkrim, CERIST, Algiers
Mimoun Malki, University of Sidi Bel Abbes
Mohamed Ben Ali Yamina, University of Annaba
Moussa Ahmed, University of Abdelmalek Essaadi, Morocco
Moussaoui Abdelouahab, University of Sétif 1
Nafa Mehdi, University of Annaba
Onaindia Eva, University of Politecnica de Valencia, Spain
Roose Philippe, IUT de Bayonne, France
Seridi Hamid, University of Guelma
Sekhri Larbi, University of Oran
Taghezout Noria, University of Oran
Tlili Yamina, University of Annaba
Wan H Hassan, MJIIT, UTM KL, Malaysia
Zeghib Nadia, University of Constantine 2
2nd International Conference on Networking and Advanced Systems (ICNAS 2015) Annaba, Algeria, May 6-7, 2015
Final Program

Day 1: 6 May 2015

08:00 Registration
08:30 Opening Conference
09:00 Tutorial 1: Time Synchronization in Wireless Sensor Networks
Dr. Djamel DJENOURI (CERIST Research Center, Algiers, Algeria)
Chairs: O. Boussaid & M. Malki & S. Ghanemi

Session 1: Secured Cloud Computing
Chairs: H. Haffaf & A. Baghdad & R. Laouar
09:45 ACS - Advanced Cloud Simulator: A Discrete Event Based Simulator for Cloud Computing Environments. S. Sadi & Y. Belabbas
10:05 An MDA approach to secure access to data on cloud using implicit security. Y. Ghebghoub, S. Oukid & O. Boussaid
10:25 A Novel Access Control Model for Securing Cloud API. K. Bendiab, M. Benmohammed & M. Batouche

Session 2: Pattern Recognition 1
Chairs: F. Merouani & D. Hamdadou & A. Moussaoui
10:45 Off-line handwritten signature verification using variants of local binary patterns. Y. Serdouk, H. Nemmour & Y. Chibani
11:05 Neutron Image Segmentation: Morphological Processing, Prediction of Fluid Flows inside Opaque Metallic Systems. Sekhri Arezki, D. Hamdadou & B. Beldjilali
11:25 Distributed Video Coding of the Chroma Components for the Capsule Endoscopy images. D.E. Boudechiche, S. Benierbah & M. Khamadja
11:40 Satellite Images Analysis with Symbolic Time Series: A Case Study of the Algerian Littoral. D. Attaf, S. Benabderrahmane, D. Hamdadou & A. Lafrid

Session 3: Intelligent Systems
Chairs: M. Babes & S. Chikhi & H. Meziane
13:30 System of systems Modeling: An hypergraph approach. H. Haffaf
13:50 Calcul du skyline par réduction de l’espace candidat: nouveaux résultats. Zekri Lougmiri & H. Belaicha
14:10 Using the Fractional Model Reference for Tracking Trajectory in Adaptive Control. Y. Bensafia, Khettab Khatir & S. Ladaci
14:30 Improved Cuckoo Search Algorithm for DNA Fragment Assembly Problem. W. Kartous & S. Chikhi

Session 4: Wireless and Networks Security
Chairs: L. Sekhri & Y. Challal & A. Boukerram
15:15 Smart Monitoring for Semantic Wireless (SCADA/DCS) Systems with Semantic SNMP Protocol. N. Sahli, E.B. Bourennane & M. Benmohammed
15:35 A Hierarchical Fault Tolerant Routing Protocol for WSNs. Y. Djebaili, A. Bilami & A. Bourmada
15:55 A New Bio-Inspired Technique of Artificial Social Cockroaches for Spam Detection with Visual Result Mining. H.A. Bouarara, R.M. Hamou, Abdelmalek Amine, M.E. Rahmani & A. Rahmani
16:15 BIARV: Bio-Inspired Approach for Routing in Vehicular Ad Hoc Networks. Sayad Lamri, D. Aissani & L. Bouallouche
16:35 Application layer versus IP Multicast in Manet. M. Doudou, L. Mahnane & M. Nafaa

Day 2: 7 May 2015

09:00 Tutorial 2: Big Data Analytics : Mort ou Nouvelle vie des Systèmes d’Information Décisionnels
Pr. Omar Boussaid (Université Lyon 2, France)
Chairs: A. Bilami & N. Taghezout & E. Debat

Session 5: Semantic Web and Decisional Systems
Chairs: Y. Mohamed Benali & B. Bouderah & A. Brahmi
09:45 A new approach for Multiple Criteria Collaborative Decision Making Process (MCCDM): A combined use between AHP and WEB DSS. I. Bessedik & N. Taghezout
10:05 Towards a new agent based approach for modeling the business rules processes in small and medium enterprise in Algeria. Nawal Sad Houari & Noria Taghezout
10:25 ADOM: Arabic Dataset for Ontology Matching. A. Khiat & M. Benaissa

Session 6: Educational Technology
Chairs: T. Bouhadada & L. Loukil & N. Kamel
10:45 E-Tutoring Using Text Mining Technology for Higher Education. F. Bouarab-Dahmani
11:05 Recommending relevant GitHub repositories: a collaborative-filtering approach. M. Guendouz, Abdelmalek Amine & R. M. Hamou
11:25 Adaptive Atomicity in Web Services Composition. Z. Mahfoud & N. Nouali-Taboudjemat

12:00 Closing conference
Table of contents

Time Synchronization in Wireless Sensor Networks
Djamel Djenouri, CERIST Research Center, Algiers, Algeria

Big Data Analytics : Mort ou Nouvelle vie des Systèmes d’Information Décisionnels
Pr. Omar Boussaid, Laboratoire ERIC, Département Informatique et Statistiques, Université Lyon 2

Smart Monitoring for Semantic Wireless (SCADA/DCS) Systems with Semantic SNMP Protocol
Sahli Nabil, El-Bay Bourennane, Benmohammed Mohamed

ACS Advanced Cloud Simulator: A Discrete Event Based Simulator for Cloud Computing Environments
Samy Sadi, Belabbas Yagoubi

CP ORBAC to secure access to data on cloud using implicit security
Yasmina Ghebghoub, Saliha Oukid, Omar Boussaid

A New Bio-Inspired Technique of Artificial Social Cockroaches for Spam Detection with Visual Result Mining
Hadj Ahmed Bouarara, Reda Mohamed Hamou, Mohamed Elhadi Rahmani, Abdelmalek Amine, Amine Rahmani

BIARV: Bio-Inspired Approach for Routing in Vehicular Ad Hoc Networks
Sayad Lamri, Aissani Djamil, Bouallouche-Medjkoune Louiza

Recommending relevant GitHub repositories: a collaborative-filtering approach
Mohamed Guendouz, Abdelmalek Amine, Reda Mohamed Hamou

Hypergraph for System of Systems modeling
Haffaf Hafid

Application layer versus IP Multicast in Manet
Mounia Doudou, Lamia Mahnane, Mehdi Nafaa

Adaptive Atomicity in Web Services Composition
Zohra Mahfoud, Nadia Nouali-Taboudjemat

A Hierarchical Fault Tolerant Routing Protocol for WSNs
Yasmine Djebaili, Amal Bourmada, Azeddine Bilami

A new approach for Multiple Criteria Collaborative Decision Making Process (MCCDM): A combined use between AHP and WEB DSS
Bessedik Imene, Taghezout Noria

A Novel Access Control Model for Securing Cloud API
Bendiab Keltoum, Benmohammed Mohamed, Batouche Mohamed
Off-line handwritten signature verification using variants of local binary patterns
Yasmine Serdouk, Hassiba Nemmour, Youcef Chibani

Neutron Image Segmentation: Morphological Processing, Prediction of Fluid Flows inside Opaque Metallic Systems
Sekhri Arezki, Djamila Hamdadou, Beldjilali Bouziane

Distributed Video Coding of the Chroma Components for the Capsule Endoscopy images
Djamel Eddine Boudechiche, Said Benierbah, Mohammed Khamadja

Symbolic Representation of Satellite Images Time Series using SAX
Dalila Attaf, Sidahmed Benabderrahmane, Djamila Hamdadou, Aicha Lafrid

CAD System for Breast Masses classification using Improved S3VM Learning Algorithm and Heterogeneous Features: Application on Mammogram Images
Zemmal Nawel, Nabiha Azizi

Using the Fractional Model Reference for Tracking Trajectory in Adaptive Control
Bensafia Yacine, Khettab Khatir, Ladaci Samir

Towards a new agent based approach for modeling the business rules processes in small and medium enterprise in Algeria
Sad Houari Nawal, Taghezout Noria

Improved Cuckoo Search Algorithm for DNA Fragment Assembly Problem
Widad Kartous, Salim Chikhi

ADOM: Arabic Dataset for Ontology Matching
Abderrahmane Khiat, Moussa Benaissa

Calcul du skyline par réduction de l’espace candidat: nouveaux résultats
Zekri Lougmiri, Belaicha Hadjer

E-Tutoring Using Text Mining Technology for Higher Education
Farida Bouarab-Dahmani

Author Index
Keynote #1
Time Synchronization in Wireless Sensor Networks
Dr. Djamel Djenouri, CERIST Research Center, Algiers, Algeria

The challenging problem of time synchronization in wireless sensor networks (WSN) will be addressed. In the first part of the talk, some general concepts and the problem statement will be presented, along with a detailed state of the art of the related literature. More focus will be given to practical challenges related to implementation on sensor motes. The solutions that we recently proposed and developed will be presented in the second part. Most of these solutions use the receiver-to-receiver principle introduced by Reference Broadcast Synchronization (RBS), which reduces the time-critical path compared to the sender-to-receiver approach. They are also distributed and use point-to-point relative synchronization. Both local (single-hop) and multi-hop synchronization are considered; the latter is generally proposed as a smooth extension based on final local estimates, with no forwarding of synchronization signals. Maximum likelihood estimators (MLE) of the relative skew/offset for channels with Gaussian-distributed delays, as well as exponential delays, are also presented, along with the Cramer-Rao lower bounds (CRLB) for numerical comparison with the MLE’s mean square error (MSE). The solutions have been evaluated by simulation and compared with the appropriate state of the art. More importantly, some have been implemented and tested on real motes. Results confirm microsecond-level precision, and long-term stability when using the skew/offset model. The challenges we faced during the implementation will be discussed.
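The receiver-to-receiver idea with a skew/offset model can be illustrated with a small sketch. It is not the estimator presented in the talk. It assumes a linear relative clock model, t_b = skew * t_a + offset, disturbed by a single zero-mean Gaussian noise term that lumps together the difference in reception delays; under that assumption the maximum likelihood estimate of (skew, offset) coincides with an ordinary least-squares line fit. The function names, the one-second beacon period, and the noise level are illustrative assumptions.

```python
import random


def estimate_skew_offset(t_a, t_b):
    """Least-squares fit of t_b ~ skew * t_a + offset.

    Under the assumed i.i.d. Gaussian noise on t_b this is also the
    maximum likelihood estimate of the relative skew and offset.
    """
    n = len(t_a)
    mean_a = sum(t_a) / n
    mean_b = sum(t_b) / n
    sxx = sum((a - mean_a) ** 2 for a in t_a)
    sxy = sum((a - mean_a) * (b - mean_b) for a, b in zip(t_a, t_b))
    skew = sxy / sxx
    offset = mean_b - skew * mean_a
    return skew, offset


def simulate_receptions(n_beacons, skew, offset, sigma, seed=0, period=1.0):
    """Simulate two receivers timestamping the same reference broadcasts.

    Receiver A's timestamps are taken as the reference axis; receiver B's
    clock runs with the given relative skew and offset, plus Gaussian noise
    (standard deviation sigma, in seconds) modelling the delay difference.
    """
    rng = random.Random(seed)
    t_a = [k * period for k in range(n_beacons)]
    t_b = [skew * a + offset + rng.gauss(0.0, sigma) for a in t_a]
    return t_a, t_b


if __name__ == "__main__":
    # Hypothetical values: 50 ppm relative skew, 0.25 s offset, 20 us jitter.
    t_a, t_b = simulate_receptions(200, skew=1.00005, offset=0.25, sigma=20e-6)
    skew_hat, offset_hat = estimate_skew_offset(t_a, t_b)
    print(f"estimated skew={skew_hat:.8f}, offset={offset_hat:.6f} s")
```

Because only the two receivers' timestamps of the same broadcast are compared, the sender's transmission and access delays never enter the fit, which is the reduction of the time-critical path that the abstract attributes to the receiver-to-receiver approach.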
Biography: Djamel Djenouri obtained his PhD in computer science from the University of Science and Technology (USTHB), Algiers, Algeria, in 2007, under the supervision of Prof. Nadjib Badache. During his PhD he visited John Moores University in Liverpool, UK, where he carried out collaborative work with researchers of the “Distributed Multimedia Systems and Security” group. From 2008 to 2009 he held a post-doctoral fellowship from the European Research Consortium on Informatics and Mathematics (ERCIM) and worked at the Norwegian University of Science and Technology (NTNU) in Trondheim, Norway, where he participated in the MELODY project supported by the Norwegian Research Council. Currently, Dr Djamel Djenouri is a permanent full-time researcher with the CERIST research centre in Algiers. His research focuses on ad hoc and sensor networking, especially on the following topics: quality of service, security, power management, routing protocols, MAC protocols, fault tolerance, sensor and actuator networks, and vehicular applications. Dr Djamel Djenouri has participated in many international conferences. He has published several papers in international peer-reviewed journals and conference proceedings, and two books. He is a professional member of the ACM, and chaired workshops held in conjunction with DCOSS 2010-2012 and GLOBECOM 2010-2012. He has also served as a TPC member of many international conferences, such as IEEE LCN (2009-2012), IEEE GLOBECOM 2013, IEEE ICUMT (2009-2012), and IEEE ISABEL (2009-2012), and he has been a reviewer for many international journals, including many IEEE transactions. In 2008, Djamel Djenouri was granted the best publication award from ANDRU, supported by the Algerian government, and in 2010 the CERIST best researcher award.
Dr Djamel Djenouri currently works on several aspects of wireless sensor networks in the ongoing projects he manages, with more focus on applications for vehicular traffic management, integration with RFID, and the Internet of Things.
Keynote #2
Big Data Analytics : Mort ou Nouvelle vie des Systèmes d’Information Décisionnels (Big Data Analytics: Death or New Life for Decision-Support Information Systems)
Pr. Omar Boussaid, Laboratoire ERIC, Département Informatique et Statistiques, Institut de Communication, Université Lyon 2

The advent of Big Data raises several challenges for scientific and technological research. Beyond the buzz around this phenomenon, which is overexploited by the media and by business, genuine scientific obstacles arise in a context where data is becoming the central concern of researchers from several communities. Data science is now a new field of research that researchers from diverse disciplines are investigating. Companies today are looking for the “new gem” that is the data scientist. To illustrate these new trends, some reference points and scientific problems are presented, pointing to a few new research directions. Through the intersection of BI (Business Intelligence) and Big Data, an overview of research work is presented that shows the current concerns of researchers.

Biography: Omar Boussaid is a full professor of computer science at Université Lumière Lyon 2. His research focuses on the warehousing and online analysis of complex data. These, together with semantic analysis (Semantic OLAP), are among the main lines of his current work. The multidimensional modeling of textual data and its analysis through graph cubes of social networks, and community detection, are examples of the scientific interests on which his research leadership and supervision currently rest. His current research concerns data warehousing and mining, XML warehouses, the coupling of OLAP with data mining and information retrieval, social OLAP with community detection and analysis, and distributed warehouses and OLAP using the MapReduce paradigm and NoSQL databases to build OLAP cubes in the cloud environment.
Topic : Computer Networks and Security
Smart Monitoring for Semantic Wireless (SCADA/DCS) Systems with SNMP Protocol
ACS - Advanced Cloud Simulator: A Discrete Event Based Simulator for Cloud Computing Environments
CP ORBAC to secure access to data on cloud using implicit security
A New Bio-Inspired Technique of Artificial Social Cockroaches for Spam Detection with Visual Result Mining
BIARV: Bio-Inspired Approach for Routing in Vehicular Ad Hoc Networks
Recommending relevant GitHub repositories: a collaborative-filtering approach
Hypergraph for System of Systems modeling
Application layer versus IP Multicast in Manet
Adaptive Atomicity in Web Services Composition
A Hierarchical Fault Tolerant Routing Protocol for WSNs
A new approach for Multiple Criteria Collaborative Decision Making Process (MCCDM) : A combined use between AHP and WEB DSS
A Novel Access Control Model for Securing Cloud API
Smart Monitoring for Semantic Wireless (SCADA/DCS) Systems with SNMP Protocol
Sahli Nabil (1), El-Bay Bourennane (2), BenMohammed Mohamed (3)
(1) Le2i Laboratory, Burgundy University, BP 47870, 21078 Dijon cedex, France, and LIRE Laboratory, Constantine 2 University; SONELGAZ GROUP & AIG Association, Algeria, [email protected] or [email protected]
(2) Le2i Laboratory, Burgundy University, BP 47870, 21078 Dijon cedex, France, [email protected]
(3) LIRE Laboratory, Constantine 2 University, Computational Department, Algeria, [email protected]
Abstract: In this paper we propose a new implementation of the SNMP protocol through Web Services (WS) and the SOAP protocol, for monitoring wireless semantic (SCADA/DCS) systems and the (IT-SCADA) platform. (WS) and SOAP may run on top of secure transport services implemented through protocols like HTTPS, combined with industrial protocols such as (DNP3, SOAP, ISM BAND, and ZIGBEE). We chose the ZIGBEE wireless protocol for implementing our semantic security (IT-SCADA) management platform with the semantic SNMP protocol via WS. To detect infiltration of the wireless radio network and semantic attacks, (WS) gateways are needed to include SNMP devices in the (WS)-based management architecture. The evaluation shows that (WS) gateways created with SNMP and SOAP resolve many new semantic attacks and vulnerabilities. We integrate a security mechanism in the header of the (SNMP/SOAP/ZIGBEE) frame and obtain a new semantic security intelligent monitoring protocol named (SNMP/SOAP/ZIGBEE/SECURITY). We are working to implement our solution in the SCADA and DCS systems of SONELGAZ Group, Algeria.

Keywords: Semantic Wireless (SCADA/DCS); Web Services (WS); Semantic SNMP Protocol; (IT-SCADA) platform.

I. INTRODUCTION
(SCADA/DCS) systems use IT devices to control physical processes in critical industrial platforms. Because these systems can be spread over large distances, wired connections become infeasible, which leads to the use of wireless devices and protocols. Commercially available wireless radios are often used in place of wires to connect network nodes. An example of a wireless semantic (SCADA/DCS) system, used in SONELGAZ Group Algeria, is presented in Figure 1.

Figure 1. SONELGAZ Group Algeria SCADA system

Today a new generation of (SCADA/DCS) systems exists on the market: semantic intelligent SCADA and semantic intelligent DCS systems. We refer to both as semantic (SCADA/DCS) systems because they share the same vulnerabilities and security solutions, and because semantic applications are embedded in the intelligent devices used in these systems. Semantic wireless (SCADA/DCS) systems are a new vision of semantic intelligent control systems, in which web resources are enriched with machine-processable metadata that describes their meaning, and the new semantic (SCADA/DCS) application becomes a semantic web application that uses web standards such as (XML, RDF, RDFS, OWL, OWL-S, SWRL, etc.) and ontologies. For semantic wireless (SCADA/DCS) to be secure, we need to ensure that all layers of the embedded semantic web applications are secure, including (secure XML, secure application layer protocols and secure wireless communication protocol), as presented in Figure 2.

Figure 2. Multiple layers in the semantic application [1]

In addition, the whole (WS) architecture is composed of other auxiliary infrastructures that improve (WS) applicability, as presented in Figure 3. All communication in SOA (Service Oriented Architecture) uses the SOAP protocol, and all (WS) are published in the UDDI (Universal Description, Discovery and Integration) registry.

Figure 3. SOA Web Services architecture
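To make the idea of carrying SNMP management requests over SOAP with a security element in the header more concrete, here is a minimal sketch. It is not the authors' (SNMP/SOAP/ZIGBEE/SECURITY) frame format, which this section does not specify. The `SnmpGet` operation, the `urn:example:snmp-gateway` namespace, the `Mac` header element, and the choice of HMAC-SHA256 over "community|oid" are all hypothetical; only the SOAP 1.1 envelope namespace and the Python standard-library calls are real.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
GW_NS = "urn:example:snmp-gateway"  # hypothetical gateway namespace (assumption)


def _mac(community: str, oid: str, key: bytes) -> str:
    """Illustrative integrity tag: HMAC-SHA256 over 'community|oid'."""
    return hmac.new(key, f"{community}|{oid}".encode(), hashlib.sha256).hexdigest()


def build_snmp_get_envelope(oid: str, community: str, key: bytes) -> bytes:
    """Wrap a hypothetical SnmpGet request in a SOAP envelope.

    The security element is placed in the SOAP Header, mirroring the idea of
    putting the security mechanism in the frame header.
    """
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")

    request = ET.SubElement(body, f"{{{GW_NS}}}SnmpGet")
    ET.SubElement(request, f"{{{GW_NS}}}Community").text = community
    ET.SubElement(request, f"{{{GW_NS}}}Oid").text = oid

    ET.SubElement(header, f"{{{GW_NS}}}Mac").text = _mac(community, oid, key)
    return ET.tostring(envelope, encoding="utf-8")


def verify_envelope(xml_bytes: bytes, key: bytes) -> bool:
    """Gateway-side check: recompute the tag and compare in constant time."""
    root = ET.fromstring(xml_bytes)
    tag = root.findtext(f"{{{SOAP_NS}}}Header/{{{GW_NS}}}Mac")
    base = f"{{{SOAP_NS}}}Body/{{{GW_NS}}}SnmpGet"
    community = root.findtext(f"{base}/{{{GW_NS}}}Community")
    oid = root.findtext(f"{base}/{{{GW_NS}}}Oid")
    if tag is None or community is None or oid is None:
        return False
    return hmac.compare_digest(tag, _mac(community, oid, key))


if __name__ == "__main__":
    shared_key = b"demo-key"  # placeholder secret for the example only
    # 1.3.6.1.2.1.1.3.0 is the standard MIB-II sysUpTime.0 object.
    message = build_snmp_get_envelope("1.3.6.1.2.1.1.3.0", "public", shared_key)
    print(message.decode("utf-8"))
    print("valid:", verify_envelope(message, shared_key))
```

In a deployment along the lines the paper describes, such an envelope would travel over a secure transport such as HTTPS and a gateway would translate the body into a native SNMP request; the sketch stops at building and checking the message.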
These trends have moved semantic wireless (SCADA/DCS) networks and smart grid networks [2] from proprietary, closed networks to the arena of information technology (IT), with its costs and benefits and its IT security challenges. It is simple to fool a field device into taking a dangerous command in semantic wireless (SCADA/DCS) systems; cyber attackers such as (script kiddies, hackers, organized crime, disgruntled insiders, competitors, terrorists, activists, eco-terrorists, etc.) can deploy sophisticated intrusion detection and cyber monitoring web services.

We present in Figure 4 the semantic wireless (SCADA/DCS) system diagram.

Figure 4. The semantic wireless (SCADA/DCS) diagram

The use of web services embedded in semantic wireless (SCADA/DCS) industries creates major new vulnerabilities in these systems and in their wireless connectivity protocols, such as ZIGBEE and ISM Band. The use of web services embedded in semantic (SCADA/DCS) smart devices such as (RTU - Remote Terminal Unit / PLC - Programmable Logic Controller) creates major new semantic vulnerabilities in wireless (automation and control systems) [3] and their wireless protocols. These vulnerabilities are used by hackers and terrorists to cause very dangerous damage and explosions, as presented in Figure 5.

Figure 5. Samples of dangerous damage caused by hackers

The intelligent and security management of computer networks has been facing interesting and challenging problems for years. The diversity of network devices led the standardization bodies to define management protocols that are able to provide a reduced number of device management interfaces. The definition of SNMP (Simple Network Management Protocol) [4] was a key step in this direction. Network players and users then started to build their intelligent security monitoring solutions based on SNMP and the multi-agent paradigm, with an SNMP agent embedded in intelligent (SCADA/DCS) devices, which improved knowledge and semantic security of the managed networks, such as (IT-SCADA) platforms, and of their behaviors, as presented in Figure 6.

Figure 6. Network Management with intelligent SNMP

II. WIRELESS (IT-SCADA) PLATFORMS
Critical infrastructures are those systems that, if disrupted or destroyed, will cause widespread loss of essential services to a nation’s citizens. The loss of these infrastructures could even lead to the destruction of the nation itself. Semantic wireless (SCADA/DCS) provides management with real-time data on production, improves plant and personnel safety, and reduces operating costs. These benefits are made possible through standard hardware and software components, improved communication protocols and increased connectivity to outside networks. This efficiency comes at a price: increased vulnerability of the system to internal and external sources through the (IT-SCADA) platform.
Many wireless vulnerabilities and attack mechanisms exist that can be used to destroy semantic wireless (SCADA/DCS) systems, as presented in Figure 8, using network identifiers such as smartphone numbers and IP addresses.
The new IT networks connected to extranets and the internet, which we name (IT-SCADA) platforms [5], typically use embedded semantic web services and XML databases, which create major security problems in addition to the older ones [5, 6, 7, 8, 9, 10]. We present in Figure 7 the new (IT-SCADA) platforms that connect wireless semantic (SCADA/DCS) systems to the internet.
Figure 8. Hacker’s mechanism for semantic wireless (SCADA/DCS) systems attacks
Figure 7. Modern (IT-SCADA) platform connected wireless semantic (SCADA/DCS) to internet [11]
III. WEB SERVICES AND SNMP GATEWAYS
Web Services (WS) can be broadly described as a set of technologies that allow transactions over the internet. Each (WS) is a piece of software that receives service invocations from clients using internet protocols (e.g. HTTP, SMTP, SSL/TLS and FTP). A (WS) is composed of a set of operations exposed to external clients. The clients of a (WS) can be, for example, Web browsers, end-user applications or even other (WS). In this last case, one (WS) can request operations from another (WS), which allows a hierarchy of (WS) requests and the creation of very complex (WS) based on the invocation of simpler ones. The description of (WS) through WSDL and their registration in a UDDI registry are not addressed in the work done so far.
Yoon-Jung Oh et al. [12] define SNMP to XML gateways and three methods for interactive translation: DOM-based translation, HTTP-based translation, and SOAP-based translation. In the DOM-based translation, an XML-based manager calls a DOM interface that resides in the gateway. This call is then translated to SNMP operations between the gateway and the target device. With the HTTP-based translation, the gateway receives XPath and XQuery expressions coded by an XML-based manager. These expressions are then translated to SNMP requests. This method of translation is especially interesting because information filtering can be executed directly in the gateway, which reduces the management traffic between the XML-based manager and the gateway, although a processing overhead is introduced. Finally, in the SOAP-based translation, the gateway exports more sophisticated services accessed by the XML-based manager. With these services the manager can look up information with XPath or perform complex queries through XQuery expressions.
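To illustrate the filtering idea behind the HTTP-based translation, the following minimal sketch applies an XPath expression inside a gateway so that only matching rows travel back to the manager. It is not the implementation of [12]: the interface rows, the element names and the functions `rows_to_xml` and `query_gateway` are hypothetical, the SNMP retrieval is replaced by an in-memory list, and only the XPath subset supported by Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows that a gateway might have collected via SNMP.
SNMP_ROWS = [
    {"ifIndex": "1", "ifDescr": "eth0", "ifOperStatus": "up"},
    {"ifIndex": "2", "ifDescr": "eth1", "ifOperStatus": "down"},
    {"ifIndex": "3", "ifDescr": "wlan0", "ifOperStatus": "up"},
]


def rows_to_xml(rows):
    """Build an XML view of the collected management data."""
    root = ET.Element("ifTable")
    for row in rows:
        entry = ET.SubElement(root, "ifEntry", index=row["ifIndex"])
        for name in ("ifDescr", "ifOperStatus"):
            ET.SubElement(entry, name).text = row[name]
    return root


def query_gateway(xpath, rows=SNMP_ROWS):
    """Filter inside the gateway; only matching descriptions are returned."""
    root = rows_to_xml(rows)
    return [entry.findtext("ifDescr") for entry in root.findall(xpath)]
```

For example, `query_gateway("./ifEntry[ifOperStatus='up']")` selects only the interfaces that are up, so the down interface never leaves the gateway.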
IV. PROTOCOL SNMP TO WS GATEWAYS
In our approach, two levels of SNMP to WS gateways are supported: protocol-level and object-level gateways. The protocol-level gateway [13, 14] directly maps SNMPv1 primitives (e.g. Get, GetNext, and Set) to (WS) operations. An object-level gateway, in turn, offers operations that reflect the structure of the management information; (GetIfTable) is an example of an operation offered by an object-level gateway. Currently, neither the protocol-level nor the object-level gateway supports the translation of SNMP traps to (WS), because we are primarily interested in observing Get, GetNext and Set. In Figure 9, we present the general picture when a WS-based manager uses the protocol-level gateway.
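The one-to-one mapping performed by a protocol-level gateway can be sketched as follows. This is only an illustration under stated assumptions: `FakeSnmpAgent` is a hypothetical in-memory stand-in for a real SNMP agent, the SOAP transport is omitted, and the class and method names are ours rather than those of the prototype.

```python
from bisect import bisect_right


def parse_oid(text):
    """Turn '1.3.6.1' into (1, 3, 6, 1) so that OIDs sort numerically."""
    return tuple(int(part) for part in text.strip(".").split("."))


def format_oid(oid):
    return ".".join(str(part) for part in oid)


class FakeSnmpAgent:
    """Hypothetical in-memory agent used instead of a real device."""

    def __init__(self, community, objects):
        self.community = community
        self.objects = {parse_oid(k): v for k, v in objects.items()}

    def _authorize(self, community):
        if community != self.community:
            raise PermissionError("bad community string")

    def get(self, community, oid):
        self._authorize(community)
        return self.objects[parse_oid(oid)]

    def get_next(self, community, oid):
        self._authorize(community)
        ordered = sorted(self.objects)
        position = bisect_right(ordered, parse_oid(oid))
        if position == len(ordered):
            raise LookupError("end of MIB view")
        next_oid = ordered[position]
        return format_oid(next_oid), self.objects[next_oid]

    def set(self, community, oid, value):
        self._authorize(community)
        self.objects[parse_oid(oid)] = value
        return value


class ProtocolLevelGateway:
    """Each WS operation triggers exactly one SNMP primitive."""

    def __init__(self, agents):
        self.agents = agents  # device address -> agent

    def Get(self, address, community, oid):
        return self.agents[address].get(community, oid)

    def GetNext(self, address, community, oid):
        return self.agents[address].get_next(community, oid)

    def Set(self, address, community, oid, value):
        return self.agents[address].set(community, oid, value)
```

Each gateway operation takes the three SNMP-related parameters named in the text (target address, community, OID), which keeps the gateway stateless between requests.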
Figure 9. (SNMP/SOAP) gateway to (WS) for (IT-SCADA) semantic security management
A WS-based manager requests management information by accessing the SNMP to WS gateway through SOAP messages. In our prototype implementation, the SOAP messages run over either HTTP or HTTPS. This allows the management station to be a standard Web browser that accesses an (HTTP/HTTPS) Web server. This server, which contains the gateway, receives from the WS-based manager the identification of the operation to be accessed (e.g. Get or Set) and a list of SNMP-related parameters (the address of a target device, a valid SNMP community, and an SNMP OID). With this information, the appropriate operation within the (WS) gateway is invoked, and the target device is accessed using SNMP to carry out the requested management primitive. We have implemented the Get, GetNext, and Set services, which generate, for each WS-based manager request, exactly one SNMP request from the gateway to the target device and exactly one SNMP reply from the target device back to the gateway. After the SNMP information is retrieved from the target device, the gateway compiles it into a SOAP message and sends it back to the WS-based manager. In our semantic security solution for semantic (SCADA/DCS) systems, we embedded the (SOAP/ZIGBEE/UDP) frame proposed in our publication [15] in the body of the SNMP frame and integrated our security mechanisms in the SNMP header, as presented in Figure 10.
Figure 10. (SNMP/SOAP/ZIGBEE/SECURITY) protocol frame (SNMP header: our security mechanisms; SNMP body: (SOAP/ZIGBEE/UDP) frame)
We used security tokens for authentication and authorization, integrated in the (SNMP/SOAP) header: a «Username Token» composed of (Username/password, Username/digest), and a binary token with an X509 certificate. We encrypted and signed the global header information with our mixed coordinates ECC cryptography [3, 11] and a hash function, as presented in Figure 11.
Figure 11. (SNMP/SOAP/ZIGBEE/SECURITY) security protocol header frame mechanism (header with security tags and security token, encrypted and signed; (SNMP/SOAP) body)
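The (Username/digest) form of the Username Token mentioned in this section can be sketched as follows. The paper does not give the digest formula, so this sketch assumes the one defined by the OASIS WS-Security UsernameToken Profile, Base64(SHA-1(nonce + created + password)); the dictionary layout and the function names are hypothetical, and the ECC encryption and signature of the header are not shown.

```python
import base64
import hashlib
import hmac
import os
from datetime import datetime, timezone


def password_digest(nonce, created, password):
    """Base64(SHA-1(nonce + created + password)), an assumed digest formula."""
    raw = nonce + created.encode("utf-8") + password.encode("utf-8")
    return base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")


def make_username_token(username, password):
    """Build the token fields that a sender would place in the header."""
    nonce = os.urandom(16)
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "username": username,
        "nonce": base64.b64encode(nonce).decode("ascii"),
        "created": created,
        "digest": password_digest(nonce, created, password),
    }


def verify_username_token(token, password):
    """Recompute the digest on the receiver side and compare in constant time."""
    nonce = base64.b64decode(token["nonce"])
    expected = password_digest(nonce, token["created"], password)
    return hmac.compare_digest(expected, token["digest"])
```

The password itself never appears in the token; the receiver, which knows the password, recomputes the digest from the nonce and timestamp carried in the header.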
In semantic (SCADA/DCS) transport communication we used the UDP protocol to support the HTTP protocol and web technologies (Web Services, XML, etc.), and we adopted the (6LoWPAN) transport mechanism, designed for IPv6, to produce efficient MAC frames. It provides an IPv6-based protocol stack for the 802.15.4 network standard. We summarize the new proposed (SNMP/SOAP/ZIGBEE/SECURITY) protocol stack in Figure 12.
Figure 12. The new Security Semantic (SNMP/SOAP/ZIGBEE/SECURITY) Stack (layers, top to bottom: (SNMP/SOAP/ZIGBEE) Network Management; Security; UDP; IPv6; 6LoWPAN Adaptation for IPv6; 802.15.4 MAC; 802.15.4 PHY)
V. SNMP PROTOCOL SECURITY MECHANISM
Security issues are still open. Although SNMPv3 [16] was created to address such issues, configuration management is still a big problem: network administrators do not rely on SNMP to manage device configuration, even though the IETF has been working on SNMP for configuration [17]. Other network configuration-related work is being developed by the IETF working group [18], where the use of XML and related protocols as a substitute for SNMP is investigated. The range of available tools to deal with (WS) and the native support for (WS) in software development platforms are greater than the facilities available in the SNMP world. Also, (WS) are implemented through SOAP [19], a protocol whose security issues have been taken into account since its beginning. SOAP itself may run on top of secure transport services implemented through protocols like HTTPS [20, 21], encrypted SMTP, and Secure-FTP. In addition, while SNMP is almost only used in the communication between management stations and network devices (despite the SNMPv3 efforts), (WS) could be used to improve the communication between different management platforms or even to build complex management systems based on simpler management services.
The security mechanisms integrated in SNMP version 3 are presented in Figure 13.
Figure 13. Security mechanisms in SNMP version 3 protocol [16]
Probably one of the most interesting gains obtained with (WS) is the easier integration of network management with other critical processes, such as semantic (SCADA/DCS) systems. Although this picture may be quite fascinating, the introduction of (WS) in network management and (IT-SCADA) platforms is not free. Performance issues on the managed intelligent devices and the consumption of (IT-SCADA) network bandwidth by the management traffic imposed by the use of (WS, SOAP) may prevent an effective intelligent management solution. Also, it is not realistic to believe that (WS) would be deployed on all the elements of the whole (IT-SCADA) network management process. For example, SNMP-based devices will surely remain in future networks. The (SNMP/SOAP/ZIGBEE/Security) gateway implementation and the design of (IT-SCADA) platform management with (WS) are presented in Figures 14 and 15.
Figure 14. (IT-SCADA) platform semantic security management with (SNMP/SOAP/ZIGBEE/Security) gateway
Figure 15. (SNMP/SOAP/ZIGBEE/Security) gateway to (WS) implementation
An example of the (SNMP-MIB) implementation in our security solution is presented in Figure 16.
Figure 16. Our (SNMP-MIB) implementation
(WS) and SOAP, which we name (WS/SOAP), will more often be found near the network administrator interface, while SNMP will obviously be found at the device interface, as presented in Figure 17. One important issue arises here: where, in future management systems, will we find the line between (WS/SOAP) and the established management technologies such as the SNMP protocol?
Figure 17. SNMP to (WS) and semantic Security
This approach, however, loses flexibility when the SNMP agent of the target device changes (either to support new SNMP objects or to remove support for other objects). In this case, the associated (WS) need to be rebuilt to reflect the SNMP agent changes. We therefore need an efficient way to create object-level SNMP to (WS/SOAP) gateways. We have implemented systems that, given an (SMI MIB) file, automatically create a new (WS) that uses SOAP messages [22] for data communication.
In Figure 18, we present the architecture that supports the creation of new object-level SNMP to (WS/SOAP) gateways, with MIB file to XML conversion.
Figure 18. Creating new object-level SNMP to (WS/SOAP) gateway
Each node in the original MIB tree is transformed into operations of the newly generated (WS). These operations are instrumented with code able to contact a target device via (SNMP/SOAP/ZIGBEE/SECURITY). The target device and its associated (SNMP/SOAP/ZIGBEE/SECURITY) string are treated, within the instrumented code, as parameters whose values are provided when the operation is invoked by the WS-based manager. The newly created (WS) is stored in a standard directory on the Web server and is available to be invoked just after its creation, and all data exchanged between the server and remote intelligent devices uses the SOAP format. The original MIB provided by the manager is also stored in another standard directory for documentation purposes, as well as the intermediate XML document generated by the (smidump) tool prior to the parsing step. While the parsing step creates the code for the new (WS), it also builds a WSDL document that describes the created (WS). The WS-based manager that requested the creation of a new (WS) optionally provides the URL of a UDDI repository where the created (WS) is then registered. The WSDL document is stored in a standard directory as well. With the gateway creation process, new MIBs can easily be added to a WS-based management environment. With this mechanism we create the security semantic gateway (SNMP/SOAP/ZIGBEE/SECURITY) for security semantic intelligent management using (WS) and semantic security mechanisms.
We proposed in our global security solution a new security (IT-SCADA) platform composed of six levels (level 0 to level 5), using a semantic security protocols framework [15] and a security block [3, 11]. The new (IT-SCADA) platform is composed of a control zone (SCADA/DCS) covering level 0 to level 3, a DMZ (Demilitarized Zone) composed of public servers and services, and an enterprise zone covering level 4 and level 5, composed of ERP (Enterprise Resource Planning) and IT applications, as presented in Figure 19.
Figure 19. New security wireless semantic (IT-SCADA) platform
We present in Figure 20 the new security semantic (SCADA/DCS) stack that we propose for semantic wireless security (SCADA/DCS) applications. It is composed of five levels: level 1 (secure TCP/IP with IPv6, HTTPS and secure sockets), level 2 (XML security), level 3 (RDF security), level 4 (secure ontologies) and level 5 (logic, proof, trust), providing end-to-end semantic security in semantic (SCADA/DCS) systems and (IT-SCADA) platforms.
Figure 20. New security semantic (SCADA/DCS) applications and software’s stack
VI. CONCLUSION
In this paper we investigated two main issues: the integration of established management technologies with (WS) and SOAP, which we name (WS/SOAP), and the impact of using (WS) in a managed network associated with SNMP, through semantic gateways associated with a semantic security block. Gateways are a requirement for the integration of the established management technologies and (WS). Since we believe that, for more effective gains, (WS/SOAP) have to be offered as early as possible in a bottom-up perspective of the network management software hierarchy, we have developed SNMP to (WS) and ZIGBEE gateways and a security semantic protocol version, which we name (SNMP/SOAP/ZIGBEE/SECURITY), intended to be placed closer to network smart devices. Planning an SNMP to WS gateway, however, involves some important decisions that can impact performance, bandwidth consumption, and (WS) deployment. The protocol-level gateways are quite simple and provide the mapping of SNMP primitives to (WS) operations. We proposed a new semantic security solution for (IT-SCADA) platforms connected to the internet and for semantic (SCADA/DCS) systems, a new semantic security stack, and a new (IT-SCADA) topology. In future work we would like to analyze the performance of the solution (bandwidth, energy) and to implement it on real equipment in the SONELGAZ group, Algeria.
REFERENCES
[1] Devendra Kumar Sloni, V. K. Sharma, Safe semantic web and security aspect implication for social networking, International Journal of Computer Applications in Engineering Sciences, ISSN: 2231-4946, vol. I, Issue II, June 2011, pp. 141-149.
[2] Smart Grid Cyber Security Strategy and Requirements, NISTIR 7628. www.nist.gov/smartgrid, 2014.
[3] N. Sahli, M. Benmohammed, “Security solution for semantic SCADA optimized by ECC cryptography mixed coordinates”, IEEE ICITeS, 2nd International Conference, Sousse, Tunisia, ISBN: 978-2-4673-1167-0, 2012, pp. 230-235. IEEE Xplore.
[4] R. Neisse, L. Z. Granville, M. J. Almeida, and L. Tarouco. A Dynamic SNMP to XML Proxy Solution. 8th IFIP/IEEE International Symposium on Integrated Network Management (IM 2003), pages 481–484, March 2003.
[5] Sahli Nabil, Benmohammed Mohamed and El-Bay Bourennane, Secure solution for modern semantic SCADA, JST9, SONATRACH, 7-10 April 2013, Oran, Algeria, 2013.
[6] K. Khan, “Security Characterization and Compositional Analysis for Component-based Software Systems”, PhD thesis, Monash University, April 2005.
[7] P. Lindstrom, “Attacking and Defending Web Services”, a Spire Research Report, January 2004.
[8] S. Faut, “SOAP Web Services Attacks: Are your web applications vulnerable”, SPI Dynamics, 2003.
[9] J. Mirkovic, “D-WARD: Source-End Defence Against Distributed Denial-of-Services Attacks”, PhD thesis, University of California, 2003.
[10] Amit Klein. Blind XPath Injection, http://www.sanctuminc.com/pdfc/WhitePaper_Blind_XPath_Injection_20040518.pdf
[11] Sahli Nabil, Benmohammed Mohamed and El-Bay Bourennane, Ontology and protocol secure for SCADA, International Journal Meta Data Semantic Ontology (IJMSO), Inderscience Etd, ISSN: 1744-2621, vol. 9, N° 02, 2014, pp. 114-127.
[12] Y. J. Oh, H. T. Ju, M. J. Choi, and J. W. Hong. Interaction Translation Methods for XML/SNMP Gateway. 13th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM 2002), pages 54–65, October 2002.
[13] F. Strauss. libsmi - A Library to Access SMI MIB Information, August 2003. http://www.ibr.cs.tu-bs.de/projects/libsmi/.
[14] J. Schönwälder, A. Pras, and J. P. Martin-Flatin. On the Future of Internet Management Technologies. IEEE Communications Magazine, Vol. 41, No. 10, pages 90–97, October 2003.
[15] Sahli Nabil, Benmohammed Mohamed and El-Bay Bourennane, Security Solution for Semantic Wireless (SCADA/DCS) Systems, Second International Conference on Distributed Systems and Decision, ICDSD’2014, ISSN 2335-1012, Oran, Algeria, December 07-08, 2014.
[16] M. MacFaden, D. Partain, J. Saperia, and W. Tackabury. Configuring Networks and Devices with Simple Network Management Protocol (SNMP), April 2003. IETF RFC 3512.
[17] IETF. Network Configuration. netconf Working Group, 2003. Available at: http://www.ietf.org/html.charters/netconf-charter.html.
[18] F. Curbera, M. Duftler, R. Khalaf, W. Nagy, N. Mukhi, and S. Weerawarana. Unraveling the Web Services Web: An Introduction to SOAP, WSDL, and UDDI. IEEE Internet Computing, Vol. 6, Issue 2, pages 86–93, March/April 2002.
[19] G. Goth. Grid Services Architecture Plan Gaining Momentum. IEEE Internet Computing, Vol. 6, Issue 4, pages 11–12, July/August 2002.
[20] A. Preece and S. Decker. Intelligent Web Services. IEEE Intelligent Systems, Vol. 17, Issue 1, pages 15–17, January/February 2002.
[21] K. Moore. On the use of HTTP as a Substrate, February 2002. IETF RFC 3205.
[22] N. Mitra. SOAP Version 1.2 Part 0: Primer, June 2003. W3C Recommendation. http://www.w3.org/TR/2003/REC-soap12-part0-20030624/
ACS - Advanced Cloud Simulator: A Discrete Event Based Simulator for Cloud Computing Environments
Samy Sadi
University of Oran1 Ahmed Benbella, Department of Computer Science, Oran, Algeria. Email: [email protected]
Belabbas Yagoubi
University of Oran1 Ahmed Benbella, Department of Computer Science, Oran, Algeria. Email: [email protected]
Abstract: Computing, storage and networking resources have never been so accessible since the emergence of Cloud Computing. Users can easily deploy their files and applications, and access services in an unprecedented way. However, as Cloud Computing gets more attention, new challenges arise and more research effort must be devoted to bringing new solutions. Evaluating these solutions on real test beds can be very difficult. In contrast, Cloud simulation is a handy way to achieve evaluation goals. In this paper, we present ACS - Advanced Cloud Simulator, an open source discrete event based Cloud simulator that enables simulation of different aspects of the Cloud including: resource provisioning algorithms, load balancing algorithms, fault tolerance and checkpointing techniques, migration methods and different STaaS (Storage as a Service) aspects.
Keywords: Cloud Computing, Cloud Simulation, Performance Evaluation, Load Balancing, Fault Tolerance, Checkpointing, Migration, Energy Efficiency, STaaS (Storage as a Service), Storage Cloud.
I. INTRODUCTION
The Cloud Computing paradigm [1], [2] has gained, mostly over the last few years, very high interest and prominence in both industry and the research community. Thanks to its numerous attractive features, such as ease of application deployment, elastic resource provisioning, transparently granted quality of service and a low running cost, the hype surrounding Cloud Computing has never been so marked. Nevertheless, this paradigm is still in its youth and faces many issues, to mention just a few: data and application security, fault tolerance and energy efficiency.
In-depth research addressing any of these concerns inevitably involves a testing phase in which the proposed solution is evaluated against various criteria. This testing phase is often challenging. Indeed, building an adequately sized test bed requires a high financial and time investment. On the other hand, using existing test beds, like Amazon EC2, does not guarantee reliable and reproducible results. Since such Clouds are public, reserving the whole Cloud for the sake of an experiment is not an option, so other applications may interfere. Moreover, some performance criteria may not be accessible in such Clouds. Cloud simulation has emerged as a natural solution to overcome the evaluation difficulties of real test beds. Simulation allows multiple scenarios with different levels of complexity to be studied easily. Cloud researchers can evaluate their proposals in a repeatable and controllable fashion without the need to invest in a highly costly environment. Many Cloud simulation tools have been presented in the literature. While many Cloud aspects have been addressed, many of these tools still require considerable development effort from researchers in order to evaluate their proposals.
In this paper, we present ACS - Advanced Cloud Simulator [3], an open source discrete event based simulator that enables easy simulation and evaluation of different aspects of the Cloud. The organization of this paper is as follows. In the next section, we give an overview of ACS including our objectives during the development stage and ACS’s features. In section III, we present the ACS architecture before we describe its implementation in section IV. In the penultimate section, we review existing Cloud simulation tools. Finally, in section VI, we conclude and give an overview of our future work.
II. OVERVIEW OF THE SIMULATOR
Cloud simulation is a convenient and, to some extent, a reliable and trustworthy way to achieve evaluation goals in research projects. These two traits highlight the main goals pursued during the development of ACS. In addition, characteristics commonly sought in software and simulators are also pursued, including efficiency, extensibility, portability, reproducibility of tests, and fidelity and accuracy [4]. As regards its features, ACS aims to provide all common functionalities of Cloud Computing environments. This is not a simple task, as Cloud Computing is a perpetually evolving paradigm. Nevertheless, ACS relies on its extensibility to deal with new functionality. The main features of ACS are described below:
• Evaluation of different resource provisioning algorithms (Computing, Storage and Networking);
• Networking simulation and support for different network topologies;
• Simulation of failure prone data centers and evaluation of fault tolerance techniques;
• Evaluation of power saving strategies;
• Evaluation of different Virtual Machine placement, consolidation and isolation techniques;
• Evaluation of load balancing algorithms;
• Evaluation of STaaS (STorage as a Service) models regarding files placement, replication and consistency management;
• Evaluation of SLA (Service Level Agreement) infringements;
• A full traceability of many simulation parameters including: power consumption, resources utilization and operating costs;
• Easy extension and configuration using configuration files.
III. ARCHITECTURE OF THE SIMULATOR
ACS is a discrete event based simulator [5] whose architecture is composed of multiple layers. Each layer extends the functionality provided by the lower layers. In this section, we present those layers, starting with the lowest-level core layer and moving up to the highest layer. The whole simulator architecture is illustrated in Fig.1.
around to tell that “the given hardware is used by such processes”. The above statement has one exception: it does not apply to the Simulator entity, which is the very first ancestor of all entities and has no parent entity itself. This entity is responsible for scheduling and running events in an ordered manner, starting from the oldest events. Each entity in the simulator can have an attached configuration. A configuration is a set of name-value pairs usually read from a configuration file. It contains the simulation input variables, including the list of algorithms to use in the simulation. ACS defines an easy way for entities to communicate with each other by using notifications. A notification is defined by a notification code and, optionally, other notification data. When an entity wants to “listen” for a notification, it registers a new NotificationListener on the target entity. The target entity holds a separate list of listeners for each notification code, and notifies them when needed.
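The discrete event principle on which this architecture relies, with a root Simulator entity that runs events in order starting from the oldest, can be sketched as follows. This is a minimal illustration and not the ACS code: the class layout, the `schedule` and `run` method names and the event labels in `demo` are assumptions.

```python
import heapq
import itertools


class Simulator:
    """Root entity: keeps pending events ordered by their timestamp."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def schedule(self, delay, action):
        """Register an action to run `delay` time units from the current time."""
        if delay < 0:
            raise ValueError("cannot schedule an event in the past")
        heapq.heappush(self._queue, (self.now + delay, next(self._counter), action))

    def run(self):
        """Pop events from the oldest to the newest and advance the clock."""
        while self._queue:
            time, _, action = heapq.heappop(self._queue)
            self.now = time
            action(self)


def demo():
    log = []
    sim = Simulator()
    sim.schedule(5.0, lambda s: log.append((s.now, "vm-started")))
    sim.schedule(1.0, lambda s: log.append((s.now, "host-powered-on")))
    # An event may itself schedule further events.
    sim.schedule(1.0, lambda s: s.schedule(2.0, lambda s2: log.append((s2.now, "job-submitted"))))
    sim.run()
    return log
```

The simulated clock jumps directly from one event time to the next, which is what distinguishes a discrete event simulator from a fixed time-step loop.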
Fig. 1. ACS Layered Architecture
Probes and traces are used to gather the simulation output. A probe keeps track of changes to a given piece of information and holds, at any moment of the simulation, the current value of that information. A trace gathers the history of a probe’s value changes throughout the simulation, and contains methods to save this information.
B. Hardware Layer
This layer mainly contains the definition of common Cloud physical components. Its architecture is depicted in Fig.3.
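The probe and trace mechanism of the Core Layer can be illustrated with the short sketch below. It is written under assumptions: the `Probe` and `Trace` class names follow the terms used in the text, but the `subscribe`, `set` and `to_csv` methods and the CSV output format are hypothetical choices made for this example.

```python
class Probe:
    """Holds the current value of one piece of simulation information."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def set(self, time, value):
        """Record a new value; unchanged values are not reported."""
        if value == self.value:
            return
        self.value = value
        for callback in self._listeners:
            callback(time, value)


class Trace:
    """Keeps the history of a probe's value changes and can export it."""

    def __init__(self, probe):
        self.records = []
        probe.subscribe(lambda time, value: self.records.append((time, value)))

    def to_csv(self):
        lines = ["time,value"] + [f"{t},{v}" for t, v in self.records]
        return "\n".join(lines)
```

A probe on its own answers "what is the value now?", while a trace attached to it answers "how did the value evolve?", which is the distinction made in the text.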
A. Core Layer
This layer contains all core functionality of the simulator. Its main components are presented in Fig.2.
Fig. 3.
Fig. 2.
The networking feature of the simulator is particularly noticeable in this layer. Indeed, networking is an important feature in the Cloud, and so it is in ACS. The NetworkDevice component defines a networkable device. It may have one or more interfaces, each connected using a specific link. The RoutingProtocol component is responsible for finding a route between two different network devices. The Switch and Host components are the main sub-components of the NetworkDevice component.
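As an illustration of what a RoutingProtocol component does, the sketch below finds a route between two network devices with a breadth-first search, which returns a route with the fewest hops. The text does not state which routing algorithm ACS uses, so the algorithm, the `connect` method and the `find_route` function are assumptions made for this example.

```python
from collections import deque


class NetworkDevice:
    """A networkable device (host or switch) with links to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []

    def connect(self, other):
        """Create a bidirectional link between two devices."""
        self.neighbors.append(other)
        other.neighbors.append(self)


def find_route(source, destination):
    """Return the device names along a fewest-hops route, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        device = queue.popleft()
        if device is destination:
            route = []
            while device is not None:
                route.append(device.name)
                device = previous[device]
            return route[::-1]
        for neighbor in device.neighbors:
            if neighbor not in previous:
                previous[neighbor] = device
                queue.append(neighbor)
    return None
```

A route is reported as the ordered list of devices to traverse, so a caller can then look up the links between consecutive devices.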
ACS Core Layer Architecture
One of the most important components in the presented architecture is surely the simulation entity, which has three main variants. The PoweredEntity describes an entity that can be powered on or powered off during the simulation. The FailureProneEntity describes an entity that can fail at a given moment of the simulation. And the RunnableEntity describes an entity that can run for a computable simulation delay.
Besides, the hardware layer defines three other child components for the Host component, namely the ProcessingUnit, the Ram and the Storage components. The latter two have virtual counterparts: the VirtualRam and the VirtualStorage components. Moreover, the RamZone and the StorageFile components are defined and represent a collection of data that can be stored in RAM or on storage, respectively.
Entities are organized in a parent-child fashion. Each entity in the simulation has one parent entity and may have one or more child entities. This makes it possible, for example, to tell that “the given process runs on such hardware” and the other way
ACS Hardware Layer Architecture
during the simulation. This enables the simulation of Federated Cloud Computing environments [6].
C. Virtualization Layer
This layer contains the virtualization functionality of the simulator. Its architecture is displayed in Fig.4.
Fig. 4. ACS Virtualization Layer Architecture
The VmPlacementPolicy component receives requests from users when they deploy new VMs. This component is responsible for finding a host that satisfies the users’ requirements, especially regarding available resources. Other objectives may also be considered during placement, for instance VM consolidation [7]. The JobPlacementPolicy component handles user requests for job placement. When a user wants to start a new job, this component selects, among all the user’s VMs, the VM that will host the job.
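A placement policy such as the VmPlacementPolicy can be sketched with a simple first-fit rule: the first host with enough free resources receives the VM. First-fit is only one possible policy and is an assumption here, as are the `Host` and `VmRequest` field names and the `place_vm` function.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_cpus: int
    free_ram_mb: int
    free_storage_gb: int


@dataclass
class VmRequest:
    cpus: int
    ram_mb: int
    storage_gb: int


def place_vm(hosts, request):
    """First-fit: reserve resources on the first host that satisfies the request."""
    for host in hosts:
        fits = (
            host.free_cpus >= request.cpus
            and host.free_ram_mb >= request.ram_mb
            and host.free_storage_gb >= request.storage_gb
        )
        if fits:
            host.free_cpus -= request.cpus
            host.free_ram_mb -= request.ram_mb
            host.free_storage_gb -= request.storage_gb
            return host
    return None  # no host can satisfy the request
```

A consolidation-oriented policy would differ only in the order in which hosts are examined, for example by trying the most loaded hosts first.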
The MigrationHandler component handles VM migration from one host to another. Migration is a key feature in Cloud Computing environments as it is used both for power management [8] and fault tolerance [9].
A VirtualMachine (VM) uses its parent host’s hardware in order to run one or more jobs. Different resources can be allocated to a VM, typically a set of network interfaces, a virtual storage, a virtual RAM and one or more processing units. All of these should be located on the parent host of the VM. However, VMs can access remote files located on storage on other hosts (for instance on SAN storage), provided that storage is accessible through the network.
The PowerManager component enables power optimization and power saving simulation. This component contains methods to power on / off a specific device. It also determines at any moment of the simulation if a particular host can be powered on or powered off.
The Job component holds a set of operations. An Operation is defined by a length and a running state. When started, an operation allocates the resources it needs, using the appropriate resource Provisioner, among all available resources on the parent virtual machine. The operation then stays in a running state until its whole length has been processed. There are three main operation types matching the three resource types: network, computing and storage.
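How long an operation stays in the running state depends on its length and on the share of the resource it receives. The sketch below computes completion times for operations started together, assuming a provisioner that splits the capacity equally among running operations; this equal-share policy and the `completion_times` function are assumptions for illustration, not the policy ACS necessarily uses.

```python
def completion_times(lengths, capacity):
    """Completion time of each operation under equal sharing of `capacity`.

    `lengths` are amounts of work (for example millions of instructions) and
    `capacity` is the amount of work the resource processes per time unit.
    All operations are assumed to start at time 0.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    finish = [0.0] * len(lengths)
    now = 0.0
    processed = 0.0  # work already done by every operation still running
    running = len(lengths)
    for index in order:
        # The shortest remaining operation finishes next; until then each
        # running operation progresses at capacity / running.
        now += (lengths[index] - processed) * running / capacity
        finish[index] = now
        processed = lengths[index]
        running -= 1
    return finish
```

With lengths 400, 100 and 200 and a capacity of 100 per time unit, the three operations first share the resource equally, and each remaining operation speeds up whenever another one completes.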
The CheckpointingHandler component enables fault tolerance simulation. Checkpointing in Cloud Computing is the process of frequently saving a VM’s state on a secondary host so that, when the primary host fails, the VM can be restored and resumed in a transparent manner [10], [11]. ACS provides STaaS (Storage as a Service) [12] simulation abilities. It relies on four main components providing four distinct functionalities: placement, replication, consistency management and replica selection. The placement component is responsible for selecting a storage when saving a file on the Cloud. The replication component handles auto-replication of the saved file. The consistency component keeps all replicas of a given file consistent with each other. Finally, the replica selection component selects the active replica to be used when starting a new storage operation.
Resource sharing between multiple operations is handled by the provisioners. They define policies and use proper algorithms in order to distribute networking, computing and storage (read/write) capacity. Regarding the computing resource, and because more than one processing unit can be allocated per VM, the PuAllocator is used to determine which processing unit to use when starting a new computing operation.
D. Service Layer
E. User Layer
This layer contains components for the simulation of most common Cloud Computing services. Its architecture is represented in Fig.5.
This layer contains two main components the User component and the ThinClient component. The User component defines a client of the Cloud. It can deploy VMs and save files on the Cloud. This component holds a list of owned VMs and owned storage files. The ThinClient component describes a user device that is used when accessing resources on the Cloud. Thin clients only provides networking abilities and does not provide other computing and storage abilities. These can be compared to web browsers which rely on web servers to provide functionality. F. Utility Layer
This layer provides utility methods and automated tools to set up and generate simulation scenarios.
Fig. 5. ACS Service Layer Architecture
Different factories are provided to generate different types of simulation scenarios automatically and easily. They especially cover the generation of the Cloud infrastructure, the generation
The central component of this layer is the CloudProvider. It provides an infrastructure and a set of services to the users. More than one instance of this component could be available
cycle. Logged notifications are delayed and are actually delivered only in the next simulation cycle. This behavior has two main implications, which greatly reduce simulation complexity. First, it allows duplicate notifications raised in the same simulation cycle to be deleted. Second, it helps prevent the simulator's call stack from being overloaded.
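The lazy-notification mode can be illustrated with a minimal sketch. It is written in Python for brevity (ACS itself is implemented in Java), and every name in it (`LazyNotifier`, `notify`, `run_cycle`, the two listeners) is hypothetical rather than taken from the ACS code base. It only shows the two properties described above: duplicates logged in the same cycle are delivered once, and a notification raised by a listener is deferred to the next cycle instead of deepening the call stack.

```python
class LazyNotifier:
    """Hypothetical helper: logs notifications now, delivers them next cycle."""

    def __init__(self):
        self._pending = {}   # a dict keeps insertion order and drops duplicates
        self.delivered = []  # (cycle, listener name, code), kept for inspection

    def notify(self, listener, code):
        # Only log the notification. Nothing is called here, so the call
        # stack does not grow even when listeners notify each other.
        self._pending[(listener, code)] = None

    def run_cycle(self, cycle):
        # Deliver what was logged during the previous cycle. Notifications
        # raised by the listeners themselves land in the fresh dict and wait
        # for the next cycle.
        batch, self._pending = self._pending, {}
        for listener, code in batch:
            self.delivered.append((cycle, listener.__name__, code))
            listener(self, code)


def on_host_change(notifier, code):
    pass  # a listener that reacts without notifying anyone


def on_vm_change(notifier, code):
    notifier.notify(on_host_change, "HOST_UPDATED")


notifier = LazyNotifier()
notifier.notify(on_vm_change, "VM_UPDATED")
notifier.notify(on_vm_change, "VM_UPDATED")  # duplicate, logged only once
notifier.run_cycle(1)
notifier.run_cycle(2)
```

After the two cycles, `notifier.delivered` holds one entry per cycle: the VM notification in cycle 1 and the host notification it triggered in cycle 2.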
of different types of workloads and the generation of failures and repairs. The Cloud infrastructure generation includes the generation of hosts, switches and the links connecting these devices. Different topologies can be generated including the flat and the hierarchical topologies. Other custom topologies can also be defined.
2) Networking: Networking simulation can be performed using two main methods: packet-level simulation or flow-level simulation [14]. While packet-level simulations are more accurate, they also take longer and may need an unreasonable amount of time to complete when simulating complex topologies with numerous nodes. Therefore, and because very high accuracy in simulating communications is not its primary objective, ACS uses flow-level network simulation.
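To give an intuition of why flow-level simulation is cheap, the sketch below computes the completion times of several transfers sharing one link, producing one event per finished flow instead of one event per packet. It is an illustration under stated assumptions, not the ACS algorithm: the equal-share policy, the function name `completion_times` and the units (megabits and Mbps) are all hypothetical, since the sharing policy in ACS is left to its provisioners.

```python
def completion_times(capacity, sizes):
    """Completion times (seconds) of flows that share one link equally.

    capacity: link capacity in Mbps (assumed unit).
    sizes: flow sizes in megabits; all flows are assumed to start at t = 0.
    """
    remaining = sorted(sizes)
    now, done = 0.0, []
    while remaining:
        share = capacity / len(remaining)  # equal share, an assumed policy
        step = remaining[0] / share        # time until the smallest flow ends
        now += step
        done.append(now)
        # Every other flow has transferred the same amount during this step.
        remaining = [r - remaining[0] for r in remaining[1:]]
    return done


times = completion_times(100.0, [300.0, 100.0])
```

With a 100 Mbps link, the 100 Mb flow ends after 2 s at half the capacity, and the 300 Mb flow then uses the full link for its remaining 200 Mb and ends at 4 s.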
Workloads can be automatically generated by ACS given a configuration. A workload may contain computing, networking or storage operations, but also user requests for deploying new VMs or for creating new files in the Cloud. Failures in Cloud environments are considered the rule rather than the exception [13]. ACS provides an easy way to inject failures as well as repairs using a failures factory. This factory relies on each device's MTBF (mean time between failures) and MTBR (mean time between repairs) to generate failures and repairs.
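A failures factory of this kind can be sketched as follows. This is a hedged illustration in Python, not the ACS implementation: the function name `failure_schedule`, the use of exponentially distributed delays, and the reading of MTBR as the mean delay between a failure and its repair are assumptions made for the example.

```python
import random


def failure_schedule(mtbf, mtbr, horizon, seed=0):
    """Alternating (time, kind) failure/repair events for one device.

    Assumption: delays are drawn from exponential distributions whose means
    are mtbf (up time) and mtbr (down time). All values share one time unit.
    """
    rng = random.Random(seed)  # seeded so that a scenario is reproducible
    events, now = [], 0.0
    while True:
        now += rng.expovariate(1.0 / mtbf)  # delay until the next failure
        if now >= horizon:
            break
        events.append((now, "failure"))
        now += rng.expovariate(1.0 / mtbr)  # delay until the repair
        if now >= horizon:
            break
        events.append((now, "repair"))
    return events


events = failure_schedule(mtbf=1000.0, mtbr=50.0, horizon=10000.0)
```

Each device would get its own schedule, and the resulting events would then be injected into the simulation like any other event.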
Besides, the implementation imposes no constraints on the network topology. In addition, custom routing protocols can be defined as well. 3) Other features: Many common-sense assumptions are transparently handled by ACS. Such assumptions include, for instance, the transparent pausing of VMs if their parent host fails or is powered off. However, a detailed description of all these features is outside the scope of this paper.
In addition to these factories, this layer contains other utility methods. For instance, it contains methods for generating numbers following many common distributions.
G. Tracing Layer
B. ACS extensibility
This layer contains probes definitions for different quantifiable measures.
Extensibility was one of the main concerns during the development of ACS. It is achieved through a clear separation of ACS's code into interfaces and implementations. Users can write their own implementations and ask the simulator to use them by modifying the appropriate configuration files. When starting the simulation, the simulator reads the configuration files and determines which implementations to use, and the user's code is then loaded through the Java Reflection API.
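The mechanism of choosing an implementation from a configuration value can be sketched in a few lines. The example below is a Python analogue of what Java reflection allows (in Java, a class would typically be looked up by name with `Class.forName`); the configuration key, the helper `load_implementation` and the stand-in class are hypothetical and do not reflect the actual ACS configuration format.

```python
import importlib


def load_implementation(dotted_name):
    """Return the class designated by a 'package.module.ClassName' string."""
    module_name, _, class_name = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Hypothetical configuration entry: a standard-library class stands in for
# a user-written implementation of a simulator interface.
config = {"PuAllocator": "collections.OrderedDict"}
allocator = load_implementation(config["PuAllocator"])()
```

The simulator core only refers to the interface name (`PuAllocator` here), so swapping an implementation requires changing the configuration value and nothing else.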
Measurable parameters in ACS include power consumption, different resource type utilization (computing, storage transfer rate and bandwidth), memory usage and also billable dollar cost. All the aforementioned parameters are accessible at different levels (job, VM, host, cloud or user level). For instance, power consumption can be computed for the whole Cloud or for one specific host.
C. Validity and performance
Regarding bandwidth usage, both upload and download bandwidth can be evaluated. Also, for a specific host, this layer can estimate bandwidth usage inside the Cloud infrastructure and outside the Cloud (i.e., outgoing communications to the Internet). IV.
All main functionality of ACS has been validated using the JUnit Java unit testing framework. However, the testing did not cover the validation of the simulation models, which will be covered in future work. In the rest of this section, we run a proof-of-concept simulation using ACS and discuss the produced output. 1) Test bed: The results presented next were obtained on a 64-bit system running on a dual-core 2.1 GHz CPU machine with 3 GB of RAM.
IMPLEMENTATION
ACS was implemented using the Java programming language. Its implementation consists of over 400 classes totaling more than 40 thousand lines of code.
2) Simulation Configuration: We have configured the simulator to generate 1000 hosts connected using a flat topology. The whole topology is connected to the internet using one 100Mbps link. Hosts’ configuration is uniformly chosen between four configurations as shown in Table I.
In this section, we present key implementation features of ACS. After that, we discuss the extensibility of the simulator. Finally, we examine the simulator’s validity and performances by providing a proof of concept simulation example.
TABLE I. HOSTS CONFIGURATIONS
Configs   Computing    Storage   Network   Ram
Host0     4*2.2GHz     3*4TB     4Gbps     8GB
Host1     8*3.5GHz     1*1TB     2Gbps     16GB
Host2     16*3.3GHz    2*1TB     2Gbps     32GB
Host3     24*2.7GHz    2*1TB     2Gbps     64GB

A. Key Implementation Features
1) Notifications: Notifications have already been defined in the previous section. It has been noted that notifications allow entities to communicate easily, which makes the implementation more intuitive and modular. Beyond this, notifications also enable more efficient simulations. To do so, ACS relies on a lazy-notification mode. This mode consists in logging all notifications in the current simulation
On top of this infrastructure, we generate 100 users. Each user deploys 3 virtual machines using different configurations, as depicted in Table II.
TABLE II. VMS CONFIGURATIONS
Configs   CPU Cores   Storage   Ram      Price/hour
Vm0       2           40GB      3.5GB    0.007$
Vm1       4           320GB     7GB      0.014$
6) Total Bandwidth: Both outgoing and incoming bandwidth stay at a fixed rate (100 Mbps) throughout the simulation. This limit is imposed by the link connecting the simulated Cloud to the Internet. 7) Energy consumption: The default implementation of the power manager was used during the simulation. The configuration specified that the maximum power consumption of any host is 250 Watts when fully utilized. Given this value, ACS estimates the power consumption of each host. The test used the model described in [15] to estimate energy consumption.
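A utilization-based power estimate of this kind can be sketched as below. This is only an illustration: it assumes a linear relation between CPU utilization and power, and the idle power of 175 W is a hypothetical value chosen for the example (the text only gives the 250 W maximum). The names `host_power` and `energy_kwh` are likewise assumptions, not ACS identifiers.

```python
MAX_WATTS = 250.0   # maximum power of a fully utilized host (from the test)
IDLE_WATTS = 175.0  # hypothetical idle power, assumed for this sketch


def host_power(utilization):
    """Estimated power draw in Watts for a utilization in [0, 1]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    # Assumed linear model: idle power plus a share of the dynamic range.
    return IDLE_WATTS + (MAX_WATTS - IDLE_WATTS) * utilization


def energy_kwh(watts, hours):
    """Energy in kWh consumed at a constant power over a duration."""
    return watts * hours / 1000.0


half_load = host_power(0.5)
two_hours_full = energy_kwh(host_power(1.0), 2.0)
```

Summing such per-host estimates over all powered-on hosts and over time gives a Cloud-level energy figure of the kind reported below.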
Additionally, each of these users submits up to 10 workloads. The user submits these workloads sequentially, with a mean time of one minute between two submissions. A workload contains either a computing operation of random length (up to one trillion instructions) or a file operation of random size (up to 10GB). File operations include file creation and file read/write operations. Read/write operations use both the storage resources and the network resources in order to transfer the data to or from the user.
Using the given configuration, the energy consumption of the whole Cloud grows linearly and reaches a total of 137kWh at the end of the simulation. This is because powered-on hosts stay in use until the end of the simulation, as we did not specify any directive to delete created files or placed VMs. Thus, the power manager cannot power off hosts during the simulation. The maximum number of powered-on hosts is 305 out of 1000. It is reached at the 6th minute of the simulation and remains unchanged until the end of the simulation.
3) Simulation Overall Performances: The memory footprint of the simulator did not exceed the maximum allocated heap size of 512MB. The simulator took 1 min 20 s to process all events. The last event was registered near the 155th minute of simulation time.
8) Total Billed Price: ACS provides cost estimation capabilities during the simulation. The billed price for any user can be estimated at any moment of the simulation. For the tested configuration, the Cloud provider would charge its users a total of 190$ at the end of the simulation, which corresponds to an average of 1.9$ per user. This amount sums the price of storage renting, bandwidth usage and VM execution.
4) Completed Workloads: A total of 550 workloads were submitted, giving an average of 5.5 workloads per user. Of these, 217 workloads contain computing operations, 182 contain file write operations and 151 contain file read operations.
Fig. 6.
V.
Cloud simulation has been widely addressed in the literature, and many tools have been proposed. In this section, we review the most prominent ones.
A. CloudSim
Number of completed workloads of a test simulation using ACS
CloudSim [16] is a discrete event based simulator for Cloud Computing environments and claims to be the first Cloud simulation tool.
As shown in Fig. 6, computing operations were the first to finish. Then, network operations finished in turn. This interpretation is confirmed in the next sections.
CloudSim was initially developed on top of an existing simulator for grids, namely GridSim [17]. A new layer was added to provide Cloud simulation capabilities. However, many bugs appeared in early versions of the simulator. Most of them were due to issues in the GridSim simulator. Hence, many corrections were made, which led to a new design of CloudSim that no longer relies on GridSim.
5) Total Computing: The total computing power leveraged by the Cloud at different moments of the simulation is presented in Fig.7.
Fig. 7.
Many related research projects have worked on extending CloudSim. However, CloudSim is not easily extensible, as most of these extensions required reimplementing a significant part of CloudSim. One of the most popular extensions is NetworkCloudSim [18], which adds networking capabilities to CloudSim. Another popular extension is presented in [19] and provides energy-aware Cloud simulations. One drawback of CloudSim extensions is that they cannot be combined. For instance, the effect of networking on energy efficiency cannot be simulated without developing a new extension to CloudSim.
Cloud leveraged Computing power of a test simulation using ACS
The maximum leveraged computing power is reached near the fourth minute, when more than 222 GIPS (Giga Instructions Per Second) were leveraged. Computing stops near the 18th minute, as all workloads have been submitted. However, the simulation does not finish at that point, as there are still networking operations remaining, which take more time to finish.
REVIEW OF EXISTING CLOUD SIMULATION TOOLS
A graphical user interface, named CloudAnalyst [20], has also been proposed for CloudSim.
B. GreenCloud
GreenCloud [21] is a packet-level Cloud simulator for energy efficiency evaluations of data centers. It has been developed as an extension to the Ns2 simulator [22]. GreenCloud is able to estimate the energy consumption of different data center components, including hosts, switches and links. Moreover, different network topologies can be simulated. However, this simulator is slower than flow-level network simulators.

C. GroudSim
GroudSim [23] is another discrete event based simulator aimed at the simulation of both grid and Cloud environments. It uses SimJava [24] as the underlying simulation framework. GroudSim mainly focuses on the simulation of IaaS (Infrastructure as a Service). Other Cloud services are not implemented, but the authors claim that the simulator is easily extensible.

D. iCanCloud
iCanCloud [25] has been proposed to model and simulate Cloud Computing systems. It has been built on top of the SIMCAN simulation framework [26]. The main features claimed by the authors for the simulator are its usability, flexibility, performance and scalability. The simulator has been successfully used to simulate different Amazon instance types.

VI. CONCLUSION AND FUTURE WORK
We have presented ACS (Advanced Cloud Simulator), a new discrete event based simulator. Its features, its layered architecture and its implementation have been described.
As future work, we aim to validate the simulator by comparing it to real Cloud environments. We also aim to use the simulator to inspect different facets of the Cloud, especially the fault tolerance aspect.

REFERENCES
[1] P. Mell and T. Grance, "The NIST definition of cloud computing," National Institute of Standards and Technology, vol. 53, no. 6, p. 50, 2009.
[2] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica et al., "A view of cloud computing," Communications of the ACM, vol. 53, no. 4, pp. 50–58, 2010.
[3] (2014) ACS - advanced cloud simulator. [Online]. Available: https://www.github.com/samysadi/acs
[4] B. H. Thacker, S. W. Doebling, F. M. Hemez, M. C. Anderson, J. E. Pepin, and E. A. Rodriguez, "Concepts of model verification and validation," Los Alamos National Lab., Los Alamos, NM (US), Tech. Rep., 2004.
[5] B. L. Nelson, J. S. Carson, and J. Banks, Discrete event system simulation. Prentice Hall, 2001.
[6] B. Rochwerger, D. Breitgand, D. Hadas, I. Llorente, R. Montero, P. Massonet, E. Levy, A. Galis, M. Villari, Y. Wolfsthal et al., "An architecture for federated cloud computing," Cloud Computing, 2010.
[7] S. Lee, R. Panigrahy, V. Prabhakaran, V. Ramasubramanian, K. Talwar, L. Uyeda, and U. Wieder, "Validating heuristics for virtual machines consolidation," Microsoft Research, MSR-TR-2011-9, 2011.
[8] A. Beloglazov and R. Buyya, "Energy efficient allocation of virtual machines in cloud data centers," in Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on. IEEE, 2010, pp. 577–578.
[9] C. Engelmann, G. R. Vallee, T. Naughton, and S. L. Scott, "Proactive fault tolerance using preemptive migration," in Parallel, Distributed and Network-based Processing, 2009 17th Euromicro International Conference on. IEEE, 2009, pp. 252–257.
[10] B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and A. Warfield, "Remus: High availability via asynchronous virtual machine replication," in Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation. San Francisco, 2008, pp. 161–174.
[11] S. Sadi and B. Yagoubi, "Improved fault tolerance through a multi-zone checkpointing approach," in Proceeding of JEESI'14, Algiers, Algeria, 2014.
[12] J. Wu, L. Ping, X. Ge, Y. Wang, and J. Fu, "Cloud storage as the infrastructure of cloud computing," in Intelligent Computing and Cognitive Informatics (ICICCI), 2010 International Conference on. IEEE, 2010, pp. 380–383.
[13] K. V. Vishwanath and N. Nagappan, "Characterizing cloud computing hardware reliability," in Proceedings of the 1st ACM symposium on Cloud computing. ACM, 2010, pp. 193–204.
[14] K. Eger, T. Hoßfeld, A. Binzenhöfer, and G. Kunzmann, "Efficient simulation of large-scale p2p networks: packet-level vs. flow-level simulations," in Proceedings of the second workshop on Use of P2P, GRID and agents for the development of content networks. ACM, 2007, pp. 9–16.
[15] X. Fan, W.-D. Weber, and L. A. Barroso, "Power provisioning for a warehouse-sized computer," in ACM SIGARCH Computer Architecture News, vol. 35, no. 2. ACM, 2007, pp. 13–23.
[16] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. De Rose, and R. Buyya, "Cloudsim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms," Software: Practice and Experience, vol. 41, no. 1, pp. 23–50, 2011.
[17] A. Sulistio, U. Cibej, S. Venugopal, B. Robic, and R. Buyya, "A toolkit for modelling and simulating data grids: an extension to gridsim," Concurrency and Computation: Practice and Experience, vol. 20, no. 13, pp. 1591–1609, 2008.
[18] S. K. Garg and R. Buyya, "Networkcloudsim: Modelling parallel applications in cloud simulations," in Utility and Cloud Computing (UCC), 2011 Fourth IEEE International Conference on. IEEE, 2011, pp. 105–113.
[19] A. Beloglazov and R. Buyya, "Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers," Concurrency and Computation: Practice and Experience, vol. 24, no. 13, pp. 1397–1420, 2012.
[20] B. Wickremasinghe, R. N. Calheiros, and R. Buyya, "Cloudanalyst: A cloudsim-based visual modeller for analysing cloud computing environments and applications," in Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on. IEEE, 2010, pp. 446–452.
[21] D. Kliazovich, P. Bouvry, and S. U. Khan, "Greencloud: a packet-level simulator of energy-aware cloud computing data centers," The Journal of Supercomputing, vol. 62, no. 3, pp. 1263–1283, 2012.
[22] (2010) The network simulator ns2. [Online]. Available: http://www.isi.edu/nsnam/ns/
[23] S. Ostermann, K. Plankensteiner, R. Prodan, and T. Fahringer, "Groudsim: an event-based simulation framework for computational grids and clouds," in Euro-Par 2010 Parallel Processing Workshops. Springer, 2011, pp. 305–313.
[24] F. Howell and R. McNab, "Simjava: A discrete event simulation library for java," Simulation Series, vol. 30, pp. 51–56, 1998.
[25] A. Nuñez, J. L. Vázquez-Poletti, A. C. Caminero, J. Carretero, and I. M. Llorente, "Design of a new cloud computing simulation platform," in Computational Science and Its Applications-ICCSA 2011. Springer, 2011, pp. 582–593.
[26] A. Núñez, J. Fernández, J. D. Garcia, F. Garcia, and J. Carretero, "New techniques for simulating high performance mpi applications on large storage networks," The Journal of Supercomputing, vol. 51, no. 1, pp. 40–57, 2010.
CP ORBAC to secure access to data on cloud using implicit security
Yasmina Ghebghoub, Faculty of Science, Saad Dahlab Blida University
Saliha Oukid, Faculty of Science, Saad Dahleb Blida University, Blida
Omar Boussaid, University of Lyon, France
Abstract: Cloud computing has been developed to deliver information technology services on demand to organizations and individual users. In this paper, we describe an approach that proposes a security model based on Organization Role Based Access Control (ORBAC) and an encryption system. This model aims to give users the possibility to control the security of their data. In this scheme, a secret key is partitioned using a Galois field GF(2^m).

Keywords: security, access, ORBAC, encryption.

I. INTRODUCTION
Cloud computing has been developed to reduce Information Technology (IT) expenses and to provide agile IT services to individual users as well as organizations. It moves computing and data away from desktop and laptop computers into large data centers. This technology also gives the opportunity for more innovation in lightweight smart devices, and it forms an innovative method of performing business. Securing data stored on distributed servers is of fundamental importance in cloud computing. On the other hand, the cloud is still in its initial stages of development, as it suffers from threats and vulnerabilities that prevent users from trusting it. Various malicious activities from illegal users threaten this technology, and inflexible access control is one of its weaknesses. Access control is generally a policy or procedure that allows, denies or restricts access to a system. Access control policies define the subjects and the permissions in a computer system to enforce the security of an organization. One of the fundamental best practices in security is developing, deploying, reviewing, and enforcing security policies.
Various access control models are in use. Among the most common, Mandatory Access Control (MAC) refers to a type of access control by which the operating system constrains the ability of a subject or initiator to access or generally perform some sort of operation on an object or target. Discretionary Access Control (DAC) is used extensively in commercial applications, particularly in operating systems and relational database systems. The central idea of DAC is that the object's owner, who is usually its creator, has discretionary authority over who else can access that object [15]. Role Based Access Control (RBAC) determines a user's access to the system based on the job role. The user's role is assigned based on the least privilege concept: the role is defined with the least amount of permissions or functionalities necessary for the job to be done. Permissions can be added or deleted if the privileges of a role are changed [16].
To address the problem of access control to data on the cloud, we propose to reinforce the security of a model proposed in [6], based on Organization Role Based Access Control (ORBAC), using implicit security, which lets the system create partitions of the encryption key used in the proposed model. The traditional (explicit) approach to securing data is based on a single server and grants access upon the use of passwords. However, users tend to keep passwords simple and memorable, which opens the possibility of brute force attacks [1]. Conversely, in the cloud there is a large quantity of data: a user may find it ineffective to generate data partitions and distribute them over the network, but may create partitions of the key and allocate them over the network.
Our paper is organized as follows. Section 2 introduces the main elements used in our proposal. Section 3 presents our security model. Section 4 gives the security analysis and our discussion. Finally, Section 5 concludes the paper.
II. ORGANIZATION ROLE BASED ACCESS CONTROL (ORBAC)
The ORBAC model generalizes RBAC models and adds an organizational dimension to the policy, where an organization is an entity responsible for managing a set of security rules (obligations, permissions, prohibitions) that control user access. RBAC regulates access to computer or network resources based on the roles of individual users within an organization. An organization is a structured group of active entities in which subjects play specific roles. An activity is a group of one or more actions. A view is a group of one or more objects, and a context is a specific situation that conditions the validity of rules. The role entity is used to structure links between subjects and the organization. Similarly, objects satisfying a common property are abstracted as views, and actions are abstracted as activities. ORBAC rules can express positive or negative authorizations and obligations [5]. We deal only with permissions, considering that the same reasoning applies to prohibitions and obligations. Security rules have the following form [3]: given an organization noted org, a role noted r, an action noted a, a resource noted s and a context noted c, Permission(org, r, a, s, c) means that the organization org grants the role r a permission to realize a on resource s in the context c.

\[
\forall r\, \forall a\, \forall s\, \forall c:\; Permission(org, r, a, s, c) \wedge Empowers(org, u, r) \wedge Use(org, s) \wedge Consider(org, \alpha, a) \wedge define(org, r, \alpha, s, c) \rightarrow is\_permitted(r, a, s)
\]

"If the organization org, in the context c, grants permission to the role r to realize the action a on the resource s, if org empowers subject u in role r, if org uses the resource s in the context c, and if within the organization org the context c is true between r, a and s, then the role has a permission to perform the action on s." Based on the defined rules, a decision is inferred for is_permitted(r, a, s).

III. CP ORBAC MODEL USING IMPLICIT SECURITY
A. Proposed model
This model implements a robust system for managing access to data sources stored in a cloud, based on the Organization Role Based Access Control (ORBAC) model and encryption. ORBAC is an access control model in which authorization is given to users depending on their role in an organization in a given context. To enhance the security of data in this model, we add CP-ABE. This solution controls user access through the access structure proposed by the data owner. Only authorized users are allowed to access the data sources after decryption, because the data are encrypted by assigning a secret key to each resource. However, in our case there is a large amount of data: the user may find it inefficient to create data partitions and distribute them over the network, and may instead wish to encrypt the data and store it on a single trusted server while keeping the secret encryption key. Encryption keys are almost always very large random numbers and cannot be memorized. Therefore, the user may create partitions of the key and spread them over the network. This approach may be more efficient, if not more secure, than creating data partitions of enormous amounts of data. Fig. 1 shows our security model, which consists of the following system entities.

Data owner: It proposes an access structure to define authorized users, where an access structure is a tuple (r, a, s, c): r is the role, a is the authorized action, s is a resource and c is the contextual information of the request. The data owner then encrypts the data and stores it in the data center storage to share it on the cloud, and authorized users decrypt it. Moreover, each data consumer is administrated by a Control System. Our access structure is defined as follows.
Definition (Access Structure [10], [13], [14]): Let \(\{P_1, \ldots, P_n\}\) be a set of parties. A collection \(A \subseteq 2^{\{P_1, \ldots, P_n\}}\) is monotone if \(\forall B, C\): if \(B \in A\) and \(B \subseteq C\), then \(C \in A\). An access structure (respectively, monotone access structure) is a collection (respectively, monotone collection) \(A\) of non-empty subsets of \(\{P_1, \ldots, P_n\}\), i.e., \(A \subseteq 2^{\{P_1, \ldots, P_n\}} \setminus \{\emptyset\}\). The sets in \(A\) are called the authorized sets, and the sets not in \(A\) are called the unauthorized sets.
In our case, attributes play the role of parties and we deal only with monotone access structures.

Fig. 1. Proposed model based CP encryption using implicit security

User: It is an entity who wants to access the data. If a user has a set of roles satisfying the access policy of the encrypted data, and is not revoked in any valid attribute group, then he will be able to decrypt the ciphertext and obtain the data.
Data storing center: It is an entity that provides a data sharing service.
Control and Encryption system: It is the entity responsible for controlling access to data and for encryption. First, it gives an authorization response as a tuple is_permitted(r, a, s), where r is a role, a is an action and s is a resource. Then, the system encrypts the data and generates a secret key for the data owner and the user. In this step, the secret key is partitioned into two or more pieces, which are stored at randomly chosen places on the network that are known only to the data owner. Our system retrieves the partitions to decrypt data if and only if the rule is checked. Our key partitioning scheme uses polynomials in the Galois field GF(2^m).
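As a rough illustration of key partitioning with polynomials over GF(2), the sketch below represents each polynomial as an integer whose bits are the coefficients, multiplies the partitions and reduces the product modulo an irreducible polynomial. It uses g(x) = x^3 + x + 1 and the two polynomials x^2 + x + 1 and x^2 + 1 from Example 1 of the following subsection; the function names are hypothetical, and this is a sketch of the arithmetic only, not the authors' prototype.

```python
def gf2_mod(p, g):
    """Remainder of polynomial p modulo g, coefficients in GF(2)."""
    deg_g = g.bit_length() - 1
    while p.bit_length() - 1 >= deg_g:
        # Cancel the leading term of p with a shifted copy of g (XOR = add).
        p ^= g << (p.bit_length() - 1 - deg_g)
    return p


def gf2_mul_mod(a, b, g):
    """Product of polynomials a and b over GF(2), reduced modulo g."""
    product = 0
    while b:
        if b & 1:
            product ^= a  # add a shifted copy of a for each set bit of b
        a <<= 1
        b >>= 1
    return gf2_mod(product, g)


def key_from_partitions(partitions, g):
    """Recombine the partitions: their product modulo g is the key."""
    key = 1
    for part in partitions:
        key = gf2_mul_mod(key, part, g)
    return key


G = 0b1011            # g(x) = x^3 + x + 1
P1, P2 = 0b111, 0b101  # x^2 + x + 1 and x^2 + 1
KEY = key_from_partitions([P1, P2], G)
```

Here `KEY` is the bit string of x^2 + x, the remainder obtained in the worked example. Only a party holding every partition and g(x) can recompute it, which is why the partitions are stored at separate places.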
1) Key partitioning in the Galois field GF(2^m): A polynomial g(x) of degree m with coefficients in GF(2) is said to be irreducible if it cannot be factored into two or more polynomials, each with coefficients in GF(2) and each of degree less than m [1, 10, 11]. A data owner can therefore generate a random key, and our system partitions it into K partitions using the following procedure. It generates K random polynomials of any random degree and computes their product modulo the irreducible polynomial g(x). The resultant polynomial of degree m-1 is taken as the required key in its binary representation, and the randomly generated polynomials are the partitions. We can think of the polynomials as bit strings corresponding to the coefficients, which can only be 0 or 1; each power of x represents a specific position in the bit string.
Example 1: In order to generate a random 8 bit key and create three partitions of it, the system may proceed as follows. Consider all polynomials defined modulo the irreducible polynomial \(g(x) = x^3 + x + 1\). The system chooses two polynomials of degree m - 1 at random, such as \(p_1(x) = x^2 + x + 1\) and \(p_2(x) = x^2 + 1\), and computes their product \(p_1(x)\, p_2(x) \equiv K(x) \bmod g(x)\), where K(x) is the key polynomial and its coefficients are the binary representation of the key:

\[
(x^2 + x + 1)(x^2 + 1) \bmod (x^3 + x + 1) = (x^4 + x^3 + x + 1) \bmod (x^3 + x + 1) = -x^2 - x = x^2 + x
\]

Therefore, the random key generated is k = 00011011 and the partitions are p1 = 00000111, p2 = 00000101. Here k encodes the product \(x^4 + x^3 + x + 1\) before reduction; the reduced polynomial \(x^2 + x\) corresponds to the bit string 00000110.

2) Ciphertext-Policy Attribute-Based Encryption using key partitioning in the Galois field GF(2^m): In this section, we show our proposed algorithm.
Setup: This algorithm takes as input the security parameters, a set of attributes A proposed by the data owner on the access structure, and a secret key proposed by the data owner.
KeyGen: This algorithm takes as input the secret key proposed by the data owner and generates random partitions of the key using the following procedure: generate K random polynomials of any random degree in GF(2^m); compute their product modulo the irreducible polynomial g(x); take the resultant polynomial of degree m-1 as the required key in its binary representation, the randomly generated polynomials being the partitions.
Encrypt(SK, M, A): This is a randomized algorithm that takes as input a message M, a set of attributes A and the secret key SK. It outputs the ciphertext CT.
Decrypt(SK, A, CT): This algorithm takes as input the ciphertext CT that was encrypted under SK and a set of attributes A. It outputs M.

IV. EXPERIMENTS
To validate our approach, we have developed a prototype in Java and conducted an experimental study. Our prototype is installed on a virtual network.
A. Experimental Setup
To evaluate our proposal based on ORBAC, on the one hand we compare it with an application based on RBAC, which is commonly used but does not introduce the concept of context. On the other hand, we try to secure our model with the implicit and explicit approaches. We use recall and precision as evaluation measures for both models.
1) Test Data: We applied our model to an organization (a hospital) and a collection of about 1000 documents, which are medical files. We consider a query where a doctor wants to access the medical files in a case of emergency. The system returns a list of 600 documents that contains 200 documents with context=emergency and 400 documents with context=treating, and we assign actions (read, modify, delete, print, download) to the following roles (emergency doctor, treating doctor). We also applied both models to other roles (nurse, administrative agent) that belong to the hospital organization and have no access to the resources; the result was the same for both models: access was denied.
2) Comparison between ORBAC and RBAC: To evaluate our proposal based on ORBAC, we compare it with an application based on RBAC, which is commonly used but does not introduce the concept of context. Recall and precision measure the capacity of both models to select accesses to data sources, and are defined as follows:

\[
Recall = \frac{\text{number of authorized accesses in context } c}{\text{number of resources in context } c}
\]
\[
Precision = \frac{\text{number of authorized accesses in context}}{\text{number of resources}}
\]
\[
F\text{-}Measure = 2 \cdot \frac{Recall \cdot Precision}{Recall + Precision}
\]

Fig. 2. Histogram representing the difference between Recall ORBAC and Recall RBAC where context="emergency"
Fig. 3. Histogram representing the difference between Precision ORBAC and Precision RBAC where context="emergency"

When comparing the results of the ORBAC model with those of the RBAC model, we note that using the ORBAC model increases the protection of resources against random use by categorizing them by context; as a result, the number of authorized accesses increases compared with the RBAC model. We notice that the recall of the system based on the ORBAC model is higher than that of the system based on the RBAC model, which signifies that our system selects accesses more correctly and precisely by eliminating more unauthorized accesses. We
19
2nd International Conference on Networking and Advanced Systems May 6-7, 2015 | Badji Mokhtar University, Annaba, Algeria Files 10 15 20 25 30
Recall ORBAC 0,050 0,075 0,100 0,125 0,150
Precision Precision ORBAC RBAC 0,017 0,010 0,025 0,015 0,033 0,020 0,042 0,025 0,050 0,030 TABLE I R ESULTS OF R ECALL , P RECISION AND F MEASURE TO ORBAC AND RBAC
cases
Servers
1 2 3 4
10 50 100 150
Recall RBAC 0,017 0,025 0,033 0,042 0,050
attacked Servers 3 10 15 50
servers /partitions 2 5 10 25
Recall (implicit) 0,30 0,20 0,15 0,33 TABLE II
F-Measure ORBAC 0,050 0,075 0,100 0,125 0,150
F-Measure RBAC 0,030 0,045 0,060 0,075 0,090
MODELS WHERE C = EMERGENCY
Recall (implicit) 1,00 1,00 1,00 1,00
Precision (explicit) 0,67 0,50 0,60 0,50
Precision (implicit) 1,00 1,00 1,00 1,00
R ESULTS OF R ECALL , P RECISION IN IMPLICIT AND EXPLICIT APPROACHES .
notice that the precision of system based on the model ORBAC is higher compared with the system based on RBAC model RBAC which signifies that the model based on the model ORBAC selects fewer documents to users and allows to reduce unauthorized accesses to documents, thus it allows to increase the confidentiality of documents not concerned by the given contexts. 3) Comparison between implicit and explicit approaches: : To evaluate the efficacies based on ORBAC, we propose to compare it using implicit and explicit approaches. Recall and precision measures will define the efficiency of implicit architecture as follows: Fig. 5.
Recall =
Histogram represent difference between Precision (implicit) and
numberof attackedserversincludespartitionsof secretkey Precision (explicit) numberof servers
P recision =
numberof attackedserversincludespartitionsof secretkey numberof serversincludespartitionsof secretkey different
servers increases the he confidentiality and a security of a secret key and data. V. C ONCLUSION
We have described an approach to protect access to data on cloud computing based on Organization Role Base Access Control (ORBAC) which authorization is given to users depending on their role in an organization in a given context. We propose a process which allows encrypting data using an access structure proposed by data owner .In this scheme a secret key is partitioned using A Galois F ield GF 2m such a way that each partition is implicitly secure. These partitions are stored on different servers on the network.We have described an approach to protect access to data on cloud computing based on Organization Role Base Access Control (ORBAC) which authorization is given to users depending on their role in an organization in a given context. We propose a process which allows encrypting data using an access structure proposed by data owner .In this scheme a secret key is partitioned using A Galois Field GF 2m such a way that each partition is implicitly secure. These partitions are stored on different servers on the network. We tested our approach, it gives a constructive results. However, we conclude with the following advantages,
Fig. 4. Histogram represent difference between Recall (implicit) and Recall (explicit)
We notice that the recall of our model used implicit approach is lower compared with explicit approach which signifies that a number of servers includes partitions of our secret key is more protected. We notice that the precision of our model used implicit approach is lower compared with explicit approach which signifies that the distribution of partitions of a secret key into
our proposition increases access control and strengthens trust between the cloud provider and the data owner.
A New Bio-Inspired Technique of Artificial Social Cockroaches for Spam Detection with Visual Result Mining

Hadj Ahmed Bouarara, Reda Mohamed Hamou, Mohamed Elhadi Rahmani, Abdelmalek Amine, Amine Rahmani
GeCode Laboratory, Department of Computer Science, Tahar Moulay University of Saida, Algeria

Abstract: The internet era promotes electronic commerce and facilitates access to many services. Unfortunately, this technology has also become a major source of malicious activities, especially the plague called spam, which has grown tremendously in the last few years. This paper presents a new spam detection system (SDS) using a new insect-behaviour algorithm called artificial social cockroaches (ASC). It takes as input a set of artificial cockroaches (messages) to be classified (hidden) in a shelter (class), spam or ham, depending on the aggregation rules (shelter darkness, congeners' attraction and security quality). Our experiments were performed on the SMS Spam V.0.1 dataset using the validation measures recall, precision, F-measure and entropy. They aim to show the benefit of this approach compared with a classical algorithm (decision tree C4.5) and two algorithms inspired by the lifestyle of bees. Finally, a result-mining tool was built to display the outcome in graphical form (3D cube and cobweb) with more realism, using zoom and rotation functions.

Keywords: Social Cockroaches; Bio-Inspired; validation measure; aggregation rule; spam detection;
1. Introduction and problematic
The current scientific world has been considerably shaped by the appearance of novel concepts and paradigms. The advancement of research and the number of sources of inspiration that have been found represent a genuine opportunity. Nowadays, for each problem encountered, we should observe nature: it may have faced the same problem and found solutions long ago. The first part of our work is to develop a new algorithm called artificial social cockroaches (ASC), inspired by the lifestyle of cockroaches and by their way of communicating as a decentralized system without a conductor, using odour pheromones and their antennae in order to group under the same shelter for hiding. The cockroach is attracted by the darker place (less luminosity) and follows the path of its congeners. It always seeks the safest place.

Recently, the e-mail service has become enormously used and is the principal vector of communication in our digital society, despite the emergence of social networks and Web 2.0 tools. It allows users with a mailbox and an e-mail address to exchange messages (pictures, files and text documents) from anywhere in the world via the internet. Regrettably, among all the messages received by an individual in his mailbox, we recognize two cases:
Regular: the e-mail (ham message) sent by friends or by websites the user has subscribed to.
Irregular: the unsolicited e-mails (junk e-mail) sent in bulk by malicious people (spammers).

Today 70-80% of e-mail traffic is composed of spam. It is a serious problem in electronic life and a major enemy of mail server administrators and of those responsible for information in organizations. For that reason, several spam detection systems have appeared, based on learning techniques and probabilistic techniques including Bayesian classification, artificial neural networks and text compression. Spam detection is a supervised classification task that has attracted strong interest from companies and individuals. However, spammers' techniques have evolved dramatically, and conventional systems have become ineffective. The nuisance brought by spam is not limited to the influx of undesired mails or the loss of legitimate mails; we can identify different cases of spam such as scam, FUD, hoax, telephony spam (Spim) and phishing.

The second component of our work is the application of our ASC algorithm to construct a spam detection system (SDS). The third part of our work is the construction of a visual result-mining tool, in order to better understand the spam detection results given by our system and to extract valuable information that helps us to combat this phenomenon. Our work is positioned at the intersection of several domains, as shown in Fig. 1.
Fig. 1. Positioning of our problem at the intersection of visualisation, security, bio-inspired techniques and data mining
2. Review of literature
This section is composed of two parts. The first covers the works published on bio-inspired techniques: genetic algorithm [1], ant colony and genetic programming [2], evolutionary strategy [3], particle swarm optimisation [4], bee colony [5], social spiders [6] and immune system [7]. The second covers the spam detection systems that exist in the literature: K nearest neighbour [8], Naïve Bayes [9], decision tree [10], particle swarm optimisation [11], worker bees [12], genetic algorithms [13] and artificial immune system [14]. Both are summarized in Figs. 2 and 3.
Fig. 2. Bio-inspired techniques existing in the literature

Fig. 3. Spam detection techniques existing in the literature

3. Research methodology:
The principle of our artificial social cockroaches (ASC) algorithm is rather simple. We first explain the source of inspiration of our idea and the natural lifestyle of social cockroaches.

3.1. Natural life of social cockroaches
The cockroach belongs to the order Blattodea. It is a gregarious insect with an incomplete metamorphosis cycle, which makes better decisions in a group than alone. It is characterised by the social phenomenon of searching for the most attractive and secure place (shelter) in which to hide. Various types of cockroaches can be identified; in our work, we are interested in the cockroaches that live in apartments, which are fertile and are never isolated. This society of cockroaches is flexible, robust, decentralized and self-organized. Each cockroach is characterized by three essential characteristics:

3.1.1. Environment: The cockroaches seek shelter or dark places to hide, looking for safety.

3.1.2. Aggregation rules: The movement of cockroaches is based on the interaction between them and with their environment, from which the decision emerges. It is guided by three aggregation rules:

i) Shelter darkness: The degree of darkness of a shelter plays a big role in the choice of the most attractive and secure shelter.

ii) Congeners' attraction: Each cockroach goes to the shelter where it finds its congeners (cockroaches from the same family). The cockroaches exchange information in order to influence the shelter choice of each cockroach of the same family, so that they join each other in the same place to hide. The choice of a cockroach is therefore guided by the choice of its congeners.

iii) Security quality (SQ): The cockroaches positioned in the heart of the shelter have a high security quality, whereas the cockroaches positioned at the edge of the shelter have a low security quality.

3.1.3. Means of communication
The cockroach perceives the environment and other cockroaches whenever it touches them with its antennae.

3.2. The inspiring experiment
We put the cockroaches in a basin that is lit everywhere, and we built two artificial shelters (a shelter is a place with less luminosity, as shown in Fig. 4.a) using two red circles, because cockroaches do not perceive the red colour, as shown in Fig. 4.c. This experiment demonstrates how the cockroaches of the same colony move to hide under the same shelter.

3.2.1. Aggregation process:
First, there is an exploration phase in which the cockroaches walk randomly in all directions, as shown in Fig. 4.b. Gradually, they start to pick a shelter based on the aggregation rules detailed previously. After a moment, the cockroaches of the same colony are grouped under the same shelter, as depicted in Fig. 5.

Fig. 4. Description of the experiment: a) two-shelter environment, b) exploration phase, c) the experiment
Fig. 5. The grouping of cockroaches under the same shelter: a) global view, b) detailed view

3.3. Artificial social cockroaches (ASC)
The set of messages (cockroaches) is classified (hidden) in a class (shelter), spam or ham, according to the attraction rate of each shelter. This rate is based on the aggregation rules (shelter darkness, congeners' attraction and security quality), as shown in Fig. 6 and detailed in the following process.

Fig. 6. Spam detection system using artificial social cockroaches (ASC)

3.3.1. Cockroaches population
For the problem of spam detection, the artificial cockroaches represent a set of messages (spam or ham), so the number of artificial cockroaches is equal to the number of messages in the data set used for our experimentation.

Dataset: The Spam V.0.1 dataset is a public collection of pre-tagged messages built for research in the field of spam detection, presented in [15]. It is composed of 1002 legitimate messages (ham) and 322 spam messages. As illustrated in Table 1, we divide this dataset into two sets: the training data set with 300 instances and the testing data set with 1024 instances.

Table 1: Data set used in our work.

Decomposition          Message number
Spam learning basis    100
Ham learning basis     200
Spam test basis        222
Ham test basis         802

3.3.2. Artificial cockroaches representation (ACR)
To make artificial cockroaches processable by the machine, they must be presented in numerical format. Since in our case the messages are written in natural language, they are vectorized as follows:

3.3.2.1. Text cleaning
Each original text is transformed into a text in which numbers, punctuation and special characters no longer exist.

3.3.2.2. Text representation
This procedure divides the text into a list of terms. For our work we chose to use two representation techniques:
i) Bag of words: decomposes a text into a set of words.
ii) Character n-grams: an n-gram is a sequence of N consecutive characters, where N can take the value 2, 3, 4 or 5. For each text, a list of n-grams is generated by moving a window of N boxes over the corpus.

3.3.2.3. Coding
This is the method used to encode and calculate the importance of a term in the corpus using a weighting. For our study, we use the composite weighting TF-IDF, which corrects the frequency of the term in the document (Term Frequency) by its frequency in the corpus (Inverse Document Frequency).
\[ TF \times IDF = TF(d, W_i) \cdot \log\left(\frac{M}{DF(W_i)}\right) \]

with:
TF(d, W_i): the number of occurrences of the term W_i in the document d.
M: the number of documents in the collection.
DF(W_i): the number of documents that contain the word W_i (DF means document frequency).
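The three steps of Section 3.3.2 (cleaning, representation, coding) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, lowercasing and whitespace collapsing are added conveniences, and the natural logarithm is assumed because the base of Log is not specified.

```python
import math
import re
from collections import Counter


def clean_text(text):
    """Remove digits, punctuation and special characters (3.3.2.1).
    Lowercasing and whitespace collapsing are assumptions of this sketch."""
    letters_only = re.sub(r"[^A-Za-z\s]", "", text)
    return " ".join(letters_only.lower().split())


def bag_of_words(text):
    """Split a cleaned text into its words (3.3.2.2, i)."""
    return text.split()


def char_ngrams(text, n):
    """Slide a window of n characters over the text (3.3.2.2, ii)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tf_idf(docs_terms):
    """Weight each term by TF(d, W) * log(M / DF(W)) (3.3.2.3).

    docs_terms: one list of terms per document.
    Returns one {term: weight} dictionary per document.
    """
    m = len(docs_terms)
    df = Counter()
    for terms in docs_terms:
        df.update(set(terms))          # count each term once per document
    weights = []
    for terms in docs_terms:
        tf = Counter(terms)            # raw occurrence counts
        weights.append({t: tf[t] * math.log(m / df[t]) for t in tf})
    return weights
```

For instance, char_ngrams(clean_text(message), 3) yields the 3-gram terms of a message, and passing one such list per message to tf_idf gives the weighted vectors. A term that occurs in every document receives weight 0, since log(M / M) = 0.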
3.3.3. Communication
To decide whether two messages (artificial cockroaches) are from the same family (class), we calculate a distance between them. If the distance is less than a threshold fixed in advance, the two messages are from the same family; otherwise they are from two different families.

3.3.4. Artificial shelter element (ASE)
An element that refers to the shelter place must be selected. There can be one or more ASEs depending on the problem. We distinguish two types of cockroach: an artificial cockroach that has joined a shelter is called a secure cockroach CS, and an artificial cockroach that has not yet joined a shelter is called a no-secure cockroach CN. In our case we have two shelters (spam and ham). The element that refers to the spam class (shelter) is the whole learning set of spam messages, and the element that refers to the ham class (shelter) is the whole learning set of ham messages.

3.3.5. Detection phase (Shelter Attraction)
This step makes the final decision on whether a message is spam or ham (the choice of shelter for hiding) by calculating the shelter attraction (SA) of each no-secure cockroach C_i to the shelter S_i, based on three essential aggregation rules: the learning step (shelter darkness, SD), the congeners' attraction (CA) and the quality safety (QS).

\[ SA(C_i, S_i) = \alpha \cdot SD(C_i, S_i) + \beta \cdot CA(C_i, S_i) + \lambda \cdot QS(C_i, S_i) \]

where C_i is cockroach number i and S_i is shelter number i.

3.3.6.1 Learning phase (Shelter Darkness (SD)): In this procedure, we calculate the amount of darkness in a shelter. The darkness of the shelter (class) is the number of messages in the learning set that belong to this shelter, LB(C_{S_i}), divided by the total number of messages in the learning set.
\[ SD(S_i) = \frac{LB(C_{S_i})}{\#CLB} \]

#CLB: the total number of cockroaches in the learning basis.
LB(C_{S_i}): the number of cockroaches of the learning basis that belong to the shelter S_i.
SD(spam shelter) = 0.33 and SD(ham shelter) = 0.66.

3.3.6.2 Congeners' attraction (CA): The CA gives, for each no-secure cockroach, the attraction rate of its congeners. This step depends on the amount of pheromone deposited on the shelter S_i (DP) and the number of secure cockroaches in the shelter S_i (CFN).

\[ CA(C_n, S_i) = PC(C_n, S_i) \cdot DP(C_n, S_i) \]

i) Congener's probability (PC): PC depends on the full number of secure cockroaches in the shelter. PC is used so as not to favour one shelter over another.

\[ PC(C_n, S_i) = \frac{1}{\#C_T^{S_i}} \]

ii) Pheromone density (DP):

\[ DP(C_n, S_i) = CFN(C_n, S_i) \left( \frac{\sum_{T=1}^{S} distance(C_n, C_T^{S_i})}{S} \right) \]

S: the number of cockroaches in the shelter i.
C_T^{S_i}: secure cockroach number T of the shelter i.

iii) Congener's family number (CFN): CFN is the number of secure cockroaches of the shelter S_i that are from the same colony (family) as the no-secure cockroach C_n, based on:
A distance(C_n, C_T^{S_i}) between the no-secure cockroach C_n and every secure cockroach C_T of the shelter S_i.
A threshold fixed in advance.

If distance(C_n, C_T) [...]
[...] SA(C_n, S_ham) then C_n [...]
Else C_n [...] ham
End
For each secure cockroach
    Update the aggregation rules
End
Spam class and ham class. Secure cockroaches of each class.
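The aggregation rules of the detection phase can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: messages are assumed to be numeric vectors of equal length, the Euclidean distance is used by default, the quality safety term QS is passed in as a precomputed value, and all function names are hypothetical.

```python
import math


def chebyshev(a, b):
    """Chebyshev distance, one of the alternatives to the Euclidean default."""
    return max(abs(x - y) for x, y in zip(a, b))


def shelter_darkness(n_in_shelter, n_learning_basis):
    """SD(S_i) = LB(C_Si) / #CLB."""
    return n_in_shelter / n_learning_basis


def congeners_attraction(cockroach, secure, threshold, distance=math.dist):
    """CA = PC * DP for one no-secure cockroach and one shelter.

    secure: vectors of the secure cockroaches already in the shelter.
    """
    s = len(secure)
    dists = [distance(cockroach, t) for t in secure]
    cfn = sum(1 for d in dists if d < threshold)   # congeners of the same family
    pc = 1 / s                                     # congener's probability
    dp = cfn * (sum(dists) / s)                    # pheromone density
    return pc * dp


def shelter_attraction(sd, ca, qs, alpha=1.0, beta=1.0, lam=1.0):
    """SA = alpha * SD + beta * CA + lambda * QS (qs is supplied by the caller)."""
    return alpha * sd + beta * ca + lam * qs


def choose_shelter(attractions):
    """Pick the shelter name with the highest attraction, e.g. 'spam' or 'ham'."""
    return max(attractions, key=attractions.get)
```

A message would be scored once per shelter, for example with shelter_darkness(100, 300) for the spam shelter and shelter_darkness(200, 300) for the ham shelter, and then assigned with choose_shelter({"spam": sa_spam, "ham": sa_ham}).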
/* Congeners attraction */ CA(C_n, S_i) = PC(C_n, S_i) · DP(C_n, S_i)
/* Quality safety */ SQ(C_n, S_i) = 1 / distance(C_n, C_{S_i})

4. Validation metrics
To evaluate our algorithm we used the following metrics. The contingency matrix of each class provides four essential pieces of information:
o True Positive (TP): the number of messages that are really spam and are classified by our system as spam.
o False Positive (FP): the number of messages that are really ham and are classified by our system as spam.
o False Negative (FN): the number of messages that are really spam and are classified by our system as ham.
o True Negative (TN): the number of messages that are really ham and are classified by our system as ham.

Table 3: Contingency matrix.

                      Expert judgment
System judgment       Spam      Ham
Spam                  TPi       FPi
Ham                   FNi       TNi

o Recall (R): measures the ability of the system to detect spam messages correctly.

\[ R = \frac{TP_i}{TP_i + FN_i} \]

o Precision (P): measures the ability of the system to return only the messages that are correctly suspected of being spam.

\[ P = \frac{TP_i}{TP_i + FP_i} \]

o F-measure (F): a measure that combines recall and precision.

\[ F = \frac{2 \cdot R \cdot P}{R + P} \]

o Entropy (e): measures the loss of information in our system.

\[ e = -P \cdot \log(P) \]

o Accuracy: the success rate, that is, the percentage of messages correctly classified by the system.

\[ \mathrm{Accuracy} = \frac{TP_i + TN_i}{TP_i + TN_i + FN_i + FP_i} \]

5. Results:
We conducted a series of tests in two stages:
1- First, we tested our system using the ASC algorithm while varying the following parameters:
o Text representation method: 2-, 3-, 4- and 5-gram characters and bag of words.
o Distance measure: cosine, Chebyshev, Euclidean.
o Threshold: [0.1 - 0.9].
We fixed the weights α = 1, β = 1 and λ = 1 of each filter.
For each test we fixed three parameters and varied the other. The most significant results are grouped in the following tables.

Table 4: Result of spam detection with Euclidean distance.

Text representation   2-gram   3-gram   4-gram   5-gram   Bag of words
Recall                0.761    0.7927   0.7342   0.7117   0.7432
Precision             0.6601   0.7489   0.6059   0.622    0.66
F-measure             0.7042   0.7712   0.6684   0.6654   0.7
Entropy               0.274    0.2165   0.3035   0.2953   0.2742
Accuracy              0.8535   0.8974   0.8388   0.8437   0.8613
Contingency matrix
  TP                  169      176      163      158      165
  FP                  97       59       106      96       85
  FN                  53       46       59       64       57
  TN                  705      743      696      706      717

Table 5: Result of spam detection with Chebyshev distance.

Text representation   2-gram   3-gram   4-gram   5-gram   Bag of words
Recall                0.7567   0.666    0.6801   0.6081   0.6711
Precision             0.4827   0.5522   0.576    0.4179   0.5775
F-measure             0.5934   0.6074   0.6264   0.503    0.625
Entropy               0.3515   0.3279   0.3177   0.3646   0.317
Accuracy              0.7714   0.8105   0.8222   0.7314   0.82
Contingency matrix
  TP                  168      148      151      135      149
  FP                  180      120      111      188      109
  FN                  54       74       71       87       73
  TN                  622      682      691      614      693

Table 6: Result of spam detection with cosine distance.
Text representation   2-gram   3-gram   4-gram   5-gram   Bag of words
Recall                0.6216   0.6486   0.7792   0.7072   0.6036
Precision             0.4457   0.578    0.7456   0.7584   0.563
F-measure             0.5226   0.6198   0.7682   0.7393   0.585
Entropy               0.3601   0.3168   0.2188   0.2097   0.3234
Accuracy              0.8095   0.8212   0.8945   0.8876   0.8125
Contingency matrix
  TP                  138      144      173      157      134
  FP                  111      105      59       50       104
  FN                  84       78       49       65       88
  TN                  691      697      743      752      698

5.1. Discussion 1:
In order to determine the ideal configuration, we analyse the influence of each parameter:

5.1.1. Influence of text representation method:
The character n-gram method gives better results than the bag of words because it is multilingual, tolerant to misspelling and does not require language processing. The variation of the parameter N greatly influences the performance of our ASC algorithm. The best results are obtained with N = 3, which represents the optimal value.

5.1.2. Influence of distance measure:
The most suitable distance measure for the spam corpus v.0.1 is the Euclidean distance, because it allows the artificial cockroaches to better identify their congeners and to make the best choice of shelter.

5.1.3. Influence of threshold
The threshold is a very difficult parameter to fix. It requires considerable testing to ensure that the cockroaches of the same colony recognize their congeners, follow their choice and cooperate better. The threshold that allows the best communication between artificial cockroaches is equal to 0.6, which is why the recall is always higher than the precision.

5.1.4. In terms of entropy
In our system we did not use dimension reduction, so there is less information loss, and consequently the best entropy is equal to 0.2165.
The ideal configuration of our ASC algorithm for spam detection is (Euclidean distance, 3-gram characters and threshold = 0.6).

2- Secondly, in order to study the impact of each parameter (shelter darkness, congeners' attraction and quality safety), we test the performance of our ASC approach with the ideal configuration fixed previously while varying the adjustment weightings (α, β, λ). The results obtained are grouped in the following table.

Table 7: Result of spam detection with variation of weightings.

Weightings            Precision   Recall   F-measure   Entropy   Accuracy   Contingency matrix
α      β      λ                                                             TP    FP    FN    TN
0.15   0.28   0.3     0.536       0.6036   0.5725      0.334     0.8007     134   116   88    686
0.25   0.3    0.17    0.567       0.666    0.613       0.321     0.817      148   113   74    689
0.38   0.26   0.11    0.692       0.779    0.732       0.254     0.876      173   77    49    725
0.44   0.33   0.19    0.724       0.815    0.771       0.233     0.892      181   69    41    733
0.65   0.39   0.22    0.845       0.837    0.843       0.141     0.931      186   34    36    768

5.2. Discussion 2:
By observing the previous tables we notice that the variation of the adjustment parameters affects the quality of the results. The previous table shows that the best results are obtained when α and β are greater than λ, which indicates that the shelter darkness parameter plays a more important role in the choice of shelter for each cockroach than the quality safety and congeners' attraction parameters.
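The validation metrics of Section 4 can be recomputed from the four contingency counts. The sketch below is a minimal illustration, not the authors' implementation: the function name validation_metrics is hypothetical, and the natural logarithm is assumed for the entropy because it reproduces the entropy value reported for the 3-gram Euclidean configuration (0.2165).

```python
import math


def validation_metrics(tp, fp, fn, tn):
    """Return recall, precision, F-measure, entropy and accuracy
    for one contingency matrix, following the definitions of Section 4."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    entropy = -precision * math.log(precision)   # natural log is an assumption
    accuracy = (tp + tn) / (tp + tn + fn + fp)
    return {
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
        "entropy": entropy,
        "accuracy": accuracy,
    }


# Counts of the 3-gram column of Table 4 (Euclidean distance).
best = validation_metrics(tp=176, fp=59, fn=46, tn=743)
```

For these counts the recall, precision, entropy and accuracy agree with the corresponding entries of Table 4 to three decimal places.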
5.3. Comparative study:
In order to give our results a reference, we compared the results provided by our ASC algorithm with the results obtained by the following algorithms:
The ABC algorithm, which was first proposed in [5] and was adapted in [12] for the detection and filtering of spam.
A new algorithm based on the behaviour of worker bees (foraging, guarding and cleaning), proposed by Hamou in [12].
A classical supervised learning algorithm, the decision tree C4.5, which we ran for spam detection using the tool Weka [16].

Table 8: Comparison between the best result of our ASC algorithm for spam detection and the results of algorithms existing in the literature.

Algorithm                Recall   Precision   F-measure   Entropy   Accuracy
Artificial bees colony   0.5464   0.513       0.5243      0.219     0.499
Social worker bees       0.74     0.93        0.82        0.067     0.87
Decision tree C4.5       0.65     0.865       0.803       0.2164    0.851
Our approach ASC         0.8454   0.8378      0.8432      0.1419    0.9316

The table shows that, apart from the precision and entropy of the social worker bees, our ASC approach gives the best results among the compared approaches, since it relies on more than one parameter and requires the exchange of information between artificial cockroaches.
6. Result-mining (Visualisation):
This part of our system offers zoom and rotation functions and provides a visualisation of the spam detection results:
In a 3D cube format, as presented in Fig. 7.

Fig. 7. 3D visualisation of spam detection results as a 3D cube: a) detailed view, b) global view

In a silky structure format (cobweb), as depicted in Fig. 8.

Fig. 8. Visualisation of spam detection results as a cobweb

7. Conclusion
This paper presents a new bio-inspired technique called artificial social cockroaches (ASC), based on the collective behaviour of cockroaches and their patterns of grouping under the same shelter using the aggregation rules (shelter darkness, congeners' attraction and quality safety). The second part has shown that the social cockroaches are able to detect and filter spam, with results that compare favourably with those of social bees and the decision tree (C4.5). Finally, in order to represent the results with more realism and to meet the needs of the user, we have implemented a visualisation tool using Java 3D.

7.1. Future works:
As a perspective, we plan to apply our approach to different problems such as image processing, privacy preservation in big data, information retrieval, clustering and plagiarism detection. We can also combine our ASC algorithm with other meta-heuristic methods (simulated annealing, tabu search and genetic algorithm) for the adjustment of the weightings. The functioning of our algorithm also gives us the opportunity to address the traveling salesman problem (TSP). Each city represents an artificial shelter, and the displacement of each cockroach from a shelter i to a shelter j depends on the decision of the group. First, the cockroaches move randomly; after a moment they are grouped under the same shelter j, which represents the nearest city to city i, and so on until the destination city is reached. As a result, if there are multiple paths to reach the destination city, the shortest path is the one that connects the cities (shelters) where the cockroaches have been grouped.

Reference:
[1] Holland, J. H. (1973). Genetic algorithms and the optimal allocation of trials. SIAM Journal on Computing, 2(2), 88-105.
[2] Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: The MIT Press.
[3] Fogel, L. J. (1962). Autonomous automata. Industrial Research, 4, 14–19.
[4] Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of IEEE International Conference on Neural Networks, IV (pp. 1942–1948).
[5] Karaboga, D., & Basturk, B. (2005). A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. Journal of Global Optimization, 39, 459–471.
[6] Bourjot, C., Chevrier, V., & Thomas, V. (2003). A new swarm mechanism based on social spiders colonies: from web weaving to region detection. Web Intelligence and Agent Systems, 1(1), 47-64.
[7] Dasgupta, D., & Forrest, S. (1999, July). Artificial immune systems in industrial applications. In Intelligent Processing and Manufacturing of Materials, 1999. IPMM'99. Proceedings of the Second International Conference on (Vol. 1, pp. 257-267). IEEE.
[8] Firte, L., Lemnaru, C., & Potolea, R. (2010, August). Spam detection filter using KNN algorithm and resampling. In Intelligent Computer Communication and Processing (ICCP), 2010 IEEE International Conference on (pp. 27-33). IEEE.
[9] Sahami, M., Dumais, S., Heckerman, D., & Horvitz, E. (1998, July). A Bayesian approach to filtering junk e-mail. In Learning for Text Categorization: Papers from the 1998 workshop (Vol. 62, pp. 98-105).
[10] Sasaki, M., & Shinnou, H. (2005, November). Spam detection using text clustering. In Cyberworlds, 2005. International Conference on (pp. 4-pp). IEEE.
[11] Lai, C. C., & Wu, C. H. (2007). Particle swarm optimization-aided feature selection for spam email classification. IEEE, Kumamoto, 165.
[12] Hamou, R. M., Amine, A., & Boudia, A. (2013). A New Meta-Heuristic Based on Social Bees for Detection and Filtering of Spam. International Journal of Applied Metaheuristic Computing (IJAMC), 4(3), 15-33.
[13] Mohammad, A. H., & Zitar, R. A. (2011). Application of genetic optimized artificial immune system and neural networks in spam detection. Applied Soft Computing, 11(4), 3827-3845.
[14] Oda, T., & White, T. (2005). Immunity from spam: An analysis of an artificial immune system for junk email detection. In Artificial Immune Systems (pp. 276-289). Springer Berlin Heidelberg.
[15] Gómez Hidalgo, J. M., Cajigas Bringas, G., Puertas Sanz, E., & Carrero García, F. Content Based SMS Spam Filtering. In Dick Bulterman, David F. Brailsford (Eds.), Proceedings of the 2006 ACM Symposium on Document Engineering, Amsterdam, The Netherlands.
[16] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1), 10-18.
2nd International Conference on Networking and Advanced Systems May 6-7, 2015 | Badji Mokhtar University, Annaba, Algeria
BIARV: Bio-Inspired Approach for Routing in Vehicular Ad Hoc Networks
SAYAD Lamri #*1, AISSANI Djamil #, BOUALLOUCHE-MEDJKOUNE Louiza #
# Unité de recherche LAMOS, Faculté des Sciences Exactes, Université de Bejaia, ALGERIE
* Département d’Informatique, Faculté de Mathématique et d’Informatique, Université de M’sila, ALGERIE
1 [email protected]
Abstract— In recent years, Vehicular Ad Hoc Networks (VANETs) have gained considerable popularity in both the research and industry communities, with the aim of providing safety and comfort for people. Because nodes in VANETs are highly mobile, the major research challenges lie in the design of routing protocols, data sharing, security and privacy, network formation, etc. We present here a new routing protocol that deals with the mobility aspect of VANETs. The proposed bio-inspired routing protocol is modelled on the foraging behavior of ants. The main idea is to use autonomous agents (artificial ants) to construct and maintain multiple routes between two vehicles. Simulation results show that our solution provides better performance than AODV in terms of average end-to-end delay and Packet Delivery Ratio (PDR).
Keywords — Vehicular Ad Hoc Network, routing, Ant Colony Optimization, Bio-inspired

I. INTRODUCTION
As today’s vehicles are equipped with more and more functions and applications (e.g., road navigation, music playing, vehicle diagnosis, traveling log, and calendar), the data in a vehicle has grown rapidly in both volume and variety. This has led to the introduction of a new type of network called Vehicular Ad Hoc Networks (VANETs) as part of the Intelligent Transportation System (ITS). These networks are self-configuring networks composed of a collection of vehicles and elements of roadside infrastructure connected with each other without requiring an underlying infrastructure, sending and receiving information and warnings about the current traffic situation (see Figure 1). The purpose of the intelligent transportation system is to improve the safety and efficiency of road transport, to decrease accidents and to provide better conditions for drivers and passengers. VANETs are distributed self-organizing networks formed between moving vehicles equipped with wireless communication devices, and they operate under continuously varying vehicular motion. VANETs are considered a subclass of Mobile Ad Hoc Networks (MANETs). However, they are distinguished from MANETs by the following characteristics:
- Highly dynamic topology.
- Patterned mobility.
- Propagation model.
- Unlimited battery power and storage.
- On-board sensors.
Figure 1: VANET architecture

Since their emergence, routing has been one of the most challenging tasks in VANETs. Packet routing is essential because it supports critical tasks such as collision avoidance. Due to the dynamic nature of VANETs, routing faces various challenges and constraints [1]:
- Constant topological changes due to the high mobility of the nodes.
- Varying density and velocity of the vehicles on the road.
- Sparse distribution of vehicles in some geographical regions, which leads to poor connectivity and degraded network performance.
- Efficient clustering and selection of the Cluster Head (CH) based upon some predefined criteria.
- Intrusion detection and security.
Many routing protocols have been proposed since the advent of VANETs. The operational principles of VANETs and MANETs match in most aspects, except for the high-speed mobility of vehicles and the highly unpredictable nature of their movement. This suggests that most MANET routing protocols are applicable to VANETs. However, more specific algorithms should be designed to meet the characteristics of VANETs. In this paper, we propose a new hybrid routing protocol for VANETs inspired by the Ant Colony Optimization (ACO) algorithm [2]. Our contributions are the following:
- We propose a new hybrid routing protocol that deals with the dynamic aspect of VANETs.
- We implement and adapt the ACO algorithm to solve the routing issue in VANETs.
- We evaluate the proposed algorithm with a realistic mobility model, using a scenario generated from the city of Malaga.

The rest of this paper is organized as follows. In the next section, previous works are presented. Section 3 describes the proposed routing protocol. Performance evaluation and results are discussed in Section 4. Conclusions are presented in Section 5.

II. PREVIOUS WORKS
Based upon the constraints defined above, a number of research proposals have been formulated in the literature to address various problems in VANETs.

The authors of [3] proposed Prediction-Based Routing (PBR) for VANETs. To deal with the frequent link breakages caused by the highly dynamic topology, vehicles in PBR use mobile gateways to connect to the Internet while travelling on the road. Even though vehicles on the road move at high velocity and change direction rapidly, their motion is still predictable. PBR uses the predicted routes to pre-emptively create new routes before the existing ones fail, in order to minimize failures. The simulation results show that PBR reduces route failures and greatly improves the Packet Delivery Ratio (PDR).

Lochert et al. [4] proposed GPCR (Greedy Perimeter Coordinator Routing), a position-based routing protocol for urban environments. GPCR is well suited to highly dynamic environments such as inter-vehicle communication on the highway or in the city. GPCR traverses junctions using a restricted greedy forwarding procedure, and adjusts the routing path using a repair strategy based on the topology of streets and junctions.

B. Blanco et al. proposed in [5] an intelligent routing algorithm called GARI, which adapts its operation to the high mobility and changing characteristics of vehicular city environments. The nodes sense the network locally and collect information to feed the cognitive module, which selects the best routing strategy. Their algorithm performs the decision process locally without extra protocol overhead and without additional protocol message dissemination or a convergence mechanism, but the benefit is perceived globally through the general improvement of network performance. This first approach to global adaptiveness made use of linear discriminant analysis with successful results.

J. Kim et al. propose in [6] a novel routing protocol for cognitive radio vehicular networks, called Spectrum-Aware BEaconless geographical routing (SABE). The main idea in SABE is that the routing decision, as well as the resource allocation strategy, is made by the receivers on a per-packet and per-hop basis. A packet carrier vehicle broadcasts a forward request packet that includes its available resources and location. Receivers calculate a link weight that takes into account their own and the source’s available resources and locations. A timer to reply to the request is then set depending on the link weight. The receiver with the highest link weight replies first, establishing itself as the relay node. Their simulation results show that SABE increases the end-to-end network throughput by up to 250% and decreases the end-to-end delay by up to 400% compared with other geographical routing protocols.

Liu et al. [7] proposed the Relative Position Based Message Dissemination (RPB-MD) protocol to disseminate messages more efficiently. Instead of a single node, RPB-MD considers all vehicles in the ZoR as destinations of messages. It also assumes that vehicles obtain the relative distance between neighbours from GPS position information. To make the candidate nodes hold the message with high reliability and to ensure a high PDR and low delivery overhead, a Directional Greedy Broadcast Routing (DGBR) is proposed. The time parameters are designed adaptively, based on message attributes and local vehicular traffic density, which guarantees efficiency. The proposed protocol is robust to traffic density and relative distance accuracy. However, it is applicable only to highway scenarios and needs to be revised to work in real-life urban scenarios.

Wang and Lin [8] proposed a passive-clustering-based routing protocol named PassCAR. In passive clustering, a cluster has one CH, and multiple clusters can be connected through gateways. PassCAR works in three phases: route discovery, route establishment and data transmission. During the route discovery phase, suitable nodes are selected to become gateways and CHs. These nodes forward RREQ packets. For route establishment, the protocol uses a multi-metric election strategy that considers link reliability, stability and sustainability. The protocol quantifies the links based on node degree, expected transmission count and link lifetime. Once the route is discovered, the destination node returns an RREP packet to the source node. Data transmission then takes place over the established path.
Recently, Soares et al. [9] proposed the GeoSpray routing protocol, which combines the store-carry-and-forward technique with routing decisions based on geographic location. The geographic locations are provided by GPS devices. GeoSpray takes a hybrid approach that uses both multiple-copy and single-copy routing schemes. To exploit alternative paths, GeoSpray starts with a multiple-copy scheme that spreads a limited number of bundle copies. Afterwards, it switches to a single-copy scheme, which takes advantage of additional forwarding opportunities. It improves delivery success and reduces delivery delay. The protocol uses active receipts to clear delivered bundles across the network nodes.
III. PROPOSAL
This paper proposes BIARV, a new routing protocol for MANETs inspired by the Ant Colony Optimization meta-heuristic. It is a hybrid multi-path protocol designed to support high levels of mobility and node speed. BIARV has three phases: a route discovery phase, a route maintenance phase and a route failure phase.

A. Data structures
Each node i in BIARV uses three data structures:
- Routing Table RT(i): destination address d, next hop k, pheromone value $\tau_{ik}^{d}$, number of hops $h_{ik}^{d}$, and number of data packets relayed by node k, $n_{ik}^{d}$.
- Neighbors List NL(i): list of one-hop neighbors.
- Parents List PL(i): a neighbor k is considered a parent of node i for the destination d if i appears in the next hop list of RT(k) for the destination d.

B. Packets types
BIARV uses four types of packets:
- Hello.
- Update Ant (UANT): sequence number, source address, destination address, parents list.
- Forward Ant (FANT): sequence number, source address, destination address, route records, hop count.
- Backward Ant (BANT): sequence number, source address, destination address, route records.
BIARV uses both FANT and BANT in the discovery and maintenance phases. However, UANT packets are used when a route failure occurs.

C. Transition Probability
When a data packet arrives at a node k, the next node is selected according to the following formula, which takes the usual ACO form:

$$p_{ki}^{d} = \frac{(\tau_{ki}^{d})^{\alpha}\,(\eta_{ki}^{d})^{\beta}}{\sum_{j} (\tau_{kj}^{d})^{\alpha}\,(\eta_{kj}^{d})^{\beta}}$$

where the sums run over the next hops j recorded in RT(k) for destination d, and:
- $p_{ki}^{d}$ is the probability that node i will be selected as the next hop to reach the destination d.
- $\tau_{ki}^{d}$ is the quality of the link ki (pheromone quantity) and is calculated as follows:

$$\tau_{ki}^{d} = \frac{n_{ki}^{d}}{\sum_{j} n_{kj}^{d}}$$

- $\eta_{ki}^{d}$ is calculated according to the following formula:

$$\eta_{ki}^{d} = \frac{1 / h_{ki}^{d}}{\sum_{j} 1 / h_{kj}^{d}}$$

- α and β are parameters of the algorithm.

D. Route discovery Phase
During this phase, routes are discovered whenever they are needed. The creation of new routes requires a Forward Ant (FANT) and a Backward Ant (BANT). A FANT is an agent that is broadcast until it reaches the destination node. In contrast, a BANT establishes the route to the destination node by following the same route as the FANT, but in the opposite direction. Nodes are able to distinguish duplicate packets on the basis of the sequence number and the source address of the FANT.
A node receiving a FANT for the first time creates a record in its routing table. A record in the routing table is a triple (destination address, next hop, pheromone value).

E. Route maintenance Phase
Once routes have been created in the discovery phase, they need to be maintained in order to improve them and adapt them to topology updates. To do so, the same two special packets are used: Forward Ant and Backward Ant. By route maintenance we mean: 1) discovering new interesting routes, 2) maintaining existing routes by updating the $\tau$ and $\eta$ values, and 3) pheromone evaporation. In fact, $n_{ki}^{d}$ is increased every time a data packet is forwarded from node k.

New routes discovery
If a node m detects a new node n in its neighbourhood, it broadcasts a proactive ant packet (FANT) to all its active destinations over this node n. The FANT is sent with a list of all active destinations. Its role is to search for possible new interesting routes to all the active destinations of m. A route from node i to a destination d is considered interesting if the new hop count is better than the old one. When the FANT arrives at one of these destinations, it marks it as reached and creates a BANT. The FANT dies when all destinations are reached or its TTL is equal to zero. If a new interesting route to a destination d has been discovered, m sends an update ant (UANT) to all its parents related to this destination. The UANT is sent with the ID of d and the number of hops, h, to d.
Every node receiving a UANT compares its best distance to destination d with the value (h + 1) derived from the UANT. If the new distance is better, the node updates its routing table with the new distance to d and forwards the UANT, with the new value of h, to all its predecessors. Otherwise, the UANT dies.
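The UANT handling rule described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Node` class, its `best_hops` and `parents` fields and the `handle_uant` function are hypothetical names, and accepting an update when no route to the destination is known yet is an assumption the paper does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal node state used only for this sketch."""
    name: str
    best_hops: dict = field(default_factory=dict)  # destination -> best hop count
    parents: dict = field(default_factory=dict)    # destination -> list of parent Nodes


def handle_uant(node, dest, hops):
    """Process a UANT saying that its sender is `hops` hops away from `dest`.

    Returns the names of the nodes that accepted the update, in order.
    """
    candidate = hops + 1                       # distance to dest through the sender
    current = node.best_hops.get(dest)
    # Assumption: an unknown destination (current is None) always accepts the update.
    if current is not None and candidate >= current:
        return []                              # no improvement: the UANT dies here
    node.best_hops[dest] = candidate           # record the better distance
    updated = [node.name]
    for parent in node.parents.get(dest, []):  # forward to the predecessors
        updated += handle_uant(parent, dest, candidate)
    return updated
```

Because a node only forwards the UANT after a strict improvement of its own distance, the propagation stops as soon as no routing table changes, which matches the "otherwise, UANT dies" rule.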
Pheromone Evaporation
We use negative feedback in the form of uniform evaporation on all paths. Thus, at regular intervals, the number of data packets is decreased, for all routes, by a common evaporation amount.

F. Route failure Phase
To detect link failures, BIARV uses HELLO packets sent by every node to its immediate neighbors. A link to a node k is declared failed when the current node does not receive a HELLO packet from k during an amount of time defined by $T_{fail} = N_{miss} \times T_{hello}$, where $T_{hello}$ is the HELLO interval and $N_{miss}$ is the number of allowed missed HELLO packets.
When a link between a node i and a neighbor k fails, a route to one or more destinations may be lost. Node k is removed from the neighbors list of node i, and all the associated entries are removed from its routing table. If node i has lost its best or only route to a destination, it notifies all its corresponding predecessors by sending a UANT. The UANT contains a list of the destinations to which the node lost its best route and the new number of hops to each destination (if it still has entries for that destination). Each node that receives the UANT updates its routing table using the new distance. If no changes are observed, the node stops broadcasting the UANT. If, in turn, it lost its best or only path to a destination, it forwards the notification further, until all concerned nodes are notified about the new situation or no change in the routing table is observed.

IV. SIMULATIONS AND RESULTS
We used the ns2 network simulator to implement and evaluate our new routing protocol. The BIARV simulation results were then compared with those of AODV.

A. Simulation parameters
The parameters used in the simulations are summarized in Table 1.

Table 1: Simulation parameters
Parameter       | Value
Bandwidth       | 2 Mbps
Physical Layer  | IEEE 802.11
Mobility Model  | RWM
Nodes number    | 50
Traffic number  | 12
Simulation time | 600 s
Topology        | Random
Simulation area | 1000 m x 1000 m
Traffic Type    | CBR
Packet size     | 512 bytes
Packet rate     | 1 packet / second
α               | 0.7
β               | 0.3

B. Simulations Results
Two metrics were used to evaluate the performance of BIARV: average end-to-end delay and packet loss ratio. To study the behavior of BIARV under various mobility situations, we generated many scenarios, first by changing the pause time values and then by varying node speeds. Figure 1 shows the results obtained in terms of average end-to-end delay under various pause time values, while Figure 2 displays the evolution of the packet loss ratio.

Figure 1: Delay vs Pause Time (y-axis: Delay (s); x-axis: Pause Time (s); curves: Proposal, AODV)

These results attest that BIARV outperforms AODV, particularly when the mobility level is high (low pause time values). When the pause time is between 300 and 600 seconds, the two protocols show approximately equivalent performance. This means that in static or less mobile networks we cannot see a difference between the two protocols. This can be explained by the few changes in the network topology, so that no action is needed after the routes are discovered. However, the more the pause time decreases, the greater the observed improvement.
Figure 2: Packet Loss vs Pause Time (y-axis: Packet Loss (%); x-axis: Pause Time (s); curves: Proposal, AODV)

In the same context, node mobility can be modified by changing node speeds. Thus, Figures 3 and 4 show the evolution of the same metrics under various node speeds.

Figure 3: Delay vs Node Speed (y-axis: Delay (s); x-axis: Speed (m/s); curves: Proposal, AODV)

The described results confirm the remarks made previously. BIARV performs better than AODV, in particular when nodes move quickly.

Figure 4: Packet Loss vs Node Speed (y-axis: Packet Loss (%); x-axis: Speed (m/s); curves: Proposal, AODV)

V. CONCLUSION
VANET is an emerging and attractive technology dedicated to safety and comfort services for vehicle users. In this paper we have implemented the proposed BIARV protocol in NS2. The performance of the proposed protocol was compared with that of AODV. The simulation results indicate that BIARV outperforms AODV in terms of average end-to-end delay and packet loss ratio.
In future work, the proposed protocol can be compared with other existing bio-inspired ad hoc routing protocols on performance metrics such as end-to-end delay and routing overhead, under different simulation parameters.

REFERENCES
[1] Amit Dua, Neeraj Kumar, Seema Bawa. (2014). “A systematic review on routing protocols for Vehicular Ad Hoc Networks,” Vehicular Communications 1, pp. 33–52.
[2] M. Dorigo, M. Birattari, T. Stutzle. (2006). “Ant colony optimization: artificial ants as a computational intelligence technique,” IEEE Computational Intelligence Magazine, 1(4), pp. 28-39.
[3] V. Namboodiri, L. Gao. (2007). “Prediction-based routing for vehicular ad hoc networks,” IEEE Transaction in Vehicular Technology, 56(4), pp. 2332–2345.
[4] C. Lochert, M. Mauve, H. Fera, and H. Hartenstein. (2005). “Geographic routing in city scenarios,” ACM SIGMOBILE Mobile Computing and Communications, Vol. 9, pp. 69-72.
[5] Blanco B, Liberal F, and Amaia Aguirregoitia. (2013). “Application of cognitive techniques to adaptive routing for VANETs in city environments”. Mobile Netw Appl. doi:10.1007/s11036-013-0466-7
[6] Kim J, Krunz M. (2013). “Spectrum-aware beaconless geographical routing protocol for cognitive radio enabled vehicular networks”. Mobile Netw Appl. doi:10.1007/s11036-013-0476-5
[7] C. Liu, C. Chigan. (2012). “RPB-MD: Providing robust message dissemination for vehicular ad hoc networks,” Ad Hoc Network, 10 (3), pp. 497–511.
[8] S. S. Wang, Y. S. Lin. (2013). “PassCAR: A passive clustering aided routing protocol for vehicular ad hoc networks,” Comput. Commun. 36 (2), pp. 170–179.
[9] V. N. G. J. Soares, J. J. P. C. Rodrigues, F. Farahmand. (2014). “GeoSpray: A geographic routing protocol for vehicular delay tolerant networks,” Inf. Fusion 15, pp. 102–113.
Recommending relevant GitHub repositories: a collaborative-filtering approach
Mohamed Guendouz, Abdelmalek Amine, Reda Mohamed Hamou
GeCoDe Laboratory, Department of Computer Science, Tahar Moulay University of Saida, Algeria

Abstract—With its huge number of hosted source code and software projects, GitHub is an essential tool for developers: it helps them create and maintain their projects, which are hosted as repositories. However, developers lose a lot of time searching for useful repositories that may help them maintain their own projects.
To help developers find the right repositories for their needs and to reduce the time they spend searching for them, we propose in this paper a new recommender system for GitHub repositories that predicts highly relevant projects for developers, based on the collaborative-filtering approach. We evaluate our system on a real dataset to show how it can help developers find potential projects. Our approach reaches a precision of 78% and a recall of 80% for top-1 and top-20 recommendation, respectively.

Keywords—github; recommender system; collaborative-filtering; social coding.

I. INTRODUCTION
GitHub [1] is a web-based Git repository hosting service and a social coding platform. With over 3.4 million users and 16.7 million repositories [2], GitHub is considered the largest code host in the world. The site provides social-networking-like functions such as feeds, followers, wikis and a social network graph that displays how developers work on their forks of a repository.
However, GitHub does not provide any official tool for recommending repositories that may be useful to developers, who may lose considerable time searching for projects that meet their needs. To help developers find useful repositories for their projects, we present in this paper a new recommender system for GitHub repositories based on users’ activity history on the website and their links with each other.
The outline of this paper is as follows: Section 2 reviews related work on recommender system approaches, and especially the collaborative-filtering technique. In Section 3 we present our approach and its architecture. Section 4 details the dataset used. Finally, Section 5 presents the results of our system evaluation.

II. RELATED WORKS
The creation and improvement of recommender systems is an active domain: many researchers have been involved in it and many methods have been proposed in recent years. These methods can be classified into three groups: content-based, collaborative-filtering and hybrid systems. While content-based systems focus on the properties of items, collaborative-filtering systems focus on the relationship between users and items, and hybrid approaches combine the two [3].
In content-based recommendation, the similarity of items is determined by measuring the similarity of their properties; in collaborative-filtering recommendation, the similarity of items is determined by the similarity of the ratings given to those items by the users who have rated both [4]. Since our system uses a collaborative-filtering approach, this section focuses on the collaborative-filtering recommendation methods proposed in the literature. The term “collaborative filtering (CF)” was coined by the developers of one of the first recommender systems [5], and it has been widely adopted by researchers and companies. Collaborative-filtering techniques fall into three main groups: memory-based, model-based and hybrid techniques. These methods differ in their representation of user/item data and in the algorithms used for similarity calculation.
Memory-based techniques use the user rating data to calculate the similarity or weight between users or items and make predictions or recommendations according to those calculated similarity values [6]. This approach is widely used in commercial systems because it is easy to implement and highly effective.
However, memory-based CF techniques have several limitations, such as the fact that the similarity values are based on common items and are therefore unreliable when data are sparse and common items are few. To achieve better prediction performance and overcome the shortcomings of memory-based CF algorithms, model-based CF approaches have been investigated. Model-based CF techniques use the pure rating data to estimate or learn a model to make predictions [7]. The model can be a data mining or machine learning algorithm. Well-known model-based CF techniques include Bayesian belief nets (BNs) CF models [7–9], clustering CF models [10, 11], and latent semantic CF models [12].
Hybrid CF techniques, such as the content-boosted CF algorithm [13] and Personality Diagnosis (PD) [14], combine CF and content-based techniques, hoping to avoid the limitations of either approach and thereby improve recommendation performance.
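To make the memory-based idea described above more concrete, here is a generic, textbook-style sketch of user-based collaborative filtering. It is not taken from any of the cited works: the function names `cosine_similarity` and `predict`, the choice of cosine similarity and the dictionary layout of the ratings are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings, user, item):
    """Similarity-weighted average of the other users' ratings of `item`.

    `ratings` maps each user to a dict item -> rating. Returns None when no
    similar user has rated the item.
    """
    num = 0.0
    den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        num += weight * other_ratings[item]
        den += abs(weight)
    return num / den if den else None
```

The sketch also shows the sparsity limitation mentioned above: when two users share no items, their similarity is zero and the second user contributes nothing to the prediction.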
III. PROPOSED APPROACH
In this section we describe our recommendation approach, which is based on the collaborative-filtering technique; this technique has been used in recent years in many real systems, such as book and movie recommendation.
Fig. 1 shows the base architecture of our system. First, the system finds the users (developers) most similar to a given GitHub user and sorts them in descending order of the number of repositories they have in common with this user. Next, the system collects all the repositories of each user in the resulting list and keeps only those forked most often by these users. In the final step, the system ranks these repositories and selects the set containing the repositories relevant to the user.
We now describe each step in detail. The system is divided into two main steps, each responsible for a specific task: finding highly similar users, and finding highly relevant repositories for a GitHub user or developer.

Fig. 1. GitHub Recommender System Architecture.

A. First step: finding highly similar developers
To find similar users or developers, we defined a method called similar(u,v), which takes two developers u and v as parameters and returns the number of repositories they have in common. In this step we take the developer to whom we want to recommend a set of repositories and, for each developer v in our dataset, we calculate their similarity using the method similar(u,v). Fig. 2 shows the steps of this method. At the end of this step, we obtain a vector containing a set of developers with their similarity values; before passing to the next step, we sort this vector in descending order of similarity value.

Procedure SIMILAR
Input: u, v // denotes two GitHub users u and v
Output: Count // number of common repositories between u and v
Begin
  Count ← 0;
  For Each r in Ru // Ru is the repositories of user u.
    If r in Rv // Rv is the repositories of user v.
      Count ← Count + 1;
  Return Count;
End.
Fig. 2. The algorithm flow of similar.

B. Second Step: finding highly relevant repositories
The goal of any recommendation system is to find a set of relevant items for a user among a huge number of items. Our system uses the result of the first step to find a set of repositories that match the developer’s criteria. As mentioned earlier, the result of the first step is a vector of developer similarities sorted in descending order. In this second step we begin by selecting the first n developers from this vector; we call them the highly similar developers, and in our case we choose n equal to 20. We then retrieve all the repositories of these developers (forked repositories). This gives a collection containing a huge number of repositories. To keep only the relevant repositories from this collection, we defined another method called count_of_fork(r), which takes a repository as parameter and returns the number of developers that forked it; Fig. 3 shows the algorithm flow of this method. Finally, our recommender system returns a subset of repositories, called the recommendation list, which contains the repositories most relevant to the developer. This list is used as input for evaluating our recommender system.
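The two steps described above can be sketched end to end as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the `forks` dictionary (user to set of forked repositories), the `recommend` function and its `top_k` parameter are hypothetical, ties are broken alphabetically only to make the output deterministic, and excluding repositories the target developer has already forked is an assumption the paper does not state.

```python
def similar(repos_u, repos_v):
    """Number of repositories two developers have in common (Procedure SIMILAR)."""
    return len(set(repos_u) & set(repos_v))


def count_of_fork(repo, users, forks):
    """Number of the given users that forked `repo` (Procedure COUNT OF FORK)."""
    return sum(1 for user in users if repo in forks[user])


def recommend(target, forks, n=20, top_k=10):
    """Return up to `top_k` repositories for `target`.

    `forks` maps each developer to the set of repositories they forked.
    """
    # Step 1: rank the other developers by number of common repositories.
    scores = {v: similar(forks[target], forks[v]) for v in forks if v != target}
    neighbours = sorted(scores, key=lambda v: (-scores[v], v))[:n]
    # Step 2: gather the neighbours' repositories and rank them by fork count.
    candidates = set()
    for v in neighbours:
        candidates |= set(forks[v])
    candidates -= set(forks[target])  # assumption: skip already-forked repositories
    ranked = sorted(
        candidates, key=lambda r: (-count_of_fork(r, neighbours, forks), r)
    )
    return ranked[:top_k]
```

With n = 20, as in the paper, `neighbours` corresponds to the highly similar developers, and the returned list plays the role of the recommendation list used in the evaluation.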
Procedure COUNT OF FORKS
Input: r // denotes a GitHub repository
Output: Forks // number of forks by users
Begin
  Forks ← 0;
  For Each user in Users // Users denotes the highly similar users
    If r in Ruser // Ruser is the repositories of user
      Forks ← Forks + 1;
  Return Forks;
End.

Fig. 3. The algorithm flow of COUNT OF FORKS.

IV. DATASET
For testing and evaluating our approach, we used the GitHub API [15] to download a collection of random repositories together with their information, such as the count of forks and the count of stars. We downloaded this collection at two different times, t1 and t2: t1 is used to build our system (training set), while t2 is used to evaluate the relevancy of the recommendations (test set).

We collected 20,000 repositories and the information of 2,000 developers from GitHub. We used MongoDB [16] to store the data. MongoDB is a NoSQL database that stores data as documents grouped into collections; in our case we used two collections: users and repos. Fig. 4 shows the percentage of programming languages in our dataset.

Fig. 4. Percentage of Programming Languages in the dataset.

In our approach we use the data from 2014-10-01 to 2014-11-30 (t1) as the training set and the data from 2014-12-01 to 2014-12-31 (t2) as the test set.

V. EVALUATION
In this section we present the results obtained by evaluating our system on a real dataset. For this step we chose three evaluation metrics: Recall, Precision and F-Measure. These three metrics are widely used in the literature to evaluate recommender systems in terms of relevancy.

Based on the selection of items for recommendation and their relevancy, we obtain the four types of items outlined in Table I. Given this table, we can define measures that use the relevancy information provided by users.

TABLE I. PARTITIONING OF ITEMS WITH RESPECT TO THEIR SELECTION FOR RECOMMENDATION AND THEIR RELEVANCY

              Relevant  Irrelevant  Total
Selected      Nrs       Nis         Ns
Not Selected  Nrn       Nin         Nn
Total         Nr        Ni          N

Precision is one such measure. It defines the fraction of relevant items among recommended items:

P = Nrs / Ns (1)

Recall provides the probability of selecting a relevant item for recommendation:

R = Nrs / Nr (2)

We can also combine both precision and recall by taking their harmonic mean in the F-measure:

F = 2PR / (P + R) (3)

As mentioned in the previous section, we use the collection of repositories collected at time t2 for the evaluation of our recommender system.

A. Evaluation of a top-N recommendation
As mentioned earlier, we used the precision and recall metrics to evaluate the performance of our approach. We evaluate top-1, top-3, top-5, top-10, top-15 and top-20 repository recommendations. To the best of our knowledge, at the time of writing there is no related work on recommender systems for GitHub repositories, so in this section we discuss only the evaluation of our approach using the well-known metrics described above. We first calculated the precision and recall values for each recommendation, using formulas (1) and (2).
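As a rough illustration of how the pieces described so far fit together, the following Python sketch implements similar(u, v), the COUNT OF FORKS procedure of Fig. 3, and formulas (1) to (3). The toy data, the min_common threshold that decides who counts as "highly similar", the ranking of candidates by fork count, and the helper names recommend and evaluate are assumptions made for this example; they are not taken from the authors' implementation.

```python
from typing import Dict, List, Set, Tuple

Repos = Dict[str, Set[str]]  # developer -> repositories held at time t1


def similar(repos: Repos, u: str, v: str) -> int:
    """Number of repositories that developers u and v have in common."""
    return len(repos[u] & repos[v])


def count_of_forks(r: str, users: List[str], repos: Repos) -> int:
    """Fig. 3: how many of the highly similar users hold repository r."""
    return sum(1 for user in users if r in repos[user])


def recommend(repos: Repos, target: str, n: int, min_common: int = 1) -> List[str]:
    """Top-N repositories for `target`, ranked by fork count among peers."""
    # Assumption: "highly similar" means at least `min_common` shared repositories.
    peers = [v for v in repos
             if v != target and similar(repos, target, v) >= min_common]
    candidates = set().union(*(repos[v] for v in peers)) - repos[target]
    ranked = sorted(candidates,
                    key=lambda r: (-count_of_forks(r, peers, repos), r))
    return ranked[:n]


def evaluate(selected: List[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Precision (1), recall (2) and F-measure (3) for one recommendation list."""
    n_rs = len(set(selected) & relevant)            # relevant and selected
    p = n_rs / len(selected) if selected else 0.0   # P = Nrs / Ns
    r = n_rs / len(relevant) if relevant else 0.0   # R = Nrs / Nr
    f = 2 * p * r / (p + r) if p + r else 0.0       # F = 2PR / (P + R)
    return p, r, f


# Hypothetical training data (t1) and test data (t2) for one developer.
TRAIN: Repos = {
    "alice": {"r1", "r2"},
    "bob": {"r1", "r2", "r3"},
    "carol": {"r2", "r3", "r4"},
    "dave": {"r9"},
}
RELEVANT_AT_T2 = {"r3", "r5"}  # repositories alice actually picked up later

top2 = recommend(TRAIN, "alice", n=2)
scores = evaluate(top2, RELEVANT_AT_T2)
print(top2, scores)
```

In this toy run only one of the two recommended repositories is relevant and only one of the two relevant repositories is recommended, so precision, recall and F-measure all equal 0.5.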
Fig. 5. Precision vs. Recall of repositories recommender.

Fig. 5 shows the overall performance of our approach. On average, the precision reaches its highest value of 0.78 for top-1 recommendation, and the recall rises to 0.803 for top-20 recommendation. In other words, in the top-20 recommendation about 80% of the relevant repositories appear in the recommendation list, and in the top-1 recommendation 78% of the recommended repositories are relevant.

Fig. 6 plots the F-Measure for each value of N (top-N recommendation). The graph shows that our system reaches good F-Measure values, which indicates that our technique can give good recommendations.

Fig. 6. F-Measure graph for each N.

VI. CONCLUSION AND FUTURE WORK
With its huge number of registered developers and repositories, GitHub is a good source of material and knowledge for any developer. To help developers find relevant content and projects on this website, we have proposed in this paper a new recommender system for repositories, based on the collaborative-filtering approach, to predict useful projects. The evaluation of our system on a real dataset shows good results: our system reaches a recall of about 80% and a precision of 78%. As future work we plan to explore other GitHub characteristics and machine learning methods to improve the performance of our system.

REFERENCES
[1] GitHub, https://github.com/
[2] M. Whitaker, “Former UC student establishes a celebrated website in GitHub that simplifies coding collaboration for millions of users,” University of Cincinnati, http://magazine.uc.edu/favorites/webonly/wanstrath.html, retrieved 2014-07-09.
[3] G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.
[4] A. Rajaraman and J. Ullman, “Recommendation systems,” in Mining of Massive Datasets, Cambridge University Press, 2012.
[5] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins, “Eigentaste: a constant time collaborative filtering algorithm,” Information Retrieval, vol. 4, no. 2, pp. 133–151, 2001.
[6] B. N. Miller, J. A. Konstan, and J. Riedl, “PocketLens: toward a personal recommender system,” ACM Transactions on Information Systems, vol. 22, no. 3, pp. 437–476, 2004.
[7] A. Ansari, S. Essegaier, and R. Kohli, “Internet recommendation systems,” Journal of Marketing Research, vol. 37, no. 3, pp. 363–375, 2000.
[8] K. Miyahara and M. J. Pazzani, “Collaborative filtering with the simple Bayesian classifier,” in Proceedings of the 6th Pacific Rim International Conference on Artificial Intelligence, pp. 679–689, 2000.
[9] X. Su and T. M. Khoshgoftaar, “Collaborative filtering for multi-class data using belief nets algorithms,” in Proceedings of the International Conference on Tools with Artificial Intelligence (ICTAI ’06), pp. 497–504, 2006.
[10] L. H. Ungar and D. P. Foster, “Clustering methods for collaborative filtering,” in Proceedings of the Workshop on Recommendation Systems, AAAI Press, 1998.
[11] S. H. S. Chee, J. Han, and K. Wang, “RecTree: an efficient collaborative filtering method,” in Proceedings of the 3rd International Conference on Data Warehousing and Knowledge Discovery, pp. 141–151, 2001.
[12] T. Hofmann, “Latent semantic models for collaborative filtering,” ACM Transactions on Information Systems, vol. 22, no. 1, pp. 89–115, 2004.
[13] P. Melville, R. J. Mooney, and R. Nagarajan, “Content-boosted collaborative filtering for improved recommendations,” in Proceedings of the 18th National Conference on Artificial Intelligence (AAAI ’02), pp. 187–192, Edmonton, Canada, 2002.
[14] D. Y. Pavlov and D. M. Pennock, “A maximum entropy approach to collaborative filtering in dynamic, sparse, high-dimensional domains,” in Advances in Neural Information Processing Systems, pp. 1441–1448, MIT Press, Cambridge, Mass., USA, 2002.
[15] GitHub API, https://developer.github.com/
[16] MongoDB, https://www.mongodb.org/
Hypergraph for System of Systems modeling
Hafid Haffaf
R.I.I.R Industrial Computing and Networking Laboratory, Computer Science Department, University of Oran 1, BP 1524 Oran, Algeria
[email protected]

Abstract: After being used to model the structural organization of Systems of Systems (SoS) at the macroscopic level, hypergraphs have recently been generalized as a representation for the different stages of complex system modelling. In this paper, we first describe different applications of hypergraph theory and then, step by step, introduce the multilevel modelling of SoS by integrating constraint programming (CSP) to deal with the reconfiguration strategy of engineering systems. As an application, we present an A.C.T terminal operated by a set of Intelligent Automated Vehicles (I.A.V).

Keywords: Hypergraph model, Structural analysis, Bipartite graph, Monitoring, System of systems, Reconfiguration analysis, Hypernetworks.

I. INTRODUCTION
Nowadays, the systemic approach is not sufficient to deal with complexity in different areas. The complexity of engineering systems is characterized by many optimization and control problems, which are expressed in automatic processes to ensure production, exploitation and security objectives. The study of the notion of “System of Systems” requires the consideration of new paradigms that deviate from traditional engineering paradigms, which are mostly based on the physical system. A System of Systems (SoS) consists of component systems that are spread across several hierarchical levels. In each level, component systems admit various structures [11]. A graphical model is essential to represent the component systems at each level and for each structure, as well as the relationships between these components [19]. The question is which kind of relation, i.e. which aspect of the system, we represent. The aspect could be topological, behavioral, functional, etc. Up to now, because of the complexity of the information, there is no unified model, and even graphical models are not able to express all these aspects in a single language.

In this paper, we shall see that the hypergraph is a good candidate for System of Systems modelling, and that bringing the two theories together can yield many interesting practical results. The goal is to benefit from the available graph optimization algorithms to solve complex problems arising in SoS applications. A new graphical approach was proposed in [1], using a hypergraph model of SoS applied to reconfiguration systems. Hypergraph theory, introduced by Berge [3], generalizes the concept of the graph by allowing more than two nodes (or component systems) to be connected by a single hyperedge.

On the other hand, optimization in hypergraph theory provides new methodologies for system modelling thanks to the evolution of operational research, especially the concept of constraint programming. Indeed, we can easily associate a hypergraph with a system of constraints and vice versa, which allows efforts to be concentrated on the modelling task while the resolution is left to well-known numerical solvers. Given that these notions appear frequently in the study of complex systems, we consider issues related to the classification of the different notions of complexity in graph theory. Well-known problems, such as the decomposition problem, are then viewed as particular cases of hypergraph optimization problems. In particular, we can state that if the hypergraph associated with the physical and behavioral constraints of the System of Systems satisfies some specific conditions, then the corresponding Constraint Satisfaction Problem (CSP) will give good numerical solutions.

Many formal methods using different graphical approaches have been developed for engineering systems. This work is an active field for researchers in both the automatic control and the computer science communities. It concerns monitoring system design, optimization and control, different forms of diagnosis, and the analysis of structural properties [19]. In the latter, independent-set matroid conditions have been established, highlighting that graphical methods are well suited for defining qualitative diagnosis methods.

The paper is organized as follows. Section 2 introduces the system of systems paradigm and its different application domains. In Section 3, the main problems in graph theory are recalled from the complexity point of view, and we then present the generalization of these problems to hypergraph theory. Some direct applications of this theory in different domains are first recalled. In Section 4, inspired by structural analysis methods, we model systems of systems by means of hypergraphs, which is the major contribution of this paper. Section 5 deals with a case study consisting of a seaport intelligent automated vehicle system, where we focus on how Fault Detection and Isolation (F.D.I) methods can help the CSP reconfiguration procedure. Finally, we conclude our work and discuss some perspectives.
II. SYSTEM OF SYSTEMS PARADIGM
System of systems is an evolution of complex systems, which were first defined essentially for organizational systems. The first works on general system theory were carried out by Bertalanffy [25]. A complex system is not reducible, i.e. the decomposition procedure cannot be applied. The notion of
causality itself must be reconsidered. Because of their evolutionary nature and emergence property, Multi-Agent Systems (M.A.S) were the first approach under which SoS have been modelled [27]. In this methodology, local behavioral rules give rise to a global behavior as the system evolves over time. Definition: a SoS is defined as a set of interdependent systems that work together to achieve a common goal or a global mission. “The loss of any part of the system will degrade the performance or capabilities of the whole" [8]. A SoS is also defined by its properties, as stated by Jamshidi [2]: the topology or geographic localization and dispersion, the operational independence, the managerial independence of the individual systems, and the temporal evolution, which implies emergent behavioral properties. The best-known System of Systems is obviously the Internet (see Fig 1). It involves at least three levels: the communication system level, the channel-flow control level, and the terminal level.
Fig 1: Internet, an example of SoS

A defense SoS example is the FCM (Future Combat Mission), which implies cooperation between troops in the air, on water and on the ground. The multimodal and inter-modal communication between different kinds of transport is considered a natural SoS. The Global Earth Observation System of Systems (GEOSS) project, started in 2009, entails space-based, air-based and ocean-based monitoring systems to supervise the environment through a data management system able to handle information from health, disaster management (earthquakes, etc.), agriculture and ecology.

Here are some practical projects dealing with the system of systems paradigm. As they involve many interdependent systems, many military applications and transport systems require a SoS approach. Among these applications, we can cite: the Exploration Systems Architecture Study, for which NASA established the Exploration Systems Mission Directorate (ESMD) organization to lead the development of a new exploration “system-of-systems”; T-AREA-SoS (Trans-Atlantic Research and Education Agenda on Systems of Systems); COMPASS (Comprehensive Modelling for Advanced Systems of Systems); DANSE (Designing for Adaptability and evolutioN in System of systems Engineering), which aims to develop "a new methodology to support evolving, adaptive and iterative System of Systems life-cycle models based on a formal semantics for SoS inter-operations and supported by novel tools for analysis, simulation, and optimization"; ROAD2SOS (Roadmaps for System-of-System Engineering), aiming to develop "strategic research and engineering roadmaps in Systems of Systems Engineering and related case studies"; DYMASOS (DYnamic MAnagement of physically-coupled Systems Of Systems), aiming to develop theoretical approaches and engineering tools for the dynamic management of SoS based on industrial use cases; the Boeing 787 system data network; and SCCOA (aerospace defense system) [W2].

Structural analysis is a powerful tool. It makes it possible to analyze structural properties of engineering systems, such as controllability and observability [8], only by checking the existence of relationships between variables, independently of their numerical values. This analysis thus allows the detectability and localizability of failures to be determined, by generating Analytical Redundancy Relations (ARRs) from the model. ARRs are satisfied only when the system operates normally, so they are a good indicator of the existence of a failure. Computer science is another application domain where SoS can be defined. UML, a unified modelling approach that requires nine diagrams, has inspired SysML, another modelling language devoted to complex systems [W1].

III. HYPERGRAPH MODELLING
Networks are the first domain that highlighted the difficulty of representing relationships between nodes by graph theory alone. Indeed, neither the size nor the relations are static, so graph theory has proposed random graphs, dynamical graphs [24] and Voronoi graphs for network representation. For instance, to represent a social network, we can give a multi-agent system (M.A.S) model as follows: Gt = (Vt, Et) is the graph at time t, where agent i is denoted by ai. The fact that at+1 is in discussion with ai is subject to a probability P({at+1, ai} ∈ Et+1). Another graph model for complex systems deals with the representation of information flow in networks. Based on percolation theory, the propagation of a rumor, for instance, is a good example of the probabilistic existence of nodes and edges in graphs [26].

In the last decades, modelling with hypergraphs has received wide attention from researchers in many areas of science. Hypergraphs have been used to represent parallel data structures [18], by modelling data elements as vertices and a template, i.e. a group of data elements to be processed in parallel, as a hyperedge. Hypergraphs are also used in image processing, where work focuses on determining the properties that result from hypergraph theory and on analyzing their adequacy to image problems, particularly edge and noise detection. In addition to simple hypergraphs, directed hypergraphs have been used in other domains such as chemical reaction mechanisms and bioinformatics [13], by describing a reaction as a directed hyperedge and chemicals as nodes. Using classes of acyclic hypergraphs, relational database schemes are also modelled through directed hypergraphs where the vertices correspond to the attributes and the directed
hyperedges to the relations [14]. The NoSQL language is a generalization of classical relational database semantics taking into account functional dependencies. In production and manufacturing systems [15], as well as in scheduling, a hyperedge corresponds to a production activity linking the inputs (consumed goods) to the outputs (produced goods), which generalizes flow optimization problems to hyperflow maximization. In mathematical logic [17], propositional symbols are nodes and Horn clauses correspond to hyperedges, such that the left side of the clause is the tail of the hyperedge and the right side constitutes the head. Other domains covered by hypergraph investigations are: context-free analysis, semantic inconsistency in information systems [15], clustering, max spanning, hypertree decomposition, hypernetworks and social networks, network control systems, and data mining, where the hyperedges represent the frequent item sets of queries [16]. Dedicated computer programs that perform hypergraph partitioning to solve combinatorial problems such as protein sequencing include hMETIS, Paraboli and Clip-Pro [W3].

Definitions: A hypergraph is formally defined as a couple H = (V, E), where V = {v1, v2, . . . , vn} is a finite set of vertices of H and E = {e1, e2, . . . , em} is the set of hyperedges, each of which is a nonempty subset of V, such that e1 ∪ e2 ∪ . . . ∪ em = V.

Simple hypergraph: a simple hypergraph on V is a hypergraph H = (E1, …, Em) in which no edge contains another: Ei ⊆ Ej ⇒ i = j.

An example is shown in Fig 2, where V = {S1, S2, S3, S4, S5} and E = {A1, A2, A3}, with A1 = {S3, S4}, A2 = {S1, S2, S3}, and A3 = {A1, A2, S5}.

Two vertices u and v are adjacent in H = (V, E) if there is an edge e ∈ E such that u, v ∈ e. The set of all vertices u that are adjacent to v in H is denoted by NH(v). The set NH[v] = NH(v) ∪ {v} is called the (closed) neighborhood of v. If any two distinct vertices u, v ∈ V can be distinguished by their neighborhoods, that is, NH[u] ≠ NH[v], then the hypergraph H = (V, E) is called thin. A vertex v and an edge e of H are incident if v ∈ e. The degree deg(v) of a vertex v ∈ V is the number of edges incident to v. The maximum degree max v∈V deg(v) is denoted by ΔH or just by Δ.
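The definitions above translate almost directly into code. The following Python sketch is only an illustration: the class name Hypergraph and its method names are chosen for this example, and the data uses the flat part of the Fig 2 example (A1 and A2 over S1 to S5, assuming S4 belongs to V as A1 implies). The higher-level edge A3, which groups A1, A2 and S5, is left out of this flat sketch.

```python
from typing import Dict, Iterable, Set


class Hypergraph:
    """Minimal undirected hypergraph H = (V, E) with named hyperedges."""

    def __init__(self, vertices: Iterable[str], edges: Dict[str, Set[str]]):
        self.vertices = set(vertices)
        self.edges = {name: set(members) for name, members in edges.items()}

    def neighbors(self, v: str) -> Set[str]:
        """N_H(v): vertices sharing at least one hyperedge with v."""
        return {u for e in self.edges.values() if v in e for u in e} - {v}

    def closed_neighborhood(self, v: str) -> Set[str]:
        """N_H[v] = N_H(v) ∪ {v}."""
        return self.neighbors(v) | {v}

    def degree(self, v: str) -> int:
        """Number of hyperedges incident to v."""
        return sum(1 for e in self.edges.values() if v in e)

    def max_degree(self) -> int:
        return max((self.degree(v) for v in self.vertices), default=0)

    def is_simple(self) -> bool:
        """True if no hyperedge is contained in a different hyperedge."""
        es = list(self.edges.values())
        return not any(i != j and a <= b
                       for i, a in enumerate(es) for j, b in enumerate(es))

    def is_thin(self) -> bool:
        """True if all closed neighborhoods are pairwise different."""
        hoods = [frozenset(self.closed_neighborhood(v)) for v in self.vertices]
        return len(set(hoods)) == len(hoods)


H = Hypergraph(
    vertices={"S1", "S2", "S3", "S4", "S5"},
    edges={"A1": {"S3", "S4"}, "A2": {"S1", "S2", "S3"}},
)
print(sorted(H.neighbors("S3")), H.degree("S3"), H.max_degree())
print(H.is_simple(), H.is_thin())
```

With this data S3 is adjacent to S1, S2 and S4 and has the maximum degree 2. The hypergraph is simple, but it is not thin, because S1 and S2 have the same closed neighborhood {S1, S2, S3}.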
Fig 2: Hypergraph example

A sequence Pv0,vk = (v0, e1, v1, e2, . . . , ek, vk) in a hypergraph H = (V, E), where e1, . . . , ek ∈ E and v0, . . . , vk ∈ V, such that vi−1, vi ∈ ei for all i = 1, . . . , k and vi ≠ vj, ei ≠ ej for all i ≠ j with i, j ∈ {1, . . . , k}, is called a path of length k (joining v0 and vk). The distance dH(v, v′) between two vertices v, v′ of H is the length of a shortest path joining them. A hypergraph H = (V, E) is called connected if any two distinct vertices are joined by a path. A partial hypergraph H′ = (V′, E′) of a hypergraph H = (V, E), denoted by H′ ⊆ H, is a hypergraph such that V′ ⊆ V and E′ ⊆ E. In the class of graphs, partial hypergraphs are called subgraphs. A partial hypergraph H′ ⊆ H is a spanning hypergraph of H if V(H′) = V(H). H′ ⊆ H is induced if E′ = {e ∈ E | e ⊆ V′}.

Another interesting application concerns wireless networks: a hypergraph model has been proposed for cellular networks [12]. Indeed, user nodes communicate with each other using a shared wireless spectrum. This resource sharing introduces co-channel interference, which may severely degrade the quality of a communication link. Using graph theory, two links i and j are allowed to share the same resource if and only if (i, j) does not belong to the set of edges E. A forbidden set is a group of cells that cannot all use a channel simultaneously; this notion is close to that of an independent set in matroid theory [8], which is a generalization of hypergraphs. In this theory, searching for a maximal independent set is a polynomial problem. Thus, the hypergraph model enhances TDMA scheduling in cellular networks [11], especially the capacity in erlangs per channel. The hypergraph is also a model for virtual networks. “Maya” is a cell biology simulator which can represent polygon faces and multiple 3D schemas leading to 3D animations [W4]. The kinematics of multibody systems has also been treated with hypergraphs, in the Audi R8 project in industrial vehicle design.

IV. SYSTEM OF SYSTEMS HYPERGRAPH MODELLING
The constraint representation allows any kind of relationship between variables or between semantic concepts to be modelled. These relations can be linear or not, discrete or continuous, etc. All that we have to represent is the existence of these relations, independently of the manner in which the concepts are related. For instance, we can simply write the mathematical equation Y = f(x1, x2, g(x3, x4)) for the hypergraph shown in Fig 3. This idea is well known in graphical diagnosis methodologies as structural analysis [20]. Indeed, graph structures are independent of the numerical values of the system parameters, and a structural graph represents only the existence of a variable in a relation (or table, etc.) with a non-vanishing value. The hypergraph can also represent a rule in a deduction system (an expert system, for example): x1, x2, x3, x4 → y. Y could be the cluster head of nodes x1, x2, x3, x4 in Wireless Sensor Network (WSN) technology. In database systems, {Y, x1, x2, x3, x4} could be either columns of a table between which functional dependencies exist, in relational databases, or a semantic relation in data mining terminology.
Fig 3: Hypergraph Model
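The structural reading of Y = f(x1, x2, g(x3, x4)) and of the rule x1, x2, x3, x4 → y can be illustrated with directed hyperedges, where the tail holds the premises and the head the produced node. The Python sketch below is a hedged illustration only: the function name forward_closure, the intermediate node "g" and the (tail, head) encoding are assumptions made for this example. It propagates availability through the hyperedges, ignoring numerical values, as structural analysis does.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A directed hyperedge: (tail, head). The tail is the set of premises,
# the head is the single node that the hyperedge produces.
Hyperedge = Tuple[FrozenSet[str], str]


def forward_closure(known: Iterable[str], hyperedges: List[Hyperedge]) -> Set[str]:
    """Return every node reachable from `known` by firing hyperedges whose
    whole tail is already available (forward chaining)."""
    reached = set(known)
    changed = True
    while changed:
        changed = False
        for tail, head in hyperedges:
            if head not in reached and tail <= reached:
                reached.add(head)
                changed = True
    return reached


# Structural model of Y = f(x1, x2, g(x3, x4)); the node name "g" is a
# hypothetical label for the inner relation.
MODEL: List[Hyperedge] = [
    (frozenset({"x3", "x4"}), "g"),
    (frozenset({"x1", "x2", "g"}), "Y"),
]

all_inputs = forward_closure({"x1", "x2", "x3", "x4"}, MODEL)
missing_x4 = forward_closure({"x1", "x2", "x3"}, MODEL)
print("Y" in all_inputs)   # True: every premise is available
print("Y" in missing_x4)   # False: g cannot fire without x4
```

Only the existence of the relations matters here: the functions f and g never need to be evaluated to decide whether Y is structurally determined by the available variables.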
All these situations can be represented by one and the same hypergraph model, covering the structural, functional or behavioral aspects of the system. Up to now, we have been obliged to use different kinds of models, each corresponding to a given level: the physical level was modelled by bond graphs, for instance, and the macroscopic level by graphs, Petri nets or finite state automata. Having to use different kinds of models or graphs is a drawback when we have to move between abstraction levels. Multi-level representation is the strong point of SoS hypergraph modelling, as it allows low-level concepts and/or components to be gathered through an inverse zoom process. Regrouping variables into a single hyper-node is a natural bottom-up abstraction. If we take the SoS properties, except for the evolutionary property, which requires dynamical graphs to take into account the temporal aspect of the SoS, we can easily check each of them in the hypergraph model. Managerial independence states that we can consider each system separately under a systemic approach.

Graphical approaches were proposed early on in SoS modelling. Unfortunately, due to multiple constraints and the complexity of the underlying combinatorial problems, no general method, and especially no unified approach, for solving this problem has yet been found. Moreover, the interest of this approach is to characterize the structure of the hypergraphs on which optimization algorithms provide good solutions, as stated before. The major interest is that any CSP instance with an acyclic hypergraph structure is solvable in polynomial time. Maximization of Horn SAT corresponds to a minimum cut in a directed hypergraph [17]. By performing Graham's algorithm, it is possible to obtain a join tree decomposition.

The innovative interest of the approach is the possibility of associating with the hypergraph a set of constraints, where each hyperedge corresponds to a constraint. If the nodes are variables, then the resulting relationships constitute a Constraint Satisfaction Problem (C.S.P), and the whole process boils down to invoking a numerical solver. For example, let P = (X, D, C), where X = {X1, X2, X3, X4} is the set of variables, C = {C1, C2, C3} the set of constraints and D the domain of the variables; the corresponding hypergraph is shown in Fig 4.

Fig 4: Constraint Hypergraph Model
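To make the link between constraint hypergraphs and acyclicity concrete, the following Python sketch applies the Graham (GYO) reduction to the scopes of a CSP. The scopes of C1, C2 and C3 are not given in the text, so the ones below are hypothetical, as is the function name is_acyclic. The reduction repeatedly deletes vertices that occur in a single hyperedge and hyperedges covered by another one; the hypergraph is (alpha-)acyclic exactly when nothing is left.

```python
from typing import Dict, Iterable, List, Set


def is_acyclic(hyperedges: Iterable[Iterable[str]]) -> bool:
    """GYO (Graham) reduction: True if the hypergraph is alpha-acyclic."""
    edges: List[Set[str]] = [set(e) for e in hyperedges]
    changed = True
    while changed:
        changed = False
        # Rule 1: delete vertices that occur in exactly one hyperedge.
        for e in edges:
            lonely = {v for v in e if sum(v in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule 2: delete hyperedges that are empty or covered by another one
        # (for identical hyperedges, only the first copy is kept).
        kept: List[Set[str]] = []
        for i, e in enumerate(edges):
            covered = not e or any(
                e < f or (e == f and j < i)
                for j, f in enumerate(edges) if j != i
            )
            if covered:
                changed = True
            else:
                kept.append(e)
        edges = kept
    return not edges


# Hypothetical scopes for P = (X, D, C) with X = {X1..X4}, C = {C1, C2, C3}.
CHAIN: Dict[str, Set[str]] = {
    "C1": {"X1", "X2"},
    "C2": {"X2", "X3"},
    "C3": {"X3", "X4"},
}
# A cyclic variant: the three scopes form a triangle.
TRIANGLE: Dict[str, Set[str]] = {
    "C1": {"X1", "X2"},
    "C2": {"X2", "X3"},
    "C3": {"X1", "X3"},
}

print(is_acyclic(CHAIN.values()))     # True
print(is_acyclic(TRIANGLE.values()))  # False
```

An acyclic result such as the chain above is the favourable case mentioned in the text; for a cyclic structure, a decomposition step would be needed before the polynomial-time guarantee applies.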
V. APPLICATION TO AUTOMATED CONTAINER TERMINAL
An efficient transport system can be realized only if robust and integrated Information and Communication Technology (I.C.T) systems are implemented in the various transport components. Much research has been devoted to the application of ICT in transport, resulting in Intelligent Transport Systems (I.T.S). Many European projects aim at assessing ICT solutions in the ITS domain: multimode traceability of goods, safety in the supply chain, exchange of information between the actors of the supply chain, decision support, and infrastructure planning. One of the major challenges facing transportation engineers and planners in ITS is the accuracy of the models they use for planning and operational analysis of traffic. Intelligent Automated Vehicles (IAVs) have emerged from Automated Guided Vehicles (AGVs) in the last decade [21] to integrate intelligence into the vehicle. This intelligence is often associated with the ability of vehicles to communicate with each other. Terminal management and control is a good example of future system integration, where different technologies are combined for automatic vehicle guidance, inventory tasks, bar code identification, and loading/unloading operations [23]. I.A.V coordination is centralized: a single decision maker is responsible for solving the task allocation, motion planning and coordination problems. This decision maker becomes a performance bottleneck, with severe limitations in terms of scalability. In the seaport application described below, the microscopic model describes both the space-time behavior of the system’s entities (i.e. vehicles and drivers) and their interactions at a high level of detail (individually). The macroscopic model deals with the more abstract information and decision system. The hypergraph model is a generic approach to traffic flow modelling that combines the micro and macro levels of detail. Here we briefly describe the three systems in this SoS.

A) Communication system
In the communication system, many protocols are involved in three kinds of communication: IAV2IAV, IAV2infrastructure and IAV2RFID container communication. The typical layout of the automated port container terminal is shown in Fig 5. The RF (Radio Frequency) radio module allows the
IAV node to communicate wirelessly with another IAV node or with a confined base station, generally represented by the infrastructure, in order to achieve various data exchanges in the ITS. Communication between each IAV and the corresponding container in loading/discharging operations uses a Radio Frequency IDentification (RFID) system, in which a reader is typically placed on the IAV and a tag is attached to the container [28]. The SoS allows a better diagnosis by the data manager in the case of a communication failure, by taking into account information from the other systems.
Fig 5: Communication Model in Automated Control Terminal (figure labels: sea containership, quay crane, buffer area, loading/unloading bay, loaded IAV, empty IAV, reserved IAV, marshalling yard block, yard area, automated yard crane, CBS, freight container, transfer point, wireless communication link, human interface, IAV mobility trajectory)

B) Information system
In general, decision support systems are systems and/or human operators that make decisions based on data culled from a wide range of information sources. In this case, the system acts as an automaton reacting to the failure detection and isolation procedure. The information concerns the state of the vehicles, the state of the traffic, and the communications between all the components and operators exchanging data when each operation (loading, unloading, transport, etc.) in the A.C.T is carried out. In the case of a breakdown of an IAV, F.D.I methods based on residual generation are used to identify the origin of the event. Reconfiguration messages are transmitted from the container terminal administration to remotely activate a reserved IAV parked in the ACT, which replaces the failed IAV. This type of communication has to satisfy temporal constraints and also requires reliable transmission. Classically, this kind of system consists of the model (automaton, etc.) and different user interfaces.

C) Transport system
The transport system is the domain where many optimization problems arise. A transit system can be modelled as a special network with two sets of nodes and sets of arcs. Nodes represent either the centroids of the zones into which the area is partitioned or the line stops. The arcs connect centroids and nodes. The hypergraph allows the traffic assignment problem to be formulated as a hyperflow program. In transport systems, more improvements are required to support green alternatives, intermodality and efficient cooperation among the stakeholders. Some works in hypergraph theory show that the dual management algorithm allows a network optimization problem to be solved when the initial information is dynamic.

D) The whole System of Systems
Summarizing the whole SoS, as shown in Fig 6, many systems are involved at different levels [22]. The physical system at the lowest level is modelled by bond graphs [20]. At the mesoscopic level, we find the three systems described above, whose different aspects can be represented by hypergraphs. The reconfiguration analysis is performed from a top-down graph. We suppose that the considered scenario for the reconfiguration strategy consists of removing the faulty IAV (in this case a container) after an FDI-based diagnosis phase has located the source of the failure by means of a constraint satisfaction problem formulation. When a subset of constraints is not satisfied, some faulty state variables are providing unexpected values, and a MAX-SAT program then takes over in a compensation process. We see that components and systems cooperate to accomplish a mission or a goal. This functional principle is depicted by gathering these components into hyperedges, which corresponds exactly to the above definition of SoS. Finally, further investigations will deal with applying hypergraph multi-path analysis to decrease the complexity of structural analysis algorithms for properties such as controllability, observability and diagnosability, which could then be generalized to Systems of Systems.

Fig 6: Intelligent Transport System Application of SoS
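The reconfiguration scenario of subsection D (detect unsatisfied constraints, locate the faulty IAV, activate a reserved IAV) can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the authors' FDI or MAX-SAT procedure: the constraint names, the tolerance TOL, the speed readings and the three helper functions are all hypothetical. Isolation is done structurally, by keeping the variables that appear in every violated constraint and in no satisfied one.

```python
from typing import Callable, Dict, List, Set, Tuple

Values = Dict[str, float]
# A constraint (hyperedge): the variables it links and a predicate on their values.
Constraint = Tuple[Set[str], Callable[[Values], bool]]


def violated(constraints: Dict[str, Constraint], values: Values) -> List[str]:
    """Names of the constraints that the current readings do not satisfy."""
    return sorted(name for name, (_, holds) in constraints.items()
                  if not holds(values))


def suspects(constraints: Dict[str, Constraint], bad: List[str]) -> Set[str]:
    """Variables present in every violated constraint and in no satisfied one."""
    if not bad:
        return set()
    in_all_bad = set.intersection(*(set(constraints[n][0]) for n in bad))
    in_some_ok = set().union(
        *(constraints[n][0] for n in constraints if n not in bad))
    return in_all_bad - in_some_ok


def reconfigure(fleet: List[str], faulty: Set[str], reserve: List[str]) -> List[str]:
    """Replace each faulty IAV by a reserved one (assumes enough spares)."""
    spares = iter(reserve)
    return [next(spares) if iav in faulty else iav for iav in fleet]


TOL = 0.5  # hypothetical tolerance on the speed difference between two IAVs


def agree(a: str, b: str) -> Callable[[Values], bool]:
    return lambda v: abs(v[a] - v[b]) <= TOL


# Hypothetical platoon constraints: neighbouring IAVs report similar speeds.
CONSTRAINTS: Dict[str, Constraint] = {
    "C12": ({"iav1", "iav2"}, agree("iav1", "iav2")),
    "C23": ({"iav2", "iav3"}, agree("iav2", "iav3")),
    "C13": ({"iav1", "iav3"}, agree("iav1", "iav3")),
}
readings: Values = {"iav1": 2.0, "iav2": 5.0, "iav3": 2.1}

bad = violated(CONSTRAINTS, readings)
faulty = suspects(CONSTRAINTS, bad)
fleet = reconfigure(["iav1", "iav2", "iav3"], faulty, ["iav_reserve"])
print(bad, faulty, fleet)
```

Here the two constraints involving iav2 are violated while the one linking iav1 and iav3 holds, so iav2 is isolated and replaced by the reserved vehicle.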
VI. CONCLUSION
Due to the increasing complexity of large-scale systems, i.e. the topology, the combinatorial structures and the multidimensional, multi-level relationships between components and subsystems, the hypergraph is considered the most adequate tool for modelling purposes. The multi or hyper adjectives are used in order to merge in the same model
Systems, 33(1):1–17, 2008. [15] Carvalho.A Cristo.M-Ziviani.N Couto.T Berlt.K, Moura.E.S. Modeling the web as a hypergraph to compute page reputation. Information Systems, 35(5):530–543, 2010 [16] Dan A.Simonvick & Chabane Djeraba “Mathematical tools for Data Mining” Set Theory, Partial Order, combinatorics; Springer-Verlag Edition 2014 [17] Stephen Muggleton, Ramon Otero, Alizera Tamadonni “Inductive Logic Programming” Berlin, New York Springer- 2007 [18] Kolda.T.G Hendrickson.B. Graph partitioning models for parallel computing. Parallel Computing, 26:1519–1545, 2000. [19] Kazuo Murota “Systems Analysis by Graphs and Matroids” Algorithms and Combinatorics, Springer-Verlag Berlin Heidelberg, 1987 [20] B. Ould Bouamama, G. Biswas, R. Loureiro, & R. Merzouki “Graphical methods for diagnosis of dynamic systems: Review” Annual Reviews in Control 38 (2014) 199–219; Elsevier 2014 [21] Jerome Härri, Fethi Filali, and Christian Bonnet “Mobility Models for Vehicular Ad Hoc Networks: A Survey and Taxonomy” IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 11, NO. 4, pp 19-41 FOURTH QUARTER 2009 [22] K.Wissam, R. Merzougui & H.Haffaf “Model-Based Supervision of a Platoon of Intelligent Autonomous Vehicles” IEEE SSRR 2011 (KYOTO) Japan [23] W. Yan, Y. Huang, D. Chang, J. He “An investigation into knowledgebased yard crane scheduling for container terminals” Advanced Engineering informatics (Elsevier) 0.1016/j.aei.2011.03.001, 2011. [24] Anderson Grant “Applications of Graph Theory to the Analysis of Chaotic Dynamical Systems and Complex Networks” A Senior Project submitted to The Division of Science, Mathematics, and Computing of Bard College; Annandale-on-Hudson, New York December, 2012 [25] Ludwig Von Bertalanffy “Théorie générale des systèmes” ; Dunod, Paris, 2012 [26] Gaogao Dong, Lixin Tiana , Ruijin Dua, Min Fua ,& H. 
Eugene Stanley “Analysis of percolation behaviors of clustered networks with partial support–dependence relations” Physica A 394 (2014) 370–378; Elsevier 2013 [27] Ling Yuan, Ping Fan Formal Modeling and Verification of Multi-agent System Architecture; 2013 AASRI Conference on Parallel and Distributed Computing and Systems; Volume 5, 2013, Pages 126–132 [28] S. J. B;Torres, T. M. Fernandez-Carames, M. G;-Lopez, C. J. EscuderoCascon, “Maritime freight container management system using RFID”, The Third International EURASIP on RFID Technology, pp. 93-96, 2010.
different kinds of knowledge, links, and rules that convey the system behaviour. In this paper, we have seen how to build the hypergraph from the constraints that aggregate the variables and/or components. We can then associate with these constraints a CSP that checks the satisfiability property. Finally, as an application, we have presented the A.C.T in an automated seaport, where different systems are involved, and we have pointed out the three major systems in the SoS. Future work will investigate how the SoS hypergraph methodologies can be generalized to kinds of SoS other than those stemming from engineering systems.
REFERENCES [1] K.Wissam, H.Haffaf R. Merzougui , & Ould Bouamama “Hypergraph Models for System of Systems Supervision Design” July 2012 ISSN 1083-4427 IEEE Systems, Man, and Cybernetics , Vol 42 Number 4, pp 1005-1012 [2] M. Jamshidi, System of Systems Engineering: Principles and Applications, CRC Press, Boca Raton, 2009 [3] C. Berge. GRAPHS AND HYPERGRAPHS. North-Holland, Amsterdam, 1973. [4] Berge: Hypergraphs: combinatorics of finite sets; North Holland; mathematical library 1989 [5] D. Luzeaux, JR Ruault & JL Wippler, "Complex System and Systems of Systems Engineering", ISTE Ltd and John Wiley & Sons Inc, 2011 [6] Graph connections, edited by Lowell W.B and Robin J.Wilson, Oxford Science publications 1997 [7] K.Pushpandra, H. Haffaf, R.Merzouki “Microscopic Traffic Dynamic and Platoon Control Based on Bond graph Modelling” IEEE ITSC, 6-9 October 2013, The Hague, the Netherlands [8] Hafid Haffaf, & Belkacem ould Bouamama “Matroid Algorithm for monitorability Analysis of bond graphs” Vol 343 (Issue 1) Pages: 111-123, Journal of the Franklin Institute, Feb 2006, Elsevier Sciences Inc. [9] A.Ibtissam & H.Haffaf “Hypergraph Reconfigurability Analysis” pp 22-32 Elsevier; July 2013 [10] Wissam Khalil « Contribution à la modélisation graphique de système de systèmes », Thèse de Doctorat, Université Lille1, 2012. 
[11] Saswati Sarkar and Kumar N. Sivarajan Hypergraph Models for Cellular Mobile Communication Systems” IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 47, NO. 2, MAY 1998 [12] Qiao Li, and Rohit Negi “Maximal Scheduling in Wireless Ad Hoc Networks With Hypergraph Interference Models” IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL.61, NO. 1, Jan 2012 [13] Ze Tian†, TaeHyun Hwang† and Rui Kuang “A hypergraph-based learning algorithm for classifying gene expression and array CGH data with prior knowledge” Vol. 25 no. 21 2009, pages 2831–2838 doi:10.1093/bioinformatics/btp467 [14] Cambazoglu.B.B Demir.E, Aykanat.C. Clustering spatial networks for aggregate query processing: A hypergraph approach. Information
Webography (last accessed January 26, 2015)
[W1] http://www.sparxsystems.com.au/downloads/ebooks/Embedded_Systems_Development_using_SysML.pdf
[W2] http://rpdefense.over-blog.com/tag/sccoa/
[W3] http://glaros.dtc.umn.edu/gkhome/metis/hmetis/overview
[W4] http://www.infiniteskills.com/training/learning-autodesk-maya2015/hypergraph.html
Application layer versus IP Multicast in MANET
Mounia DOUDOU, Lamia MAHNANE, Mehdi NAFAA
LRS Laboratory, University of Badji Mokhtar
P.O. Box 12, 23000, Annaba, Algeria
[email protected] [email protected] [email protected]
multicasting in an ad hoc setting. To this end, we present extensive simulations of both the ODMRP network layer multicast protocol and the ALMA application layer multicast protocol.
Abstract— Multicast routing in mobile ad hoc networks (MANETs) poses several challenges due to inherent characteristics of the network such as node mobility, reliability, and scarce resources. In this paper, we present experimentation and simulation results that compare the performance of an IP multicast protocol (ODMRP) with that of an application layer multicast protocol (ALMA), in order to examine the benefits of application layer multicast relative to network layer multicast. Based on the statistical results, we conclude that application layer multicast is a viable option for multicasting in ad hoc networks if the application needs reliability or has other special requirements. It even performs very well unless the group membership becomes extremely large. In such cases, a network layer protocol (in particular ODMRP) seems to be the preferable choice if performance in terms of goodput, packet delivery ratio, or energy consumption is the only metric of interest.
We study the factors that affect the performance of ALMA and the sensitivity of its performance to various system parameters. Our work can be summarized in the following points:
• ALMA outperforms the previous best-performing application layer multicast protocol in terms of goodput and reliability.
• ALMA performs favorably even when compared with network layer multicast protocols, more specifically ODMRP. ALMA exhibits better goodput than ODMRP for moderately sized to reasonably large groups, where 20% to 40% of the nodes are part of the group.
Keywords— Multicast; Ad hoc; routing in Manet; multicast streaming; multicasting; ad hoc wireless networks.
• We examine the sensitivity of our protocol to different scenarios and we show how we can fine-tune its performance by choosing appropriate values for various system parameters.
I. INTRODUCTION
The high-level goal of this work is to examine whether the advantages of application layer multicasting carry over to mobile ad hoc networks. To do this, we compare the performance of IP multicast and application layer multicast with respect to a variety of metrics through an extensive simulation study.
The rest of the paper is organized as follows. Section II discusses related work. Section III describes IP multicast and the ODMRP protocol, and Section IV describes application layer multicast and the ALMA protocol. Section V presents the performance evaluation, the simulation results, and our observations on the comparison between ODMRP and ALMA. Section VI presents our conclusions and possible future work.
Application layer multicasting has received a lot of interest; however, many new challenges arise when it is used in ad hoc networks.
First, the use of application layer multicast can result in the transmission of multiple copies of multicast data packets over each physical link. This is because nodes that are not multicast group members cannot make copies of multicast packets. This effect is especially visible when there is a large number of multicast group members and/or the network load is high. Second, with mobility, using logical links may lead to sub-optimal paths, since the communicating member nodes are not aware of increases in their possibly small initial physical hop count distances from the source. Reconfiguring the logical connections is possible, but it introduces overhead. Furthermore, the frequency of these reconfigurations has not been examined yet. In this paper we have focused our attention on the evaluation of the efficiency of the application layer
II. RELATED WORK
As of October 1999, only the CAMP and ODMRP designers had performed simulation studies of their protocols; performance evaluations of AMRoute and AMRIS had not been published. The simulation results reported in [8], [9], [10] are quite different from the results we have obtained in our experiments. In [8], [9], [10], a simplified simulator was used: a perfect channel was assumed and radio propagation was not considered. FAMA [11] was used as the medium access control protocol, which is different from IEEE 802.11 [12], the emerging standard MAC protocol for wireless LANs, which we used in our simulation. Only a small portion of the network hosts were mobile (5 out of 30 or 15 out of 30) in their study. The
by a group of nodes known as forwarding nodes. These nodes forward the data packets between the source and destinations, and keep a message cache which helps in the detection of duplicate data and control packets.
critical nodes for CAMP performance (e.g., core, senders), however, remained stationary. All the nodes in [8], [9], [10] were multicast session members, which is not realistic in typical multicast applications. The network traffic load was extremely light (4 packets/sec). Information on data size, radio propagation range, or simulation terrain range was not given. Thus, the results in [8], [9], [10] are somewhat limited and, in any case, cannot be directly compared to the results of this paper.
In the mesh establishment phase between the source and the receivers, the sender periodically floods a JoinReq control packet to create the mesh. The receivers respond to the request by sending a JoinReply along the shortest reverse path. Each intermediate node that receives the JoinReq packet stores the identity of the upstream node before rebroadcasting the packet. The JoinReply packet contains the source ID and the next node ID. On receipt of a JoinReply packet, an intermediate node sets a forwarding flag and thus becomes a member of the forwarding group of that multicast group.
III. IP MULTICAST
A. Definition
As shown in Fig. 1, IP multicast is a bandwidth-conserving technology that reduces traffic by simultaneously delivering a single stream of information to potentially thousands of corporate recipients and homes. IP multicast delivers application source traffic to multiple receivers without burdening the source or the receivers, while using a minimum of network bandwidth.
Mesh maintenance follows a soft-state approach: routes between the source and the destinations are re-established by the JoinReq packets that the source sends periodically. The protocol is resilient to link and node failures because it relies on a forwarding group, which is a merit of mesh-based protocols. The drawback is a higher control overhead, and multiple transmissions of the same data packet through the network decrease the efficiency of the multicast group.
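The mesh set-up described above can be illustrated with a short sketch. It is not the ODMRP implementation: the topology, the node names, and the function `build_forwarding_group` are hypothetical, the JoinReq flood is modelled as a breadth-first search, and timers, message caches, and the periodic refresh are left out.

```python
from collections import deque

def build_forwarding_group(adjacency, source, receivers):
    """Return the set of forwarding nodes for one multicast source.

    adjacency: dict mapping node -> list of neighbours (hypothetical topology).
    The JoinReq flood is modelled as a BFS in which every node records the
    upstream node from which it first heard the JoinReq.
    """
    upstream = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in upstream:      # a duplicate JoinReq is dropped
                upstream[neighbour] = node     # remember the upstream node
                queue.append(neighbour)

    forwarding_group = set()
    for receiver in receivers:
        if receiver not in upstream:
            continue                           # receiver was not reached
        # The JoinReply travels hop by hop along the recorded reverse path.
        node = upstream[receiver]
        while node is not None and node != source:
            forwarding_group.add(node)         # node sets its forwarding flag
            node = upstream[node]
    return forwarding_group

# Hypothetical topology: S is the source, R1 and R2 are receivers.
topology = {
    "S": ["A", "B"],
    "A": ["S", "C", "E"],
    "B": ["S", "C", "D"],
    "C": ["A", "B", "R1"],
    "D": ["B", "R2"],
    "E": ["A"],
    "R1": ["C"],
    "R2": ["D"],
}
group = build_forwarding_group(topology, "S", ["R1", "R2"])
print(sorted(group))
```

In this toy topology node E hears the JoinReq but never relays a JoinReply, so it stays outside the forwarding group. The soft-state behaviour would correspond to calling the function again at every refresh interval and discarding the previous result.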
Fig. 1. IP Multicast
1) Advantage of IP Multicast: no duplicate packets are sent across any physical link, hence bandwidth is used efficiently.
2) Disadvantages of IP Multicast
The first problem is that IP multicast requires every router to maintain group state information. This violates the originally envisioned “stateless” principle and raises scalability constraints.
Fig. 2. On-demand procedure for membership setup and maintenance
The second problem is that IP multicast tries to conform to the traditional separation of the network and transport layers. This worked well in the unicast context, but for multicast, features such as reliability, congestion control, flow control, and security are difficult to implement.
The third and final problem is that IP multicast requires changes at the infrastructure level and hence is not easy to deploy.
In our study, the IP multicast protocol on which we focus is ODMRP. It was chosen since it has been shown to be the best performer in the comparative study reported in [1].
B. The ODMRP protocol
The On-Demand Multicast Routing Protocol (ODMRP) [2] is an on-demand mesh based protocol where a mesh is formed
IV. APPLICATION LAYER MULTICAST
C. Definition
An alternative to this approach is application layer multicast (see Fig. 3), in which all the multicast functionality is pushed to the end systems (end hosts). Application layer multicasting constructs an overlay structure among the hosts in the network and then sends messages to the other end hosts over this overlay. Implementing the more complex features of multicast is easier at the application layer than at the network layer.
V. PERFORMANCE EVALUATION
We evaluate the performance of ALMA and other multicast protocols by performing extensive simulations with GloMoSim, a parallel simulation software package developed at UCLA using PARSEC [5].
E. The Simulation Environment
We use two different simulation scenarios in order to cover as many cases as possible. The first scenario is geared towards long paths, while the second is the scenario more typically used in the literature.
Fig. 3. Application Layer Multicast
1) Advantages of Application Layer Multicast
2)
The overlay structure is built on top of the existing physical links. Several overlay links may therefore share a single physical link, which results in redundant traffic across that link.
Routers no longer need to maintain per-group state information; the end systems (end hosts) take up this responsibility. Since each end system is part of very few groups, the system scales easily.
Supporting higher layer features such as error, flow, and congestion control can be significantly simplified by leveraging well understood unicast solutions for these problems, and by exploiting application specific intelligence.
Scenario 1: The scenario has 120 wireless mobile nodes in a 1000 m × 1000 m region. To introduce longer distances, we set the radio transmission range to 125 m. Such a relatively small range could lead to network partitions, which would obscure the results. A way around this is to guarantee a good spread of nodes. Therefore, we position 81 nodes statically in a 9 × 9 grid. Each node in this grid is within a single-hop distance of its neighbors. We allow the other nodes to roam at certain chosen speeds (different speeds are considered). We use the random waypoint model in our experiments. In some of the experiments, we set the minimum speed equal to the maximum speed, i.e., the speed is constant for all nodes. Our motivation for using this model is based on recent results showing that, with the random waypoint model, nodes converge to slower speeds as the simulation progresses [6]; our objective was to isolate the effects of speed on the performance of the multicast protocols. The pause time is 30 s, as in other similar work [7].
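The static part of Scenario 1 can be checked with a few lines of code. The sketch below assumes that the 9 × 9 grid spans the whole region, which gives a spacing of 1000/8 = 125 m; the paper does not state the spacing, so this value and the helper `neighbours_in_range` are assumptions made for illustration.

```python
import math

SIDE_M = 1000.0      # side of the square region (from the scenario)
GRID = 9             # 9 x 9 static nodes (from the scenario)
RANGE_M = 125.0      # radio transmission range (from the scenario)
TOTAL_NODES = 120    # total number of nodes (from the scenario)

# Assumption: the grid spans the whole region, so the spacing is 1000 / 8.
spacing = SIDE_M / (GRID - 1)
static_nodes = [(col * spacing, row * spacing)
                for row in range(GRID) for col in range(GRID)]
roaming_nodes = TOTAL_NODES - len(static_nodes)

def neighbours_in_range(node, nodes, radio_range):
    """Count the other nodes that lie within radio range of `node`."""
    return sum(1 for other in nodes
               if other != node and math.dist(node, other) <= radio_range)

counts = [neighbours_in_range(n, static_nodes, RANGE_M) for n in static_nodes]
print(spacing, len(static_nodes), roaming_nodes, min(counts), max(counts))
```

Under this assumption every static node reaches its horizontal and vertical grid neighbours in one hop, while diagonal neighbours (about 177 m away) are out of range, so the static backbone is connected but paths are long.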
Disadvantages of Application Layer Multicast
If the application layer does not originate new packets as expected, the routing layer of the sender issues special keep-alive packets to maintain the multicast tree. The sender occasionally floods data packets through the network to find new members.
Scenario 2: This scenario is the one used in the performance evaluation of ODMRP [2]. In fact, we used this scenario to establish that our simulator produces the same results for ODMRP as those reported in the evaluation by its creators. The simulated network consists of 50 mobile nodes that move according to the random waypoint mobility model within a 1000 m × 1000 m region.
The application layer multicast protocol that we have chosen for our comparative study is ALMA. It was chosen since it ranked among the best application layer multicast protocols for ad hoc networks in the comparative study reported in [3].
The radio transmission range is 250 m, which leads to fairly short paths (approximately 3–4 hops on average). The modified random waypoint model described earlier is used with a pause time of 30 s, and the chosen fixed speed is varied as before.
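The modified random waypoint model (minimum speed equal to maximum speed, fixed pause time) can be sketched as follows. This is an illustrative stand-alone generator, not the GloMoSim mobility module; the function name `random_waypoint`, its parameters, and the seed handling are assumptions.

```python
import math
import random

def random_waypoint(duration_s, speed_mps, pause_s, side_m=1000.0, seed=0):
    """Return a list of (arrival_time, x, y) waypoints for one node.

    The node picks a uniformly random destination in the square region,
    travels to it at a constant speed, pauses, and repeats. The last
    waypoint may be reached after `duration_s`.
    """
    rng = random.Random(seed)
    t = 0.0
    x, y = rng.uniform(0, side_m), rng.uniform(0, side_m)
    trace = [(t, x, y)]
    if speed_mps <= 0:
        return trace                 # a speed of 0 m/s means a static node
    while t < duration_s:
        nx, ny = rng.uniform(0, side_m), rng.uniform(0, side_m)
        t += math.hypot(nx - x, ny - y) / speed_mps   # travel time
        x, y = nx, ny
        trace.append((t, x, y))      # arrival at the waypoint
        t += pause_s                 # pause before picking the next one
    return trace

# Values taken from the scenario description: 1000 s, 6 m/s, 30 s pause.
trace = random_waypoint(duration_s=1000, speed_mps=6.0, pause_s=30.0)
print(len(trace))
```

Fixing the speed removes the slow-down effect reported for the classical random waypoint model, so differences between runs can be attributed to the chosen speed itself.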
D. The ALMA protocol
ALMA (Application Layer Multicast Algorithm) [4] creates a tree of logical links between the group members. The aim of this protocol is to reduce the cost of each link in the tree by reconfiguring the tree under mobility and congestion.
We assume a raw maximum achievable data rate of 2Mbps. Each member joins the group at the beginning of the simulation and remains in the group until the end of the simulation. Mobility causes reconfigurations and therefore, nodes often disconnect from and rejoin the tree. Each simulation lasts for 1000s of simulated time. We varied the group size from 5 to 40 and the moving speed is varied from 0m/s to 12m/s. The traffic generated is constant bit rate (CBR) traffic.
When a node joins the network, it must select a node as its parent in order to become part of the tree. If tree performance drops below a defined threshold, the node must reconfigure the tree by switching parents or releasing children. This mechanism requires a complex loop avoidance and detection system, since simultaneous switching can occur. ALMA also relies on a rendezvous host to obtain the structure of the logical tree as well as neighbor information during bootstrapping.
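The parent-switching rule described above can be sketched as follows. This is a simplified illustration, not the ALMA algorithm itself: the cost metric, the threshold, and the names `is_descendant` and `maybe_switch_parent` are hypothetical, and loop avoidance is reduced to refusing candidates that lie in the node's own subtree.

```python
def is_descendant(parent_of, node, ancestor):
    """Return True if `ancestor` lies on the path from `node` up to the root."""
    while node is not None:
        if node == ancestor:
            return True
        node = parent_of.get(node)
    return False

def maybe_switch_parent(parent_of, node, link_cost, threshold):
    """Switch `node` to a cheaper parent when its current link is too costly.

    parent_of: dict child -> parent (the root maps to None).
    link_cost: dict candidate parent -> measured cost of the logical link
               from `node` (the metric itself is an assumption here).
    Candidates inside the subtree of `node` are skipped to avoid loops.
    """
    current = parent_of[node]
    if link_cost[current] <= threshold:
        return current                       # performance is acceptable
    candidates = [c for c in link_cost
                  if c != node and not is_descendant(parent_of, c, node)]
    best = min(candidates, key=lambda c: link_cost[c])
    parent_of[node] = best
    return best

# Hypothetical logical tree: root -> a -> b -> c
tree = {"root": None, "a": "root", "b": "a", "c": "b"}
costs_seen_by_b = {"a": 90, "root": 40, "c": 10}
new_parent = maybe_switch_parent(tree, "b", costs_seen_by_b, threshold=50)
print(new_parent)
```

Node c offers the cheapest link but is a child of b, so choosing it would create a loop; the sketch therefore falls back to the root. A real protocol also has to handle two nodes switching at the same time, which this local check does not cover.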
Scenario 3: We use this scenario for studying energy consumption. The simulated network consists of 120 mobile nodes that move according to the random waypoint mobility model within a 1000 m × 1000 m region. The radio transmission range is 250 m, which leads to fairly short paths (approximately 3–4 hops on average). The modified random waypoint model described earlier is used with a
pause time of 50s and the chosen fixed speed is varied as before.
compare the performance of ALMA and ODMRP in terms of the packet delivery ratio and goodput. The results are shown in Figs. 4–5, which we discuss below.
We assume a raw maximum achievable data rate of 2Mbps. Each member joins the group at the beginning of the simulation and remains in the group until the end of the simulation. Mobility causes reconfigurations and therefore, nodes often disconnect from and rejoin the tree. Each simulation lasts for 100s of simulated time. We varied the group size from 5 to 40 and the moving speed is varied from 0m/s to 12m/s. The traffic generated is constant bit rate (CBR) traffic.
ALMA compares favorably with ODMRP for large group sizes: Next, we repeat the experiments with a large group size (20 nodes), which corresponds to a group density of 40%. We observe from Fig. 5 that ALMA compares favorably with ODMRP. The performance for low mobility is almost identical. The performance of ALMA degrades much more rapidly than that of ODMRP as the group size grows, since the number of multicast copies that traverse a single physical link increases, i.e., the stress increases. This, in turn, increases congestion and causes the performance to degrade. Furthermore, with mobility, the performance worsens due to more frequent reconfigurations, which also increase the number of control packets. However, in spite of these effects, the packet delivery ratio of ALMA is only about 5% worse than that of ODMRP when the data rate is 2 kbps. The goodput of ALMA and ODMRP is almost identical over the range of speeds considered; this result is not presented here due to space limitations.
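The notion of stress used above, i.e. the number of copies of the same multicast packet that cross one physical link, can be made concrete with a small sketch. The overlay, the unicast routes, and the function `link_stress` are hypothetical and serve only to illustrate the definition.

```python
from collections import Counter

def link_stress(logical_links, physical_path):
    """Count how many copies of one multicast packet cross each physical link.

    logical_links: list of (parent, child) overlay edges.
    physical_path: dict mapping an overlay edge to the list of nodes that
                   its unicast route traverses (hypothetical routes).
    """
    stress = Counter()
    for edge in logical_links:
        path = physical_path[edge]
        for u, v in zip(path, path[1:]):
            stress[frozenset((u, v))] += 1   # undirected physical link
    return stress

# S is the source, M1 and M2 are group members, X is a non-member relay.
overlay = [("S", "M1"), ("S", "M2")]
routes = {
    ("S", "M1"): ["S", "X", "M1"],
    ("S", "M2"): ["S", "X", "M2"],
}
stress = link_stress(overlay, routes)
print(max(stress.values()))
```

Because X is not a group member, it cannot duplicate the packet, so the link between S and X carries two copies. With a network layer protocol the same link would carry a single copy, which is why the gap widens as the group grows.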
F. Performance Metrics
We use the following performance metrics to evaluate ALMA and to compare it with the other multicast protocols:
Multicast tree cost: The total number of physical links that make up the logical links in the multicast delivery tree. This metric represents the goodness of the structure created by the application layer multicast protocol.
Packet delivery ratio: The ratio of the number of data packets actually delivered to the receivers to the number of data packets that were expected. This metric is used to quantify the reliability of the multicast protocol.
Goodput: The number of useful bytes (excluding duplicate bytes) received by the application process at a receiver per unit time. We use this instead of throughput since this definition is appropriate for comparing protocols with retransmissions (if applicable), as we do here.
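As a minimal sketch of how the packet delivery ratio and the goodput defined above could be computed from a receiver trace, consider the following. The trace format (packet identifier and size in bytes) and the function names are assumptions for illustration, not the format used by the simulator.

```python
def packet_delivery_ratio(delivered, expected):
    """Ratio of data packets delivered to data packets expected."""
    return delivered / expected if expected else 0.0

def goodput_bytes_per_s(received, duration_s):
    """Useful bytes per second at one receiver.

    received: list of (packet_id, size_bytes) tuples as seen by the
              application; duplicates of a packet are counted once.
    """
    unique = {}
    for packet_id, size in received:
        unique.setdefault(packet_id, size)   # ignore duplicate copies
    return sum(unique.values()) / duration_s

# Hypothetical trace: packet 2 arrives twice, packet 4 is lost.
trace = [(1, 512), (2, 512), (2, 512), (3, 512)]
pdr = packet_delivery_ratio(delivered=3, expected=4)
goodput = goodput_bytes_per_s(trace, duration_s=10)
print(pdr, goodput)
```

Counting each packet identifier once is what distinguishes goodput from raw throughput: the duplicate copy of packet 2 adds load on the channel but no useful data.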
In terms of energy efficiency: ALMA shows significantly lower energy consumption, as evidenced in Fig. 8. We observe from Fig. 8 a decrease in the energy consumption of the network for the ALMA protocol. This can be explained by the reduction in the number of relay nodes and thus in the number of packets sent. ODMRP achieves a reduction of 24 to 32% compared with ALMA. In terms of energy efficiency, ECMANSI shows significantly lower energy consumption per packet received than the other two protocols, as evidenced in Figure 6(c). In a 50-node network, it consumes approximately 65% less energy than ODMRP, and 35% less than MANSI. The differences become smaller as the network size increases, which is most likely caused by the overhead from periodic control packets transmitted at full power. So, in terms of energy consumption, ALMA compares favorably with ODMRP.
Our study of the simulation results shows a further degradation of ALMA performance as the group size increases further (a group density of 60% was considered). ODMRP, on the other hand, continued to perform well (packet delivery ratio of about 80%). With a 2 kbps source rate, ODMRP outperformed ALMA by about 18% at a speed of 6 m/s.
Energy consumed per data packet received: The ratio of the amount of energy consumed for transmitting packets to the number of data packets received. Energy consumption is measured over every byte transmitted over the channel during an entire simulation. This metric reflects the efficiency of a protocol in terms of energy usage.
G. Performance Analysis
We describe in detail our simulation results and provide explanations of the observed behavior.
ALMA performs favorably as compared with ODMRP: We compare the performance of application layer multicasting with that of a network layer multicast protocol. Naturally, we pick the most promising protocol in each class: ALMA and ODMRP; the latter was shown to have very competitive performance as compared with other network layer multicast protocols for ad hoc networks [7]. For fairness, we restrict our studies to unreliable data delivery and we use UDP for the logical links in ALMA. Like most known network layer multicast protocols for ad hoc networks, ODMRP does not support guaranteed packet delivery. We stress that it is an advantage of ALMA that it can exploit the reliability of TCP. In this series, UDP is used in all of the following experiments with the set-up of scenario 2. We
Note, however, that with these extremely large group sizes, multicasting to the group approaches a broadcast. Clearly, ODMRP is still a very good protocol under these scenarios. We plot the goodput achieved by ODMRP and ALMA versus the group size in Fig. 7. We see that ALMA outperforms ODMRP if the group density is below 46% (group size of approximately 23). Beyond this, ODMRP outperforms ALMA. In the scenario in Fig. 7, all nodes move according to the mobility model described earlier with a speed of 6 m/s.
Fig. 4. Packet delivery ratio versus speed (group size = 10).
Fig. 7. Goodput versus the group size (speed = 6 m/s).
Fig. 5. Packet delivery ratio versus speed (group size = 20).
Fig. 6. Goodput versus speed (group size = 10).
This degradation in the performance of ALMA is again due to an increased number of copies of multicast packets.
Fig. 8. Energy consumption versus group size
VI. CONCLUSION
In this paper, we investigate the benefits of using application layer multicasting in ad hoc networks. For our comparative study, we have selected an application layer multicast protocol, ALMA, which is arguably the best application layer protocol in terms of the metrics that we consider in ad hoc networks. The network layer multicast (IP multicast) protocol selected for this study is ODMRP,
which is, in turn, arguably one of the best network layer multicast protocols in ad hoc networks. The main idea of this study was thus to compare the most efficient application layer multicast protocol with the most efficient network layer multicast protocol. We believe that ALMA performs well even when compared with ODMRP. It is a viable candidate for deployment given that it is simple to deploy, can exploit the ability of the transport layer to provide reliability, and can be made secure with relatively simple mechanisms. It even performs very well unless the group membership becomes extremely large. In such cases, a network layer protocol (in particular ODMRP) seems to be the preferable choice if performance in terms of goodput, packet delivery ratio, or energy consumption is the only metric of interest. ALMA performs favorably as compared with ODMRP for moderately sized or reasonably large multicast groups. However, beyond a certain group size, due to an increase in the number of multicast data copies injected into the network, the performance of ALMA degrades and becomes worse than that of ODMRP. We conclude that ALMA, and application layer multicast in general, is a viable choice for multicasting in ad hoc networks if the application needs reliability or has other special requirements. Furthermore, it is a good choice if the group size is small, even for unreliable multicast of UDP data. This conclusion leads us to select application layer multicast for small groups in MANETs, with the aim of using and developing it in future research.
REFERENCES
[1] S.J. Lee, W. Su, J. Hsu, M. Gerla, and R. Bagrodia. A performance comparison study of ad hoc wireless multicast
[2] Yi, Y., Lee, S., Su, W., Gerla, M.: On-Demand Multicast Routing Protocol (ODMRP) for Ad Hoc Networks. August 2014
[3] M. Ge, M. Faloutsos, S.V. Krishnamurthy, Overlay multicasting for ad hoc networks, in: The Third Annual Mediterranean Ad Hoc Networking Workshop, Bodrum, Turkey, 2004
[4] Min Ge, Srikanth V. Krishnamurthy, Michalis Faloutsos. Application versus network layer multicasting in ad hoc networks: the ALMA routing protocol. October 2004.
[5] Parallel Simulation Environment for Complex Systems (PARSEC), retrieved June 2010 from http://pcl.cs.ucla.edu/projects/parsec/.
[6] J. Yoon, et al., Random waypoint considered harmful, IEEE INFOCOM (2003).
[7] S.J. Lee, et al., On-demand multicast routing protocol in multihop wireless mobile networks, ACM/Baltzer Mobile Networks and Applications, special issue on Multipoint Communications in Wireless Mobile Networks 7(6) (2002).
[8] J.J. Garcia-Luna-Aceves and E.L. Madruga, “The Core-Assisted Mesh Protocol,” IEEE Journal on Selected Areas in Communications, vol September 2006.
[9] Eric Astier, Abdelhakim Hafid and Sultan Aljahdali “An efficient meshbased multicast routing protocol in mobile ad hoc networks” Wirel. Commun. Mob. Comput. (2010)
[10] X. Xiang, X. Wang, Y. Yang, "Supporting Efficient and Scalable Multicasting over Mobile Ad Hoc Networks", IEEE Transactions on Mobile Computing, vol.10, no. 4, pp. 544-559, April 2011
[11] L. Boroumand, R. H. Khokhar, L. A. Bakhtiar and M. Pourvahab “A Review of Techniques to Resolve the Hidden Node Problem in Wireless Networks,” In Smart Computing Review, vol. 2, no. 2, April 2012.
[12] IEEE Computer Society LAN MAN Standards Committee, Wireless LAN Medium Access Protocol (MAC) and Physical Layer (PHY) Specification, IEEE, 03 Park avenue, New York, October 2009.
Adaptive Atomicity in Web Services Composition
Zohra Mahfoud
Nadia Nouali-Taboudjemat
Department of Computer Sciences. Faculty of Sciences, Bouira University Bouira, Algeria
[email protected]
Division of Theory and Computer System Engineering, CERIST Algiers, Algeria
[email protected]
compensating transaction, which semantically undoes its effects. The effects of a sub-transaction are made visible after its commitment. If the Saga aborts, each of the committed sub-transactions is undone by applying the corresponding compensating transaction. The sub-transactions of a Saga do not guarantee strict atomicity, as their results can be consulted between the moment of the commitment and the moment of the compensation. Therefore, we talk about “semantic atomicity”. Another notion of atomicity was suggested by the flexible model [5]: “relaxed atomicity”, which permits a transaction to be validated even if some of its sub-transactions are not validated. These notions have been extended by allowing retriable sub-transactions to be retried a finite number of times if the first execution fails. In this paper, we propose AdapWS, a transactional model for Web service composition with adaptive atomicity: the atomicity property is corrected at runtime by adapting the execution to changes in the environment and in the needs of the users, in order to increase the chances that compositions are validated.
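The Saga behaviour and the retriable sub-transactions described above can be sketched as follows. This is a generic illustration of semantic atomicity, not the AdapWS model; the function `run_saga`, its `(action, compensation, retriable)` interface, and the booking example are hypothetical.

```python
def run_saga(steps, max_retries=0):
    """Run sub-transactions in order; on failure, compensate committed ones.

    steps: list of (action, compensation, retriable) triples, where action
           and compensation are callables (hypothetical interface).
    Returns True if the saga commits, False if it was compensated.
    """
    committed = []
    for action, compensation, retriable in steps:
        attempts = 1 + (max_retries if retriable else 0)
        for _ in range(attempts):
            try:
                action()                      # effects become visible here
                committed.append(compensation)
                break
            except Exception:
                continue                      # retry if attempts remain
        else:
            # Every attempt failed: semantically undo in reverse order.
            for undo in reversed(committed):
                undo()
            return False
    return True

log = []

def book_flight():
    log.append("flight booked")

def cancel_flight():
    log.append("flight cancelled")

def book_hotel():
    raise RuntimeError("no rooms available")

def cancel_hotel():
    log.append("hotel cancelled")

ok = run_saga([(book_flight, cancel_flight, False),
               (book_hotel, cancel_hotel, False)])
print(ok, log)
```

Between the commit of the flight booking and its cancellation, other parties can observe the booking, which is exactly why the Saga offers semantic rather than strict atomicity.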
Abstract— The Transaction management is an elegant way to preserve consistency of applications. However, it is influenced by the characteristics of the applications and theirs execution context, which is the reason for what traditional transactional models are not suitable for Web services area where the transactions have varying durations, Web services have heterogeneous transactional properties, they can be modified at any moment, the needs of users can be changed at runtime, etc. In this paper, we analyze the transactional needs on the Web services domain, we focus on the property of the atomicity, we outlines the needs of flexibility and adaptability, and we propose a transactional model for Web service composition with Adaptive Atomicity, which allows correcting the property of atomicity at runtime by adapting execution of the environment change and the needs of the users, in order to increase the chances of the compositions validation. Keywords- Transactions; Web services composition; Advanced transactional models; ACID properties; Relaxed atomicity; Adaptive atomicity.
I.
INTRODUCTION In what follows, we discuss the transactional aspects in Web service composition (II). Then, we present our proposed model for the composition of the Web services with adaptive atomicity (III) and we expose the related work (IV) before the conclude (V).
A Web service is a self-contained modular program that can be discovered and invoked via the Internet or intranet. The composition of Web services permit to create the new services at values added called composite Web Services. A composite web service can be seen as a distributed application that presses the other distributed Web Services on the network. Like all applications, the execution of a composed Web service must preserve the consistency of the system which is classically guaranteed using transactional mechanisms. A transaction is a set of operations which, when executed alone in an environment without failures, transform the information system of an initial coherent state to a final coherent state [1]. The intermediate states may be incoherent temporarily. The transactions are characterized by the ACID properties (Atomicity, Consistency, Isolation and Durability). In this paper we will focus on the atomicity property, which is originally defined as follows [2]: “either all the operations of a transaction are executed and the transaction is validated, or none of them is executed and the transaction is aborted”. This atomicity called “Strict atomicity” has been adopted by the linear model which is the basic transactional model [3]. Other levels of the atomicity have been proposed by the advanced models: the Semantic atomicity was presented by The Saga model [4]. A Saga is sequence of sub-transactions; each sub-transaction has associated a
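As an illustration of the Saga behaviour recalled in the introduction, the following minimal Python sketch runs sub-transactions in order and applies the compensating transactions when one of them fails. The names `SubTransaction` and `run_saga` are hypothetical, and undoing the committed steps in reverse order is an assumption of this sketch, not something the text above specifies.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SubTransaction:
    """One step of a Saga together with its compensating transaction."""
    name: str
    action: Callable[[], bool]       # returns True when the step commits
    compensate: Callable[[], None]   # semantically undoes a committed step


def run_saga(steps: List[SubTransaction], log: List[str]) -> bool:
    """Run the steps in order; on failure, compensate the committed steps."""
    committed: List[SubTransaction] = []
    for step in steps:
        if step.action():
            # The effects are visible to others from this point on, which is
            # why the resulting atomicity is semantic rather than strict.
            log.append(f"commit {step.name}")
            committed.append(step)
        else:
            log.append(f"fail {step.name}")
            # Assumption: compensations are applied in reverse commit order.
            for done in reversed(committed):
                done.compensate()
                log.append(f"compensate {done.name}")
            return False
    return True
```

For instance, a Saga whose first step commits and whose second step fails would return `False` after compensating the first step, and between the commit and the compensation other parties could have observed the first step's result.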
II. TRANSACTIONAL ASPECTS IN THE COMPOSITION OF THE WEB SERVICES
A composite Web service is an application based on other Web services distributed over a network. From this definition we can deduce that the transactional properties of a composite Web service depend on: (i) the internal transactional properties of the composite Web service, (ii) the transactional properties of the component Web services, and (iii) the properties of the composition environment.

A. Internal transactional properties of composite Web services

A composite Web service presents a set of tasks to its users. Its internal transactional properties depend on the properties of its component tasks, each of which can be replaceable or not, and vital or not. Consider the composite Web service “travel arrangements”, which consists of five tasks: (1) reserve a flight, (2) buy a train ticket, (3) reserve a hotel room, (4) reserve a restaurant table and (5) reserve a taxi. The task “reserve a flight” is replaceable: if we cannot reserve a flight, buying a train ticket can solve the travel problem. The task “reserve a restaurant table” is not vital: the composition can be validated even if the table is not reserved. Note that these properties can change dynamically during the execution of the composition. If the composition is about to be aborted, for example following the failure of a vital task, the user may change his or her mind about the vitality of the failed task in order to save the outcome of the composition.
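The travel example above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Task` class and `composition_validates` function are hypothetical names, and treating the hotel and taxi tasks as vital is an assumption made for the example, since the text only classifies the flight and restaurant tasks.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    name: str
    vital: bool = True
    alternative: Optional[str] = None  # name of a task that can replace this one


def composition_validates(tasks: List[Task], succeeded: Dict[str, bool]) -> bool:
    """True when every vital task succeeded directly or through its alternative."""
    for task in tasks:
        if not task.vital:
            continue  # the failure of a non-vital task is tolerated
        ok = succeeded.get(task.name, False)
        if not ok and task.alternative is not None:
            ok = succeeded.get(task.alternative, False)
        if not ok:
            return False
    return True


# Hypothetical encoding of the "travel arrangements" composition.
travel = [
    Task("flight", vital=True, alternative="train"),
    Task("hotel", vital=True),        # assumption: vital
    Task("restaurant", vital=False),
    Task("taxi", vital=True),         # assumption: vital
]
outcomes = {"flight": False, "train": True, "hotel": True,
            "restaurant": False, "taxi": False}
```

With these outcomes the composition does not validate, because the taxi task is vital and has failed. If the user then decides at runtime that the taxi is no longer vital (`travel[3].vital = False`), the same outcomes validate, which is the kind of runtime adjustment described in the last sentences of the paragraph above.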
B. Transactional properties of the component Web services

Web services are loosely coupled components, hosted by independent providers, with heterogeneous transactional properties. A composite Web service must support the transactional properties of its component Web services. A component Web service can be composite or elementary. The transactional properties of composite Web services are discussed above. Elementary Web services are classified according to their transactional properties as atomic, quasi-atomic or non-atomic [6]:
- A service is said to be atomic (associated with the “all or nothing” semantics) when it provides the following operations: (1) resource reservation, (2) cancellation and (3) validation.
- A service is quasi-atomic when it supports a validation operation and a compensation operation that undoes the effect of the validation.
- A service is non-atomic when the only operation it offers on resources is validation. It supports neither cancellation nor compensation.

C. Properties of the Web services composition environment

The Web services domain is dynamic and evolving: new services can be added, and existing services are constantly modified, temporarily suspended or permanently removed. These characteristics call for the dynamic selection and replacement of component Web services. In addition, interactions between Web services may have varying durations: they can be short-running or long-running depending on the state of the communication medium (intranet or Internet), which may be good, congested or partially destroyed.

Therefore, transactions in Web services are influenced by:
- The variability of the internal transactional properties of composite Web services.
- The dynamic change of the internal transactional properties of composite Web services.
- The heterogeneity of the transactional properties of the component Web services.
- The dynamism of the Web services environment.
These features make it difficult, if not impossible, to predict statically at design time all the scenarios that may arise during the execution of a composite Web service, hence the need for dynamically adaptable models.

III. PRESENTATION OF THE “ADAPWS” MODEL

The “AdapWs” model represents a composite Web service as a tree whose internal nodes are the tasks and whose leaves are the Web services designated to execute them (Fig. 1).
- An adaptable Web service validates if all its vital tasks validate.
- A task validates if one of its associated Web services validates.
- A task is executed by the principal Web service or, if that service is replaceable, by one of its alternatives.
- If a replaceable task fails, it is replaced by one of its alternative tasks.
- A component Web service can be composite or elementary.
- A composite Web service is modeled in turn by a tree of tasks and Web services.
- An elementary Web service can be atomic, quasi-atomic or non-atomic.

Figure 1. Tree structure of an adaptable Web service. The root WSAdapWs has tasks (T1, T2, ...) as internal nodes and Web services (WS1, WS2, ...) as leaves; WS2 is itself composite, with tasks T21 and T22 executed by WS21 and WS22. Legend: Ti: the task i; alk-Ti: the alternative k of Ti; WSi: the Web service i; alj-WSi: the alternative j of WSi. The figure also distinguishes principal from alternative tasks and Web services.

A. Definition of the “AdapWs” model

An adaptable composite Web service is modeled as follows: WSAdapWs = {(Ti, RTi, LWSTi, al-Ti), i ≥ 1}, with:
- Ti: a task of the adaptable Web service WS.
- RTi = (r1, r2): the role of the task Ti, with r1 ∈ {Vt, NVt} and r2 ∈ {Rp, NRp}. Vt: vital, NVt: non-vital, Rp: replaceable, NRp: non-replaceable.
- LWSTi = (WSi, RWSi, al-WSi): the list of Web services designated to execute the task i, with:
  - WSi: the identity of the principal Web service selected to execute the task i.
  - RWSi = (rs1, rs2): the role of the principal Web service WSi, with rs1 ∈ {At, QAt, NAt, } and rs2 ∈ {Rp, NRp}. At: atomic, QAt: quasi-atomic, NAt: non-atomic, Rp: replaceable, NRp: non-replaceable.
: The property is not pertinent.
  - al-WSi = {(alk-WSi, Ralk-WSi), k ≥ 0}: the list of the alternatives of the principal Web service WSi, with:
    - alk-WSi: the alternative number k of WSi.
    - Ralk-WSi ∈ {At, QAt, NAt, }: the role of alk-WSi.
- al-Ti = {(alk-Ti, LSwalk-Ti), k ≥ 0}: the list of the alternative tasks, with:
  - alk-Ti: the alternative number k of the task Ti.
  - LSwalk-Ti: the list of Web services selected to execute alk-Ti.
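One possible encoding of this definition, together with the validation rules stated at the start of this section, is sketched below in Python. It is a minimal sketch under assumptions, not the authors' implementation: all class, field and function names are hypothetical, each Web service is reduced to an `invoke` callable that returns `True` when the service validates, and the "not pertinent" role value is left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Atomicity(Enum):          # rs1: At, QAt, NAt
    AT = "atomic"
    QAT = "quasi-atomic"
    NAT = "non-atomic"


@dataclass
class WebService:
    name: str
    atomicity: Atomicity
    invoke: Callable[[], bool]  # hypothetical: True if the service validates


@dataclass
class ServiceList:              # LWSTi = (WSi, RWSi, al-WSi)
    principal: WebService
    replaceable: bool = False   # rs2: Rp / NRp
    alternatives: List[WebService] = field(default_factory=list)

    def validates(self) -> bool:
        if self.principal.invoke():
            return True
        # Alternatives are tried only when the principal is replaceable.
        return self.replaceable and any(ws.invoke() for ws in self.alternatives)


@dataclass
class AltTask:                  # (alk-Ti, LSwalk-Ti)
    name: str
    services: ServiceList


@dataclass
class Task:                     # (Ti, RTi, LWSTi, al-Ti)
    name: str
    vital: bool                 # r1: Vt / NVt
    replaceable: bool           # r2: Rp / NRp
    services: ServiceList
    alternatives: List[AltTask] = field(default_factory=list)

    def validates(self) -> bool:
        if self.services.validates():
            return True
        # A failed task is replaced by an alternative task only if replaceable.
        return self.replaceable and any(
            alt.services.validates() for alt in self.alternatives)


def composite_validates(tasks: List[Task]) -> bool:
    """Attempt every task; the composite validates if all vital tasks validate."""
    results = [(task.vital, task.validates()) for task in tasks]
    return all(ok for vital, ok in results if vital)
```

A composite component Web service could be represented in this sketch by a `WebService` whose `invoke` callable evaluates `composite_validates` on its own list of tasks, which mirrors the nested tree of Fig. 1.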
transactional reliability of the compositions and how the “AdapWs” model manages this type of services.

1) Transactional reliability and non-atomic Web services: In transactional terms, Web services that provide recovery mechanisms do not pose problems; it is enough to define properly the order of invocation of the exposed operations to ensure the coherence of the system. This is the case for atomic Web services, which provide a posteriori recovery mechanisms (cancellation operations), and for quasi-atomic Web services, which provide a priori recovery mechanisms (compensation operations). However, non-atomic Web services, which provide no recovery mechanism (neither a priori nor a posteriori), can violate the transactional reliability of the system. For example, consider a Web service composed of two vital, non-replaceable Web services: WS-hotel and WS-flight. Suppose that WS-hotel validates while WS-flight fails; in this case the composition must be aborted and the resources provided by WS-hotel must be released. If WS-hotel is atomic, we use the cancellation operation; if it is quasi-atomic, we launch the compensation operation; but if it is non-atomic, we cannot release the reserved resources, which violates atomicity and therefore the transactional reliability of the system. To preserve atomicity in the presence of non-atomic Web services, the “AdapWs” model requires that they be invoked under the two following conditions:
There is a dependency relation RD between the tasks of the same composite Web service: ∀(Ti , RTi, LWSTi, al-Ti ), (Tj , RTj, LWSTj, al-Tj) ∈ WSAdapWs : (Ti , RTi, LWSTi, al-Ti ) RD (Tj , RTj, LWSTj, al-Tj ) RD is a relation of parallelism (║) or of sequential (