Application-Oriented Networking through Virtualization and Service Composition

by

Hadi Bannazadeh

A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Electrical and Computer Engineering Department
University of Toronto

Copyright © 2010 by Hadi Bannazadeh

Abstract

Application-Oriented Networking through Virtualization and Service Composition

Hadi Bannazadeh
Doctor of Philosophy
Electrical and Computer Engineering Department
University of Toronto
2010

Future networks will face major challenges in accommodating emerging and future networked applications, including significant architecture and management issues. In this thesis, we study several of these challenges, including configurability, application-awareness, rapid application creation and deployment, and scalable QoS management. To address these challenges, we propose a novel Application-Oriented Network (AON) architecture as a converged computing and communication network in which application providers are able to flexibly configure in-network resources on demand. The resources in AON are virtualized and offered to application providers through service-oriented approaches.

To enable large-scale experimentation with future network architectures and applications, in the second part of this thesis, we present the Virtualized Application Networking Infrastructure (VANI) as a prototype of an Application-Oriented Network. VANI utilizes a service-oriented control and management plane that provides flexible and dynamic allocation, release, programming and configuration of the resources used for creating applications or performing network research experiments from layer three and up. Moreover, VANI resources allow the development of network architectures that require a converged network of computing and communications resources, such as in-network processing, storage, and software- and hardware-based reprogrammable resources. We also present a Distributed Ethernet Traffic Shaping (DETS) system, used for bandwidth virtualization in VANI and designed to guarantee send and receive Ethernet traffic rates in VANI, in a computing cluster, or in a datacenter.

The third part of this thesis addresses the problem of scalable QoS and admission control in service-oriented environments where a limited number of instances of service components are shared among different application classes. We first use Markov Decision Processes to find optimal solutions to this problem. Next, we present a scalable and distributed heuristic algorithm able to guarantee the probability of successful completion of a composite application. The proposed algorithm does not assume a specific distribution type for service execution times or application request inter-arrival times, and hence is suitable for systems with stationary or non-stationary request arrivals. We use simulations and experimental measurements to show the effectiveness of the proposed solutions and algorithms throughout this thesis.

to the memory of my father


Acknowledgements

The completion of this thesis would not have been possible without the support of many people. First and foremost, I owe my deepest gratitude to my supervisor, Professor Alberto Leon-Garcia, for his guidance and generous support throughout my research. I would like to thank him for the insightful discussions and ideas that shaped my research and led me to the completion of this thesis. Professor Leon-Garcia is not only a great supervisor but also an admirable person whom I will look to as a role model in the future.

I would like to thank the honorable members of my committee, Professors Ben Liang, Paul Chow, Baochun Li, Gordon Agnew and Ashish Khisti, for their evaluation of my thesis and their invaluable comments and feedback. I would also like to thank the university staff members, especially Ms. Linda Espeut, Mr. Vladimirio Cirillo and Ms. Darlene Gorzo, for their generous help and administrative support.

During my years at UofT, I received support, feedback and encouragement from my dear friends and teammates at the Network Architecture Lab, especially from Alireza Bigdeli, Armin Ghayoori, Keith Redmond, Ali Tizghadam, Ramy Farha, Ivan Hernandez, Agop Koulakezian and Houman Rastegarfar. I would like to express my gratitude and thanks to all of them.

I also had the privilege of working with many students at UofT as part of their education. I wish to thank them all for their dedication, hard work and willingness to experiment. They are Arbab Khan, Gordon Tam, Saleh Dani, Justin Seto, Andrew Mehes, Michael Ens, Ian Gartley, Tom Yue, Darryl Chung, Mingliang Ma, Maxim Galash, Wenyu Li and Anthony Das Santos.

Throughout my Ph.D. years I received family-like friendship from many friends. I would like to thank all of them for the memorable moments: Amin Farbod, Reza Safian, Amirali Basri, Maryam Bahrami, Mostafa Haghiri, Kamran Farzan, Mehdi Lotfinezhad, and David Brown.

I would also like to thank my mother, brothers and sisters for their unconditional love and support without which the completion of this thesis would not have been possible. Meeting my wife was one of the most wonderful events during my Ph.D. years. I would like to thank my beloved wife, Sara, for her selfless love and support. I am grateful for her sacrifices and patience.


Contents

1 Introduction
  1.1 Vision of A Future Network
    1.1.1 Motivating Application Scenarios
  1.2 Research Goals and Challenges
  1.3 Proposed Solutions Overview
  1.4 Thesis Structure

I Application-Oriented Networking

2 Background and Requirement Analysis
  2.1 New Computing Models
  2.2 New Applications through Composition
  2.3 Emergence of Cloud Computing
  2.4 Evolution of Traditional Service Providers
  2.5 Introduction of Smart Phones
  2.6 Advancements in Content Delivery Networks
  2.7 Future Networks Architecture

3 Application-Oriented Networking
  3.1 AON Application Plane
  3.2 AON Control and Management Planes
  3.3 Application-Oriented Routers
  3.4 Application-Oriented Routers Use Cases
    3.4.1 Telecom Service Providers
    3.4.2 Enterprise Networks
    3.4.3 Overlay Networks and Content Distribution Networks
  3.5 Related Work

II Virtualized Application Networking Infrastructure

4 Virtualized Application Networking Infrastructure
  4.1 VANI Design Requirements
    4.1.1 VANI Architecture
    4.1.2 Current Physical Resources in VANI (VANIv1 Resources)
    4.1.3 Example: Requesting a Resource in VANI
  4.2 VANI Control and Management Plane (VANI-CMP)
    4.2.1 User Management
    4.2.2 Authentication Authorization Accounting
    4.2.3 Resource Allocation
    4.2.4 Generic Resources and Registration
  4.3 SOA-Based Implementation of VANI-CMP
  4.4 Security in VANI
  4.5 Guaranteeing Bandwidth in VANI
    4.5.1 Interconnecting VANI Nodes in IP Layer
    4.5.2 Interconnecting VANI Nodes in Ethernet Layer
    4.5.3 Experimentation with L3 Protocols
  4.6 SW-Based Resources in VANI
  4.7 Federation with GENI
  4.8 A VANI Node
  4.9 Performance Evaluations
    4.9.1 Reprogrammable Hardware Resource
    4.9.2 Processing Service and Network Virtualization
  4.10 Experiments & Applications

5 A Distributed Ethernet Traffic Shaping System
  5.1 Distributed Ethernet Traffic Shaping (DETS) system
    5.1.1 DETS Protocol
    5.1.2 DETS for Linux OS
  5.2 DETS System Design
    5.2.1 Rate Allocator Module
    5.2.2 Performance Improvements
  5.3 Performance Evaluations
  5.4 Modifications to Ethernet Control Plane

III QoS & Admission Control in Service-Oriented Systems

6 Allocating Services to Applications using Markov Decision Processes
  6.1 Concurrent Service Executions
    6.1.1 Problem Formulation
    6.1.2 Markov Decision Process Formulation
    6.1.3 Optimal Policy with Different Services
    6.1.4 The Optimal Policy and Performance Comparison
  6.2 Sequential Service Executions
    6.2.1 Problem formulation
    6.2.2 Markov Decision Process formulation
    6.2.3 Optimal policy and performance comparison

7 A Distributed Probabilistic Commitment-Control Algorithm
  7.1 QoS Control in a Service-Oriented System
  7.2 Probabilistic Modeling of Service Commitment
  7.3 Computing Over-Commitment Probability
  7.4 Distributed Algorithm for Service Commitment
    7.4.1 DASC Complexity Analysis
  7.5 DASC Performance Evaluation
  7.6 Queue-enabled Distributed Algorithm for Service Commitment
    7.6.1 Problem Formulation and Description
    7.6.2 Q-DASC Performance Evaluation
  7.7 Related work

8 Application Admission Control System
  8.1 Problem Statement
  8.2 Steady-State Based Application Admission Control System
  8.3 Online Optimization-based Application Admission Control System
    8.3.1 Feasibility Check
    8.3.2 Scenario Generation
    8.3.3 Optimal Admission Decisions For Generated Scenarios
    8.3.4 Final Decision Making
  8.4 Performance Evaluation

9 Conclusions
  9.1 Contributions
    9.1.1 Application-Oriented Networking
    9.1.2 Virtualized Application Networking Infrastructure
    9.1.3 Scalable and Distributed QoS and Admission Control
    9.1.4 Related Educational Contributions
  9.2 Future Work

Appendices

A Queue-Enabled Service Commitment
  A.1 Time to Enter Service in a G/G/C/N System
  A.2 TES for G/D/C/N System
  A.3 TES for G/M/C/N System

B Computing Over-Commitment Probability using Chernoff's Bound

C Derivation of Gk(t) Probability

D Simulation Environment Description

Bibliography

Glossary

List of Tables

4.1 Average maximum FPGA programming time
4.2 UDP and TCP traffic measurements in a VANI node in MBytes per second (MBps)

List of Figures

1.1 Vision of a future network
1.2 Example of a future application: Smart Grids
2.1 Basic Service-Oriented Architecture model (source: http://www.w3.org)
3.1 Three planes in an Application-Oriented Network
3.2 Application Plane Resources
3.3 Multiple Applications in AON
3.4 Application Plane Architecture
3.5 Application-Oriented Network Reference Model
3.6 Overall view of an Application-Oriented Network with multiple AORs and applications
3.7 Telecommunication services in an AON
3.8 Enterprise Service Bus and AON
3.9 Peer-to-Peer network in AON
4.1 VANI design requirements
4.2 VANI architecture
4.3 Researcher interaction with VANI planes
4.4 Virtualizing physical resources in VANI
4.5 A sample interaction between a researcher and VANI to secure a resource
4.6 A sample schema for generic XML content in a getRequest response message
4.7 Connecting VANI nodes in IP layer
4.8 Connecting VANI nodes in Ethernet layer
4.9 Large scale experimentation with new L3 protocols
4.10 Connecting VANI to GENI
4.11 Reprogrammable Hardware (BEE2 Board)
4.12 Traffic measurement experiment topology
5.1 A system with five nodes and two virtual nodes on each
5.2 TCP rate back-off due to interfering UDP traffic
5.3 DETS measurement and rate control points
5.4 DETS System Internal Modules
5.5 DETS performance evaluations for the system shown in Figure 5.1
5.6 Performance evaluation of rate allocation algorithms: a) RAA-SlowProbe, b) RAA-FastProbe
5.7 Performance evaluation of rate allocation algorithms: a) RAA-FairShare, b) RAA-ForwardExplicit
5.8 DETS in Ethernet control plane
6.1 A system with m different service types and N instances of each type
6.2 A system with three types of service and two classes of applications
6.3 A system with three types of services, two classes of applications and two types of instances for service type 3
6.4 Optimal policy when the system is in state (n1, n2), and α = 1, β = 0.1
6.5 Optimal policy when the system is in state (n1, n2), and α = 1, β = 0.5
6.6 Performance comparison between Complete Sharing, Complete Partitioning and MDP-based partitioning mechanisms
6.7 A system with m different service types and N instances of each type
6.8 A system with three types of service and two classes of applications
6.9 Optimal policy when the system is in state (n11, n12, n22), and γ = 0.1: a) n22 = 1, b) n22 = 4
6.10 Optimal policy when the system is in state (n11, n12, n22), and γ = 0.3: a) n22 = 1, b) n22 = 4
6.11 Performance comparison between No Commitment Policy, Full Commitment Policy and MDP-based partitioning mechanisms (α = −0.1, β = 0.5, γ = 0.1)
6.12 A sample beta distribution
6.13 Performance comparison between No Commitment Policy, Full Commitment Policy and MDP-based partitioning with a beta distribution for service execution time and (α = −0.1, β = 0.5, γ = 0.1)
7.1 A sample service-oriented environment
7.2 Composition Operations
7.3 A service-oriented system with three agents, each controlling one service type
7.4 Distributed Algorithm for Service Commitment in SDL (Specification and Description Language)
7.5 Beta pdf for service execution time with parameters α = 2.333 and β = 4.666
7.6 Application failure ratio for a system with two application classes and two service types
7.7 Comparing DASC throughput with a bottleneck-based admission control algorithm
7.8 A service-oriented environment consisting of twelve service types and three applications
7.9 Applications failure ratios in the system
7.10 Failure ratios in services 1 to 6 vs. applications request rates
7.11 Comparison between four admission control mechanisms with stationary request arrivals
7.12 Comparison between four admission control mechanisms with on-off bursty request arrivals with burst time (T)
7.13 Applications queuing probability with ample number of queuing spaces using the Q-DASC algorithm
7.14 Applications failure probability based on queue size in the Q-DASC algorithm
8.1 A sample service-oriented environment
8.2 Application Admission Control System using Online Optimization
8.3 System reward for four different techniques
8.4 Application 1 and application 2 failure rates based on the applications request rate
A.1 Distributions for residual service times in a service with uniform execution time
A.2 Distributions for residual service times in a service with Normal execution time
A.3 Distributions for residual service times in a service with a Beta execution time, α = 2.333, β = 4.666
A.4 TES distribution and calculated bound for beta distribution with α = 2.333, β = 4.666
B.1 A sample d(s) for a service with 900 instances, and random pi's for 1000 application instances

Chapter 1

Introduction

The present Internet has become an essential infrastructure in modern society in spite of its glaring and serious shortcomings in regards to security, reliability, and performance [1]. A constant in the thirty-year history of the Internet has been continual growth in scale, both in the number of Internet users and in the diversity of Internet applications. This growth has been fueled by the steady improvement in the cost and performance of computing and communications technology. The Internet is currently entering a new phase of more dramatic and diverse growth, driven by:

1. New computing models, in which a new application can be created with the same ease as designing a new web page through the linking of service building blocks; and

2. New Internet users, in the form of communicating devices such as smart wireless phones and tablets, high definition monitors, smart sensors, alarms and controllers.

The next generation Internet will be challenged to support a much more diverse and a much greater number of applications, as well as a new generation of communicating devices. The goal of this thesis is to study how future networks can support and facilitate the creation, deployment and management of these emerging applications. In particular, we study how service composition techniques in application creation and virtualization of resources can empower future networks in this support.

In the following, we first present our vision of future networks, followed by brief descriptions of some example application scenarios that require a new network architecture. We then outline our research goals and challenges, and briefly preview our solutions. Finally, we describe the thesis structure and research contributions.

1.1 Vision of A Future Network

Figure 1.1 shows our vision of the emerging future networks. In this vision, the network will be mainly comprised of an optical backbone network, core/metro/access networks and, finally, the terminals, datacenters and end users. The backbone optical network will be responsible for transferring massive volumes of data between the core network components. The access network will provide very high bandwidth connectivity to the network users using various wireless and optical technologies.

Terminals and users in future networks fall into various classes. One major class will be mobile computing nodes that combine advanced features such as high processing power, long battery life and sophisticated user interfaces including touch screens, speech and image recognition, and so on. Moreover, users will be highly mobile, requiring hand-offs between various types of access technologies. Another class of network end terminals will be smart sensors and/or actuators, such as smart grid sensors, that require a very high level of responsiveness and reliability from the network and will be deployed en masse. The sheer volume of these communicating devices will challenge the scale and cost-points of future networks.

Another major type of network "users" will be massive datacenters that exploit inexpensive commodity computing and storage resources and constitute factories for creating applications that require massive processing, storage and bandwidth at low cost. These datacenters will be connected to the network using very high bandwidth optical networking technologies.

[Figure 1.1: Vision of a future network]

In combination, these new network users at the edge of future networks, together with the deployment of a very high bandwidth optical core network, will open the door to a vast universe of applications and intelligence serving different purposes. In this application universe, different classes of applications (e.g., sensors, human communications, machine-to-machine, content distribution, etc.) will have different and sometimes contradictory expectations of the network. These diverging requirements might force the creation of separate networks, unless future networks become capable of serving these applications on a shared infrastructure.

In this thesis our main research objective is to study future network architectures that can provide customizable support for applications. In the next subsection we briefly examine several motivating application scenarios to illustrate the type of challenges that future networks will face and to motivate our research on future network architectures.

1.1.1 Motivating Application Scenarios

Future networks will support a diverse range of applications over a variety of access technologies. In this section, we discuss four sample application scenarios in two general categories to show the type of challenges that future networks will face and the application requirements that need to be addressed.

Smart Infrastructures

Smart infrastructures such as smart utility grids and smart transportation systems are an important class of future applications. In this class of application, sensors and actuators will be deployed at massive scale throughout the network. The sensors allow real-time environmental information to be gathered in a variety of settings, and the actuators allow control actions to be exercised in response to environmental conditions according to various policies and objectives. At the same time, inexpensive computing and storage in massive datacenters allow the introduction of applications that can receive the sensor data, process it, and generate and forward commands to the actuators. The combination of smart sensors, actuators and affordable large scale processing and storage will enable the introduction of smart infrastructures that will revolutionize the way we live. One of the major requirements of these future smart infrastructures is a responsive network that reliably transfers the sensor data and actuator commands between the sensors, actuators and datacenters.

Smart grids are an example of smart infrastructures that deploy sensors and actuators in homes and in housing and industrial complexes, as shown in Figure 1.2. The smart infrastructure in smart grids enables not only improved energy efficiency, but also new business models for energy pricing and trading, as well as energy consumption that is sensitive to carbon emissions. The sensors in smart grids will generate large volumes of data.

[Figure 1.2: Example of a future application: Smart Grids]

The generated data needs to be securely transferred through the network to datacenters that are able to process data at massive scale and produce commands to be forwarded to the actuators residing in the homes and in housing and manufacturing complexes. The amount of data that smart grids can potentially generate, as well as the reliability and responsiveness levels that are needed, will surpass the existing applications supported by the current Internet.

Another type of smart infrastructure involves smart transportation systems comprised of traffic sensors and traffic signals as well as networked cars and passengers equipped with wireless devices, GPS and cameras. These smart transportation systems generate a very large amount of data that needs to be securely passed through the network to the datacenters for processing. The commands generated in these datacenters will not only guide human-driven traffic but may also direct smart, machine-controlled moving elements. This automated control will be essential to addressing future energy challenges and maximizing the use of green and renewable sources of energy.

From these two sample applications, we can see that future networks need to be highly reliable and responsive, and must be able to handle a large number of wireless and mobile users at massive scale with high security and accountability.

Content Distribution Networks

Content distribution and human-to-human communications and interactions are driving the emergence of many new applications that will be empowered by the availability of high bandwidth in the backbone optical networks, inexpensive computing resources and smart mobile terminals. These applications need to handle a large number of mobile and heterogeneous users. The heterogeneity and mobility are driven by the universal acceptance of smart wireless phones, netbooks and new devices such as the iPad that use and build on high bandwidth wireless access technologies.

One example application is high quality streaming of an event to a large number of users. The end-users in this application will be using heterogeneous devices and different access technologies, and are mostly mobile. The distribution and streaming need to be adaptive to each user's available bandwidth, device playback capabilities and preferences. In a mobile environment, devices may experience temporary disconnections; novel caching techniques are therefore required to maximize the Quality of Experience. In this class of applications, users need to interact with each other and produce metadata that can be consumed by other users. The user experience will also be improved with a large volume of metadata (speech-to-text, etc.) that needs to be automatically generated using powerful computing resources. Also, the distribution model in this class of application needs to be customized to the application's business model requirements, the type of content and the target end-users. Many novel functionalities are required, including efficient content multicasting, smart caching, content conversion, image rendering and recognition, specialized encryption and decryption, and speech recognition and speech-to-text, as well as AAA (Authentication/Authorization/Accounting) operations. These functionalities are also required to be highly reliable, robust and affordable, and sometimes they may only be needed for a short period of time.

Another example of content distribution applications is 3D presence systems [2] that will enable a group of individuals to interact in 3D across a wide area network. This application, unlike the previous scenario, might involve a small group of users; however, it requires very high speed and high bandwidth connections and large amounts of processing. The introduction of 3D technologies and inexpensive computing resources in the datacenters used for image processing, together with a very high bandwidth backbone optical network, will enable these types of content distribution and streaming applications, which need different types of functionality than current networks can provide.

Having seen these scenarios, we can expect that future networks will face unprecedented challenges from a diverse range of applications. These challenges will push networks to match the advancements in the different technologies at their periphery (users, commoditized computing, and access technologies) and within (optical networks). In this thesis, our goal is to study future network architectures and the type of capabilities that these networks need to cover.

as well as AAA (Authentication/Authorization/Accounting) operations. These functionalities are also required to be highly reliable, robust and affordable and sometimes they may only be needed for a short period of time. Another example of content distribution applications is 3D presence systems [2] that will enable a group of individuals to interact in 3D across a wide area network. This application, unlike the previous scenario, might involve a small group of users, however it requires very high speed and high bandwidth connections and large amounts of processing. Introduction of 3D technologies, inexpensive computing resources in the datacenters used for image processing operations, together with very high bandwidth backbone optical network will enable introduction of these types of content distribution and streaming applications that will need different types of functionalities than what the current networks can provide. Having seen these scenarios, we can expect that future networks will face unprecedented challenges from a diverse range of applications. These challenges will push networks to match the advancements in the different technologies in its periphery (users, commoditized computing, and access technologies) and within (optical networks). In this thesis, our goal is to study future networks architecture and the type of capabilities that these networks need to cover.

1.2

Research Goals and Challenges

In the previous section, we presented our vision of future networks and we briefly previewed a few sample future application classes. We also discussed some challenging requirements of these applications. To enable introduction of such applications, it is fundamental for the network to address these challenges and more. In this section we name a few of those challenges and outline the main research questions studied in this thesis. These challenges are grouped in four main categories including configurability

Chapter 1. Introduction

8

and application-orientation, facilitating application-creation, scalable service management and QoS control, and mobility and security. • Configurability and Application-Orientation: Traditional networks are usually designed, installed and configured once for all applications that operate over them. Future network need to be flexible and configurable to adapt to application requirements. This configurability should be in different levels from the lower links configuration up to application-specific routing functions. For instance, applications should be able to customize the network architecture according to their distribution model as well as their chosen caching, forwarding, broadcasting and multicasting approaches for their specific content. Future networks require configurable and application-oriented components which enable each application to fulfill its own set of requirements. We call such a network an Application-Oriented Network. One of the main research goals in this thesis is to develop a framework in which application-orientation and configurability can be realized in future networks. • Facilitating Application Creation: Future distributed applications need to be created, deployed and retired rapidly to adapt to future agile and competitive business models. Networks can foster this agility by offering common services used in the full life cycle of an application. Many of these common services will use computing and communication resources. Future networks should provide a network of computing and communication resources on which these services and ultimately applications can operate. To provide such support, virtualization and composition techniques will be heavily used in future networks. Virtualization has been introduced as a technique to hide the underlying hardware resources from the applications. Virtualization will be used in different levels in future networks to facilitate application creation and to fulfill application-orientation and configurability requirements. Virtualization will be both a solution and a

Chapter 1. Introduction

9

challenge in future network since the success and scope of future networks will depend on the advancements in virtualization technologies in various domains such as bandwidth virtualization as well as computing virtualization. To facilitate application creation, we also need to be able to compose new applications using these common service components and virtualized resources. Another major challenge in future networks is to define open and flexible interfaces to these common services to enable their incorporation in different applications in the face of heterogeneity. • Scalable Service Management and QoS Control: Guaranteeing QoS has been and will be a major challenge for any network. The scope and diversity of the applications that will operate over future networks depend directly to the level of QoS, responsiveness and predictability that these networks will provide. Future networks will offer a much diverse range of features to the applications. Therefore their management scope will cover managing the new features as well. To limit the costs associated with management in future networks, scalable and automated service and resource management solutions will be required. A major challenge in introduction of future application-oriented networks is to design and develop such scalable service management systems and their associated algorithms. Another major research goal in this thesis is to study QoS control mechanisms that are able to guarantee a desirable network behavior in large scales. • Mobility and Security: Two of the main challenges in future networks are mobility and security management. Generally, networks with IP-centric transport stratum have difficulty in handling mobility and security issues. With emergence of a new generation of applications and users, these two challenging issues will continue to play a major rule in success of any network architecture. Although we do not directly address these challenges in this study, one of our main research goal

Chapter 1. Introduction

10

is to provide a flexible platform for enabling incorporation of novel mobility and security management systems in future applications. Addressing the complete list of challenges in a dissertation is impossible.

Nev-

ertheless, we focus on a subset of these challenges in areas including configurability, application-orientation, network-facilitated application creation, virtualization and scalable QoS management. We also direct the interested reader to other studies in our research group on autonomous service management [3] and core network management [4] that have addressed other challenges in future networks.

1.3

Proposed Solutions Overview

In this section, we briefly overview our proposed solutions for some of the studied challenges. In this study, we propose a new network architecture called Application-Oriented Network architecture to address the configurability and application-orientation challenges in future networks. Application-Oriented Network (AON) is a converged computing and communication network that facilitates creation of diverse range of applications through virtualization of resources and service-oriented application creation paradigm. In AON, an application is a high-level distributed function that is composed of several lower-level service components and is designed to either deliver a service to the end-users or to be used in another more high-level application. One of the main objectives in AON is to enable application marketplaces in which application providers can find available components that are designed and developed separately and use them in their applications. To do so, AON follows the Service-Oriented Architecture (SOA) [5] application creation paradigm. In SOA, high-level applications and business processes are created by composing service components that can be accessed through well-defined and standard interfaces. Service-Oriented Architecture enables loose coupling and higher interoperability among

Chapter 1. Introduction

11

service components that are used in creating an application while each can be developed and deployed independently. Our proposal for AON utilizes this paradigm to facilitate application creation in future networks. Virtualization is another major technique that we heavily use to facilitate application creation and to provide configurability and application-orientation. “Virtualization” corresponds to different technologies in areas such as computer hardware, software, memory, storage, data and etc. [6]. Nevertheless, we refer to a virtualized resource as a resource that provides the essential capabilities of the real physical resource and is abstracted from the physical resource. Through virtualization, AON allows application providers to rapidly deploy and retire an application. Application providers are also able to flexibly and dynamically configure the virtualized resources to satisfy their applications requirements. Our proposed architecture for AON consists of three main planes: control, management and application planes. The resources that are required for creating an application are virtualized and abstracted as service components in the application plane. In AON application plane, a virtual network of computing and communication resources is allocated to each application in which that application could operate. The application providers secure access to these resources through the open and well-defined interfaces of the control plane. The AON management plane, on the other hand, is responsible for managing the resources in the application plane. Although applications in the application plane can follow any network architecture that they choose, we propose a generic application plane architecture to address future applications requirements. This generic architecture includes two enriched layers; a service layer and a transport layer that covers content-delivery as well as data-delivery functions. To validate this architecture, we designed and developed a prototype of this network called Virtualized Application Networking Infrastructure (VANI) [7, 8] that allows ap-

Chapter 1. Introduction

12

plication providers and networking researchers to create new distributed applications. In the realm of network experimentation testbeds, VANI is a major contribution as it enables network researchers and application providers to experiment with new distributed applications and network architectures. Moreover, VANI allows experimentation with new layer three protocols instead of Internet Protocol (IP). VANI also includes a reprogrammable hardware resource that allows application providers to perform customized hardware-based processing in the network. Another major contribution of this study in the field of network virtualization is the introduction of a Distributed Ethernet Traffic Shaping system (DETS) [9] that is able to guarantee the send and receive bandwidth on virtual networks created for applications in VANI. Although DETS is proposed for VANI, it is also capable of operating in any virtual machine-based computing cluster or large datacenter such as cloud computing [10] datacenters to improve network performance and minimize interference between different users traffic. One of the main advantages of DETS is that unlike other Ethernet congestion control mechanism it does not require any changes in Ethernet equipments and can operate on the hosts systems. We propose four algorithms for the DETS core module which is the rate allocator. We compare these four algorithms performance and describe their characteristic through experimentations and measurements. One of the main challenges in a service-oriented system such as VANI is to guarantee an agreed level of QoS for the composite applications. To address this issue, we investigate large scale QoS and admission control mechanisms and propose new algorithms to guarantee the QoS for applications created through service composition in serviceoriented systems. Specifically, we focus on application admission and QoS control to guarantee successful application completion in service-oriented systems where a set of service components with a limited number of instances are shared between different applications. The goal is to allocate these limited resources in a way that the system revenue is maximized and an agreed level of application QoS is met. In this problem, there are

Chapter 1. Introduction

13

several defining parameters that can affect the possible solutions including service execution times distributions, applications request arrival processes as well as the scalability concerns.

We first formulate this problem using Markov Decision Processes (MDP) [11, 12] for small-scale systems that have exponentially distributed service execution times and are subject to stationary Poisson request arrival processes. Next, we introduce the problem of QoS control in these environments [13] and propose distributed heuristics to guarantee the probability of successful completion of an admitted application instance [13, 14]. The proposed algorithm is called the Distributed Algorithm for Service Commitment (DASC) [15]. DASC is able to operate with both stationary and non-stationary request arrival processes and covers both queue-less and queue-enabled services. Moreover, it does not require the service execution time distributions to be exponential. DASC uses a probabilistic model to predict future resource usage in the system and makes admission decisions based on the current and projected state of the system.
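To make the flavor of this probabilistic, predictive admission test concrete, the following minimal sketch upper-bounds the probability that committed applications collectively exceed a service's capacity at some future time, treating each commitment as an independent Bernoulli indicator with probability p_i of holding an instance at that time. The thesis derives such a bound via Chernoff's bound in Appendix B; the function below, including its grid search over the Chernoff parameter, is only an illustrative stand-in, not the implementation.

```python
import math

def over_commitment_bound(p, capacity):
    """Chernoff-style upper bound on P(sum_i X_i > capacity), where
    X_i ~ Bernoulli(p[i]) indicates that committed application i still
    holds a service instance at the future time of interest."""
    best = 1.0
    # Sweep the Chernoff parameter theta > 0 on a coarse grid; every
    # theta gives a valid bound, so keep the tightest one found.
    for k in range(1, 200):
        theta = 0.05 * k
        # log E[e^{theta * X_i}] = log(1 - p_i + p_i * e^{theta})
        log_mgf = sum(math.log(1.0 - pi + pi * math.exp(theta)) for pi in p)
        best = min(best, math.exp(log_mgf - theta * capacity))
    return best

# Example: 120 committed applications, each 70% likely to occupy an
# instance at time t, on a service with 100 instances.
print(over_commitment_bound([0.7] * 120, 100))  # small => safe to commit
```

A DASC-like agent would admit and commit to a new request only while a bound of this kind stays below its agreed failure-probability budget.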

We present alternative steady-state based admission control approaches that can guarantee the probability of successful completion in the steady-state. Through simulations and performance comparisons we show that DASC is able to operate with both stationary and non-stationary request arrivals, while the steady-state based approaches can only operate with stationary request arrivals.

We also propose an application admission control system to both guarantee an agreed level of QoS using DASC and maximize the system revenue by admitting more valuable application classes to the system [16]. For the application admission control system, we investigate the steady-state based solutions as well as online combinatorial optimization approaches to maximize system revenue.
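As a rough sketch of the online-optimization flavor of this admission control system, the toy below mirrors the Chapter 8 pipeline of scenario generation, a per-scenario optimal decision, and a final decision rule. The greedy per-scenario solver and all names and parameters here are illustrative assumptions, not the thesis's combinatorial optimization.

```python
import random

def admission_votes(pending, capacity, arrival_rate, value_range,
                    n_scenarios=200, seed=0):
    """Toy scenario-based admission control: for each sampled
    future-arrival scenario, choose a revenue-maximizing subset of
    (pending + future) requests that fits the capacity, then admit a
    pending request if it is chosen in a majority of scenarios.
    Each request is a (value, demand) pair."""
    rng = random.Random(seed)
    votes = [0] * len(pending)
    for _ in range(n_scenarios):
        # Scenario generation: sample the number of future arrivals.
        n_future = rng.randint(0, 2 * arrival_rate)
        future = [(rng.uniform(*value_range), 1) for _ in range(n_future)]
        # Per-scenario decision: greedy by value per unit demand,
        # standing in for the scenario's combinatorial optimization.
        tagged = [(v, d, i) for i, (v, d) in enumerate(pending)]
        tagged += [(v, d, None) for (v, d) in future]
        tagged.sort(key=lambda t: t[0] / t[1], reverse=True)
        used = 0
        for v, d, i in tagged:
            if used + d <= capacity:
                used += d
                if i is not None:
                    votes[i] += 1
    # Final decision making: majority vote across scenarios.
    return [votes[i] > n_scenarios // 2 for i in range(len(pending))]

# Two pending requests competing with uncertain future demand.
print(admission_votes([(5.0, 1), (1.0, 1)], capacity=3,
                      arrival_rate=2, value_range=(0.5, 4.0)))
```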

1.4 Thesis Structure

The thesis is composed of three parts that correspond to the major contributions of this study:

• Part I: Application-Oriented Networking: In this part, we present the AON architecture. We start, in Chapter 2, by analyzing the background and requirements for AON and studying the major trends in computer and communication technologies [17]. This is followed, in Chapter 3, by a description of the main AON planes and layers, as well as their responsibilities and functionalities. We also present the main component of an AON network, the Application-Oriented Router (AOR), which utilizes hardware and software components to support content processing and delivery, and we present several sample use cases in which AORs can add value to different application scenarios.

• Part II: Virtualized Application Networking Infrastructure: In Chapter 4, we describe the VANI architecture and its main resources, and we show how a researcher or an application provider can interact with VANI. We also discuss the different functionalities of the VANI control and management plane and describe how new resources can be created and registered in VANI. Performance measurements on VANI's reprogrammable resources and internal fabric are presented as well. In Chapter 5, we present the DETS system, which is able to guarantee the send and receive bandwidth on virtual networks created for applications in VANI. In this chapter, we also present DETS's main modules and their corresponding algorithms, as well as measurements and performance evaluations.

• Part III: QoS & Admission Control in Service-Oriented Systems: In this final part, we investigate the problem of service allocation in service-oriented systems. We first formulate this problem using Markov Decision Processes in Chapter 6. Next, we introduce the problem of QoS control in these environments, describe the DASC system in Chapter 7, and present the probabilistic and predictive model used in DASC for both queue-less and queue-enabled systems. In Chapter 8, we propose an application admission control system for service-oriented systems. Performance evaluations of each of the proposed algorithms are placed in the related chapters.

Finally, in Chapter 9, we offer concluding remarks and discuss our contributions as well as our future work.

Part I

Application-Oriented Networking

Chapter 2

Background and Requirement Analysis

In this chapter, we focus on the challenges to network architecture presented by new applications. We examine how the well-known trends in the commoditization of hardware, software, and communications technology have enabled new distributed computing models. New applications based on these models have been very disruptive because of the clear advantages that they have over traditional ones. We examine the features of these new applications that have made them successful and identify the potential additional benefits that may result from their associated computing models. We discuss how these new models are leading to a new service or application provider infrastructure in which computing and communications technologies converge in new ways.

2.1 New Computing Models

Relentless technology advance, captured by the rubric of "Moore's Law", has been a steady driver for change in networking equipment, devices, services, and applications. Improvements in computation power and cost have facilitated the execution of more complex software, which in turn has stimulated more demand for improved hardware.


This virtuous cycle has taken a dramatic turn in the last few years as the basic enabling computing, communications, and software technologies have become commoditized. New distributed computing models have appeared that are fundamentally disrupting traditional models for offering services and applications by leveraging commodity resources and introducing new business models. Peer-to-peer applications are a prime example of these disruptive trends [18]. Commodity computing, communications, and software have also enabled new applications that attain entirely new levels of scale, with Google search the preeminent example.

Peer-to-peer applications and Google search represent extremes of distributed computing in terms of ownership and control of resources, but they also share the advantages inherent in distributed computing. Both examples can achieve huge levels of scale. Their designs provide for the delivery of the application through loosely coupled systems, so that faults can be addressed through simple mechanisms that exploit inexpensive redundancy. Both designs incorporate self-organizing mechanisms to manage a huge aggregation of resources, and self-organizing mechanisms are also used to ensure connectivity and basic levels of performance. It is clear that these new infrastructures, built atop shared and/or commodity resources, can achieve very large scale while having the potential to provide higher reliability, better performance, and much lower operating costs.

2.2 New Applications through Composition

We have seen that the Internet has become the platform to support new applications and that the associated infrastructure is becoming more decentralized. Moreover, the approaches to creating new applications are also changing in a direction where innovation becomes more decentralized. In this environment, new applications are created through the composition of service components, both in the form of "mashups" and in a more rigorous form using a Service-Oriented Architecture [19, 20, 5].

The success of Web protocols and standards in enabling the deployment of a massive system through the uncoordinated efforts of a global community supports efforts to create applications through the linking of interoperable, loosely coupled software components that are accessed through Internet protocols. New application providers such as Google and Yahoo now offer access to components that provide services such as search, maps, chat, and photo sharing to other application developers through Application Programming Interfaces (APIs). The term "mashup" denotes web applications where several sources are used to create a new service [21]. For example, Google Maps has provided the basis for a huge number of mapping mashups. The importance of the mashup phenomenon is that it marks the emergence of a new mode of application creation in which applications are created through a distributed and collaborative process, and where the application at any given point in time is the cumulative result of a community effort. The term Web 2.0 refers to this network-centric platform [22].

The emergence of Service-Oriented Architecture (SOA) standards represents another related major trend for the delivery of applications [19, 20]. The key architectural concept in SOA is service orientation, which enables the rapid and easy composition and management of large-scale distributed services in the face of component autonomy and heterogeneity. Service composition concepts are not new, but they have increasingly come into the spotlight in recent years with the emergence of new technologies such as XML and Web Services (WS). The SOA model follows three steps: register, find, and invoke, as shown in Figure 2.1.

[Figure 2.1: Basic Service-Oriented Architecture model (source: http://www.w3.org)]

The Web Services set of specifications [23] is an instantiation of this model. Web Services specifications provide uniform interfaces to loosely coupled software components: a messaging framework for the transfer of information, an XML-based [24] grammar for defining web services [23], and the means for locating web services. As in mashups, a key concept of interest to us in Web Services is the ability to create new applications through the composition of service components that can be accessed through standard Web Service interfaces. However, SOA goes further through the development of the Business Process Execution Language (BPEL) [25], which allows business processes to be implemented as workflows involving multiple web services.

From a network architecture perspective, SOA shifts the focus to an overlay network of computing resources where messages are exchanged according to content, and in doing so it opens the way to application-oriented networking. The emergence of SOA as a new paradigm for service provisioning highlights the importance of service orientation in future application-oriented networks. SOA-based loosely coupled systems are giving enterprises greater agility when it comes to adjusting the structure of their businesses to meet changing business requirements. This model of flexible and decentralized application creation has enabled the introduction of many new service and application providers, and future networks need to embrace this application creation paradigm to facilitate and enable agility in application creation.
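As a toy illustration of the register/find/invoke pattern described above, the sketch below shows the three roles as a minimal in-process registry. Real deployments would use WSDL descriptions, SOAP messaging and a UDDI-style registry; the class and service names here are simplified stand-ins.

```python
class ServiceRegistry:
    """Minimal in-process illustration of SOA's register/find/invoke.
    A provider publishes a service under a name; a requester later
    discovers and calls it with no compile-time coupling."""
    def __init__(self):
        self._services = {}

    def register(self, name, provider):
        self._services[name] = provider   # provider publishes itself

    def find(self, name):
        return self._services[name]       # requester discovers it

# Two independently developed components, published by name.
registry = ServiceRegistry()
registry.register("geocode", lambda addr: (43.66, -79.39))
registry.register("weather", lambda lat, lon: "cloudy")

# A composite application: find, then invoke, each component.
lat, lon = registry.find("geocode")("10 King's College Rd, Toronto")
print(registry.find("weather")(lat, lon))
```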

2.3 Emergence of Cloud Computing

Recently, the cloud computing model has emerged as a platform for the deployment of applications and services. This model relies on very large scale datacenters attached to the Internet cloud [10, 26]. These datacenters heavily utilize virtualization techniques on top of commodity hardware and software components. The resources in the cloud are consequently inexpensive and affordable for many application providers, who find it more economically viable to use these resources than to invest in in-house deployments. Cloud computing is primarily based on the virtualization of two pillar resources: computing and storage. Application providers can use these virtualized resources to store and process data. Moreover, they can dynamically acquire or release resources based on the current and/or anticipated load on their applications, using programmable and open WS interfaces as in the Amazon Elastic Compute Cloud (EC2) service [27], or open-source systems based on that interface such as the Eucalyptus project [28].

Cloud computing datacenters are connected to the network with very high speed optical connections. Together with massive processing power and storage, these connections facilitate the introduction of applications that require processing large volumes of data. Another advantage of the cloud computing model is that it has significantly shortened the deployment phase of an application's lifecycle, making the creation of new applications considerably faster than before. It also enables applications to adapt dynamically to changes in load by increasing or decreasing the amount of resources they use. Nevertheless, improvements are still needed in network virtualization: it has been shown [29, 9, 30] that even internal traffic inside cloud datacenters does not enjoy guaranteed performance, and there is considerable interference between different users' traffic.

The introduction and success of the cloud computing model is another indicator that future applications are moving toward platforms that enable rapid application creation through the composition of basic service components over shared platforms. It is also a prime example of how commodity resources and virtualization techniques can facilitate the creation of low-cost and short-lived applications.
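As an illustration of this elastic acquire/release pattern, the sketch below drives the EC2 WS interface through the boto3 client library (a present-day successor to the programmable interfaces discussed here); the AMI identifier, region, instance type, and tag are placeholders:

```python
import boto3

AMI_ID = "ami-12345678"  # placeholder machine image for the application
ec2 = boto3.client("ec2", region_name="us-east-1")

def scale_out(n):
    """Acquire n additional virtual machines when load grows."""
    ec2.run_instances(
        ImageId=AMI_ID, MinCount=n, MaxCount=n, InstanceType="t2.micro",
        TagSpecifications=[{"ResourceType": "instance",
                            "Tags": [{"Key": "pool", "Value": "app-workers"}]}],
    )

def scale_in(instance_ids):
    """Release instances when the anticipated load drops."""
    ec2.terminate_instances(InstanceIds=instance_ids)
```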

2.4 Evolution of Traditional Service Providers

The infrastructure of traditional service providers (i.e., telephone companies) is also changing towards one that is based on the Internet Protocol, under pressure from technology advances and new application providers. While suffering from some clear disadvantages relative to new application providers, traditional service providers remain superior in terms of mobility as well as reliability, security, and well-established business models.

The infrastructure of the traditional service provider is undergoing a fundamental transition to a multi-service packet-switching architecture based on the Internet Protocol (IP). This transition includes the introduction of a new control plane based on the Session Initiation Protocol (SIP) [31] that enables the replacement of existing services such as voice, and the introduction of services such as instant messaging and presence. This IP-based network architecture is articulated by the Next Generation Networks Focus Group at the International Telecommunication Union [32]. According to the ITU-T definition, a Next Generation Network (NGN) [32, 33] is a packet-based network able to provide services, including telecommunication services, able to make use of multiple broadband, QoS-enabled transport technologies, and in which service-related functions are independent from the underlying transport-related technologies. The NGN architecture decouples the network's transport and service layers. This means that whenever a provider wants to enable a new service, they can do so by defining it directly at the service layer without considering the transport layer; that is, services are independent of transport details.

IP Multimedia Subsystem (IMS) [34] is an effort by telecom-oriented standard bodies to realize the NGN concepts and extend the new control plane to any access network, and it presents a natural evolution from the traditional closed signaling system to the NGN service control system. IMS was developed for controlling access to services by customers of Third Generation wireless access networks. In the IMS approach, servers in the user's home service provider's network control access to all services. Consequently, the service provider can determine what services are delivered, at what quality level, and at what cost.

IPSphere and its Service Signaling Stratum (SSS) [35] represent another telecom industry effort to enable end-to-end services across multiple service providers. An interesting aspect of the SSS is that web services are used in its implementations. Operators can publish the services they are willing to provide, and other operators can use web services to negotiate and secure the resources to enable end-to-end services. The development of the SSS presents interesting possibilities for the development of an environment where traditional and emerging providers can work together in the delivery of services and applications. Further development of systems such as the SSS can provide the means for these players to interact dynamically in a distributed fashion and, in doing so, create an open market for applications.

While initial implementations of IMS are based on traditional client/server architectures, it is clear that the emerging distributed models associated with new disruptive applications are applicable. We can therefore anticipate that future application and service provider infrastructures will be based on similar, if not identical, infrastructures that converge computing and communications to accommodate the emerging paradigms for application creation and delivery. In addition, we will see the impact of the main principle of the NGN and IMS architectures, namely the independence of service-related functionality from transport-related functionality, in future application-oriented networks. The main advantage of this separation is the emergence of numerous new service and application providers that utilize telecommunication infrastructure for delivering their services to the users.

2.5 Introduction of Smart Phones

The introduction of smart phones and associated applications has been another major trend in the past few years, marking the transition to mobile computers as the default user devices. This success is mainly due to the introduction of powerful low-power processors, multi-touch interactive displays, and high bandwidth 3G and 3G+ wireless networks [36], as well as the upcoming fourth generation (4G) of wireless access networks based on Long Term Evolution [37]. The introduction of these devices has triggered an explosion in the number and diversity of Internet-based applications. This success has also stimulated the introduction of other novel devices, e.g. the iPad, which in turn will generate another wave of applications.

Applications on smart phones benefit from enhancements in other services, such as location-based services and Instant Messaging (IM) services. They are also able to utilize cloud-based computing for processing-intensive tasks such as speech recognition. Another interesting trend in smart phone applications is the popularity and universal acceptance of application marketplaces, such as the Apple App Store [38] and the open Android application marketplace [39], in which users and application providers interact and the required transactions are managed.

In combination with other trends and technologies, such as cloud computing and SOA-based technologies, the success of these devices exemplifies how the Internet, and the infrastructure that has emerged around it, has truly become the platform for the delivery of an unlimited number of applications. It is also an indicator that wireless, highly mobile, high bandwidth terminals will constitute the majority of users in future networks, in contrast to the traditional view of network users as being mainly fixed and wired.

2.6 Advancements in Content Delivery Networks

During the last decade, content delivery networks have seen major advancements and technological breakthroughs. A Content Distribution (or Delivery) Network (CDN) is an overlay network upon which content (e.g., video) is distributed and delivered to end users. In a CDN, content is usually copied onto multiple servers across a wide area, and users connect to one of these servers to receive a copy of the content rather than contacting a central server. Akamai [40], a pioneer in this field, is a major content delivery network. In Akamai, content producers push their content to the Akamai edge servers, and users receive the content from these servers. The major shortcoming of this model is that different content delivery networks are not usually interoperable, and users' interactivity with the provided content is limited.

Another prime example of content delivery networks is peer-to-peer file-sharing networks such as BitTorrent [41], which consume a large portion of global Internet traffic with minimal centralized management, and which have proved how innovative, distributed, and self-managing systems can operate effectively over the commodity, shared resources of ordinary Internet users.

Publish/subscribe systems are another important class of content delivery networks [42]. Publish/subscribe systems use an asynchronous messaging paradigm to link publishers and subscribers of event information. One of the main protocols for pub/sub systems is the Extensible Messaging and Presence Protocol (XMPP) [43], an XML-based protocol originally developed for Instant Messaging services that is now becoming one of the main candidates for asynchronous message delivery. A big advantage of the publish/subscribe paradigm is that publishers are loosely coupled to subscribers: publishers need not know of the existence of specific subscribers and can remain ignorant of the system topology. Publish/subscribe provides the opportunity for better scalability than traditional client-server paradigms, through parallel operation, message caching, and tree-based routing. Achieving this scalability, however, requires hardware-based message processing and rule matching.
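The loose coupling at the heart of publish/subscribe can be shown in a few lines. The sketch below is a minimal in-process, topic-based broker, not XMPP or any real middleware; publishers and subscribers share only a topic name and never reference each other:

```python
from collections import defaultdict

class Broker:
    """Toy topic-based pub/sub broker."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        # The publisher stays ignorant of who, if anyone, receives the event.
        for callback in self._subscribers[topic]:
            callback(event)

broker = Broker()
broker.subscribe("presence", lambda e: print("buddy list:", e))
broker.subscribe("presence", lambda e: print("audit log:", e))
broker.publish("presence", {"user": "alice", "status": "online"})
```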

Multimedia streaming applications are another emerging class of content delivery applications, and adaptive streaming [44] is the latest trend in this class. In adaptive streaming, the format of the streamed content, and consequently the required bandwidth, is adapted to the end-user device capabilities and the available bandwidth. The HTTP protocol is widely used in adaptive streaming. This class of applications will face another major challenge with the emergence of 3D streaming applications [2]. Moreover, with high bandwidth availability and growing demand for short-delay, high-quality video, streaming uncoded, raw high-definition multimedia content will become more attractive, especially since the content will be converted to many formats and played on heterogeneous devices.
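The core of adaptive streaming is a simple client-side decision loop: measure recent throughput, then fetch the next segment at the highest encoding that fits. The bitrate ladder, safety margin, and URL scheme below are invented for illustration; deployed players use more elaborate heuristics:

```python
BITRATES_KBPS = [350, 700, 1500, 3000, 6000]  # assumed available encodings
SAFETY = 0.8  # use only 80% of measured throughput to absorb fluctuations

def pick_bitrate(measured_kbps):
    """Choose the highest encoding that fits within the measured bandwidth."""
    usable = measured_kbps * SAFETY
    fitting = [b for b in BITRATES_KBPS if b <= usable]
    return fitting[-1] if fitting else BITRATES_KBPS[0]

def next_segment_url(base_url, index, measured_kbps):
    # Each quality level is assumed to be published as its own segment stream.
    rate = pick_bitrate(measured_kbps)
    return "%s/%dkbps/segment_%d.ts" % (base_url, rate, index)
```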

According to a recent Cisco Visual Networking Index report [45], video traffic currently accounts for more than a third of Internet traffic, and another third of Internet traffic is associated with peer-to-peer file-sharing networks. It is expected that by the end of 2014, all forms of video traffic (P2P, TV, on-demand, and Internet) together will account for more than 91 percent of global consumer traffic. It is anticipated that 57 percent of consumer Internet traffic in 2014 will be Internet video, mainly due to the expected advancements in HDTV and 3D video.

Considering these statistics and other major content delivery services such as YouTube [46], Hulu [47], and IPTV [48], we conclude that content delivery will continue to be one of the most bandwidth-, processing- and storage-intensive application classes on future networks. Therefore, any future network architecture has to offer solid solutions for efficient content delivery, including smart caching, forwarding, broadcasting, and multicasting of live as well as on-demand content (and associated metadata) to a large number of heterogeneous (and mostly mobile) devices.

2.7 Future Networks Architecture

Although the Internet has become the essential infrastructure of modern society and has enjoyed enormous success in delivering a myriad of services, it suffers from major shortcomings in several areas, including security flaws, mobility support, QoS guarantees, traffic interference and isolation, and addressing and forwarding (multicasting/broadcasting) problems. With the introduction of new systems and applications on the Internet, these problems will become more significant and will affect network performance more than before. Although many patch solutions have been proposed for these problems (e.g., firewalls, proxies, NAT, TCP-friendly protocols), it is widely accepted that the current Internet has reached its limits and suffers from ossification [1, 49], and research into new architectures is needed to address the challenges that future networks will face.

There have been several proposals for new network architectures and protocols [50, 51, 52]. Among the proposed architectures, we can mention [53], in which the authors propose a new network architecture based on the pub/sub model. Also, in [54] Palo Alto Research Center (PARC) researchers propose content-centric networking, in contrast to the location-centric view of the network; in a content-centric network, a packet address points to content rather than to a location.

Our work in this thesis falls into this body of research, and we study this problem from the applications' point of view. Our goal is to create an environment in which networks can participate more actively in the full application lifecycle, including application creation, deployment and retirement. In the next chapter, we describe our view on future network architectures and present a new architecture called the Application-Oriented Network (AON). To address the challenges imposed by future applications, AON is designed as a converged computing and communication network. To arrive at the AON architecture, we considered the major trends and architectures discussed in this chapter, such as the Next Generation Networks architecture, Service-Oriented Architecture, Content Delivery Networks, and mobile networks.

One of the major obstacles in introducing new network architectures was, and still is, experimentation with proposed network architectures in a large scale environment and possibly with massive numbers of end users. To address this problem, there have been several initiatives to build large scale testbeds for networking research. Examples of these initiatives are GENI [55, 56], PlanetLab [57, 58], ProtoGENI (Emulab) [59, 60], and ORCA [61] in the United States, FEDERICA in Europe [62], G-Lab in Germany, and i2CAT in Spain. In the second part of this thesis, we present the Virtualized Application Networking Infrastructure (VANI), which is designed and developed based on AON principles and enables experimentation with new network architectures and protocols and with distributed applications for future networks.

Chapter 3

Application-Oriented Networking

In the previous chapter, we analyzed several trends in computer and communication networking; in particular, we discussed how commodity hardware and software led to new paradigms in application creation and to the introduction of numerous applications on the Internet platform. In this chapter, we consider the role that future networks can play in further advancing the creation of new applications and services. We introduce the Application-Oriented Network (AON) as a converged computing and communications network that provides flexible and dynamic support to application providers for delivering diversified compositional services. AON support is provided through enriched transport and service strata and a service-oriented approach to the utilization of virtualized shared resources.

AON is a converged network; in AON we eliminate the separation of computing and communication technologies and combine them in a new approach. In particular, a collection of networking and computing resources can be secured through AON to create a distributed application. Unlike many other networks that deliver their services to end-users, the AON users are application providers. Application providers, in turn, deal with the end-users of their applications. Applications in AON are created through the composition of service components and virtualized resources and can span a diversified range of applications, such as telecommunication services, enterprise services and content delivery networks.

Figure 3.1: Three planes in an Application-Oriented Network (the management plane manages AON and application plane resources; the control plane allocates resources in the application plane; the application plane holds virtualized resources and service components)

In AON, multiple applications are able to coexist and have on-demand access to network resources, as well as the ability to flexibly configure and manage these resources, even though each has different requirements. Applications might also have short life cycles: they can be easily deployed, grown or shrunk in scale, and finally retired.

In the rest of this chapter, we present a reference model for an Application-Oriented Network. The reference model is designed to describe how the AON goals can be fulfilled, and to describe the framework of collaboration and interaction between the main players in an AON, namely service providers and application providers. The AON reference model has three main planes: the management plane, the control plane, and the AON user plane, also called the AON application plane (Figure 3.1). As we explained earlier, AON users are application providers. Application providers can deploy applications in the application plane using the resources instantiated in this plane. The control plane, on the other hand, is used by the application providers to secure access to the resources and service components in the application plane. The management plane is responsible for managing these resources as well as the Application-Oriented Network itself. In the next section, we describe the AON application plane characteristics and its architecture.

Figure 3.2: Application Plane Resources (facilitating service components, processing, storage, and networking resources)

3.1 AON Application Plane

The AON application plane is composed of the virtualized resources and service components required for creating an application. These resources can be communication resources (e.g., virtual links) and computing resources, such as virtual processing, reprogrammable hardware resources, and storage resources, as well as any hardware-based or software-based service components needed for creating applications. These service components can include, for example, database services, orchestration services, and content conversion services (Figure 3.2). Other examples of resources are general service components and software-as-a-service components, such as application-specific authentication, authorization and accounting, as well as security-related services such as encryption/decryption services.

In the AON application plane, all resources are virtualized and represented by one or more service components. Virtualization, as we explained earlier, is a technique that is used to instantiate a virtual resource providing the essential capabilities of the real physical resource. The virtual resources are abstracted from the physical resource and can be shared among many users without interference. For instance, a virtual computing resource is a processing resource that might share a physical processing resource with other virtual resources.

Figure 3.3: Multiple Applications in AON (virtual resources are assigned to each application by the AON control plane and managed by the AON management plane)

The virtualized resources and service components expose their functionalities through well-defined open interfaces, such as Web Services, that are platform-independent and can be invoked from heterogeneous environments. Application providers are able to program, configure and compose these resources using SOA technologies, according to their own requirements, to create a more complex application or a service that can be used by other applications. For instance, an orchestrator service can be built on top of processing and storage services, or a content conversion service can be built using a virtualized reprogrammable hardware resource.

The virtualized resources and service components are assigned to each application per application provider request. These resources are secured for each application provider through the AON control plane. The AON management plane performs management-related tasks, such as monitoring and fault management, on these virtualized resources. In other words, the AON control and management planes cooperatively create a resource pool for each application in which it can operate. Consequently, multiple applications are able to coexist in an AON while each owns one of the created resource pools (Figure 3.3).

Applications in the application plane can follow any layered network architecture that satisfies their requirements. However, to address the application requirements discussed in the previous chapter, we propose a generic architecture for the application plane. In proposing this generic architecture, we study the functionalities as well as the resources that need to be embedded in the application plane. To arrive at the application plane architecture, we considered major trends in the computing and communication fields (described in the previous chapter), especially the Next Generation Networks architecture [32], Service-Oriented Architecture [5], and Content Distribution Networks. The AON application plane architecture has the main characteristics of the NGN and SOA architectures: the separation of services from transport, and a service-oriented design for the service layer.

Traditional transport layers are mostly designed to perform pure digital data delivery between two geographically separated points. As new applications emerge, however, the need for performing content-delivery functions in a network, in addition to data delivery, becomes more significant, as content delivery becomes the default and dominant communication transfer mode in future networks. For this reason, the transport layer in the AON application plane incorporates content-delivery related functionalities to accommodate content distribution applications.

In comparison to the Next Generation Networks (NGN) reference model, the AON reference model can be seen as a new interpretation of NGN principles. In AON, the key NGN principle of separating the service layer and the transport layer is adapted in a way that changes the "abstraction level" of the delivery concept in the transport layer from raw digital data delivery to more advanced content delivery. The AON reference model also acknowledges the key principle of SOA, which is service orientation in the service layer. Therefore, this architecture for applications appears as an evolution from the NGN reference model and SOA, providing a platform for achieving the benefits of both in a converged network.

Figure 3.4 shows the AON application plane architecture, which includes the two main strata and the internal planes of an application. As can be seen, within the AON application plane each application can have its own user plane, control plane and management plane operating on its allocated resource pool.

Figure 3.4: Application Plane Architecture (each application has management, control and user planes over service and transport strata)

Service Stratum

The service stratum in the AON application plane embraces the functions facilitating the development and deployment of services and applications, as well as the service modules and applications themselves. Based on service-orientation concepts, the functionalities inside the service stratum are those that enable the rapid development and deployment of services, including search, location, identity, instant messaging, and application-specific authentication, authorization and accounting. Other example components in this layer are modules responsible for orchestrating services and creating new complex services and applications. As we stated before, the cornerstone of this layer is the set of service-orientation concepts and the provision of facilities for the creation of new applications. In-network service layer components can also include third-party services that can be used in service-oriented application creation. Among these services we can name alternative accounting services, orchestration engines, inter-networking services, instant messaging services, and localization services. The AON control and management planes provide the functionalities necessary for interactions between application providers and generic service providers. We discuss this functionality in more detail in the control plane description.

Transport Stratum

The main differentiating characteristic of the transport layer in future application-oriented networks, compared to conventional networks, is the inclusion of content-delivery tasks in addition to pure data-delivery tasks. As we described in the previous chapters, the majority of global Internet traffic is currently consumed by content-delivery and file-sharing applications, and these types of applications are becoming the dominant type, while the traditional one-to-one human communication application is becoming a special-case scenario. Therefore, we propose the inclusion of content delivery to accommodate the applications in this class and to fulfill the requirements discussed in the previous chapters, such as efficient and smart content distribution, caching, conversion, encryption, and decryption. Moreover, this change enables improved and efficient handling of mobility and security challenges in future networks.

The inclusion of content-delivery tasks among the transport layer functionalities implies the inclusion of the major content-delivery related resources in this layer, in addition to the traditional networking resources. The most important of these resources are processing and storage, which are the most basic needs of future applications at the transport level. In the rest of this subsection, we elaborate on the requirements for, and advantages of, including these resources in the transport stratum.

Processing

In-network processing resources in AON can be used in many content-delivery related functionalities, such as content conversion, compression and decompression, encryption and decryption, content validation, content-based routing, and content transformation. In-network transport nodes equipped with processing resources can also host a range of essential services for application creation provided by third parties; among these services we can name pub/sub engines, message-passing services, security engines, conversion services, and compression and decompression engines, all deployed on the processing resources. The AON control plane provides the functionalities necessary for interactions between application providers and these third-party service providers.

Although most current content processing systems are software-based, hardware-based content processors may be needed to empower in-network content processing functions and fulfill the scalability requirement of future applications. These hardware-based content processors have to be configurable, customizable, and reprogrammable to meet application-specific requirements. An example network architecture that would benefit from this capability is the pub/sub architecture, especially for hardware-based rule matching and content-based routing. 3D video distribution networks, or even software-defined mobile networks, can also benefit from such reprogrammable hardware resources.

The inclusion of powerful processors in the transport layer also allows for significant improvements in privacy-related and security-related operations. For instance, processing-intensive tasks such as rigorous security checks on packets, messages and content can be done in the network to meet applications' security requirements. Later in this chapter, we discuss security concerns in AON in more detail.

Processing can also be used in mobile networks to improve the quality and efficiency of mobility-related functions. For instance, it can be used to perform handover during video streaming to a mobile user. In this scenario, in order to provide a smooth handover experience, multiple format conversions can be performed on a video stream, and the generated streams can be forwarded to the heterogeneous devices or end-points associated with the user. Combining this with adaptive streaming approaches in content distribution will lead to far better video and multimedia streaming experiences in future networks. There are numerous other application scenarios in which in-network processors can be useful; as another example, network-coding-based content distribution networks can significantly benefit from in-network processors to perform image processing and content coding/decoding tasks using general processors, Graphics Processing Units (GPUs), network processors, and reprogrammable hardware resources.

Storage

Storage is another basic requirement of most applications, especially in content distribution. Applications choose various methods for storing content according to constraints such as the content type and the target users. Some applications exploit distributed commodity storage resources spread throughout the network, such as BitTorrent [41], which utilizes a peer-to-peer configuration, while others use a more centralized approach. Storage is also required for reliable and efficient caching and delivery of content, especially to temporarily unavailable nodes in mobile networks [63] as well as in pub/sub systems, and for efficient content multicasting and broadcasting with advanced functionalities such as playback.

Content storage together with processing is also useful in live and adaptive streaming applications, where a large number of end-users need to receive a metadata-enriched multimedia stream over heterogeneous access technologies and mobile devices. Different conditions in the access network, such as handover or signal loss, may lead to temporary disconnection of a mobile node. In a mobile network, in-network and near-to-the-user storage capabilities become very useful, especially for performing smart and efficient content caching. The need for in-network storage in delivering content to mobile nodes has also been discussed in [63], in which the authors propose a clean-slate cache-and-forward architecture for video delivery in mobile networks.
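A minimal sketch of the store-and-forward behavior such an in-network storage node could offer to temporarily disconnected mobile users follows; the class and its API are invented for illustration:

```python
from collections import defaultdict, deque

class CacheAndForward:
    """Hold content segments for users that are temporarily unreachable."""
    def __init__(self):
        self._pending = defaultdict(deque)

    def deliver(self, user, segment, is_connected):
        """Return segments to forward now; buffer them if the user is offline."""
        if is_connected:
            # Flush anything held during the disconnection, then the new segment.
            held = list(self._pending.pop(user, []))
            return held + [segment]
        self._pending[user].append(segment)  # hold until the user reattaches
        return []
```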

Networking

Networking resources are the resources that provide pure data delivery between different resources inside an AON, or between AON resources and an application's end-users. These resources have traditionally been the main part of transport layers. To satisfy future applications' data-delivery requirements, applications in an AON are able to specify their networking requirements. Many applications require guaranteed network connections, while others prefer traditional best-effort connectivity. For instance, Internet transport has been designed around a simple best-effort packet-forwarding approach; although there have been many efforts to introduce advanced QoS guarantees in IP-based transport networks, many applications continue to find this model of data delivery simplistic and insufficient. AON enables application providers to request different levels of Quality of Service by specifying rate, delay, and other parameters.

Configurability is one of the main requirements of an Application-Oriented Network. Therefore, in an AON, unlike in traditional transport layers, application providers are even allowed to configure the data-delivery network topology to adapt it to their applications' requirements. This functionality has also been demonstrated in the CANARIE network using User Controlled Light Path (UCLP) Web Services [64, 65]. An Application-Oriented Network also has to provide different levels of communication services, at varying granularity, to applications that require such services. These communication services include (but are not limited to) optical light path connections, circuit-switched, packet-switched or MPLS-based connections, multicast and broadcast network links, and connections to multi-homed end-points. The networking resources in AON are virtualized and made available to application providers through the well-defined and open interfaces of the AON control plane.

In comparison to current CDNs, the combination of content delivery (processing, storage) and data delivery (networking) in the AON transport layer enables a more advanced content delivery that can be customized for an application and for specific content. The inclusion of processing, storage, and content delivery in general in future networks is the true manifestation of the convergence between computing and communications resources in an Application-Oriented Network, as we discussed in the previous chapters.
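As a sketch of what such a networking request might look like, the structure below describes a guaranteed virtual link that an application provider could submit to the AON control plane; all field names are hypothetical, since the thesis defines the capability rather than a wire format:

```python
# Hypothetical resource request to the AON control plane.
link_request = {
    "type": "virtual-link",
    "endpoints": ["aon-node-3", "aon-node-7"],
    "mode": "guaranteed",                 # vs. "best-effort"
    "qos": {"rate_mbps": 100, "max_delay_ms": 20, "loss_rate": 1e-6},
    "topology_hint": "point-to-point",    # providers may also request trees
}
```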

3.2 AON Control and Management Planes

The AON reference model is composed of an application plane (which itself can host multiple application stacks) and a control plane responsible for the dynamic allocation of application plane resources to each application, together with a management plane (shown in Figure 3.5). The AON control plane is also responsible for fast failure management, while the management plane performs long-term management tasks on the application plane's virtualized resources, such as provisioning, re-provisioning, prediction, pricing, and fault monitoring and management.

The three-plane model is traditionally used in telecommunication networks to draw a boundary between the different functionalities required for the high-quality operation of a network. The Internet Protocol and the TCP/IP model of communications, however, lack a clear identification of the control and management planes. While this has been a major advantage for IP and has enabled IP to grow in scale and remain manageable, it is generally believed that the lack of a well-designed control and management plane in the Internet will be a major reason for replacing this architecture in future networks. Therefore, AON's success directly depends on its control and management plane architecture and on the flexibility, scalability and type of functionalities that these planes can provide.

Figure 3.5: Application-Oriented Network Reference Model

The most important feature of an Application-Oriented Network is that it has to be configurable and application-oriented. The AON control plane provides mechanisms for the on-demand allocation and configuration of network resources per application provider request. For example, in most conventional networks, transport-level network topologies are mainly determined by the topology of the physical links connecting the routers. In an application-oriented transport layer, however, the network topology should be determined based on the application requirements. It is also possible for an application to dynamically change the topology according to various factors, such as changes in load or operating costs (e.g., power).

Another main functionality of the AON control and management planes is to enable interaction between application providers and service providers. Service providers can provide services that can be used by application providers. Therefore, AON provides the main functionalities and a framework for these types of interactions and handles the related accounting, management and monitoring issues. Service providers can register their services in the management plane, and the control plane allocates them to the application providers on demand. The control and management planes together handle the authentication, authorization and accounting aspects of this interaction between application providers and service providers. In other words, service producers do not need to know the identity of service consumers, and vice versa.

In AON, services follow generic, platform-independent and well-defined interfaces so that they can be allocated, controlled and managed by the AON control and management planes. The generic interface is also needed to enable application providers to incorporate the resources and service components into their applications as simply as possible. In AON, each class of resources follows a generic interface template. For instance, programmable resources (e.g., processing resources) can follow one generic interface, while storage resources follow another; network resources are another class of resources that need their own generic interface.

Another important function of the AON management plane is monitoring and measurement. Monitoring and measurement are needed in order to control QoS; moreover, they are required for applications to adapt to changes in the environments in which they operate.

All communications between the control plane and the management plane need to be secured and authenticated so that the security risks in an AON are minimized. In terms of interactions between application providers and the AON control and management planes, all such communications are done through secure channels, and different levels of authentication and authorization are performed to enable secure access to AON resources. Different access levels for application providers are also defined to efficiently authenticate and authorize the usage of the resources in the application plane. The AON control and management planes then allocate resources to application providers based on their profile limits, and provide isolation between the resources allocated to different applications in the application plane so that they cannot interfere with each other's operation. This also prevents applications from performing security attacks, such as sniffing, on one another. Within the application plane, however, application providers are free to follow any security approach they choose; AON does not limit them to a particular method or technique. The communication between application providers and the resources in the application plane extends beyond AON security checks, and AON does not impose any specific protocol, format or encryption technique on these interactions either.
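One plausible shape for such a generic interface template is sketched below as a Python abstract base class; the method names and signatures are assumptions chosen to mirror the allocate/configure/release lifecycle described above, not an API defined by the thesis:

```python
from abc import ABC, abstractmethod

class VirtualResource(ABC):
    """Generic template that each AON resource class would specialize."""

    @abstractmethod
    def allocate(self, provider_id, credentials, requirements):
        """Secure an isolated share of the resource for one application."""

    @abstractmethod
    def configure(self, allocation_id, config):
        """Apply application-specific configuration to an allocation."""

    @abstractmethod
    def release(self, allocation_id):
        """Return the share to the pool and update accounting records."""
```

Under this scheme, a processing resource would extend the base with program or upload operations, while a networking resource would add topology and QoS operations.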

3.3 Application-Oriented Routers

In this section, we focus on Application-Oriented Routers (AORs), which are the network-level nodes in Application-Oriented Networks. We examine the types of functionalities that should be embedded in these emerging network elements, and we discuss several use cases of AORs based on the proposed network architecture, including AORs in enterprise networks, telecommunication networks and content delivery networks.

Due to the inclusion of content-delivery functions at the transport level, application-oriented routers are able to perform not only the conventional router's task of pure data delivery, but also content-delivery related tasks. AOR operations include the previously described tasks of content processing and content storage. In order to meet high throughput and low latency requirements, applications need processing technology that goes beyond conventional software processing. For instance, in the case of XML processing, it has been found that, relative to transactional database processing, the desired response times and transaction rates cannot be achieved without major improvements in XML parsing [66, 67]. To reach the required throughput, AORs will exploit hardware techniques for processing-intensive operations, especially in the form of hardware-based XML processing, validation, transformation, encryption, decryption, compression, decompression, and content-based routing [68]. Thus AORs emerge as networking components that deliver high performance and high reliability, and that include traditional layer three routing capabilities as well as the described content-delivery tasks.

Another important requirement for application-oriented networks is the ability to configure the network elements based on application requirements. In most conventional networks, transport-level network topologies are mainly determined by the topology of the physical links connecting the data routers. In an application-oriented transport layer, however, the network topology should be determined based on the application requirements: some applications are best suited to a flat peer-to-peer topology, while others might require a hierarchical architecture. In AON, applications share the same infrastructure for application development and deployment; however, there is no "one size fits all" configuration, and network resources and elements are configured based on the applications' requirements. As a result, Application-Oriented Routers are not pre-configured devices that provide some basic functionality to all applications and force application providers to follow a predefined configuration. On the contrary, AORs open the doors of the network-level entities to application providers and give them the option to configure their allocated resources as they prefer. This is achieved through the virtualization of resources and the provision of well-defined open interfaces to configure the resources on demand.

Figure 3.6 shows our overall view of AORs in an Application-Oriented Network. As can be seen, a multiplicity of applications share a converged communication and computing infrastructure, as well as the hardware and software components embedded in application-oriented routers. Each application in this view has a resource pool in a set of AORs, and the end-users and terminals can be part of one application or more. Application-oriented routers will also have to meet many of the traditional requirements of existing service provider infrastructure: the traditional capabilities to engineer reliability and performance into the overall system will be needed, as will the incorporation of novel self-management mechanisms that reduce operating expenses.

Figure 3.6: Overall view of an Application-Oriented Network with multiple AORs and applications

3.4 Application-Oriented Routers Use Cases

The success of Application-Oriented Routers depends on the value they add to current applications and on the facilities they provide for future ones. Therefore, in this section we present some use cases for application-oriented routers in the context of the proposed architecture for application-oriented networks.

3.4.1 Telecom Service Providers

IP Multimedia Subsystem (IMS) [34] is one of the major candidate architectures for next generation telecommunication networks. In this subsection, we focus on the potential contributions of AORs to IMS networks. IMS is an effort by the telecom-oriented standard bodies to realize the NGN concepts and extend the new control plane to any access network, and it presents a natural evolution from the traditional closed signaling system to the NGN service control system. IMS was developed for controlling access to services by customers of Third Generation wireless access networks. In the IMS approach, servers in the user's home service provider's network control access to all services. Consequently, the service provider can determine what services are delivered, at what quality level, and at what cost.

In the context of future application-oriented networks, IMS service providers can utilize the content-processing and storage functionalities embedded in AORs to increase the quality level of their current services and to introduce new services based on the newly available functionalities. Among these functionalities we can mention content transformation and transcoding, which enable connectivity between heterogeneous devices, as well as content multicasting and other sophisticated content processing tasks such as encryption/decryption, pattern matching, and compression/decompression. The need for this type of network support for content delivery has recently gained more attention; in [69], the authors propose the idea of network support for content delivery in ambient networks, which is aligned with our view of application-oriented networks.

Figure 3.7: Telecommunication services in an AON

For example, consider the case shown in Figure 3.7. User A has an active multimedia session with user B. At one point, user A decides to change his or her device from a SIP phone with a limited set of capabilities to a more powerful device, such as a laptop or another SIP phone with a different set of capabilities. To do so, user A initiates a transfer procedure and transfers the session from the first device to the second. If, for any reason, user B's device does not handle the coding required for the new device, the transfer procedure will fail. The transfer procedure might also be unsuccessful due to incompatibilities between protocol stacks.

The storage and processing capabilities of AORs will be very useful in handling mobility-related and security-related tasks in IMS networks. For instance, in the above scenario, in the case of a hand-off, content can be temporarily stored in an AOR while the user device is temporarily disconnected from the access network. Here, smart caching combined with adaptive stream processing in AORs enables a fast and efficient connection-resume phase in which the user experiences minimal disruption.

Issues such as content transformation from one format to another can be solved much more easily using AORs. For example, an intermediate AOR is able to perform the necessary conversion between different media formats, or to transform one SIP message into another. It can also compress, decompress, encrypt or decrypt the content. In addition, content validation, another common and processing-intensive task, can be performed in AORs.

Another use case for AORs is content-based policy enforcement. For example, if user A in Figure 3.7 sends a high priority message in an emergency situation, an AOR can, based on a policy, identify the priority of the message and treat it differently from ordinary messages. In another use case, if a SIP device needs to access an XML-based service, an AOR can perform the required transformation between the protocols.
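A toy version of such a policy check is sketched below; the header name and the policy table are invented for the example, and a real AOR would apply rules like this in hardware at line rate:

```python
POLICY = {"emergency": "expedite", "bulk": "background"}

def classify(headers):
    """Map a message to a forwarding treatment based on its content."""
    return POLICY.get(headers.get("X-Priority", "normal"), "normal")

def enqueue(headers, payload, queues):
    queues[classify(headers)].append(payload)  # per-treatment output queues
```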

Figure 3.8: Enterprise Service Bus and AON

3.4.2 Enterprise Networks

Enterprise Service Buses (ESBs) [70] are used to provide the functionalities necessary for the deployment of enterprise applications, mainly in the context of Service-Oriented Architecture [5]. In the context of Application-Oriented Networks, ESBs can offload their processing-intensive tasks to Application-Oriented Routers. These tasks include content validation, compression/decompression, pub/sub message delivery, rule-based matching and content forwarding, content encryption/decryption, and content transformation. Current proprietary ESBs use XML-processing appliances to perform content processing tasks, especially encryption/decryption and content transformation [71]. These XML-processing appliances, however, are not standardized, are not available to small vendors, and are not affordable for many enterprises. In an Application-Oriented Network, an enterprise can exploit the content processing facilities embedded in the AOR to increase its applications' quality and decrease expenses, eventually leading to better quality of service and to very large scales. In addition, enterprises can use the storage capacity provided in AORs to store their content at lower cost, and for reliable message delivery to temporarily unavailable nodes, especially mobile nodes, in the pub/sub model of message delivery.

For example, consider the case shown in Figure 3.8, where an enterprise is using an ESB to support its SOA-based operations. One of the main tasks usually done by an ESB is content validation, which is very processing-intensive and thus places a heavy burden on the ESB. An AOR can perform this task for the ESB, and it can perform security-related tasks such as content encryption and decryption as well. Another use case for AORs in an enterprise environment is content-based routing and multicasting; this functionality is valuable when there is a request for an unspecified service, and the AOR can forward the request to a server based on a policy, in a way similar to a distributed ESB. Given these use case scenarios, we can conclude that by using AORs, ESBs and enterprise networks can scale and adapt to business demand with high agility.

3.4.3 Overlay Networks and Content Distribution Networks

In the context of media and content distribution networks, one of the major issues is effective content caching and multicasting, especially for live content streaming [72]. In Application-Oriented Networks, application providers can utilize AORs' capabilities in both content storage and content processing to deliver high quality services to their users. Among these applications, we can point to Video-on-Demand and TV applications. AON content delivery can cover the functionalities provided by traditional CDNs such as Akamai, and it can also cover advanced content delivery functions by enabling application providers to create their own application-specific CDN architectures using features such as locality and identity, providing customized content delivery to their users. Users can interact with the content and with other users, and can produce metadata that can be used by other application users.

As another example, we can mention peer-to-peer networks and other flat or hierarchical overlay networks, which are currently unable to flourish due to their need for robust in-network nodes with processing and storage capabilities. An example of these hierarchical architectures has been studied in our research group [73]: in this structure, unpopular content is stored on a smaller number of powerful nodes, while popular content is copied onto many computers at the edge of the network and distributed in a peer-to-peer topology. AORs' content delivery features allow these types of new network architectures to flourish. In peer-to-peer networks, AORs can be used to store critical data that needs to reside on a robust node. In addition, as shown in Figure 3.9, AORs can be used to store popular content based on usage patterns and users' locations. As a result, these networks can deliver better quality services while using network bandwidth efficiently; in this application scenario, AORs' content storage functionality is used to store content at places near the interested users.

Figure 3.9: Peer-to-Peer network in AON

3.5 Related Work

There have been many attempts to develop new architectures for future networks [50, 51, 52, 53, 54]; however, our proposed AON differs from many other proposals, mainly because AON is designed to serve application providers instead of end-users and allows a multiplicity of applications with different internal architectures to coexist over the same shared virtualized infrastructure. In other words, multiple proposed network architectures, each with its own rationale for its design and existence, can be deployed inside an Application-Oriented Network. Nevertheless, we need to distinguish our work from two related efforts.

The first is Cisco's Application-Oriented Network [74]. Cisco has introduced its Application-Oriented Network line of products, which performs on-the-edge XML processing functionalities, especially for Cisco's enterprise customers. The Application-Oriented Network that we define in this chapter differs from Cisco's AON in many aspects: in our AON we provide content delivery in addition to message processing, and we provide a framework for allocating virtualized in-network resources to different application providers. In other words, we create a network of computing and communication resources and allocate them to applications on demand.

Another related work is the Slice-based Facility Architecture (SFA) proposed by GENI [75] for federating network research testbeds in the United States. In SFA, researchers are able to request the instantiation of a slice of testbed resources across a federated network of testbeds in order to perform an experiment on a new network architecture. The SFA and AON architectures differ in many ways. SFA's goal is to enable experimentation over multiple testbeds, and in this regard it is not designed to address future network challenges. SFA does not have a three-plane architecture, nor a clear statement about content delivery support in future networks. Moreover, SFA does not acknowledge that the service-oriented application creation paradigm and its related set of technologies are crucial to the success of future networks and central to addressing their requirements. Last but not least, SFA is not concerned with the management of resources to deliver a required quality of service.

Part II

Virtualized Application Networking Infrastructure

Chapter 4

Virtualized Application Networking Infrastructure

In the past few years, the idea of clean slate network design has circulated in the networking community, and there have been several proposals for new network architectures and protocols [50, 51, 52]. One of the major obstacles in introducing new network architectures was, and still is, experimentation with the proposed architectures in a large scale environment and possibly with massive numbers of end users. To address this problem, there have been several initiatives to build large scale testbeds for networking research.

GENI [55] is one of these initiatives; it aims to create a testbed by federating different testbeds, such as PlanetLab [57, 58] and Emulab (ProtoGENI) [60], on top of a research-dedicated network in the United States. GENI is still in the design and development phase, but it currently follows a slice-based architecture [56, 75]. In GENI, different testbeds will be able to connect to each other through GENI wrappers. The exact communication protocol between the GENI wrapper and each testbed is left to that testbed's control plane, and currently there are a few major control planes in GENI that are trying to federate using the wrappers.


Among the above testbeds, PlanetLab [57] is probably the most developed. PlanetLab provides edge hosts on the Internet and implements a slice-based architecture using Linux vServer [76] technology. PlanetLab, however, does not have a clear solution for experimentation with new layer three protocols, and it is not clear how it would facilitate building high scale new routers that require hardware-based acceleration.

In Canada, there is a research-dedicated optical network called CANARIE [77] that provides light paths connecting universities and research centers across Canada. CANARIE has sponsored the design and development of User Controlled Light Path (UCLP) [65] software, which enables researchers to configure CANARIE network elements on demand through Web Services (WS) interfaces. Another major initiative is FEDERICA [62] in Europe, which is under development through the federation of several research network platforms in Europe, such as i2CAT in Spain and HEAnet in Ireland. FEDERICA uses the WS-based UCLP software for creating on-demand virtual networks on top of the participating test platforms. Another project for experimentation with lower layer protocols and networking algorithms is NetFPGA [78]. NetFPGA is a PCI card with a Field Programmable Gate Array (FPGA) chip and four Gigabit Ethernet interfaces that can be used to develop different networking components, such as a layer three router or a hardware accelerator.

In this chapter, we present a new testbed for networking experiments and networked systems. This testbed differs from the above mentioned projects in several aspects. It benefits from a novel architecture for control and management functions capable of managing various hardware-based and software-based resources. It also allows experimentation with new network architectures that require in-network content processing and storage capabilities. Moreover, it includes a new high performance, high throughput hardware resource that makes experimentation with hardware-based or hardware-accelerated networking algorithms and protocols as easy as experimentation with software-based protocols.

Our vision in designing this testbed was to develop a converged application-oriented computing and communications infrastructure to support an open applications marketplace. We investigated architectural aspects of this application-oriented network and presented a proposal in the first part of this thesis. We also investigated autonomic management issues and proposed an approach using virtual networks in [4, 79].

The essential aspects to enabling the above application-oriented environment are:

1. Service-oriented application creation;

2. Infrastructure-as-a-Service methods for configuring and scaling resources to support applications;

3. Virtualization of physical resources.

Based on this view of an Application-Oriented Network, we began the development of a testbed that would allow university researchers and application providers to develop new networked systems and networking architectures. This testbed, Virtualized Application Networking Infrastructure (VANI), allows the creation of virtual networks of computing and communications resources. A VANI node consists of resources such as processing, storage, networking and programmable hardware. A service-oriented control and management plane allows VANI nodes to be interconnected into virtual networks to support applications operating in the applications plane.

In the rest of this chapter, we describe the main requirements in the VANI design, its architecture, and its main components, and we explain how our design satisfies these requirements. Moreover, we present performance evaluations of the resources developed for this infrastructure, including a virtualized reprogrammable hardware resource that enables hardware-based experimentation with networking algorithms and protocols.

4.1 VANI Design Requirements

Virtualized Application Networking Infrastructure (VANI) is a testbed that allows university researchers and application providers to utilize its internal resources to rapidly create and deploy networked systems, and even to experiment with new layer three protocols. Although the underlying concepts of the VANI testbed come from our view of an Application-Oriented Network [17], networked systems running in the VANI environment can follow any architecture in any networking layer. The only limitation researchers face in VANI is that their experiments should run on top of Ethernet as their layer two. Next, we describe the main requirements in designing VANI.

The first requirement for the VANI testbed is that it should allow experimentation with future network architectures that might not fit the traditional layer three definitions. Currently, networks are primarily responsible for delivering raw data, but future network architectures may shift network tasks up to new functionalities required by emerging applications. Among these functionalities could be the task of content delivery in addition to data delivery (as in the network architecture discussed in [17]), which would imply having content processing and storage functions in the infrastructure.

The second main requirement was to allow researchers to experiment with new layer three protocols (in the traditional definition of L3) instead of the current Internet Protocol. To do so, we designed the testbed assuming that everything above layer two could be redesigned and experimented with, and we chose the Ethernet protocol as the basis of our layer two design.

Another main requirement is to be able to set up experiments or create new applications rapidly using already developed, ready-to-use components that can be accessed through open interfaces. These components could be virtualized resources, such as processing, low-latency hardware processing, and accelerator nodes, or software components, such as the event processors used in many experiments for data gathering and analysis. This requirement can be satisfied through the use of SOA technologies and standards that allow flexible and dynamic composition of reusable service components.

The fourth main requirement was to provide an isolated and secure environment for researchers to carry out their experiments and develop their networked applications. This requirement has to be satisfied at different levels, such as traffic separation, bandwidth allocation, storage access, secure access to the physical resources, and isolation between different physical resources.

The fifth main requirement was monitoring and debugging mechanisms. In our design, we envisioned powerful complex event processing components that can be customized to gather and analyze test and debugging data for each experiment separately, as well as for the testbed itself.

[Figure 4.1: VANI design requirements. The five requirements shown are support for future network architectures, rapid experiment setup/application creation, testing new L3 protocols, isolation/security, and monitoring/testing.]

4.1.1 VANI Architecture

Based on these main requirements, we designed a two-plane architecture for our platform: a control and management plane (VANI-CMP) and an applications plane (VANI-AP). VANI-CMP is responsible for virtualizing physical resources and allocating them to researchers and application providers. Researchers, in turn, deploy their applications and experiments in the VANI applications plane (VANI-AP). Applications operating in the applications plane can have their own architecture inside an applications-plane slice created by VANI-CMP. For example, an experiment or application could be a new layer three protocol that covers OSI layer three and four functions and replaces the TCP/IP layer, or it could be a new content delivery network. Figure 4.2 shows this architecture, including its two planes.

[Figure 4.2: VANI architecture. The control and management plane is used for allocating a resource pool to a researcher or application provider; all resources needed for an experiment reside in the applications plane, where applications can have their own architecture.]

All virtualized resources and service components that can be used by researchers for creating an application reside in the applications plane. Researchers ask for these resources through the testbed control and management plane, and can then connect directly to the virtualized resources in the applications plane through any resource-specific protocol, such as HTTP, UDP/IP, or ssh. For example, a user can ask for the upload or download of a file to the storage service through the control plane; if permitted by the control plane, the user then contacts the storage file service directly over an HTTP/TLS connection to download or upload the files.


[Figure 4.3: Researcher interaction with VANI planes. A researcher uses the Web Service interface of the testbed control and management plane, and resource-specific protocols (WS, UDP/IP, SSH, HTTP/SSL) to reach virtualized resources in the application plane, which are built on physical resources through virtualization layers and virtualization agents.]

The VANI control and management plane (VANI-CMP) is responsible for allocating testbed resources to researchers. Researchers ask VANI-CMP for a resource using VANI-CMP's Web Service interface. The WS interface was chosen due to its universal acceptance for SOA and the abundance of available tools for orchestrating and creating new applications using independent Web Services. After receiving a request for resources from a researcher, VANI-CMP authenticates the researcher, authorizes the request, and then sends the request to the resource virtualization layer. The resource virtualization layer is the layer that abstracts a physical resource and offers it as a service to the control and management layer. If the allocation is successful, VANI-CMP records the allocation and replies to the researcher with a successful return result.


VANI-CMP also programs and releases the resource whenever an authorized researcher wants to do so. Figure 4.3 depicts the logical view of the VANI testbed and how a researcher interacts with VANI planes.

4.1.2 Current Physical Resources in VANI (VANIv1 Resources)

Currently, several physical resources have been virtualized and made available to VANI users. The design and development details of these resources have been presented in [8]; here, we briefly overview these resources and the types of functionality they offer to researchers. In VANI, all physical resources are virtualized. Through virtualization, we separate applications from their underlying physical resources. To do so, we developed a virtualization layer and virtualization agents for each physical resource, as shown in figure 4.4. The task of the virtualization layer is to coordinate the system-wide virtualization of a resource and to expose the resource as a service component with a Web Service interface to the rest of the system; the agents' task is to launch or destroy the virtual resources on top of each physical resource.

[Figure 4.4: Virtualizing physical resources in VANI. Virtualization layers with Web Service interfaces, together with virtualization sub-agents, sit on top of processor blades, FPGAs/BEE2 boards, and storage/file servers, interconnected by a 10GE fabric.]

The first physical resource that we virtualized is the reprogrammable hardware resource. To develop this resource, we used BEE2 boards [80]. Each BEE2 board has four high-end Xilinx Field Programmable Gate Arrays (FPGAs), each connected to four 10GE interfaces. We virtualized all four FPGAs on a BEE2 board so that a researcher can ask for one or more FPGAs and program them as s/he likes. Researchers can ask for an FPGA through the control plane and then program it, configure it, or release it. They also have access to libraries for controlling the 10GE interfaces and some other commonly used hardware blocks, such as DDR2 memory modules. After programming an FPGA, a researcher can directly connect to the FPGA through the 10GE interfaces according to whatever protocol is designed for that FPGA. For example, a researcher can use one FPGA or all four FPGAs to develop a layer three router with 4x10GE or 16x10GE ports, or a content-based router that routes packets based on their payloads rather than their headers. We present the performance evaluation results for this hardware resource in the performance evaluation section of this chapter.

Another physical resource in the VANI testbed is the processing resource. The processing service is developed based on the Linux vServer [76] technology. Linux vServer is an OS-level virtualization software that creates a virtual processing node on top of a Linux kernel. Researchers are able to get a processing resource through VANI-CMP, and release it whenever they wish. Once a virtual processing node is allocated, the researcher can directly ssh to the node. Researchers are also able to program the virtual processing node with a specific image, create an image of their own, save it on the storage service, share it with others, or program other virtual nodes with that image.

We have also virtualized the internal fabric of the testbed for creating virtual networks. The internal fabric consists of a set of high-capacity Ethernet switches that are able to isolate traffic between different applications and experiments by creating separate virtual LANs. Moreover, it allows different experiments to intercommunicate by creating shared virtual LANs that all of them have access to. This resource, together with the processing resource, enables VANI to guarantee bandwidth for an experiment. Later, in the bandwidth guarantee section, we discuss this feature in more detail.

The gateway and bridge resource is another developed resource, which enables communication between different VANI nodes. If one of the resources in VANI needs to be accessible from the Internet or from a resource in another VANI node, it can ask for a public address through the gateway service and hold that address for the duration that external access is needed. The researcher can release the public address when it is no longer needed.

The bridge service is used for experiments involving new layer three protocols on top of an Ethernet network. Using the bridge service, a researcher can send and receive layer two Ethernet frames to any other VANI node, and hence is able to develop and test new layer three protocols over a wide area network. This functionality is only available if the VANI nodes are connected using a wide area Ethernet network. We discuss this case later in more detail.

Another physical resource developed for VANI is the storage resource. The storage resource is implemented on a set of distributed file servers that emulate one big storage server. Researchers are able to connect to the storage service through VANI-CMP and then directly connect to a file server for uploading and downloading files. All direct communications with the file servers for uploading and downloading files are done over a secure HTTP/TLS connection. Researchers can use this service to store images for programming other resources, such as the processing resource and the reprogrammable hardware resource, and they can also share files with other researchers through this service.
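As an illustration of this direct path, the following minimal sketch uploads a file to the storage service over HTTPS using Python's requests library. The URL and path layout are assumptions for illustration only, since the actual endpoints are issued by VANI-CMP at run time after the transfer is authorized.

    import requests

    # Hypothetical file-server URL; in VANI the control plane authorizes the
    # transfer first and tells the researcher where to connect.
    url = "https://fileserver.vani.example.org/files/alice/router-image.bit"

    with open("router-image.bit", "rb") as f:
        resp = requests.post(url, data=f, verify=True)  # TLS-protected upload
    resp.raise_for_status()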


[Figure 4.5: A sample interaction between a researcher and VANI to secure a resource. The researcher calls getResource, programResource, and releaseResource on the control plane; each call is authenticated, authorized, and recorded, and the researcher connects directly to the virtualized resource in between.]

4.1.3 Example: Requesting a Resource in VANI

Figure 4.5 shows a sample message exchange among a researcher, the VANI control and management plane, and physical resources inside a VANI node. A researcher starts by requesting a resource through the getResource operation of the VANI-CMP WS interface. In that request, the researcher includes the type of resource, the duration, and the number of required resources. VANI-CMP authenticates and authorizes the request and forwards it to the resource. All resources in the testbed expose their operations to VANI-CMP through a generic WSDL interface. This makes it possible to easily extend the types of resources and services in the testbed without changing the control and management software. The resource responds to the control plane request with a success result and a Universally Unique IDentifier (UUID) for the resource. The control plane stores this returned UUID and passes it to the researcher. The researcher can then program the resource identified by the returned UUID and release it at a later time; a client-side sketch of this exchange is shown below. In the next section, we delve into the design of the control and management plane and describe its main functionalities in detail.
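As a concrete, hypothetical illustration of this exchange, the following Python sketch uses the zeep SOAP library to call the three operations. The WSDL URL, parameter names, and credential format are assumptions, since the real ones are defined by VANI-CMP's WSDL.

    from zeep import Client

    # Hypothetical VANI-CMP endpoint.
    cmp = Client("https://vani-cmp.example.org/cmp?wsdl")
    creds = {"user": "alice", "password": "secret"}

    # Ask for one FPGA for four hours on plan (experiment) "exp-42".
    resp = cmp.service.getResource(credentials=creds, plan="exp-42",
                                   resourceType="fpga", count=1,
                                   durationHours=4)
    uuid = resp.uuid  # UUID generated by the resource, recorded by VANI-CMP

    # Program the resource with a bitstream stored on the storage service.
    cmp.service.programResource(credentials=creds, uuid=uuid,
                                image="storage://alice/router-image.bit")

    # ... connect to the resource directly and run the experiment ...

    cmp.service.releaseResource(credentials=creds, uuid=uuid)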

4.2 VANI Control and Management Plane (VANI-CMP)

In this section, we describe in detail the main functions of VANI-CMP. We also discuss the main technologies used in its design and development; these are mainly SOA-based technologies, such as an Enterprise Service Bus (ESB) [81] and a Business Process Execution Language (BPEL) [25] orchestrator engine. VANI-CMP is responsible for performing AAA operations and allocating resources to researchers and application providers. In addition, it performs user management functions, and stores and manages the testbed configuration data. It also has a registry of all services and resources that can be used by researchers for creating a new application or experiment setup. VANI-CMP is designed and developed using BPEL and deployed on an Enterprise Service Bus. Similarly to the resources and services inside the testbed, all internal components and functions of VANI-CMP have been developed as independent service components and are accessed through Web Services. The use of the ESB and Web Services enables VANI-CMP to be easily extended in functionality and accessed through other types of interfaces in the future. This design choice also enables independent development, testing, and redeployment of the internal functions of VANI-CMP, such as AAA operations and configuration management. Moreover, the use of the BPEL language enables a high-level description of the VANI control and management operations, which allows rapid and easy modification of the control and management logic.


In the next subsections, we examine each of the functionalities of the control and management plane and we describe the design steps and interfaces of each of the modules.

4.2.1 User Management

Three concepts are used to manage users in VANI: application plans, service levels, and plan administrator levels. Application plans represent different experiments and are used to organize resources and resource usage in each experiment. When booking a resource, the researcher must specify which plan (experiment) the resource is being booked on. Every researcher belongs to a service level, which governs what control operations s/he is allowed to call and how much of each resource s/he is allowed to book. Custom service levels may be designed for specific users in order to maintain flexibility. Lastly, plan administrator levels are used to govern access to certain resources. Resource users are granted specific levels of access defining their ability to release, program, save, etc.

4.2.2 Authentication, Authorization, and Accounting

The control software is responsible for handling the authentication of users. All operations in the control plane require users to provide credentials. Currently, credentials are in the form of a user name and password combination; however, the implementation allows this to be easily changed. On every call to the control software, the user is authenticated, and a check is made to ensure that the user has the rights to execute the requested operation. In addition to authentication, the control software is responsible for authorizing access to resources. Every access to a resource involves two checks: ensuring that the resource belongs to the user, and that the user has the rights to manipulate the resource as requested. In order to prevent outsiders from directly accessing resources and bypassing the control plane, all requests to resources require credentials known only to the control plane. These credentials are generated when resources are initialized.
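The two-step check can be sketched as follows; the data structures and field names are hypothetical stand-ins for the BPEL and data-store components described in the following sections.

    # Hypothetical in-memory stand-ins for the control plane's records.
    permissions = {("standard", "getResource"), ("standard", "releaseResource")}
    allocations = {}  # uuid -> {"owner": ..., "plan": ...}

    def authorize(user, service_level, operation, uuid=None):
        # Check 1: the user's service level permits this control operation.
        if (service_level, operation) not in permissions:
            raise PermissionError("operation not allowed for this service level")
        # Check 2: for resource manipulation, the resource must belong to the user.
        if uuid is not None:
            alloc = allocations.get(uuid)
            if alloc is None or alloc["owner"] != user:
                raise PermissionError("resource not owned by this user")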


The control software keeps a record every time a resource is booked or released. This maintains an account of which resource was used by which user (on which plan) and for how long, as well as of all resources currently in use. Resources are identified by a UUID generated by the resource and passed back through the control plane.

4.2.3 Resource Allocation

Resources are booked through the control plane whether the user is a researcher or an application provider building a resource on top of another. Users provide their credentials and specify which resource they wish to book (on which VANI node) and the plan to which the resource will belong. The control plane ensures the user is allowed to book the resource and determines the location (WSDL address) of the resource in the network. A getResource request is then made to the resource. The resource does not know who is requesting the resource as this information is hidden by the control software. If successful, the resource will return a UUID identifying the resource as well as any other relevant data which is then passed back to the user. The UUID is used by the control plane for accounting purposes.

4.2.4 Generic Resources and Registration

New resources can be made available dynamically in the control plane through a registration operation. The new resource must provide a unique name, a service name, a port name, one or more WSDL addresses, and optionally a JNLP address for the resource's GUI. The service and port name are used to create an endpoint reference, which is assigned to the partner link when the resource is to be accessed. The resource may have multiple WSDL addresses if there are instances of the resource on different VANI nodes; the control software selects the appropriate address depending on which node the user is attempting to access. Lastly, a JNLP address may be included, which allows resource creators to design and deploy their own GUI using Java Web Start technology [82].

[Figure 4.6: A sample schema for generic XML content in a getRequest response message.]

In order for resource creators to dynamically add new resources to the control plane, it is necessary to use a generic WSDL interface for all resources. The main objective of the generic interface is to provide a template that makes creating resources easy while preserving flexibility. This is accomplished by providing a number of operations and messages that are common to many resources, such as get, release, and program. To maintain flexibility, each operation contains an optional XML string which can be used to customize the data that is passed in and out (figure 4.6). Furthermore, a generic operation is included in the WSDL which can be used to support operations not already included in the template.
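As an illustration, a registration request for a new resource type could carry fields like the following; the names and URLs in this sketch are hypothetical, and the exact message format is fixed by the registration operation's WSDL.

    # Hypothetical registration record for a new resource type.
    new_resource = {
        "name": "netfpga",                    # unique resource name
        "serviceName": "NetFPGAService",      # used to build the endpoint reference
        "portName": "NetFPGAPort",
        "wsdlAddresses": {                    # one address per VANI node hosting it
            "vani-node-1": "https://node1.example.org/netfpga?wsdl",
            "vani-node-2": "https://node2.example.org/netfpga?wsdl",
        },
        "jnlpAddress": "https://node1.example.org/netfpga/gui.jnlp",  # optional GUI
    }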

4.3 SOA-Based Implementation of VANI-CMP

The control and management software is implemented as a collection of web services and BPEL models. The design is modular and flexible, allowing components to be replaced or changed as required. The control plane is a BPEL model wrapped with a WSDL exposing a number of operations for application providers and researchers. Currently, there are five key components, each implemented as a BPEL model: authentication, data store, resource manager, storage manager, and the dynamic partner link generator. In this section, a brief description of each component is provided before focusing on how the components fit together. For more information on each component, please refer to the relevant section.

The data store stores all the data required by the control software, including user authentication data, resource allocation and accounting records, and network data. The authentication component is responsible for checking user credentials and ensuring users have the rights to execute operations. The storage and resource managers are used to access the resources on the network; the managers determine the location (stored as a WSDL address) of a resource before forwarding requests to the appropriate destination. The dynamic partner link generator is used throughout the control plane to dynamically choose an endpoint reference, which allows calls to be made to different web services determined at run time (provided they have the same interface).

The data store consists of a MySQL database, a BPEL model, and three web services: a query generator, a database service, and a result processor. The query generator has a number of operations used to generate different SQL queries. The database service has one operation, which takes in a SQL query and returns the result of the query in XML; this web service has a socket connection to a database subagent that executes the query on the MySQL database. The third web service processes the XML result.

The authentication component is implemented as a BPEL model and makes use of some of the operations provided by the data store. It provides operations to check user login credentials and to ensure users have permission to execute the requested operation. In addition, this component is responsible for ensuring users have permission to book or manipulate (release, program, etc.) booked resources.

The dynamic partner link generator is used to dynamically assign an endpoint reference to a partner link. First, a call is made to the data store to determine the WSDL address, service name, and port name; these are then passed to a web service that wraps the service name as a QName and the port as an NCName. An endpoint reference is then created using the WSDL address, service name, and port name, and returned.

The resource and storage managers provide an interface for accessing resources available on the network and storage. A call is made to the dynamic partner link generator to dynamically assign an endpoint reference to the partner link, depending on which resource is being accessed.
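The data-store pattern, a single operation that takes a SQL query and returns the result set as XML, can be sketched as follows. Here sqlite3 stands in for the MySQL subagent, and the XML element names are illustrative only.

    import sqlite3
    from xml.etree.ElementTree import Element, SubElement, tostring

    def execute_query(db_path, sql):
        """Run a SQL query and return the result set as an XML string."""
        conn = sqlite3.connect(db_path)
        cur = conn.execute(sql)
        root = Element("result")
        cols = [d[0] for d in cur.description] if cur.description else []
        for row in cur:
            row_el = SubElement(root, "row")
            for col, val in zip(cols, row):
                SubElement(row_el, col).text = str(val)
        conn.close()
        return tostring(root, encoding="unicode")

    # e.g. execute_query("vani.db", "SELECT uuid, owner FROM allocations")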

4.4 Security in VANI

One of the basic requirements in the VANI design was to make sure that experiments run in an environment that is secure and isolated from other applications and experiments. To create this secure environment, we have to consider security issues in various parts of the system architecture. The first part is securing the communications between the researchers and VANI-CMP. In VANI, all communications between these two entities are encrypted using secure SSL connections and the WS-Security specification. To do so, each researcher has to share his/her public key with VANI (and vice versa). On top of that, VANI-CMP authenticates researchers and application providers using the credentials provided in all transactions, and then authorizes the researcher's access level to the resource. The second part is the communications between the resources and VANI-CMP. These communications are also encrypted. Moreover, credentials known only to the resource and VANI-CMP are included in all communications from VANI-CMP to the resources. All internal traffic within one experiment is separated from other experiments using tagged Ethernet VLANs. By proper configuration of the testbed's internal fabric resource, we are able to isolate these tagged VLANs from each other. This case is discussed in more detail in the bandwidth guarantee section.

Communications inside the applications plane, internal to one experiment or coming to and from that experiment, may or may not be encrypted depending on the experiment, and are therefore outside the scope of the VANI design. This allows researchers to freely design and develop new encryption and decryption algorithms in different layers inside their applications plane slice.

4.5 Guaranteeing Bandwidth in VANI

In order to make sure that one experiment cannot undermine another experiment's ability to send and receive traffic, we need a bandwidth guarantee mechanism in place. Likewise, for communications between different VANI nodes, there should be a rate guarantee in place so that a distributed experiment has guaranteed access to the available bandwidth. Since all communication in VANI is carried over VLAN-tagged Ethernet frames, an Ethernet rate limiting mechanism has been developed in the processing nodes. By doing so, we limit the rate at which each virtual processing node sends and receives traffic to/from other virtual processing nodes inside a VANI node. To guarantee the send and receive rates, we designed and developed a novel Ethernet traffic shaping system, called the Distributed Ethernet Traffic Shaping (DETS) system [9], which we describe in the next chapter. The gateway and bridge service also controls the rate at which an experiment sends (receives) traffic to (from) the VANI wide area network. The wide area network used to connect the VANI nodes would be a research-dedicated network, like CANARIE [77] or ORION [83], that can guarantee the aggregated traffic to/from the VANI nodes. If the wide area network is able to provide dynamic and on-demand bandwidth allocation, VANI can use this functionality whenever an experiment asks to send/receive traffic to/from the wide area network. VANI nodes can also be connected to the public Internet; however, bandwidth cannot be guaranteed for the experiments in this case.

To request a bandwidth guarantee in VANI, a researcher can specify the bandwidth requirements of a virtual processing node in the resource get request. Likewise, a bandwidth requirement can be specified when access to the VANI wide area network is requested. The virtualization layer in the VANI control and management plane makes sure that the specified requirements are met when allocating virtual resources to the experiment. If an experiment needs more VLANs, it can simply ask for a new VLAN to be added to the experiment. Also, if separate experiments or applications need to intercommunicate inside VLANs, one of them can ask for the creation of a shared VLAN through the control plane and add the other experiments to it. Another way for experiments to communicate is through the gateway, using the public addresses allocated by the bridge and gateway services.
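For illustration, such requirements could be carried in the optional XML string of the generic getResource operation described earlier; the element names in this sketch are hypothetical.

    # Hypothetical XML options attached to a processing-resource get request.
    bandwidth_options = """<options>
      <sendRateMbps>200</sendRateMbps>
      <receiveRateMbps>200</receiveRateMbps>
      <vlanCount>1</vlanCount>
    </options>"""
    # This string would be passed as the optional XML argument of getResource
    # and interpreted by the processing resource's virtualization layer.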

4.5.1 Interconnecting VANI Nodes in the IP Layer

[Figure 4.7: Connecting VANI nodes in the IP layer. Virtual resources (VR) with private 10.X.X.X addresses on per-experiment VLANs reach the wide area IP network through gateways (GW) holding public IP addresses.]


Figure 4.7 shows how we can set up an experiment or create a distributed application across a wide area IP network. In this setting, all resources inside an experiment in a VANI node get a local IP address in the 10.X.X.X range. All resources can send traffic to the wide area network using the NAT functionality implemented in the gateway service (shown as GW in figure 4.7). It is possible to put multiple gateways in place and direct outgoing traffic to different gateways to avoid bottlenecks in the system. On the other hand, if a resource needs to be accessible from the wide area network, the researcher can ask the gateway service for a public address/name, and the gateway service redirects all traffic for that public address to the resource's internal IP address/VLAN.
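On a Linux-based gateway, this NAT behavior corresponds to standard iptables rules such as the following sketch; the interface names and addresses are hypothetical, and the actual VANI gateway service is implemented on the reprogrammable hardware resource described later in this chapter.

    # Masquerade outgoing traffic from the private 10.0.0.0/8 experiment range
    # (assumption: eth1 faces the wide area network).
    iptables -t nat -A POSTROUTING -s 10.0.0.0/8 -o eth1 -j MASQUERADE

    # Redirect traffic arriving at a public address to a resource's internal IP.
    iptables -t nat -A PREROUTING -d 203.0.113.10 -j DNAT --to-destination 10.1.2.3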

4.5.2 Interconnecting VANI Nodes in the Ethernet Layer

[Figure 4.8: Connecting VANI nodes in the Ethernet layer. Bridge services (BR) map per-node experiment VLANs (e.g., VLANs 10, 20, and 30) onto a Q-in-Q tag (e.g., tag 100) carried over a wide area Ethernet network.]

Figure 4.8 shows an Ethernet-connected VANI. Ethernet-connected VANI nodes use the bridge service instead of the gateway service to interconnect. Inside a VANI node, all resources in an experiment communicate using a specific VLAN, which is unique to the VANI node. If an experiment needs to operate across multiple VANI nodes (for instance, to test a new layer three protocol), the VANI wide area network has to be able to transfer Ethernet frames. In this case, a unique Q-in-Q tag [84] is assigned to the experiment. The bridge service re-frames the internally tagged Ethernet frames into wide area Q-in-Q frames, and the destination bridge performs the reverse operation, delivering the Ethernet frames to the destination MAC/VLAN in the destination VANI node. Since Q-in-Q tagged Ethernet frames might not be supported in a wide area network, we are also able to define public MACs that the bridge service can use to redirect traffic to an internal MAC/VLAN. This functionality enables any other Ethernet-based experiment to send Ethernet frames to a resource in another experiment through the bridge service.
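On a Linux host, the same encapsulation can be expressed with standard iproute2 commands, shown here only to illustrate the stacked frame format; the VANI bridge service performs this re-framing in the infrastructure itself, and the interface names and tags below are hypothetical.

    # Outer 802.1ad (Q-in-Q) tag 100 assigned to the experiment on the WAN side.
    ip link add link eth1 name eth1.100 type vlan protocol 802.1ad id 100
    # The experiment's internal VLAN 30 stacked inside the outer tag.
    ip link add link eth1.100 name eth1.100.30 type vlan id 30
    ip link set eth1.100 up
    ip link set eth1.100.30 up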

4.5.3 Experimentation with L3 Protocols

Figure 4.9 shows how the testbed can be used to test a new layer three protocol in a large-scale and distributed environment using proxy nodes. In this setting, the new L3 protocol is tunneled within an IP payload to a resource inside a VANI node; that resource then strips off the IP header and feeds the new L3 packet onto the VANI wide area Ethernet network.

[Figure 4.9: Large-scale experimentation with new L3 protocols. For example, a "Red" network protocol stack is deployed in slices of VANI nodes, accessed through IP tunnels over the testbed network, and tested at scale.]
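A minimal proxy of this kind can be sketched in Python: it receives tunneled packets over UDP/IP (the IP/UDP headers are stripped implicitly by the socket API) and re-injects each payload as a raw Ethernet frame. The EtherType, interface name, tunnel port, and destination MAC are assumptions for illustration; the sketch requires Linux and root privileges.

    import socket, struct

    ETH_P_EXP = 0x88B5   # IEEE local-experimental EtherType (assumed for the new L3)
    IFACE = "eth0"       # hypothetical interface on the VANI wide area Ethernet
    DST_MAC = bytes.fromhex("020000000001")  # hypothetical destination bridge MAC

    # Tunnel endpoint: end hosts send new-L3 packets inside UDP/IP datagrams.
    tun = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tun.bind(("0.0.0.0", 9000))              # hypothetical tunnel port

    # Raw Ethernet socket for re-injecting the packets (Linux, root required).
    raw = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
    raw.bind((IFACE, 0))
    src_mac = raw.getsockname()[4][:6]

    while True:
        payload, _ = tun.recvfrom(2048)      # the IP/UDP headers end here
        frame = DST_MAC + src_mac + struct.pack("!H", ETH_P_EXP) + payload
        raw.send(frame)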

4.6 SW-Based Resources in VANI

One of the main contributions of our testbed control and management plane is that it can encapsulate any software or hardware resource in the testbed as a service. To do so, the resource is virtualized and abstracted as a service component that follows the generic resource WSDL template. It can then be registered into the control plane and made available to other researchers. Details on how this task can be accomplished were discussed in the control and management plane section of this chapter.

Examples of such resources-as-a-service are any hardware functions or resources that can be reused in different applications and experiments, such as hardware accelerators for encryption, decryption, content conversion, and content compression/decompression. Other reconfigurable hardware modules, such as NetFPGA, could also be virtualized and offered to researchers on an on-demand basis. Other types of processing nodes could be offered to researchers as a resource as well. For example, Amazon Elastic Computing Cloud (EC2) nodes [27], GENI virtual processing nodes, VMWare-based virtualized processing nodes [85], or Graphics Processing Units (GPUs) could be controlled and managed by VANI-CMP. Moreover, software services such as a BPEL orchestrator engine and a Complex Event Processing (CEP) engine could be developed and/or deployed on top of the current virtual resources and made available to researchers through VANI-CMP.

Currently, we have developed and deployed several software-based resources as service components in VANI. In this section, we briefly go over these resources and describe the functionality each one provides:

1. BPEL orchestrator as a service is able to execute a BPEL project and to orchestrate a composite application.

2. Complex Event Processing as a service is customizable to receive events from different sources using different protocols (JMS, SNMP, etc.). This service is able to analyze received events, produce notifications and events, and send them to different destinations using different protocols. We have used this service for the performance monitoring and analysis of VANI.

3. Database as a service is able to store, search, and retrieve data on-demand. Researchers can get this resource, program it using a database file, and query it by sending SQL commands over its WS interface. The DB resource uses the MySQL engine and stores its data on VANI's storage service.

4. Sensor as a service is able to manage data from different sensors and forward it wherever a researcher asks, on-demand. For example, a researcher can ask for sensor data on wind or sun conditions at a specific location for a limited time. This allows the creation of many new applications and experiments using the sensor service.

5. GENI federation service enables access to PlanetLab GENI resources through VANI-CMP for researchers connected to VANI. We discuss this service in more detail in the next section, where we describe the interconnection between VANI and GENI.

4.7 Federation with GENI

GENI is an initiative to create a large-scale experimental environment through federation between different testbeds. Federation in GENI is done using GENI wrappers: a GENI wrapper is developed for each testbed, and testbeds connect to each other through them. In VANI, we developed a wrapper for the control and management plane, and through it we invoke GENI wrapper operations to get a node on any GENI testbed. We tested our wrapper with the PlanetLab GENI wrapper and managed to obtain a PlanetLab processing node through our VANI-CMP.

[Figure 4.10: Connecting VANI to GENI. The VANI wrapper and a GeniWrapper client/server pair connect VANI-CMP and its virtualization layer to GENI nodes through the VANI/GENI interface.]

In VANI, researchers are able to get PlanetLab processing resources using the VANI generic resource template. Since PlanetLab does not support a storage service, and also does not support other VANI requirements such as processing and bandwidth guarantees, access to PlanetLab processing resources does not include these functionalities. Figure 4.10 shows the structure of the interconnection between VANI and PlanetLab through the GENI wrappers. Currently, we are in the development phase of offering VANI resources to GENI researchers through the VANI wrapper.

4.8 A VANI Node

A VANI node is composed of the resources described in this chapter, their corresponding virtualization software, the control and management software, and the storage service. A VANI node can be deployed entirely on a computer cluster composed of normal computing blades and manageable Ethernet networking elements. The basic resources in a VANI node are the processing resource, the storage service, and the fabric service for network virtualization, which are deployed on the computer cluster. All other resources and the control and management software are deployed on these basic services. In addition, all other software-based resources, the virtualization layer for resources like the reconfigurable hardware resource, and the VANI wrapper for connecting to GENI testbeds are also deployed on these basic resources. The only elements that cannot be found in a normal computer cluster are the reconfigurable hardware resources, the gateway and bridge services, and the required 10GE Ethernet switches. These resources are co-located with the computing cluster to provide WAN connectivity and to enable experimentation with the reconfigurable hardware resource.

In the future, we will publish instruction manuals on how to connect to the VANI control and management plane and how to access resources through the developed GUI as well as the secure WS interfaces. We will also describe how all the features described in this chapter can be accessed by application providers, including registering a new service in VANI.

4.9 Performance Evaluations

Up to now, we have presented the VANI architecture and discussed different aspects of its design. To determine whether the currently developed resources can meet the VANI design requirements, we performed several experiments on those resources. In this section, we present performance measurements on two key physical resources that have been virtualized and offered to researchers in VANI: the reprogrammable hardware resource and the processing resource. Our main focus in this part is to see whether we can guarantee the promised quality of service to the researchers that use these resources in their experiments.

[Figure 4.11: Reprogrammable hardware (BEE2 board). The board contains one control FPGA and four user FPGAs, interconnected by 20 Gbps and 40 Gbps on-board links, with DDR2 DIMM slots and 10 Gbps Ethernet ports on each user FPGA.]

4.9.1 Reprogrammable Hardware Resource

By introducing a virtualized and reprogrammable hardware resource in VANI, we enable researchers to test new networking algorithms and protocols using high-performance and high-throughput hardware resources. To do so, we virtualized the BEE2 boards developed at the University of California, Berkeley. A BEE2 board consists of one controlling FPGA and four high-capacity Xilinx Virtex-II FPGAs (figure 4.11) that can be programmed by users. Each FPGA has four 10GE interfaces and 4 GB of memory. In VANI, a researcher can get a set of FPGAs on a BEE2 board and can ask for on-board inter-chip communication channels, which can carry up to 5 GigaBytes per second (GBps). The detailed design of the BEE2 virtualization system and its introduction as a resource in VANI can be found in [8]. Here, we present the performance measurements on this resource. The parameters of interest are the programming time of the FPGAs through the virtualization software, as well as the speed with which the FPGAs can send and receive data.

The first parameter is the time in which a researcher can program an FPGA through the testbed control plane. We would also like to know how this time changes if four researchers program all four FPGAs concurrently. To measure this, we developed a bitstream that initializes all 10GE interfaces on the FPGAs and starts sending a burst of UDP/IP packets on one of its 10GE interfaces, and we programmed the FPGAs through VANI-CMP using the generated bitstream several times. Table 4.1 shows the average maximum time taken to program one, two, three, and four FPGAs. As can be seen, it takes only 30 seconds on average to program an FPGA in the case where all four FPGAs are programmed concurrently, and around 11 seconds if only one FPGA is programmed at a time. This fast programming time allows a researcher to get an FPGA with four 10GE interfaces in less than a minute, run an experiment, and return the FPGA to the VANI resource pool as soon as it is no longer required.

    FPGAs                   1    2    3    4
    Programming Time (s)   11   17   24   30

Table 4.1: Average maximum FPGA programming time

The next experiment measures the speed at which the FPGAs can send and receive traffic. To do so, we developed a traffic generator in the Verilog hardware description language, started sending traffic from one 10GE interface to another 10GE interface on the same FPGA, and recorded the maximum bandwidth that we could receive in the hardware resource. We also compared this with the traffic statistics gathered by the Ethernet switch connected to the FPGA. We repeated this experiment several times and were able to send and receive Ethernet frames at a rate of 1 GBps, which is equal to 8 Gbps. The reason we could not send more traffic is the 8b/10b encoding mechanism of the 10GE-CX4 interfaces; 8 Gbps is the maximum achievable traffic rate per port on a BEE2 board. In our measurements, this rate did not change when all ports sent and received traffic at the same time, since separate internal modules control each port. This experiment shows that one FPGA alone can send and receive 32 Gbps of traffic; if a researcher gets all four FPGAs on a BEE2 board, it is possible to send/receive traffic at a rate of 4x32 = 128 Gbps.

We have used this reprogrammable resource in developing the high-capacity gateway and bridge service for VANI, and we have developed a bandwidth control mechanism on this resource that controls and guarantees the rate at which one experiment can send and receive traffic to/from a wide area network. In the future, we will present our design for the gateway and bridge service, along with performance measurements for this service.
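For reference, the arithmetic behind the per-port, per-FPGA, and per-board throughput figures above is:

    \begin{align*}
    R_{\mathrm{port}}  &= 10\ \mathrm{Gbps} \times \tfrac{8}{10} = 8\ \mathrm{Gbps} = 1\ \mathrm{GBps},\\
    R_{\mathrm{FPGA}}  &= 4 \times R_{\mathrm{port}} = 32\ \mathrm{Gbps},\\
    R_{\mathrm{board}} &= 4 \times R_{\mathrm{FPGA}} = 128\ \mathrm{Gbps}.
    \end{align*}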

    node01 from/to         UDP          UDP (rl)     TCP           TCP (rl)
    node02 (12.50 MBps)    24.5/24.3    12.4/12.4    15~35/24.7    12.3/12.3
    node03 (18.75 MBps)    24.5/24.3    18.8/18.8    15~35/24.3    18.4/18.4
    node04 (25.00 MBps)    24.5/24.3    25.3/25.3    15~35/24.1    24.8/24.6
    node05 (31.25 MBps)    24.5/24.3    31.7/31.6    15~35/22.1    31.3/31.1
    node06 (31.25 MBps)    24.5/24.3    31.7/31.6    15~35/23.2    31.3/31.1

Table 4.2: UDP and TCP traffic measurements in a VANI node, in MBytes per second (MBps)

4.9.2 Processing Service and Network Virtualization

Another main physical resource that we have virtualized is the processing service, which uses the Linux vServer software. There have been studies on processing virtualization techniques [86], and also specifically on Linux vServer [76]. Linux vServer performance evaluations show that this virtualization module has a very low overhead on overall system performance.

[Figure 4.12: Traffic measurement experiment topology. Five virtual processing nodes (VN_1_1 to VN_1_5) on node01 connect through the VANI internal fabric over 1GE links to peer virtual nodes on node02 to node06, one pair per experiment, on VLANs 101 to 105.]

However, since we are also doing network virtualization in addition to processing node virtualization, we conducted two more experiments that we believe were necessary to show that virtual processing nodes can have guaranteed access to the VANI network. In our experiment, we virtualized cluster blades with dual Xeon 1530 CPUs, 2 GB of RAM, and one 1GE interface. The Linux kernel version that we used was 2.6.16, with the vServer 2.3.2 patch. The developed virtualization layer allows up to ten virtual nodes on a physical node. For this experiment, we initialized and launched five virtual nodes on a node named node01. We also launched five other virtual processing nodes on five separate servers with the same capabilities as node01, named node02 to node06. Each of the virtual nodes on node01 belongs to an experiment that includes one other virtual node running on one of the other nodes. The topology and VLAN tags for the experiments are shown in figure 4.12.

In this experiment, we measured the UDP and TCP traffic rates that each virtual node in an experiment could send and receive in different cases. The first case establishes the maximum achievable rate when no limit is placed on the traffic rate and only one experiment is active. This rate is 122 MB per second (MBps) for both UDP and TCP traffic, which is equal to 976 Mbit per second (Mbps). Table 4.2 shows the achievable rates in different cases when all experiments are active and send as fast as they can. Since all experiments running on node01 try to send and receive on one 1 Gbps Ethernet link concurrently, they get different shares of the available capacity in the different cases.

In table 4.2, we show the maximum traffic rate in MBps between a virtual node on node01 and its corresponding virtual node on node02 to node06. The UDP column shows the maximum rate when all virtual nodes in all experiments try to send and receive UDP traffic concurrently, without any rate limiting mechanism in place. The TCP column shows the TCP rate in the same case. As can be seen, because of the massive packet loss in this case, TCP cannot achieve a stable rate, and its rate fluctuates between 15 and 35 MBps. These measurements demonstrate the need for a rate limiting mechanism when different experiments run on a shared virtualized infrastructure.

The columns marked (rl) show measurements when we limit the send and receive rates of the experiments to 12.5, 18.75, 25, 31.25, and 31.25 MBps respectively, totaling 118.75 MBps (950 Mbps). As can be seen, using the rate limit functionality we could meet the bandwidth guarantee requirements (with at most 1% deviation from the target rate) in a VANI node. Another case that we studied is one where all virtual nodes in one experiment start sending traffic to a single virtual node concurrently. This results in congestion on the shared link serving the destination virtual node. To solve this problem, we have developed a novel traffic control mechanism that we present in the next chapter of this thesis.
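For reference, this kind of per-experiment egress cap can be expressed on a Linux node with standard tc commands, as in the sketch below; the interface name and rate are illustrative, and the mechanism actually developed for VANI is the DETS system presented in the next chapter.

    # Root HTB qdisc on the virtual node's interface, default class 1:10.
    tc qdisc add dev eth0 root handle 1: htb default 10
    # Cap the experiment's send rate at 200 Mbit/s (25 MBps).
    tc class add dev eth0 parent 1: classid 1:10 htb rate 200mbit ceil 200mbit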

4.10 Experiments & Applications

The testbed can be used to run large-scale experiments on networked systems, applications, and network architectures from layer three up. In particular, by having processing and storage services in all testbed nodes, it is designed to enable experimentation with applications that need responsiveness and quality of service guarantees. Example applications that could use these functionalities are video streaming applications and smart power grid networked applications.

Due to the ability to change an experiment's configuration on-demand and on the fly, together with the everything-as-a-service foundation of the testbed, network architectures such as a green architecture could be tested on this testbed. In a green network architecture, the network topology and configuration can be changed in response to changes in the status of renewable energy generation and consumption. Based on the same functionalities, we are in the process of building a green orchestrator engine that would use many aspects of the testbed, including on-demand configuration, short-lived resource leases, and the testbed's status and performance monitoring tools. The outcome of this application will be published soon.

Also, due to the availability of storage and processing resources in the testbed nodes, the testbed could be used to experiment with various content delivery networks, such as hybrid peer-to-peer networks. In hybrid p2p networks, peers and in-network resources can be organized and structured in a way that content is delivered to users with lower search and delivery times. Implementing content-based routers and distributed publish/subscribe systems would also be possible in our testbed, and these services could be offered to researchers as stand-alone, reusable service components to facilitate experiment setup and application creation.

Chapter 5

A Distributed Ethernet Traffic Shaping System

The architecture of local area networks is facing new challenges with the emergence of cloud computing [87] and the deployment of massive data centers [26]. This new computing paradigm allows users to access a virtual network of resources in the cloud that can be called upon to deploy applications on demand. At the same time, the networking research community has moved toward creating similar platforms for experimenting with new networking concepts and architectures [7]. As in cloud computing, these networking testbeds offer a virtual network of resources to researchers so that they can evaluate their networked systems at large scale. The creation of these research testbeds and cloud computing platforms has become possible mainly due to the advancement of virtualization techniques, which have made the separation of virtual computing resources from the underlying physical resources much easier and have allowed the operation of multiple virtual machines on one physical resource. Inherent in such shared resource environments is the potential for disruptive interaction among users, and hence the need for new techniques to provide network and resource isolation. The Virtualized Application Networking Infrastructure (VANI) [7, 8], presented


in the previous chapter of this thesis, is an example of a networking research testbed that allocates a virtual network of resources to researchers. An important requirement in VANI is to guarantee network access rates and isolation between different experiments. In this chapter, we present the Distributed Ethernet Traffic Shaping (DETS) system and its corresponding algorithms, designed to provide guaranteed network access rates in VANI. The DETS system is applicable not only to VANI, but also to computing clusters and data centers that virtualize and share their resources among different virtual networks. DETS deployment in a cluster or a data center does not require any changes in system hardware, and can be done on top of normal computing blades and Ethernet switches.

[Figure 5.1: A system with five physical nodes and two virtual nodes on each. Virtual nodes VN11 to VN15 (VLAN 1) and VN21 to VN25 (VLAN 2) run on physical nodes PN1 to PN5, all connected through an Ethernet switch.]

The primary role of DETS is to control and regulate the traffic sent and received on VLANs. This is especially required where more than one virtual machine is running on a physical node, and each has to send and receive a guaranteed rate of traffic on a dedicated VLAN over a shared Ethernet access. Figure 5.1 shows a sample scenario for DETS. In this sample system, we have five physical nodes (PNs), each running two virtual nodes (VNs).

Chapter 5. A Distributed Ethernet Traffic Shaping System

86

Received TCP Rate (Mbps) on vlan 1

1200 1000 800 600 400 200 0

200

400

600 Time

800

1000

Figure 5.2: TCP rate back off due to interfering UDP traffic running virtual nodes (SNMP). All these PNs are connected to an Ethernet network and the VNs running on these PNs require a guaranteed access rate to the Ethernet network. For the sake of simplicity, we show an Ethernet network with just one Ethernet switch, but in general, it is possible to have many switches in a network. In this topology, VNs running on a node are working separately and can only communicate with their peer VNs in other physical nodes. If V N 11, V N 12, V N 13, and V N 14 start sending traffic to V N 15 , they can consume all the available bandwidth on the Ethernet link that connects P N 5 to the Ethernet switch. This may cause problem for traffic sent from nodes V N 21, V N 22, V N 23, V N 24 to node V N 25 that shares the Ethernet link with V N 15. Therefore there is a need for a traffic shaping or rate control to limit the rate that P N 5 can receive traffic for V N 15 so that V N 25 can also receive traffic at a guaranteed rate. This problem would become very evident and observable if the interfering traffic (traffic for V N 15) is UDP and the underdog traffic (traffic for VN25) is TCP. The high

Chapter 5. A Distributed Ethernet Traffic Shaping System

87

amount of UDP packets on the link to P N 5 would virtually disable TCP traffic to V N 25 as the experimental results in Figure 5.2 show. In the Figure, V N 25 receives the maximum possible TCP rate, if no traffic is sent to node V N 15. However, as soon as UDP traffic is sent to node V N 15 (around time 300 in Figure 5.1), TCP rate goes to almost zero until UDP traffic stops (at around time 1000). This experiment shows not only the sensitivity of a TCP flow rate to a competing UDP flow but also it shows the importance of having a traffic shaping and rate control system to guarantee an agreed access rate for different virtual nodes on a physical node that share one Ethernet link. Although there have been proposals for TCP-friendly transport protocols [88, 89], in many systems and environments, such as in VANI, it is not desirable to impose a specific flavor of transport protocol on the virtual machines. The problem of network performance degradation in virtualized environments has been also studied in [29] and the authors, through measurements on Amazon Elastic Computing services, concluded that virtualization techniques can cause significant throughput instability. Current Ethernet flow control uses PAUSE signals [90]. When multiple ports flood a port, the Ethernet switch sends PAUSE signals back to the flooding ports so that they stop sending for an amount of time specified in the PAUSE message. It has been generally accepted that the pause mechanism in Ethernet flow control is not suitable for solving new challenges facing these networks [26]. To address Ethernet congestion problems, two new IEEE task forces (802.1Qua [91] and 802.1Qbb [91]) have been created. The main approach in these task forces is to do flow control at the level of class of service by marking frames at Ethernet switches. In contrast to these approaches, our proposed system operates at the edge of the Ethernet network on the computing hosts in a cluster or a data center. We direct interested readers to [26, 30, 92, 93, 94] for a survey on the recent work on Ethernet network congestion control for data centers. The current proposed methods for congestion management entail modifying Ethernet network elements. Moreover, the


majority of the proposed systems are congestion-notification-based systems with no explicit rate information [30, 92], which have been shown to have drawbacks such as slow recovery in comparison to explicit-rate systems [92]. The salient explicit-rate congestion management system, Forward Explicit Congestion Notification (FECN) [92, 94], passes explicit rate information from the congestion point back to the source point based on the utilization ratio of the congested link. Our system is also an explicit-rate system, but it differs from FECN in several respects. The DETS system is more than just an Ethernet congestion management system: in particular, DETS allows setting guaranteed limits on the send and receive rates of each virtual network, and it shapes the traffic so that virtual networks do not interfere with each other's ability to send and receive traffic. Unlike FECN, DETS does not require any change to current Ethernet equipment and can be deployed in existing computing clusters and data centers. Moreover, our system is capable of supporting both fair and weighted-fair bandwidth allocation mechanisms. In addition, in allocating rates to the sending nodes, the system considers their available sending capacity, which results in higher throughput. Nevertheless, we emphasize that DETS is designed to address congestion at the egress ports of Ethernet networks; consequently, it does not directly address congestion inside the network. DETS operation is transparent to the virtual machines running on the host system, and the virtual machines only see decreases and increases in the send and receive traffic rates of certain flows. In other words, applications need not report their bandwidth requirements, since the measurements are done in DETS. However, since our system runs on the host system, its rate-setting and measurement periods are limited by the system timer (about 55 ms). The organization of this chapter is as follows: Section 5.1 describes our proposed system, identifies the key control and measurement points, and presents the DETS protocol. Section 5.2 presents the DETS system design and its main internal modules. In this section, we also

propose four different rate allocation algorithms developed for DETS. The DETS system performance measurements are presented in Section 5.3, and in Section 5.4 we describe the modifications to the Ethernet control plane required to port the DETS system to Ethernet network elements. Finally, in Section 5.5, we present concluding remarks and our future work.

[Figure 5.3: DETS measurement and rate control points. Physical nodes PN1 through PN5, each hosting virtual nodes, attach to an Ethernet switch; rate measurement reports flow from the send-side measure and rate control points to the measure point and rate allocator on PN5, which returns rate control commands.]

5.1 Distributed Ethernet Traffic Shaping (DETS) System

The DETS system is designed to control the rate of the traffic generated by each virtual machine according to the total traffic rate at the destination virtual node. DETS controls the sending rate of the traffic in the originating VN before it enters the Ethernet network based on a target rate imposed by the receiving virtual node. In the VANI system, a virtual LAN is created for the virtual nodes that are in one


group, and "over-the-top" rate controller software runs on each of the physical nodes. This software is able to control the rate at which each virtual machine sends traffic to any other virtual machine in that virtual network. The module is also able to measure the traffic received by each virtual node and detect whether the receive rate limit is violated. If the receive rate limit is violated, the receiving node is declared congested. The controller then monitors the traffic sent to the congested node and controls its rate at the sending nodes. This system is depicted in Figure 5.3, which shows the control and measurement points. Each agent in DETS has two separate modules: a send rate controller and a receive rate allocator. The send rate controller monitors the rate of traffic sent to any virtual machine that is facing congestion and reports it to the rate allocator on the congested node (node PN5 in the example scenario, called the receiving node in the remainder of this document). The rate allocator at the receiving node (PN5) allocates a rate to each sending node and sends set-rate commands to the corresponding send rate controller modules on the sending nodes. The send rate controllers apply the received set-rate commands (at the rate control points shown in Figure 5.3), and subsequently the traffic sent to the congested node (PN5) is shaped accordingly. The DETS system can be implemented in any cluster with any operating system that is able to control the egress Ethernet traffic rate. In the next section, we focus on a cluster of Linux-based computing nodes, and we describe the system design and protocol for deploying DETS in such a cluster.

5.1.1 DETS Protocol

The DETS protocol has five types of messages:

1. Traffic Report message, sent from a sending to a receiving node and includes measured rate, current rate limit, and available rate.


2. Initialize Traffic Control message, sent from a receiving to a sending node to initialize the traffic controller for that receiving node.

3. Set Rate message, sent from a receiving to a sending node and includes the allocated rate that the sending node has been granted.

4. Keep Alive message, sent from a receiving to a sending node while traffic control on the receiving node is active.

5. Deactivate Traffic Control message, sent from a receiving to a sending node to deactivate traffic control toward that receiving node.
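To make the protocol concrete, the following C++ sketch shows one possible in-memory layout for these five messages. The thesis does not specify a wire format, so the field names, sizes, and units here are purely illustrative assumptions.

#include <cstdint>

// Hypothetical DETS message layout; all fields are assumptions.
enum class DetsMsgType : uint8_t {
    TrafficReport = 1,            // sender -> receiver
    InitializeTrafficControl = 2, // receiver -> sender
    SetRate = 3,                  // receiver -> sender
    KeepAlive = 4,                // receiver -> sender
    DeactivateTrafficControl = 5  // receiver -> sender
};

struct DetsMessage {
    DetsMsgType type;
    uint16_t vlanId;             // DETS shapes traffic per virtual LAN
    uint32_t measuredRateKbps;   // valid in Traffic Report
    uint32_t rateLimitKbps;      // current limit (report) or grant (Set Rate)
    uint32_t availableRateKbps;  // sender's spare sending capacity
};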

5.1.2 DETS for Linux OS

In Linux, traffic shaping can be done on egress and ingress traffic. The main command for performing traffic shaping is the 'tc' command [95]. This command can operate on a virtual interface (serving a VLAN) and can also be used for measuring send and receive rates. The shaping in our system is done on the Linux hosts and is transparent to the virtual machines running on them.
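As a rough illustration of how a host-side controller might drive tc, the sketch below caps egress traffic toward one congested receiver on a VLAN interface using an HTB qdisc and a u32 filter. The interface name, VLAN, rate, and destination address are made-up values; the tc invocations use standard syntax, but this is not the thesis's actual implementation.

#include <cstdlib>
#include <string>

// Run a shell command (error handling elided in this sketch).
static void sh(const std::string& cmd) { std::system(cmd.c_str()); }

int main() {
    const std::string dev = "eth0.2";  // assumed VLAN 2 virtual interface
    // Root HTB qdisc; unclassified traffic falls into class 1:99 (unshaped).
    sh("tc qdisc add dev " + dev + " root handle 1: htb default 99");
    // Cap traffic toward the congested receiver at 80 Mbit/s, e.g. after a
    // DETS Set Rate command granting that rate.
    sh("tc class add dev " + dev +
       " parent 1: classid 1:10 htb rate 80mbit ceil 80mbit");
    // Classify packets destined to the receiving VN (address assumed).
    sh("tc filter add dev " + dev + " protocol ip parent 1: prio 1 "
       "u32 match ip dst 10.0.2.15/32 flowid 1:10");
    // Reading the class byte counters back yields send-rate measurements.
    sh("tc -s class show dev " + dev);
    return 0;
}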

5.2 DETS System Design

Figure 5.4 shows the design of DETS. In the send rate control module, there is one state machine for each receiving node. There are also two internal sub-modules in the receive rate allocator module: the first is responsible for communicating with the sending nodes, and the second allocates the rates to the sending nodes.

5.2.1 Rate Allocator Module

The core part of the DETS system is the rate allocator module, which allocates the sending rate to each sending node. The rate allocator module utilizes a Rate Allocation Algorithm (RAA) to determine the rate at which each sending node can send traffic to the receiving node.

[Figure 5.4: DETS system internal modules. The send rate control subsystem (send rate measurement and control) and the receive rate allocator (rate allocator, receive rate measurement, and sending node communication) sit on top of Linux traffic measurement and shaping.]

In RAA design, we need to consider that the measurements in the send rate control modules are capped by the rate set by the RAA. To better explain this limitation and its implications for algorithm design, we use an example scenario. Assume that in Figure 5.3 the system is in a steady state with four virtual nodes (VN11 to VN14) sending traffic to VN15 at rates of (80, 80, 20, 20) Mbps, respectively. At this point, if VN11 stops sending traffic to VN15, the rate allocation algorithm may reallocate the vacated rate to other nodes. However, since there are no measurements of sending rates above the rate limits, the RAA needs a mechanism to probe VN12 to VN14 to see whether those sending nodes need to send more traffic. Without a probing mechanism, the RAA could allocate rate to a node that does not need it, and the available bandwidth would be wasted. The probing mechanism also allows us to provide fairness in rate allocation to virtual nodes. Assume that in the above example all nodes have similar importance and equal amounts of traffic to send to VN15; then the above allocation is not fair, since two of the virtual nodes have been allocated rates (80 Mbps each) that are much more


than the rates allocated to the other two nodes. If VN13 and VN14 had more traffic to send, this rate allocation would be unfair. In this case, the probing mechanism in the RAA starts probing the nodes with lower allocated rates to see whether they have more traffic to send and need more allocated rate. Probing is done through gradual increases and decreases in the rate allocations of different nodes while monitoring the corresponding increases and decreases in the rate measurements. The probing mechanism may reduce bandwidth utilization, but this may be acceptable in order to overcome the problem described above. Another important factor in RAA design is to consider the available traffic sending capacity of the sending nodes. Assume that in Figure 5.3, VN14 is sending 20 Mbps to VN15 and 80 Mbps to VN12, and its total send limit is 100 Mbps. VN14 therefore cannot send any more traffic to VN15, so the rate allocation algorithm in PN5 should consider the available sending capacity of the sending nodes in its rate allocation. There are a number of possible allocation algorithms that can be used in this system. Next, we propose four rate allocation algorithms: the Fair Share algorithm (RAA-FS); the Slow Probe algorithm (RAA-SP); the Fast Probe algorithm (RAA-FP); and the Forward Explicit algorithm (RAA-FE).

input : Active nodes list and their send capacity
output: Granted rate for each node
1) Calculate the fair rate:
       fairRate <- totalRate / activeNodes;
2) Assign fairRate to all active nodes considering their send capacity:
   while there is unallocated rate and there are nodes with sending capacity do
       grantRate[i] <- min(fairRate, maxRate[i]);
       if fairRate > maxRate[i] then
           fairly distribute the extra rate among the other nodes;
       end
   end
Algorithm 1: RAA-FairShare
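Read operationally, Algorithm 1 is a water-filling procedure. The following C++ sketch is one way to realize it; the function name, rate units, and termination tolerance are our own choices, not taken from the thesis.

#include <algorithm>
#include <vector>

// Water-filling realization of RAA-FairShare: split totalRate evenly over
// the active nodes; rate a node cannot use (beyond its send capacity
// maxRate[i]) is redistributed fairly over nodes with spare capacity.
std::vector<double> raaFairShare(double totalRate,
                                 const std::vector<double>& maxRate) {
    std::vector<double> grant(maxRate.size(), 0.0);
    std::vector<size_t> open;  // nodes whose grant is not yet capped
    for (size_t i = 0; i < maxRate.size(); ++i) open.push_back(i);
    double unallocated = totalRate;
    while (unallocated > 1e-9 && !open.empty()) {
        const double fairRate = unallocated / open.size();
        std::vector<size_t> stillOpen;
        for (size_t i : open) {
            double add = std::min(fairRate, maxRate[i] - grant[i]);
            grant[i] += add;
            unallocated -= add;
            if (grant[i] < maxRate[i] - 1e-9) stillOpen.push_back(i);
        }
        // If no node hit its cap this round, the fair rate fit everyone.
        if (stillOpen.size() == open.size()) break;
        open.swap(stillOpen);
    }
    return grant;
}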


input : Active nodes list and their requested rates and send capacities
output: Granted rate for each node
1) Inflate the requested rate of the nodes that fully use their allocated rate by 10%;
2) Calculate the total requested rate;
3) Calculate the ratio by which requested rates are increased or decreased based on the available rate:
       ratio <- totalAvailableRate / totalReqRate;
4) while there is unallocated rate and there are nodes with sending capacity do
       grantRate[i] <- min(reqRate[i] * ratio, maxRate[i]);
       if reqRate[i] * ratio > maxRate[i] then
           fairly distribute the extra rate among the other nodes;
       end
   end
Algorithm 2: RAA-SlowProbe

The fair share algorithm (RAA-FS) calculates a fair rate by dividing the receive rate limit by the number of sending nodes that have traffic to send, and it allocates that fair share to each active sending node. This algorithm is suitable for cases where the sending nodes should be treated identically in the rate allocation process, independent of the amount of traffic they require, as shown in the pseudocode of Algorithm 1. In this rate allocation mechanism, if the calculated fair rate exceeds the sending capacity of a sending node, the extra rate is fairly distributed among the other sending nodes with available sending capacity. The algorithm is oblivious to differences in the rates requested by the active nodes and does not perform any probing to see whether the nodes have more traffic to send. Although RAA-FS is fair, it can result in bandwidth underutilization, since some sending nodes might not need all of their allocated rate. The second algorithm, the slow probe algorithm (RAA-SP), allocates rates to the sending nodes based on the rate measurement reports received from their send rate control modules. The algorithm identifies the nodes that are fully utilizing their allocated rate and inflates their rate requests by a percentage (for example, 10%) to give them an


opportunity to increase their rate relative to other sending nodes that are not using their allocated rate. RAA-SP then calculates the total requested rate and allocates a portion of the available bandwidth to each node. This portion is calculated based on the inflated request rates and the receive rate limit, as presented in this algorithm's pseudocode (Algorithm 2). RAA-SP gradually probes the sending nodes that are fully utilizing their allocated rate and gives them a better chance of getting more allocated rate. RAA-SP, however, does not address the fairness problem, since it does not reallocate rate from nodes with high allocations to nodes with low allocations. The third algorithm, the fast probe algorithm (RAA-FP), extends the slow probe algorithm by reallocating sending rates from nodes with higher allocated rates to nodes with lower allocated rates. In contrast to the two previous algorithms, RAA-FP addresses both fairness and bandwidth utilization. The algorithm sorts the nodes that fully utilize their allocated rate and calculates the mean rate allocated to them, as shown in the pseudocode of Algorithm 3.

input : Active nodes list and their requested rates and send capacities
output: Granted rate for each node
1) Execute the Slow Probe algorithm:
       grantRate <- RAASlowProbe();
2) Sort all nodes that fully utilized their allocated rate according to their granted rate, and calculate the mean of the rates granted to them;
3) while there is a node with rate above the mean (upper) do
       while there is a node with rate below the mean (lower) do
           multiply the rate of the lower node by d and deduct the increase from the upper node, considering the lower node's send capacity;
           if the upper node's new rate goes below the mean then
               average the lower and upper rates and assign the average to both;
           end
       end
   end
Algorithm 3: RAA-FastProbe

RAA-FP then picks the nodes with


the highest allocated rate and the lowest allocated rate. RAA-FP multiplies the rate allocated to the lowest-rate node by a parameter d > 1 and deducts the extra allocated rate from the node with the highest allocated rate, provided the resulting rate does not go below the mean allocated rate. Otherwise, it takes the average of the highest and lowest allocated rates and assigns this average rate to both nodes. This change in the allocated rate is made while taking into account the free sending capacity of the node with the lower allocated rate. The operation is repeated on the nodes with the next-highest and next-lowest allocated rates until all rates allocated to the fully utilizing nodes have been revised. Our performance evaluations show that the fast probe rate allocation algorithm (RAA-FP) achieves the probing goals rather quickly, since it gives nodes that are fully utilizing their allocated rate more opportunity to send additional traffic. Moreover, it achieves better fairness in rate allocation, since it reduces the gap between the nodes with high allocated rates and the nodes with low allocated rates. The choice of the parameter d controls the trade-off between fairness and bandwidth utilization: a small d results in higher bandwidth utilization but lower fairness in rate allocation, while a large d results in lower bandwidth utilization in exchange for higher fairness. The fourth algorithm is inspired by the FERA algorithm introduced in [92] for FECN-based Ethernet congestion management. This algorithm is designed to enable comparison between a DETS-based rate allocation system and a FECN-based system. It has been shown in [94] that the FERA algorithm has a better convergence time than other proposals for Ethernet congestion control. The essence of FERA is to control the queue length of an outgoing Ethernet switch port by assigning a fair share rate to the flows passing through that port. The algorithm uses a linear (or hyperbolic) control function to adjust the allocated (fair) rate toward a target queue length (Q_eq). We modified this algorithm to target a receiving rate at the receiving node.
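Before detailing RAA-FE, the following C++ sketch shows one possible realization of the fast probe rebalancing step just described (step 3 of Algorithm 3). The pairing order and the clamping details are our reading of the pseudocode, not a verbatim port of the thesis implementation.

#include <algorithm>
#include <numeric>
#include <vector>

// Rebalance the grants of nodes that fully utilize their rate: boost the
// lowest grant by factor d > 1, taking the increase from the highest grant
// unless that would push it below the mean, in which case the pair is
// averaged. 'rate' holds the current grants, maxRate the send capacities.
void fastProbeRebalance(std::vector<double>& rate,
                        const std::vector<double>& maxRate, double d) {
    if (rate.size() < 2) return;
    const double mean =
        std::accumulate(rate.begin(), rate.end(), 0.0) / rate.size();
    std::vector<size_t> order(rate.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](size_t a, size_t b) { return rate[a] < rate[b]; });
    size_t lo = 0, hi = order.size() - 1;
    while (lo < hi && rate[order[lo]] < mean && rate[order[hi]] > mean) {
        const size_t l = order[lo], h = order[hi];
        // Boost the low node by d, capped by its send capacity.
        const double boosted = std::min(rate[l] * d, maxRate[l]);
        const double delta = boosted - rate[l];
        if (rate[h] - delta >= mean) {
            rate[l] = boosted;
            rate[h] -= delta;  // deduct the increase from the high node
        } else {
            const double avg = (rate[l] + rate[h]) / 2.0;
            rate[l] = std::min(avg, maxRate[l]);
            rate[h] = avg;     // average the pair instead
        }
        ++lo;
        --hi;
    }
}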


This algorithm (called RAA-FE) calculates a fair rate r_{i+1} at the (i+1)-th interval based on the value r_i at the i-th interval and a control function

f(r) = 1 - k (r - R_t) / R_t

in which k is a constant, r is the measured receiving rate, and R_t is the target rate. DETS sends the calculated rates back to the sending nodes, and the sending nodes apply them in their rate controller modules. Compared to the previous algorithms, this algorithm does not require rate measurements at the sending nodes, and it does not support weighted-fair allocation. In the original FERA, intervals can be as short as 1 ms, but in DETS intervals are about 55 ms. Rate adjustments are therefore made only every 55 ms, which makes rate convergence a challenge for this algorithm. Although the linear control function leads to a faster convergence time than the hyperbolic one, our experiments show that RAA-FE takes about 40 intervals (> 2 s) to converge to the fair rate. The analytical results in [92] show this slow convergence as well. This is mainly because the algorithm does not incorporate the sending rate measurements.
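In code, the RAA-FE update reduces to a one-line scaling of the current fair rate. The multiplicative form r_{i+1} = r_i * f(r_i) and the clamping bounds below are our assumptions about how the control function is applied; k = 0.5 is an illustrative value, not a thesis parameter.

#include <algorithm>

// One RAA-FE control step: scale the current fair rate by the linear
// control function of the measured receive rate around the target rate.
double raaForwardExplicitStep(double fairRate, double measuredRate,
                              double targetRate, double k = 0.5) {
    double f = 1.0 - k * (measuredRate - targetRate) / targetRate;
    f = std::max(0.1, std::min(f, 2.0));  // keep the grant positive, bounded
    return fairRate * f;
}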

5.2.2 Performance Improvements

To improve the performance of the DETS system, we have embedded several performance improvement mechanisms in it. These improvements mainly reduce the number of exchanged messages and better predict the required sending rates of the sending nodes. The first improvement concerns the rate measurement reports: the send rate control module can send a measurement report only when there is a major change in the measured rate, so the measured rates are reported at a lower frequency. Also, if the measured rate drops below a minimum threshold, the send rate control module can stop reporting it, and the rate allocator automatically allocates a minimum rate to that node.

[Figure 5.5: DETS performance evaluations for the system shown in Figure 5.1. Top: TCP rate (Mbps) on VLAN 1 versus time; bottom: UDP rate (Mbps) on VLAN 2 versus time.]

The send rate control module can also use a prediction algorithm to estimate the rate at which a sending node will generate traffic to a receiving node during the next time period, and send this estimate to the rate allocator module. The predicted rate can be calculated from the current and past measurements. This prediction improves the rate allocation algorithm's performance, since it considers the predicted rate requirements of a node instead of only its past sending rate measurements.

To reduce the number of rate allocation messages generated by the rate allocator module, the module can send these messages to the send rate control modules only when there is a major change in the allocated rate.

To ensure that DETS protocol messages are delivered to the distributed modules with minimum delay, they can be conveyed on a separate physical or virtual network. They can even be marked with a high priority so that they have a better chance of reaching their destination when the network is congested.

5.3 Performance Evaluations

In this section, we present experimental results showing that DETS can achieve isolation between virtual LANs. We implemented the DETS system in C++ and deployed it on 11 nodes with 1 GE Ethernet connections in a computing cluster, and we created two VLANs on the Ethernet switches. As in our VANI processing virtualization service [8], we used Linux vServer technology for virtualization and deployed two virtual nodes on each physical server: one virtual node on each physical node is connected to VLAN 1, and the other to VLAN 2. This setting is similar to the one depicted in Figure 5.1, except that we used eleven physical nodes instead of five. We set the send and receive rate limits for all virtual nodes in the first VLAN to 400 Mbps and in the second VLAN to 500 Mbps, and we used the fast probe rate allocation algorithm on both VLANs with parameter d = 2. We started sending TCP traffic from 10 nodes to one node. We expect DETS to control the rate at which the receiving node receives traffic and limit it to 400 Mbps. We also expect that if the nodes in the second VLAN start sending UDP traffic to the receiving node, the TCP flows destined to that machine will not be overwhelmed by the interfering UDP traffic. Our results (presented in Figure 5.5) show that DETS achieves both goals. In this figure, the rate measurements are shown for every time unit (every 55 ms). As can be seen, when all nodes in the second VLAN simultaneously start sending UDP traffic to the receiving node (around time unit 320 in Figure 5.5), the TCP traffic on the first VLAN is momentarily disrupted; it takes two time units for the control algorithm to receive the measurements, make its decision, and apply the limits on the sending nodes. After this short transient period, the TCP traffic bounces back quickly and continues at the 400 Mbps rate limit. We also evaluated and compared the performance of the four allocation algorithms. To do so, we set up a VLAN with 10 virtual nodes sending a mix of UDP and TCP traffic to one virtual node, and we monitored the received traffic on the receiving node.

[Figure 5.6: Performance evaluation of rate allocation algorithms: a) RAA-SlowProbe, b) RAA-FastProbe. Panels a1 and b1 show the received rate (Mbps) versus time; panels a2 and b2 show the mean and standard deviation of the allocated rates (Mbps) versus time.]

[Figure 5.7: Performance evaluation of rate allocation algorithms: a) RAA-FairShare, b) RAA-ForwardExplicit. Panels a1 and b1 show the received rate (Mbps) versus time; panels a2 and b2 show the mean of the allocated rates (Mbps) versus time.]


We also limited the peak rate of three of the sending nodes to a low limit (20 Mbps). This helps us better compare the performance of the proposed algorithms. We developed an on/off burst traffic generator that generates a burst of UDP or TCP traffic for a random period between 0 and T and then stops sending for another random period between 0 and T. We used various values of T, ranging from 0.5 s to 10 s, on different nodes. This traffic generator enables evaluating DETS under time-varying and bursty UDP and TCP traffic. Figures 5.6(a1, b1) and 5.7(a1, b1) show the received rate measurements on the receiving node for all algorithms over a period of 82 seconds (1500 time units). Figures 5.6(a2, b2) show the measured mean and standard deviation of the rates allocated to the nodes by the slow probe and fast probe (d = 2) algorithms, respectively. Figures 5.7(a2, b2) show the mean rate allocated by RAA-FS and RAA-FE. The fluctuations in the received rate measurements are due to the on-off nature of the generated traffic. The fast probe and slow probe algorithms achieve better utilization of the receive bandwidth than the fair share algorithm, especially since some of the nodes have less sending capacity than the others. As expected, the slow probe algorithm outperforms the fast probe algorithm in terms of receive bandwidth utilization. However, the fast probe algorithm achieves a lower standard deviation across the flows coming from different virtual nodes than the slow probe algorithm does. The RAA-FE algorithm performs poorly compared to the other algorithms: it has a slow convergence rate and has difficulty stabilizing, mainly because of the fluctuations in the generated traffic. RAA-FE also does not consider the sending rate measurements and has no probing mechanism. In general, the fast probe algorithm is preferable when weighted fairness is required; if a user needs strict fairness in rate allocation, the fair share scheme can be picked. The slow probe algorithm suits cases where the user wants to increase



the bandwidth utilization at the expense of fairness and does not want a sudden change in a traffic flow's rate, preferring gradual changes. In DETS, it is possible to run different rate allocation algorithms on different virtual networks, as long as the algorithms satisfy the network isolation requirement. This allows users to pick an algorithm that suits their needs.

[Figure 5.8: DETS in the Ethernet control plane. DETS modules on the edge switches SW1 (sending side, port 8) and SW3 (receiving side, port 5) exchange DETS control messages across the Ethernet network between the sending and receiving nodes.]

5.4 Modifications to Ethernet Control Plane

Here we discuss the inclusion of the DETS protocol in the Ethernet control plane, so that Ethernet switching equipment can perform DETS operations even without (or with minimal) help from the hosts attached to the Ethernet network. We propose that, in an Ethernet network, the distributed modules of the DETS system be embedded in the Ethernet switches and that the DETS messages be added to the Ethernet control messages. To do so, traffic toward a receiving node has to be controlled on the ingress ports of the edge Ethernet switches. These messages could be added to the MAC Control type of Ethernet frames (EtherType = 0x8808), as specified in the IEEE 802.3 family of


specifications [90]. The only message currently defined in this frame type is the PAUSE message (opcode = 0x0001); the DETS messages can use other free opcodes in this frame type. These messages have to be carried in VLAN-tagged frames, since DETS is designed to control rates on VLANs. Figure 5.8 shows an Ethernet network equipped with DETS. The rate allocator module operates on the receiving port of an edge Ethernet switch (SW3, port 5), while the send rate control module and the traffic shaper operate on the sending port of the originating edge Ethernet switch (SW1, port 8). The set-rate messages are sent from the receiving port to the sending port. The sending port applies the allocated rate to the sent traffic and can forward the rate control messages to the sending host if the host (or its NIC) is able to do the traffic shaping.
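To illustrate what such a frame might look like, the C++ sketch below lays out a VLAN-tagged MAC Control frame carrying a hypothetical DETS Set Rate message. The TPID 0x8100, the EtherType 0x8808, and the PAUSE opcode 0x0001 are standard; the DETS opcode and payload fields are assumptions.

#include <cstdint>

#pragma pack(push, 1)
struct DetsControlFrame {
    uint8_t  dst[6];           // destination MAC (e.g. a control address)
    uint8_t  src[6];           // source MAC
    uint16_t tpid;             // 0x8100: IEEE 802.1Q VLAN tag
    uint16_t tci;              // priority bits + VLAN ID of the shaped VLAN
    uint16_t etherType;        // 0x8808: MAC Control
    uint16_t opcode;           // 0x0001 is PAUSE; DETS would claim a free one
    uint32_t grantedRateKbps;  // hypothetical Set Rate payload
    uint8_t  pad[36];          // pad to the 60-byte minimum (FCS added by NIC)
};
#pragma pack(pop)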

Part III

QoS & Admission Control in Service-Oriented Systems

Chapter 6

Allocating Services to Applications using Markov Decision Processes

In the first two parts of this thesis, we analyzed the impact of service-oriented approaches to application creation on future network architectures and, more specifically, their central role in network-facilitated application creation in an Application-Oriented Network. In this part of the thesis, we focus on improving the quality of experience for applications created under this paradigm. In the service-oriented application creation paradigm, services that are designed and developed independently can be composed with other service components to create new applications or more complex service components. Nowadays we can see the effect of this paradigm on different aspects of networking, such as the development of new applications through the composition of service components, both in the form of "mashups" [21] and in the more rigorous form of the Service-Oriented Architecture [96, 19]. For example, Google Maps has provided the basis for a huge number of mapping mashups. The importance of the mashup phenomenon is that it marks the emergence of a new mode of application creation, in which applications are created through a distributed and collaborative process. The term Web 2.0 refers to this emerging network-centric


platform [22]. In addition, SOA-based loosely coupled IT systems have given enterprises greater agility when it comes to adjusting the structure of their businesses to meet changing business requirements. Another example is the application of this paradigm to multimedia applications by composing multimedia services [97]. There is a large body of literature in the area of service composition. In [98], the authors discuss the service composition problem from the QoS-awareness point of view. They argue that the problem of composing services with different QoS parameters to create an application subject to a set of constraints on different QoS parameters is a Linear Programming problem, and they use the simplex method to find the best service set for satisfying the application's constraints. A QoS-aware middleware for composing multimedia services to provide multimedia applications has been proposed in [97]. The authors show that the problem of composing services is NP-hard, and they propose a heuristic algorithm for composing services, in both a centralized and a P2P manner, to satisfy the overall QoS constraints of multimedia applications. In their peer-to-peer algorithm, upon receiving a request from the user, the system starts finding candidate services that satisfy the overall QoS constraints, and at the end it decides which services should be chosen to properly serve the user's interests and QoS constraints. In [99], the authors propose a Markov Decision Process (MDP) model for combining services, while having multiple choices for each service, to increase the overall reward of workflows while exploring different possibilities. The problem of scheduling workflows while composing web services has been discussed in [100]: the authors propose a genetic search approach that searches among the possible orderings of a vast number of business processes and tries to find the order that best satisfies the overall QoS constraints of the business processes. In this chapter, we address the service composition problem in the presence of conflicting requests for different services in composite applications. We study this problem in two different


cases. The first case assumes applications that require simultaneous execution of service components; in the second case, we investigate applications that execute service components in sequence. In both cases, we propose optimal policies for assigning service instances to different applications using Markov Decision Processes. After formulating the problem as an MDP, we obtain the optimal policy, and we compare the performance of a system following this policy with that of systems using the Complete Sharing (CS) or Complete Partitioning (CP) [101] mechanisms. The rest of this chapter is organized as follows. In the next section, we define the problem of service allocation in the case of concurrent service executions and then formulate it as an MDP problem. In Section 6.1.2, we analyze the optimal solution, and in Section 6.1.3, we analyze the case in which a service has instances with different QoS parameters. In Section 6.1.4, we present the optimal policy for a sample system and compare its performance with the CS and CP methods. In the second part of this chapter, we extend the MDP-based service allocation to applications that execute service components in sequence. As in the first case, we define and formulate the problem using MDP, obtain the optimal policy for a sample system, and present performance evaluations and comparison results.

6.1 Concurrent Service Executions

6.1.1 Problem Formulation

Consider an environment with m types of services and k classes of composite applications (Figure 6.1). For simplicity, we assume that all instances of one service have similar QoS parameters. Each class of composite application is composed of a set of services. For example, a class 1 application is composed of services 1, 3, and m, while a class 2 application is composed of services 1, 2, 4, and m, and a class 3 application is composed of only one service of type m. Therefore, a request for a class 1 application will be


accepted whenever there are free instances of services of types 1, 3, and m. Likewise, a request for a class 2 application will be accepted whenever there are free instances of services of types 1, 2, 4, and m, and a request for a class 3 application will be accepted whenever there is a free instance of a service of type m. As can be seen, there is a conflict among the service requirements of class 1, 2, and 3 applications. Consider a case where a high request rate for class 3 applications results in allocating all services of type m, hence decreasing the chance of accepting the other classes of applications, which also require a type m service instance. This leaves instances of the other service types underutilized while requests for the other applications are rejected.


[Figure 6.1: A system with m different service types and N instances of each type.]

To solve this problem, and consequently to maximize the overall utilization, there should be a mechanism to allow or deny the acceptance of requests for the different classes of applications. In this section, we propose an MDP-based partitioning model for achieving an optimal policy for accepting or denying requests, and we compare the results of enforcing this policy, in terms of achieving higher utilization, with other policies, including Complete Sharing and Complete Partitioning [101]. In the CS policy, a request for each class of application is accepted whenever


there is one free instance of each corresponding service; no reservation for any of the applications is made. The CS policy, as described before, results in a non-optimal allocation of services to applications. In the CP policy, a constant number of service instances is allocated to each application class and cannot be shared with the other classes. While this policy seems fair, it underutilizes the services. We propose a mechanism for accepting or rejecting requests for each class of application at the time of the request. We assume that for each application class, the arrivals and holding times are exponentially distributed, where lambda_i (1 <= i <= k) is the arrival rate of class i applications and mu_i (1 <= i <= k) is the service rate of class i applications. Also, n_i (1 <= i <= k) is the number of class i applications currently being served in the system. First, we assume a simple model consisting of only two classes of applications (k = 2) and three types of services (m = 3). A class 1 application is composed of services 1, 2, and 3; a class 2 application is composed of services 1 and 2 (Figure 6.2). We assume that all services satisfy the QoS requirements of all classes of applications. We have N instances of each type of service, and n1 and n2 represent the number of class 1 and class 2 applications currently in the system, respectively. Therefore, the state vector (n1, n2) represents the current state of the system. Let S = {s = (n1, n2) | 0 <= n1 <= N, 0 <= n1 + n2 <= N} be the state space, and let s_t be the system state at time t. Based on the statistical assumptions, {s_t, t >= 0} is a continuous-time Markov chain whose transitions are the events of the arrival or departure of an application. We formulate our problem as a Markov Decision Process [102]. Our objective is to maximize the utilization of the services and increase the revenue. Therefore, our decision process is to determine how the next request arrival should be treated while the system is in state s. The system can accept only a request for a class 1 application, or


only a request for a class 2 application, or requests for both classes of applications.

[Figure 6.2: A system with three types of services and two classes of applications.]

Therefore, whenever the system enters the state (n1, n2), it knows whether it will serve the next request for each class of application or reject it. We assume that rejected requests do not interfere with the system. As a result, the possible next actions in state s are:

A(s) = {0}: accept only requests for class 1.
A(s) = {1}: accept only requests for class 2.
A(s) = {2}: accept requests for both classes.

Our objective is to find an optimal policy for each state that maximizes the reward, which is the weighted sum of the applications currently being served in the system.

6.1.2 Markov Decision Process Formulation

This initial continuous-time Markov Decision Process can be converted into an equivalent discrete-time MDP by applying the uniformization technique [102]. To do so, we define the sampling time c := N(mu1 + mu2) + lambda1 + lambda2; during each sample time, only


one transition can occur, corresponding to either the arrival of a request, the departure of a request, or a fictitious event. To maximize the utilization in our problem, we maximize the reward function, which is the weighted sum of the different classes of applications in the system. We therefore use the MDP infinite-horizon discounted reward model [102, 103], and we define our one-step reward function as follows:

R(s) = α n1 + β n2    (6.1)

The optimal discounted function and the optimal policy can be computed using the value iteration algorithm [103],

V_{n+1}(s) = max_a [ R(s) + ǫ Σ_{s'} P^a_{ss'} V_n(s') ]    (6.2)

in which ǫ is the discounting factor and P^a_{ss'} is the transition probability from state s to state s' under policy a, with the following values:

• When a request for a class i application arrives and we accept it, the probability is λi/c.
• When a request for a class i application arrives and we reject it, the probability is λi/c.
• The probability of the departure of a class i application from the system is ni μi/c.
• The probability of the fictitious event is 1 − (Σ_i ni μi + Σ_i λi)/c.

Now we can recursively compute the sequence of n-stage values Vn(s) using the method of successive approximations [103] and take the limit of this sequence as n goes to infinity. It is shown that V(s) := lim_{n→∞} Vn(s) exists and is the solution of the infinite-horizon discounted problem [103].
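As a concrete illustration of this value iteration, the following self-contained C++ sketch computes the value function for the two-class model above. The parameter values mirror the example studied later (lambda1 = lambda2 = 5, mu1 = mu2 = 1, N = 10, discount 0.99, alpha = 1, beta = 0.1); the state indexing, iteration count, and boundary handling are our own choices.

#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    const int N = 10;
    const double l1 = 5, l2 = 5, mu1 = 1, mu2 = 1;
    const double c = N * (mu1 + mu2) + l1 + l2;  // uniformization constant
    const double eps = 0.99, alpha = 1.0, beta = 0.1;
    auto idx = [&](int n1, int n2) { return n1 * (N + 1) + n2; };
    std::vector<double> V((N + 1) * (N + 1), 0.0), Vn(V);
    for (int it = 0; it < 2000; ++it) {
        for (int n1 = 0; n1 <= N; ++n1)
            for (int n2 = 0; n1 + n2 <= N; ++n2) {
                double best = -1e18;
                for (int a = 0; a <= 2; ++a) {  // 0: class 1, 1: class 2, 2: both
                    // Classes 1 and 2 share services 1 and 2, so acceptance
                    // requires n1 + n2 < N (this also covers service 3).
                    bool acc1 = (a != 1) && (n1 + n2 < N);
                    bool acc2 = (a != 0) && (n1 + n2 < N);
                    double q = 0;
                    q += l1 / c * V[acc1 ? idx(n1 + 1, n2) : idx(n1, n2)];
                    q += l2 / c * V[acc2 ? idx(n1, n2 + 1) : idx(n1, n2)];
                    if (n1 > 0) q += n1 * mu1 / c * V[idx(n1 - 1, n2)];
                    if (n2 > 0) q += n2 * mu2 / c * V[idx(n1, n2 - 1)];
                    q += (1 - (l1 + l2 + n1 * mu1 + n2 * mu2) / c) * V[idx(n1, n2)];
                    best = std::max(best, alpha * n1 + beta * n2 + eps * q);
                }
                Vn[idx(n1, n2)] = best;
            }
        V.swap(Vn);
    }
    std::printf("V(0,0) = %.2f\n", V[idx(0, 0)]);
    return 0;
}

Recording the maximizing action a in each state yields policy tables of the kind shown later in Figures 6.4 and 6.5.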


[Figure 6.3: A system with three types of services, two classes of applications, and two types of instances (S3.1 and S3.2) for service type 3.]

6.1.3 Optimal Policy with Different Services

In the previous section, we formulated the problem for the case in which all services of one type are identical. In some situations, however, there are different service components with different QoS parameters that cover similar functionality and can replace one another. For example, consider a case with two classes of applications and three types of services, where application class 1 is composed of services 1, 2, and 3, while application class 2 is composed of services 2 and 3. We also have two kinds of service 3 instances in the system: among the N instances of service 3, L instances are similar from the QoS point of view (S3.1 services), and the remaining (N − L) instances have similar QoS properties that differ from those of the first L instances (S3.2) (Figure 6.3). We assume that, after solving a Linear Programming (LP) problem to satisfy the constraints of each application class, we have found that a class 2 application can use both types of service 3 instances, but a class 1 application can only use instances of type S3.1. Now the problem is to propose a policy for accepting or rejecting requests for the


class 1 and class 2 applications so as to maximize the utilization of the service instances. We show that this problem is similar to the previous one, and the policy maker can use the previously proposed model to obtain the optimal policy and make optimal decisions. Since services of type S3.2 can only be used by class 2 applications, the decision is whether to use an S3.1 instance for a class 1 application or for a class 2 application. Therefore, the applications compete for a limited number of instances instead of competing for all available instances. Upon the arrival of a request for a class 2 application, the system assigns an S3.2 instance if one is free. If no S3.2 instance is available, the system, based on the MDP model, decides whether it should give an S3.1 instance to this request or keep it for later use by a class 1 application. The formulation of this problem is similar to that of the previous problem, except that among the N service 3 instances in the system we have L instances of type S3.1, and the arrival rate of class 2 applications is λ2 pf instead of λ2, in which pf is the probability that a request for a class 2 application arrives at the system when there is no free S3.2 service instance. Note that pf can be obtained using the Erlang B formula:

p_f = [ (λ/μ)^m / m! ] / [ Σ_{n=0}^{m} (λ/μ)^n / n! ],   m = N − L    (6.3)
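Equation (6.3) is easy to evaluate without computing factorials by using the standard Erlang B recursion B(0) = 1, B(j) = a·B(j−1) / (j + a·B(j−1)) for offered load a = λ/μ. The C++ sketch below shows this; the numeric values in main are illustrative, not taken from the thesis.

#include <cstdio>

// Erlang B blocking probability for m servers and offered load a = lambda/mu,
// computed with the numerically stable recursion instead of raw factorials.
double erlangB(double a, int m) {
    double b = 1.0;
    for (int j = 1; j <= m; ++j)
        b = a * b / (j + a * b);
    return b;
}

int main() {
    // Example: lambda = 5, mu = 1, N = 10, L = 6, so m = N - L = 4.
    std::printf("pf = %.4f\n", erlangB(5.0, 4));
    return 0;
}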

Based on this problem formulation, in order to find the optimal policy we use the following one-step reward function:

R(s_n) = α n1 + β n2    (6.4)

where n2 represents the number of class 2 applications that have entered the system and have not found any free S3.2 instance. Again, we maximize this reward using the MDP infinite-horizon discounted reward model. As in the previous problem, we can use the method of successive approximations for finite-period Markov


Decision Processes for finding the optimal policy that maximizes the weighted-sum reward function.

6.1.4 The Optimal Policy and Performance Comparison

Based on the presented MDP problem, we computed the optimal policy for the first problem described and formulated earlier. We found the optimal policies for request arrival rates λ1 = λ2 = 5, service rates μ1 = μ2 = 1, and N = 10, and we set ǫ to 0.99 in Equation 6.2. We obtained the optimal decision in each state for the cases (α = 1, β = 0.1) and (α = 1, β = 0.5).

[Figure 6.4: Optimal policy when the system is in state (n1, n2), for α = 1, β = 0.1.]

Figures 6.4 and 6.5, respectively, show the optimal policy for each case when the system is in state (n1, n2). In both figures, '0' indicates that the system will accept only requests for class 1 applications, '1' that it will accept only requests for class 2 applications, and '2' that it will accept requests for both classes. As can be seen, when the weight of class 2 applications is low and plenty of them are currently being served in the system, our decision-making mechanism suggests rejecting new requests for class 2 applications (Figure 6.4). However, if the weight of class 2 applications is high, we have to accept



more requests for that class of application (Figure 6.5).

[Figure 6.5: Optimal policy when the system is in state (n1, n2), for α = 1, β = 0.5.]

We simulated the system and compared its performance under the MDP-based partitioning mechanism with the Complete Sharing (CS) and Complete Partitioning (CP) mechanisms [101]. As described before, in the CS method the system accepts a request for any class of application if it has enough room to serve that request; in other words, the system does not reserve any of its resources for any class of application. In the CP method, the system keeps a constant number of service instances for each application class and does not allocate that portion to any other class. In our implementation of the CP method, we divided the resources based on the weights of each class. Figure 6.6 shows the comparison results between these three methods. Figure 6.6(a) shows the case where α = 1 and β = 0.1, and Figure 6.6(b) the case where α = 1 and β = 0.5. The x-axis in both figures represents the request rate in terms of λ1 and λ2; in both figures, λ1 = λ2, ranging from 1 to 30. The y-axis represents the reward value, i.e., the weighted sum of the number of applications currently in the system, under each of the partitioning methods.


[Figure 6.6: Performance comparison between Complete Sharing, Complete Partitioning, and MDP-based partitioning: (a) α = 1, β = 0.1; (b) α = 1, β = 0.5. Reward versus request rate (λ1 = λ2, from 1 to 30).]


As can be seen, the MDP-based partitioning mechanism outperforms the other two mechanisms, especially when the request rate is high. When the request rate is low, there is no significant difference between CS, CP, and MDP-based partitioning. However, when the load is high and the weight of the second application class is low, using MDP-based partitioning results in 60% more reward than the CS method and 10% more reward than the CP method. In the next section, we revisit this problem by relaxing some of the assumptions. We again study the optimal policy and present MDP-based solutions for this problem.

6.2 Sequential Service Executions

In the previous section [11], we studied the problem of optimally allocating services to different applications, and we proposed a Markov Decision Process approach for solving it. One of the main assumptions made there was that all service instances are committed by the system to the application throughout its lifetime. In this section, we relax this assumption. We propose an optimal policy for reserving service instances for different applications and business processes using Markov Decision Processes. We obtain the optimal policy for a sample case and compare its performance with that of a system using a Full Commitment Policy or a No Commitment Policy for assigning service instances to applications.

6.2.1 Problem Formulation

Consider an environment with m types of services and k classes of applications or business processes (Figure 6.7). Each class of application is composed of a set of services. For example, a class 1 application is composed of services 1, 3, and m, while a class 2 application is composed of services 1, 2, 4, and m, and a class 3 application is composed of only one service of type m. Each application uses a service for a limited time during its lifetime, and the service


is free for the rest of the time. Whenever the system receives a request for a class of application, it can accept or deny the request. If the system accepts the request, one policy is to put all the corresponding service instances on hold until the application finishes executing. We call this policy a Full Commitment Policy (FCP). Under this policy, the system can accept a request for a class 1 application whenever there are free instances of service types 1, 3, and m. A request for a class 2 application will be accepted whenever there are free instances of service types 1, 2, 4, and m, and similarly, a request for a class 3 application will be accepted whenever there is a free instance of service type m. As can be seen, there is a conflict among the service requirements of application classes 1, 2, and 3.


[Figure 6.7: A system with m different service types and N instances of each type.]

Consider a case where a high request rate for class 3 applications results in the consumption of all service instances of type m, hence decreasing the chance of accepting the other classes of applications that require a type m service. This leaves instances of the other service types underutilized while requests for the other applications are rejected. In the previous section, we analyzed this problem and provided optimal solutions for a sample scenario. Since some types of applications or business


processes do not need the service instances throughout their lifetime, under the FCP policy the service instances can be underutilized even with an optimal assignment of service instances to applications or business processes. An alternative policy is to accept a request for any type of application whenever there is one free instance of the first service. We call this policy a No Commitment Policy (NCP). Although this policy seems simple, it has a significant drawback: applications and business processes that are composed of only one service and have high request rates can easily consume all instances of that service and force other applications to fail when they need that particular service. Another policy is the Partial Commitment Policy (PCP). Under this policy, the system assigns service instances to applications considering both the fact that applications do not need all instances throughout their lifetime and the fact that the system should guarantee some level of service availability to all accepted applications. In this section, we analyze this policy and propose an optimal solution based on it, formulated using Markov Decision Processes. Using the obtained policy, we compare the results of applying it, in terms of achieving higher service utilization, with the other policies. Our proposed mechanism accepts or rejects requests for each class of application at the time of the request; in other words, by rejecting a request, we reserve available service instances for future use by other classes of applications. For each class of application, the interarrival times are exponentially distributed, where λi^{-1} (1 ≤ i ≤ k) is the mean interarrival time of class i applications, and μj^{-1} (1 ≤ j ≤ m)

is the mean execution time of service type j. Each application class is composed of a set of services, and nij (1 ≤ i ≤ k, 1 ≤ j ≤ m) is the number of class i applications currently being served by service instances of type j. Also, we have N instances of each type of service in our system. As a result, the state vector of the system is:

s = (n11, n21, ..., nk1, n12, n22, ..., nk2, ..., n1m, ..., nkm)


If an application class does not need a specific type of service at all, its corresponding nij will be 0 throughout the system lifetime, and it can therefore be omitted from the state vector. The set of all possible states, S, is given by:

S = { s : nij ≥ 0, 1 ≤ i ≤ k, 1 ≤ j ≤ m, Σ_i nij ≤ N for each j }    (6.5)

Each application class starts from a service and, step by step, executes a sequence of services according to a plan laid out in an execution language such as the Business Process Execution Language. The state space is therefore limited to the states that are valid under the planned execution path. Throughout this chapter, we only consider execution plans with no conditional branches. For example, Figure 6.8 demonstrates a sample scenario consisting of only two classes of applications (k = 2) and two types of services (m = 2). The class 1 application is composed of services 1 and 2; the class 2 application is composed of only service 2. Therefore, the state vector (n11, n12, n22) represents the current state of the system. Let S = {(n11, n12, n22) | 0 ≤ n11 ≤ N, 0 ≤ n12 + n22 ≤ N} be the state space, and let st be the system state at time t. Based on the statistical assumptions, {st, t ≥ 0} is a continuous-time Markov chain whose transitions are the arrival or departure of an application, or the transition from one service to the next according to the execution plan. Ultimately, for each state s, the optimal solution should tell us whether we should accept the next request for each class of application. Thus the action vector is:

a = (a11, a21, ..., ak1, a12, a22, ..., ak2, ..., a1m, ..., akm)    (6.6)

in which aij ∈ {0, 1} is the act of accepting or rejecting a request for a class i application entering the system at service j. Consequently, the action space of the system is A = {a : aij ∈ {0, 1}, 1 ≤ i ≤ k, 1 ≤ j ≤ m}. This action space, however, can be



simplified based on the execution plan of each application class; later in this section, we present a sample action space.

[Figure 6.8: A system with two types of services and two classes of applications.]

We formulate our problem as a Markov Decision Process [102, 103]. Our objective is to maximize the utilization of the services and increase the revenue. Our decision process is to determine how the next request arrival should be treated while the system is in state s. For example, whenever the sample system enters the state (n11, n12, n22), it decides whether it will serve the next request for either class of application or reject it. We assume that rejected requests do not interfere with the system. Therefore, in state s, the possible next actions are to accept requests only for class 1 applications, only for class 2 applications, or for both classes: A(s) = {{0, 1}, {1, 0}, {1, 1}}. For simplicity, we use the following action representation in the rest of this chapter:

A(s) = 0: accept only requests for class 1.


A(s) = 1: accept only requests for class 2.
A(s) = 2: accept requests for both classes.

Our objective is to find an optimal policy for each state that maximizes the reward, which is the weighted sum of the applications currently being served in the system.

6.2.2 Markov Decision Process Formulation

This initial continuous-time Markov Decision Process can be converted into an equivalent discrete-time MDP by applying the uniformization technique [103]. To do so, we define the sampling time c := N Σ_j μj + Σ_i λi. During each sample time, only one

transition can be occurred, which corresponds to either a change in state, or a fictitious event. To maximize the utilization in our problem we try to maximize the reward function which is the weighted sum of different classes of applications in the system. Therefore we use the MDP infinite-horizon discounted reward model [102, 103], and we define our one-step reward function as follows:

R(s, s′) = α Δ^+(s, s′)n_{11} + β Δ^+(s, s′)n_{12} + γ Δ^+(s, s′)n_{22}    (6.7)

Δ^+(s, s′)n_{ij} = max{n_{ij}(s′) − n_{ij}(s), 0}

in which Δ^+(s, s′)n_{ij} denotes the amount of increase in n_{ij} due to the transition from state s to state s′. The optimal discounted value function and the optimal policy can be computed using dynamic programming techniques and the value iteration algorithm [102, 103]:

V_{n+1}(s) = max_a { Σ_{s′} P^a_{ss′} ( R(s, s′) + ε V_n(s′) ) }    (6.8)



in which ε is the discount factor and P^a_{ss′} is the transition probability from state s to state s′ under policy a, with the following values:

• When a request for a class i application arrives and we accept it, the probability is λ_i / c.

• When a request for a class i application arrives and we reject it, the probability (of the corresponding self-transition) is λ_i / c.

• The probability that an instance completes execution of service j is Σ_i n_{ij} μ_j / c.

• The probability of the fictitious event is 1 − Σ_i λ_i / c − Σ_j Σ_i n_{ij} μ_j / c.

We can now recursively compute the sequence of n-stage values V_n(s) using the method of successive approximations [103] and take the limit of this sequence as n goes to infinity. It is shown that V(s) := lim_{n→∞} V_n(s) exists and is the solution of the infinite-horizon discounted problem.

6.2.3 Optimal policy and performance comparison

Based on the presented MDP problem, we computed the optimal policy for the sample system, which is composed of two types of services and two classes of business processes. We found the optimal policies for mean request inter-arrival times λ_1^{-1} = λ_2^{-1} = 60, mean execution times μ_1^{-1} = 30 and μ_2^{-1} = 40, weights (α = −0.1, β = 0.5), and N = 6, and we set ε to 0.99 in Equation 6.8. We obtained the optimal decision in each state for γ = 0.1 and γ = 0.3. To reflect the importance of the continuation of a business process or application, and of not terminating it midway through its execution path, we chose a negative value for α and a positive value for β. The sum (α + β) reflects the importance of a class 1 application or business process relative to a class 2 one, whose importance is represented by γ. We chose a negative value for α because, if the system lets an application enter the system and then, at the completion of its first step, forces it to leave due to the unavailability of a free instance of a service, the system pays a cost of α. Figures 6.9 and 6.10 show the optimal policy for each case when the system is in the state (n_{11}, n_{12}, n_{22}); results are shown for n_{22} = 1 (Figures 6.9a and 6.10a) and for n_{22} = 4 (Figures 6.9b and 6.10b).

Figure 6.9: Optimal policy when the system is in state (n_{11}, n_{12}, n_{22}) and γ = 0.1: a) n_{22} = 1, b) n_{22} = 4

In all figures, '0' indicates that the system accepts only requests for class 1 applications, '1' indicates that it accepts only requests for class 2 applications, and '2' indicates that it accepts requests for both classes of applications. As can be seen, when the weight of a class 2 application is low and plenty of them are currently being served in the system, the decision-making mechanism rejects new requests for class 2 applications (Figure 6.9), thereby reserving the remaining resources for class 1 applications. If, however, the weight of a class 2 application or business process is high, more requests for that class of application are accepted (Figure 6.10). The results also show that if the number of class 2 applications in the system is high, the system should reject new requests for that class of application or business process, and reserve the free instances of services for the other class of application.




Figure 6.10: Optimal policy when the system is in state (n_{11}, n_{12}, n_{22}) and γ = 0.3: a) n_{22} = 1, b) n_{22} = 4

We simulated the described system and compared the performance achieved by the optimal MDP-based partitioning mechanism with two other policies: the Full Commitment Policy (FCP) and the No Commitment Policy (NCP). For FCP, we use the Complete Partitioning (CP) mechanism [101]. In the CP method, the system keeps a constant number of service instances for each application class and does not allocate that portion to any other class of application. In our implementation of the CP method, we divided the resources based on the weight of each class. Figure 6.11 shows the comparison between these three methods for the case (α = −0.1, β = 0.5, γ = 0.1). The x-axis in this figure represents the requests' mean inter-arrival time λ_1^{-1}, with λ_2 = λ_1, as λ_1^{-1} varies from 8 to 60.

The y-axis represents the system revenue, or reward, which is the weighted sum of the number of applications currently being served in the system under each of the partitioning policies.




Figure 6.11: Performance comparison between the No Commitment Policy, Full Commitment Policy, and MDP-based partitioning mechanisms (α = −0.1, β = 0.5, γ = 0.1)

As can be seen, the MDP-based partitioning policy outperforms the other two mechanisms, especially when the request rates are high. When the request rate is low (inter-arrival time is high), there is no significant difference between FCP, NCP, and MDP-based partitioning. However, when the load is high, MDP-based partitioning yields 60% more reward than the No Commitment Policy and 30% more reward than the Full Commitment Policy. Another experiment that we carried out concerned the service execution time distribution. So far, we have used the exponential distribution for the service execution times. For some types of services, however, this assumption might not be accurate. Using the exponential distribution, we can assume memoryless properties for the problem and, consequently, use the Markov Decision Process approach to obtain the optimal policy; the exponential distribution is also helpful in studying the problem's behavior in the mean sense. Therefore, we examined how effective the computed policy would be under another type of distribution for the service execution times. To do so, we assumed a Beta distribution for the execution time of each service instance.




Figure 6.12: A sample beta distribution


Figure 6.13: Performance comparison between the No Commitment Policy, Full Commitment Policy, and MDP-based partitioning with a beta distribution for service execution times (α = −0.1, β = 0.5, γ = 0.1)



The Beta distribution has some interesting properties that make it a good candidate for modeling many types of services and processes [104]. Figure 6.12 shows the beta probability density function that we used for this experiment. As can be seen, using this distribution we can assume an optimistic estimate, a pessimistic estimate, and a most likely estimate of the service execution time. Based on these assumptions, we replaced the execution times of both services with beta distributions having the same means as the exponential distributions, μ_1^{-1} and μ_2^{-1}. Figure 6.13 shows the result of this experiment. As can be seen, the optimal policy found for the exponential distribution achieves satisfactory results for the Beta distribution case as well. The Beta parameters used for this experiment are α = 2.33, β = 4.66, m_1 = 30, m_2 = 40. In this section, we presented optimal policies for making admission decisions in service-oriented systems. These policies, however, are suitable only for small-scale systems, since their computation becomes infeasible for large-scale systems. Moreover, these policies are for service-oriented systems that are exposed to stationary Poisson request arrival processes. In the next chapters, we extend this work and propose heuristics that can operate distributedly in large-scale systems and can handle both stationary and non-stationary demands.
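As a small illustration of this experiment's setup, the sketch below draws service execution times from a scaled beta distribution whose mean matches the exponential case; scaling the support of a standard Beta(α, β) variate is our assumption about how the mean matching can be done.

    import numpy as np

    # A sketch (not the thesis code) of drawing service execution times from
    # a beta distribution whose mean matches the exponential case. The
    # standard Beta(a, b) on [0, 1] has mean a / (a + b); stretching its
    # support by a factor s gives mean s * a / (a + b), so we pick s to hit
    # the desired target mean.
    rng = np.random.default_rng(0)

    def beta_exec_times(target_mean, a=2.33, b=4.66, n=10000):
        scale = target_mean * (a + b) / a   # stretch the support to hit the mean
        return scale * rng.beta(a, b, size=n)

    x1 = beta_exec_times(30.0)   # replaces the exponential with mean 30 (service 1)
    x2 = beta_exec_times(40.0)   # replaces the exponential with mean 40 (service 2)
    print(x1.mean(), x2.mean())  # both should be close to the targets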

Chapter 7

A Distributed Probabilistic Commitment-Control Algorithm

In the previous chapter, we introduced the problem of optimal allocation of services to applications, and we proposed a Markov Decision Process [103] approach to solve it. In that chapter, we first studied this problem assuming that applications require all corresponding service instances throughout their lifetime [11]. We also assumed exponential distributions for the applications' request inter-arrival times and execution times. We next addressed the case in which applications do not need all corresponding service instances throughout their lifetime [12], again assuming exponential distributions for application inter-arrival times and service execution times. In this chapter, we propose an algorithm for the problem of service commitment with the following desirable properties: the proposed heuristic algorithm does not limit the distributions of service execution times and application request inter-arrival times to any specific type, and it can be implemented in a distributed and scalable environment. Moreover, the heuristic algorithm can guarantee an important QoS parameter in a service-oriented environment. A key challenge in application creation through service composition is to guarantee the



quality of service of the created applications [97, 105, 106, 107, 108]. Guaranteeing quality of service in a service-oriented environment has received increasing attention as new types of large-scale applications are built on this paradigm [109, 110, 111]. We consider the QoS guarantee problem using an important QoS metric for composite applications: the probability of successful completion or, equivalently, its complement, the probability of failure. We propose a Distributed Algorithm for Service Commitment (DASC) that guarantees this QoS parameter. This algorithm can be part of a service-oriented system that orchestrates the execution of composite applications such as business workflows, telecommunication applications, or mixed IT/telecommunication applications. The orchestrator system's main task is to invoke different services according to the application's execution plan. Generally, each of the invoked services has an execution time that is stochastic [112]. In order to guarantee successful completion of an application, a service component provider has to provide a characterization of this stochastic behavior to the system, and the system has to consider this behavior in admitting requests for composite applications. If a system overlooks these stochastic characteristics it may excessively invoke a service, causing it to: serve an excessive number of application instances, resulting in performance degradation for applications in the system; refuse to serve some application instances; or queue instances, resulting in unwanted delays in application execution. To avoid these undesirable events, the system includes an admission control mechanism to control its commitments to application instances and to guarantee the probability of successful completion. The design of an admission controller depends on the properties of the demand for the applications. For example, if the demand is stationary (stationary arrival rate and stationary service times), the admission control can be designed using off-line and steady-state analyses. In this case, techniques and approximations such as decomposition-based



methods [113, 114, 115] can be used to find the acceptable region of request arrival rates, and admission mechanisms are then used to enforce the arrivals using rate regulators. If the demand is non-stationary, other techniques are required for admission control. For example, for each arriving application request, an online admission controller can calculate the likelihood that all service components of the application can be completed given the current state of the system. The DASC algorithm is designed to operate in non-stationary demand environments (namely, with non-stationary request arrivals), and it uses a predictive model that delivers a target level of the probability of successful completion for admitted application instances. DASC does not assume any specific distribution type for applications' request inter-arrival times, and it is capable of functioning in a distributed and scalable environment. DASC assumes that the service execution time distributions in a service-oriented system are known and remain unchanged with time. We present two versions of the DASC algorithm: one with no queuing permitted for application instances, and one, discussed later in this chapter, where the system queues application instances instead of dropping them. We present simulations showing that without a commitment control mechanism, the successful application completion probability can be very low. We also show that our algorithms are able to meet the QoS goals. Moreover, we compare DASC performance with alternative steady-state based admission controllers, and we show that DASC performs better, especially where demand is bursty and non-stationary. The chapter is organized as follows. In the next three sections, we state the problem of service commitment in a service-oriented environment, discuss the mathematical basis of the problem, and present the corresponding modeling and formulation. In Section 4, we present the DASC algorithm, followed by performance evaluation results in Section 5. We then extend the proposed algorithm to systems that can provide a limited number of queuing spots for services.




Figure 7.1: A sample service-oriented environment

We present the modifications to the formulation and the performance evaluation results. Finally, we review the related work and discuss our contribution to this problem. Another issue that we do not discuss in this chapter is system revenue maximization. In this chapter, we assume that all application classes generate the same revenue for the system, and we are only interested in guaranteeing the quality of service. In the last chapter of this part, we propose techniques that accompany the DASC algorithm to maximize system revenue by admitting more valuable application classes and rejecting less valuable ones.

7.1 QoS Control in a Service-Oriented System

We are interested in guaranteeing the probability of successful completion of an application in a service-oriented environment. To clarify the problem we begin with a simple example. Figure 7.1 shows a service-oriented environment in which two different application classes use two different service types. In this example, application class one begins by executing service one followed by service two. Application class two merely uses service type two. The problem of interest in this chapter arises when there are contending requests for shared services. For example, a high number of requests for application class two might



result in consumption of all available service two instances. Consequently, application one instances completing service one would fail to continue their execution because service two instances are not available. Thus, application one instances have to either leave the system without completion or face unwanted delays. To avoid this problem, a service-oriented system can use an admission control mechanism to control its service commitments by only admitting application instances when they are highly likely to complete successfully. To design an admission controller for a service-oriented system, we need to consider the environment in which it operates. In a stationary environment with stationary demand, we can design an admission controller using off-line, steady-state analyses of the system. In such an environment, we can find the acceptable arrival rate region in which the successful completion probability can be guaranteed in the steady state. To enforce operation within this acceptable region, we can use token bucket regulators at the portal to the service-oriented system. Although, to the best of our knowledge, there are no exact closed-form solutions for the associated Finite Capacity Queuing Network (FCQN), there are in general a variety of approximation and decomposition-based [113, 114, 115], simulation-based [116], or bottleneck analysis methods [117] that can be used to find the region of acceptable request arrival rates in the steady state. When demand is non-stationary, we need a different approach that can make admission decisions based on the current state of the system, including the application instances currently being served. The design of this type of system requires a transient-state analysis and involves on-line decision making based on these transient-state analyses. In this chapter, we present DASC as an algorithm that is able to handle non-stationary arrivals in a distributed and scalable environment. In DASC, each service component has an agent that tracks service component usage as well as future commitments. When a new request for an application arrives at a service-oriented system (for example, at a service-oriented orchestrator engine), DASC first queries all the corresponding service component agents in parallel. Each agent



accepts or rejects the admission of that request based on the current state and its anticipation of future usage, and the admission controller then makes the final admission decision according to the agents' responses. We will show that DASC achieves higher application throughput than steady-state based approaches when the service-oriented system is exposed to bursty request arrivals, while still guaranteeing the application successful completion (or failure) probability. DASC requires knowledge of the application execution plans. We assume that in a service-oriented environment the structure of an application, in terms of its logical service components and their inter-connections, is known. This is not an unreasonable assumption, especially since the service components involved in an application and the application execution flow are known in most SO-based applications. DASC also requires the probabilistic properties of service execution times. This allows DASC to anticipate an application's future service usage and commit the necessary service instances to each admitted application instance.


Figure 7.2: Composition operations

To model a service-oriented system for the DASC algorithm, we first need to present some definitions. Assume that in a service-oriented system there are L different application



classes A_i (i = 1, ..., L); there exist M different service types represented by S_j (j = 1, ..., M), and each service type has N_j instances. Also, S represents the set of all service types, S = {S_j : 1 ≤ j ≤ M}, and U_i ⊆ S represents the set of services required for the creation of application i based on the composition function C_i(U_i). The composition function C_i uses basic operators for service composition:

Definition 1) In a service-oriented environment, the services can be composed using five types of operations (four of which are shown in Figure 7.2):

a) Sequential operation ⊗: S_j ⊗ S_k means that service S_k will be executed after the completion of execution of service S_j.

b) Conditional operation ◯: S_j ◯ S_k means that the system executes either service S_j or service S_k. We assume that the probability of choosing service S_j is p_j and of choosing service S_k is p_k, with p_j + p_k = 1.

c) Parallel operation ⊕: S_j ⊕ S_k means that services S_j and S_k will be executed in parallel, and the output will not be available until both services finish their execution.

d) Loop operation ⊗_l: ⊗_l S_j means that the system must execute l sequential iterations of service type j before continuing the execution of the application.

e) End operation ⊙ for the end of execution.

The sequential operator is also used as the fork and join operator together with the parallel and conditional operators.

We use these operators to analyze application instance execution times in terms of each service's execution time, according to the application's execution plan.

Definition 2) An application execution path (or execution path) is a path that a given instance of an application follows, starting from one service type and ending with another service type. By definition, there are no conditional operators in one execution path of an application.

Definition 3) An application execution plan (or execution plan) is a plan that outlines all execution paths of an application, including all of its conditional operations, and



describes the sequence of service executions of an application from start to finish. Our goal in this chapter is to present an algorithm that can guarantee the probability of application completion. In other words, we would like to have:

P_{f_i} ≤ π,  ∀ i ∈ {1, ..., L}    (7.1)

where P_{f_i} is the probability of failure of one instance of application class i, and π is an agreed-upon threshold. Furthermore, this algorithm should be scalable, capable of operating in a distributed environment, and should not involve excessive computation overhead.

7.2 Probabilistic Modeling of Service Commitment

In this section we consider random variables corresponding to the execution time of an application. For simplicity of notation, we assume in this section that the service components appearing in the execution plan of an application are numbered S_1, S_2, ..., and that each such service appears only once in the plan, so that the corresponding execution times can be unambiguously denoted by X_1, X_2, .... The execution time of a service j instance is a random variable X_j with probability density function (pdf) f_j(t) and cumulative distribution function (cdf) F_j(t). The pdf of the application execution time for one path can be computed from a combination of the pdfs of the corresponding services in that path. We begin with the sequential operator. The execution time of S_j ⊗ S_k is a random variable Y_⊗ that is the sum of the two random variables representing the services' execution times, Y_⊗ = X_j + X_k, with pdf f_{Y_⊗}(t) = f_j(t) ∗ f_k(t), assuming independence between execution times and no waiting time between the execution of two consecutive services. Similar to the sequential operation, the execution time of the loop operation on one



service is the l-fold convolution of that service's execution time pdf. In other words, the execution time of ⊗_l S_j is Y_{⊗l} = Σ_{c=1}^{l} X_j^c, in which the X_j^c are i.i.d. random variables with pdf f_j(t). Thus the pdf of the loop operation is f_{Y_{⊗l}}(t) = f_j^{(l)}(t), which denotes the l-fold convolution of f_j(t).

The execution time of the parallel operator S_j ⊕ S_k is a random variable Y_⊕ equal to max(X_j, X_k), with pdf f_{Y_⊕}(t) = F_k(t) f_j(t) + F_j(t) f_k(t). The execution time of the conditional operator S_j ◯ S_k is the random variable Y_◯, which equals X_j with probability p_j and X_k with probability p_k, where p_j + p_k = 1, with pdf f_{Y_◯}(t) = p_j f_j(t) + p_k f_k(t).

For an instance of application i, suppose we start service j at time zero and consider the time until we complete a subsequent service k. Let h_{ijk}(t) denote the pdf for this elapsed time and let H_{ijk}(t) be the corresponding cdf. Now suppose we are interested in the probability that, having started service j at time zero, the application execution is in service k at time t. Let m be the service that precedes service k in the execution plan; then the application execution is in service k at time t if:

1. the application execution has completed service m by time t; and

2. the application execution has not yet completed service k by time t.

This implies that the probability that, having started service j at time zero, the application execution is in service k at time t is given by:

G_{ijk}(t) = H_{ijm}(t) − H_{ijk}(t)    (7.2)

See Appendix C for a derivation of this result. Also, the probability that application i, having just started executing service j, is still at the same service at time t is simply G_{ijj}(t) = 1 − H_{ijj}(t) = 1 − F_j(t). Similarly, we can compute G_{ijk}(t|t_0), the probability that an application class i instance that has already been in service j for t_0 seconds will be at service k at time t, by replacing



the pdf f_j(t) with its corresponding conditional pdf f_j(t|t_0) = f_j(t − t_0)/(1 − F_j(t_0)). Using G_{ijk}(t), we can now characterize the random variable for the number of busy instances of any service at any given future time t. Define an indicator function for the event that the application execution will be in service k at time t, having started at service j at time zero:

I_{ijk}(t) = 1 with probability G_{ijk}(t), and I_{ijk}(t) = 0 with probability 1 − G_{ijk}(t)    (7.3)

For applications with multiple execution paths, the probabilities in the above indicator function need to be multiplied by p_{ijk}, the probability that application i, now in service type j, will visit service type k in the future according to its execution plan. Similarly, for application instances that have already been in service j for t_0 seconds, the probabilities in the indicator function (7.3) are replaced with their conditional versions G_{ijk}(t|t_0). The number of busy service instances of type k at time t is found by adding the indicator functions of all application instances in the system of any class i (index i_l) that are being served by some service in the system (index j) at time 0 and can therefore be at service type k at time t:

S_k(t) = Σ_l I_{i_l jk}(t)    (7.4)

We can now specify the probability of over-commitment in service type k at a future time t, P_{oc_k}(t): the probability that the number of admitted applications needing service of type k at time t exceeds the number N_k of service instances that have been provisioned. In other words, P_{oc_k}(t) is the probability that service type k is over-committed at a future time t due to the admission of too many applications:


P_{oc_k}(t) = P{ S_k(t) > N_k }    (7.5)

In the DASC algorithm, the system computes this probability at the time of receiving a request for an application to ensure that the system is highly likely to have the necessary free instances to serve the application instance in each of the succeeding service types along the application's execution paths for the time needed. Furthermore, DASC needs to compute the above probability for any future time t in order to meet an agreed service level (T_{oc}). P_{oc_k} is a major parameter in our algorithm, and we discuss its computation in later sections. In contrast to other admission control systems, the incoming request rate is not a factor in computing this probability; we only require that the service execution time distributions remain unchanged with time. For this reason our proposed algorithm can operate in systems with bursty or non-stationary request arrivals, and can handle transient surges in the application request rate without compromising the service-level agreements. We note that in this model not only can different service components have different execution time distributions, but a single service type can also have different execution time distributions for different application classes. However, in the rest of this chapter, for the sake of simplicity, we assume that each service component has only one execution time distribution for all application classes. Consider the application failure probability: the probability that an application instance cannot complete its execution plan due to the unavailability of a free instance of service type k at the time the application needs service k. We call this probability P_{f_{ijk}}, and it can be computed as follows:

P_{f_{ijk}} = ∫_0^∞ P_{oc_k}(t) h_{ijm}(t) dt    (7.6)

in which h_{ijm}(t) is the pdf of the time to complete the execution of all services preceding



service type k (up to service m), or equivalently, the pdf of the start of the execution of service type k. The DASC system always keeps P_{oc_k}(t) below an over-commitment threshold T_{oc}, so an upper bound for the application failure probability is the over-commitment threshold:

P_{f_{ijk}} ≤ T_{oc}    (7.7)

In other words, T_{oc} is the upper bound for the failure probability of a class i application at service k if it starts its execution from service j. Consequently, to find the upper bound on the total application failure probability, we need to consider the failure probabilities at all services based on the application execution plan, as follows:

P_{f_{ij}} = Σ_{k=j+1}^{l} ( P_{f_{ijk}} ∏_{m=j+1}^{k−1} (1 − P_{f_{ijm}}) )    (7.8)

where we assume that the last possible service is service l. Each term in the above sum is the probability that the application execution fails at service k. By taking partial derivatives of the above equation, it can be shown that P_{f_{ij}} is a monotonically increasing function of P_{f_{ijk}}. Therefore, an upper bound for P_{f_{ij}} can be obtained by applying the upper bound T_{oc} for P_{f_{ijk}}:

P_{f_{ij}} ≤ 1 − (1 − T_{oc})^{l−j}    (7.9)

in which (l − j) represents the maximum number of services that an application i instance has to traverse to complete its execution. In the next section, we focus on the over-commitment probability, which we compute using the Central Limit Theorem (CLT).
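For illustration, the following sketch evaluates the failure-probability composition (7.8), the bound (7.9), and the inversion of (7.9) that yields a per-service threshold T_oc from a target total failure probability π; the example path length is an assumption chosen to roughly reproduce the threshold used later in this chapter.

    # A small sketch of the failure-probability composition (7.8) and the
    # bound (7.9); the per-service failure probabilities are assumed given.
    def total_failure(per_service):
        """P_f_ij from (7.8): fail at the first service with no free instance."""
        p_total, p_survive = 0.0, 1.0
        for p_k in per_service:          # services j+1 .. l along the path
            p_total += p_survive * p_k
            p_survive *= (1.0 - p_k)
        return p_total

    def failure_bound(t_oc, path_len):
        """Upper bound (7.9) with (l - j) = path_len remaining services."""
        return 1.0 - (1.0 - t_oc) ** path_len

    def threshold_for(pi_target, path_len):
        """Invert (7.9): the T_oc needed to keep total failure below pi."""
        return 1.0 - (1.0 - pi_target) ** (1.0 / path_len)

    # e.g. pi = 1e-2 over an assumed path of 7 services gives roughly the
    # per-service threshold of 1.5e-3 used in the experiments of this chapter
    print(threshold_for(1e-2, 7))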


7.3 Computing Over-Commitment Probability

The random variable in (7.3) is a Bernoulli random variable at time t, and S_k(t) in (7.4) is the sum of multiple non-identically distributed Bernoulli random variables. Therefore, we can compute the mean and variance of the indicator function as:

E[I_{ijk}(t)] = G_{ijk}(t),  VAR[I_{ijk}(t)] = G_{ijk}(t) − G_{ijk}(t)^2

Consequently, the mean and variance of the sum random variable are:

η_k(t) = E[S_k(t)] = Σ_l E[I_{i_l jk}(t)] = Σ_l G_{i_l jk}(t)    (7.10)

σ_k(t)^2 = VAR[S_k(t)] = Σ_l VAR[I_{i_l jk}(t)] + Σ_l Σ_{l′} COV(I_{i_l jk}(t), I_{m_{l′} nk}(t))    (7.11)

in which l and l′ index the application instances currently in the system. Now imagine that there is an unlimited number of servers available to support each service type. If so, the application instances will all flow along their execution paths without having to contend with each other for servers, and so they will not interact at all. Consequently, their corresponding indicator functions are independent random variables. Because the over-commitment probability will be small, we can suppose the number of servers of each type is ample. Therefore we assume that the Bernoulli random variables in equation (7.4) are independent, so the above covariance



terms will be zero and the variance of the sum random variable will be:

σ_k(t)^2 = Σ_l VAR[I_{i_l jk}(t)] = Σ_l G_{i_l jk}(t) − Σ_l G_{i_l jk}(t)^2 = η_k(t) − Σ_l G_{i_l jk}(t)^2    (7.12)

We know from the Central Limit Theorem (CLT) [118] (p. 278) that the sum of n independent random variables approaches a Gaussian random variable whose mean and variance equal the sums of the means and variances, respectively, of all the random variables. Therefore, the over-commitment probability can be approximated using the CLT as:

P_{oc_k}(t) = P{ S_k(t) > N_k } = 1 − Φ( (N_k − η_k(t)) / σ_k(t) )    (7.13)

in which Φ is the cdf of a Gaussian random variable with mean η = 0 and variance σ^2 = 1. The CLT approximation of the over-commitment probability becomes more accurate as the number of application instances in the system and the number of service instances grow large, which is the case in many real systems. In summary, whenever an application request enters the service-oriented environment, the application admission control system computes the probability of over-commitment at any time during the application's lifetime for every service along its execution path, and if that probability is less than the permitted threshold (T_{oc}), it allows the application request to enter the system. Another technique for computing the over-commitment probability is to use the theory of large deviations and Chernoff's bound. Chernoff's bound enables us to find better approximations of the probability when the target threshold T_{oc} is very small, while the CLT-based technique gives good approximations for target thresholds in the range of 10^{-3} and higher. In the Appendix, we discuss this alternative method for computing this probability.
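As a concrete rendering of the CLT-based check, the sketch below computes the over-commitment probability (7.13) from a list of precomputed G values and applies the threshold test over a horizon of time points; the function names and input format are our assumptions.

    from math import erf, sqrt

    # A sketch of the CLT approximation (7.13) of the over-commitment
    # probability at a single future time t. The list g holds the values
    # G_{i_l jk}(t) for every application instance l currently in the
    # system; here they are assumed to be precomputed.
    def over_commit_prob(g, n_k):
        eta = sum(g)                            # mean, eq. (7.10)
        var = eta - sum(x * x for x in g)       # variance, eq. (7.12)
        if var <= 0.0:
            return 0.0 if eta <= n_k else 1.0   # degenerate (all 0/1) case
        z = (n_k - eta) / sqrt(var)
        phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal cdf
        return 1.0 - phi                        # eq. (7.13)

    # The admission test at one agent: evaluate P_oc_k(t) over the horizon
    # of interest (one list of G values per time point) and veto the request
    # if the threshold T_oc is exceeded anywhere.
    def agent_accepts(g_by_time, n_k, t_oc=1.5e-3):
        return all(over_commit_prob(g, n_k) <= t_oc for g in g_by_time)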




Figure 7.3: A service-oriented system with three agents, each controlling one service type

In the next section, we present the Distributed Algorithm for Service Commitment in more detail, and we discuss how this system can be implemented in a distributed environment.

7.4 Distributed Algorithm for Service Commitment

Figure 7.3 shows the decentralized implementation of the service commitment function. Each service type is controlled by one agent, whose task is to monitor the instances of that service type. Whenever an agent starts serving an application instance, it informs the agents responsible for the succeeding service types that it has just started the execution of that instance. The recipient agents store the relevant information regarding each particular application instance and use it to compute their over-commitment probabilities every time a request for an application arrives. In other words, the agent for service type k computes the parameters of the random variable S_k(t) for all future times t. In this distributed algorithm, when the admission controller receives a request for an




Figure 7.4: Distributed Algorithm for Service Commitment in SDL (Specification and Description Language)



application, it asks the corresponding agents whether they will have enough resources to serve that application during the period in which it is anticipated to be served by their associated service types. Since all agents keep records of the applications that are likely to use their service type, they can answer the query with a 'yes' or 'no' reply. If the replies are all yes, the admission controller admits the application and tells the corresponding agents to commit the necessary resources for the just-admitted application instance. It is noteworthy that the agents do not need to compute the relevant distribution functions each time they receive a query from their preceding agents. Those distributions can be provided to each agent by another computing module in the system, and the agents can store them in memory and use them as the need arises. Also, to avoid over-commitment, the queried agents, upon arrival of a query and upon accepting to serve the application, can temporarily commit their resources for a limited time until they receive another message from the admission controller confirming the acceptance or rejection of the application request. The SDL (Specification and Description Language) [119] description depicted in Figure 7.4 presents the DASC algorithm. The admission controller queries the corresponding agents upon receiving a request to admit an application instance. The SDL description shows the messages exchanged between the admission controller and the agents responsible for the services used in creating an application, as well as the agents' internal states and their interactions in DASC.
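The sketch below illustrates this message flow under assumed class and method names: the admission controller queries every corresponding agent, agents place temporary resource holds that expire unless confirmed, and the request is admitted only if all agents reply yes. The probabilistic check itself is stubbed out, standing in for the computation of Section 7.3.

    import time

    # An illustrative sketch (names and structure are assumptions) of the
    # DASC message flow of Figure 7.4.
    class Agent:
        def __init__(self, service_type, n_instances):
            self.service_type = service_type
            self.n = n_instances
            self.temp = {}                    # request_id -> hold expiry time

        def check_over_commitment(self, request_id, hold=1.0):
            # Reply yes and hold resources if admission keeps P_oc below T_oc
            self._expire()
            if self._over_commitment_ok():    # CLT/Chernoff test (Section 7.3)
                self.temp[request_id] = time.time() + hold
                return True
            return False

        def commit(self, request_id):         # controller confirmed admission
            self.temp.pop(request_id, None)
            # ...fold the instance into the future-usage estimate S_k(t)...

        def release(self, request_id):        # controller rejected the request
            self.temp.pop(request_id, None)

        def _expire(self):
            now = time.time()
            self.temp = {r: e for r, e in self.temp.items() if e > now}

        def _over_commitment_ok(self):
            return True                       # placeholder for the real check

    def admit(request_id, agents):
        # Controller side: query all corresponding agents, then confirm or
        # release the temporary commitments according to the joint answer.
        answers = [a.check_over_commitment(request_id) for a in agents]
        admitted = all(answers)
        for agent in agents:
            (agent.commit if admitted else agent.release)(request_id)
        return admitted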

7.4.1 DASC Complexity Analysis

DASC is a distributed algorithm in which each agent is responsible for controlling one service type. Therefore, for complexity analysis, we focus on one agent and investigate its processing and memory requirements. If we represent the maximum lifetime of the longest-living application in the system



by T, then the memory needed to store the future estimate of instance usage for a service type is O(T). Further, an agent has to store some information for each application instance in the system that might use its service in the future. If we represent the maximum number of application instances in the system by N_a, then the memory for storing application-specific data is of order O(N_a). In total, therefore, each agent needs memory of order O(T + N_a) to store the data required by the algorithm. In addition, for each incoming request an agent has to compute the over-commitment probability over the maximum duration of the longest-living application in the system, so the processing complexity for each agent is of order O(T). Since the algorithm is distributed, we need to analyze the communication overhead as well. In DASC, the admission controller has to query the corresponding agents in order to admit a request, so the communication overhead is of order O(K), in which K is the maximum number of services in the system. Also, as an application proceeds through its execution in the system, each agent is required to notify the succeeding agents of the latest change in the instance's location. Therefore, the total communication overhead for an admitted application instance is of order O(K(K − 1)); for K = 12, for example, this is at most 132 messages per admitted instance. In the system with 12 service components presented in the performance evaluation section, when a request enters the system, the admission controller communicates with 12 other agents (in the worst case) to make an admission decision. These 12 messages are sent and processed in parallel, and each combined communication and computation takes less than 10 ms, so the total decision-making time is less than 10 ms. We believe that for many systems and applications this decision-making time is quite acceptable. In addition, in this system, agents need a maximum memory of 500 KB each. The number of exchanged messages could be reduced if the bottleneck services in a system were identified using an off-line analysis, and only the agents responsible for those services were queried when making the decision. In some systems, this reduction would be significant if only a small portion of the services are bottleneck services.




Figure 7.5: Beta pdf for service execution time with parameters α = 2.333 and β = 4.666

7.5 DASC Performance Evaluation

In this section, we present performance evaluation results for our proposed algorithm on two different systems. The performance metric of interest is the application failure ratio: the ratio of the number of failed applications to the number of applications admitted to the system. We would like this ratio to be less than the threshold set for the failure probability. We also evaluate the application failure ratios in each of the service types. Moreover, we compare the DASC algorithm against steady-state based admission control systems in terms of application failure ratio as well as application throughput. We begin by simulating the simple system described in the first section and depicted in Figure 7.1, which is composed of two service types and two application classes. We assume that service provisioning has been performed and that 100 instances of each


Figure 7.6: Application failure ratio for a system with two application classes and two service types

of service types one and two have been provisioned. We also assumed identical beta distributions for the service execution times of both services (Figure 7.5). We chose the Beta distribution since it can represent many pdf shapes and hence is useful in modeling many types of services [104]. The Beta pdf parameters used in our experiment are α = 2.333 and β = 4.666. For generating application request inter-arrival times, we used a geometric distribution with parameter p ranging from 0.01 to 0.1 in steps of 0.01. Figure 7.6 shows the application one failure ratio at service two for four different cases. In the first case there is no commitment control in place; in the other three cases we applied the DASC algorithm with thresholds of 0.5, 0.1, and 0.01. It is evident that without the DASC algorithm the performance is very poor, i.e., more than 50% application failure at high request rates. By applying the DASC algorithm, however, the system can achieve its target QoS even when the request rate is high. We compared the DASC performance with an alternative steady-state based admission control mechanism, designed using bottleneck analysis of the system [117]. In this method, we identify the bottleneck service (S2 in our system), and




Figure 7.7: Comparing DASC throughput with bottleneck-based admission control algorithm

we approximate the system performance by the bottleneck service performance. At the bottleneck service, we used the Erlang-B formula to find the acceptable region of arrival rates to the system, in the steady state, that contains the probability of overflow at S2 at the target levels of 10^{-2} and 10^{-3}. We then use token regulators on the request arrival processes to enforce the acceptable arrival rates. We compared the throughput of a system controlled using Erlang-B and a system controlled by the DASC algorithm, for geometric request arrivals (Figure 7.7a) and for on-off bursty request arrivals (Figure 7.7b). For the bursty arrivals, we generated a burst of request arrivals using a geometric distribution with parameter 0.01 for a period of T, followed by another burst of arrivals with parameter 0.1 for a period of T. Figure 7.7b shows the application throughput under the on-off bursty arrival process as a function of T. It can be seen that DASC outperforms the steady-state based admission control system in both the stationary and bursty arrival cases in terms of application throughput, while meeting the target QoS. This improvement is more significant in the bursty case, since DASC makes the admission decision based on the current state of the system and its anticipation of future usage.
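For reference, the following sketch shows how such an Erlang-B based acceptable arrival rate can be computed; the recursion is the standard one, while the bisection search and the example numbers are our assumptions rather than the thesis code.

    # A short sketch of the Erlang-B recursion used to size the acceptable
    # steady-state arrival rate at the bottleneck service (an illustration
    # of the comparison baseline, not the thesis implementation).
    def erlang_b(n_servers, offered_load):
        """Blocking probability B(N, A) via the standard recursion."""
        b = 1.0
        for n in range(1, n_servers + 1):
            b = offered_load * b / (n + offered_load * b)
        return b

    def max_rate(n_servers, mean_service_time, target_block):
        """Largest arrival rate whose blocking stays under the target."""
        lo, hi = 0.0, 10.0 * n_servers / mean_service_time
        for _ in range(60):           # bisection on the monotone blocking
            mid = 0.5 * (lo + hi)
            if erlang_b(n_servers, mid * mean_service_time) <= target_block:
                lo = mid
            else:
                hi = mid
        return lo

    # e.g. 100 provisioned instances, mean execution time 30, target 1e-2
    print(max_rate(100, 30.0, 1e-2))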




Figure 7.8: A service-oriented environment consisting of twelve service types and three applications




Figure 7.9: Applications failure ratios in the system

Our simulations also show that the choice of target failure probability affects system throughput: the lower we set this target, the more conservative the system becomes in admitting requests for applications, leading to a reduction in application throughput. Also, as we increase the number of service instances, the approximations become more accurate, mainly because the CLT becomes more accurate. Next we simulated the more complex system depicted in Figure 7.8, which consists of twelve service types and three application classes. The applications in this system have sequential, conditional, and parallel operations, and application class two has one loop operation. Again, we assume that provisioning has been performed and that 200 instances of each service type have been provisioned. For the service execution times, we assumed identical beta distributions for all twelve service types. We set the threshold for the total application failure to 10^{-2} and used bound (7.9) to set the threshold for the over-commitment probability of each service type (T_{oc}) to 1.5 × 10^{-3}. The parameters evaluated in this simulation are the total application failure ratio and the application failure ratios in each service type separately. Moreover, we compared DASC performance on this system with three other admission control mechanisms.




Figure 7.10: Failure ratios in services 1 to 6 vs. applications request rates


Figure 7.11: Comparison between four admission control mechanisms with stationary request arrivals




Figure 7.12: Comparison between four admission control mechanisms with on-off bursty request arrivals and burst time (T)

The simulation period consisted of 750,000 time units. For generating application requests, we used geometric distributions with parameter p ranging from 0.01 to 0.1. In this sample service-oriented environment, our analysis shows that the bottleneck service is S3. Therefore, we computed the required parameters for values of p ranging from 0.01 to 0.1 in steps of 0.01, which covers request rates from low up to a rate that loads the system with twice its provisioned capacity at the bottleneck service. Figure 7.9 shows that even under very high request rates the total application failure ratio using DASC remains under the guaranteed level of 10^{-2}. We also measured the individual application class failure ratios at each service component. Figure 7.10 shows these measured failure ratios at services S1 to S6; the measured ratios are all below the target threshold (1.5 × 10^{-3}) even under very high request rates. We also compared the DASC performance against three other admission controllers under both stationary and non-stationary request arrivals. Two of the admission controllers are token bucket regulators that enforce an acceptable region of arrival rates on the arrival process. In one of these, the acceptable region is obtained using the bottleneck



analysis of the system and applying the Erlang-B formula as described before. The other admission controller uses simulation-based techniques to find the best steady-state arrival rates that maximize the throughput while keeping the failure probability below the target threshold of 10^{-2}. The third controller does not apply any admission control on the arrival process, and admits a request for an application if there exists a free instance of the first service component in its execution plan. Figure 7.11 shows the measured application throughput and failure ratios for stationary arrivals as a function of the request rate, and Figure 7.12 shows these parameters for the on-off bursty request arrivals as a function of the burst period (T). DASC outperforms the other mechanisms in both cases in terms of total throughput, and is able to meet the QoS target. With stationary arrivals and high request rates, this improvement is approximately 20% compared to the next best method (simulation-based). Note that the bottleneck approach is overly conservative and provides lower throughputs and very low application failure ratios. The improvement becomes much more visible with non-stationary request arrivals. This is mainly because DASC is able to take advantage of the "openings" through transient-state analysis of the system. The DASC throughput is higher than the other methods when the burst period is large, and it achieves throughput comparable to the no-commitment algorithm when the burst period is small, while still meeting the target QoS. Interestingly, due to the transient-analysis property of the DASC algorithm, for some low burst periods DASC can find more openings, and hence achieve a higher throughput than for other low burst periods, while still keeping the failure probability below the threshold.


7.6 Queue-enabled Distributed Algorithm for Service Commitment

In the previous sections, we presented the Distributed Algorithm for Service Commitment (DASC) as an application admission control mechanism for service-oriented environments that is able to guarantee the probability of successful completion for admitted application instances. So far, we have assumed that the system is not allowed to queue application instances: if an application instance finds no free instance of a service at the time it needs that service, the application instance leaves the system. In this section, we modify our algorithm so that a service offers a small number of queuing spaces to mitigate application failures. We allow queuing, but keep its usage under an agreed level. The number of queuing spaces required in a DASC-controlled queue-enabled system is very small compared to the number of service instances, since the DASC algorithm keeps the probability of over-commitment very low. For instance, if the threshold for the probability of over-commitment is T_{oc} and the total number of instances of a service type is N, then we need roughly T_{oc} N queuing spaces to mitigate the application failures. Since T_{oc} is usually very low, the number of queuing spaces is significantly smaller than the total number of service instances. This section is organized as follows. In the next subsection, we present the modifications to the formulation that accommodate the queuing capability in the system. Then, we discuss the extensions to the distributed algorithm that enable it to make the admission decision in a distributed environment, followed by the performance evaluation. In the appendix, we present a set of theorems and corollaries that are used in obtaining the required parameters in the Queue-enabled DASC algorithm (Q-DASC) and are referred to in the formulation and algorithm subsections.


7.6.1 Problem Formulation and Description

To apply this extension to DASC, we first give a brief analytical description of the parts of the DASC algorithm that we need to modify. One of the main parameters in DASC is G_{ijk}(t), the probability that an application instance i just starting execution of service type j will be at service k at time t, as formulated in (7.2). We need to consider the effect of adding a queue on this parameter. Assume that an application instance i arrives at service j and finds itself at the qth spot in the queue (the 1st spot being the head of the queue), and assume that there will be no further queuing for that application instance along its way to service k. Then we have:

h^q_{ijk}(t) = g^q_j(t) ∗ h_{ijk}(t),  h^q_{ijm}(t) = g^q_j(t) ∗ h_{ijm}(t)    (7.14)

in which gjq (t) is the pdf of the Time to Enter Service (TES) for the queued application instance. By replacing hijk (t) with its queue-enabled representation hqijk (t), we have: q q Gqijk (t) = Hijm (t) − Hijk (t)

(7.15)

q in which Hijk (t) is the cdf of hqijk (t).

Similarly, the probability that the application i which just joined the qth spot in service j’s queue is still in the queue or is executing service j at time t is: Gqijj (t) = q 1 − Hijj (t)

Finding a closed form for this distribution in the general case is quite difficult and impractical. In the appendix, we present a series of results used to find a lower bound for this TES distribution for the queued instances; in queue-enabled systems, we use this bound to compute the required over-commitment probabilities. In the Q-DASC algorithm, the agent responsible for the queue computes the TES mean ($\eta^q_j$) and variance ($\sigma^{q\,2}_j$) for the queued application instance using the results in the appendix (Theorem 3 and Corollary 3). It then reports these parameters to the succeeding agents. The succeeding agents, in turn, compute the convolutions in (7.14) using the received parameters, assuming that the TES distribution is a Normal distribution with parameters $(\eta^q_j, \sigma^{q\,2}_j)$, apply them to (7.15), and update their future resource usage estimates:

$$h^q_{ijk}(t) = \mathcal{N}(t;\, \eta^q_j, \sigma^{q\,2}_j) * h_{ijk}(t) \tag{7.16}$$

If $\sigma^{q\,2}_j$ is much smaller than the variance of $h_{ijk}(t)$, the above Normal distribution can be treated, relative to the $h_{ijk}(t)$ distribution, as a delta function centered at $\eta^q_j$, i.e., $\delta(t - \eta^q_j)$. This can easily be shown using a frequency-domain analysis of the two distributions. In this case, the equations in (7.14) and (7.15) become:

$$h^q_{ijk}(t) = g^q_j(t) * h_{ijk}(t) \approx h_{ijk}(t - \eta^q_j), \quad \forall t > \eta^q_j$$
$$G^q_{ijk}(t) = H^q_{ijm}(t) - H^q_{ijk}(t) \approx H_{ijm}(t - \eta^q_j) - H_{ijk}(t - \eta^q_j), \quad \forall t > \eta^q_j \tag{7.17}$$

As can be seen, in this case the future estimates are simply shifted versions of the estimates used in the queue-less DASC algorithm.
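The following sketch (our own, not the thesis implementation) shows how an agent might evaluate (7.16) numerically on a discrete time grid; the grid step and the direct $O(n^2)$ convolution are simplifications:

```cpp
#include <cmath>
#include <vector>

// Discrete-time evaluation of (7.16): convolve a Normal TES pdf with parameters
// (eta, sigma^2) with the queue-less estimate h_ijk to obtain the queue-enabled
// h^q_ijk. All pdfs are sampled at t = 0, dt, 2*dt, ...
std::vector<double> queueEnabledPdf(const std::vector<double>& h_ijk,
                                    double eta, double sigma, double dt) {
    const double kPi = 3.14159265358979323846;
    const std::size_t n = h_ijk.size();
    std::vector<double> g(n), out(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {       // sample the Normal TES pdf
        const double t = i * dt - eta;
        g[i] = std::exp(-0.5 * t * t / (sigma * sigma))
               / (sigma * std::sqrt(2.0 * kPi));
    }
    for (std::size_t i = 0; i < n; ++i)         // direct convolution, O(n^2)
        for (std::size_t j = 0; j <= i; ++j)
            out[i] += g[j] * h_ijk[i - j] * dt;
    return out;
}
// When sigma^2 is much smaller than the variance of h_ijk, the result is close
// to h_ijk shifted right by eta, which is exactly the approximation in (7.17).
```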

7.6.2 Q-DASC Performance Evaluation

In order to evaluate Q-DASC performance and examine its effect on the quality of service, we simulated the complex system introduced in Figure 7.8. We first assume that all service types have an ample number of queuing spaces, and we measure the applications queuing ratio instead of the applications failure ratio. The queuing ratio is obtained by dividing the number of queued application instances by the total number of admitted application instances. In particular, we want to check whether the proposed Q-DASC algorithm operates below a target queuing ratio. In a second simulation, we assume that services have only a few queuing spots, and we determine whether these few spots translate into lower application failure ratios. The rationale for this experiment is to show that the system rarely needs to queue application instances, since the commitment control mechanism restrains the over-commitment probability.

In this section, we assume service component characteristics identical to those described in Section 7.5, and we use the same geometric-distribution-based request arrival generator and thresholds as in Section 7.5. Our first measurement is the application queuing ratio; the target level for this ratio is $10^{-2}$.

[Figure 7.13: Applications queuing probability with an ample number of queuing spaces using the Q-DASC algorithm]

Figure 7.13 depicts the queuing ratio for all three application classes as a function of the applications request rate, assuming an ample number of queuing spaces. Clearly, the Q-DASC algorithm keeps the queuing ratio under the threshold even when the offered load to the system is very high.

[Figure 7.14: Applications failure probability based on queue size in the Q-DASC algorithm]

Figure 7.14 shows the effect of the number of queuing spots on the applications failure probability. In this simulation, we assumed that the requests for applications follow a geometric distribution with parameter p = 0.1, which offers a load of almost twice the system's capacity. It is evident that by adding a few queuing spaces for the over-committed application instances, we can significantly reduce the applications failure ratio, in comparison to a queue-less system, even when the offered load is very high.

7.7 Related Work

One of the main issues in service-oriented systems that has been extensively studied is the problem of QoS-aware service composition. This problem deals with cases where each service component has a specific set of QoS parameters and an overall QoS constraint has to be met for a composite application [98, 97, 105, 106, 107, 108]. Among the papers discussing this problem, in [98] the authors formulated the problem as a linear programming problem, and in [105] the authors proposed heuristics for optimal service composition considering general distributions for services.


We consider our work an extension of these works, since we guarantee successful completion of an application according to the service-level agreements. The probabilistic nature of service execution and its influence on contracts between service providers and application providers has also been studied in [112], in which the authors argue that, instead of contracts based on hard bounds, probability distributions can be used in soft contracts between web service providers and their clients.

In the first chapter of this part of the thesis, the problem of service allocation in service-oriented environments was introduced, and the optimal solution of the problem using Markov Decision Processes [103] was presented. The computation of optimal admission and allocation policies using MDPs has limitations for real large-scale systems, especially due to the problem of state-space explosion and the assumptions on execution distributions. In this chapter, we extended this work and proposed algorithms for guaranteeing quality of service in service-oriented systems.

In addition to the area of application creation through service composition, the work in this chapter touches other research fields. In the operations research field, we can point to relevant research on admission control to a network of loss queues. For example, in [120] the authors proposed optimal solutions for admission control to two queues in tandem, assuming only two user classes and exponential distributions. In [121], the authors extended the work to multiple queues in tandem and presented a heuristic algorithm as well. However, guaranteeing QoS is not a concern in their work.

In queuing theory, there is a vast amount of research on analyzing queuing network performance metrics [116]. While there are many types of queuing networks, very few have exact analytical solutions for performance parameters [116, 122, 115, 114], and many approximation techniques have been proposed to find approximate performance metrics (especially throughput) [113, 123, 124, 125, 126, 127]. We modeled our problem as an open Finite Capacity Queuing Network (FCQN) with limited or no waiting spaces and loss [113]. These networks do not have closed-form solutions [114], and generally approximations are used to analyze their performance metrics in steady state. For example, in [113] the authors presented a technique based on the queuing network analyzer [123] that approximates the throughput and expected waiting time. The authors found that the approximations are more accurate under light and moderate load, and become less accurate when the system is under heavy load. This and other methods are based on decomposing the network into individual queues. We direct interested readers to [114] for a complete survey of these methods.

The inclusion of fork-join queues in a network makes its analysis more complicated. In fact, for fork-join queues exact analytical results exist only for the mean response time of a two-server system [128, 129]. Although these types of queues appear in many applications, they have not received much research attention because they are very difficult to analyze. For example, in [126] the authors proposed an approximation technique for an open network with fork-join queues and normal queues, but they assumed a blocking type of open FCQN composed of M/M/C/K queues. Another decomposition-based approximation method is the bottleneck analysis discussed in [117] and [130]: the bottleneck queue is determined, and through its analysis approximations for the network of queues can be obtained. To the best of our knowledge, our work is the first to use a probabilistic approach to control admission in an open FCQN with losses that can guarantee the loss probability for systems with non-stationary request arrival processes.

Admission to networks of queues has also been studied by the telecommunication research community in the context of admission to wireless networks; a comprehensive survey of this field can be found in [131]. The context of our problem, however, is different, since we are dealing with composing multiple services and creating new applications. In addition, while assuming exponential distributions for calls in wireless networks seems reasonable, it would not be an accurate assumption in service-oriented environments. Moreover, parallel and loop operations in service composition have no match in the wireless cellular networks CAC problem.


The closest wireless CAC algorithm to our problem is introduced in [132], in which the authors also used a convolution-based approach to predict future resource usage in the cellular network. However, the authors stopped short of analytically computing the call dropping probabilities in the way we formulated the over-commitment and application failure probabilities. Other prediction-based papers in the field of CAC in wireless networks use linear predictors and Wiener-process-based predictors for future resource usage [133], which essentially anticipate the future based on the past. In our problem, by utilizing the knowledge of the execution plans and service execution times, we can analytically anticipate future resource usage much more accurately.

In the real-time operating systems field, there are numerous articles on scheduling and admission control mechanisms for real-time tasks [134, 135]. Recently the focus has shifted to tasks and jobs that have probabilistic execution times [136, 137]. For example, in [137] the authors, in addition to surveying the relevant publications, describe a technique for computing the probability of missed deadlines on a monoprocessor real-time operating system. In [138], the authors approximated the task execution distribution by Coxian distributions of exponentials and performed schedulability analysis for a multiprocessor real-time application. Although our work in this chapter is presented for a system that orchestrates the execution of applications, and mainly operates at the application level in a distributed and service-oriented environment, variations of this modeling could also be applied to real-time, large-scale, and distributed multiprocessor systems for the purpose of schedulability analysis.

In the next chapter, we study another issue in making admission decisions in service-oriented systems: the problem of system revenue maximization. The system revenue in service-oriented systems can be maximized by admitting more valuable application classes to the system and rejecting less valuable ones, considering the applications' request arrival rates. We also present an application admission control system that combines the DASC algorithm and a reward-based admission controller to maximize the system revenue as well as to guarantee QoS.

Chapter 8

Application Admission Control System

In a service-oriented environment, service instances are allocated to composite applications so that the required performance is provided. Application admission control can be used to ensure that appropriate numbers of instances are committed to applications, given the revenue each application brings to the system and the system's current commitments. The techniques described so far are able to control the over-commitment and failure probabilities and to guarantee the application success probability; however, they do not address the maximization of the system's overall revenue.

In this chapter, we extend our study by proposing an application admission control system for service-oriented environments. The proposed system makes the admission decision in two steps. Upon receiving a request for an application, in the first step the system checks, according to its current commitments, whether it can guarantee the target QoS in terms of the probability of successful completion. This check is called the feasibility check part of the admission control system, and it uses the Distributed Algorithm for Service Commitment (DASC) [14] described in the previous chapter. In the second step, a revenue maximization unit is used to maximize the system revenue by accepting more valuable applications into the system.


For this unit, we propose two approaches. The first is a steady-state based revenue maximization approach that is simple to implement but does not capture the transient state of the system. The second is a more elaborate method that uses online optimization techniques and itself consists of three sub-blocks; the main sub-block is an online optimizer that solves a binary integer programming problem to maximize the system revenue. The approaches proposed in this chapter differ from the MDP-based solutions of Chapter 6 in that they avoid the MDP methods' assumptions of exponential service execution times and request inter-arrival times.

In this chapter, we first state the problem of reward-based admission. The steady-state based approach is discussed next, followed by the online optimization approach to application admission control and its main blocks: the feasibility check block, the scenario generator block, the online optimizer block, and the final decision maker. The binary integer programming problem is formulated in that section as well. Lastly, we present the performance evaluation and comparison results.

8.1 Problem Statement

Assume a service-oriented environment in which there are different service types, and where different applications can be created by composing sets of different service types. Each instance of an application requires each given service type during part of the application's lifetime. Service instances can be used by other application instances as soon as they become idle. Figure 8.1 shows an example system with 3 application types and 3 service types: application 1 is composed of service types 1, 2 and 3; application 2 is composed of service types 2 and 3; and application 3 is composed solely of service type 3.

[Figure 8.1: A sample service-oriented environment]

In the example, application 1 first executes service type 1, then service type 2, and finally the last service type. Similarly, application 2 executes service type 2 followed by service type 3. Multiple applications can contend for the same service, and we suppose that each application brings a different reward to the system. For example, if application 2 brings a low reward to the system while applications 1 and 3 bring higher rewards, then the system should avoid over-committing service type 3 to application 2 at the expense of applications 1 and 3. Application admission control entails regulating the admission of applications so that application requirements are met while system revenue is maximized.

In the previous chapter, we proposed a distributed heuristic algorithm for the problem of service commitment in service-oriented systems called DASC [15]. The DASC algorithm makes sure that the system delivers a guaranteed level of quality of service, in terms of success probability, for each accepted application instance. Another aspect of the problem of application admission control is to ensure maximization of the system revenue. In this chapter two revenue maximization methods are proposed: a steady-state based method, and an online optimization-based method that maximizes the system revenue by solving a linear programming problem. In the next section, we first describe the steady-state based method for revenue maximization.
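A possible in-memory encoding of this example is sketched below (our own naming; the thesis does not prescribe a data structure, and the rewards shown are the ones used later in the performance evaluation section):

```cpp
#include <string>
#include <vector>

// Each application class is the ordered sequence of service types it executes,
// plus the reward it brings on successful completion.
struct ApplicationClass {
    std::string name;
    std::vector<int> executionPath;  // service types, in execution order
    double reward;
};

// Figure 8.1: app 1 -> S1, S2, S3; app 2 -> S2, S3; app 3 -> S3.
const std::vector<ApplicationClass> kClasses = {
    {"application1", {1, 2, 3}, 0.4},
    {"application2", {2, 3},    0.2},
    {"application3", {3},       0.8},
};
```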


8.2 Steady-State Based Application Admission Control System

In this section, we study the revenue maximization problem in service-oriented systems in steady state. Our goal is to obtain a set of admission parameters for the application classes that maximize the system's overall revenue.

Assume a service-oriented environment with M different service types and L different application classes, in which the reward for serving an instance of application class i is $R_i$. In this system, the probability that an application class i instance uses a service type k instance during its execution is $p_{ik}$. We assume that the incoming process for application class i is a renewal process [139] with mean inter-arrival time $1/\lambda_i$; in other words, its inter-arrival time follows a general distribution with mean $1/\lambda_i$. We also assume that the execution time of service type k follows a general distribution with mean $m_k$. From [139], if there were no limitation on the service instances in the system, the expected number of application i instances being served at service type k in steady state would be $p_{ik} \lambda_i m_k$. Summing the expected values over all application classes, the expected number of busy service instances of type k in steady state is:

$$\sum_{i=1}^{L} p_{ik}\, \lambda_i\, m_k \tag{8.1}$$

On the other hand, since there is a finite number of service instances of each service type in the system, we define an admission control parameter $z_i$ for application class i, indicating the portion of requests for class i applications that can enter the system. We therefore define a linear programming problem for finding the optimum values of the $z_i$ that maximize the system's overall reward in steady state, given the limitations on the number of service instances ($N_k$) of each service type, as follows:

$$\max \; \sum_{i=1}^{L} (\lambda_i R_i)\, z_i \tag{8.2}$$

$$\text{s.t.} \quad m_k \sum_{i=1}^{L} (\lambda_i p_{ik})\, z_i \le N_k, \quad \forall k \in \{1, 2, \ldots, M\}$$

$$z_i \in [0, 1], \quad \forall i \in \{1, 2, \ldots, L\}$$
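As a quick illustration of (8.2), with hypothetical numbers rather than ones from the thesis' experiments: take $M = 1$ service type with $N_1 = 20$ instances and mean execution time $m_1 = 1000$, and $L = 2$ application classes with $\lambda_1 = 0.02$, $R_1 = 0.8$, $p_{11} = 1$ and $\lambda_2 = 0.03$, $R_2 = 0.2$, $p_{21} = 1$. The capacity constraint becomes

$$1000\,(0.02\, z_1 + 0.03\, z_2) \le 20 \;\Longleftrightarrow\; 20 z_1 + 30 z_2 \le 20.$$

Class 1 earns $R_1/m_1 = 8 \times 10^{-4}$ reward per unit of occupied capacity versus $R_2/m_1 = 2 \times 10^{-4}$ for class 2, so the optimum admits all of class 1 ($z_1 = 1$), which exhausts the capacity and forces $z_2 = 0$.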

The optimum values obtained by solving the above linear programming problem are used by our proposed service-oriented system to control the incoming request rate of each application class entering the system. This control can be enforced using token bucket mechanisms that regulate the admission rate of each application class into the system.

The rate control parameters required by this algorithm can be calculated in an optimizer module. If the request arrival process is stationary and has a known arrival rate for each application, they can be provided to this module manually. Alternatively, automatic rate measurement techniques can be used when the application request arrival processes are not stationary; in this case, the optimizer recalculates these parameters every time the incoming request rates change.

It is important to note that in the steady-state method, the reward-based admission block is totally separate from the commitment block: if an application request passes the reward-based admission control mechanism, it still needs to pass the service commitment checks in order to enter the system. This makes the implementation of this system simple, and, as will be shown in the performance evaluation section, it achieves acceptable performance results. In the next section, we describe another revenue maximization approach, using online optimization techniques. Although this alternative method is more complicated than the steady-state method, it can better capture the transient states of the system and make better admission decisions in those states.
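A token bucket regulator for one application class could look like the following sketch (our own; in this setting the token rate would be set to $\lambda_i z_i$ from (8.2), and the bucket depth controls the tolerated burst):

```cpp
#include <algorithm>

// Token-bucket admission regulator for one application class: tokens accrue at
// the configured rate; a request is admitted only if a full token is available.
class TokenBucket {
public:
    TokenBucket(double rate, double depth)
        : rate_(rate), depth_(depth), tokens_(depth), lastTime_(0.0) {}

    bool admit(double now) {
        tokens_ = std::min(depth_, tokens_ + rate_ * (now - lastTime_));
        lastTime_ = now;
        if (tokens_ >= 1.0) { tokens_ -= 1.0; return true; }
        return false;
    }

private:
    double rate_;      // admitted-request rate, e.g. lambda_i * z_i
    double depth_;     // bucket depth (burst tolerance)
    double tokens_;    // currently available tokens
    double lastTime_;  // time of the last admit() call
};
```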

8.3 Online Optimization-based Application Admission Control System

If we assume exponential distributions for the service execution times and for the applications' request inter-arrival times, finding the optimal solution for the problem leads to solving a dynamic programming problem using Markov Decision Processes, which we studied in [12]. In the general distribution case, however, the search for the optimal solution involves solving a multi-stage stochastic programming problem [140]. In a multi-stage stochastic programming problem, in contrast to a deterministic programming problem, we try to find optimal decisions at each stage while accounting for the stochastic nature of the problem and the uncertainty about future events. In our case, for instance, whenever a request enters the service-oriented system, we would like to know whether we should accept the request and gain its corresponding reward, or wait for later request arrivals from other, more valuable application classes. The uncertainty in this problem lies in the times and types of future request arrivals, and in the execution times of the corresponding service types. In a multi-stage stochastic problem, the decisions should be made when a request for an application arrives at the system, while somehow accounting for the uncertainty in future stages when future requests may come and leave. Due to the enormous number of uncertainties, this approach to finding optimal solutions for application admission control in service-oriented systems becomes computationally expensive and infeasible for real systems, especially for systems that require online decision making.

Another approach to this problem, which we follow in this section, is a heuristic technique that finds near-optimal decisions using online optimization [140]. In the online optimization approach, we try to find the best decision for accepting or rejecting a request for an application class as requests arrive at the system, in an online manner.

[Figure 8.2: Application Admission Control System using Online Optimization]

As described in [140], online stochastic combinatorial optimization approaches have been used to solve many different decision-making problems such as scheduling and resource allocation. For example, in [141] the authors studied an online optimization technique for the problem of admission control to a media-on-demand system.

The online optimization approach for our problem consists of finding the optimal decision for some sample scenarios of the system trajectory, instead of finding the optimal decision that could be achieved from a computation-intensive multi-stage stochastic programming problem. In particular, our online optimization approach considers a few sample scenarios up to a finite horizon, and finds the optimal decisions for those scenarios, instead of considering all the uncertainties about future events. To do so, we have to consider several factors, such as the number and status of the application instances already being served in the system, as well as the number of available service instances. An additional important factor in making the decision is the reward that each application class instance brings to the system: the system has to decide either to accept the newly arrived request, or to wait for future, more valuable requests. Two other important factors in the decision making are the inter-arrival times of the different application classes, and the service execution times for each application class.

We therefore propose the following algorithm for the problem of application admission control in service-oriented systems using the online optimization approach; we elaborate on each of the following steps later in this section.

1) Upon receiving a request for an application class, we check the feasibility of accepting the request, and we reject the request if accepting it is not feasible.

2) We generate some scenarios for the possible future system trajectory.

3) We find the optimal decision of either accepting or rejecting the newly arrived request in each generated scenario.

4) We make the final decision of accepting or rejecting the request based on the output of each decision-making process in step 3.

Figure 8.2 shows the block diagram of this proposed algorithm.
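A skeleton of this four-step procedure is sketched below; the types and helper functions are our own stubs standing in for the blocks of Figure 8.2, not interfaces defined by the thesis:

```cpp
struct Request  { int appClass; };
struct Scenario { /* sampled arrival/execution timeline up to the horizon */ };

bool dascFeasible(const Request&)         { return true; }  // step 1 stub (DASC)
Scenario generateScenario(const Request&) { return {}; }    // step 2 stub
bool solveScenarioBip(const Scenario&)    { return true; }  // step 3 stub: a(0)

// Steps 1-4 combined; the final decision (step 4) uses majority voting here.
bool admitRequest(const Request& req, int numScenarios) {
    if (!dascFeasible(req)) return false;   // reject infeasible requests at once
    int votes = 0;
    for (int s = 0; s < numScenarios; ++s)
        if (solveScenarioBip(generateScenario(req))) ++votes;
    return 2 * votes > numScenarios;
}
```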

8.3.1 Feasibility Check

The feasibility check function in our algorithm evaluates the system's current commitments to the application instances already being served, in order to guarantee an agreed level of quality of service. To do so, we use our previously proposed algorithm for service commitment. In the previous chapter [14], we stated the problem of service commitment in service-oriented systems, and we proposed a distributed algorithm for this problem called DASC. In DASC, we define a threshold for the over-commitment probability, and we keep the over-commitment probability under this threshold by rejecting requests for application classes that might push this probability above the threshold. By doing so, we guarantee an agreed level of application success probability for the admitted application instances.

In our admission control system, we utilize the DASC algorithm to check the feasibility of admitting an application instance. It is important to note that by feasibility we mean guaranteeing the agreed level of success for all of the application instances already being served, as well as for the newly arrived request. If this check shows that we cannot deliver the guaranteed level, we reject the request immediately; otherwise we proceed to the next step of the algorithm.

8.3.2 Scenario Generation

The second step in our online optimization approach is to generate a number of sample scenarios for the system trajectory. These scenarios cover the application instances that are currently in the system as well as applications that arrive in the future. The scenario generating mechanism identifies exact times for service executions as well as the execution path of the application. For instance, if an application class 1 instance in Figure 8.1 is currently being served by service type 1, the scenario generator might tell us that this instance finishes executing that service at time unit 700, continues to service type 2 and finishes executing it at time unit 2500, and then starts executing service type 3 and leaves the system at time unit 4200. As in this example, in each generated scenario an exact timing is assigned to each of these transitions, and the exact execution path is specified.

To generate these scenarios, we can use the distributions of the execution times of each service type, the distributions of request inter-arrival times, and the probabilities associated with choosing each service and, consequently, the application's execution path. We discussed these distributions and probabilities in the previous chapters [12, 14]. One approach to obtaining the required distributions is to use the system's historical data: the record of the system's activity is analyzed to find the required distributions and probabilities. In the rest of this chapter, we assume that these distributions are already available to the online admission control system, and that the scenario generator block uses them to generate scenarios for the current application instances and future arrivals.

Another issue in generating scenarios is specifying a finite horizon for them; in other words, we have to decide how far into the system's future we look when generating scenarios. To some extent, assuming a finite horizon for generating scenarios resembles defining a sliding window in discrete-time signal processing systems: a limited window of the accumulated data is used for processing, and decisions are made based on the windowed data. Various factors affect the choice of horizon, such as the storage and processing limitations of each system. Application lifetimes are another important factor. Therefore, the decision on the length of this horizon can be made by the system designers based on each system's resources and scale. The approach that we take in this chapter is to assume a finite horizon based on the maximum lifetime of the application classes. After generating the required scenarios, we are ready to proceed to the next step of our online optimization approach, discussed in the next subsection.
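A minimal sketch of such a generator follows (our own; the uniform execution-time draw is a placeholder for the historically fitted, per-service distributions discussed above):

```cpp
#include <random>
#include <vector>

// Sample one concrete timeline for an application instance: the absolute time
// at which it leaves each service on its execution path.
std::vector<double> sampleTimeline(const std::vector<int>& path,
                                   double startTime, std::mt19937& rng) {
    // Placeholder distribution; a real generator would select a fitted
    // distribution per service type instead.
    std::uniform_real_distribution<double> execTime(1000.0, 2000.0);
    std::vector<double> departures;
    double t = startTime;
    for (std::size_t i = 0; i < path.size(); ++i) {
        t += execTime(rng);       // time spent executing this service
        departures.push_back(t);  // departure time from path[i]
    }
    return departures;
}
```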

8.3.3 Optimal Admission Decisions for Generated Scenarios

The online optimizer in our proposed system is responsible for finding the optimal decision for accepting or rejecting the requests in each scenario. To do so, we formulate a binary integer programming problem. In our online optimizer, the reward for serving an application request r is represented by w(r), and the decision for accepting or rejecting that request is given by a(r), which takes one of two values: 0 for rejecting and 1 for accepting request r. There are also K different service types in the system, $S = \{s_j,\; (1 \le j \le K)\}$, each of which has $N(s_j)$ instances. The scenario generator block produces a series of events, and the timings associated with each event, for the online optimization block, as described in the previous subsection. Based on these definitions, we can define the following Binary Integer Programming (BIP) problem for finding the optimal admission decision in each scenario:

$$\max \; W = \sum_{r \in R} w(r)\, a(r), \qquad a(r) \in \{0, 1\} \tag{8.3}$$

$$\text{s.t.} \quad \sum_{r \in R} e_r(s_j, t_e)\, a(r) \le N(s_j), \quad \forall t_e \in T,\; s_j \in S \tag{8.4}$$

$$T = \bigcup_{r \in R} T_e^r, \qquad S = \{s_j,\; (1 \le j \le K)\}$$

in which R represents the set of all requests for applications, including the newly arrived request, represented by r = 0. Also, $T_e^r$ is the set of all event times associated with one particular request r, and $e_r$ is the execution path of request r, both provided by the scenario generator block.

The objective of this binary integer programming problem is to maximize the system reward W by accepting or rejecting each request in the generated scenario. This maximization is subject to the services' capacities at the time of each transition in the applications' execution paths. This constraint is evaluated through $e_r(s_j, t_e)$, which indicates whether or not request r is in service type j at transition time $t_e$. The set of all these transition times is called T, and is produced by the scenario generation mechanism for each scenario.

The formulated BIP finds the optimal admission decision for all requests in one scenario. However, our main concern is a(0), the decision for accepting or rejecting the newly arrived request in that particular scenario. The above integer programming problem can be solved efficiently using techniques such as branch and bound. The two main outputs of this step are a(0) and the maximum achievable reward (W), which are fed into the next step of our proposed online admission control system, explained in the next subsection.

It is important to note that this optimization problem is solved for each scenario; hence, the number of scenarios that can be evaluated is limited by the time available for making the decision, as well as by the time required for solving the stated BIP problem given the available processing power. Therefore, in the general case, the number of scenarios, and consequently the number of optimizations, will be determined by the system designers based on the system's specifications and resources.
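For very small request sets, the per-scenario BIP can even be solved by exhaustive search, which makes the structure of (8.3)-(8.4) concrete. The sketch below is our own; a production system would instead hand the problem to an integer-programming solver (the evaluation section uses lpsolve):

```cpp
#include <cstdint>
#include <vector>

// usage[e][r] encodes e_r(s_j, t_e) for one flattened (service, event-time)
// pair e; cap[e] is the corresponding capacity N(s_j). Request r = 0 is the
// newly arrived one. Exhaustive search: only sensible for |R| < ~20.
struct BipResult { double reward; bool acceptNew; };

BipResult solveBip(const std::vector<double>& w,                // w(r)
                   const std::vector<std::vector<int>>& usage,  // [e][r], 0/1
                   const std::vector<int>& cap) {
    const int numReq = static_cast<int>(w.size());
    BipResult best{-1.0, false};
    for (std::uint32_t mask = 0; mask < (1u << numReq); ++mask) {  // all a(r)
        bool feasible = true;
        for (std::size_t e = 0; e < usage.size() && feasible; ++e) {
            int load = 0;
            for (int r = 0; r < numReq; ++r)
                if (mask & (1u << r)) load += usage[e][r];
            feasible = load <= cap[e];
        }
        if (!feasible) continue;
        double reward = 0.0;
        for (int r = 0; r < numReq; ++r)
            if (mask & (1u << r)) reward += w[r];
        if (reward > best.reward)
            best = {reward, (mask & 1u) != 0};  // record a(0) of the best set
    }
    return best;
}
```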

8.3.4 Final Decision Making

In the previous subsection, we found the optimal decision for accepting or rejecting the request for an application in each scenario. The next step is to make the final admission decision. To make this decision, the outputs of the online optimizer block (i.e., a(0) and W for each scenario) are fed to the final decision maker block. Based on these parameters, we can take one of the following approaches to making the admission decision:

1) Voting: accept the request if the majority of the optimal decisions for the generated scenarios are in favor of accepting the request. This approach is similar to a voting mechanism in which a decision is made when the majority of voters agree with it.

2) Conservative: accept the request if all of the decisions are in favor of accepting the request.

3) Greedy: accept the request if at least one of the decisions is in favor of accepting the request.

4) Maximum reward: accept the request if the total reward gained by accepting the request is more than the total reward gained by rejecting it.
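The first three rules reduce to simple counting over the per-scenario decisions, as in the sketch below (our own; the maximum-reward rule additionally needs the per-scenario rewards W and is omitted):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class Rule { Voting, Conservative, Greedy };

// Combine the per-scenario optimal decisions a(0) into one admission decision.
bool finalDecision(const std::vector<bool>& a0, Rule rule) {
    const std::ptrdiff_t total   = static_cast<std::ptrdiff_t>(a0.size());
    const std::ptrdiff_t accepts = std::count(a0.begin(), a0.end(), true);
    switch (rule) {
        case Rule::Voting:       return 2 * accepts > total;
        case Rule::Conservative: return accepts == total;
        case Rule::Greedy:       return accepts > 0;
    }
    return false;
}
```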

8.4 Performance Evaluation

To evaluate the performance of the proposed algorithm, we simulated the system depicted in Figure 8.1. We wrote a C++ program, and for solving the linear programming problem we used an open-source library called lpsolve [142]. We set the number of instances per service type (i.e., $N_1$, $N_2$, and $N_3$) to 20. For the service execution times, we assumed a beta distribution [118] for all three service types with parameters α = 2.333, β = 4.666, an optimistic value of 1000, a pessimistic value of 2000, and a mean of 1333 time units. The simulation period is 180000 time units, which is 30 times the maximum lifetime of an application 1 instance. We chose similar geometric distributions for the request inter-arrival times of all three application classes, with parameter p ranging from 0.002 to 0.010. We assumed the following rewards for successful termination of each application instance: 0.4 for an application class 1 instance, 0.2 for an application 2 instance, and 0.8 for an application 3 instance. The penalties for unsuccessful termination are 0.3 for an application 1 instance that fails in service 2, 0.8 for an application 1 instance that fails in service 3, and 0.4 for an application 2 instance that fails in service 3. No cost is associated with rejecting a request for an application; rejected requests leave the system and do not interfere with it in the future.

We evaluated the system performance using four different techniques. The first is the No Commitment Policy (NCP), in which the system neither tries to maximize the system revenue nor avoids over-commitments. For the second mechanism, we used only the DASC algorithm, which guarantees the quality of service but does not address the problem of reward maximization; we set the threshold parameter in this algorithm to 1%, meaning that the system guarantees a 99% success probability for the admitted application requests. For the third technique, we used the steady-state based application admission control algorithm [14]. This technique works based on the steady-state analysis of the system, and uses a linear programming technique to obtain admission regulation parameters that maximize the system revenue in the steady state.

[Figure 8.3: System reward for four different techniques]

For the fourth mechanism, we used the online optimization-based system composed of the feasibility check block, the scenario generating block, the online optimization block, and the decision making block. For the feasibility check block, we used the DASC algorithm with a threshold parameter of 1%. For the scenario generation block, we generated three different scenarios over a period twice as long as the lifetime of the longest-living application (i.e., application class 1). As mentioned earlier, we used the lpsolve library [142] to solve the binary integer programming problem, and for decision making we used the voting mechanism.

Figure 8.3 shows the system revenue over the simulation period for these four mechanisms. Figure 8.4, on the other hand, shows the failure probability for application classes 1 and 2 on a logarithmic scale. As can be seen, the NCP technique's performance is acceptable when the system is lightly loaded, but the system revenue degrades drastically as the system's load increases.

[Figure 8.4: Application 1 and application 2 failure rates based on the applications request rate]

Moreover, the application failure results for this technique are extremely poor, as expected given its best-effort nature. The second observation concerns the performance of the DASC algorithm when it is the sole mechanism in place: although the application failure probability stays under the threshold, the system revenue does not improve using this algorithm alone. The steady-state based admission control algorithm, combined with the DASC algorithm, performs better than the two previous techniques: the system revenue increases using this combination, and at the same time the required quality of service is delivered.

However, the online optimization approach outperforms the steady-state based technique, especially when the system is not heavily overloaded. The main reason for this performance improvement is the ability of the online optimization technique to capture the transient conditions in the system, as opposed to the steady-state based admission control technique, which uses the steady-state conditions of the system. Interestingly, this observation is confirmed by the fact that the performance improvements mainly occur when the system is not heavily overloaded and there is further potential in the system for revenue maximization. As the system becomes overloaded, the system's capacity saturates, and therefore both the steady-state based technique and the online optimization-based system perform well.

Chapter 9

Conclusions

Future networks must cope with the challenges imposed by emerging and future generations of applications; otherwise, the range and scope of applications over future networks will be limited by the design choices of the past. In this thesis, we studied the requirements of future networks and applications, and we addressed various challenges in future networks by proposing an architecture, a network research testbed, and scalable and distributed QoS control algorithms.

9.1 Contributions

While most present proposals for future network architectures have been designed to address the requirements of a particular class of applications, we have taken the research on future network architectures a step further by proposing an application-oriented network architecture as a configurable, converged communication and computing network. Based on this new network architecture, we designed a Virtualized Application Networking Infrastructure that enables networking researchers to experiment with new network architectures and distributed applications. We have also proposed a novel scalable and distributed QoS and admission control algorithm for service-oriented systems, and for Finite Capacity Queuing Networks in general. Overall, this thesis' contributions can be listed as follows:

9.1.1 Application-Oriented Networking

We proposed a novel network architecture, called an Application-Oriented Network (AON) architecture, that addresses challenges posed by future network applications, such as configurability and application-awareness, and that facilitates application creation through virtualization of the processing, storage, reprogrammable hardware, and software resources commonly used in application creation.

We proposed a three-plane architecture for AON comprising a control plane, a management plane, and an application plane. Applications are able to configure the resources in the application plane to satisfy their own requirements. These resources are virtualized computing, storage, hardware and software resources, and other resources and functionalities needed for rapid application creation. A multiplicity of applications is able to coexist over the same shared virtualized infrastructure in the AON application plane, which is managed and controlled by the other two AON planes: AON management and AON control. The latter is responsible for control-related functions such as allocation and release of resources, as well as failure recovery operations, while the former is responsible for management-related functions such as monitoring, provisioning and re-provisioning, and long-term fault management.

We also proposed an architecture for applications in the application plane that has three main characteristics: a two-layer (service and transport) architecture, a service-oriented service layer, and a transport layer that provides content and data delivery. The proposed architecture can be helpful in a diverse range of applications that require responsiveness, reliability, security, smart caching, and efficient content broadcasting/multicasting. Mobile networks can also utilize the processing and storage capabilities embedded in the architecture to perform smart and adaptive content conversion and distribution to mobile nodes that experience hand-offs as well as temporary disconnections.

9.1.2 Virtualized Application Networking Infrastructure

In this thesis, we presented the Virtualized Application Networking Infrastructure (VANI) as a networking research testbed that allows experimentation with new networked systems and distributed applications. Compared to other networking research testbeds, VANI utilizes a service-oriented control and management plane that provides flexible and dynamic allocation, release, programming, and configuration of the resources used for performing large-scale experiments in a wide area network from layer three up. VANI resources in the application plane allow the development of network architectures that require a converged network of computing and communications resources and in-network processing and storage.

Another main contribution in VANI is the introduction of a reprogrammable hardware resource that can be allocated on demand to experiments that require high-performance and high-throughput computing. This resource is designed based on virtualization of hardware resources, in particular FPGAs, and provides well-defined interfaces for researchers to program and configure it. Through experimentation and measurements, we showed that the reprogrammable hardware resource can be programmed rapidly and can achieve very high throughput using its 16x10GE interfaces.

VANI also allows registration of new hardware and software resources in the control and management plane. This facilitates experimentation, since researchers can set up new experiments rapidly using available service components developed independently by other researchers. VANI, in essence, is a prototype of our proposed Application-Oriented Network architecture and a proof of concept showing how the proposed AON concepts can be realized, and how distributed applications and new network architectures can be built on such a network.

Another major contribution of this study was the design and development of DETS, a novel system to shape and regulate Ethernet traffic in VANI as well as in a computing cluster or a datacenter. The DETS system is required where a host node is connected to several virtual local area networks and the sending and receiving traffic rates on each of these virtual networks have to be guaranteed and controlled. Without this control, an excess of received traffic on one of these virtual networks could disturb the other virtual networks' ability to receive traffic at a guaranteed rate. While most current solutions for Ethernet congestion control rely on simple Congestion Notification-based mechanisms, and virtually all of them require a change in the Ethernet hardware equipment, our proposed DETS system does not require any changes in the hardware. It is also able to operate in a distributed manner using one of the four algorithms proposed for rate allocation. Through experimentation on an actual Linux-based computing cluster, we showed the effectiveness of DETS, and we compared the performance of the four algorithms and discussed their characteristics. We also proposed modifications to the Ethernet control plane so that DETS can be natively supported by Ethernet networking elements.

9.1.3 Scalable and Distributed QoS and Admission Control

In this thesis, we studied the problem of QoS and admission control and of allocating instances of services to different applications in service-oriented environments. In this problem, a limited number of service instances of each service component are shared among different application classes. The major concerns in this problem are two-fold: maximizing the system revenue by allocating the service instances to the more valuable application classes, considering the service execution times and request inter-arrival times of each application class; and guaranteeing the successful completion of an admitted application instance.

We presented a method for obtaining the optimal policy for maximizing system revenue using Markov Decision Processes, for small-scale systems with exponential service execution times and request inter-arrival times. We analyzed the case where the constituting service components of an application are executed concurrently throughout the application lifetime, as well as the case where the service components are executed sequentially, and hence are not required throughout the application lifetime. We presented the optimal policy for prototype examples, and we compared the performance of applying this policy with the performance of a system that uses Complete Sharing or Complete Partitioning mechanisms. In all cases, we showed that applying the policies obtained from Markov Decision Processes results in considerable improvement in system revenue compared to the other two mechanisms, especially when the request rates for the applications are high.

As another major contribution of this study, we presented a Distributed Algorithm for Service Commitment (DASC) that guarantees a specified probability of successful completion for an application in a service-oriented system, in settings with stationary as well as non-stationary arrivals. We showed that the Central Limit Theorem can help us in computing this probability, and we also described an alternative approach to computing it using Chernoff's bound. The DASC algorithm can be implemented in a distributed environment and does not assume any specific distribution type for service execution times and application request inter-arrival times.

For stationary systems, we proposed two steady-state based alternative approaches (one based on bottleneck analysis, and the other based on simulation) that use token bucket regulators to control the admission of application requests to the system. These algorithms are simpler to implement than the DASC algorithm, but they cannot operate in non-stationary environments. DASC, however, is able to perform in both stationary and non-stationary environments using transient-state analysis of the system. We presented performance evaluation results showing the effectiveness of the DASC algorithm in a simple service-oriented system as well as in a complex system, with both stationary and non-stationary request arrivals.

We also showed that by adding a few queuing spaces, we can guarantee a specified level of queuing probability for an application instance and, at the same time, significantly reduce the application failure probability. In doing so, we presented a series of theorems and corollaries that can be used to find bounds for the time-to-enter-service distribution in general queuing systems.

To maximize the system revenue in addition to guaranteeing QoS, we proposed an application admission control system for service-oriented systems. The proposed system is able to use a simple steady-state approach or an online optimization approach for maximizing the system revenue, in addition to the DASC algorithm that guarantees the required probability of success. The online optimization block of our system is composed of three sub-blocks: the scenario generating block, the online optimizer, and the final decision maker. We elaborated on the functionality of each block, and we discussed the important factors in designing each of them. We also formulated a binary integer programming problem that maximizes the system revenue in the online optimizer block. The simulation results and performance comparisons show that the proposed system achieves its objectives and improves the system performance.

9.1.4 Related Educational Contributions

The last, but not least, contribution of this study is the education of several University of Toronto (UofT) students, especially through their involvement in performing experiments with the AON architecture and in the design and development of various parts of VANI.

In the early stages of this study, we were conducting experiments on the AON architecture and applications. Justin Seto and Andrew Mehes helped us in this process by implementing a prototype of a new network architecture in AON for their final-year design project at the Electrical and Computer Engineering (ECE) department, UofT. The developed system has an XML-delivery function in its transport layer and uses a peer-to-peer mechanism to organize its network. The two other students involved in this process were Michael Ens and Ian Gartley, Engineering Science students who performed experiments with the NaradaBrokering pub/sub system as well as a new open-source XML parser.

A major force in the VANI project was Keith Redmond, a MASc student at the University of Toronto. We worked very closely together on the design and development of the virtualization layer for the main resources in VANI, including processing, storage, reprogrammable hardware, and the internal fabric. In the summer of 2008, Tom Yue worked with us as a summer student on parts of the VANI virtualization layer, specifically on the WS interfaces of the reprogrammable hardware resource. Darryl Chung was also a summer student, who developed the base for a graphical user interface for VANI in the summer of 2008. Gordon Tam was an Engineering Science student who helped us develop the VANI control and management plane software; he started working with us on his final-year design project and continued his collaboration during the summer of 2009 as a summer student.

In the summer of 2009, a group of summer students helped us develop various software resources in VANI, including the database resource, the orchestrator resource, the hardware-based gateway resource, and the GENI-VANI interworking resource. These students were Arbab Khan, Saleh Dani, Mingliang Ma, Maxim Galash, and Wenyu Li. Three of these students (Arbab Khan, Saleh Dani, and Maxim Galash), together with Anthony Das Santos, worked on a prototype of a green orchestrator engine and developed a sensor resource for VANI as their final-year design project. Mingliang Ma also helped us explore some of our future work on automatic application deployment in VANI. Arbab Khan is still cooperating with us as a summer student to integrate, maintain, and improve the VANI control and management software and the developed processing, storage, gateway, and internal fabric resources. The author takes pride in working with these students and in being a part of their education process at the University of Toronto.

9.2 Future Work

This dissertation has covered many subjects in dealing with the challenges of future networks, and there are many possibilities for future work in each of the covered topics. In Application-Oriented Networks in general, and VANI in particular, an important line of future work is to develop large-scale applications based on this architecture and the developed testbed. One application that we are currently investigating is a green application orchestrator engine, in which we intend to create a distributed follow-the-sun system that is able to move service components to VANI nodes that have better access to green energy, such as solar power or wind. The green orchestrator system is built on VANI using a variety of software-based resources developed for VANI, including the complex event processing service and the sensor service. Another application of VANI is in software-defined radio: VANI is capable of processing a large amount of aggregated and digitized radio signals in its reprogrammable hardware resources, a capability that facilitates advanced research on software-radio systems and future wireless technologies.

A major extension to the AON control and management plane, as well as to VANI, is to develop functionalities that automate application creation and deployment. In an automated system, an application provider would be able to specify the high-level business goals of an application, and the system would identify the appropriate service components and deploy them in the right places in an AON to deliver the required functionality. The inclusion of autonomous management techniques in VANI is another possible extension of the work on the VANI testbed.

Additional future work on VANI includes interconnecting VANI to GENI testbeds, so that GENI researchers can use VANI resources to carry out federated experiments, as well as setting up VANI nodes at different sites across a wide area network to enable large-scale experimentation. In addition, we plan to include new hardware resources, such as the new BEE3 boards and GPU-based hardware, in VANI. We hope that VANI can serve as a breeding ground for research on large-scale and advanced networked systems in Canada.

In terms of future work on the Distributed Ethernet Traffic Shaping system, we intend to further explore the DETS protocol modifications to the Ethernet control plane, and to develop proof-of-concept Ethernet switches with this capability using the hardware resources developed for VANI.

In scalable QoS and admission control in service-oriented systems, we intend to further explore the potential of transient-state analysis in maximizing the system revenue, by predicting the revenue that the system would lose or gain by admitting a request for an application, especially when the system is not overloaded and there is room for gaining more revenue. Another extension of this work could be to include scheduling mechanisms for the queued application instances in the system. Further development of the proposed commitment algorithm to reduce power consumption in a service-oriented system, through anticipation of future resource requirements and putting surplus resources into a low-power mode, is another possible area of future research. Finally, incorporating the proposed QoS-control mechanisms in a real service-oriented system such as AON is another major extension of this work that we would like to explore in the future.

Appendices


Appendix A

Queue-Enabled Service Commitment

In Q-DASC, we use the pdf of the Time to Enter Service (TES) in a G/G/C/N queuing system. Finding exact solutions for the TES distribution is very difficult in general. Therefore, in this appendix we introduce several results that help us find approximations for the TES.

A.1 Time to Enter Service in a G/G/C/N System

Assume that there is a G/G/C/N system that has a general distribution for request arrivals. It has $C$ independent instances of one service with execution-time pdf $f(t)$ and mean $\mu$, and $N$ queue spots in front of them. We are interested in the distribution of the Time to Enter Service (TES) for the queued requests, assuming that all service instances are busy. To do so, we define the system state at time $t$ as $s(t) = (t - t_1, t - t_2, \ldots, t - t_C)$, $t_1 \le t_2 \le \ldots \le t_C$, in which $t_i$ represents the time at which the $i$th instance started serving a request. Finding a closed-form representation of the TES distribution for each of the requests in the queue is quite difficult and impractical in the general case. However, in this section we develop a series of theorems that lead to bounds on these distributions. Our intuition is that we only need to study the residual times of the $j$ longest-served

requests that are already in the system to find the time to enter service for a queued request in spot $j$ of the queue (with spot 1 being the head of the queue). Following this intuition, we then find the relation between the TES and the residual times of the requests that are being served. We use the concepts of stochastic orders [143] to determine when our intuition is correct.

Definition 1: Let $X$ and $Y$ be two random variables with the following property:

$$P\{X > t\} \le P\{Y > t\}, \quad \forall t \in (-\infty, \infty) \tag{A.1}$$

Then $X$ is said to be smaller than $Y$ in the usual stochastic order, denoted $X \le_{st} Y$. This property can also be represented in terms of cumulative distribution functions (cdfs) as follows:

$$F_X(t) \ge F_Y(t), \quad \forall t \in (-\infty, \infty) \tag{A.2}$$

In other words, the distribution of $X$ is lower bounded by the distribution of $Y$.

Definition 2: A nonnegative random variable $X$ with distribution function $F$ and survival function $\bar F(t) \equiv 1 - F(t)$ is said to be Increasing Failure Rate (IFR) if $-\log \bar F$ is convex on $\{t : \bar F(t) > 0\}$. Similarly, $X$ is said to be Decreasing Failure Rate (DFR) if $-\log \bar F$ is concave on $\{t : \bar F(t) > 0\}$.

The next theorem gives a sufficient and necessary condition for a random variable to be IFR or DFR.

Theorem 1) The random variable $X$ is IFR [DFR] if, and only if, $[X - t_1 \mid X > t_1] \ge_{st} [\le_{st}]\, [X - t_2 \mid X > t_2]$ whenever $t_1 \le t_2$.

proof: Theorem 1.A.13 in [143].

According to this theorem, if the execution time of a service has the IFR property, then the application instances that are already being served in the system are more likely to finish their execution in the order of their arrival to that service. Similarly, if it is DFR, the instances are more likely to finish their execution in the reverse order of their arrival.

Therefore, the next lemma follows at once from the above definitions and theorem:

Lemma 1) Assuming $f(t)$ is the pdf of an IFR [DFR] service execution time with cumulative distribution function (cdf) $F(t)$, and the system is in state $s(t)$, then $F_1(t) \ge F_2(t) \ge \ldots \ge F_C(t) \ge F(t)$ [$F_1(t) \le F_2(t) \le \ldots \le F_C(t) \le F(t)$], in which $F_i(t)$ is the cdf of the random variable $T_{ri}$ denoting the residual time of the $i$th service instance.

The uniform distribution is an IFR distribution. Among other IFR distributions we can name the Normal distribution, the Gamma and Weibull distributions for $\alpha > 1$, and the modified extreme value distribution [144]. DFR distributions are rare, but as an example we can name the log-normal distribution [144]. Equality in Lemma 1 holds for the exponential distribution, which has a constant failure rate and is the boundary between the IFR and DFR types of distributions.

Example 1: Assume that a service execution time has a uniform distribution U(10,20). Figure A.1 shows the $F_i(t)$ distributions for a system that has four service instances ($C = 4$) and is in the state $s(t) = (t - 15, t - 12, t - 8, t - 4)$.

[Figure A.1: Distributions for residual service times in a service with uniform execution time]
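To make Lemma 1 concrete, the following minimal Python sketch (ours, not part of the thesis software; the elapsed service times 15, 12, 8, and 4 are our reading of the state in Example 1) computes the conditional residual-time cdfs for the U(10,20) service and shows the ordering numerically:

```python
import numpy as np

# Service execution time T ~ U(10, 20): its cdf.
def F(t):
    return np.clip((np.asarray(t, dtype=float) - 10.0) / 10.0, 0.0, 1.0)

def residual_cdf(x, age):
    # cdf of the residual time T - age, conditioned on T > age:
    # F_i(x) = (F(age + x) - F(age)) / (1 - F(age))
    return (F(age + x) - F(age)) / (1.0 - F(age))

ages = [15.0, 12.0, 8.0, 4.0]   # assumed elapsed service times of the 4 instances
x = np.linspace(0.0, 16.0, 5)   # residual-time points at which to compare the cdfs
for i, age in enumerate(ages, start=1):
    print("F%d:" % i, np.round(residual_cdf(x, age), 3))
print("F :", np.round(F(x), 3))  # unconditional cdf; the smallest curve per Lemma 1
```

Running the sketch prints five rows of cdf values in which each row dominates the next, i.e., $F_1(x) \ge F_2(x) \ge F_3(x) \ge F_4(x) \ge F(x)$ at every point, which is exactly the IFR ordering plotted in Figure A.1.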

[Figure A.2: Distributions for residual service times in a service with Normal execution time]

Example 2: Figure A.2 shows the distributions for a service with Normal distribution N(200, 10). In this system there are four service instances ($C = 4$), and the system is in the state $s(t) = (t - 150, t - 120, t - 80, t - 40)$.

Example 3: Figure A.3 shows the distributions for a system with four servers. The service execution time is the Beta distribution used in the previous sections and depicted in Figure 7.5. This Beta distribution is also an IFR distribution. The figure is depicted when the system is in the state $s(t) = (t - 1500, t - 1200, t - 800, t - 400)$.

[Figure A.3: Distributions for residual service times in a service with a Beta execution time, $\alpha = 2.333$, $\beta = 4.666$]

Using the above definitions, theorem, and lemma, we can now return to the properties of the Time to Enter Service (TES) for a queued application instance.

Theorem 2) In a G/G/C/N system, the distribution of the TES for the first instance in the queue (head of the queue) is lower bounded by the distribution of the residual time of any of the requests Already Being Served (ABS) in the system. In other words, $G_{1C}(t) \ge F_i(t)$, where $G_{1C}(t)$ is the cdf of the TES of the first request in the queue.

proof: Assume that the system is in state $s(t)$ and there is one request in the queue. The time to enter service (TES) for that request is a random variable denoted $T_{w1C}$ and is

equal to $\min(T_{r1}, T_{r2}, \ldots, T_{rC})$, in which $T_{ri}$ is the residual service time of the $i$th server. The cdf of $T_{w1C}$ is denoted $G_{1C}(t)$, and the cdf of $T_{ri}$ is denoted $F_i(t)$. From the definitions, we wish to prove that $T_{w1C} \le_{st} T_{ri}$ for all $0 < i \le C$. To do so, we have to show:

$$P\{T_{w1C} < t\} \ge P\{T_{ri} < t\}, \quad \forall t > 0 \tag{A.3}$$

To prove the above inequality, we have to show that the event $T_{ri} < t$ is a subset of the event $T_{w1C} < t$. This is true, since if $T_{ri} < t$ then $T_{w1C}$ will be less than $t$. As a result, the above inequality holds and the theorem is proved.

The next two corollaries discuss the properties of this variable in terms of its mean, as well as its characteristics for an IFR [DFR] distribution.

Corollary 1) In a G/G/C/N system, the mean TES for the first request in the queue (head of the queue) is not more than the mean residual time of any of the requests in the system. In other words, $m_{1C} \le m_i$.

proof: This follows from Theorem 2 and the fact that $m_{1C} = \int_0^\infty (1 - G_{1C}(t))\,dt$.

Corollary 2) In a G/G/C/N system with IFR [DFR] service time, we have $T_{w1C} \le_{st} T_{r1}$ [$T_{rC}$], and consequently $m_{1C} \le m_1$ [$m_C$].

proof: This corollary follows from Theorem 2, Lemma 1, and Corollary 1.

The above corollary interestingly states that in a system with IFR [DFR] service time, the distribution of the TES for the first request in the queue is lower bounded by the distribution of the longest [shortest] Already Being Served (ABS) request in the system. The next theorem considers the properties of the TES random variable for other application instances in the service queue.

Theorem 3) In a G/G/C/N system, the distribution of the TES of the $j$th request ($2 \le j \le C$) in the queue is lower bounded by the distribution of the maximum of the residual times of any combination of $j$ ABS requests in the system.

proof: If we define $T_{wjC}$ as the random variable representing the TES of the $j$th request in the queue, we can define a set $V_C$ as follows:

$$V_C = \{T_{r1}, T_{r2}, \ldots, T_{rC}\} \tag{A.4}$$

We define $V_{jC} \subset V_C$ as any subset of random variables in $V_C$ with $|V_{jC}| = j$, assuming $2 \le j \le C$. We need to prove:

$$T_{wjC} \le_{st} \max(V_{jC}), \quad \text{i.e.,} \quad P\{T_{wjC} < t\} \ge P\{\max(V_{jC}) < t\}$$

Again, to show that the above inequality holds, we have to prove that the event $\max(V_{jC}) < t$ is a subset of the event $T_{wjC} < t$. This is true, since if $\max(V_{jC}) < t$ then $T_{wjC}$ will surely be less than $t$. As a result, the above inequality holds and the theorem is proved.

Corollary 3) In a G/G/C/N system with IFR [DFR] service time, the cdf of the TES for the $j$th instance in the queue is lower bounded by the cdf of the maximum of the first [last] $j$ ABS instances in the system:

$$G_{jC}(t) \ge \prod_{k=1}^{j} \left[\, \prod_{k=C-j+1}^{C} \right] F_k(t), \quad j \le C \tag{A.5}$$

The next theorem finds an upper bound for the distribution of the TES in a G/G/C/N system.

Theorem 4) In a G/G/C/N system, the distribution of the TES of the $j$th request ($j \ge 2$) in the queue is upper bounded by the distribution of the $(j-1)$th request in the queue.

proof: We know that $T_{wjC} \ge T_{w(j-1)C}$; therefore $G_{jC}(t) \le G_{(j-1)C}(t)$ for all $t$ and $j \ge 2$.

From Corollary 3 and Theorem 4 we can see that for IFR [DFR] systems, $G_{jC}(t)$ is bounded by $G_{(j-1)C}(t)$ and $\prod_{k=1}^{j} [\,\prod_{k=C-j+1}^{C}\,] F_k(t)$.

In summary, we showed that to find bounds on the TES distribution for the $j$th request in the queue, we only need to analyze the residual times of $j$ requests that are already being served in the system. If the service time distribution is IFR, these $j$ requests can be the longest-served ones. Since many distributions in real systems can be characterized as IFR distributions, we can conclude that our initial intuition is correct for most real systems. For DFR distributions, however, better bounds can be obtained by analyzing the $j$ shortest-served requests.

In Q-DASC, if an application instance is queued, we find the TES mean and variance using the lower bounds, and we distribute them to the other agents so that they can update their future usage estimations. As mentioned earlier, finding the exact TES distribution for general service execution times is very difficult, because it depends not only on the service execution time distribution but also on the current state of the system and the start times of the ABS instances. Therefore, we performed performance evaluations on the beta distribution that we used for the DASC performance evaluations in Chapter 7 in order to assess the tightness of the

[Figure A.4: TES distribution and calculated bound for the beta distribution with $\alpha = 2.333$, $\beta = 4.666$. Panels: the service execution time distribution (mean = 1333.3, stdev = 166.7) and the TES distributions for spots 1 through 5 in the queue, each showing the simulated distribution and the calculated bound together with their means and standard deviations.]

bound. To do so, we assumed a queue with 200 busy servers with beta execution times and a maximum service time of 2000. The start times of the ABS instances are chosen uniformly from 5 to 1995 in steps of 10. Figure A.4 shows the TES distributions and the corresponding means and standard deviations for the first five instances in the queue (spots one to five), obtained through simulation, together with the bounds obtained in Corollary 3. As can be seen, the bound on the distribution is lower than the distribution found through simulation, as expected. We can also observe that the bound is tighter for the first spots in the queue; as we move further down the queue, the bound becomes more conservative (a small simulation sketch of this evaluation is given below). In the next subsections we study the properties of the time-to-enter-service distribution for G/D/C/N and G/M/C/N queuing systems.
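The following Monte Carlo sketch is our own illustration, not the thesis simulator; it assumes the Beta parameters and age grid described above and uses SciPy's inverse cdf for conditional sampling of the residual times:

```python
import heapq
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(0)
A, B, SCALE = 2.333, 4.666, 2000.0     # Beta execution time, maximum 2000
AGES = np.arange(5.0, 1996.0, 10.0)    # elapsed service of the 200 busy servers
SPOTS, RUNS = 5, 2000

def residuals():
    # inverse-cdf sampling of T | T > age for every server, returned as T - age
    lo = beta_dist.cdf(AGES / SCALE, A, B)
    u = rng.uniform(lo, 1.0)
    return SCALE * beta_dist.ppf(u, A, B) - AGES

tes = np.zeros((RUNS, SPOTS))   # simulated time to enter service per queue spot
bnd = np.zeros((RUNS, SPOTS))   # Corollary 3 bound: max residual of j longest-served
for r in range(RUNS):
    res = residuals()
    for j in range(1, SPOTS + 1):
        bnd[r, j - 1] = res[-j:].max()   # AGES is increasing: last j = longest served
    # exact TES: each departure admits the next queued request, and the freed
    # server restarts with a fresh full execution time
    heap = res.tolist()
    heapq.heapify(heap)
    for j in range(SPOTS):
        t = heapq.heappop(heap)
        tes[r, j] = t
        heapq.heappush(heap, t + SCALE * rng.beta(A, B))

for j in range(SPOTS):
    print("spot %d: sim mean %.1f, bound mean %.1f"
          % (j + 1, tes[:, j].mean(), bnd[:, j].mean()))
```

Because the sketch restarts each freed server with a fresh execution time, the simulated TES for spot $j$ is never later than the maximum residual of the $j$ longest-served instances on any sample path, which matches the corollary (the printed numbers will differ from the figure since the sketch uses its own random seed).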

A.2 TES for G/D/C/N System

A G/D/C/N queuing system has a deterministic service time of $d$ seconds. We assume that all $C$ servers in the system are busy and that the system is in state $s(t) = (t - t_1, t - t_2, \ldots, t - t_C)$, in which $t_1 \le t_2 \le \ldots \le t_C$ and $t - t_j < d$. We also know that a deterministic distribution is an IFR distribution. Therefore, the TES for the first request in the queue equals the residual time of the longest-served request in the system, $d - (t - t_1)$. Similarly, the TES for the $j$th request in the queue is a deterministic value equal to:

$$T_{wjC} = d - (t - t_j), \quad j \le C \tag{A.6}$$
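For example (with hypothetical numbers of our choosing): if $d = 100$ seconds and the $j$th longest-served request has been in service for $t - t_j = 30$ seconds, then the request in spot $j$ of the queue enters service after exactly $d - (t - t_j) = 70$ seconds.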

A.3 TES for G/M/C/N System

Assume that there is a G/M/C/N system with service rate $\mu$. It can easily be shown that the TES in a G/M/C/N system follows the m-Erlang distribution. The TES distribution for the first request in the queue is exponential (m-Erlang with parameters $m = 1$ and $C\mu$), and the distribution for the $j$th request in the queue is an m-Erlang distribution with parameters $m = j$ and $C\mu$. Also, the residual time distributions in a G/M/C/N system are all i.i.d. exponential with rate $\mu$. In this type of system, the total delay $T_{djC}$ of the $j$th request in the queue has mean and variance:

$$E[T_{djC}] = E[T_{wjC}] + E[T_s] = \frac{j}{C\mu} + \frac{1}{\mu} = \frac{j + C}{C\mu}$$

$$VAR[T_{djC}] = VAR[T_{wjC}] + VAR[T_s] = \frac{j}{(C\mu)^2} + \frac{1}{\mu^2} = \frac{j + C^2}{(C\mu)^2}$$

in which $T_s$ is the request's service time. The interesting observation from the above equations is that for $j \ll C^2$ the variance of the delay in the system is almost equal to the variance of the service time. In other words, in systems with an ample number of servers, the variance of the TES for the first few requests in the queue ($j/(C\mu)^2$, $j \ll C^2$) is almost negligible compared to the variance of the service time ($1/\mu^2$).
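As a hypothetical numerical illustration (the values are our own): with $C = 100$ servers, $\mu = 1$, and a request in spot $j = 5$, we get $E[T_{d5C}] = 105/100 = 1.05$ and $VAR[T_{d5C}] = (5 + 100^2)/100^2 = 1.0005$, so queueing adds only 5% to the mean delay and 0.05% to its variance relative to the service time alone.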

Appendix B

Computing Over-Commitment Probability using Chernoff's Bound

The Central Limit Theorem gives a good approximation of the over-commitment probability when the total number of service instances is within a few standard deviations ($\sigma_k(t)$) of the mean $\eta_k(t)$. Therefore, if this range is more than a few standard deviations and the required over-commitment probability threshold ($T_{oc}$) is less than 0.001, it is better to use a tighter bound on the probability. To do so, we use the theory of large deviations and the Chernoff bound [145] to compute the probability of over-commitment. The Chernoff bound is defined as follows:

$$P\{S_k(t) \ge N_k\} \le e^{-(sN_k - \mu_S(s))}, \quad s > 0, \ \forall t > 0 \tag{B.1}$$

in which $S_k(t)$ is the sum random variable (7.4), and $\mu_S(s) = \ln \psi_S(s)$ is the logarithmic moment generating function of the random variable $S_k(t)$. Since the right-hand side of the above inequality holds for any $s > 0$, we can find the value of $s$ that minimizes the right-hand side by finding the $s^*$ which satisfies the following equation:


$$N_k = \mu'_S(s) \tag{B.2}$$

Substituting the definition of the sum random variable from (7.4), we have:

$$P\{S_k(t) \ge N_k\} \le e^{-s^* N_k + \sum_{i,j} \ln\left(\bar{G}_{ijk}(t) + G_{ijk}(t)\, e^{s^*}\right)}$$

(with $\bar{G}_{ijk}(t) = 1 - G_{ijk}(t)$), in which $s^*$ is the solution of the following equation:

$$N_k = \sum_{i,j} \frac{G_{ijk}(t)}{G_{ijk}(t) + \bar{G}_{ijk}(t)\, e^{-s}}$$

It has been shown that the probability in (B.1) can be approximated for random variables that are the sum of finitely many random variables (like our sum random variable (7.4)) as follows [146]:

$$P\{S_k(t) \ge N_k\} \approx \frac{1}{s^* \sqrt{2\pi \mu''_S(s^*)}}\, e^{-(s^* N_k - \mu_S(s^*))} \tag{B.3}$$

To make sure that the probability of over-commitment remains below the threshold, we should compute $s^*$ for all times $t$ and evaluate the probability in approximation (B.3). To do so, we use the characteristics of the sum random variable $S_k(t)$. We know that $S_k(t)$ is the sum of $n$ independent Bernoulli random variables, in which $n$ represents the number of applications that can be served by service type $k$ at time $t$. We therefore analyze the problem in the general case as follows. Assume $X_i$, $i = 1, 2, \ldots, n$, are $n$ independent Bernoulli random variables with parameters $(p_i, q_i)$, where $p_i + q_i = 1$. We define the random variable $Y$ as $Y = \sum_{i=1}^{n} X_i$. We have:

$$\eta := E[Y] = \sum_{i=1}^{n} p_i \tag{B.4}$$

$$\sigma^2 := VAR[Y] = \sum_{i=1}^{n} VAR[X_i] = \sum_{i=1}^{n} p_i q_i \tag{B.5}$$

The Chernoff bound is:

$$P\{Y \ge N\} \le e^{-sN}\, E[e^{sY}] = e^{-sN} \prod_{i=1}^{n} \left(q_i + p_i e^s\right), \quad s > 0 \tag{B.6}$$

After taking the derivative with respect to $s$, we find $s^*$ as the root of the following function:

$$d(s) = \sum_{i=1}^{n} \frac{p_i}{p_i + q_i e^{-s}} - N, \quad s > 0 \tag{B.7}$$

Also, the second derivative of the right-hand side of the Chernoff bound, i.e., the derivative of $d(s)$, is:

$$d'(s) = \sum_{i=1}^{n} \frac{p_i q_i e^{-s}}{(p_i + q_i e^{-s})^2}, \quad s > 0 \tag{B.8}$$

As can be seen, $d'(s)$ is always positive, and therefore $d(s)$ is a strictly increasing function with at most one root. Figure B.1 shows a sample of this function, depicting $d(s)$ for a service type with $N = 900$ instances and $n = 1000$ application instances in the system with random $p_i$'s. As expected, this function is strictly increasing and, in this case, has exactly one root. Therefore, we present a five-step algorithm for finding the root. At each step, the algorithm examines the cases where the function has no root, or where its single root is much larger than one, much less than one, or close to one. Without getting further into the mathematical details, we present the algorithm for finding the optimal $s^*$ as follows:

204

Appendix: Over-Commitment Probability using Chernoff’s Bound

100

0

d(s)

−100

−200

−300

−400

0

2000

4000

6000 s

8000

10000

12000

Figure B.1: A sample d(s) for a service with 900 instances, and random pi s for 1000 application instances

1) If $N \ge n$, then $s^* = \infty$ and the bound is 0; the system has more service instances than admitted applications and the probability of over-commitment is zero. Otherwise, go to the next step.

2) If $N \le \eta$, then $s^* = 0$ and the bound is 1. This means that the number of service instances is less than the mean number of admitted applications and, by the CLT, the over-commitment probability is more than 0.5. Otherwise, go to the next step.

3) If $\eta < N < n$, then $d(s)$ is a strictly increasing function with exactly one root. If that root is much less than 1 ($s^* \ll 1$), we have:

$$s^* = \frac{N - \eta}{\sigma^2}, \quad s^* \ll 1$$

If the above equation yields $s^* < 0.5$, then $s^*$ is the answer; otherwise proceed to the next step.

4) For $s^* \gg 1$, we have:

$$s^* = \ln\!\left(\frac{\sum_i p_i^{-1} - n}{n - N}\right), \quad s^* \gg 1$$

If the above equation yields $s^* > 5$, then $s^*$ is the answer; otherwise proceed to the next step.

5) Otherwise, $s^*$ lies in the range (0.5, 5). In this case, we can compute the root very efficiently using Newton's method.

Our simulations show that in most cases the above algorithm terminates at the fourth step and there is no need to use Newton's method. Even when it is needed, however, Newton's method achieves a sufficiently accurate answer for our problem within a few iterations. As explained earlier, to compute the over-commitment probability at all future times, we have to compute $s^*$ for all times $t$ at which the application is most likely to be in that service. By calculating $s^*$ and obtaining $P_{oc}(t)$, we can make sure that the application failure probability remains below the agreed threshold $T_{oc}$ at all times.

The computation of $s^*$ for all $t$, however, can be a computation-intensive task in some systems. To overcome this obstacle, we propose a practical technique for computing the root of equation (B.2) that combines the CLT-based method and the Chernoff-bound method. In this technique, the system computes the over-commitment probability based on the mean and variance values using the central limit theorem, as described in the previous subsection. Moreover, the system keeps track of the time $t_h$ at which the CLT-based method gives the highest value for the over-commitment probability. If the highest CLT-based probability is less than 0.001, the system computes the root of equation (B.2) using the above technique, and the Chernoff bound for that particular time $t_h$ is then computed using $s^*$.
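The following Python sketch is our own illustration of the five steps (function names, tolerances, and the Newton starting point are arbitrary choices, and the $p_i$ are assumed to lie strictly in (0, 1)); it finds $s^*$ and evaluates the resulting Chernoff bound:

```python
import math

def chernoff_s_star(p, N):
    """Five-step root search for d(s) = sum_i p_i/(p_i + q_i e^-s) - N.
    A sketch of the procedure described above; assumes 0 < p_i < 1."""
    n = len(p)
    q = [1.0 - pi for pi in p]
    eta = sum(p)                                  # eta, eq. (B.4)
    var = sum(pi * qi for pi, qi in zip(p, q))    # sigma^2, eq. (B.5)
    if N >= n:                 # step 1: more instances than applications
        return math.inf        # bound is 0
    if N <= eta:               # step 2: fewer instances than the mean load
        return 0.0             # bound is 1
    s = (N - eta) / var        # step 3: small-root approximation (s* << 1)
    if s < 0.5:
        return s
    s = math.log((sum(1.0 / pi for pi in p) - n) / (n - N))  # step 4 (s* >> 1)
    if s > 5.0:
        return s
    s = 2.0                    # step 5: Newton's method for s* in (0.5, 5)
    for _ in range(50):
        d = sum(pi / (pi + qi * math.exp(-s)) for pi, qi in zip(p, q)) - N
        dp = sum(pi * qi * math.exp(-s) / (pi + qi * math.exp(-s)) ** 2
                 for pi, qi in zip(p, q))
        step = d / dp
        s -= step
        if abs(step) < 1e-9:
            break
    return s

def chernoff_bound(p, N):
    # evaluates e^{-(s* N - mu_S(s*))} with mu_S(s) = sum_i ln(q_i + p_i e^s)
    s = chernoff_s_star(p, N)
    if s == math.inf:
        return 0.0
    mu = sum(math.log((1.0 - pi) + pi * math.exp(s)) for pi in p)
    return math.exp(-(s * N - mu))

# hypothetical example: 1000 applications with mixed probabilities, 900 instances
print(chernoff_bound([0.9] * 500 + [0.85] * 500, 900))
```

In practice, the first four steps resolve most inputs without iteration, which is why the combined CLT/Chernoff technique described above remains computationally cheap.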

Appendix C

Derivation of the $G_k(t)$ Probability

Assume that there is an application created by cascading $m$ different services, $S_1 \otimes S_2 \otimes \cdots \otimes S_j \odot S_k \otimes \cdots \otimes S_m$. The execution times of all services are independent random variables denoted $X_i$ ($i = 1, \ldots, m$), with pdfs $f_i(t)$ ($i = 1, \ldots, m$). We want to find the probability that at time $t$ the application has finished the execution of all services before service $k$ and is currently executing service $k$:

$$G_k(t) = P\Big\{\sum_{i=1}^{j} X_i < t < \sum_{i=1}^{j} X_i + X_k\Big\}$$

We define $Y_j := \sum_{i=1}^{j} X_i$, with pdf $f_{Yj}(t)$ and cdf $F_{Yj}(t)$. Now we have:

$$\begin{aligned}
G_k(t) &= P\{Y_j < t < Y_j + X_k\} \\
&= \int_0^t f_{Yj}(\tau)\, P\{\tau < t < Y_j + X_k \mid \tau = Y_j\}\, d\tau \\
&= \int_0^t f_{Yj}(\tau)\, P\{t < \tau + X_k\}\, d\tau \\
&= \int_0^t f_{Yj}(\tau)\,\big(1 - F_k(t - \tau)\big)\, d\tau \\
&= F_{Yj}(t) - \int_0^t f_{Yj}(\tau)\, F_k(t - \tau)\, d\tau \\
&= F_{Yj}(t) - \int_0^t \!\!\int_0^{t-\tau} f_{Yj}(\tau)\, f_k(\lambda)\, d\lambda\, d\tau
\end{aligned}$$

With the change of variable from $\lambda$ to $\nu - \tau$, we have:

$$\int_0^t \!\!\int_0^{t-\tau} f_{Yj}(\tau) f_k(\lambda)\, d\lambda\, d\tau = \int_0^t \!\!\int_\tau^{t} f_{Yj}(\tau) f_k(\nu - \tau)\, d\nu\, d\tau = \int_0^t \big(f_{Yj}(\nu) * f_k(\nu)\big)\, d\nu = F_{Yk}(t)$$

Therefore, the probability $G_k(t)$ is equal to:

$$G_k(t) = F_{Yj}(t) - F_{Yk}(t)$$
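As a quick numerical check of this identity (our own sketch; the exponential rates and the time point are arbitrary choices), one can compare a direct Monte Carlo estimate of $P\{Y_1 < t < Y_1 + X_2\}$ with $F_{Y1}(t) - F_{Y2}(t)$ computed analytically for two cascaded exponential services:

```python
import numpy as np

rng = np.random.default_rng(1)
lam1, lam2 = 1.0, 0.5      # assumed rates of services 1 and 2
t, n = 2.5, 500_000

x1 = rng.exponential(1.0 / lam1, n)     # execution time of service 1
x2 = rng.exponential(1.0 / lam2, n)     # execution time of service 2
mc = np.mean((x1 < t) & (t < x1 + x2))  # P{executing service 2 at time t}

FY1 = 1.0 - np.exp(-lam1 * t)           # cdf of Y1 = X1
# hypoexponential cdf of Y2 = X1 + X2 (distinct rates)
FY2 = 1.0 - (lam2 * np.exp(-lam1 * t) - lam1 * np.exp(-lam2 * t)) / (lam2 - lam1)
print(mc, FY1 - FY2)   # the two estimates agree (about 0.41 here)
```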

Appendix D

Simulation Environment Description

In this thesis, we have frequently used simulation techniques to evaluate the performance of the proposed systems and algorithms. The simulation environment and techniques used for each of the performance evaluations have been described in the relevant parts of each chapter. In this appendix we present an overall description of the simulation environment and techniques used in this study.

The simulations in this thesis were all conducted on a 56-node computing cluster in the Network Architecture Lab in the Department of Electrical and Computer Engineering, University of Toronto. Each of these 56 computing nodes has two 1.7 GHz Xeon processors, two 40 GB local hard drives, and 2 GB of RAM. This considerable amount of processing power allowed us to easily repeat each simulation many times (more than 20 runs per data point) and to use the mean values of the obtained results to evaluate the performance of the proposed algorithms. We also calculated confidence intervals for these results and found that, since the number of trial runs is quite large, the confidence intervals are very narrow.

To make sure that the simulations are correct, we followed a step-by-step and modular approach. In each case, we started the simulation process by simulating simpler cases, and we analyzed the extensive logs produced by the simulator to make sure the internal


states and variables are correct. We also followed a modular design approach for our simulations, testing each module in isolation to improve the quality of the simulations by simplifying the debugging process. We also evaluated the correctness of the random number generators by performing statistical analysis on the generated numbers. The input and output of each simulation are described in the performance evaluation sections of each chapter.

Bibliography

[1] T. Anderson, L. Peterson, S. Shenker, and J. Turner. Overcoming the internet impasse through virtualization. Computer, 38(4):34-41, April 2005.
[2] Zhenyu Yang, Wanmin Wu, Klara Nahrstedt, Gregorij Kurillo, and Ruzena Bajcsy. Enabling multi-party 3D tele-immersive environments with ViewCast. ACM Trans. Multimedia Comput. Commun. Appl., 6(2):1-30, 2010.
[3] A. Tizghadam and A. Leon-Garcia. Autonomic traffic engineering for network robustness. Selected Areas in Communications, IEEE Journal on, 28(1):39-50, January 2010.
[4] R. Farha and A. Leon-Garcia. Blueprint for an Autonomic Service Architecture. In Autonomic and Autonomous Systems, 2006. ICAS '06. International Conference on, July 2006.
[5] K.A. Abuosba and A.A. El-Sheikh. Formalizing service-oriented architectures. IT Professional, 10(4):34-38, July-August 2008.
[6] Virtualization. http://en.wikipedia.org/wiki/Virtualization.
[7] Hadi Bannazadeh, Alberto Leon-Garcia, et al. Virtualized Application Networking Infrastructure. In Proc. of the 6th International Conference on Testbeds and Research Infrastructures for the Development of Networks and Communities, Berlin, Germany, May 2010.
[8] Keith Redmond, Hadi Bannazadeh, Alberto Leon-Garcia, and Paul Chow. Development of a Virtualized Application Networking Infrastructure Node. In Proc. of the 3rd IEEE Workshop on Enabling the Future Service-Oriented Internet, Honolulu, Hawaii, December 2009.
[9] Hadi Bannazadeh and Alberto Leon-Garcia. A Distributed Ethernet Traffic Shaping System. In Proc. of the 17th IEEE Workshop on Local and Metropolitan Area Networks (LANMAN 2010), Long Branch, NJ, May 2010.
[10] Michael Cusumano. Cloud computing and SaaS as new computing platforms. Communications of the ACM, 53(4):27-29, 2010.
[11] Hadi Bannazadeh and Alberto Leon-Garcia. Allocating Services to Applications using Markov Decision Processes. In Proc. of IEEE Int. Conf. on Service-Oriented Computing and Applications, SOCA'07, pages 141-146, Newport Beach, California, June 2007.
[12] Hadi Bannazadeh and Alberto Leon-Garcia. Service Commitment Strategies in Allocating Services to Applications. In Proc. of IEEE Int. Conf. on Services Computing, SCC'07, pages 91-97, Salt Lake City, Utah, July 2007.
[13] Hadi Bannazadeh and Alberto Leon-Garcia. A Distributed Algorithm for Service Commitment in Allocating Services to Applications. In Proc. of the 2nd IEEE Asia-Pacific Services Computing Conference, APSCC'07, pages 446-453, Tsukuba, Japan, December 2007.
[14] Hadi Bannazadeh and Alberto Leon-Garcia. Probabilistic Approach to Service Commitment in Service-Oriented Systems. In Proc. of IEEE Congress on Services, Honolulu, Hawaii, July 2008.
[15] Hadi Bannazadeh and Alberto Leon-Garcia. A distributed probabilistic commitment control algorithm for service-oriented systems. IEEE Transactions on Network and Service Management (TNSM), to appear.
[16] Hadi Bannazadeh and Alberto Leon-Garcia. Online optimization in application admission control for service oriented systems. In Asia-Pacific Services Computing Conference, 2008. APSCC '08. IEEE, pages 482-487, Yilan, Taiwan, December 2008.
[17] Hadi Bannazadeh and Alberto Leon-Garcia. On the Emergence of an Application-Oriented Network Architecture. In Proc. of IEEE Int. Conf. on Service-Oriented Computing and Applications, SOCA'07, pages 47-54, Newport Beach, California, June 2007.
[18] Stephanos Androutsellis-Theotokis and Diomidis Spinellis. A survey of peer-to-peer content distribution technologies. ACM Comput. Surv., 36(4):335-371, 2004.
[19] Service-Oriented Architecture. www.ibm.com/soa.
[20] OASIS Reference Model for Service Oriented Architecture 1.0. http://www.oasis-open.org.
[21] Francis Shanahan. Amazon.com Mashups. Wrox Press Ltd., Birmingham, UK, 2007.
[22] Tim O'Reilly. What is Web 2.0: Design patterns and business models for the next generation of software. Available online at http://oreilly.com/web2/archive/what-is-web-20.html.
[23] W3C Working Group Note. Web services architecture. Available online at http://www.w3.org/TR/ws-arch/.
[24] W3C. Extensible Markup Language (XML). Available online at http://www.w3.org/XML/.
[25] Matjaz Juric, Benny Mathew, and Poornachandra Sarang. Business Process Execution Language for Web Services BPEL and BPEL4WS. Packt Publishing, Birmingham, UK, 2006.
[26] Krishna Kant. Data center evolution: A tutorial on state of the art, issues, and challenges. Computer Networks, 53(17):2939-2965, December 2009.
[27] James Murty. Programming Amazon Web Services: S3, EC2, SQS, FPS, and SimpleDB. O'Reilly Media Inc, California, 2008.
[28] D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov. The Eucalyptus open-source cloud-computing system. In Cluster Computing and the Grid, 2009. CCGRID '09. 9th IEEE/ACM International Symposium on, pages 124-131, Shanghai, May 2009.
[29] Guohui Wang and T. S. Eugene Ng. The impact of virtualization on network performance of Amazon EC2 data center. In Proceedings of the 29th IEEE Conference on Computer Communications, INFOCOM 2010, San Diego, CA, March 2010.
[30] M. Alizadeh, B. Atikoglu, A. Kabbani, A. Lakshmikantha, Rong Pan, B. Prabhakar, and M. Seaman. Data center transport mechanisms: Congestion control theory and IEEE standardization. In Communication, Control, and Computing, 2008. 46th Annual Allerton Conference on, pages 1270-1277, September 2008.
[31] Alan B. Johnston. SIP: Understanding the Session Initiation Protocol. Artech House Publishers, 2009.
[32] ITU-T. Next generation networks global standards initiative. Available online at http://www.itu.int/ITU-T/ngn.
[33] K. Knightson, N. Morita, and T. Towle. NGN architecture: generic principles, functional architecture, and implementation. Communications Magazine, IEEE, 43(10):49-56, October 2005.
[34] Gonzalo Camarillo and Miguel A. Garcia-Martin. The 3G IP Multimedia Subsystem (IMS). John Wiley & Sons Ltd, England, 2006.
[35] TM Forum. IPsphere forum. Available online at http://www.tmforum.org/ipsphere.
[36] Cornelia Kappler. UMTS Networks and Beyond. John Wiley & Sons, England, 2009.
[37] Pierre Lescuyer and Thierry Lucidarme. Evolved Packet System: The LTE and SAE Evolution of 3G UMTS. John Wiley & Sons, England, 2008.
[38] Alasdair Allan. Learning iPhone Programming: From Xcode to App Store. O'Reilly Media, CA, USA, 2010.
[39] Reto Meier. Professional Android 2 Application Development. Wiley Publishing, USA, 2010.
[40] Akamai. http://www.akamai.com.
[41] R.L. Xia and J.K. Muppala. A survey of BitTorrent performance. Communications Surveys and Tutorials, IEEE, 12(2):140-158, 2010.
[42] Gero Mühl, Ludger Fiege, and Peter Pietzuch. Distributed Event-Based Systems. Springer, Germany, 2006.
[43] P. Saint-Andre. XMPP: lessons learned from ten years of XML messaging. Communications Magazine, IEEE, 47(4):92-96, April 2009.
[44] Jacob Chakareski and Pascal Frossard. Adaptive systems for improved media streaming experience. Communications Magazine, IEEE, 45(1):77-83, January 2007.
[45] Cisco. Cisco visual networking index: Forecast and methodology, 2009-2014. Available online at http://www.cisco.com.
[46] YouTube. http://www.youtube.com.
[47] Hulu. http://www.hulu.com.
[48] E. Mikoczy, D. Sivchenko, Bangnan Xu, and J.I. Moreno. IPTV systems, standards and architectures: Part II - IPTV services over IMS: Architecture and standardization. Communications Magazine, IEEE, 46(5):128-135, May 2008.
[49] J.S. Turner and D.E. Taylor. Diversifying the internet. In Global Telecommunications Conference, 2005. GLOBECOM '05. IEEE, volume 2, December 2005.
[50] Steven M. Bellovin, David D. Clark, Adrian Perrig, and Dawn Song. A Clean-Slate Design for the Next-Generation Secure Internet, 2005. Available at http://sparrow.ece.cmu.edu/group/pub/bellovin_clark_perrig_song_nextGenInternet.pdf.
[51] Stanford University Clean Slate Design for Internet: An Interdisciplinary Research Program. http://cleanslate.stanford.edu.
[52] 100x100 project. http://100x100network.org.
[53] Särelä M., Rinta-aho T., and Tarkoma S. RTFM: Publish/subscribe internetworking architecture. ICT-MobileSummit 2008 Conference Proceedings, Paul Cunningham and Miriam Cunningham (Eds), IIMC International Information Management Corporation, 2008.
[54] Van Jacobson, Diana K. Smetters, James D. Thornton, Michael F. Plass, Nicholas H. Briggs, and Rebecca L. Braynard. Networking named content. In CoNEXT '09: Proceedings of the 5th International Conference on Emerging Networking Experiments and Technologies, pages 1-12, New York, NY, USA, 2009. ACM.
[55] GENI System Overview, September 2008. Available at http://www.geni.net.
[56] GENI Control Framework Requirements, January 2009. Available at http://www.geni.net.
[57] Peterson L. PlanetLab: A Blueprint for Introducing Disruptive Technology into the Internet. http://www.planet-lab.org, January 2004.
[58] PlanetLab GENI Control Framework Overview, January 2009. Available at http://www.geni.net.
[59] Emulab - network emulation testbed. http://www.emulab.net.
[60] Mike Hibler, Robert Ricci, Leigh Stoller, Jonathon Duerig, Shashi Guruprasad, Tim Stack, Kirk Webb, and Jay Lepreau. Large-scale Virtualization in the Emulab Network Testbed. In Proceedings of the 2008 USENIX Annual Technical Conference, pages 113-128, June 2008.
[61] Open Resource Control Architecture. http://nicl.cod.cs.duke.edu/orca/about.html.
[62] P. Szegedi, S. Figuerola, M. Campanella, V. Maglaris, and C. Cervello-Pastor. With evolution for revolution: managing FEDERICA for future Internet research. Communications Magazine, IEEE, 47(7):34-39, July 2009.
[63] Snehapreethi Gopinath, Shweta Jain, Shivesh Makharia, and Dipankar Raychaudhuri. An experimental study of the cache-and-forward network architecture in multi-hop wireless scenarios. In Proc. of the 17th IEEE Workshop on Local and Metro Area Networks (LANMAN 2010), Long Branch, NJ, May 2010.
[64] E. Grasa, G. Junyent, S. Figuerola, A. Lopez, and M. Savoie. UCLPv2: a network virtualization framework built on web services [web services in telecommunications, part II]. Communications Magazine, IEEE, 46(3):126-134, March 2008.
[65] E. Grasa et al. UCLPv2: A Network Virtualization Framework Built on Web Services. Communications Magazine, IEEE, 46(3):126-34, March 2008.
[66] Matthias Nicola and Jasmi John. XML parsing: A threat to database performance. In Proc. of the 12th Intl. Conference on Information and Knowledge Management, pages 175-178, New Orleans, Louisiana, 2003.
[67] D. Davis and M.P. Parashar. Latency performance of SOAP implementations. In Cluster Computing and the Grid, 2002. 2nd IEEE/ACM International Symposium on, New Orleans, Louisiana, May 2002.
[68] Hadi Bannazadeh. Hardware-based Content Processing, May 2007.
[69] F. Hartung, N. Niebert, A. Schieder, R. Rembarz, S. Schmid, and L. Eggert. Advances in network-supported media delivery in next-generation mobile systems. Communications Magazine, IEEE, 44(8):82-89, August 2006.
[70] D. Chappell. Theory in Practice: Enterprise Service Bus. O'Reilly Media, USA, 2004.
[71] IBM. WebSphere DataPower SOA appliances. http://www-01.ibm.com/software/integration/datapower/.
[72] Bo Li and Hao Yin. Peer-to-peer live video streaming on the internet: issues, existing approaches, and challenges [peer-to-peer multimedia streaming]. Communications Magazine, IEEE, 45(6):94-99, June 2007.
[73] I. Hernandez-Serrano, S. Sharma, and A. Leon-Garcia. Reliable P2P networks: Treblecast and Treblecast. In Parallel Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on, pages 1-8, 2009.
[74] Cisco. Application-oriented networking. http://www.cisco.com.
[75] Larry Peterson, Soner Sevinc, Jay Lepreau, Robert Ricci, John Wroclawski, Ted Faber, Stephen Schwab, and Scott Baker. Slice-based facility architecture. Available online at http://www.geni.net.
[76] Herbert Pötzl and Marc E. Fiuczynski. Linux-VServer: Resource Efficient OS-Level Virtualization, June 2007. Available at http://ols.108.redhat.com/2007/Reprints/potzl-Reprint.pdf.
[77] CANARIE Inc. CANARIE: Canadian Network for the Advancement of Research, Industry and Education. http://www.canarie.ca.
[78] Glen Gibb, John W. Lockwood, Jad Naous, Paul Hartke, and Nick McKeown. NetFPGA: An Open Platform for Teaching How to Build Gigabit-Rate Network Switches and Routers. Trans. on Education, 51(3):364-369, August 2008.
[79] Yu Cheng, R. Farha, A. Tizghadam, Myung Sup Kim, M. Hashemi, A. Leon-Garcia, and J.W.-K. Hong. Virtual network approach to scalable IP service deployment and efficient resource management. Communications Magazine, IEEE, 43(10):76-84, October 2005.
[80] C. Chang, J. Wawrzynek, and R.W. Brodersen. BEE2: a high-end reconfigurable computing system. Design and Test of Computers, IEEE, 22(2):114-125, March-April 2005.
[81] Sun Microsystems Inc. OpenESB: The Open Enterprise Service Bus. http://open-esb.dev.java.net.
[82] Sun Microsystems Inc.: Java Web Start Technologies. http://java.sun.com/javase/technologies/desktop/javawebstart.
[83] Ontario Research and Innovation Optical Network (ORION). http://www.orion.on.ca.
[84] IEEE 802.1ad-2005, Virtual Bridged Local Area Networks Amendment 4: Provider Bridges, 2006. Available at http://standards.ieee.org.
[85] VMware Inc. VMware: A Virtual Computing Environment. http://www.vmware.com, 2001.
[86] Padala P., Zhu X., Wang Z., Singhal S., and Shin K.G. Performance Evaluation of Virtualization Technologies for Server Consolidation, 2007. Available at http://www.hpl.hp.com/techreports/2007/HPL-2007-59R1.html.
[87] Cloud Computing Definition, National Institute of Standards and Technology, Version 15, 2006. Available at http://csrc.nist.gov/groups/SNS/cloud-computing/index.html.
[88] The Internet Engineering Task Force (IETF). RFC 3448: TCP Friendly Rate Control (TFRC). http://www.ietf.org/rfc/rfc3448.txt.
[89] S. Biyani and J. Martin. A comparison of TCP-friendly congestion control protocols. In Computer Communications and Networks, 2004. ICCCN 2004. Proceedings. 13th International Conference on, pages 255-260, October 2004.
[90] IEEE 802.3x-1997, Local and Metropolitan Area Networks: Specification for 802.3 Full Duplex Operation, 1997. Available at http://standards.ieee.org.
[91] IEEE 802.1au, Virtual Bridged Local Area Networks Amendment: Congestion Notification. Available at www.ieee802.org/1/pages/802.1au.html.
[92] Jinjing Jiang, R. Jain, and Chakchai So-In. An explicit rate control framework for lossless ethernet operation. In Communications, 2008. ICC '08. IEEE International Conference on, pages 5914-5918, May 2008.
[93] Gary McAlpine, Manoj Wadekar, Tanmay Gupta, Alan Crouch, and Don Newell. An architecture for congestion management in ethernet clusters. In IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium - Workshop 9, page 211.1, 2005.
[94] Chakchai So-In, R. Jain, and Jinjing Jiang. Enhanced forward explicit congestion notification (E-FECN) scheme for datacenter ethernet networks. In Performance Evaluation of Computer and Telecommunication Systems, 2008. SPECTS 2008. International Symposium on, pages 542-546, June 2008.
[95] Linux Advanced Routing and Traffic Control. Available at http://lartc.org/.
[96] M. Bichler and K-J. Lin. Service-Oriented Computing. IT Systems Perspectives, 39(3):99-101, March 2006.
[97] X. Gu and K. Nahrstedt. Distributed Multimedia Service Composition with Statistical QoS Assurances. IEEE Transactions on Multimedia, 8(1):141-151, February 2006.
[98] L. Zeng, B. Benatallah, A.H.H. Ngu, M. Dumas, J. Kalagnanam, and H. Chang. QoS-Aware Middleware for Web Service Composition. IEEE Transactions on Software Engineering, 30(5):311-327, May 2004.
[99] P. Doshi, R. Goodwin, R. Akkiraju, and K. Verma. Dynamic workflow composition using Markov decision processes. In Proc. IEEE International Conference on Web Services, pages 576-582, July 2004.
[100] Thomas Phan and Wen-Syan Li. Heuristics-based scheduling of composite web service workloads. In MW4SOC '06: Proceedings of the 1st Workshop on Middleware for Service Oriented Computing (MW4SOC 2006), pages 30-35, New York, NY, USA, 2006. ACM.
[101] K.W. Ross and D.H.K. Tsang. The stochastic knapsack problem. Communications, IEEE Transactions on, 37(7):740-747, July 1989.
[102] D.P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Belmont, Massachusetts, third edition, 2005.
[103] M.L. Puterman. Markov Decision Processes. Wiley Inter-Science, New York, 1994.
[104] S.D. Moitra. Skewness and the Beta Distribution. Journal of the Operational Research Society, 41(10):953-961, October 1990.
[105] Daniel A. Menascé, Emiliano Casalicchio, and Vinod Dubey. A heuristic approach to optimal service selection in service oriented architectures. In WOSP '08: Proceedings of the 7th International Workshop on Software and Performance, pages 13-24, New York, NY, USA, 2008. ACM.
[106] Danilo Ardagna and Barbara Pernici. Adaptive service composition in flexible processes. IEEE Transactions on Software Engineering, 33:369-384, 2007.
[107] Valeria Cardellini, Emiliano Casalicchio, Vincenzo Grassi, Francesco Lo Presti, and Raffaela Mirandola. QoS-driven runtime adaptation of service oriented architectures. In ESEC/FSE '09: Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 131-140, New York, NY, USA, 2009. ACM.
[108] Tao Yu, Yue Zhang, and Kwei-Jay Lin. Efficient algorithms for web services selection with end-to-end QoS constraints. ACM Trans. Web, 1(1):6, 2007.
[109] David Chappell and David Berry. SOA - ready for primetime: The next-generation, grid-enabled service-oriented architecture. SOA Magazine, September 2007.
[110] Daniel A. Menascé, Honglei Ruan, and Hassan Gomaa. QoS management in service-oriented architectures. Perform. Eval., 64(7-8):646-663, 2007.
[111] Markus Schmid and Reinhold Kroeger. Decentralised QoS-management in service oriented architectures. In Distributed Applications and Interoperable Systems, volume 5053/2008, pages 44-57. Springer Berlin / Heidelberg, 2008.
[112] S. Rosario, A. Benveniste, S. Haar, and C. Jard. Probabilistic QoS and Soft Contracts for Transaction-Based Web Services Orchestrations. IEEE Transactions on Services Computing, 1(4):187-200, October-December 2008.
[113] Leyuan Shi. Approximate analysis for queueing networks with finite capacity and customer loss. European Journal of Operational Research, 85(1):178-191, 1995.
[114] Boualem Rabta. Rapid Modelling for Increasing Competitiveness, chapter A Review of Decomposition Methods for Open Queueing Networks, pages 25-42. 2009.
[115] Carolina Osorio and Michel Bierlaire. An analytic finite capacity queueing network model capturing the propagation of congestion and blocking. European Journal of Operational Research, 196(3):996-1007, 2009.
[116] H. Kobayashi and B. Mark. System Modeling and Analysis: Foundation of System Performance Evaluation. Pearson Education, Inc., Upper Saddle River, New Jersey, 2009.
[117] Raj Jain. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. John Wiley & Sons, Inc., New York, NY, 1991.
[118] A. Papoulis and S. U. Pillai. Probability, Random Variables and Stochastic Processes. McGraw-Hill, New York, 2002.
[119] Z.100, Specification and Description Language. Available online at http://www.itu.int/rec/T-REC-Z.100-200711-I/en, 2007.
[120] Cheng-Yuan Ku, Din-Yuen Chan, and Lain-Chyr Hwang. Optimal reservation policy for two queues in tandem. Inf. Process. Lett., 85(1):27-30, 2003.
[121] Cheng-Yuan Ku and Scott Jordan. Near optimal admission control for multiserver loss queues in series. European Journal of Operational Research, 144(1):166-178, 2003.
[122] S. Balsamo, V. Nitto Persone, and R. Onvural. Analysis of Queueing Networks with Blocking. Kluwer's International Series, 2001.
[123] W. Whitt. The queueing network analyzer. The Bell System Technical Journal, 62(9):2779-2815, 1983.
[124] A. Heindl. Approximate analysis of queueing networks with finite buffers and losses by decomposition. Technical Report 1998-8, 1998.
[125] J.C. Strelen. Loss queueing networks with bursty arrival processes and phase type service times: Approximate analysis. In Proceedings of the 5th IFIP Workshop on Performance Modelling and Evaluation of ATM Networks, pages 87/1-10, 1997.
[126] Sushant Jain and J. MacGregor Smith. Open finite queueing networks with M/M/C/K parallel servers. Computers & Operations Research, 21(3):297-317, 1994.
[127] R. Sadre, B. Haverkort, and A. Ost. An efficient and accurate decomposition method for open finite and infinite buffer queueing networks. In Proc. of the Third International Workshop on Numerical Solution of Markov Chains, page 120, 1999.
[128] Abigail Lebrecht and William J. Knottenbelt. Response time approximations in fork-join queues. In Proceedings of the 23rd Annual UK Performance Engineering Workshop (UKPEW), June 2007.
[129] R. Nelson and A.N. Tantawi. Approximate analysis of fork/join synchronization in parallel queues. Computers, IEEE Transactions on, 37(6):739-743, June 1988.
[130] Edward D. Lazowska, John Zahorjan, G. Scott Graham, and Kenneth C. Sevcik. Quantitative System Performance: Computer System Analysis Using Queueing Network Models. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1984.
[131] Majid Ghaderi and Raouf Boutaba. Call admission control in mobile cellular networks: a comprehensive survey. Wirel. Commun. Mob. Comput., 6(1):69-93, 2006.
[132] D.A. Levine, I.F. Akyldiz, and M. Naghshineh. A Resource Estimation and Call Admission Algorithm for Wireless Multimedia Networks using the Shadow Cluster Concept. IEEE/ACM Transactions on Networking, 5(1):1-12, February 1997.
[133] T. Zhang, E. van den Berg, J. Chennikara, P. Agrawal, Jyh-Cheng Chen, and T. Kodama. Local predictive resource reservation for handoff in multimedia wireless IP networks. Selected Areas in Communications, IEEE Journal on, 19(10):1931-1941, October 2001.
[134] Ti-Yen Yen and Wayne Wolf. Performance estimation for real-time distributed embedded systems. IEEE Trans. Parallel Distrib. Syst., 9(11):1125-1136, 1998.
[135] Lei Ju, Abhik Roychoudhury, and Samarjit Chakraborty. Schedulability analysis of MSC-based system models. In RTAS '08: Proceedings of the 2008 IEEE Real-Time and Embedded Technology and Applications Symposium, pages 215-224, Washington, DC, USA, 2008. IEEE Computer Society.
[136] Firat Kart, Louise E. Moser, and P. Michael Melliar-Smith. Building a distributed e-healthcare system using SOA. IT Professional, 10(2):24-30, 2008.
[137] Sorin Manolache, Petru Eles, and Zebo Peng. Schedulability analysis of applications with stochastic task execution times. Trans. on Embedded Computing Sys., 3(4):706-735, 2004.
[138] Sorin Manolache, Petru Eles, and Zebo Peng. Schedulability analysis of multiprocessor real-time applications with stochastic task execution times. In ICCAD '02: Proceedings of the 2002 IEEE/ACM International Conference on Computer-Aided Design, pages 699-706, New York, NY, USA, 2002. ACM.
[139] Sheldon M. Ross. Stochastic Processes. John Wiley & Sons, 1996.
[140] P.V. Hentenryck and R. Bent. Online Stochastic Combinatorial Optimization. The MIT Press, Cambridge, Massachusetts, 2006.
[141] Martin Bichler and Thomas Setzer. Admission control for media on demand services. Service Oriented Computing and Applications, 1(1):65-73, April 2007.
[142] Mixed Integer Linear Programming (MILP) solver lp_solve. http://sourceforge.net/projects/lpsolve.
[143] M. Shaked and J.G. Shanthikumar. Stochastic Orders and Their Applications. Academic Press, Boston, Massachusetts, 1994.
[144] Richard E. Barlow, Frank Proschan, and Larry C. Hunter. Mathematical Theory of Reliability. SIAM, New York, NY, 1996.
[145] Alberto Leon-Garcia. Probability, Statistics, and Random Processes for Electrical Engineering. Addison-Wesley, New York, 2008.
[146] Joseph Y. Hui. Switching and Traffic Theory for Integrated Broadband Networks. Kluwer Academic Publishers, Massachusetts, 1990.

Glossary

ABS Already Being Served.
AON Application-Oriented Network.
AOR Application-Oriented Router.
BEE2 Berkeley Emulation Engine 2.
BIP Binary Integer Programming.
BPEL Business Process Execution Language.
CAC Call Admission Control.
CDN Content Distribution (Delivery) Network.
CEP Complex Event Processing.
CLT Central Limit Theorem.
CP Complete Partitioning.
CS Complete Sharing.
DASC Distributed Algorithm for Service Commitment.
DETS Distributed Ethernet Traffic Shaping.
DFR Decreasing Failure Rate.
EC2 Amazon Elastic Compute Cloud.
ESB Enterprise Service Bus.
FCP Full Commitment Policy.
FCQN Finite Capacity Queuing Network.
FECN Forward Explicit Congestion Notification.
FPGA Field-Programmable Gate Array.
GENI Global Environment for Network Innovations.
GPU Graphics Processing Unit.
GUI Graphical User Interface.
HTTP Hypertext Transfer Protocol.
IFR Increasing Failure Rate.
IMS IP Multimedia Subsystem.
IP Internet Protocol.
JMS Java Message Service.
LP Linear Programming.
MDP Markov Decision Processes.
NCP No Commitment Policy.
NGN Next Generation Network.
PCP Partial Commitment Policy.
PN Physical Node.
Q-DASC Queue-enabled Distributed Algorithm for Service Commitment.
RAA Rate Allocation Algorithm.
RAA-FE Rate Allocation Algorithm-Forward Explicit.
RAA-FP Rate Allocation Algorithm-Fast Probe.
RAA-FS Rate Allocation Algorithm-Fair Share.
RAA-SP Rate Allocation Algorithm-Slow Probe.
SDL Specification and Description Language.
SIP Session Initiation Protocol.
SNMP Simple Network Management Protocol.
SOA Service-Oriented Architecture.
SSL Secure Sockets Layer.
SSS Service Signaling Stratum.
TES Time to Enter Service.
TLS Transport Layer Security.
UCLP User Controlled Light Path.
UUID Universally Unique IDentifier.
VANI Virtualized Application Networking Infrastructure.
VANI-AP VANI Application Plane.
VANI-CMP VANI Control and Management Plane.
VLAN Virtual Local Area Network.
VN Virtual Node.
WS Web Service.
WSDL Web Service Description Language.
XML Extensible Markup Language.
XMPP Extensible Messaging and Presence Protocol.