SICAP, a Shared-segment Inter-domain Control Aggregation Protocol
Rute Sofia
DI–FCUL
TR–04–22
March 2004
Departamento de Informática Faculdade de Ciências da Universidade de Lisboa Campo Grande, 1749–016 Lisboa Portugal
Technical reports are available at http://www.di.fc.ul.pt/tech-reports. The files are stored in PDF, with the report number as filename. Alternatively, reports are available by post from the above address.
SICAP, a Shared-segment Inter-domain Control Aggregation Protocol
Helena Rute Esteves Carvalho Sofia
Dissertation submitted to the Faculdade de Ciências da Universidade de Lisboa, Departamento de Informática, for the degree of DOCTOR IN INFORMATICS
Advisor: Pedro Manuel Barbosa Veiga
Co-advisor: Roch Guérin
Jury: Augusto Júlio Domingues Casaca, Paulo da Fonseca Pinto, Paulo Jorge Esteves Veríssimo, Luis Eduardo Teixeira Rodrigues, Luis Miguel Parreira e Correia
October 2003
“To be great, be whole: exclude nothing, exaggerate nothing that is yours. Be whole in everything. Put all that you are into the smallest thing you do. So in each lake the whole moon shines, because it lives aloft.” Ricardo Reis (a heteronym of Fernando Pessoa), in “Para ser grande, sê inteiro: nada”, Odes de Ricardo Reis. (Portuguese poet, 1888-1935)
“We are as elastic as the gas of gunpowder, and a sentence in a book, or a word dropped in conversation, sets free our fancy, and instantly our heads are bathed with galaxies, and our feet tread the floor of the Pit. And this benefit is real because we are entitled to these enlargements, and once having passed the bounds shall never again be quite the miserable pedants we were.” Ralph W. Emerson, In “Representative Men”. (American essayist and philosopher, 1803-1882)
Abstract
Most Internet services require some form of differentiation, mainly because users rightly demand guarantees about the services they subscribe to. Hence, these services usually rely on customer-provider agreements describing end-to-end Quality of Service requirements such as a bandwidth level or a maximum delay, i.e., resource reservation requirements. To function properly, such agreements have to be enforced end-to-end, meaning that each router along the path has to keep information to manage the requested reservations. The current broad deployment of RSVP is proof positive that a resource reservation protocol is necessary and useful to manage end-to-end resources. However, RSVP has severe scalability problems, which have already been investigated in enhanced versions of RSVP. Still, these new versions also fail when it comes to end-to-end scalability, and there is no feasible alternative to RSVP. The scalability problem is mostly a consequence of the potentially high reservation volumes that links between different Autonomous Systems may experience, and one way of dealing with it is to treat data based on aggregate reservations, since aggregation diminishes the state and signaling required at routers. In this dissertation, we analyse issues related to reservation aggregation. We introduce a novel aggregation protocol, SICAP, which performs shared-segment aggregation. We compare SICAP against the only other existing alternative, BGRP, which performs sink-tree aggregation, in terms of state scalability, signaling load, and bandwidth efficiency.
Keywords: Aggregation, Signaling, Control, Inter-domain, Reservations, BGRP.
Resumo
Most services offered on the Internet need some kind of differentiation, mainly because users rightly demand guarantees on the use of the services they subscribe to. For this reason, these services depend on client-provider agreements, which basically describe Quality of Service requirements such as bandwidth or maximum delay, i.e., requirements that describe resource reservations. Such agreements have to be ensured end-to-end, which means that, along the path traversed, each router has to keep information (state) about the desired reservations, so that adequate resource management can be achieved. The widespread use of the RSVP protocol is concrete proof that a resource reservation protocol is necessary and useful to achieve dynamic end-to-end resource management. However, RSVP has serious scalability problems, already addressed in enhanced versions of this protocol, which do not solve the lack of end-to-end scalability; and there is no other protocol that can work as a viable alternative to RSVP. The scalability problems are mainly due to the potentially high reservation volumes that links between different Autonomous Systems may experience, and one possible solution is to treat data in the form of aggregate reservations, since aggregation reduces the state and the number of signaling messages at each router. In this dissertation, we analyse concepts related to control aggregation. We present a new protocol, SICAP, which performs shared-segment aggregation. We compare this protocol with the only existing alternative, the BGRP protocol, which performs sink-tree aggregation, in terms of the amount of state required, signaling load, and bandwidth efficiency.
Keywords: Aggregation, signaling, control, inter-domain, reservations, BGRP.
Contents
Abstract
Resumo
Abbreviations
List of Figures
List of Tables
1 Introduction
  1.1 Quality of Service Issues
    1.1.1 The Data Plane: Traffic Handling
    1.1.2 The Control Plane: Provisioning and Managing Resources
  1.2 Dissertation Goals and Outline
2 Aggregation
  2.1 Terminology and Concepts
  2.2 Aggregation Approaches and Algorithms
    2.2.1 The Sink-Tree Approach
    2.2.2 The Shared-Segment Approach
      2.2.2.1 WDS
      2.2.2.2 MLWDS
  2.3 Evaluation of the Approaches
    2.3.1 Source Hotspot Scenario
    2.3.2 Destination Hotspot Scenario
    2.3.3 Homogeneous Traffic
  2.4 Chapter Closure
3 SICAP
  3.1 Related Work
  3.2 BGRP
  3.3 SICAP Design and Operation
    3.3.1 Choosing Where to Deaggregate
    3.3.2 Messages
      3.3.2.1 REQ
      3.3.2.2 RESV
      3.3.2.3 ERROR
      3.3.2.4 TEAR
      3.3.2.5 REFRESH
    3.3.3 SICAP Operation
      3.3.3.1 End-to-End Reservation Establishment
      3.3.3.2 Reservation Deletion
      3.3.3.3 Reservation Failure
    3.3.4 State Management at Intermediate Deaggregation Locations
  3.4 SICAP and BGRP Comparison
  3.5 Dealing with Bi-Directional Reservations
  3.6 Chapter Closure
4 Over-reservation
  4.1 Related work
  4.2 Over-Reserving to Reduce the Signaling Load
  4.3 Over-Reservation Mechanism Design
    4.3.1 Resource Distribution
      4.3.1.1 Choosing How Much to Ask For, $b_S(x)$
    4.3.2 Resource Release
      4.3.2.1 When to Release Resources, $r(x)$
      4.3.2.2 Choosing How Much to Release, $b_f(x)$
  4.4 Performance evaluation
    4.4.1 Parameters and Format of Results
    4.4.2 Over-Reserving, No Aggregate Demand History
      4.4.2.1 Long-lived Requests
      4.4.2.2 Short-lived Requests
    4.4.3 Over-Reserving with Aggregate Demand History
      4.4.3.1 Estimating the Aggregate Demand
      4.4.3.2 Long-Lived Requests
      4.4.3.3 Short-Lived Requests
    4.4.4 Delaying the Release of Resources
    4.4.5 A Hybrid Approach: Over-reserving and Delaying the Release
    4.4.6 Approaches’ Comparison
  4.5 Enhancements and Implementation issues
    4.5.1 Implementation Issues
      4.5.1.1 Route Aggregation and Overlapping Prefixes
      4.5.1.2 Support for Over-Reservation in Messages
      4.5.1.3 State Management at the Last Deaggregator and Intradomain Issues
      4.5.1.4 SICAP Specific Issues
    4.5.2 Enhancements
      4.5.2.1 Negotiating How Much More to Ask
      4.5.2.2 Over-Reserved Resources Refresh Mechanism
  4.6 Chapter Closure
5 Summary and Future Work
  5.1 Summary
  5.2 Contributions
  5.3 Future Work Guidelines
    5.3.1 Choosing Where to Deaggregate
    5.3.2 Over-reserving
    5.3.3 Further Evaluation
Bibliography
Abbreviations
AF        Assured Forwarding
AS        Autonomous System
BE        Best Effort
BGP       Border Gateway Protocol
BGRP      Border Gateway Reservation Protocol
BR        Boundary Router
CL        Controlled Load Service
DiffServ  Differentiated Services
DSL       Digital Subscriber Line
DV        Distance Vector
E2E       End-to-End
EF        Expedited Forwarding
EGP       Exterior Gateway Protocol
EMA       Exponential Moving Average
GS        Guaranteed Service
IDL       Intermediate Deaggregation Location
IETF      Internet Engineering Task Force
IGP       Interior Gateway Protocol
IntServ   Integrated Services
IP        Internet Protocol
ISP       Internet Service Provider
IS-IS     OSI IS-IS Protocol
ISSLL     Integrated Services over Specific Link Layers
LAN       Local Area Networks
LS        Link-State
LSP       Link State Packet
MLWDS     Multi-Level Weighted Deaggregation PointS
NSIS      Next Steps in Signaling
OSPF      Open Shortest Path First
PHB       Per Hop Behaviour
PSTN      Public Switched Telephone Networks
QoS       Quality of Service
RSVP      Resource reSerVation Protocol
SICAP     Shared-segment Inter-domain Control Aggregation Protocol
SLA       Service Level Agreement
ST-II     Internet Stream Protocol, Version 2
TCP       Transmission Control Protocol
VoIP      Voice over IP
WLAN      Wireless Local Area Networks
WDS       Weighted Deaggregation PointS
WDS-E     Weighted Deaggregation PointS Enhanced
YESSIR    Yet anothEr Sender Session Internet Reservation protocol
List of Figures
2.1 Role of BRs within an AS
2.2 Classification of Aggregates
2.3 Placement of aggregation agents within an AS
2.4 Aggregation agents
2.5 Merging Point
2.6 State accounting at a BR
2.7 Inter-Domain aggregation model
2.8 Example of state accounting for the sink-tree approach
2.9 Example of state accounting for the shared-segment approach
2.10 MLWDS example
2.11 Hotspot experiments’ topology
2.12 Source hotspot scenario state evolution
2.13 Destination hotspot scenario state evolution
2.14 Homogeneous traffic experiment topology
2.15 Homogeneous traffic scenario state evolution
3.1 BGRP example
3.2 BGRP example
3.3 BGRP example
3.4 WDS-E example
3.5 SICAP’s common header format
3.6 REQ message
3.7 RESV format
3.8 SICAP scenario
3.9 Reservation R1 establishment
3.10 Reservation R2 establishment
3.11 Reservation deletion messaging sequence
3.12 Reservation failure messaging sequence
3.13 Internet-like topology used in the simulations
3.14 State evolution
4.1 Over-reservation example
4.2 Resource distribution algorithm
4.3 Resource release algorithm
4.4 Topology used in the simulations
4.5 IP prefix overlapping examples
4.6 Route Record
4.7 Avoiding inconsistent choices
4.8 $b_S(x)$ negotiation example
4.9 $b_S(x)$ negotiation example, allocating a value between $b_i$ and $b_i + \sum b_{S_i}(x)$
4.10 Over-reserved resources timeout release algorithm
List of Tables
2.1 Global state for the sink-tree approach
2.2 Global state for the shared-segment approach
2.3 WDS state per AS
2.4 MLWDS state per AS
2.5 Source hotspot scenario state, 5000 requests
2.6 Destination hotspot scenario state, 5000 requests
2.7 Homogeneous traffic scenario state, 5000 requests
3.1 State, intensity of 5000 requests
4.1 Over-reservation notation
4.2 Long-lived requests, $b_S(x)$
4.3 Long-lived requests, quantisation with Q = 2
4.4 Long-lived requests, quantisation with Q = 4
4.5 Long-lived requests, SICAP
4.6 Long-lived requests, BGRP
4.7 Short-lived requests, $b_S(x)$
4.8 Short-lived requests, quantisation with Q = 2
4.9 Short-lived requests, SICAP
4.10 Short-lived requests, BGRP
4.11 Long-lived requests, relying on aggregate demand history
4.12 Short-lived requests, relying on aggregate demand history
4.13 Delayed release only
4.14 Over-reservation and Delayed Release
4.15 SICAP global results
4.16 BGRP global results
4.17 Best Approaches
Acknowledgements
This dissertation is the fruit of four years of work, three of which were spent abroad. The years abroad were a truly enriching opportunity, both during the first six months at the International Center for Advanced Internet Research (iCAIR) in Evanston, USA, and later, during the two-and-a-half-year stay at the University of Pennsylvania in Philadelphia, USA. All this would not have been possible without the support of my long-time advisor, Prof. Pedro Veiga. I am deeply grateful to him for allowing and providing the means that made this long stay abroad possible, and for always encouraging me to attend and participate in research events. I would also like to thank Prof. Roch Guérin, who was my advisor and mentor at the University of Pennsylvania, and who is responsible for the work achieved in this dissertation. I am deeply and forever indebted to Prof. Guérin for all his support, and I feel it has been a privilege to work with him. I thank him for each of the long and spirited weekly meetings he always managed to fit into his very tight schedule, for his enduring patience with both my minor and not-so-minor mistakes, for the long list of e-mails and phone calls exchanged to clarify issues related to our work, and for the advice given. This dissertation would not have been possible without his commitment and scientific clairvoyance. I also want to thank my friends (sorry, can’t fit you all in here!) for still being there after these years of self-confinement; my brothers, Jorge and Hugo; and my parents, Jorge and Helena, for the innumerable e-mails and chats exchanged while I was abroad. A special thanks goes to my parents, who always taught us to pursue our dreams, and who have always given us a vote of confidence in our capability to rise to the occasion and always take a step further. Last, but not least, I’d like to thank Paulo for all his support and patience throughout all these years. This experience would not have been the same had you not been around. It is a great joy to share my life with you.
This work was partially supported by funding from the POSI (Programa Operacional Sociedade da Informação) programme of FCT (Fundação para a Ciência e Tecnologia) within the context of the Third European Community Support Framework, scholarship reference PRAXIS XXI/BD/18246/98.
1 Introduction
The Internet has been maturing for three decades now. From an initial infrastructure aimed mostly at research, it has grown into a heterogeneous infrastructure that not only has a very appealing commercial facet, but is also the de facto communication medium of the twenty-first century. Its commercial potential, together with the exponential growth in the development of new technologies, created the need for multimedia and other interactive services from which Internet Service Providers (ISPs) and end-users can profit. The different networks that form the Internet use a broad range of technologies: Public Switched Telephone Networks (PSTN), Local Area Networks (LAN), Digital Subscriber Line (DSL), cable, and even cellular technologies are now commonly used to deploy networks on the Internet. Wireless Local Area Networks (WLANs) [17], which are also rapidly becoming a significant part of the Internet infrastructure, brought benefits that gave rise to new niche markets: commercial venues, airports, and even public gardens are examples of locations that now take advantage of mobile networking. All this, together with the view of broadband technologies as a necessity rather than a luxury, has helped make the Internet available across vast geographical areas and, consequently, to an ever wider and more diverse population of users, as shown in Fig. 1¹. The increasing number of Internet users fosters the deployment of new services: in 2003, Arbitron/Edison Media Research presented a report [5]² showing that 44% of USA citizens aged 12 and over had already used Internet broadcasts, either video or voice. The report also shows that 77% of the USA population now accesses the Internet from a wide variety of locations. A worldwide perspective is provided by GlobalReach [13], which shows how e-commerce has been evolving since 2000. We present that information in the two tables below.

¹ The chart is available at http://www.global-reach.biz/globstats/index.php3. This data was provided by GlobalReach, and their source was the Joshua Project, available at http://www.joshuaproject.net/.
² The sample consisted of 2005 USA citizens, aged 12 and over. Interviews were conducted by phone from January 6th to 16th, 2003.

Fig. 1: Online population, grouped by spoken language.

Worldwide eCommerce growth (values in US$ billions)
Region               2000      2001      2002      2003      2004   % of sales in 2004
Total               657.0   1,233.6   2,231.2   3,979.7   6,789.8    8.6%
North America       509.3     908.6   1,498.2   2,339.0   3,456.4   12.8%
  United States     488.7     864.1   1,411.3   2,187.2   3,189.0   13.3%
  Canada             17.4      38.0      68.0     109.6     160.3    9.2%
  Mexico              3.2       6.6      15.9      42.3     107.0    8.4%
Asia Pacific         53.7     117.2     286.6     724.2   1,649.8    8.0%
  Japan              31.9      64.4     146.8     363.6     880.3    8.4%
  Australia           5.6      14.0      36.9      96.7     207.6   16.4%
  Korea               5.6      14.1      39.3     100.5     205.7   16.4%
Western Europe       87.4     194.8     422.1     853.3   1,533.2    6.0%
  Germany            20.6      46.4     102.0     211.1     386.5    6.5%
  United Kingdom     17.2      38.5      83.2     165.6     288.8    7.1%
  France              9.9      22.1      49.1     104.8     206.4    5.0%
  Italy               7.2      15.6      33.8      71.4     142.4    4.3%
  Netherlands         6.5      14.4      30.7      59.5      98.3    9.2%
Latin America         3.6       6.8      13.7      31.8      81.8    2.4%

Worldwide eCommerce growth per region (share of the 2004 worldwide total)
Region            %
North America     50.9%
Asia/Pacific      24.3%
Europe            22.6%
Latin America      1.2%

GlobalReach’s source, Forrester Research [10], predicts that online commerce will reach $6.8 trillion by 2004. These facts confirm that the Internet has been gaining ground both as an interactive medium and as a serious commercial infrastructure. However, its heterogeneity and almost self-regulatory nature make it difficult to control the guarantees required to deploy multimedia services with robustness, scalability, and security. In addition, the Internet Protocol (IP) [16], the basis for communication on the Internet, was not designed with these new challenges in mind: IP was built without the ability to provision networks with the service differentiation that would guarantee end-users a required level of quality. Several ISPs deal with this problem by over-provisioning their links, i.e., adding capacity to the links so that traffic can be carried without failure. However, this technique is only a partial solution to the global problem: even though it can be effective [44], it requires, as a rule of thumb, that link bandwidth be increased as soon as the average link load exceeds some percentage [48] of the link bandwidth, e.g., 40%. Also, with network technologies becoming ever more diversified, it is hard to over-provision all the segments on a path. Another problem with over-provisioning is its cost: resources and equipment have to be upgraded continuously in order to prevent failures. There is also a need for service differentiation not only in terms of resources and network guarantees, but also in terms of users’ willingness to pay for a particular service. Therefore, over-provisioning alone is not enough to provide the Internet with the necessary guarantees to support multimedia services. Quality of Service (QoS) [29] on the Internet, which concerns meeting such needs and the capability of networks to deal with them, has been one of the most important research topics throughout the past two decades. We next discuss QoS issues.
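The over-provisioning rule of thumb mentioned above can be made concrete with a small check. This is an illustrative sketch only: the function names, the use of a plain arithmetic mean over load samples, and the 40% default (the example percentage quoted in the text) are assumptions, not a procedure taken from [44] or [48].

```python
def average_load(samples_bps):
    """Mean of measured link load samples, in bits per second."""
    if not samples_bps:
        raise ValueError("at least one load sample is required")
    return sum(samples_bps) / len(samples_bps)


def needs_upgrade(samples_bps, capacity_bps, threshold=0.40):
    """Rule-of-thumb check (assumed form): flag the link for an upgrade once
    its average load exceeds `threshold` (a fraction, e.g. 0.40) of capacity."""
    if capacity_bps <= 0:
        raise ValueError("capacity must be positive")
    return average_load(samples_bps) > threshold * capacity_bps


def capacity_to_restore_headroom(samples_bps, threshold=0.40):
    """Smallest capacity that brings the current average load back to the threshold."""
    return average_load(samples_bps) / threshold


if __name__ == "__main__":
    one_gbps = 1e9
    samples = [350e6, 450e6, 500e6]          # hypothetical measurements
    print(needs_upgrade(samples, one_gbps))  # average is 433.3 Mb/s, above 400 Mb/s
    print(capacity_to_restore_headroom(samples))
```

Because the check fires whenever the average crosses the threshold, steady traffic growth keeps re-triggering it after every upgrade, which is the continuous cost of over-provisioning pointed out in the text.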
1.1 Quality of Service Issues
Internet QoS can be defined as the provisioning of service differentiation and of performance assurance in terms of bandwidth efficiency, packet loss, delay, and jitter. QoS focuses on efficient resource allocation and performance optimization. QoS technologies include not only the mechanisms to handle traffic, i.e., data plane mechanisms, but also provisioning and configuration mechanisms, i.e., control plane mechanisms. Such a set of mechanisms is required to deploy End-to-End (E2E) QoS on the Internet, i.e., to provide the desired network guarantees to end-users. These guarantees comprise the requirements that all components of an E2E path have to fulfill, namely, QoS specifications such as delay or bandwidth level, ways of guaranteeing and enforcing those requirements, and ways of keeping information about the active flows. However, due to the current heterogeneity of the Internet, an E2E path usually crosses different routing domains: in terms of routing infrastructure, the Internet is divided into routing domains named Autonomous Systems (ASs) [20], i.e., sets of networks under a common administrative authority and regulated by a specific set of administrative guidelines. Inside ASs, core routers exchange routing information among themselves. At AS boundaries, Boundary Routers (BRs) exchange information both with the AS core routers and with BRs from neighboring ASs. This division of router functionality is mostly due to the fact that ASs have exclusive traffic forwarding policies: inside an AS, those policies are well known, but only some of them are exchanged with neighboring ASs, according to pre-established agreements. This exchange of routing information is carried out by routing protocols: within ASs, information is exchanged by an Interior Gateway Protocol (IGP), e.g., Open Shortest Path First (OSPF) [22] or the OSI IS-IS Protocol (IS-IS) [7, 34], while between ASs, information is exchanged by an Exterior Gateway Protocol (EGP). Currently, the only EGP in use is the Border Gateway Protocol (BGP), a path-vector protocol, i.e., a variant of the Distance Vector (DV) family. DV algorithms require each router to keep the distance between itself and each possible destination. This information (the distance vectors) is computed from information learned from neighboring routers, and comprises reachability information about different locations, according to the established agreements.
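The distance-vector computation described above can be sketched as the classic Bellman-Ford relaxation step that a router applies to its neighbors' advertisements. This is a minimal illustration under simplifying assumptions: link costs are static numbers, and the AS-path attribute, loop detection, and policy filtering that BGP adds on top of the distance-vector idea are deliberately left out. The function and variable names are hypothetical.

```python
import math


def dv_update(link_costs, neighbor_vectors):
    """One distance-vector update at a router.

    link_costs:       {neighbor: cost of the direct link to that neighbor}
    neighbor_vectors: {neighbor: {destination: distance advertised by neighbor}}
    Returns {destination: (best distance, next hop)}.
    """
    table = {}
    for neighbor, cost in link_costs.items():
        for destination, distance in neighbor_vectors.get(neighbor, {}).items():
            candidate = cost + distance
            best_so_far = table.get(destination, (math.inf, None))[0]
            if candidate < best_so_far:  # keep the cheapest route learned so far
                table[destination] = (candidate, neighbor)
    return table


if __name__ == "__main__":
    # Router A has direct links to B (cost 1) and C (cost 4).
    links = {"B": 1, "C": 4}
    vectors = {
        "B": {"B": 0, "C": 2, "D": 5},  # each neighbor advertises distance 0 to itself
        "C": {"C": 0, "D": 1},
    }
    for dst, (dist, hop) in sorted(dv_update(links, vectors).items()):
        print(dst, dist, hop)
```

In this example, router A reaches C more cheaply through B (1 + 2) than over its direct link (4), and reaches D through C (4 + 1). Repeating the step whenever neighbors re-advertise their vectors lets the tables converge.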
The described Internet hierarchy has a huge impact on E2E QoS: to deploy QoS in a scalable and transparent way, one has to be capable of managing both intra-domain (intra-AS) and inter-domain (inter-AS) resources, following the so-called “End-to-End principle” [19], introduced by Saltzer et al. This principle characterizes the design of the Internet and of its protocols. The initial idea behind this design principle was that, to add “intelligence” to large-scale networks, complexity should be pushed to their edges. At the application level, this means that applications should hold the intelligence and the consequent complexity, while the network should remain simple. However, this principle also assumed that every component of a network was trustworthy, i.e., that all endpoints would be willing to cooperate. In terms of QoS, due to the different policies applied by different ISPs, or to some end-users’ QoS requirements such as anonymity, it is not possible to apply this principle directly to the design of new protocols. But, given the current Internet hierarchy and the fact that elements inside an ISP can be considered trustworthy, one way of dealing with end-users’ requirements is to split QoS issues into intra-AS and inter-AS issues. This allows any QoS architecture to push the complex operations to the network boundaries and still hold to the E2E argument.
In an effort to address such complex issues, QoS research has given rise to several technologies and architectures [3] during the past two decades. Among them, two architectures stood out as possible standard candidates for the deployment of E2E service differentiation: the Integrated Services (IntServ) [32] and the Differentiated Services (DiffServ) [43] models, which resulted from the work of the corresponding working groups of the Internet Engineering Task Force (IETF) [15]. We next present these two architectures, along with issues related to the QoS data plane.
1.1.1 The Data Plane: Traffic Handling
IntServ was proposed in 1994 with the goal of providing QoS guarantees per flow³ to applications. Service differentiation is given in the form of two basic services, the Guaranteed Service (GS) [46] and the Controlled Load Service (CL) [24], which complement the regular service provided by IP, the Best Effort (BE) service: by default, IP networks deliver packets to their destination without any guarantees and without any specially allocated resources. The main goals of GS are to assure flows a bounded delay and a given bandwidth level, as well as no queuing losses. CL provides flows with a service similar to BE in an unloaded network, i.e., it assures low losses. In addition to these services, IntServ comprises several components with different roles, depending on their placement in the network: at the network entrance, routers perform policing and shaping; in the core, switches perform classification and scheduling; signaling between hosts and routers is performed per flow by the Resource reSerVation Protocol (RSVP) [33]. This model provides the benefit of dynamic resource reservation with a fine granularity, but it has drawbacks that affect its scalability: it requires every element in the network to be IntServ-enabled, and it requires routers to keep per-flow state, hence incurring a high cost in terms of state information and signaling load. DiffServ was proposed in 1998 as a possible solution to the scalability problems of IntServ. To avoid the scalability issues brought by per-flow quality guarantees, DiffServ provides QoS guarantees per traffic class, i.e., per aggregate. Treating data based on aggregates diminishes the state and signaling required at routers.
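To see why per-flow state is costly while per-class state is not, the sketch below keeps reservation state for the same set of flows in two ways: one entry per 5-tuple, as in IntServ, and one entry per traffic class, as in DiffServ. It is a toy model built on a hypothetical classifier that maps UDP flows to EF and everything else to AF. Real DiffServ classification relies on the DS field and on SLA-driven boundary configuration, neither of which is modelled here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """5-tuple flow identifier: source/destination address and port, protocol."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int


def per_flow_state(reservations):
    """IntServ-style state: one entry per distinct flow."""
    table = defaultdict(float)
    for flow, bandwidth in reservations:
        table[flow] += bandwidth
    return dict(table)


def per_class_state(reservations, classify):
    """DiffServ-style state: one entry per traffic class (aggregate)."""
    table = defaultdict(float)
    for flow, bandwidth in reservations:
        table[classify(flow)] += bandwidth
    return dict(table)


def toy_classifier(flow):
    """Hypothetical mapping: UDP (protocol 17) to EF, anything else to AF."""
    return "EF" if flow.protocol == 17 else "AF"


if __name__ == "__main__":
    reservations = [
        (FlowKey(f"10.0.0.{i}", 5000 + i, "192.0.2.1",
                 5004 if i % 2 == 0 else 80, 17 if i % 2 == 0 else 6), 1.0)
        for i in range(100)
    ]
    print(len(per_flow_state(reservations)))              # 100 entries
    print(per_class_state(reservations, toy_classifier))  # 2 entries
```

The per-flow table grows with the number of flows, while the per-class table stays at the number of classes. Both tables hold the same total reserved bandwidth, but the aggregate table no longer records which flow contributed what. That loss of detail is the price paid for scalability.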
DiffServ introduced another benefit: it pushed complex operations to the network boundary, relieving the interior of the network, the core. At boundaries, traffic is marked as in-profile or out-of-profile according to agreements established between the network and its users, i.e., Service Level Agreements (SLAs). In the core, routers simply differentiate and forward traffic according to that marking and to specific service classes, named Per Hop Behaviours (PHBs) in DiffServ terminology. Similarly to IntServ, DiffServ enhances IP networks with two additional services, the Assured Forwarding (AF) [21] and the Expedited Forwarding (EF) [49] PHBs. EF provides a service equivalent to a virtual leased line, i.e., low loss, latency, and jitter guarantees within a DiffServ domain. AF is a flexible service framework for service differentiation. It defines four default service classes, each with three drop precedences, which allow traffic to be discarded according to different requirements. AF assures that in-profile packets are delivered with high probability. As mentioned, these two models are strong candidates for the standard Internet QoS architecture, since they provide solutions for issues related to forwarding and to the granularity of service differentiation, i.e., data plane issues. Therefore, the QoS data plane is now well understood, and several solutions address its issues. However, the same level of understanding is needed for control plane issues, i.e., for the mechanisms that configure, reserve, and maintain the necessary data plane resources, as we explain next.
³ We consider the 5-tuple definition of a flow, i.e., a flow is identified by source IP address and port, destination IP address and port, and protocol number.

1.1.2 The Control Plane: Provisioning and Managing Resources
The two architectures described in the previous section address issues related to traffic forwarding and to the cost of such treatment. However, deploying E2E QoS also requires a mechanism capable of establishing, controlling, and managing resources along the path, i.e., a control plane mechanism. Within IntServ, control plane issues are addressed by RSVP. In DiffServ, provisioning is provided by SLAs. Because an SLA represents a provider-customer agreement, DiffServ elements have to be configured statically according to the SLA requirements. This type of top-down [53] provisioning works well for services without strong quality guarantees. However, with multimedia applications, or with strong variations in traffic volume, DiffServ by itself cannot provide the network with the required guarantees unless the network is adequately over-provisioned. To address this problem, the use of RSVP over DiffServ is being investigated in the context of the IETF Integrated Services over Specific Link Layers (ISSLL) [14] working group. Still within the DiffServ context, the Yet anothEr Sender Session Internet Reservation protocol (YESSIR) [31] represents a viable alternative to RSVP. Altogether, there is no complete solution capable of defining end-to-end QoS mechanisms, which is an impediment to the deployment of a standards-based end-to-end QoS solution for the Internet.
Recognizing this problem, the IETF created the Next Steps in Signaling (NSIS) [27] working group to address E2E signaling concerns and deployment in the Internet. The current NSIS charter considers RSVP as a possible starting point for the design of a general signaling framework, given the benefits of this protocol and its wide deployment. While RSVP or YESSIR can be used within an AS to provide resource reservation, signaling between different ASs needs further investigation, first because of the high traffic volumes that inter-AS links carry and second, because of the different policies that each region implements. An inter-AS signaling solution must, therefore, provide resource reservation establishment and management transparently, without adding cost to the data plane solution used. As DiffServ does in the data plane, state aggregation can be used in the control plane to reduce the information kept in each router along a path: instead of keeping state per individual reservation, routers keep state only per group of reservations, i.e., per aggregate. We next address issues related to state aggregation.
Aggregation: Achieving Control Scalability
Several proposals address control plane behavior [35], [9], but they focus only on procedures within a region and thus do not represent E2E options. In control plane aggregation specifically, the granularity and scope chosen for aggregation affect the scalability of any solution: restricting aggregation to a single region may require deaggregating and reaggregating at the boundaries between regions, which can become a major burden on boundary routers. The granularity chosen for aggregation is therefore a key factor in determining the state reduction that can be achieved. Aggregation could, for instance, be done at the flow level, per source and destination IP addresses. However, according to Huston [11], there were around 1.09 billion addresses visible in the Internet routing table in 2001, which translates into 10^12 possible combinations of active IP addresses and, consequently, into an aggregation scheme that may not scale. Alternatively, aggregation could be based on groups of aggregated IP addresses [54], i.e., network prefixes. This could reduce state along a path, but probably not by much, since such a scheme depends on how addresses are distributed over route prefixes and on how routes are aggregated through each AS. A far better option is to aggregate reservations at the AS level, given that ASs are the basic building block of the current Internet routing infrastructure. From a scalability standpoint, since there are currently 13,000 active ASs on the Internet [1], this represents a much smaller universe than the billions of active IP addresses. However, even with AS-level aggregation, several issues remain open: what is the best way to aggregate control information, considering the amount of state kept and the load of control messages, and how to compute the optimal amount of resources to request.
Our main motivation is, therefore, to gain a better perspective on the scalability of different control aggregation approaches and on their ability to handle large reservation volumes. We focus on inter-domain control reservations, as we expect them to be the most stressful in terms of scalability. In the next section we present our goals and outline the scope of this dissertation in greater detail.
1.2 Dissertation Goals and Outline
In this dissertation, we try to provide a basic understanding of the factors and parameters that affect the scalability of inter-domain control reservation mechanisms. In particular, we focus on evaluating two aggregation approaches that attempt to minimize the amount of state and processing associated with resource reservation on inter-domain links. Some of the basic questions involved are how, when, and where to aggregate individual reservation requests. Several criteria can be used to decide how to aggregate reservation requests on links connecting different ASs. Aggregation can, for example, be performed on the basis of a single shared AS hop, of a shared AS path segment, or simply of a common destination AS. These options translate into different trade-offs in the efficiency of a mechanism. The goal of a scalable solution is to minimize the global amount of reservation state required. In addition, a scalable solution should take into account processing and signaling requirements, keeping both as low as possible. A related factor is the bandwidth efficiency of a solution, in particular how often the bandwidth allocated to an aggregate reservation is updated. Ideally, bandwidth would be updated after every change to the individual reservations of an aggregate. This would ensure that only the minimum necessary amount is allocated, but would most likely translate into a significant signaling load. Alternatively, bandwidth allocation could be updated less frequently to reduce the signaling load. However, this could reduce network efficiency by giving some aggregate reservations more bandwidth than they need, potentially preventing others from obtaining the bandwidth they require. This dissertation focuses on analyzing these issues, in order to gain insight into the factors that optimize an aggregation approach.

The remainder of this document is organized as follows. In Chapter 2, we explain issues related to inter-domain control aggregation and introduce two aggregation approaches that are used as a basis for this study. Chapter 3 describes a novel inter-domain control aggregation protocol that performs aggregation based on shared segments of an AS path. Chapter 4 presents a mechanism that can be seen as a possible enhancement to protocols based on the aggregation approaches presented, and that can be used to further reduce their signaling load. We conclude in Chapter 5, where we also outline possible directions for future work.
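The bandwidth-update trade-off described in Section 1.2 (update on every change versus update less often) can be made concrete with a toy comparison. This is a minimal sketch under stated assumptions: the demand trace, the quantum size, and the "round up to a multiple of a quantum" rule are all hypothetical choices for illustration, and it is not the signaling-reduction mechanism presented later in this dissertation. It counts signaling updates and the cumulative bandwidth allocated but unused.

```python
import math


def exact_policy(demands):
    """Re-signal the aggregate after every change: no excess, many updates."""
    updates, allocated, excess = 0, 0, 0
    for demand in demands:
        if demand != allocated:
            allocated = demand
            updates += 1              # one signaling message per change
        excess += allocated - demand  # always zero for this policy
    return updates, excess


def quantized_policy(demands, quantum):
    """Allocate in multiples of `quantum`; re-signal only when that multiple changes."""
    updates, allocated, excess = 0, 0, 0
    for demand in demands:
        target = quantum * math.ceil(demand / quantum)
        if target != allocated:
            allocated = target
            updates += 1
        excess += allocated - demand  # bandwidth held but unused at this step
    return updates, excess


if __name__ == "__main__":
    # hypothetical aggregate demand (bandwidth units) after each reservation change
    trace = [1, 2, 3, 4, 3, 4, 5, 6, 5, 4]
    print("exact:", exact_policy(trace))           # exact: (10, 0)
    print("quantized:", quantized_policy(trace, 4))  # quantized: (3, 15)
```

On this trace, allocating in blocks of 4 units cuts the updates from 10 to 3 at the price of 15 unit-steps of over-allocation, which is exactly the tension between signaling load and bandwidth efficiency described above.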
2 Aggregation
In this chapter, we describe and evaluate several aggregation procedures that attempt to minimize the amount of control state and processing due to resource reservation on inter-domain links. Our aim is to provide insight into how to devise such procedures, and we therefore concentrate on the single aspect of state optimization. This performance parameter is of the utmost importance when devising any aggregation procedure: if a procedure performs poorly in terms of state optimization, investigating its bandwidth efficiency or signaling load will probably not make sense. We thus consider state to be the building block of any aggregation approach, since it is the first crucial performance factor in terms of scalability. We consider two representative families of algorithms: the sink-tree aggregation family, which makes aggregation decisions based on sink-trees, and the shared-segment aggregation family, which relies on segments of an AS path shared by different reservations to perform aggregation. We consider algorithms [37, 38] that belong to each family and evaluate their cost in terms of the state they require, both at the AS level and at the border router (BR) level. To help in understanding the concepts used, we first introduce control aggregation terminology and notions.
2.1 Terminology and Concepts
In this section, we introduce terminology and concepts related to inter-domain control aggregation that are used throughout the dissertation. We consider an aggregation region or aggregation domain to be synonymous with an AS. An ingress router is a router placed at the boundary of an AS and crossed by traffic that enters the AS. Similarly, an egress router is a router placed at the boundary of an AS but crossed by traffic that exits it, as illustrated in Fig. 2.1. Requests that share some path characteristics and cross the same egress router can be bundled together in an aggregate. For instance, requests going to the same destination can be aggregated together and thus be treated as a single request by the BRs along a path.
[Figure: AS1 with an ingress router, where incoming traffic enters the AS, and an egress router, where outgoing traffic leaves it.]
Figure 2.1: Role of BRs within an AS.
Aggregates are characterized by their starting and ending ASs, as illustrated in Fig. 2.2: an aggregate is named originating if it starts in the current AS; ending if it ends in the current AS, and therefore has to be deaggregated there; and transient if it simply passes through the current AS.
[Figure: ASs 1 and 2 with aggregates A1 and A2.]
Figure 2.2: Classification of aggregates: A2 is an example of an originating aggregate for AS 1, and at the same time, of a transient aggregate for AS 2. A1 is an example of an ending aggregate for AS 2.
The aggregation and deaggregation processes occur at BRs, as illustrated in Fig. 2.3: an aggregator is an agent in charge of processing and possibly merging requests as they leave an AS, and is therefore positioned at egress routers. A deaggregator is an agent in charge of splitting ending aggregates into requests, and is therefore positioned at ingress routers.
[Figure: AS1 with deaggregators at its ingress BRs and aggregators at its egress BRs; deaggregated requests r1 to r4 are mapped onto aggregates A1 to A4.]
Figure 2.3: Placement of aggregation agents within an AS. r1 and r2 represent end-to-end reservations; A1, A2, A3, A4 represent aggregates.
The aggregation and deaggregation processes require BRs to keep some information about reservations, i.e., state information, as illustrated in Figs. 2.4 (a) and (b): while deaggregators keep only information about aggregates, aggregators have to map aggregates to their corresponding reservations.
[Figure: (a) aggregator agent, with incoming reservations and outgoing aggregates; (b) deaggregator agent, with incoming aggregates and outgoing reservations.]
Figure 2.4: Aggregation agents. (a) Aggregator agent. (b) Deaggregator agent.
Merging of aggregates takes place when aggregates that cross different ingress routers and the same egress router of an AS have the same aggregation requirements (for instance, they share a path segment). Such is the case in Fig. 2.5, where aggregate A2 is merged into aggregate A1.
[Figure: aggregates A1 and A2 in AS1 meet at the merging point, the common egress router.]
Figure 2.5: Merging point: aggregate A2 is merged onto A1 at the common egress router, in AS1.
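The aggregator role and the merging rule above can be sketched as a small data structure. This is a hypothetical illustration, not code from the dissertation: the class name `EgressAggregator`, the method `attach`, and the choice of an AS path segment as the aggregation key are all assumptions. Only the aggregator side is modeled, since it is the agent that must map aggregates to their reservations.

```python
class EgressAggregator:
    """Aggregator agent at one egress router (hypothetical sketch).

    Aggregates are identified by an aggregation key; here the key is the AS
    path segment the aggregate covers, which is an assumption for illustration.
    """

    def __init__(self):
        self.by_key = {}    # aggregation key -> id of the aggregate leaving here
        self.members = {}   # aggregate id -> reservations mapped onto it

    def attach(self, aggregate_id, key, reservations):
        """Register an aggregate arriving from some ingress router.

        If an aggregate with the same key already leaves through this egress,
        the newcomer is merged into it (a merging point) and the id of the
        surviving aggregate is returned.
        """
        target = self.by_key.get(key)
        if target is None:
            self.by_key[key] = aggregate_id
            self.members[aggregate_id] = set(reservations)
            return aggregate_id
        self.members[target].update(reservations)
        return target


if __name__ == "__main__":
    egress = EgressAggregator()
    segment = ("AS1", "AS2", "AS3")                     # hypothetical shared segment
    print(egress.attach("A1", segment, {"r1", "r2"}))   # A1
    print(egress.attach("A2", segment, {"r3"}))         # A1 (A2 merged into A1)
    print(sorted(egress.members["A1"]), len(egress.members))  # ['r1', 'r2', 'r3'] 1
```

As in Fig. 2.5, the second aggregate does not survive as a separate entry: after the merge, the egress router keeps a single aggregate carrying all three reservations.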
Intra-domain links are connections between networking elements inside an AS, while inter-domain links are connections between neighboring ASs. An AS hop is synonymous with an inter-domain hop. We define upstream and downstream with respect to the direction of traffic flow: an AS j is downstream of an AS i if it lies between i and a destination AS; AS j is upstream of AS i if it lies between a source AS and i. In terms of state information and from a router perspective, each time a reservation crosses an interface, it consumes resources, e.g., memory and CPU cycles. State is therefore associated with the interfaces crossed by requests. Hence, to account for state realistically, we consider that a request, whether individual or aggregate, occupies one generic unit of state each time it crosses an interface, as illustrated in Figs. 2.6 (a), (b), and (c). The average reservation state Q_i of a BR i in an AS is given in (2.1), where I_i represents state due to incoming reservations and O_i state due to outgoing reservations.
$$Q_i = I_i + O_i \tag{2.1}$$
Correspondingly, the average state S_m of an AS m is obtained by summing the state of its n BRs:
$$S_m = \sum_{i=1}^{n} Q_i \tag{2.2}$$
[Figure: state accounting at a BR. (a) 1 aggregate in, x reservations out: incoming state 2 units, outgoing state 1+x units, total x+3 units. (b) x reservations in, 1 aggregate out: incoming state x+1 units, outgoing state 2 units, total x+3 units. (c) 1 aggregate in, 1 aggregate out: incoming state 2 units, outgoing state 2 units, total 4 units.]
Figure 2.6: State accounting at a BR: (a) an aggregate is split into x reservations. (b) x reservations are merged into a single aggregate. (c) an aggregate simply traverses the BR.
Both S_m and Q_i are relevant performance measures for a given aggregation scheme: tracking state at the AS level gives an overall measure of performance, while tracking it at the router level helps identify variations in the state that individual routers are required to maintain. For example, an aggregation scheme could achieve a low amount of state at the AS level while concentrating that state at a few routers. Next, we use a simple topology model to explain state accounting for the two aggregation approaches and for the aggregation algorithms derived from them.
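Equations (2.1) and (2.2) are simple sums, but the point about tracking both S_m and Q_i is easier to see with numbers. The sketch below uses hypothetical (I_i, O_i) values for two four-BR ASs that have the same AS-level state S_m but very different per-router loads; the helper names are illustrative only.

```python
def router_state(incoming, outgoing):
    """Eq. (2.1): Q_i = I_i + O_i, in generic state units."""
    return incoming + outgoing


def as_state(routers):
    """Eq. (2.2): S_m is the sum of Q_i over the n BRs of the AS."""
    return sum(router_state(i, o) for i, o in routers)


def busiest_router(routers):
    """Largest per-BR state, which S_m alone does not reveal."""
    return max(router_state(i, o) for i, o in routers)


if __name__ == "__main__":
    # hypothetical (I_i, O_i) pairs for the four BRs of two ASs
    balanced = [(10, 10), (10, 10), (10, 10), (10, 10)]
    skewed = [(38, 38), (1, 1), (1, 1), (0, 0)]
    for name, brs in (("balanced", balanced), ("skewed", skewed)):
        print(name, as_state(brs), busiest_router(brs))
```

Both ASs report S_m = 80, yet the busiest BR holds 20 units in one case and 76 in the other, which is why state is tracked at both levels.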
2.2 Aggregation Approaches and Algorithms
To describe and compare the aggregation approaches, we use the generic aggregation model illustrated in Fig. 2.7, which represents an AS-level dumbbell [2] topology. This model is simple yet sufficient to explain the state accounting methodology, since state along any path varies as a function of AS location: source, destination, or intermediate AS. In the illustrated scenario, ASs 1 to P are source ASs, and ASs P + K + 1 to P + K + N are destination ASs. Between source and destination ASs there is a segment of K ASs. For ease of understanding and visualization, traffic flows only from left to right. We also assume that each source AS establishes Y individual reservations with each destination AS, so the average number of requests in the system, i.e., the intensity of requests, is Y · P · N. Red (dark grey) BRs are in charge of ingress operations, i.e., possible deaggregation, while blue (light grey) BRs are in charge of egress operations, i.e., aggregation.
[Figure: AS-level dumbbell topology: P source ASs (AS 1 to AS P) connected through a chain of K ASs (AS P+1 to AS P+K) to N destination ASs (AS P+K+1 to AS P+K+N), each AS with its BRs.]
Figure 2.7: Inter-Domain aggregation model.
In the selected topology, all paths have a size of K + 1 AS hops, where K is a variable that can be set to reflect a typical AS hop count, e.g., based on a given AS path size distribution. To obtain realistic values for K, we use the values collected by Telstra [12] and also presented in [47]. This data is based on BGP measurements obtained from five major operators in 2001 and gathered from a total of 60,978 ASs. Among other facts, it shows that the maximum AS path observed has a size of 10 ASs¹. Hence, K is at most nine ASs, since the longest path in our scenario has K + 1 ASs. In our model, source ASs (1 to P) keep only state related to outgoing reservations, and destination ASs (P + K + 1 to P + K + N) keep only state due to incoming reservations. State accounting for ASs P + 1 to P + K is more complex and depends on the aggregation approach used, as we detail next.
2.2.1 The Sink-Tree Approach
Fig. 2.8 displays an example of state accounting for the scenario of Fig. 2.7 when using the sink-tree AS-based aggregation approach. Requests are aggregated on the basis of their destination AS, so that the resulting aggregate takes the form of a sink-tree rooted at the destination AS. In other words, all requests with a common destination AS are mapped onto the same tree, independently of their source AS.
[Figure: each source AS (1 to P) sends YN requests bundled into N aggregates; PN aggregates enter AS P+1 and N aggregates cross the K transit ASs; each destination AS (P+K+1 to P+K+N) receives 1 aggregate carrying YP requests.]
Figure 2.8: Example of state accounting for the sink-tree approach.
Since each source AS generates Y individual reservations for each destination AS, there are Y · N individual requests per source AS. Each source AS also creates N aggregates, because aggregation is based on destination ASs. Therefore, a total of P · N aggregates enter AS P + 1. At this AS, the P · N aggregates are merged according to their destination AS, which results in a total of N outgoing aggregates. With the sink-tree approach, deaggregation occurs only at destination ASs, where incoming aggregates are split into individual reservation requests. Tab. 2.1 details the state kept in each AS, showing the units required per interface crossed at both ingress and egress routers, and (2.3) gives the global state count for the sink-tree approach in this particular scenario.
¹ Data is from 2002.
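The aggregate counts just derived (Y · N requests per source, P · N aggregates entering AS P + 1, N aggregates after merging) can be reproduced by literally grouping requests by destination. This is an illustrative sketch of the dumbbell model only; the AS labels and the function name are hypothetical, and it does not model BGRP message handling.

```python
from itertools import product


def sink_tree_counts(P, N, Y):
    """Count requests and sink-tree aggregates in the dumbbell model of Fig. 2.7."""
    sources = [f"src{s}" for s in range(1, P + 1)]
    destinations = [f"dst{d}" for d in range(1, N + 1)]
    # each source AS issues Y individual requests towards each destination AS
    requests = list(product(sources, destinations, range(Y)))
    # at each source AS, requests are bundled by destination AS only
    source_aggregates = {(src, dst) for src, dst, _ in requests}
    # at AS P+1, aggregates from different sources merge per destination (sink-tree root)
    merged_aggregates = {dst for _, dst in source_aggregates}
    return len(requests), len(source_aggregates), len(merged_aggregates)


if __name__ == "__main__":
    print(sink_tree_counts(P=3, N=4, Y=5))   # (60, 12, 4)
```

With P = 3, N = 4, and Y = 5 this gives 60 requests, 12 = P · N aggregates leaving the sources, and 4 = N aggregates after the merge at AS P + 1.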
$$S_1 = PN(2Y + 4) + 4NK - 2N \tag{2.3}$$
Table 2.1: Global state for the sink-tree approach
AS | Ingress (IN, OUT) | Egress (IN, OUT) | Total
1 | - | YN, N | (Y+1)N
(...) | - | YN, N | (Y+1)N
P | - | YN, N | (Y+1)N
P+1 | NP, NP | NP, N | 3NP+N
P+2 | N, N | N, N | 4N
(...) | N, N | N, N | 4N
P+K | N, N | N, N | 4N
P+K+1 | 1, YP | - | YP+1
(...) | 1, YP | - | YP+1
P+K+N | 1, YP | - | YP+1
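Equation (2.3) follows from summing the Total column of Tab. 2.1 over the P source ASs, AS P + 1, the K − 1 remaining transit ASs, and the N destination ASs. The sketch below performs that sum row by row and checks it against the closed form over a small grid of parameters; it assumes K ≥ 1, and the function names are illustrative.

```python
from itertools import product


def sink_tree_state_from_table(P, N, K, Y):
    """Sum the 'Total' column of Table 2.1 (assumes K >= 1)."""
    sources = P * (Y + 1) * N         # ASs 1..P: egress (YN, N)
    first_transit = 3 * N * P + N     # AS P+1: ingress (NP, NP), egress (NP, N)
    other_transit = (K - 1) * 4 * N   # ASs P+2..P+K: ingress (N, N), egress (N, N)
    destinations = N * (Y * P + 1)    # ASs P+K+1..P+K+N: ingress (1, YP)
    return sources + first_transit + other_transit + destinations


def sink_tree_state_closed_form(P, N, K, Y):
    """Eq. (2.3): S_1 = PN(2Y + 4) + 4NK - 2N."""
    return P * N * (2 * Y + 4) + 4 * N * K - 2 * N


if __name__ == "__main__":
    for P, N, K, Y in product(range(1, 6), repeat=4):
        assert sink_tree_state_from_table(P, N, K, Y) == sink_tree_state_closed_form(P, N, K, Y)
    print(sink_tree_state_closed_form(P=2, N=3, K=4, Y=5))  # 126
```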
2.2.2 The Shared-Segment Approach
In this approach, aggregation decisions are based on the existence of an AS path segment shared by an existing aggregate and a new reservation. In contrast, the sink-tree approach requires a shared segment that extends all the way to the destination AS. In the shared-segment approach, reservation requests can be assigned to any aggregate whose ending point is upstream of their destination AS. If no such aggregate exists, a new one is created, not necessarily extending all the way to the destination AS. The motivation for this flexibility is that shorter aggregates may more easily accommodate additional future requests. On the one hand, by aggregating reservation requests that share only a path segment, we expect to minimize the number of aggregates in use and consequently global state. On the other hand, this process can result in multiple deaggregation points, each contributing state, which may wipe out the advantage of reducing the number of aggregates. In the sink-tree approach, by contrast, individual requests require only one deaggregation point, at the destination AS that roots the sink-tree. The goal of this section is therefore to explore whether a proper selection of the shared-segment size can lead to better performance than that obtained with the sink-tree approach.
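The assignment rule of the shared-segment approach can be sketched as a prefix test on AS paths. This is a hypothetical illustration under several assumptions that the text does not specify: aggregates are represented as AS-path tuples starting at the aggregating AS, an aggregate ending exactly at the destination is also accepted, ties are broken by the longest shared segment, and the end-point of a new aggregate comes from a caller-supplied function `choose_endpoint` (a placeholder for any end-point rule).

```python
def assign_request(request_path, aggregates, choose_endpoint):
    """Map one request onto an aggregate under the shared-segment rule (sketch).

    request_path: tuple of ASs from the aggregating AS to the destination AS.
    aggregates: list of AS-path tuples, all starting at the aggregating AS.
    choose_endpoint: function returning the index (>= 1) of the AS on
        request_path where a new aggregate should end.
    Returns (aggregate, created).
    """
    # an aggregate qualifies if its path is a prefix of the request path,
    # i.e. it ends at the destination or at an AS upstream of it
    candidates = [a for a in aggregates
                  if len(a) <= len(request_path) and request_path[:len(a)] == a]
    if candidates:
        # assumption: prefer the candidate that shares the longest segment
        return max(candidates, key=len), False
    end = choose_endpoint(request_path)
    new_aggregate = tuple(request_path[: end + 1])
    aggregates.append(new_aggregate)
    return new_aggregate, True


if __name__ == "__main__":
    aggregates = [(1, 3, 5)]
    to_destination = lambda path: len(path) - 1   # sink-tree-like choice, for illustration
    print(assign_request((1, 3, 5, 6), aggregates, to_destination))  # ((1, 3, 5), False)
    print(assign_request((1, 4, 7), aggregates, to_destination))     # ((1, 4, 7), True)
```

The first request reuses the existing aggregate, which ends at AS 5, upstream of its destination AS 6; the second shares no segment with it and therefore triggers the creation of a new aggregate.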
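[Figure: each source AS (1 to P) bundles its YN requests into 1 aggregate; P aggregates enter AS P+1 and leave it as 1 aggregate, which crosses the K transit ASs and is deaggregated at AS P+K; each destination AS (P+K+1 to P+K+N) receives 1 aggregate carrying YP requests.]
Figure 2.9: Example of state accounting for the shared-segment approach.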
Fig. 2.9 provides an example of state accounting for the scenario illustrated in Fig. 2.7 when AS P + K is the chosen deaggregation point. Note that this is the obvious choice in this particular example, but more complex configurations may not yield such a clear choice. Tab. 2.2 shows the amount of state maintained in each AS when using the shared-segment approach, while (2.4) gives the global state, S_2.
Table 2.2: Global state for the shared-segment approach
AS | Ingress (IN, OUT) | Egress (IN, OUT) | Total
1 | - | YN, 1 | YN+1
(...) | - | YN, 1 | YN+1
P | - | YN, 1 | YN+1
P+1 | P, P | P, 1 | 3P+1
P+2 | 1, 1 | 1, 1 | 4
(...) | 1, 1 | 1, 1 | 4
P+K | 1, YPN | YPN, N | 2YPN+N+1
P+K+1 | 1, YP | - | YP+1
(...) | 1, YP | - | YP+1
P+K+N | 1, YP | - | YP+1
$$S_2 = 4YPN + 4P + 2N + 4K - 6 \tag{2.4}$$
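As with the sink-tree case, (2.4) is the sum of the Total column of Tab. 2.2. The sketch below reproduces it row by row. It assumes K ≥ 2, so that the aggregation AS P + 1 and the deaggregation AS P + K are distinct and there are K − 2 pass-through ASs between them; the function names are illustrative.

```python
from itertools import product


def shared_segment_state_from_table(P, N, K, Y):
    """Sum the 'Total' column of Table 2.2 (assumes K >= 2, so P+1 != P+K)."""
    sources = P * (Y * N + 1)               # ASs 1..P: egress (YN, 1)
    first_transit = 3 * P + 1               # AS P+1: ingress (P, P), egress (P, 1)
    middle = (K - 2) * 4                    # ASs P+2..P+K-1: ingress (1, 1), egress (1, 1)
    deaggregation = 2 * Y * P * N + N + 1   # AS P+K: ingress (1, YPN), egress (YPN, N)
    destinations = N * (Y * P + 1)          # ASs P+K+1..P+K+N: ingress (1, YP)
    return sources + first_transit + middle + deaggregation + destinations


def shared_segment_closed_form(P, N, K, Y):
    """Eq. (2.4): S_2 = 4YPN + 4P + 2N + 4K - 6."""
    return 4 * Y * P * N + 4 * P + 2 * N + 4 * K - 6


if __name__ == "__main__":
    for P, N, K, Y in product(range(1, 6), range(1, 6), range(2, 8), range(1, 6)):
        assert shared_segment_state_from_table(P, N, K, Y) == shared_segment_closed_form(P, N, K, Y)
    print(shared_segment_closed_form(P=2, N=3, K=4, Y=5))  # 144
```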
Comparing equations (2.3) and (2.4) while varying the number of sources, the number of destinations, and the value of K shows how the performance of the sink-tree and the shared-segment approaches changes. The number of individual requests per source AS, Y, has more impact on S_2 than on S_1: Y is multiplied by 4PN in (2.4) but only by 2PN in (2.3). This means that the shared-segment approach is likely to be more sensitive to the intensity of individual requests than the sink-tree approach. To understand possible variations, and also to explore how to choose an optimal deaggregation point, we introduce several algorithms based on the shared-segment approach and compare them with the sink-tree-based algorithm described above, namely, the Border Gateway Reservation Protocol (BGRP).
2.2.2.1 Weighted Deaggregation PointS (WDS)
As a first example of an algorithm based on the shared-segment approach, we present WDS, which uses information on the likelihood that a given AS will be a termination point for many future requests. Specifically, WDS assumes that ASs with a larger number of downstream neighbor ASs are more likely to be deaggregation points. This makes such ASs better candidates for the end-point of an aggregate, and this criterion is combined with the distance from the aggregation point when deciding how to create new aggregates. In other words, the aggregator computes a weight, W_m, for each AS m on the path of each request, based on the number of downstream AS neighbors of m and on the distance from the aggregator to m. It then chooses as deaggregation point the AS with the largest weight. Eq. (2.5) defines W_m, where n_j represents a downstream neighbor of m and d represents the distance from the origin AS to m, given in AS hops:
$$W_m = \sum_j n_j \cdot d, \quad \forall j, m \tag{2.5}$$
There are two special cases for the algorithm. The first occurs when two ASs yield the same weight; in this case, the algorithm chooses the AS nearest to the destination. The second occurs when the destination AS is a leaf, i.e., it has no downstream neighbors; in this case, the algorithm assumes that n_j = 1.
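The WDS selection rule, including both special cases, fits in a few lines. This sketch reads (2.5) as "number of downstream neighbors of m times its distance d from the aggregating AS" and assumes the aggregator already holds a table of downstream-neighbor counts (the hypothetical `downstream_degree` dictionary), which, as noted below, is information WDS must gather beforehand. The aggregating AS itself (d = 0) is not a candidate.

```python
def wds_weight(as_id, distance, downstream_degree, destination):
    """Eq. (2.5): number of downstream neighbours of the AS times its distance d."""
    n = downstream_degree.get(as_id, 0)
    if as_id == destination and n == 0:
        n = 1                      # leaf destination: the algorithm assumes n_j = 1
    return n * distance


def wds_deaggregation_point(path, downstream_degree):
    """Pick the AS with the largest weight; path[0] is the aggregating AS."""
    destination = path[-1]
    best_as, best_weight = None, -1
    for distance, as_id in enumerate(path[1:], start=1):
        weight = wds_weight(as_id, distance, downstream_degree, destination)
        if weight >= best_weight:  # '>=' keeps the AS nearer the destination on ties
            best_as, best_weight = as_id, weight
    return best_as, best_weight


if __name__ == "__main__":
    # hypothetical downstream-neighbour counts, as known by the aggregator
    degree = {3: 1, 4: 1, 5: 2, 6: 0}
    print(wds_deaggregation_point((1, 3, 4, 5, 6), degree))   # (5, 6)
```

Because the path is scanned from the aggregator towards the destination, using `>=` rather than `>` implements the tie-break in favor of the AS nearest the destination.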
Tab. 2.3 displays the amount of state kept per AS in the scenario of Fig. 2.7 when using WDS. In this scenario, AS P + K is always selected as the deaggregation point, since it yields the largest weight; the resulting state therefore matches that of Tab. 2.2. Total state for WDS, S_4, is given in (2.6).
$$S_4 = 4YPN + 4P + 2N + 4K - 6 \tag{2.6}$$
Table 2.3: WDS state per AS
AS | Ingress (IN, OUT) | Egress (IN, OUT) | Total
1 | - | YN, 1 | YN+1
(...) | - | YN, 1 | YN+1
P | - | YN, 1 | YN+1
P+1 | P, P | P, 1 | 3P+1
P+2 | 1, 1 | 1, 1 | 4
(...) | 1, 1 | 1, 1 | 4
P+K | 1, YPN | YPN, N | 2YPN+N+1
P+K+1 | 1, YP | - | YP+1
(...) | 1, YP | - | YP+1
P+K+N | 1, YP | - | YP+1
A possible drawback of WDS is that it has to know beforehand the number of downstream neighbors of each AS, and this information has to be gathered at each aggregator. Nevertheless, WDS has the advantage of flexibility in the choice of where to deaggregate.
2.2.2.2 Multi-Level Weighted Deaggregation PointS (MLWDS)
A first analysis of WDS suggests that this algorithm reduces the number of aggregates created when compared to algorithms based on the sink-tree approach. However, it introduces the cost of keeping state about the requests mapped to an aggregate at locations upstream of destination ASs. This cost might significantly increase global state, so the reduction in aggregates provided by a shared-segment approach might not be enough to achieve an optimal aggregation method. The problem therefore seems to lie in having to keep information about individual requests at intermediate deaggregation locations. One possible way to deal with this problem is multi-level aggregation: if we consider individual reservations to be at aggregation level 0, regular aggregates are at level 1. Similarly, level n reservations can be aggregated to form reservations at level n + 1. If an algorithm performs multi-level aggregation, intermediate locations do not need to keep information about individual requests, only about aggregates. The drawback of this approach is the processing cost of handling reservations of different aggregation levels at the first and last BR of a path. As a possible example of multi-level aggregation, we devise MLWDS, an algorithm that performs second-level aggregation at source ASs: in each source, it first creates an aggregate to the destination AS and then maps it onto a second-level aggregate created according to WDS rules, i.e., one whose end-point is an AS upstream of the request destination.

MLWDS behavior is illustrated in Fig. 2.10, where dashed lines represent first-level aggregates. In AS 1, there are two sources that want to establish reservations with destinations in ASs 6 and 7. MLWDS therefore creates A1 and A2, first-level aggregates that end in ASs 6 and 7, respectively. MLWDS then creates a second-level aggregate, A3, at AS 1, following WDS rules; the destination of A3 is AS 5. A3, represented by the pink (light grey) tunnel between ASs 1 and 5, is a second-level aggregate onto which A1 and A2 are mapped. Therefore, between ASs 1 and 5, BRs keep only state due to A3. AS 5 keeps information about A3 at ingress and about A1 and A2 at egress, since these aggregates end one AS hop ahead.
[Figure: MLWDS example over ASs 1, 3, 4, 5, 6, and 7: first-level aggregates A1 and A2 (dashed) end in ASs 6 and 7; second-level aggregate A3 runs from AS 1 to AS 5.]
Figure 2.10: MLWDS example.
Note that in this example, aggregates A1 and A2 are not reaggregated. However, in more complex scenarios, multi-level aggregation is likely to occur at several locations upstream of the request's destination AS, and not only at the source AS, as illustrated. The idea behind MLWDS is that a source AS is likely to have more than one request for the same destination AS. Since the shared-segment approach minimizes the number of aggregates created, MLWDS creates a second-level aggregate to avoid the cost of deaggregating down to the level of individual requests at intermediate locations. There is one exception to the default behavior of MLWDS: if, according to the aggregation rules of WDS, the best deaggregation location for an aggregate is the destination AS, then MLWDS does not perform second-level aggregation for that aggregate. Hence, MLWDS can be seen not so much as a recursive multi-level aggregation algorithm, but rather as a hybrid approach that may perform second-level aggregation to avoid the cost of keeping state due to individual reservations at intermediate aggregators. Considering again Fig. 2.7, Tab. 2.4 presents the state kept per AS when using MLWDS. Global state for MLWDS, S_5, is given in (2.7).
$$S_5 = 2YPN + 4P + 3PN + 2N + 4K - 6 \tag{2.7}$$
In Tab. 2.4, we can see that state due to individual requests is now kept only at sources and destinations. However, multi-level aggregation may add processing cost.
Table 2.4: MLWDS state per AS
AS | Ingress (IN, OUT) | Egress (IN, OUT) | Total
1 | - | YN, N+1 | YN+N+1
(...) | - | YN, N+1 | YN+N+1
P | - | YN, N+1 | YN+N+1
P+1 | P, P | P, 1 | 3P+1
P+2 | 1, 1 | 1, 1 | 4
(...) | 1, 1 | 1, 1 | 4
P+K | 1, PN | PN, N | 2PN+N+1
P+K+1 | 1, YP | - | YP+1
(...) | 1, YP | - | YP+1
P+K+N | 1, YP | - | YP+1
From the examples above, the intensity of requests seems to be a key factor in the performance of aggregation procedures: all of the algorithms experience considerable performance variation when the intensity of requests increases. Also, both MLWDS and the sink-tree approach appear to be less sensitive to configurations with a high intensity of requests (in (2.3) and (2.7), Y is multiplied by 2PN, against 4PN in (2.4)). This seems to indicate that intermediate deaggregation points in first-level aggregation result in a high cost of state along a path. To test these hypotheses, we next carry out simulations.
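The three closed forms can be compared numerically to see the sensitivity to Y discussed above. The sketch below also re-derives (2.7) from the Total column of Tab. 2.4 (assuming K ≥ 2). The model size P = N = 10, K = 4 is a hypothetical choice for illustration and is not taken from the dissertation's experiments.

```python
def s1_sink_tree(P, N, K, Y):
    return P * N * (2 * Y + 4) + 4 * N * K - 2 * N                 # Eq. (2.3)


def s2_shared_segment(P, N, K, Y):
    return 4 * Y * P * N + 4 * P + 2 * N + 4 * K - 6                # Eq. (2.4); WDS, Eq. (2.6)


def s5_mlwds(P, N, K, Y):
    return 2 * Y * P * N + 4 * P + 3 * P * N + 2 * N + 4 * K - 6    # Eq. (2.7)


def s5_from_table(P, N, K, Y):
    """Sum the 'Total' column of Table 2.4 (assumes K >= 2)."""
    sources = P * (Y * N + N + 1)        # ASs 1..P: egress (YN, N+1)
    first_transit = 3 * P + 1            # AS P+1: ingress (P, P), egress (P, 1)
    middle = (K - 2) * 4                 # ASs P+2..P+K-1
    deaggregation = 2 * P * N + N + 1    # AS P+K: ingress (1, PN), egress (PN, N)
    destinations = N * (Y * P + 1)       # ASs P+K+1..P+K+N: ingress (1, YP)
    return sources + first_transit + middle + deaggregation + destinations


if __name__ == "__main__":
    P, N, K = 10, 10, 4   # hypothetical model size
    for Y in (1, 10, 100):
        print(Y, s1_sink_tree(P, N, K, Y), s2_shared_segment(P, N, K, Y), s5_mlwds(P, N, K, Y))
```

With these values, the shared-segment state S_2 is the smallest at Y = 1 (470, against 740 for S_1 and 570 for S_5), whereas at Y = 10 the ordering becomes S_5 < S_1 < S_2 (2370, 2540, 4070), in line with the larger Y coefficient of (2.4).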
2.3 Evaluation of the Approaches
In the previous section, we showed how state varies with the location of both a BR and an AS, for the sink-tree and the shared-segment approaches. We used an AS-level dumbbell topology to highlight the impact that different factors have on the overall state of a network. The factors considered were the location of routers, the number of source and destination ASs, and the size of the path segments shared by different reservations. However, in heterogeneous networks such as the Internet, other factors might influence global state, such as the traffic distribution or the average duration of reservations. Hence, to understand the behavior of the algorithms under realistic scenarios, we carry out simulations on Internet-like topologies using the network simulator version 2 [51]. Because we are simulating reservations, we model the arrival of requests as a Poisson process², with exponential holding times. We create three types of reservations: short-lived reservation requests with an exponentially distributed duration of average 20 s, long-lived requests with an exponentially distributed duration of average 120 s, and a mix of 50% of each, as a particular example of mixed traffic. The chosen durations are merely representative: we aim only at understanding how the duration of requests affects the state required. The long-lived requests thus have an average holding time six times longer than the short-lived ones. To distribute the requests across topologies, and since there is no current information about how traffic is distributed in the Internet, we apply two different distribution methods: a “homogeneous” and a “hotspot” method. In the former, source ASs are chosen randomly, and destinations are chosen according to a real distribution of addresses [12, 47] per AS distance. In the latter, we use the concept of a hotspot, i.e., an AS with a higher incidence of traffic than the others.
Also, to make a consistent comparison of the algorithms, we keep the average number of requests per second in the system (the intensity of requests) constant while varying their average duration, according to the traffic intensity formula for an M/M/∞ model [36]. We use this model since, for the purpose of state accounting, blocking can be ignored. For each simulation, state accounting is done at both the AS and the BR level, by collecting statistics dynamically for incoming and outgoing reservations: minimum, average, and maximum values are updated each time the corresponding variable changes³.

² We chose a Poisson process to model the arrival of requests since it is known to describe user session arrivals well, as mentioned in [52, 50].

With the values obtained, the minimum, average, and maximum state per BR and per AS are computed with a 95% confidence level. Data are only considered after an adequate warm-up period (30% of the overall simulation period), to ensure that we obtain steady-state results. Also, to achieve statistically meaningful results, each experiment has been repeated several times with different random number seeds.

First, we devise a scenario where one hotspot AS is placed randomly in the AS-level topology generated with BRITE [4] and illustrated in Fig. 2.11. Because we want to assess the impact of having a high intensity of requests either entering or leaving an AS, we devise two specific cases of hotspots: a source and a destination hotspot AS.
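The constant-intensity rule can be made concrete with a small sketch. We assume here, as our reading of the text rather than the author's stated formula, that the intensity of requests is the M/M/∞ offered load ρ = λ·E[S], so that the arrival rate of each workload is λ = ρ/E[S]; this agrees with the later remark that more short-lived than long-lived requests must be generated to keep the intensity constant. The sketch also checks, with a toy event-driven simulation that discards a 30% warm-up period, that the time-averaged number of active reservations approaches ρ. The function names are illustrative and are not part of ns-2.

```python
import random


def arrival_rate(offered_load: float, mean_holding_s: float) -> float:
    """Poisson arrival rate that keeps the M/M/inf offered load constant.

    Offered load rho = lambda * E[S], hence lambda = rho / E[S].
    """
    return offered_load / mean_holding_s


def mean_active(lam: float, mean_holding_s: float, horizon_s: float,
                warmup_frac: float = 0.3, seed: int = 1) -> float:
    """Time-average number of active reservations after the warm-up period.

    Arrivals are Poisson with rate lam; holding times are exponential with
    mean mean_holding_s.
    """
    rng = random.Random(seed)
    events = []  # (time, +1 for an arrival, -1 for a departure)
    t = 0.0
    while True:
        t += rng.expovariate(lam)
        if t > horizon_s:
            break
        events.append((t, 1))
        events.append((t + rng.expovariate(1.0 / mean_holding_s), -1))
    events.sort()

    warmup = warmup_frac * horizon_s
    active, area, last = 0, 0.0, 0.0
    for time, delta in events:
        if time > horizon_s:
            break
        if time > warmup:
            # The number of active reservations was constant on [last, time].
            area += active * (time - max(last, warmup))
        active += delta
        last = time
    area += active * (horizon_s - max(last, warmup))
    return area / (horizon_s - warmup)


if __name__ == "__main__":
    rho = 5000.0
    # "mixed" assumes 50% of the requests in each class, so E[S] = 70 s.
    for label, mean_s in [("20 s", 20.0), ("mixed", 0.5 * 20 + 0.5 * 120),
                          ("120 s", 120.0)]:
        print(label, round(arrival_rate(rho, mean_s), 2), "requests/s")
    # Small-scale check: rho = 50 should yield about 50 active reservations.
    print(round(mean_active(arrival_rate(50, 20), 20, 20000), 1))
```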
Figure 2.11: Hotspot experiments’ topology (20 ASs, numbered 0 to 19).
2.3.1 Source Hotspot Scenario

To create a source hotspot, a random AS is chosen to be the source of 60% of the requests in the topology illustrated in Fig. 2.11. For the remaining requests, the source and destination ASs are chosen with the homogeneous traffic distribution method. This hotspot scenario is an example of a source-tree scenario. Tab. 2.5 shows the results obtained for an intensity of 5000 requests, an example of high intensity. The table details state per AS and per BR in terms of minimum, average, and maximum values: the first column represents the type of requests, according to their average duration; the second column shows the scope of the results, i.e., per AS or per router; and the third represents the type of variable. The first observation is that the duration influences the state: mixed reservations require the most state units, independently of the algorithm chosen. Short-lived reservations demand higher average and maximum state than long-lived ones. A hypothetical explanation for this behavior is, on the one hand, the way requests are generated and, on the other, the sensitivity of the algorithms to the path size of the first requests: to keep the intensity of requests constant, we generate more short-lived requests than either mixed or long-lived ones over the whole simulation duration. Due to the way sources and destinations are placed in the topology, a larger number of requests is more likely to produce more diverse path sizes. Hence, the first requests arriving at an AS are more likely to have different path sizes.

In terms of algorithm performance, MLWDS shows the best performance for all types of requests: even though MLWDS most likely creates more aggregates per source, it reduces the global average number of aggregates created, thereby taking advantage of shared path segments. WDS, which possibly creates fewer aggregates since it only performs first-level aggregation, has the drawback of using several intermediate deaggregation locations. To grasp the influence of the intensity of requests on state, we repeat this experiment while varying the intensity of requests. The state evolution for the different average holding times is plotted in Fig. 2.12, where each chart corresponds to a different algorithm. Comparing the three charts, BGRP and MLWDS present a similar state evolution when the intensity of requests changes, even though MLWDS requires less state. Hence, MLWDS performs better in scenarios with a high intensity of requests. WDS presents a linear performance decay as the intensity of requests increases, because it needs to keep state for individual requests at intermediate locations. From this experiment, we conclude that state is influenced not only by the duration of requests, but also by the number of intermediate deaggregation locations required, since at those locations state must be kept for the aggregates and for their mapped reservations.

³ Formulas presented in Appendix A.

Table 2.5: Source hotspot scenario state, 5000 requests (average ± 95% confidence interval)

σ      Scope   Var  BGRP             WDS               MLWDS            WDS/BGRP  MLWDS/BGRP
20 s   AS      Min  441.52 ±3.87     524.86 ±24.09     436.76 ±4.06     1.19      0.99
20 s   AS      Avg  571.51 ±1.15     684.04 ±25.85     566.16 ±0.81     1.20      0.99
20 s   AS      Max  704.26 ±1.62     859.15 ±28.82     697.95 ±1.53     1.22      0.99
20 s   Router  Min  55.19 ±0.48      65.61 ±3.01       54.60 ±0.51      1.19      0.99
20 s   Router  Avg  71.44 ±0.15      85.51 ±3.24       70.77 ±0.10      1.20      0.99
20 s   Router  Max  88.03 ±0.20      107.39 ±3.60      87.24 ±0.19      1.22      0.99
mixed  AS      Min  658.88 ±6.23     812.78 ±82.92     653.61 ±6.46     1.23      0.99
mixed  AS      Avg  781.42 ±3.10     966.67 ±94.83     776.25 ±3.13     1.24      0.99
mixed  AS      Max  903.52 ±1.95     1123.54 ±106.14   898.31 ±1.93     1.24      0.99
mixed  Router  Min  82.36 ±0.78      101.60 ±10.37     81.70 ±0.81      1.23      0.99
mixed  Router  Avg  97.68 ±0.39      120.83 ±11.85     97.03 ±0.39      1.24      0.99
mixed  Router  Max  112.94 ±0.24     140.44 ±13.26     112.29 ±0.24     1.24      0.99
120 s  AS      Min  460.87 ±2.29     539.01 ±20.77     456.30 ±2.34     1.17      0.99
120 s  AS      Avg  564.08 ±3.01     664.46 ±23.24     559.33 ±3.40     1.18      0.99
120 s  AS      Max  663.89 ±2.49     792.69 ±24.79     658.83 ±2.23     1.19      0.99
120 s  Router  Min  57.61 ±0.29      67.38 ±2.60       57.04 ±0.30      1.17      0.99
120 s  Router  Avg  70.51 ±0.38      83.06 ±2.91       69.92 ±0.43      1.18      0.99
120 s  Router  Max  82.99 ±0.31      99.09 ±3.10       82.35 ±0.27      1.19      0.99

σ: average request duration; mixed: 50% of requests with σ = 20 s and 50% with σ = 120 s.
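The “Avg / 95% CI” entries and the ratio columns of Tab. 2.5 can be reproduced from per-seed results with a sketch such as the one below. The source does not state the number of seeds or the interval estimator, so we assume a Student t interval over independent replications, with hard-coded two-sided t quantiles for up to 10 degrees of freedom and a normal fallback of 1.96 otherwise (slightly too narrow for 11 to 30 degrees of freedom). The per-seed values in the usage line are hypothetical; the ratio check uses the 20 s per-AS minimum values of Tab. 2.5.

```python
import math
import statistics

# Two-sided 95% Student t critical values t_{0.975, df} for df = 1..10.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def mean_ci95(samples: list[float]) -> tuple[float, float]:
    """Return (mean, half-width) of a 95% confidence interval.

    Each sample is the result of one independent replication (one seed).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two replications")
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)
    t = T_975.get(n - 1, 1.96)  # normal approximation for larger n (assumption)
    return mean, t * sem


def ratio(value: float, baseline: float) -> float:
    """Ratio column such as WDS/BGRP or MLWDS/BGRP, rounded as in the tables."""
    return round(value / baseline, 2)


if __name__ == "__main__":
    m, h = mean_ci95([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical per-seed values
    print(f"{m:.2f} ±{h:.2f}")
    print(ratio(524.86, 441.52), ratio(436.76, 441.52))
```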
However, we cannot yet draw conclusions about how the traffic distribution influences state. Hence, we next simulate a destination hotspot scenario, which sheds more light on the impact of traffic on state.

Figure 2.12: Source hotspot scenario state evolution. Panels (a) BGRP, (b) WDS, and (c) MLWDS plot the average state (state units) against the intensity of requests (100, 1000, and 5000 requests per second) for σ = 20 s, σ = 50% 20 s / 50% 120 s, and σ = 120 s.
2.3.2 Destination Hotspot Scenario
For this scenario, we again use the topology illustrated in Fig. 2.11. The difference from the previous scenario is that one AS now receives 60% of the reservation requests, while the starting and ending points of the remaining 40% are computed with the homogeneous traffic distribution. Tab. 2.6 presents the results for an intensity of 5000 requests. The first observation is that MLWDS achieves the best performance independently of the duration of requests: as in the previous scenario, the ratio MLWDS/BGRP remains at 0.99, showing that MLWDS improves on BGRP by 1%. The ratio WDS/BGRP, between 1.12 and 1.17, reflects the deterioration in performance that WDS experiences when compared to BGRP.
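The hotspot method can be sketched as follows. The 60% share comes from the text; everything else is an assumption. The hotspot is modeled as the destination by default (`role="source"` gives the source hotspot of the previous scenario), and because the real per-AS-distance address distribution [12, 47] is not reproduced here, the homogeneous method is approximated by drawing two distinct ASs uniformly at random. The function name `pick_endpoints` is hypothetical.

```python
import random
from typing import Optional


def pick_endpoints(num_as: int, hotspot: int, role: str = "destination",
                   share: float = 0.6,
                   rng: Optional[random.Random] = None) -> tuple[int, int]:
    """Return a (source AS, destination AS) pair for one reservation request.

    With probability `share` the hotspot AS is the destination (or the source,
    if role == "source"); otherwise a homogeneous pair is drawn. The uniform
    draw stands in for the real per-distance destination distribution.
    """
    rng = rng or random.Random()
    if rng.random() < share:
        other = rng.choice([a for a in range(num_as) if a != hotspot])
        if role == "destination":
            return other, hotspot
        return hotspot, other
    src, dst = rng.sample(range(num_as), 2)  # two distinct ASs
    return src, dst


if __name__ == "__main__":
    rng = random.Random(7)
    pairs = [pick_endpoints(20, hotspot=5, rng=rng) for _ in range(10000)]
    frac = sum(dst == 5 for _, dst in pairs) / len(pairs)
    print(f"fraction of requests destined to the hotspot: {frac:.3f}")
```

With 20 ASs, the expected fraction is slightly above 60%, because the homogeneous draw can also select the hotspot.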
Table 2.6: Destination hotspot scenario state, 5000 requests (average ± 95% confidence interval)

σ      Scope   Var  BGRP             WDS               MLWDS            WDS/BGRP  MLWDS/BGRP
20 s   AS      Min  443.29 ±5.22     497.27 ±22.42     437.86 ±5.12     1.12      0.99
20 s   AS      Avg  570.94 ±3.12     648.19 ±24.06     566.23 ±2.59     1.14      0.99
20 s   AS      Max  702.99 ±3.67     817.13 ±28.71     698.66 ±3.21     1.16      0.99
20 s   Router  Min  55.41 ±0.65      62.16 ±2.80       54.73 ±0.64      1.12      0.99
20 s   Router  Avg  71.37 ±0.39      81.02 ±3.00       70.78 ±0.32      1.14      0.99
20 s   Router  Max  87.87 ±0.46      102.14 ±3.59      87.33 ±0.40      1.16      0.99
mixed  AS      Min  651.51 ±7.65     750.82 ±36.86     646.98 ±7.38     1.15      0.99
mixed  AS      Avg  778.46 ±3.55     902.55 ±48.35     773.73 ±3.44     1.16      0.99
mixed  AS      Max  897.04 ±6.37     1051.24 ±58.06    892.51 ±6.38     1.17      0.99
mixed  Router  Min  81.44 ±0.96      93.85 ±4.61       80.87 ±0.92      1.15      0.99
mixed  Router  Avg  97.31 ±0.45      112.82 ±6.04      96.72 ±0.43      1.16      0.99
mixed  Router  Max  112.13 ±0.80     131.41 ±7.26      111.56 ±0.79     1.17      0.99
120 s  AS      Min  459.40 ±5.31     523.55 ±19.93     454.38 ±5.28     1.14      0.99
120 s  AS      Avg  564.98 ±3.01     649.22 ±21.84     559.87 ±3.16     1.15      0.99
120 s  AS      Max  665.53 ±2.67     775.39 ±25.23     660.69 ±2.87     1.17      0.99
120 s  Router  Min  57.43 ±0.67      65.44 ±2.49       56.80 ±0.66      1.14      0.99
120 s  Router  Avg  70.62 ±0.37      81.15 ±2.73       69.98 ±0.39      1.15      0.99
120 s  Router  Max  83.19 ±0.33      96.92 ±3.15       82.59 ±0.36      1.17      0.99

σ: average request duration; mixed: 50% of requests with σ = 20 s and 50% with σ = 120 s.
Figure 2.13: Destination hotspot scenario state evolution. Panels (a) BGRP, (b) WDS, and (c) MLWDS plot the average state (state units) against the intensity of requests (100, 1000, and 5000 requests per second) for σ = 20 s, σ = 50% 20 s / 50% 120 s, and σ = 120 s.
We repeat the simulation while varying the intensity of requests and plot the results in Fig. 2.13, where once again BGRP and MLWDS present a similar behavior. In comparison to the previous scenario, the average state of WDS decreases, while BGRP requires approximately the same state. For instance, when the intensity of requests is 5000 and traffic is mixed, both MLWDS and BGRP require approximately 800 state units in both the source and the destination hotspot scenarios. Also, for either the short-lived or the long-lived reservations, both MLWDS and BGRP require approximately 600 state units. In contrast, WDS requires more state units in the source hotspot (source-tree) scenario than in the destination hotspot (sink-tree) scenario. These results confirm that the shared-segment approach is more sensitive to the intensity of requests, due to the intermediate deaggregation locations. We can also conclude that having an AS with a higher incidence of requests impacts the performance of the algorithms. However, we cannot assess the effect of such an AS on the overall state without further evaluating the algorithms in scenarios without hotspots. Hence, we next devise a scenario where traffic is distributed in a more homogeneous manner: all ASs have the same probability of being a source AS.
2.3.3 Homogeneous Traffic

For this scenario, we use a larger AS-level topology, illustrated in Fig. 2.14, where traffic is distributed with the homogeneous method described above. The simulation results are presented in Tab. 2.7. A major difference from the previous scenarios is that the state kept per AS and per router is lower, because the current topology is larger than the one used in the previous experiments (cf. Fig. 2.11). Otherwise, the results are coherent with those of the previous experiments. In terms of reservation duration, the mixed type still requires the most state, independently of the algorithm. In terms of algorithms, MLWDS still achieves the best performance, except for the average and maximum state of short-lived requests, where the ratio MLWDS/BGRP rises to 1.03 and 1.07, respectively. WDS requires up to 21% more state than BGRP, and up to about 26% more than MLWDS. These results still hold when varying the intensity of requests, as shown in Fig. 2.15: BGRP and MLWDS show close performance, with MLWDS requiring the least global state. We next conclude this chapter.
Figure 2.14: Homogeneous traffic experiment topology (50 ASs, numbered 0 to 49).
Table 2.7: Homogeneous traffic scenario state, 5000 requests (average ± 95% confidence interval)

σ      Scope   Var  BGRP             WDS               MLWDS            WDS/BGRP  MLWDS/BGRP
20 s   AS      Min  242.10 ±0.66     267.87 ±1.22      237.84 ±1.51     1.11      0.98
20 s   AS      Avg  373.32 ±0.61     415.80 ±0.30      385.92 ±3.25     1.11      1.03
20 s   AS      Max  506.21 ±1.44     569.67 ±2.19      539.98 ±6.91     1.13      1.07
20 s   Router  Min  30.26 ±0.08      33.48 ±0.15       29.73 ±0.19      1.11      0.98
20 s   Router  Avg  46.67 ±0.08      51.98 ±0.04       48.24 ±0.41      1.11      1.03
20 s   Router  Max  63.28 ±0.18      71.21 ±0.28       67.50 ±0.87      1.13      1.07
mixed  AS      Min  350.00 ±2.26     422.21 ±3.39      334.54 ±2.73     1.21      0.96
mixed  AS      Avg  460.30 ±1.30     555.62 ±2.26      446.60 ±2.40     1.21      0.97
mixed  AS      Max  567.41 ±1.37     688.00 ±3.14      557.09 ±2.83     1.21      0.98
mixed  Router  Min  43.75 ±0.28      52.78 ±0.43       41.82 ±0.34      1.21      0.96
mixed  Router  Avg  57.54 ±0.16      69.45 ±0.28       55.83 ±0.30      1.21      0.97
mixed  Router  Max  70.93 ±0.17      86.00 ±0.39       69.64 ±0.36      1.21      0.98
120 s  AS      Min  266.31 ±1.89     296.24 ±3.15      251.63 ±1.50     1.11      0.94
120 s  AS      Avg  363.54 ±1.33     408.54 ±2.04      351.22 ±1.39     1.12      0.97
120 s  AS      Max  459.80 ±2.05     522.27 ±3.01      451.28 ±2.27     1.14      0.98
120 s  Router  Min  33.29 ±0.24      37.03 ±0.39       31.45 ±0.18      1.11      0.94
120 s  Router  Avg  45.44 ±0.16      51.07 ±0.26       43.90 ±0.17      1.12      0.97
120 s  Router  Max  57.48 ±0.26      65.28 ±0.37       56.41 ±0.28      1.14      0.98

σ: average request duration; mixed: 50% of requests with σ = 20 s and 50% with σ = 120 s.
Figure 2.15: Homogeneous traffic scenario state evolution. Panels (a) BGRP, (b) WDS, and (c) MLWDS plot the average state (state units) against the intensity of requests (100, 1000, 5000, and 10000 requests per second) for σ = 20 s, σ = 50% 20 s / 50% 120 s, and σ = 120 s.
2.4 Chapter Closure

In this chapter, we introduced terminology and concepts related to inter-domain control aggregation. We detailed the behavior of two aggregation approaches and of three algorithms derived from them: BGRP follows the sink-tree approach, while WDS and MLWDS follow the shared-segment approach. Our goal was to gain greater insight into the ability of different inter-domain aggregation procedures to accommodate large volumes of reservation requests across different routing domains. As the utility function, we considered the reduction of state achieved per AS and per BR. We first examined state accounting by means of a simple analytical example, to better identify the strong points and drawbacks of each approach. The analytical example showed that the intensity of individual reservation requests is of major importance for all of the approaches. However, it is most relevant for the shared-segment approach with first-level aggregation, because individual reservation information has to be kept at intermediate deaggregation points: with first-level aggregation, these points need to maintain information not only about aggregates, but also about the individual requests mapped to the deaggregated aggregates. This cost can be avoided by performing second-level aggregation instead, as MLWDS does. MLWDS achieves the best performance in almost all of the experiments, closely followed by BGRP, which is also fairly insensitive to the intensity of requests. In contrast, the performance of WDS decays significantly as the intensity of requests increases. The shared-segment approach is, hence, more sensitive to the intensity of requests than the sink-tree approach, because it requires mapping reservations to their aggregates at each deaggregation location.
In contrast, the sink-tree approach only keeps the mapping between reservations and their aggregates at the first egress BR and the last ingress BR of an AS path. Nevertheless, the shared-segment approach reduces the number of aggregates when compared to the sink-tree approach. MLWDS avoids the drawback of the shared-segment approach because it does not deaggregate down to the individual reservation level at intermediate deaggregation locations. However, even though it only requires second-level aggregation to avoid this drawback, MLWDS brings additional complexity to source ASs. Depending on how reservations are populated across a topology, this may become a major drawback. We have shown that both aggregation approaches achieve state scalability, and that the shared-segment approach offers more flexibility. Given this, the solution to the sensitivity of the shared-segment approach might lie not in choosing where to deaggregate, but in what information to keep at intermediate deaggregation locations. Because this hypothesis requires further analysis, not only in terms of the algorithm but also in terms of implementation, in the next chapter we introduce a protocol based on the shared-segment approach that uses an enhanced version of WDS, and compare its performance against a protocol based on the sink-tree approach that uses BGRP as its algorithm. We chose WDS rather than MLWDS because, even though both algorithms are representative of the shared-segment approach, WDS appears to be less complex, and we believe that it can overcome the shared-segment drawbacks pinpointed in this chapter.
3 SICAP

In the previous chapter, we addressed QoS control aggregation issues by analysing different procedures based on two specific aggregation approaches, the sink-tree and the shared-segment approach. In this chapter, we describe a novel protocol, the Shared-segment Inter-domain Control Aggregation Protocol (SICAP) [40], which uses the shared-segment approach to aggregate reservations and thus decrease the state required at BRs. As a baseline for assessing SICAP, we use BGRP, an example of a reservation establishment protocol based on the sink-tree aggregation approach. By comparing the performance of these two protocols, we describe and analyse design issues that derive from the use of the two aggregation approaches. We start by describing related work, and then present a detailed description of BGRP and SICAP.
3.1 Related Work

Guérin et al. [35] present a survey of possible approaches to aggregate RSVP requests, assuming unicast scenarios and covering issues such as RSVP state management and path characterization. The survey proposes the use of aggregation tunnels, i.e., pipes between the entry and exit points of an AS. A similar approach is followed by Berson et al. [42], who consider unicast and multicast scenarios, also focusing on RSVP aggregation within an aggregation region. These two approaches are concerned with RSVP scalability: RSVP requires all the routers on the path of an individual reservation to maintain state dedicated to that reservation. The resulting state information can be overwhelming, especially for backbone routers that may have to support a large number of simultaneous requests. The two proposals reduce state by aggregating individual requests inside an AS, but neither considers the problem of inter-domain control aggregation. Schèlen et al. [28] introduce a resource reservation architecture that performs shared-media resource reservation taking future reservations into consideration, with the purpose of providing reservation support for real-time events scheduled in advance. This architecture relies on the placement of an agent per routing domain. Each agent is an end system that obtains topology information from internal routers, i.e., it contains a map of the topology of each domain. The agent performs parameter-based admission control, thus avoiding the exchange of signaling messages between routers and agents. If the agent receives a reservation request whose starting point is not within its domain, it forwards the request to another agent, placed nearer to that starting point. Besides being in charge of admission control, agents can perform aggregation based on the destination of requests, i.e., sink-tree aggregation. Their solution is centralized, relies on agents that have knowledge about the whole topology, and focuses on advance reservations. Pan et al. [30] introduce BGRP, which also performs sink-tree aggregation. For each reservation, BGRP sends a pair of control messages along the path that an aggregate will follow to reach its destination according to BGP rules, i.e., a sink-tree. Hence, deaggregation takes place only at the destination AS. Pan et al. show that BGRP performs well when compared with RSVP without aggregation. However, BGRP was not compared to other possible aggregation methods, so assessing its effectiveness as an inter-domain solution remains an open issue. We next present BGRP in further detail.
3.2 BGRP

BGRP is an inter-domain control aggregation protocol that is sender-initiated, in the sense that the first BR on the path (the first-aggregator) triggers reservation requests. BGRP merges reservations going to the same destination AS. Hence, it creates aggregates in the shape of sink-trees whose roots are the reservation destination ASs. In practice, the root of each sink-tree is identified by a combination of the last AS and last-deaggregator identifiers, since each AS usually has more than one ingress point. To manage reservations, BGRP exchanges messages reliably, i.e., over Transmission Control Protocol (TCP) [23] connections, similarly to the reliable messaging of BGP. BGRP uses five message types: PROBE, GRAFT, TEAR, ERROR and REFRESH. As their name suggests, PROBE messages are used to probe resources on the path from a first-aggregator to a last-deaggregator, according to agreements established between ASs, i.e., according to BGP rules and to inter-AS QoS agreements. The PROBE message contains information concerning a specific reservation (resources, QoS requisites, destination) and is additionally used to gather the identifiers of the crossed BRs, i.e., it contains a route record. Each identifier is composed of an IP address and an AS number. PROBE messages do not require any state along the path, since their sole purpose is to probe resources. The PROBE is forwarded to the next AS according to the BGP next-hop attribute. When it reaches a last-deaggregator on a path, i.e., the root of a sink-tree, that BR triggers a GRAFT message that updates the resources of the corresponding tree along the path previously collected by the PROBE message. This scheme avoids the consequences of asymmetric paths, i.e., the path from AS B to AS A differing from the path from AS A to AS B: probing the path before allocating resources ensures that the reservation is established on the reverse of the path probed earlier from the sender to the destination AS.

Therefore, BGRP uses a two-phase reservation establishment mechanism. In the first phase (probing), the path is probed with a PROBE message sent from the first-aggregator to the last-deaggregator. In the second phase (allocation), the last-deaggregator uses the information gathered by the PROBE to update the resources of the tree into which it will merge the reservation, i.e., the tree whose root is the last AS. It then sends a GRAFT message that allocates the necessary resources along the path traversed by the earlier PROBE message. Along the previously probed path, each intermediate BR keeps state information related to each tree. This information is composed of a label that uniquely identifies the tree (a combination of the last-deaggregator IP address and the destination AS), the reservation destination network prefix, the last-deaggregator identifier, and the share of resources reserved for the tree. Additionally, it may include information about a traffic class. The sink-trees created by BGRP have the soft-state property, i.e., their information is periodically refreshed between neighboring BRs with REFRESH messages: if a reservation has not been refreshed within a defined interval, the corresponding BR removes it. A REFRESH message carries a list of all active reservations of its source BR, and the carried state information is compressed, similarly to what has been proposed as an RSVP enhancement [25].
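The soft-state behavior described above can be sketched as a per-BR table keyed by the tree label (last-deaggregator address, destination AS). The sketch is illustrative only: the class and method names, the refresh interval, and the use of explicit timestamps are assumptions, not BGRP's specified data structures. A REFRESH from a neighbor is modeled as a list of tree labels that are still active; entries not refreshed within the interval are purged, and an optional TEAR removes an entry explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class TreeState:
    """State an intermediate BR keeps per sink-tree (fields from the text)."""
    dest_prefix: str
    last_deaggregator: str
    bandwidth: float
    last_refresh: float


@dataclass
class SoftStateTable:
    refresh_timeout: float  # hypothetical refresh interval, in seconds
    # (last-deaggregator IP, destination AS) -> TreeState
    trees: dict = field(default_factory=dict)

    def graft(self, label, dest_prefix, bandwidth, now):
        """Create the tree entry or add bandwidth to it (GRAFT processing)."""
        entry = self.trees.get(label)
        if entry is None:
            self.trees[label] = TreeState(dest_prefix, label[0], bandwidth, now)
        else:
            entry.bandwidth += bandwidth
            entry.last_refresh = now

    def refresh(self, labels, now):
        """One REFRESH message lists all active trees of the neighbor BR."""
        for label in labels:
            if label in self.trees:
                self.trees[label].last_refresh = now

    def tear(self, label):
        """Explicit removal with an (optional) TEAR message."""
        self.trees.pop(label, None)

    def expire(self, now):
        """Remove trees not refreshed within the timeout; return their labels."""
        stale = [label for label, state in self.trees.items()
                 if now - state.last_refresh > self.refresh_timeout]
        for label in stale:
            del self.trees[label]
        return stale
```

For example, with a 90 s timeout, a tree grafted at t = 10 s and never refreshed is purged by `expire(now=120.0)`, while a tree refreshed at t = 60 s survives.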
BGRP also uses optional TEAR messages, which routers can send to explicitly remove reservations. To better show how BGRP works, we illustrate its behavior with the scenario of Fig. 3.1, where R1 represents a reservation request that requires 5 bandwidth units from an end-host in AS 1 to an end-host in AS 5. When router S1 receives R1, it sends PROBE(1), which contains the request identifier R1, the source identifier S1, the identifier of the R1 destination, the bandwidth requirement b_{S1,E1} = 5, where {i, j} represents the link between BRs i and j, and an empty route record. PROBE(1) goes through E1, E3, E4 and E5, each of which inserts its identifier in the route record. PROBE(1) stops in case of error, or when it reaches the last-deaggregator, D1. If it fails to reach D1, the router where the failure occurred sends an ERROR message back to S1, and no further processing is required since, as explained, PROBE messages do not require any state at intermediate BRs. If PROBE(1) reaches D1, this BR replies with GRAFT(1), which contains the same information as PROBE(1), along with a label that uniquely identifies the sink-tree whose root is AS 5/D1. GRAFT(1) establishes that sink-tree along E5, E4, E3 and E1, reserving 5 bandwidth units for it on each link.

Figure 3.1: BGRP example: reservation R1 goes from AS 1 to AS 5, and asks for five bandwidth units. (PROBE(R1, b=5) travels from S1 to D1, collecting the route record [E1, E3, E4, E5]; GRAFT(R1, b=5) returns along that path.)

We next assume, as illustrated in Fig. 3.2, that a request R2 requiring 1 bandwidth unit starts in AS 2 and is also destined to an end-host in AS 5. S2 sends PROBE(2), containing the identifier R2 and the bandwidth b_{S2,E2} = 1. When this message arrives at D1, the router replies with GRAFT(2), which increments the bandwidth previously allocated to tree A by b_{S2,E2} units on the segment from D1 back to E3. Hence, from E3 to D1, each BR increments the bandwidth of tree A by 1 unit. Between E3 and S2, GRAFT(2) triggers the creation of a new branch of A, allocating b_{S2,E2} = 1 bandwidth unit to it. With this example, we showed how BGRP can merge reservations into an existing sink-tree.
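The two-phase exchange of this example can be sketched in a few lines. This is a simplified model under stated assumptions, not BGRP's implementation: messages are plain function calls, admission control and ERROR handling are omitted, state is a per-link, per-tree bandwidth counter, and the route of R2 (S2, E2, then joining tree A at E3) is our reading of Fig. 3.2 and of the link {S2, E2} named in the text.

```python
from collections import defaultdict

# Reserved bandwidth per link and per sink-tree:
# link_state[(upstream BR, downstream BR)][tree label] -> bandwidth units
link_state = defaultdict(lambda: defaultdict(float))


def probe(path: list[str], bandwidth: float) -> dict:
    """Phase 1: stateless PROBE from the first-aggregator (path[0]) to the
    last-deaggregator (path[-1]). Every intermediate BR appends its identifier
    to the route record; no state is installed along the way."""
    route_record = []
    for br in path[1:-1]:
        route_record.append(br)
    return {"first_aggregator": path[0], "last_deaggregator": path[-1],
            "bandwidth": bandwidth, "route_record": route_record}


def graft(probe_msg: dict, dest_as: int) -> tuple:
    """Phase 2: the last-deaggregator answers with a GRAFT that walks the
    recorded path backwards, merging the reservation into the sink-tree
    labelled (last-deaggregator, destination AS)."""
    label = (probe_msg["last_deaggregator"], dest_as)
    hops = ([probe_msg["first_aggregator"]] + probe_msg["route_record"]
            + [probe_msg["last_deaggregator"]])
    for upstream, downstream in reversed(list(zip(hops, hops[1:]))):
        link_state[(upstream, downstream)][label] += probe_msg["bandwidth"]
    return label


if __name__ == "__main__":
    tree_a = graft(probe(["S1", "E1", "E3", "E4", "E5", "D1"], 5), dest_as=5)  # R1
    graft(probe(["S2", "E2", "E3", "E4", "E5", "D1"], 1), dest_as=5)           # R2
    print(link_state[("E3", "E4")][tree_a])  # shared segment carries 5 + 1
    print(link_state[("E2", "E3")][tree_a])  # new branch of tree A
```

Keying the counters by tree label rather than by request is what lets R2 add to tree A instead of creating a separate entry on the shared segment.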
Figure 3.2: BGRP example: reservation R2 goes from AS2 to AS5 and asks for one bandwidth unit. (R2 joins sink-tree A.)
Let us now suppose, as illustrated in Fig. 3.3, that a request R3 requiring three bandwidth units, again starting in AS 2, is destined to AS 6. When PROBE(3) reaches D2, the last-deaggregator on the path of R3, this router triggers the creation of a new sink-tree, B, which extends all the way to S2 and is independent of tree A even over their common segments, since the two trees have different destination ASs. D2 sends GRAFT(3), which allocates three bandwidth units for the new sink-tree B on each link of its path. This situation is an example of how BGRP creates different sink-trees.

Figure 3.3: BGRP example: reservation R3 goes from AS2 to AS6 and asks for three bandwidth units. (R3 creates sink-tree B, rooted at D2 in AS 6, alongside sink-tree A.)

BGRP introduces several improvements when compared to RSVP without aggregation:
• Stateless probing. As explained, BGRP performs stateless probing by having a PROBE message collect information about path resources. In contrast, RSVP uses a PATH message that requires state at intermediate routers, so that a RESV can be sent back to the sender on the same path: routers have to keep information about senders and receivers, which has a huge impact on the scalability of a solution. In a worst-case scenario with a network of N nodes, RSVP would require N² entries, while BGRP would require at most N entries.
• Type of aggregation. RSVP can combine reservations in different ways: (1) several multicast receivers may have reservations for the same sender, resulting in an aggregate whose size is approximately the maximum of the sizes of the individual reservations; (2) several (multicast) senders may share a single reservation. In contrast, the size of an aggregate in BGRP is, at each instant, exactly the sum of its merged reservations.
• Bundled refresh. With standard RSVP, state increases proportionally with the number of sessions. RSVP reservations are soft-state and are therefore periodically refreshed by PATH or RESV messages on a per-reservation basis, which results in scalability issues. Several enhancements have already been proposed to reduce the need for refreshes in RSVP [25]. In contrast, BGRP performs refreshes using a single REFRESH message per aggregate, and not per individual reservation, which drastically reduces the refresh overhead.
• Reliable messaging. RSVP relies on the exchange of session information for robustness, but its messages are delivered with no guarantees. In contrast, BGRP messages are reliably delivered.

Pan et al. show that BGRP scales well in terms of control state, message processing, and bandwidth efficiency when compared to RSVP without aggregation. However, partly because theirs was the first work to explore inter-domain control aggregation in detail, they did not provide a comparison with other aggregation protocols.
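The worst-case state comparison in the first bullet can be illustrated by counting the entries at a single router that every flow crosses. The sketch assumes one RSVP entry per (sender, receiver) pair and one BGRP entry per destination, i.e., per sink-tree root; it ignores the fact that a sink-tree is also identified by its last-deaggregator, which is a simplifying assumption. With all N(N − 1) ordered pairs of N nodes, RSVP needs N(N − 1) entries, on the order of N², against N for BGRP.

```python
from itertools import permutations


def state_entries(flows):
    """Count the entries a router needs for the given (source, destination) flows.

    RSVP without aggregation: one entry per sender/receiver pair.
    BGRP: one entry per sink-tree, i.e. per destination.
    """
    flows = list(flows)  # accept any iterable, including one-shot iterators
    rsvp_entries = len(set(flows))
    bgrp_entries = len({dst for _, dst in flows})
    return rsvp_entries, bgrp_entries


if __name__ == "__main__":
    for n in (10, 100, 500):
        # Worst case: every node has a reservation towards every other node.
        print(n, state_entries(permutations(range(n), 2)))
```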
In light of this open research issue, we next present SICAP, a novel inter-domain control aggregation protocol that performs shared-segment aggregation.
3.3 SICAP Design and Operation

SICAP, like BGRP, is sender-initiated and uses a two-phase mechanism to establish reservations. Before describing the design of SICAP, we first explain how it chooses intermediate deaggregation locations.
3.3.1 Choosing Where to Deaggregate

SICAP uses an enhanced version of the WDS algorithm, named Weighted Deaggregation PointS Enhanced (WDS-E) [40], to decide how to aggregate. WDS-E still assumes that ASs with a large number of downstream neighbor ASs are more suitable to be aggregate endpoints, since those ASs are more likely to experience higher intensities of requests. However, the weight it computes requires less information than the weight computed by WDS: for each AS m of a path, WDS-E computes a weight W_m equal to the number n_m of next-hop ASs¹ of m:
$$W_m = n_m, \quad \forall m \qquad (3.1)$$
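Equation (3.1) only needs, for each AS m, the set of distinct ASs that follow m on the AS paths known to a BR. A minimal sketch follows, assuming the AS paths have already been extracted (for instance from the output of show ip bgp paths, whose parsing is not shown) as space-separated AS numbers listed from the nearest AS towards the origin AS; consecutive repetitions caused by AS-path prepending are collapsed. The function name `wdse_weights` is hypothetical.

```python
def wdse_weights(as_paths: list[str]) -> dict[str, int]:
    """Compute W_m = n_m, the number of distinct next-hop ASs of each AS m.

    Each path lists AS numbers in forwarding order, e.g. "65001 65002 65003";
    the AS that follows m on a path is treated as a next hop of m.
    """
    next_hops: dict[str, set[str]] = {}
    for path in as_paths:
        hops: list[str] = []
        for asn in path.split():
            if not hops or hops[-1] != asn:  # collapse AS-path prepending
                hops.append(asn)
        for i, m in enumerate(hops):
            successors = next_hops.setdefault(m, set())  # last AS gets W = 0
            if i + 1 < len(hops):
                successors.add(hops[i + 1])
    return {m: len(s) for m, s in next_hops.items()}


if __name__ == "__main__":
    paths = ["1 2 3", "1 2 4", "1 5", "1 1 1 6"]
    print(wdse_weights(paths))
```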
Deciding where to deaggregate is a complex decision that depends mostly on the relationship between neighboring ASs and on the way traffic is exchanged between them. In a BR, we can parse the information provided by BGP to infer inter-AS relationships. We consider an AS with a large number of next-hop ASs a likely candidate to be the endpoint of an aggregate, since traffic crossing that AS will most likely be spread over different paths. The WDS-E algorithm is not presented here as the optimal (or unique) solution for deciding how to aggregate. Instead, WDS-E is presented as a simple algorithm based on the shared-segment approach that does not require much information, and yet allows the shared-segment approach to possibly achieve better performance than the sink-tree approach. Checking whether the rules followed by WDS-E are indeed the best ones requires thorough research on the subject, which we address as future work in chapter 5. Fig. 3.4, where each circle represents an AS, exemplifies how WDS-E works. When the last-deaggregator at the destination AS receives request R1, it uses the weights of the ASs on the path, carried by the PROBE message, to choose the first Intermediate Deaggregation Location (IDL). As shown in Fig. 3.4 (a), the AS yielding the heaviest weight is D1, which becomes the first IDL. To increase the probability that requests coming from different source ASs will use aggregates already established, the process is repeated recursively between each IDL and the destination AS. Fig. 3.4 (b) shows the second and final iteration, for the segment between D1 and the destination. Therefore, in the given example, WDS-E triggers the creation of three different aggregates: the first extends from the source AS to D1, the second from D1 to D2, and the third from D2 to the destination AS.

¹ The number of next-hop ASs can be computed by parsing data obtained from the BGP paths of a router; e.g., in Cisco routers, this information can be obtained with the command show ip bgp paths.
[Figure 3.4: WDS-E example. (a) First IDL (D1); (b) Second IDL (D2).]
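The recursive IDL selection can be sketched as follows. The text does not specify a stopping rule. The sketch therefore assumes, hypothetically, that recursion stops when no AS remains strictly between the last IDL and the destination, or when none of those ASs has a weight of at least `min_weight`. Ties go to the AS closest to the source, which is also an assumption. Indices are positions on the path, with the source AS at 0. `wds_e_segments` is a hypothetical name.

```python
def wds_e_segments(weights, min_weight=2):
    """Split a path into aggregates using WDS-E's heaviest-AS rule.

    weights    -- W_m of each AS, from the source AS (index 0) to the
                  destination AS (last index)
    min_weight -- hypothetical stopping threshold: stop once no remaining
                  intermediate AS has at least this many next-hop ASs
    Returns (start, end) index pairs, one per aggregate.
    """
    dest = len(weights) - 1
    idls, start = [], 0
    while dest - start > 1:  # some AS lies strictly between start and dest
        # Heaviest intermediate AS; ties go to the AS closest to the source.
        best = max(range(start + 1, dest), key=lambda i: (weights[i], -i))
        if weights[best] < min_weight:
            break
        idls.append(best)
        start = best  # repeat between this IDL and the destination
    bounds = [0] + idls + [dest]
    return list(zip(bounds, bounds[1:]))


print(wds_e_segments([1, 2, 6, 1, 3, 1, 0]))  # [(0, 2), (2, 4), (4, 6)]
```

As in the scenario of Fig. 3.4, the example call yields three aggregates: one from the source to the first IDL, one between the two IDLs, and one from the second IDL to the destination.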
The weights of the ASs on the path, which are used to choose IDLs, are carried in SICAP messages. We present these messages next, together with several examples of messaging sequences.

3.3.2 Messages

To manage reservations, SICAP uses five different message types. They share a common header, presented in Fig. 3.5, which contains the following information:
• Version (4 bits). SICAP's version (currently 1).
• Type (4 bits). The message type (REQ, RESV, TEAR, ERROR or REFRESH).
• Message size (16 bits). The total message size.
• Bandwidth (16 bits). The requested bandwidth, in Kb/s.
• Request ID (16 bits). A request identifier, generated by the first-aggregator.
• Source BR (128 bits). The IP (IPv4 or IPv6) address of the first-aggregator, i.e., the message source. For IPv4, the 32-bit address occupies the least significant 32 bits (network order).
• Destination IP (128 bits). The IP (IPv4 or IPv6) destination address. For IPv4, the 32-bit address occupies the least significant 32 bits (network order).
[Figure 3.5: SICAP's common header format. Fields: Version, Type, Message Size, Bandwidth, Request ID, Source BR (first-aggregator), Destination IP.]
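The field list above can be turned into a byte layout with Python's standard `struct` module. This is a sketch under stated assumptions. The numeric type codes are hypothetical, since the text names the five types but not their values. The fields are packed back to back with no padding, giving 39 bytes, although the drawing in Fig. 3.5 may imply a padded layout. IPv4 addresses are right-aligned in the 128-bit fields, as the text specifies. The function names are illustrative.

```python
import ipaddress
import struct

# Hypothetical numeric codes: the text names the five types but not their values.
MSG_TYPES = {"REQ": 1, "RESV": 2, "TEAR": 3, "ERROR": 4, "REFRESH": 5}

# The Version and Type nibbles share one byte. Message size, Bandwidth and
# Request ID follow (16 bits each), then two 128-bit addresses. No padding assumed.
HEADER = struct.Struct("!BHHH16s16s")  # 39 bytes, network byte order


def addr128(addr):
    """Encode an IPv4/IPv6 address in 128 bits; IPv4 uses the low-order 32 bits."""
    return ipaddress.ip_address(addr).packed.rjust(16, b"\x00")


def pack_header(msg_type, size, bandwidth, request_id, source_br, dest_ip, version=1):
    first_byte = ((version & 0xF) << 4) | (MSG_TYPES[msg_type] & 0xF)
    return HEADER.pack(first_byte, size, bandwidth, request_id,
                       addr128(source_br), addr128(dest_ip))


def unpack_header(data):
    first, size, bw, rid, src, dst = HEADER.unpack(data[:HEADER.size])
    return {"version": first >> 4, "type": first & 0xF, "size": size,
            "bandwidth": bw, "request_id": rid, "source": src, "destination": dst}


hdr = pack_header("REQ", HEADER.size, 5000, 7, "192.0.2.1", "2001:db8::1")
print(len(hdr), unpack_header(hdr)["type"])  # 39 1
```

The 16-bit Bandwidth and Request ID fields cap the values at 65535, which `struct` enforces by raising `struct.error` on overflow.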
Additionally, each message might carry other information, according to its type, as described next.
3.3.2.1 REQ
REQ messages have the format illustrated in Fig. 3.6 (a), and are sent by first-aggregators to probe network resources. Along the path, each BR adds its identifier to the REQ message. Thus, when a REQ reaches a destination AS, it carries the route record, which contains the identifiers of the crossed BRs and ASs, together with the required resources. The route record is represented in Fig. 3.6 (b). It has a variable size, since its size depends on the number of routers crossed. According to current Internet statistics [12], the average path length is five ASs and the maximum is eleven ASs. Therefore, the route record will hold seven entries on average and nineteen at most. This follows because the first and last ASs contribute only one BR each, transit ASs contribute two (ingress and egress), and the first BR identifier is already included in the common header.
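[Figure 3.6: REQ message. (a) REQ format: common header followed by the route record. (b) Route record: one entry per crossed BR, holding the BR IP address (BR1 IP, BR2 IP, ...), the AS number (ASN 1, ASN 2, ...) and the AS weight (W1, W2, ...).]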
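The entry counts quoted above follow from a short calculation. A path of k ASs crosses 1 + 2(k − 2) + 1 = 2k − 2 BRs. One of these, the first-aggregator, already appears in the common header, which leaves 2k − 3 route-record entries. The helper below is a hypothetical illustration of that arithmetic.

```python
def route_record_entries(k):
    """Route-record entries for a path of k ASs, counted as in the text."""
    if k < 2:
        raise ValueError("a path needs at least a source and a destination AS")
    border_routers = 1 + 2 * (k - 2) + 1  # edge ASs: one BR each; transit ASs: two
    return border_routers - 1  # the first BR is already in the common header


for k in (5, 11):
    print(k, route_record_entries(k))  # prints "5 7", then "11 19"
```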
3.3.2.2 RESV
RESV messages are sent upstream by the last-deaggregator of a path as a reply to a received REQ message, and are used to allocate the required resources. A RESV contains the information of the corresponding REQ and an aggregate label (32 bits) that uniquely identifies the aggregate into which the reservation will be merged, as illustrated in Fig. 3.7. The aggregate label field is reset at intermediate aggregators and set again at the corresponding intermediate deaggregators.

[Figure 3.7: RESV format. Fields: common header, aggregate label, route record.]
3.3.2.3 ERROR
ERROR messages are used in case of reservation failure. If a reservation is rejected, an ERROR message of sub-type REJ is sent upstream to notify the first-aggregator of the rejection. A reservation may also fail not because of a lack of resources or a link failure, but because the corresponding aggregate state was deleted. In that case, a generic ERROR message is sent downstream to notify the next router on the path that the reservation should be retried. The ERROR message contains only the common header, and is forwarded according to the information of the BGP next-hop attribute.

3.3.2.4 TEAR
TEAR messages are triggered by the source of a reservation, and their purpose is to delete the reservation along its path. A TEAR message contains only the common header, and is also forwarded according to the information of the BGP next-hop attribute.

3.3.2.5 REFRESH
REFRESH messages update the information regarding reservations along a path. They are exchanged periodically, every Tr seconds, between neighboring BRs. As in BGRP, refresh in SICAP is bundled: a single REFRESH message carries all the state information held by the router. By default, Tr is set to 30 s, the default value of the BGP KeepAlive timer. If a router does not receive a refresh for a specific reservation within 90 s, the default value of the BGP HoldTime timer, it deletes that reservation state. The following examples show how the five message types are used.
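The REFRESH soft-state rule described above can be sketched as a small table keyed by reservation identifier. Bundled refreshes arrive every Tr = 30 s, and state is removed after 90 s without a refresh. The class and method names are hypothetical, and times are plain floats in seconds rather than real timers.

```python
REFRESH_INTERVAL = 30.0  # Tr: BGP KeepAlive default, in seconds
HOLD_TIME = 90.0  # BGP HoldTime default, in seconds


class SoftStateTable:
    """Reservation state that survives only while it keeps being refreshed."""

    def __init__(self, hold_time=HOLD_TIME):
        self.hold_time = hold_time
        self.last_refresh = {}  # reservation id -> time of the last refresh

    def on_refresh(self, reservation_ids, now):
        # One bundled REFRESH lists every reservation the neighbor still holds.
        for rid in reservation_ids:
            self.last_refresh[rid] = now

    def expire(self, now):
        """Delete and return the reservations not refreshed for hold_time seconds."""
        stale = [rid for rid, t in self.last_refresh.items() if now - t >= self.hold_time]
        for rid in stale:
            del self.last_refresh[rid]
        return stale


table = SoftStateTable()
table.on_refresh(["R1", "R2"], now=0.0)
table.on_refresh(["R1"], now=REFRESH_INTERVAL)  # R2 is no longer refreshed
print(table.expire(now=90.0))  # ['R2']
```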
3.3.3 SICAP Operation

[Figure 3.8: SICAP scenario. ASs AS1 to AS6; first-aggregators S1 (AS1) and S2 (AS2); last-deaggregators D1 (AS5) and D2 (AS6); border routers E1 to E6.]
To illustrate how SICAP works, we use the scenario of Fig. 3.8. There, ellipses represent different ASs, and Si and Di are, respectively, the first-aggregator and the last-deaggregator on the path of a reservation Ri. We use this scenario to illustrate three situations. The first deals with the establishment of reservations R1 and R2, the second describes the deletion of reservation R1, and the third illustrates a possible exchange of error messages when reservation R1 fails.

3.3.3.1 End-to-End Reservation Establishment
To establish a reservation, SICAP uses a pair of REQ/RESV messages. When a reservation request is triggered, the SICAP agent on the first-aggregator sends a REQ to probe the path up to the last-deaggregator. As in BGRP, this is the first phase of the reservation establishment, which ends when a REQ message either reaches a last-deaggregator or reaches a point of failure. In the latter case, an ERROR message (of sub-type REJ) is sent upstream directly to the first-aggregator: since the reservation has not been established yet, intermediate routers do not need to be notified of the rejection. The reception of a REQ message by a last-deaggregator triggers the second phase of the reservation establishment. From the list of ASs crossed on the path, the last-deaggregator chooses as previous IDL the AS holding the largest W_m value. It then looks for an aggregate that ends in the current AS and that either starts at or crosses the previous IDL. If there is no such aggregate, a new one is created; if such an aggregate exists, its resources are updated. In both cases, this is done by a RESV message sent by the last-deaggregator: at each BR, the RESV triggers the creation or the update of the aggregate. This process is repeated until the RESV message reaches the first-aggregator on the path.

To better explain how SICAP establishes reservations, Figs. 3.9 (a) and (b) show, respectively, the logical diagram and the message exchange of the establishment of R1 for the scenario illustrated in Fig. 3.8. To start the establishment of R1, S1 sends REQ(1) towards AS 5. In it, S1 inserts its own identifier S1, the reservation identifier R1 and its bandwidth requirement, b{S,E1}, where {i, j} represents the link between routers i and j. REQ(1) is sent over TCP to the BGP next-hop E1, which inserts its own identifier and AS number and forwards the message to the next-hop E3. E3 in turn inserts into the route record its IP address, its AS number and the computed W_m, and forwards the message to E4. Hence, when D1 receives REQ(1), this message contains a route record with the format [E1/AS3], [E3/AS3/1], [E4/AS4], [E5/AS4/2]. D1 realizes that the request ends in AS 5 and therefore uses the information collected to choose the aggregate that R1 will be merged into. AS4 carries the largest weight in the route record (2, against 1 for AS3), and there is no adequate aggregate yet. D1 therefore triggers the creation of a new aggregate, A1, and selects E5 as its starting point. To establish A1, D1 sends RESV(1), requesting b{S,E1} on each link of the reverse path provided by REQ(1). When RESV(1) arrives at E5, the aggregate label is reset and RESV(1) is sent to the previous hop, E4. E4 looks for an aggregate that might carry R1 up to S1. Finding none, E4 triggers the creation of another aggregate, A2, which extends all the way from S1 to E4, and updates the aggregate label in RESV(1) to A2. If RESV(1) succeeds in reaching S1, the reservation is established.
[Figure 3.9: Reservation R1 establishment. (a) Logical diagram: R1 (b = 5) from S1 in AS1 to D1 in AS5, carried by aggregates A1 and A2. (b) Messaging sequence: REQ(1) leaves S1 and grows hop by hop (REQ(1,[E1]), REQ(1,[E1,E3]), REQ(1,[E1,E3,E4]), REQ(1,[E1,E3,E4,E5])) until it reaches D1, which answers with RESV(1,[E1,E3,E4,E5]).]
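The RESV phase for R1 can be sketched as follows. The sketch is a deliberate simplification under our own assumptions. The IDL ASs are given as input, and each aggregate is identified by its (first BR, last BR) pair. An existing aggregate is reused only on an exact match, so the sketch models neither the "crosses the previous IDL" case nor branching, as in the R2 example. Labels are assigned in creation order, which reproduces A1 and A2 of the example. `resv_upstream` and its arguments are hypothetical names.

```python
def resv_upstream(path, as_of, idl_ases, bandwidth, aggregates):
    """Create or update the aggregates a RESV crosses on its way upstream.

    path       -- routers from the first-aggregator to the last-deaggregator
    as_of      -- router name -> AS name
    idl_ases   -- ASs chosen as intermediate deaggregation locations
    aggregates -- shared table: (first BR, last BR) -> [label, bandwidth]
    Returns the labels in the order the RESV meets them (downstream first).
    """
    # An aggregate ends at the ingress BR of an IDL AS; the next one
    # starts at the egress BR of that same AS.
    segments, head = [], path[0]
    for prev, cur in zip(path, path[1:]):
        if as_of[prev] == as_of[cur] and as_of[cur] in idl_ases:
            segments.append((head, prev))
            head = cur
    segments.append((head, path[-1]))

    labels = []
    for seg in reversed(segments):  # the RESV travels from D towards S
        if seg not in aggregates:
            aggregates[seg] = [f"A{len(aggregates) + 1}", 0]
        aggregates[seg][1] += bandwidth  # creation or update of the aggregate
        labels.append(aggregates[seg][0])
    return labels


# Scenario of Fig. 3.8 for R1 (b = 5): AS4 is the IDL chosen by D1.
as_of = {"S1": "AS1", "E1": "AS3", "E3": "AS3", "E4": "AS4", "E5": "AS4", "D1": "AS5"}
path = ["S1", "E1", "E3", "E4", "E5", "D1"]
table = {}
print(resv_upstream(path, as_of, {"AS4"}, 5, table))  # ['A1', 'A2']
print(table)  # {('E5', 'D1'): ['A1', 5], ('S1', 'E4'): ['A2', 5]}
```

A second reservation following the same path and IDLs reuses A1 and A2 and only increases their bandwidth, which is the "update" branch described in the text.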
Let us now consider a request R2 originating in AS 2 and destined to AS 6, as illustrated in Fig. 3.10. When D2 receives REQ(2), it triggers the creation of aggregate A3, extending upstream to E6, and sends RESV(2) to establish the reservation. When RESV(2) arrives at E4, this router realizes that R2 can be merged into the existing aggregate A2, and therefore simply updates the resources of A2. However, because A2 heads towards AS 1 and not AS 2, a new aggregate branch, A4, is created from E3 to S2. This branch is directly merged into A2 at E3, so E3 simply acts as a merging point.
[Figure 3.10: Reservation R2 establishment (b = 1), from S2 in AS2 to D2 in AS6, with messages REQ(R2, b=1) and RESV(R2, b=1); aggregates A1 and A2 are also shown.]
3.3.3.2 Reservation Deletion
The explicit deletion of a reservation, whether individual or aggregate, is carried out by a TEAR message. An individual reservation is deleted from the first-aggregator to the last-deaggregator as a consequence of an end-host request, and this deletion represents an update of the resources of an established aggregate. The deletion of an aggregate A, however, is only triggered by its source when its bandwidth is equal to zero, i.e., when the aggregate is empty. The TEAR message is then sent downstream until it reaches either the destination of the aggregate or a BR where the aggregate holds more resources than those of the reservation that triggered the request. The latter case implies that other reservations are being merged into aggregate A at that point, and therefore the TEAR stops there. The messaging sequence required to delete R1 is illustrated in Fig. 3.11. S1 sends TEAR(1), which carries the reservation identifier R1 and also b{S,E1}. Between S1 and E4, each router decreases the bandwidth of A2 by b{S,E1} units. When TEAR(1) reaches E4, the aggregate field is reset, and TEAR(1) is forwarded to the next hop, E5. This router knows that R1 is mapped to aggregate A1, and therefore updates the aggregate label of TEAR(1) to A1. E5 then decreases the bandwidth of A1 by b{S,E1} units.
[Figure 3.11: Reservation deletion messaging sequence: TEAR(1) travels hop by hop from S1 through E1, E3, E4 and E5 to D1.]
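The following is a minimal sketch of how a TEAR updates the aggregates it crosses. It assumes that each aggregate is a small dict with a label and a bandwidth, which is a hypothetical representation. An aggregate whose bandwidth drops to zero is reported as empty, because that is the condition under which its source triggers its deletion. The example assumes, hypothetically, that another 1-unit reservation shares A2 with R1 (b = 5), while A1 carries R1 only.

```python
EPSILON = 1e-9  # tolerance for floating-point bandwidth bookkeeping


def apply_tear(path_aggregates, amount):
    """Decrease every aggregate crossed by a TEAR by `amount`.

    path_aggregates -- aggregates in path order, as {"label": ..., "bandwidth": ...}
    Returns the labels of the aggregates left empty, whose deletion would then
    be triggered by their source.
    """
    emptied = []
    for agg in path_aggregates:
        agg["bandwidth"] = max(0.0, agg["bandwidth"] - amount)
        if agg["bandwidth"] <= EPSILON:
            emptied.append(agg["label"])
    return emptied


# Hypothetical: A2 also carries another 1-unit reservation; A1 carries R1 only.
a2 = {"label": "A2", "bandwidth": 6.0}
a1 = {"label": "A1", "bandwidth": 5.0}
print(apply_tear([a2, a1], 5.0))  # ['A1']
print(a2["bandwidth"])  # 1.0
```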
3.3.3.3 Reservation Failure
As illustrated in Figs. 3.12 (a) and (b), a reservation failure can occur in either of the two phases of the establishment of R1. We first consider a failure during the probing phase of R1, and assume that when REQ(1) reaches E3, this router realizes that there are not enough resources to satisfy R1. Hence, REQ(1) is stopped, and REJ(1) is sent towards S1 to notify this router of the failure so that it can release the associated resources. Because the failure occurred during the probing phase, BRs between S1 and E3 do not yet hold any state related to R1; E1, for instance, simply passes REJ(1) on to S1. Failure can also occur during the allocation phase, as illustrated in Fig. 3.12 (b). We assume that when RESV(1) reaches E3, this router notices that there are not enough resources on link {E1, E3}. As a result, E3 not only sends a REJ(1) message towards S1, but must also delete the partially established reservation towards D1. It does so by sending a TEAR(1) message towards D1.
[Figure 3.12: Reservation failure messaging sequence between S1, E1, E3, E4, E5 and D1. (a) Probing phase: REQ(1) is stopped at E3, which returns REJ(1) to S1. (b) Allocation phase: REQ(1) reaches D1, and RESV(1) then fails at E3, which sends REJ(1) upstream to S1 and TEAR(1) downstream towards D1.]
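The two failure cases reduce to a simple rule, sketched below under our own naming. REJ always goes to the first-aggregator. A TEAR towards the last-deaggregator is needed only when the failure occurs during allocation, because only then do the downstream routers already hold state. The function and its return format are hypothetical.

```python
def failure_messages(phase, failing_router, path):
    """Messages sent by `failing_router` when a reservation fails on `path`.

    phase -- "probing" (the REQ travels downstream) or
             "allocation" (the RESV travels upstream)
    path  -- routers from the first-aggregator to the last-deaggregator
    Returns (message, sender, target) tuples.
    """
    if phase not in ("probing", "allocation"):
        raise ValueError(f"unknown phase: {phase!r}")
    # The first-aggregator is always told that the reservation was rejected.
    messages = [("REJ", failing_router, path[0])]
    if phase == "allocation":
        # Routers downstream of the failure already hold state: tear it down.
        messages.append(("TEAR", failing_router, path[-1]))
    return messages


path = ["S1", "E1", "E3", "E4", "E5", "D1"]
print(failure_messages("probing", "E3", path))  # [('REJ', 'E3', 'S1')]
print(failure_messages("allocation", "E3", path))  # [('REJ', 'E3', 'S1'), ('TEAR', 'E3', 'D1')]
```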
3.3.4 State Management at Intermediate Deaggregation Locations

In the shared-segment aggregation approach, aggregates might not extend all the way to the destination of some of the E2E reservations they carry; instead, they may end at an IDL AS. At IDLs, reservation requests have to be switched from an ending aggregate at the ingress router to a new aggregate at the egress router. Therefore, aggregators at an IDL have to keep track of the mapping between individual reservations and aggregates. One way to achieve this is to keep each reservation identifier and its resources at the aggregator. However, as explained in Chapter 2, this solution incurs a significant overhead in the amount of state that must be kept. SICAP avoids this state penalty by tracking the mapping between aggregates and reservations at the level of destination ASs, rather than explicitly mapping individual reservations to aggregates. In other words, SICAP maintains, per aggregate, a list of the destination prefixes advertised by the ASs that the aggregate provides access to. As an example of how this information can be used to manage reservations efficiently, we again use the scenario illustrated in Fig. 3.8. During the establishment of R1, when REQ(1) arrives at D1, this router looks for the most specific advertised prefix that matches the destination address of R1. D1 then inserts the prefix(es) found into the RESV(1) message. When this message arrives at E5, SICAP adds the prefix(es) contained in RESV(1) to the list of destination prefixes of A1. Thus, when R1 is torn down and TEAR(1) arrives at E5, this router simply looks up the most specific prefix that matches the destination address carried by TEAR(1) in the set of destination prefixes kept per aggregate. It finds that A1 contains the most specific match. Therefore, the resources of A1 can be updated without explicitly mapping R1 to A1. The state cost of this solution depends mostly on the number of prefixes each AS advertises. Broido et al. [1] present measurements of the Internet routing table showing that, from a universe of 12,399 ASs, the majority advertised at most 99 prefixes². This is a reasonable number when compared to the much larger number of individual reservations crossing BRs.
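The per-aggregate lookup described above amounts to a longest-prefix match over the prefix lists kept for each aggregate. A minimal sketch using Python's standard `ipaddress` module follows. The dictionary layout, the function name and the second aggregate label are hypothetical. A linear scan is used for clarity, whereas a BR would more likely use a trie.

```python
import ipaddress


def aggregate_for(dest, prefixes_by_aggregate):
    """Return the aggregate whose prefix list holds the most specific match for dest."""
    addr = ipaddress.ip_address(dest)
    best_label, best_len = None, -1
    for label, prefixes in prefixes_by_aggregate.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            # `in` is False when the address and network versions differ.
            if addr in net and net.prefixlen > best_len:
                best_label, best_len = label, net.prefixlen
    return best_label


# Hypothetical prefix lists kept at E5, one per aggregate starting there.
prefixes = {"A1": ["198.51.100.0/24"], "A5": ["198.51.0.0/16", "203.0.113.0/24"]}
print(aggregate_for("198.51.100.7", prefixes))  # A1
print(aggregate_for("198.51.7.1", prefixes))  # A5
```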
² Their measurements of the Internet routing table in December 2001 show that, from a universe of 12,399 ASs, 40% announced only one prefix. These prefixes, however, represented only 4.9% of the 102,394 prefixes. The data also show that only 1% of the ASs advertise over 100 prefixes.

3.4 SICAP and BGRP Comparison

Several measures of efficiency can be used to evaluate the ability of an inter-domain signaling protocol to reduce storage and processing cost at BRs. This cost is related to the number of aggregates that are maintained and to how often their state and bandwidth need to be updated, which translates into the bandwidth efficiency and the resulting signaling load of a solution. In the regular mode of operation of both BGRP and SICAP, the bandwidth of an aggregate is updated per individual request, i.e., the bandwidth of an aggregate equals the sum of the bandwidths of its reservations. Because of this per-reservation update, both protocols incur the same signaling load. Therefore, the remaining performance parameter to focus on is state, since it is the only efficiency parameter on which these protocols may differ. To analyze state, we use an extended version of ns2 that incorporates BGRP and SICAP modules we developed. We use a simulation scenario first introduced in [37], which previously helped to expose the weaknesses of the shared-segment approach. Re-enacting it will help to determine whether SICAP is able to reduce state by not keeping information about individual reservations at IDLs. The scenario uses the 50-node AS-level topology illustrated in Fig. 3.13. In the distribution of requests, each node has the same probability of being a source, and destinations are placed according to a distribution of addresses based on AS distance [47]. Request arrivals are modeled as a Poisson process, with an average holding time σ.
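The request generator can be sketched as follows. Inter-arrival times are drawn from an exponential distribution, which is what makes the arrivals a Poisson process. Holding times are also drawn from an exponential distribution with mean σ, an assumption the text does not state. Source and destination selection are left out. `generate_requests` is a hypothetical name and not part of ns2.

```python
import random


def generate_requests(rate, mean_holding, horizon, rng=None):
    """Return (arrival, departure) pairs for requests arriving in [0, horizon).

    rate         -- mean arrival rate, in requests per second (Poisson arrivals)
    mean_holding -- mean holding time sigma, in seconds (assumed exponential)
    """
    rng = rng or random.Random()
    t, requests = 0.0, []
    while True:
        t += rng.expovariate(rate)  # exponential gaps <=> Poisson arrivals
        if t >= horizon:
            return requests
        requests.append((t, t + rng.expovariate(1.0 / mean_holding)))


reqs = generate_requests(rate=2.0, mean_holding=20.0, horizon=1000.0, rng=random.Random(1))
print(len(reqs))  # close to rate * horizon = 2000
```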
[Figure 3.13: Internet-like topology used in the simulations: 50 AS-level nodes, numbered 0 to 49.]
The results³ presented in Tab. 3.1 comprise the minimum, maximum and average state values, calculated with a 95% confidence interval. To cover three possible cases of requests, and to obtain a consistent comparison of the protocols' performance, the duration of requests was varied while the system load was kept constant. Three scenarios were considered: short-lived requests, with an average duration of 20 s; long-lived requests, with an average duration of 120 s; and mixed traffic, with 50% short-lived and 50% long-lived requests. The results show that SICAP consistently outperforms BGRP. Since the state associated with individual requests is the same for both protocols, this confirms SICAP's ability to reduce state by lowering the number of aggregates created. The SICAP/BGRP state ratio stays approximately the same when the duration of requests changes, showing that duration affects both protocols in a similar manner. Note, however, that state varies as a function of the duration of individual requests: short-lived requests require more average state than either of the other types. This is merely a consequence of the higher "load" associated with shorter requests: to keep the intensity of requests constant while varying their duration, more short-lived requests must be generated than mixed or long-lived ones. Fig. 3.14 shows the difference in average state for different intensities. Each bar represents the average state that a protocol requires for a particular type of request and a particular intensity. The difference in state between BGRP and SICAP does not grow proportionally to the intensity of requests, because that difference is due only to the number of aggregates created. However, the difference remains significant in terms of scalability: state due to individual reservations is kept only at the end-points of a path, whereas state due to aggregates is kept at each BR crossed.

³ Further results can be found in [39].
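The "load" argument can be made concrete with Little's law. This assumes that the load kept constant is the mean number of simultaneously active requests, ρ = λσ, where λ is the arrival rate. Holding ρ fixed forces λ = ρ/σ, so 20 s requests must arrive six times as often as 120 s requests. The mixed case is taken as a 50/50 split by request count, which gives a mean of 70 s and places it in between. The load value of 100 and the function name are hypothetical.

```python
def arrival_rate(load, mean_holding):
    """Arrival rate that keeps `load` = rate * mean_holding (Little's law) fixed."""
    return load / mean_holding


LOAD = 100.0  # hypothetical: mean number of simultaneously active requests
rates = {
    "short-lived (20 s)": arrival_rate(LOAD, 20.0),
    "mixed (mean 70 s)": arrival_rate(LOAD, 0.5 * 20.0 + 0.5 * 120.0),
    "long-lived (120 s)": arrival_rate(LOAD, 120.0),
}
for name, r in rates.items():
    print(f"{name}: {r:.3f} requests/s")
# short-lived (20 s): 5.000, mixed (mean 70 s): 1.429, long-lived (120 s): 0.833
```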
[Table 3.1 and Figure 3.14, and the pages that follow them, are not recoverable from the source.]
[…] is the protocol offering the smallest increase in blocking probabilities, but BGRP achieves the best signaling reductions. The results obtained with these simulations imply, first, that the dynamic function achieves better results than the quantisation approach and, second, that over-reservation does help to reduce the signaling load at the expense of some increase in blocking. For the quantisation approach, performance decreases as Q increases, which is simply a consequence of asking for larger bandwidth shares without considering the aggregate demand. Nevertheless, some issues are not yet fully understood, namely whether one of the protocols achieves better performance, and how some factors influence their behavior. Some of these issues may be related to the use of a function that does not reflect the true demand of an aggregate. Therefore, we next present further simulations, this time using the over-reservation mechanism with history about the aggregate demand.
4.4.3 Over-Reserving with Aggregate Demand History

In this section, we run some of the previous simulations again, but now rely on the estimation of the aggregate demand to compute b_S(x). The goal is to see whether adding a history of the aggregate demand allows us to achieve lower blocking probabilities. Hence, we again present simulations for long-lived and for short-lived requests. We also test the two over-reservation approaches already presented, the dynamic function b_S(x) and the quantisation approach. The latter is now used only with Q = 2, since we have shown previously that increasing Q has a negative impact on the performance achieved by the quantisation approach.
4.4.3.1 Estimating the Aggregate Demand
The bandwidth estimation is essential to the efficiency of the over-reservation algorithm, since a bad estimation may easily lead to resource starvation or to underutilisation of resources. The function used relies on previous and current bandwidth samples and tries to estimate the near-future aggregate demand. It has to be able to:
1. react fast to bandwidth changes;
2. converge fast to the current demand;
3. smooth out possible bandwidth peaks;
4. avoid too much computational complexity.
Given these requirements, and since we want to predict the aggregate demand based on previous and current measurements, we opted to estimate the demand of an aggregate using a variation of the Exponential Moving Average (EMA) [45] estimation. The estimation is updated each time the demand of the aggregate changes. The regular EMA estimation is given in (4.6). There, e(n) represents the nth estimation of the aggregate demand at the nth sampling instant, t(n), b_A(n) is the aggregate demand sampled at t(n), and w is the EMA smoothing factor:
$$e(n+1) \leftarrow (1-w)\,e(n) + w\,b_A(n), \qquad w \in [0,1] \tag{4.6}$$
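A short sketch of the regular EMA update (4.6) illustrates why it may be slow to adjust. When the demand steps from 10 to 20 units with w = 0.5, the estimate reaches only 19.6875 after five samples at the new level. The function name is hypothetical.

```python
def ema_update(estimate, sample, w):
    """Regular EMA update (4.6): e(n+1) = (1 - w) * e(n) + w * b_A(n)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return (1.0 - w) * estimate + w * sample


e = 10.0  # demand has been stable at 10 units
history = []
for sample in [20.0] * 5:  # demand steps up to 20 units
    e = ema_update(e, sample, 0.5)
    history.append(e)
print(history)  # [15.0, 17.5, 18.75, 19.375, 19.6875]
```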
The regular EMA fulfils requirements 2, 3 and 4: it is able to smooth out short-term variations while tracking long-term trends, and it is simple, requiring only the previous estimation as history. However, it may be slow to adjust to changes. Moreover, the aggregate demand depends not only on the bandwidth changes but also on the duration of each sample, which the regular EMA cannot account for. Therefore, we present in (4.7) a possible function⁴ to estimate the aggregate demand, which takes into consideration not only the bandwidth samples but also the sampling intervals. Throughout this document, we refer to this function as EMAA:
$$\widetilde{b}_A(n+1) = \frac{b'(n+1)}{\Delta(n+1)} \tag{4.7}$$

where:

$$\begin{aligned} b'(n+1) &= (1-w)\,b'(n) + w\,b_A(n)\,[t(n+1)-t(n)], \qquad w \in [0,1] \\ \Delta(n+1) &= (1-w)\,\Delta(n) + w\,[t(n+1)-t(n)] \end{aligned} \tag{4.8}$$
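The following is a minimal sketch of EMAA. It reads the numerator b′ in (4.8) as an EMA of the bandwidth multiplied by the length of the interval during which that bandwidth was held, so that the ratio (4.7) is a time-weighted average of the demand; this reading is an assumption. The initial values b′(0) = Δ(0) = 0 and the class name are also our own assumptions. The usage lines feed it two samples: a level of 10 held for 9 s, then a level of 100 held for 1 s.

```python
class EMAA:
    """Time-weighted EMA of an aggregate's demand, after (4.7) and (4.8)."""

    def __init__(self, w, bandwidth, now):
        if not 0.0 <= w <= 1.0:
            raise ValueError("w must lie in [0, 1]")
        self.w = w
        self.num = 0.0  # b'(n): EMA of bandwidth x interval length
        self.den = 0.0  # Delta(n): EMA of interval lengths
        self.bandwidth = bandwidth  # b_A(n), held since t(n)
        self.t = now  # t(n)

    def update(self, new_bandwidth, now):
        """Called when the aggregate demand changes at time `now`, i.e. t(n+1)."""
        dt = now - self.t
        self.num = (1 - self.w) * self.num + self.w * self.bandwidth * dt
        self.den = (1 - self.w) * self.den + self.w * dt
        self.bandwidth, self.t = new_bandwidth, now
        return self.estimate()

    def estimate(self):
        # Before any interval has elapsed, fall back to the current bandwidth.
        return self.num / self.den if self.den > 0 else self.bandwidth


est = EMAA(0.5, 10.0, 0.0)
est.update(100.0, 9.0)  # 10 units were held for 9 s
print(est.update(20.0, 10.0))  # 100 units were held for 1 s: about 26.36
```

The regular EMA of the same two samples, started at 10 with w = 0.5, would give 55. EMAA stays closer to 10 because that level was held nine times longer.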
We start by analyzing the performance of over-reservation in a scenario where requests are long-lived. Since the aggregate demand estimation is used only to compute b_S(x), it has no influence whatsoever on the regular mode of operation of each protocol.

4.4.3.2 Long-Lived Requests
Results obtained for SICAP and BGRP are presented in Tabs. 4.11 (a) and (b), respectively. We compare these results with those obtained in the same simulation without aggregate demand history (cf. Tabs. 4.5 and 4.6). For this scenario, the aggregate demand history does not bring any significant improvement: the differences that exist are mostly due to the statistical nature of the results. The results are similar with and without an estimation of the aggregate demand because the requests are long-lived: the aggregate demand is not frequently updated, so state variability is low. Hence, using a system with memory in this type of scenario does not bring additional performance improvements. We next want to assess whether this is specific to this type of reservation, or whether the aggregate demand estimation simply brings no significant improvement. To that end, we present a scenario where reservations are short-lived. This scenario has higher state variability, so the estimation might prove more useful.

4.4.3.3 Short-Lived Requests
We again present summarized results for SICAP and BGRP, in Tabs. 4.12 (a) and (b) respectively. Comparing these results with those obtained in the same simulation without aggregate demand history (cf. Tabs. 4.9 and 4.10), we can conclude that the estimation function brings benefits, especially for traffic with mixed requirements. For example, consider the entries for the third bandwidth distribution and an intensity of 5000 requests: the BP increase is now negative, both for SICAP and for BGRP. This means that over-reservation helped to reduce the global blocking, even when compared to the regular mode of operation of the protocol. We believe that this is a consequence of two factors. First, short-lived requests imply more frequent updates to aggregates, which give more opportunities to release over-reserved bandwidth. Second, as mentioned earlier, requests of the mixed bandwidth type have very different bandwidth requirements. Over-reservation possibly helps to accommodate requests with smaller bandwidth requirements, e.g., 0.1% of the link bandwidth, at the expense of blocking additional requests that require a larger amount of bandwidth.

⁴ In [41], we compared the performance of this function with another EMA variation. The function described here was the one that achieved the best performance.

Table 4.11: Long-lived requests, relying on aggregate demand history. (a) SICAP; (b) BGRP. [Table contents not recoverable from the source.]

Table 4.12: Short-lived requests, relying on aggregate demand history. (a) SICAP; (b) BGRP. [Table contents not recoverable from the source.]
These results let us conclude that over-reservation can benefit from an estimation of the aggregate demand, especially where demand fluctuates most. As explained, the aggregate demand estimation function does not add significant complexity, and we have shown that it also works properly where state is more stable. The simulations presented so far show that it is possible to reduce signaling by performing over-reservation. This reduction was achieved in terms of REQ/RESV messages. These messages contribute most to the signaling load, since they must be exchanged for each individual reservation establishment (TEAR messages are optional for both protocols). We tested two different approaches, a dynamic one and a static one, and concluded that the dynamic function provides better results. However, over-reservation may be further improved by delaying the release of resources. Hence, before presenting detailed conclusions about the performance of each protocol when over-reserving, we next present simulations that analyze the possible benefits of a delayed release. Since we have shown that relying on the history of the aggregate demand yields better results, we rely on that estimation to compute b_A(x) in the next simulations. We also present only results obtained for long-lived requests, because the previous simulations showed that the holding time of a session does not have a major impact on the performance of a protocol when performing over-reservation.
4.4.4 Delaying the Release of Resources

We have shown that over-reservation can achieve positive results, trading decent reductions in the signaling load for small increases in the blocking probability. So far, the reduction in signaling was achieved through explicit over-reservation. However, such a reduction can also be achieved by delaying the release of resources that are no longer in use, which has the additional advantage of possibly reducing the number of TEAR messages. This simulation uses function b_f(x), given in (4.3), to compute how much to release. When a TEAR_i arrives at a BR asking for the release of b_i units, the delayed release mechanism checks whether the release should be triggered. If so, it releases b_f(x) bandwidth units. Note that this share can be smaller than b_i units. The decision aims at achieving some balance between the aggregate reserved bandwidth and the link capacity, as explained in Section 4.3.2.2. Tab. 4.13 presents results for the case where the delayed release is applied alone to the regular mode of operation of the protocols. In general, the delayed release with function b_f(x) reduces the number of exchanged messages. The reduction is significantly larger for BGRP because BGRP creates only one aggregate on the path. The delayed release leaves reserved resources near reservation sources, and hence allows BGRP to stop propagating messages earlier than SICAP. For SICAP, this mechanism will possibly create resources for the first aggregate on the path only. The remaining aggregates will not have reserved resources, and therefore signaling has to be propagated further downstream. Compared to the regular operation mode, blocking increases, again only in saturation situations, i.e., where blocking already occurred.

Table 4.13: Delayed release only. [Table contents not recoverable from the source.]
In terms of the performance achieved by delaying the release, the delayed release per se is not a good solution for SICAP, whereas BGRP seems to profit considerably from it: for BGRP, the mechanism translates into lower BP increases and also into a better signaling reduction. We believe that this behavior is a direct consequence of the fact that BGRP only creates one aggregate for a path. The delayed release gives rise to over-reserved resources at the reservation origin, which then allows BGRP to stop propagating messages along the entire path. In contrast, for SICAP this mechanism creates over-reserved resources only on the first aggregate of the path. The remaining aggregates may not yet have reserved resources, and therefore signaling messages need to be propagated further downstream. This not only results in more messages being exchanged, but also makes it likely that subsequent requests find resources in some aggregates of their path, but not in all. Hence, the delayed release alone appears not to be a good mechanism for SICAP. This is corroborated by the fact that SICAP now achieves not only the lowest signaling reduction, but also the highest blocking. For BGRP, on the other hand, the delayed release approach seems to provide a significant improvement.
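The per-TEAR decision of the delayed release mechanism can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the simulator code: the trigger predicate `should_release` and the release-size function `release_size` are hypothetical stand-ins for the balance test of Section 4.3.2.2 and for $b_f(x)$ in (4.3), which are not reproduced here. We also assume that the torn-down share stops counting as used bandwidth as soon as the TEAR arrives, and that a release of zero means that nothing has to be signaled further.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Aggregate:
    used: float      # bandwidth held by active reservations in the aggregate
    reserved: float  # bandwidth currently reserved for the aggregate, b_R(x)


def on_tear(agg: Aggregate, b_i: float,
            should_release: Callable[[Aggregate], bool],
            release_size: Callable[[Aggregate], float]) -> float:
    """Handle a TEAR for b_i units; return how many units are actually released."""
    # The torn-down reservation no longer uses its share of the aggregate.
    agg.used = max(agg.used - b_i, 0.0)
    if not should_release(agg):
        # Delayed release: keep the freed share as over-reserved capacity.
        return 0.0
    # Release b_f(x) units (possibly fewer than b_i), never cutting into
    # bandwidth that is still in use.
    amount = max(min(release_size(agg), agg.reserved - agg.used), 0.0)
    agg.reserved -= amount
    return amount


# Hypothetical policy: release only when the aggregate holds more than 80%
# of a 100-unit link, and then give back half of its idle bandwidth.
LINK_CAPACITY = 100.0


def over_80_percent(agg: Aggregate) -> bool:
    return agg.reserved > 0.8 * LINK_CAPACITY


def half_of_idle(agg: Aggregate) -> float:
    return 0.5 * (agg.reserved - agg.used)


small = Aggregate(used=30.0, reserved=50.0)
print(on_tear(small, 10.0, over_80_percent, half_of_idle))  # 0.0: release is delayed
big = Aggregate(used=70.0, reserved=90.0)
print(on_tear(big, 10.0, over_80_percent, half_of_idle))    # 15.0: half of 30 idle units
```

Keeping the trigger and the release size as parameters reflects the fact that the text separates the decision of whether to release from the computation of how much to release.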
Before analysing whether there is a globally best option, we present a third approach to over-reservation: a combination of over-reserving by using requests already being transmitted and delaying the release of resources.
Table 4.14: Over-reservation and Delayed Release
4.4.5 A Hybrid Approach: Over-reserving and Delaying the Release

Previously, we presented simulations that showed how over-reservation can be performed using two different approaches. Let us next look at the performance obtained by combining explicit over-reservation and delayed release. The goal is to understand whether this combination introduces improvements when compared either to explicit over-reservation or to the delayed release alone. Results for this approach are presented in Tab. 4.14. In general, the signaling reduction achieved by this combined approach is better than that achieved by either of the other two methods alone. The picture is, however, slightly different when it comes to blocking probability, as there are a number of scenarios where combining over-reservation and delayed release actually increases blocking. This is particularly the case for BGRP, where the increase in blocking is consistent across all scenarios, while SICAP does not really suffer from this problem.

4.4.6 Approaches’ Comparison

To present a combined and final picture of how the different mechanisms analysed affect the performance of the two protocols, we summarize the results in two main tables: Tab. 4.15 gives global results for SICAP, while Tab. 4.16 holds the results for BGRP. From Tab. 4.15, we see that the best approach for SICAP is the hybrid approach: it generates a slightly higher blocking in highly congested scenarios, but this is more than compensated by a substantially larger decrease in signaling load, often in the 50% range. Tab. 4.16 (BGRP results) shows that the delayed release used alone is the approach that appears to achieve the best trade-off between higher blocking and lower signaling load. It achieves sizable reductions in signaling load, especially at high intensities, which are precisely the cases where signaling reduction matters most. These reductions are close to those achieved by the hybrid approach. In the case of BGRP, over-reservation alone consistently under-performs the delayed release, while this was not the case for SICAP. The blocking penalty associated with over-reserving, while real, is not overwhelming (about 12% and 14% for SICAP and BGRP, respectively, in the worst case), and both protocols see substantial reductions in their signaling load. More importantly, in both cases the reductions are most significant under high-intensity scenarios, which is when the original signaling load is at its highest. The relatively minor increase in blocking therefore seems a price worth paying for the improvement in scalability that over-reservation affords.

Table 4.15: SICAP global results

Table 4.16: BGRP global results

In Tab. 4.17, we present side by side the approaches with which each protocol achieved its best performance, in order to analyse whether one of the protocols profits more from over-reservation than the other. Looking first at the blocking probabilities, SICAP presents the best performance, given that there is only one case where it shows a higher blocking increase than BGRP. Turning to the signaling reduction, SICAP presents the best performance when request intensities are low, whereas for higher intensities BGRP achieves a better signaling reduction. However, the cases where BGRP achieves a better signaling reduction are also the cases where blocking is high: the larger reduction is coupled with the fact that more requests are being blocked. Even though SICAP seems to present a better global behavior, we cannot clearly state whether one of the protocols performs better than the other when over-reserving without further work on this specific subject.

Table 4.17: Best Approaches

Before presenting global conclusions, we address in the next section general implementation issues and possible enhancements to this mechanism.
4.5 Enhancements and Implementation Issues

In this section, we discuss functional implications of the use of the over-reservation mechanism described and analysed in the previous sections. Then, we address possible enhancements.

4.5.1 Implementation Issues

We start by explaining issues that are inherent to the over-reservation mechanism and that affect both protocols in similar ways, and then detail issues specific to SICAP. We do not explore implementation details specific to BGRP, because that work has already been done to a reasonable extent by Nikolouzou et al. [8].

4.5.1.1 Route Aggregation and Overlapping Prefixes
Figure 4.5: IP prefix overlapping examples. (a) Route aggregation at the destination AS, AS 6, which advertises 192.168.0.0/16 and 192.168.2.0/24; (b) route aggregation at an AS upstream of the destination AS. In both panels, aggregate A1 starts in AS 1, and the requests are R1 (192.168.4.10) and R2 (192.168.2.1).

An over-reservation mechanism requires not only that resources are available in advance, but also a way to identify an already established and sufficiently provisioned aggregate that a reservation can be merged into. Such identification is easily performed by SICAP, since it already requires mapping each aggregate to the set of destination network prefixes that the aggregate provides access to. BGRP also supports this enhancement, as mentioned. Let us consider the situation illustrated in Fig. 4.5 (a), where R1 and R2 are reservation requests originated in AS 1, with respective IP destination addresses 192.168.4.10 and 192.168.2.1. The destination AS for both requests is AS 6, which advertises two prefixes along two different AS paths: 192.168.0.0/16 for the AS path {AS1, AS2, AS3, AS4, AS6}, and 192.168.2.0/24 for the AS path {AS1, AS2, AS3, AS5, AS6}. We assume that an aggregate A1 is already in place from AS 1 to AS 6, and that the prefix 192.168.0.0/16 is mapped to A1. In the regular SICAP operation mode, R1 would be mapped into A1, and R2 would trigger the creation of a new aggregate on the AS path {AS1, AS2, AS3, AS5, AS6}. However, with over-reservation, and assuming that A1 has sufficient resources to automatically provision R1 and R2 at AS 1, the aggregation of the prefixes advertised by AS 6 (the advertisement reaches AS 1 as a single prefix, 192.168.0.0/16) would lead to an error: R2 would be incorrectly merged into A1, when in fact it should follow a different path. To avoid this situation, we follow the solution [18] used in the context of the BGRP quiet grafting mechanism, which is to exclude the prefixes that are contained in a more specific match at the destination AS but that do not match the required IP destination address. In the given example, this means excluding the block 192.168.2.0/24 from 192.168.0.0/16, thus avoiding the incorrect mapping of A1 to the prefix 192.168.2.0/24 in AS 1. We add, however, a note to this solution: excluding prefixes only at the destination AS does not solve overlapping situations such as the one illustrated in Fig. 4.5 (b), where route aggregation occurs at an AS upstream of the destination AS. To avoid this problem, it is necessary to look for more specific prefixes that do not match a given IP address each time a request crosses a BR. This may have serious implications in terms of processing cost and might be a major drawback of over-reserving. We consider such a study to be future work.

Figure 4.6: Route Record. (a) Regular format: the IP address of each BR on the path (BR1 IP, BR2 IP, ...), the AS numbers (ASN 1, ASN 2, ...), and the weights (W1, W2, ...). (b) Enhanced format: the same fields plus the per-BR requests $b_S(1)$, $b_S(2)$, ...
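The prefix exclusion rule borrowed from BGRP quiet grafting can be illustrated with Python's standard `ipaddress` module. The sketch below is our own illustration, not SICAP's implementation: the hypothetical helper `exclude_more_specifics` removes from an aggregate's prefix every more-specific prefix that is routed along a different path, and `can_merge` tests whether a destination address still falls inside the remaining blocks. As noted above, a BR would have to repeat this check for every request that crosses it, which is the source of the processing cost.

```python
import ipaddress
from typing import Iterable, List

Network = ipaddress.IPv4Network


def exclude_more_specifics(prefix: str, more_specifics: Iterable[str]) -> List[Network]:
    """Return the blocks of `prefix` left after removing the more-specific
    prefixes that are routed along a different path."""
    blocks = [ipaddress.ip_network(prefix)]
    for spec in map(ipaddress.ip_network, more_specifics):
        remaining = []
        for block in blocks:
            if spec.subnet_of(block):
                # Split the block around the more-specific prefix.
                remaining.extend(block.address_exclude(spec))
            elif not block.subnet_of(spec):
                # CIDR prefixes are either nested or disjoint: keep disjoint ones.
                remaining.append(block)
        blocks = remaining
    return sorted(blocks)


def can_merge(destination: str, blocks: Iterable[Network]) -> bool:
    """True if the destination may be merged into the aggregate owning `blocks`."""
    address = ipaddress.ip_address(destination)
    return any(address in block for block in blocks)


# Fig. 4.5 (a): A1 is mapped to 192.168.0.0/16, but 192.168.2.0/24
# follows the AS path through AS5.
a1_blocks = exclude_more_specifics("192.168.0.0/16", ["192.168.2.0/24"])
print(can_merge("192.168.4.10", a1_blocks))  # True:  R1 can use A1
print(can_merge("192.168.2.1", a1_blocks))   # False: R2 must not be merged
```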
4.5.1.2 Support for Over-Reservation in Messages
Over-reservation can be performed by each of the protocols without changing the regular message format, if only the BR that receives a request is allowed to request additional bandwidth. However, if any BR i on the path is allowed to make such a request, then each BR's request, $b_S(i)$, must be recorded in the message, so that the last-deaggregator can compute how much to reserve based on the previous requests. This means that the regular route record field would have to be extended to include the values of $b_S(i)$. Fig. 4.6 illustrates both the regular route record (a) and the new format (b). This route record can be applied transparently to either protocol, since it only changes the route record, which is already a variable-size field.
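A minimal sketch of the enhanced route record follows. The fields mirror Fig. 4.6 (BR address, AS number, weight W and, in the enhanced format, the per-BR request $b_S(i)$), but the class names, the helper `total_to_reserve`, and the example AS numbers, weights, and addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical; the actual message encoding is not specified here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RouteRecordEntry:
    br_ip: str          # IP address of the BR that appended the entry
    asn: int            # AS number of that BR
    weight: int         # AS weight W carried by the regular route record
    b_s: float = 0.0    # enhanced format only: extra bandwidth requested by this BR


@dataclass
class RouteRecord:
    entries: List[RouteRecordEntry] = field(default_factory=list)

    def append(self, br_ip: str, asn: int, weight: int, b_s: float = 0.0) -> None:
        """Called by each BR the request crosses (the field has variable size)."""
        self.entries.append(RouteRecordEntry(br_ip, asn, weight, b_s))


def total_to_reserve(record: RouteRecord, b_r: float) -> float:
    """Last-deaggregator: the original request plus every BR's b_S(i)."""
    return b_r + sum(entry.b_s for entry in record.entries)


# Hypothetical path whose three BRs ask for 10, 5, and 7 extra units on top
# of a 3-unit reservation (the figures used in Section 4.5.2.1).
rr = RouteRecord()
rr.append("192.0.2.1", 1, 4, b_s=10)
rr.append("192.0.2.2", 3, 7, b_s=5)
rr.append("192.0.2.3", 4, 5, b_s=7)
print(total_to_reserve(rr, b_r=3))  # 25
```

A BR that does not over-reserve simply leaves `b_s` at zero, so the regular format is the special case of the enhanced one.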
4.5.1.3 State Management at the Last Deaggregator and Intra-domain Issues
One of the issues with over-reservation is that REQ messages (or PROBE messages, in the case of BGRP) may not travel all the way to the last-deaggregator. Therefore, that BR may not be aware of all the reservations, which may be a problem depending on how the interaction with the intra-domain mechanism takes place. We address this issue, for both SICAP and BGRP, by sending a specific probing message to the last-deaggregator that notifies this router of the requirements of the individual reservation. Without this message, it is not possible to ensure that the reservation is correctly established, since the intra-domain mechanism would not be able to check resources on the segment between the last-deaggregator and the destination end host.
4.5.1.4 SICAP Specific Issues
In this section, we discuss implementation issues that are specific to SICAP, as a consequence of the use of the shared-segment aggregation approach.
Jumping segments

When aggregation is based on shared segments, aggregates may not extend all the way to a request's destination. Over-reservation is performed per aggregate, and there may be more than one aggregate along the path of a reservation. Hence, it is necessary to verify the resources of each aggregate on the path. This process may increase the signaling load required by SICAP: even though an aggregate may have sufficient resources to accommodate a request, its path may cover only one segment of the whole path, so the other segments of the path must also be checked. Messages therefore have to travel along the whole path, until they reach either the source of the segment that provides direct access to the destination AS, or the destination AS itself. However, even though they travel along the whole path, they may be processed only at the source (or on a segment) of each aggregate path.

Avoiding Inconsistent Situations

Some inconsistent situations may arise when performing over-reservation, due to the way an aggregate is chosen to carry a reservation. SICAP maps an aggregate to the set of destinations that the aggregate provides access to. When over-reserving, the choice of an aggregate is made without having all the information about the path of the request. As a result, requests going to the same destination may end up using different aggregates. In contrast, in the regular operation mode, the choice of the aggregates' starting points is always made with full knowledge of the path of a request, and therefore requests with the same destination AS always share the same set of aggregates.

Fig. 4.7 gives an example of a possible inconsistent situation. Ellipses represent different ASs, and an aggregate A1 is in place from AS 4 to AS 5, as illustrated in Fig. 4.7 (a). When a request R1 arrives at AS 4, it can be merged into A1, since that aggregate is sufficiently provisioned to carry R1 to its destination. This triggers the creation of a new upstream branch of A1, from AS 3 to AS 4. Next, request R2, starting in AS 6 and destined to AS 4, is sent along its path. When it reaches the last-deaggregator in AS 4, it triggers the creation of two new aggregates, A2 and A4: A2 goes from AS 3 to AS 4, and A4 goes from AS 6 to AS 3, as illustrated in Fig. 4.7 (b). Suppose that a third request, R3, originated in AS 1 and destined to AS 5, is propagated up to AS 3, since there is no aggregate in place in AS 1. In AS 3, there are two candidate aggregates, A1 and A2. Assuming that neither aggregate has enough resources to immediately provision R3, the request is sent downstream until it reaches the last-deaggregator, as shown in Fig. 4.7 (c).

At AS 5, the last-deaggregator chooses to merge R3 with aggregate A1, and chooses AS 4 as the previous intermediate deaggregation location. Therefore, the resources of A1 are updated up to the egress router of AS 4, where R3 is sent to the ingress router, so that a new aggregate is again chosen among the possible candidates; here, the aggregate chosen would be A2. The inconsistency arises from this choice: had R3 been merged in AS 3 during the probing phase of its establishment, i.e., while traveling downstream, the choice at this router would have been to merge it into aggregate A1. However, because the request was propagated along the whole path, aggregate A2 was chosen instead of aggregate A1. If an update related to R3 is sent after its establishment, then there will be two possible aggregates, A1 and A2, that can be chosen as the aggregate of R3 in AS 3. To avoid such situations, SICAP always "follows" the first choice made, i.e., it only creates an aggregate if there is no aggregate that both provides access to the request destination and reaches one of the possible intermediate deaggregation locations. Therefore, in the given example, SICAP would always choose aggregate A1 for request R3, ignoring the fact that AS 4 should be chosen as an intermediate deaggregation location for R3, since that aggregate also reaches the previous possible intermediate deaggregation location, AS 3.

4.5.2 Enhancements

The over-reservation mechanism can profit from some simple enhancements. As an example, we describe a possible negotiation that allows each BR on the path to request a bandwidth share on behalf of an aggregate. We also present a refresh mechanism for the over-reserved resources of an aggregate.

4.5.2.1 Negotiating How Much More to Ask
In Section 4.3.1.1, we presented $b_S(x)$, a function that computes how much more to ask on behalf of an aggregate x. That additional request was made by the first BR on a path that could identify an aggregate into which the reservation could be merged, but which did not have enough resources to immediately satisfy the reservation. To further enhance the over-reservation mechanism, any BR on the path can use the same request message to ask for more bandwidth on behalf of the candidate aggregate, if desired. In other words, a single request can be used by all the BRs along the reservation path to request more bandwidth for the aggregate. To explain this enhancement, we use the scenario given in Fig. 4.8 (a), where an aggregate x is represented. Along its path, BRs 1, 2, and 3 request $b_{S_1} = 10$, $b_{S_2} = 5$, and $b_{S_3} = 7$ bandwidth units, respectively, in addition to the original $b_r = 3$ units. We use the notation already presented to describe the resources allocated to an aggregate, i.e., $b{:}\,b_A(x)/b_R(x)$. Because aggregate bandwidth is additive, the over-reserved resources of an aggregate x at a BR i are seen as effectively being used at the BRs downstream of i. Thus, for aggregate x, and as illustrated in Fig. 4.8 (b), each BR i allocates $b_r + \sum_{j=0}^{i-1} b_{S_j}(x)$ bandwidth units to aggregate x, and the last-deaggregator D allocates to aggregate x the sum of all the requests. For instance, at BR 3, aggregate x would see an increase of 10 + 5 + 7 = 22 bandwidth units in its reserved bandwidth, of which 7 units are over-reserved. At BR 2, the corresponding increase would be 10 + 5 = 15 bandwidth units, of which 5 are over-reserved. Finally, at BR 1, there would only be an increase of 10 units in the over-reserved resources.
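The bookkeeping in this example can be reproduced with a short Python sketch. It encodes our reading of Fig. 4.8: at BR i, the over-reservations requested by the upstream BRs count as used bandwidth, and BR i adds its own $b_{S_i}(x)$ on top as over-reserved bandwidth. The function `per_br_increase` and its signature are illustrative only. Leaving the original request out (`b_r=0`) reproduces the increases quoted in the text.

```python
from itertools import accumulate
from typing import List, Tuple


def per_br_increase(b_s: List[float], b_r: float = 0.0) -> List[Tuple[float, float]]:
    """For each BR i along aggregate x, return its (used, reserved) increases.

    used     = b_r + sum of b_S(j) requested by the BRs upstream of i
    reserved = used + b_S(i), i.e. plus the BR's own over-reserved share
    """
    upstream = [0.0, *accumulate(b_s)]  # upstream[i] == sum(b_s[:i])
    return [(b_r + upstream[i], b_r + upstream[i] + b_s[i]) for i in range(len(b_s))]


for i, (used, reserved) in enumerate(per_br_increase([10, 5, 7]), start=1):
    print(f"BR {i}: +{used:g} used, +{reserved:g} reserved, "
          f"{reserved - used:g} over-reserved")
# BR 1: +0 used, +10 reserved, 10 over-reserved
# BR 2: +10 used, +15 reserved, 5 over-reserved
# BR 3: +15 used, +22 reserved, 7 over-reserved
```

With `b_r=3`, the last entry becomes (18, 25), matching the 25 units that the last-deaggregator D is asked to allocate.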
Figure 4.7: Avoiding inconsistent choices. Panels (a), (b), and (c) show aggregates A1, A2, and A4 and requests R1, R2, and R3 over ASs AS1 to AS8, each AS labeled with its weight W.
Figure 4.8: $b_S(x)$ negotiation example. (a) Initial resources: aggregate x holds b:2/2 on each segment of its path from BR1 (in AS1) through BR2 and BR3, where BR1, BR2, and BR3 request $b_S$ = 10, 5, and 7 units, respectively. (b) Over-reserved resources: b:2/12, b:12/17, and b:17/24 along the path to the last-deaggregator D.

Figure 4.9: $b_S(x)$ negotiation example, allocating a value between $b_i$ and $b_i + \sum_i b_{S_i}(x)$. Resulting allocations along the path from BR1 to D: b:2/10, b:10/14, and b:14/20.
The resources requested by each BR on the path are determined according to the candidate aggregate's demand and to the available resources of the outgoing link. Let us suppose, however, that D cannot allocate the full $b_r + \sum_i b_{S_i}(x) = 25$ units, because in the meantime some resources were allocated on the upstream link, but that it can instead grant 20 bandwidth units. From a global perspective, this corresponds to 80% of the requested resources. Therefore, D would update the values of $b_{S_i}(x)$ in the message to $b_{S_i}(x) \leftarrow 0.8\, b_{S_i}(x)$, and at each BR this would be the reserved value. Fig. 4.9 presents the corresponding allocated values.
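The proportional cut applied by D can be written as a small function. This sketch assumes, as in the example, that only the $b_{S_i}(x)$ values are scaled, by the ratio between the granted and the requested bandwidth (capped at 1); the function name `scale_requests` is hypothetical.

```python
from typing import List, Tuple


def scale_requests(b_r: float, b_s: List[float], granted: float) -> Tuple[float, List[float]]:
    """Scale every BR's b_S(i) by the fraction of b_r + sum(b_S) that D can grant."""
    requested = b_r + sum(b_s)
    ratio = min(1.0, granted / requested)  # never scale requests up
    return ratio, [ratio * value for value in b_s]


ratio, scaled = scale_requests(b_r=3, b_s=[10, 5, 7], granted=20)
print(ratio)                                  # 0.8
print([round(value, 2) for value in scaled])  # [8.0, 4.0, 5.6]
```

Rounding is used only for display, since 0.8 * 7 is not exactly 5.6 in binary floating point.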
4.5.2.2 Over-Reserved Resources Refresh Mechanism
SICAP and BGRP use soft-state to manage the aggregates they create. However, some mechanism is needed to manage over-reserved resources independently: depending on their demand at a given instant, some aggregates may reserve resources that they will never use. This may increase the blocking probability, making the mechanism inefficient. Thus, over-reserved resources require an independent refresh mechanism that does not go against the soft-state property of the protocols.

Figure 4.10: Over-reserved resources timeout release algorithm. After 900 s elapse, the agent iterates over its aggregates: for each aggregate x whose timer flag is not ON and for which r(x) = 1, it computes $b_f(x)$, sets $b_R \leftarrow b_R - b_f(x)$, and adds x and $b_f(x)$ to the REFRESH message, which is sent when no aggregates remain to be checked.

We propose to add a global timer per agent, in charge of periodically checking the over-reserved resources of the agent. This timer has a timer flag (one bit) associated with each of the existing aggregates: if that flag is ON, the aggregate has already been checked during the timeout interval, and hence there is no need to check it again. This speeds up the checking procedure at each agent. The timeout interval is set by default to 900 s, which is ten times the interval after which SICAP deletes the state of an aggregate if no REFRESH message is received. The proposed over-reservation refresh algorithm is illustrated in Fig. 4.10. When the timer expires, all the aggregates kept by the agent are checked: the r(x) function is triggered for each aggregate x whose timer flag is OFF. If some aggregates require the release of resources, a special REFRESH message is created. This message performs a bundled refresh: it contains a list of the identifiers of all the aggregates requiring the release of resources and a list of the corresponding bandwidth shares to release, and it is sent to the next hop on the aggregate path once the check is over. Hence, the number of REFRESH messages sent depends on the number of distinct next hops of those aggregates. In the worst case, where all the aggregates requiring the release of bandwidth follow different paths, one message is sent to each aggregate's next hop; in the best case, a single REFRESH message is sent, since all the aggregates requiring the release of over-reserved resources have the same next hop. This mechanism may be further enhanced to reduce the number of REFRESH messages sent. For instance, instead of releasing bandwidth for all its aggregates, the agent may opt to release bandwidth only from the aggregates that hold "more", according to a certain bandwidth threshold, or to release a certain fraction of its reserved bandwidth, e.g., 20%. The proposed refresh mechanism is one possible approach that requires further study; due to its implications and complexity, we leave this study as future work.
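The timeout sweep can be sketched as follows. This is a minimal illustration, not SICAP code: `r` and `b_f` are passed in as hypothetical callables standing for the functions r(x) and $b_f(x)$ defined earlier in the chapter, the flag handling (an aggregate's flag is ON if it was already checked during the interval, and all flags are cleared after each sweep) is our reading of the description, and REFRESH messages are modeled as one list of (aggregate, bandwidth) pairs per next hop.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class AggregateState:
    next_hop: str            # next hop on the aggregate path
    reserved: float          # b_R(x)
    flag_on: bool = False    # timer flag: already checked during this interval


def on_timeout(aggregates: Dict[str, AggregateState],
               r: Callable[[AggregateState], bool],
               b_f: Callable[[AggregateState], float]) -> Dict[str, List[Tuple[str, float]]]:
    """Check every aggregate whose flag is OFF and build one bundled REFRESH
    (a list of (aggregate id, bandwidth to release)) per next hop."""
    refresh: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for agg_id, agg in aggregates.items():
        if not agg.flag_on and r(agg):
            amount = b_f(agg)
            agg.reserved -= amount                      # b_R <- b_R - b_f(x)
            refresh[agg.next_hop].append((agg_id, amount))
    for agg in aggregates.values():
        agg.flag_on = False                             # start a new interval
    return dict(refresh)


# Hypothetical policies: release when more than 10 units are reserved,
# and release 20% of the reservation (one of the options in the text).
def needs_release(agg: AggregateState) -> bool:
    return agg.reserved > 10


def twenty_percent(agg: AggregateState) -> float:
    return 0.2 * agg.reserved


aggs = {
    "A1": AggregateState("BR-east", 50.0),
    "A2": AggregateState("BR-east", 20.0),
    "A3": AggregateState("BR-west", 30.0, flag_on=True),  # already checked
    "A4": AggregateState("BR-west", 5.0),
}
print(on_timeout(aggs, needs_release, twenty_percent))
# {'BR-east': [('A1', 10.0), ('A2', 4.0)]}
```

Because aggregates sharing a next hop land in the same bundle, the number of REFRESH messages equals the number of distinct next hops among the aggregates that release bandwidth, as stated above.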
4.6 Chapter Closure

In this chapter, we addressed the issue of the high signaling load incurred by current control aggregation protocols. As a possible solution to reduce that load, we presented an over-reservation mechanism that provides aggregates with more resources than the share initially required, combining different approaches to compute that amount of resources, namely, a resource distribution and deletion algorithm, as well as several functions to compute the amounts to request and to release. This mechanism was applied to two different protocols, SICAP and BGRP, which use two different aggregation approaches, the shared-segment and the sink-tree approach, respectively. Our investigation allowed us to draw several conclusions. The first is that over-reservation can be a meaningful option to reduce the signaling load without incurring too high a penalty in blocking probability. Both protocols were found to profit significantly from over-reservation, with improvements that were most significant under high request intensities. In particular, the hybrid scheme provided the best performance trade-off for SICAP, while the delayed resource release alone appeared to be the best option for BGRP. Such a disparity is not entirely surprising, given the different types of aggregation the protocols perform. Another interesting finding was that using demand estimation as an input to the over-reservation process could prove beneficial to better size the additional resources to request, especially in scenarios with many short-lived requests. Even though this study investigated over-reservation thoroughly, further work may still bring benefits to its use. As future work, we intend to analyse enhancements related to the resource distribution and deletion algorithms, including allowing resource negotiation by each BR on the path.
We believe that this step may further reduce the signaling load and lower the blocking probability. We also intend to analyse the performance of the mechanism in other topologies and with other traffic models, e.g., bursty traffic. Besides analysing the performance of the mechanism, we also presented the implementation implications of its use, both in general terms (e.g., the changes required to messages) and with respect to the specific operation of each protocol.
5 Summary and Future Work

This final chapter presents a summary of the dissertation, highlighting our contributions and the conclusions drawn throughout this work. We also present guidelines for open issues in the research field of inter-domain control aggregation.
5.1 Summary

In this dissertation, we investigated inter-domain control path aggregation, focusing especially on understanding the rules that provide maximum efficiency. We started, in Chapter 1, by addressing QoS notions, focusing on the QoS control plane, and by explaining why we decided to approach this research field. In Chapter 2, we introduced inter-domain control issues and nomenclature. We described two aggregation approaches, the sink-tree and the shared-segment approach, and analysed how to formulate different algorithms based on these two approaches. We explained the strengths and drawbacks of such procedures, and presented ns2 simulations that helped to understand the behavior of each aggregation approach, using the required state information as the performance measure. In Chapter 3, we presented a novel inter-domain resource reservation protocol, SICAP, not so much to introduce yet another resource reservation protocol as to explain the design issues related to the use of the shared-segment aggregation approach. We compared SICAP and BGRP in terms of their operation, exchanged signaling messages, and bandwidth efficiency, and, to analyse the more complex issue of state, we again used ns2 simulations. We showed that SICAP outperforms BGRP in terms of state, but that neither protocol lowered the signaling load when compared to non-aggregation approaches. To address the issue of the signaling load, in Chapter 4 we presented an over-reservation mechanism, designed as an extension to SICAP, that can also be used with BGRP. We analysed the performance of this mechanism by carrying out ns2 simulations in the context of both SICAP and BGRP, for different resource distribution and resource release functions. We showed that over-reservation indeed helps to lower the signaling load of both protocols, and that different mechanisms enable each protocol to achieve its best performance in terms of signaling load reduction and blocking probability. We have also addressed possible implementation issues and enhancements to this mechanism.
5.2 Contributions

The major contribution of this dissertation is to show that there are several options in terms of aggregation: until now, the sink-tree approach was the only option available. Additionally, it is widely accepted that current QoS models lack a global control mechanism capable of managing end-to-end resources. Aggregation is a major piece of the large puzzle of QoS frameworks, a fact corroborated by the ongoing work of the NSIS working group. In this context, our contributions were:
• analysing and clarifying issues related to inter-domain control aggregation;
• presenting the weaknesses and strengths of the sink-tree and the shared-segment aggregation approaches, not only in terms of required state, but also in terms of bandwidth efficiency and signaling load;
• introducing a novel resource reservation establishment protocol, SICAP, that outperforms in state the only other proposal, BGRP;
• introducing and analysing the performance of a novel mechanism that can be used with inter-domain reservation protocols to reduce their signaling load, provided that they are similar to SICAP in operation and message sequence.
5.3 Future Work Guidelines

This study investigated several aspects of applying aggregation to inter-domain resource reservation establishment, considering the required state, bandwidth efficiency, and the achieved signaling load. However, several areas of aggregation control would benefit from further research; we present them in this section.

5.3.1 Choosing Where to Deaggregate

We presented several algorithms based on the shared-segment approach and selected WDS-E, the algorithm that SICAP uses to perform aggregation, as the best. However, even though this algorithm is simple and sufficient to bring out the full benefits of the shared-segment approach, this line of research cannot be concluded without thoroughly investigating other possible aggregation rules and comparing them to those that WDS-E uses. One possible direction stems from our choice to weight an AS based on the number of its downstream neighbor ASs. This weighting does not take into consideration the more complex relationships between neighboring ASs, namely whether the neighbor is a customer, a provider, or even a sibling, according to Gao’s nomenclature [26]. The shared-segment approach has the great advantage of flexibility: it gives rise to numerous ways of aggregating, depending on the rules followed to choose the deaggregation points. The downside of such flexibility is that it may increase the complexity of the resulting procedures. Given that we are dealing with inter-AS scenarios, the relationships between ASs are essential to devising the optimal aggregation rules.
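To make the weighting discussed above more concrete, the following Python sketch counts, for each AS, its distinct downstream neighbor ASs over a set of AS-level paths that share a common source. It is not the WDS-E algorithm itself, whose rules are presented earlier in the dissertation; the path representation and the helper names are assumptions for illustration. The optional `relationship` map (hypothetical) labels an (AS, neighbor) link as, e.g., a customer, provider, or sibling link in Gao’s sense, and the optional `factor` map (also hypothetical) scales that neighbor’s contribution; with no labels, every neighbor counts as 1 and the weight reduces to the plain downstream-neighbor count.

```python
from collections import defaultdict


def downstream_neighbors(paths):
    """Map every AS on the given AS-level paths (lists of AS numbers from a
    common source) to the set of ASs that immediately follow it on some path."""
    nxt = defaultdict(set)
    for path in paths:
        for here, there in zip(path, path[1:]):
            nxt[here].add(there)
        if path:
            # Register the last AS of the path even if nothing follows it.
            nxt.setdefault(path[-1], set())
    return dict(nxt)


def as_weights(paths, relationship=None, factor=None):
    """Weight of an AS = number of its distinct downstream neighbor ASs.

    Hypothetical extension: each neighbor's contribution is scaled by
    factor[relationship[(as, neighbor)]], defaulting to 1.0 when the link
    has no relationship label or the label has no factor.
    """
    relationship = relationship or {}
    factor = factor or {}
    weights = {}
    for asn, neighbors in downstream_neighbors(paths).items():
        weights[asn] = sum(
            factor.get(relationship.get((asn, n)), 1.0) for n in neighbors
        )
    return weights
```

For the paths `[[1, 2, 3], [1, 2, 4], [1, 5, 6]]`, ASs 1 and 2 each get weight 2 and AS 5 gets weight 1. Labelling the link from AS 1 to AS 5 as `"provider"` with a hypothetical factor of 0.5 lowers AS 1’s weight to 1.5. The factor values are arbitrary placeholders, chosen only to show where relationship information could enter the weighting.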
5.3.2 Over-reserving

For the over-reservation mechanism, some issues still require further work. We analysed two over-reservation approaches, a dynamic one and a static one. We have shown that the dynamic approach achieves better performance, but the resource distribution function b_S(x) that we used must now be compared with others in order to find the optimal function. We have also described some enhancements to the current mechanism, and their implementation and evaluation are a possible field of research. Namely, it is necessary to analyse the performance of the resource negotiation scheme proposed in section 4.5.2.1, which allows each BR on the path (and not only the source) to ask for more bandwidth on behalf of an aggregate using the same request. We believe that this step may further reduce the signaling load and lower the blocking probability. It is also necessary to analyse the performance of the over-reserved resource timeout mechanism proposed in section 4.5.2.2, since good management of the over-reserved resources is crucial to avoid resource starvation.
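The signaling savings that over-reservation targets, and the idle-resource problem that a timeout mechanism addresses, can be illustrated with a toy model. The Python sketch below is a hypothetical static scheme, not the dynamic distribution function b_S(x) evaluated in Chapter 4: the aggregate reserves bandwidth in multiples of an assumed `quantum`, counts one signaling message each time the reserved amount changes, and uses an assumed release rule that returns capacity only when more than one quantum of slack would remain. The class and parameter names are illustrative.

```python
import math


class OverReservingAggregate:
    """Toy model of one inter-domain aggregate with static over-reservation.

    Bandwidth is reserved in multiples of `quantum` (an assumed parameter), so
    small requests are usually absorbed by slack that is already reserved and
    trigger no signaling.
    """

    def __init__(self, quantum):
        self.quantum = quantum
        self.used = 0       # bandwidth requested by the currently active flows
        self.reserved = 0   # bandwidth currently reserved for the aggregate
        self.messages = 0   # signaling messages spent changing the reservation

    def _resize(self, target):
        # Every change of the aggregate reservation costs one signaling message.
        if target != self.reserved:
            self.reserved = target
            self.messages += 1

    def _rounded_up(self):
        # Smallest multiple of the quantum that covers the current demand.
        return math.ceil(self.used / self.quantum) * self.quantum

    def add(self, bw):
        self.used += bw
        if self.used > self.reserved:
            self._resize(self._rounded_up())

    def release(self, bw):
        self.used -= bw
        # Assumed release rule: shrink only when more than one quantum of
        # slack would remain, which avoids signaling on every small departure.
        if self.reserved - self.used > self.quantum:
            self._resize(self._rounded_up())
```

With `quantum=10` and four requests of 3 units each, the aggregate sends 2 signaling messages instead of the 4 a per-request scheme would need. After the four flows leave, one more message has been sent and 10 units remain reserved with no flow using them; this idle over-reservation is the kind of resource that a timeout mechanism, such as the one proposed in section 4.5.2.2, would need to reclaim.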
5.3.3 Further Evaluation

The work presented in this dissertation was tested in different simulation scenarios, which we tried to make as broad as possible, and which relied on a Poisson model to characterize the arrival of reservation requests. However, even though the Poisson process is well known for characterizing call arrivals, it is also necessary to understand the behavior of the described mechanisms under other traffic models, e.g., bursty traffic. This research would also profit from simulations based on more realistic topologies, and from real traffic traces, possibly obtained from a Tier-1 provider.
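The Poisson arrival model mentioned above, and one possible bursty alternative, can be sketched as follows in Python. Poisson arrivals are generated from i.i.d. exponential inter-arrival times. The bursty model is an assumption chosen for illustration, an ON/OFF source that produces Poisson arrivals only during exponentially distributed ON periods; it is not a model used in this dissertation’s simulations. The helper names `poisson_arrivals`, `on_off_arrivals`, and `dispersion` are hypothetical; `dispersion` computes the variance-to-mean ratio of per-window arrival counts, which is about 1 for a Poisson process and grows with burstiness.

```python
import random


def poisson_arrivals(rate, horizon, rng):
    """Arrival times in [0, horizon) of a Poisson process with the given rate,
    built from i.i.d. exponential inter-arrival gaps."""
    times, t = [], rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def on_off_arrivals(rate_on, mean_on, mean_off, horizon, rng):
    """Assumed bursty model: Poisson arrivals at rate_on during exponential
    ON periods (mean mean_on), none during exponential OFF periods."""
    times, t = [], 0.0
    while t < horizon:
        end_on = min(t + rng.expovariate(1.0 / mean_on), horizon)
        a = t + rng.expovariate(rate_on)
        while a < end_on:
            times.append(a)
            a += rng.expovariate(rate_on)
        t = end_on + rng.expovariate(1.0 / mean_off)  # skip the OFF period
    return times


def dispersion(times, horizon, window):
    """Variance-to-mean ratio of arrival counts in consecutive windows."""
    n = int(horizon // window)
    counts = [0] * n
    for a in times:
        i = int(a // window)
        if i < n:
            counts[i] += 1
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / n
    return var / mean
```

With `rng = random.Random(1)`, `poisson_arrivals(2.0, 10_000, rng)` and `on_off_arrivals(4.0, 10.0, 10.0, 10_000, rng)` have the same long-run mean of 2 requests per time unit. Over windows of 10 time units, however, the Poisson stream’s dispersion stays close to 1, while the ON/OFF stream’s is far larger (roughly 12 in expectation for these parameters), which is the kind of difference that could affect blocking probability and signaling load.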
Finally, the scalability of the mechanisms presented should be analysed in a real-scale testbed, involving a group of heterogeneous ASs.
Bibliography
[1] A. Broido, E. Nemeth, and K. Claffy. Internet Expansion, Refinement, and Churn. Technical report, CAIDA, 2002.
[2] A. Feldmann, A. Gilbert, P. Huang, and W. Willinger. Dynamics of IP Traffic: A Study of the Role of Variability and the Impact of Control. SIGCOMM’99, 1999.
[3] A. Halteren, L. Franken, D. Vries, I. Widya, G. Tuquerres, J. Pouwelse, and P. Copeland. AMIDST Deliverable: 3.1.1 QoS Architectures and Mechanisms - State of the Art. Technical report, January 1999.
[4] A. Medina, A. Lakhina, I. Matta, and J. Byers. BRITE: An Approach to Universal Topology Generation. International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunications Systems, MASCOTS ’01, Cincinnati, Ohio, August 2001.
[5] Arbitron and Edison Media Research. Internet and Multimedia 10: The Emerging Digital Consumer. Technical report, 2003.
[6] Chen-Nee Chuah. A Scalable Framework for IP-Network Resource Provisioning Through Aggregation and Hierarchical Control. PhD thesis, University of California at Berkeley, 2001.
[7] D. Oran (Editor). OSI IS-IS Intra-domain Routing Protocol. Request for Comments 1142, Internet Engineering Task Force, February 1990.
[8] Eugenia Nikolouzou, Peter Sampatakos, Lila Dimopoulou, Iakovos S. Venieris, Martin Winter, Bert F. Koch, Thomas Engel, Stefano Salsano, Vincenzo Genova, Fabio Ricciato, and Gerald Eichler. BGRPP: Performance evaluation of the proposed Quiet Grafting mechanisms. Internet Engineering Task Force, draft, July 2002.
[9] F. Baker, C. Iturralde, F. Le Faucheur, and B. Davie. Aggregation of RSVP for IPv4 and IPv6 Reservations. March 2000.
[10] Forrester Research. Forrester Research, Inc. Available at http://www.forrester.com/, 2003.
[11] G. Huston. Analyzing the Internet’s BGP Routing Table. Technical report, January 2001.
[12] G. Huston. Telstra BGP Table Report. Available at http://bgp.potaroo.net/, February 2001.
[13] GlobalReach. Global Internet Statistics by Language. Available at http://www.globalreach.biz/globstats/index.php3, March 2003.
[14] IETF. Integrated Services over Specific Link Layers (issll) Charter. Available at http://www.ietf.org/html.charters/issll-charter.html.
[15] IETF. The Internet Engineering Task Force site. Available at http://www.ietf.org/.
[16] Jon Postel (Editor), Information Sciences Institute. Internet Protocol - DARPA Internet Program Protocol Specification. September 1981.
[17] Institute of Electrical and Electronics Engineers. 802.11 Wireless Local Area Networks. Available at http://grouper.ieee.org/groups/802/11/, 2003.
[18] IST Aquila Project. Public Deliverable D1203, Final System Specifications. Technical report, http://www-st.inf.tu-dresden.de/aquila/files/public-deliverables.htm, April 2002.
[19] J. H. Saltzer, D. P. Reed, and D. D. Clark. End-to-end arguments in system design. ACM Transactions on Computer Systems, November 1984.
[20] J. Hawkinson and T. Bates. Guidelines for creation, selection, and registration of an Autonomous System (AS). May 1996.
[21] J. Heinanen, F. Baker, W. Weiss, and J. Wroclawski. Assured Forwarding PHB Group. Request for Comments 2597, Internet Engineering Task Force, June 1999.
[22] J. Moy. Open Shortest Path First Version 2. Request for Comments 1583, Internet Engineering Task Force, March 1994.
[23] J. Postel. Transmission Control Protocol. Request for Comments 793, DARPA Internet Program, September 1981.
[24] J. Wroclawski. Specification of the Controlled-Load Network Element Service. Request for Comments 2211, Internet Engineering Task Force, September 1997.
[25] L. Berger, D. Gan, G. Swallow, P. Pan, F. Tommasi, and S. Molendini. RSVP Refresh Overhead Reduction Extensions. Request for Comments 2961, Internet Engineering Task Force, April 2001.
[26] L. Gao. On Inferring Autonomous System Relationships in the Internet. IEEE Global Internet Symposium, November 2000.
[27] IETF Working Group NSIS. Next Steps in Signaling Working Group Charter. Available at http://www.ietf.org/html.charters/nsis-charter.html, June 2003.
[28] O. Schelen and S. Pink. Aggregating Resource Reservations over Multiple Routing Domains. IFIP Sixth International Workshop on Quality of Service (IWQoS’98), California, May 1998.
[29] P. Ferguson and G. Huston. Quality of Service: Delivering QoS in the Internet and the Corporate Network. Wiley Computer Books, New York, NY, 1998.
[30] P. Pan, E. Hahne, and H. Schulzrinne. The Border Gateway Reservation Protocol (BGRP) for Tree-Based Aggregation of Inter-Domain Reservations. Journal of Communications and Networks, June 2000.
[31] P. Pan and H. Schulzrinne. YESSIR: A simple reservation mechanism for the internet. 8th International Workshop on Network and Operating Systems Support for Digital Audio and Video, NOSSDAV 98, July 1998.
[32] R. Braden, D. Clark, and S. Shenker. Integrated Services in the Internet Architecture: an Overview. Request for Comments 1633, Internet Engineering Task Force, June 1994.
[33] R. Braden, L. Zhang, and S. Jamin. Resource Reservation Protocol (RSVP) - version 1, Functional Specification. Request for Comments 2205, Internet Engineering Task Force, September 1997.
[34] R. Callon. Use of OSI IS-IS for Routing in TCP/IP and Dual Environments. Request for Comments 1195, Internet Engineering Task Force, December 1990.
[35] R. Guérin, S. Herzog, and S. Blake. Aggregating RSVP-Based QoS Requests. Internet Draft, Internet Engineering Task Force, September 1997. Work in Progress.
[36] R. Jain. The Art of Computer Systems Performance Analysis. John Wiley & Sons, Inc., 1991.
[37] R. Sofia, R. Guérin, and P. Veiga. An Investigation of Inter-Domain Control Aggregation Procedures. International Conference on Network Protocols, ICNP’02, Paris, France, November 2002.
[38] R. Sofia, R. Guérin, and P. Veiga. An Investigation of Inter-Domain Control Aggregation Procedures. Technical report, ESE, University of Pennsylvania, July 2002. Available at http://einstein.seas.upenn.edu/mnlab/publications.html.
[39] R. Sofia, R. Guérin, and P. Veiga. SICAP, A Shared-segment based Inter-domain Control Aggregation Protocol. Technical report, University of Pennsylvania, October 2002. Available at http://einstein.seas.upenn.edu/mnlab/publications.html.
[40] R. Sofia, R. Guérin, and P. Veiga. SICAP, a Shared-segment Inter-domain Control Aggregation Protocol. High Performance Switching and Routing, HPSR 2003, Turin, Italy, June 2003.
[41] R. Sofia, R. Guérin, and P. Veiga. A Study of Over-reservation for Inter-Domain Control Aggregation Protocols. Technical report, University of Pennsylvania, May 2003. Available at http://einstein.seas.upenn.edu/mnlab/publications.html.
[42] S. Berson and S. Vincent. Aggregation of Internet Integrated Services State. Internet Draft, Internet Engineering Task Force, August 1998. Work in Progress.
[43] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss. An Architecture for Differentiated Services. Request for Comments 2475, Internet Engineering Task Force, December 1998.
[44] S. Casner. A fine-grained view of high-performance networking. Available at http://www.nanog.org/mtg-0105/ppt/casner/index.htm, May 2001.
[45] S. Floyd. Comments on Measurement-based Admissions Control for Controlled-Load Service. Computer Communication Review, 1996.
[46] S. Shenker, C. Partridge, and R. Guérin. Specification of Guaranteed Quality of Service. Request for Comments 2212, Internet Engineering Task Force, September 1997.
[47] S. Uhlig and O. Bonaventure. Implications of Interdomain Traffic Characteristics on Traffic Engineering. Technical report, University of Namur, June 2001.
[48] T. Telkamp. Traffic Characteristics and Network Planning. NANOG’26, October 2002.
[49] V. Jacobson, K. Nichols, and K. Poduri. An Expedited Forwarding PHB. Request for Comments 2598, Internet Engineering Task Force, June 1999.
[50] V. Paxson and S. Floyd. Why We Don’t Know How to Simulate the Internet. Winter Simulation Conference 1997, 1997.
[51] VINT Project. The ns Manual. UC Berkeley, LBL, USC/ISI, Xerox Parc, September 2001.
[52] W. Willinger and V. Paxson. Where Mathematics meets the Internet. Notices of the American Mathematical Society, 45(8), August 1998.
[53] Y. Bernet. The Complementary Roles of RSVP and Differentiated Services in the Full-Service QoS Network. IEEE Communications, February 2000.
[54] Y. Rekhter and T. Li. An Architecture for IP Address Allocation with CIDR. Request for Comments 1518, Internet Engineering Task Force, September 1993.