Scalable Resilient Overlay Networks Sameer Hashmat QAZI A dissertation submitted in fulfilment of the requirements for the degree of Doctor of Philosophy
The School of Electrical Engineering and Telecommunications The University of New South Wales
October 2009
ABSTRACT
The Internet has scaled massively over the past 15 years to extend to billions of users. These users increasingly require extensive applications and capabilities from the Internet, such as Quality of Service (QoS) optimized paths between end hosts. When default Internet paths do not meet their requirements adequately, there is a need to facilitate the discovery of such QoS optimized paths. Fortunately, even though the route offered by the Internet may not work (to the required level of performance), often there exist alternate routes that do work. For instance, when the direct Internet path between two hosts is sub-optimal (according to specific user-defined criteria), the direct paths from both hosts to a third host may not suffer from the same problem, owing to path disjointness. Overlay networks facilitate the discovery of such composite alternate paths through third-party hosts. To discover such alternate paths, overlay hosts regularly monitor Internet path quality and choose better alternate paths via other hosts. Such measurements are costly and pose scalability problems for large overlay networks. This thesis asserts and shows that these overheads could be lowered substantially if the network layer path information between overlay hosts could be obtained, which facilitates selection of disjoint paths. This thesis further demonstrates that obtaining such network layer path information is very challenging. Unlike path monitoring, which requires only the cooperation of overlay hosts, disjoint path selection depends on the accuracy of information about the underlay, which is outside the control of the overlay and so may contain inaccuracies. This thesis investigates how such information could be gleaned at different granularities for optimal tradeoffs between spatial and/or temporal methods for selection of alternate paths.
The main contributions of this thesis are: (i) investigation of scalable techniques to facilitate alternate path computation using network layer path information; (ii) a review of the realistic performance gains achievable using such alternate paths; and (iii) investigation of techniques for revealing the presence of incorrect network layer path information, and proposal of new techniques for its removal.
Keywords: Quality of Service, Overlay Networks, Peer-to-Peer Systems, Service-oriented Networks
ACKNOWLEDGEMENTS
First, I would like to thank the All-Mighty. After that, I am profoundly grateful to my advisor Dr. Timothy Moors for his trust in me throughout the last four years, and for his unconditional support, patience and guidance, without which I could not have accomplished this long research journey. I would also like to thank my Co-Adviser Dr. Aruna Seneviratne for guiding me in the initial stages of my PhD. I would also like to thank the National University of Science and Technology (NUST), Pakistan for extending their generous financial support for 3 years of my PhD candidature. I thank my supervisor and the Head of Electrical Engineering School (UNSW), Dr. Timothy Hesketh, for providing me with a PhD completion scholarship for partial financial support during the fourth year of my candidature. My thanks also go to the Graduate Research School (UNSW) for the postgraduate travel grants that helped fund my conference travel. I would also like to thank all the fellow Networks Group members (present and former): Arun, Arvind, Bo, Jack, John, Nick, Nixian, Mohammad, Nick, Shuo, Zawar; and other friends, Mark, Phu and Adeel, for their companionship and help throughout the PhD journey. I would especially like to thank Dr. Eric D. Kolaczyk (Boston University) for his helpful comments on the work on the removal of Routing Matrix Inconsistencies to improve statistical path estimation. I would also like to acknowledge the help extended to me by Theirry Rakotoarivelo from NICTA, with whom I shared fruitful discussions on the availability and use of Internet Datasets. I would also like to thank Ido Nevat for helpful discussions on robust regression techniques. I profoundly thank Jack Tsai and Arun Vishwanath for proofreading this dissertation. I would also like to thank Phil Allen, who looked after the welfare of our research tools, namely our PCs and software applications, whenever we had any issues.
Finally, I would like to express my profound gratitude to my parents for their hard work and sacrifices; my sister, and my late grandmother. They all encouraged and inspired me in many ways. I would have never made it through this journey without their love and their continuous prayers.
LIST OF ABBREVIATIONS
AMP     Active Measurement Project
AS      Autonomous System
ASN     Autonomous System Number
BGP     Border Gateway Protocol
BLP     Best Linear Predictor
CAIDA   The Cooperative Association for Internet Data Analysis
CDN     Content Distribution Network
CO      Convex Optimization
CORR    Correlation
COV     Covariance
DHT     Distributed Hash Table
EDR     Earliest Divergence Rule
EID     Endpoint Identifier
FEC     Forward Error Correction
GPS     Global Positioning System
HLP     Hybrid Link-state Path-vector
IP      Internet Protocol
ISP     Internet Service Provider
KBR     Key Based Routing
MIRO    Multipath Interdomain Routing
MST     Minimum Spanning Tree
NCC     Network Coordination Center
NLANR   The National Laboratory for Applied Network Research
NIRA    New Internet Routing Architecture
NP      Nondeterministic Polynomial time
QoS     Quality of Service
RD      Rank Deficiency
RIPE    Réseaux IP Européens
RMI     Routing Matrix Inconsistencies
RON     Resilient Overlay Networks
RPE     Relative Prediction Error
RTT     Round Trip Time
SVD     Singular Value Decomposition
TCP     Transmission Control Protocol
ToR     Type Of Relationship
TTM     Test Traffic Measurement
UDP     User Datagram Protocol
VAR     Variance
VoIP    Voice over IP
ORIGINALITY STATEMENT
‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.’
Signed: Sameer Qazi
Date: 16 October 2009
OUTLINE
Part I – Introduction and Background
1 Introduction
2 Literature Review
3 Description of Internet Datasets Used in This Dissertation

Part II – Scalable Heuristics for Selecting Disjoint Paths in Overlay Networks
4 An Architecture for Selecting Disjoint Paths- Globally Scalable RON Service
5 Disjoint Path Selection in Overlay Networks using ToR Graphs

Part III – Path Monitoring in Overlay Networks
6 Issues of Statistical Path Monitoring in Overlay Networks
7 Conclusions and Proposals for Future Directions of Research
TABLE OF CONTENTS
Abstract
Acknowledgements
List of Abbreviations
Originality Statement
Outline
Table of Contents
List of Figures
List of Tables
List of Publications

Part I – Introduction and Background
1 Introduction
  1.1 Why Overlay Networks?
  1.2 Dissertation Overview
2 Literature Review
  2.1 Introduction
  2.2 Exploiting Path Diversity in the Internet through Overlay Networks
    2.2.1 Overlay Topology
    2.2.2 Monitoring Overlay Links
    2.2.3 Selecting Overlay Paths
    2.2.4 Detouring Packets
    2.2.5 (In-)Feasibility of Selfish-Routing on Overlay-Networks
    2.2.6 Open Research-issues with Overlay-Networks
  2.3 Proposals To Modify Underlay Routing Mechanisms
    2.3.1 Re-Engineering BGP-4
    2.3.2 Enhancing network level packet forwarding decisions to exploit path diversity
    2.3.3 Fast Re-Route (FRR) construction to reduce failover times
    2.3.4 Open Research-issues with proposals to modify underlay routing mechanisms
  2.4 Multi-Homing Solutions
    2.4.1 Open Research-issues with Multi-homing
  2.5 Chapter Summary
3 Description of Internet Datasets Used in This Dissertation
  3.1 Datasets considered and methodology for obtaining the datasets
  3.2 Network Layer Characteristics of Overlay Paths Vs Direct Paths
  3.3 When is the Direct Internet path degraded?

Part II – Scalable Heuristics for Selecting Disjoint Paths In Overlay Networks
4 An Architecture for Selecting Disjoint Paths- Globally Scalable RON Service
  4.1 Introduction
  4.2 Relationship between Overlay Network size and path diversity it offers
  4.3 Are some overlay paths preferred more often than others?
  4.4 DG-RON Clients and Services
  4.5 Overlay Infrastructure
  4.6 Online Path Selection-Dynamic Path Monitoring
  4.7 Offline Path Selection- Landmark Based Heuristics
  4.8 Performance Evaluation
    4.8.1 Impact of Detour Set Size
    4.8.2 Evaluation of Offline Path Heuristics
    4.8.3 Comparison with SPAD
  4.9 Discussion
  4.10 Conclusion
5 Disjoint Path Selection In Overlay Networks using ToR Graphs
  5.1 Introduction
  5.2 ToR (Type-of-Relationship) Graphs
  5.3 Maximally-Disjoint Path Computation Using a Greedy approach
    5.3.1 Finding Valley-Free Edge-Disjoint Paths
    5.3.2 Finding Maximally-Disjoint Valley-Free Paths
    5.3.3 Comparison with Earliest Divergence Rule (EDR)
  5.4 Performance Evaluation
    5.4.1 Methodology used to construct ToR-graph
    5.4.2 Network layer path characteristics inferred from ToR-graph
    5.4.3 Performance-Evaluation of the Greedy-Approach
  5.5 Chapter Summary

Part III – Path Monitoring in Overlay Networks
6 Issues of Statistical Path Monitoring In Overlay Networks
  6.1 Introduction
  6.2 Algebraic Notation
  6.3 Routing matrices and Eigen Spectra of AMP and RIPE data sets
    6.3.1 Extent of rank-deficiency
  6.4 Selecting a Subset of Paths for Monitoring and Predicting the Unmonitored Paths Using Best Linear Predictor
  6.5 Routing Matrix Inconsistencies
    6.5.1 How RMI occurs?
    6.5.2 Can RMI be eliminated?
    6.5.3 Quantification of RMI
  6.6 Statistical Techniques to Mitigate the Effects of RMI
  6.7 Improvement in Path Prediction and Anomaly Detection for AMP and RIPE networks after application of Robust Statistical Techniques
  6.8 Discussion
  6.9 Conclusion
7 Conclusions And Proposals For Future Directions Of Research
  7.1 Reviewing the Goal
    7.1.1 Architecture
    7.1.2 Path Selection
    7.1.3 Path Monitoring
  7.2 Future Research Directions
    7.2.1 More accurate overlay topology ‘modeling’
    7.2.2 Accurate depiction of Internet failure models
    7.2.3 Investigation of synergy between competing overlays

Appendix
References
LIST OF FIGURES
Figure 1.1 Resilient Overlay Networks. Establishing alternate paths via an overlay host when the path between two Internet hosts fails.
Figure 1.2 Logical Overlay topology (top) and Network Layer Overlay topology inferred from traceroutes.
Figure 2.1 Direct path between UNSW and example.com and a one-hop overlay path via CMU.
Figure 2.2 (a) (top) Possible one-hop overlay path between end-hosts when the direct Internet path suffers from outage/service degradations. (b) Overlay tunnel establishment.
Figure 2.3 (a) (top) Full-Mesh Overlay topology and corresponding network layer topology. (b) Constructing Minimum-Weight spanning tree to prune overlay topology by removing edges.
Figure 2.4 (a) (top) Probing overlay links. Each overlay host probes paths to all other overlay hosts for measurement of path-metrics such as latency, throughput and loss rates. (b) Link-State Dissemination Protocol is used to share such measurements between all overlay hosts.
Figure 2.5 Algebraic method of path monitoring (assuming path symmetry).
Figure 2.6 Earliest-Divergence Heuristic to select disjoint alternate paths.
Figure 2.7 Using Key-Based Routing (KBR) to find paths between two end-hosts [36].
Figure 2.8 ‘Drafting’ behind Akamai servers. One-hop indirection through an overlay node. The overlay node is selected based on the preference of Akamai to serve content from one of its servers.
Figure 2.9 Contention for same set of underlay links. Three overlay networks decide to use the same set of underlay links to improve QoS on end-to-end paths, increasing network load (congestion) on links and possibly leading to oscillations in the quest for better paths.
Figure 2.10 (a) (top) A single link-failure invalidates several valid routes (shown by bold arrows). (b) Appending path-withdrawal messages with ‘cause-of-failure’ tags helps eliminate all invalid routes and converge to a valid route quickly.
Figure 2.11 MIRO routing example [76].
Figure 2.12 Path deflection decision made at router level can exploit the path diversity in the underlay network.
Figure 2.13 Inter-domain MPLS path construction.
Figure 2.14 Single-homing Vs Multi-homing.
Figure 3.1 Location of AMP monitors in North America [100].
Figure 3.2 Location of RIPE monitors in Europe and the rest of the world [101].
Figure 3.3 Network layer path length at IP level and AS level (AMP-146-30/Jun/2006 (top) and RIPE-40-05/Sep/2007).
Figure 3.4 Percentage of one-hop overlay paths which diverge from the direct path at or before nth AS-hop (AMP-146-30/Jun/2006).
Figure 3.5 Percentage of one-hop overlay paths which diverge from the direct path at or before nth IP-hop (AMP-146-30/Jun/2006).
Figure 3.6 CDF of the difference between the mean path delay on direct Internet path and the mean delay on the best one-hop overlay path.
Figure 3.7 Probability plots for paths to show incidence of path outages and performance failures (RIPE (top) and AMP).
Figure 4.1 Relationship between size of an overlay network and AS degree distributions. X-axis depicts ASes sorted according to their degree (descending order) normalized by total number of ASes.
Figure 4.2 Overlay hosts sorted in descending order ‘z’ (x-axis) according to percentage of failures masked, and failures masked as Cumulative function ‘F[z]’ (y-axis).
Figure 4.3 Finding Topologically diverse detours for underlay destinations.
Figure 4.4 Offline Detour Selection based on Maximum Divergence Principle.
Figure 4.5 Delay Gain Comparison between DGRON and RON with variation in detour set size.
Figure 4.6 Delay Gain Comparison between DGRON and SPAD (|T|=12).
Figure 5.1 Network layer paths between source-destination at AS level topology.
Figure 5.2 Example of valid and invalid valley-free paths in ToR-graphs [61, 118].
Figure 5.3 (Top) Example of valid valley-free path in the original ToR-graph (G). Dotted lines show concatenation of a set of C-P (forward) and P-C (backward) edges forming a valley free s-t path. (Bottom) Relaxation using the 2 layer model consisting only of forward edges.
Figure 5.4 Optimal solution to the Edge-Disjoint Path problem in the Two-Layer ToR-graph.
Figure 5.5 Path inflation between (a) AMP and (b) RIPE hosts (AS-hops).
Figure 5.6 Number of disjoint paths between (a) AMP (top) and (b) RIPE hosts using ToR-graph.
Figure 5.7 Number of candidate paths selected by greedy-approach for path outages and performance failures in the AMP-datasets: (a) AMP-146-30/Jun/06 (top) and (b) AMP-133-31/Aug/06.
Figure 5.8 Delay gain of best path selected for path outages and performance failures in the AMP-datasets: (a) AMP-146-30/Jun/06 (top) and (b) AMP-133-31/Aug/06.
Figure 5.9 Correlation of path-delay characteristics between direct-path and best-alternate-path selected using Greedy Path Selection (Path Outages for AMP-146-30/Jun/2006 and AMP-133-31/Aug/2006).
Figure 6.1 (a) (left) How overlay resilience depends on topology of the underlay network. (b) Inferring maximum information about all virtual overlay links.
Figure 6.2 Additive Network Metrics.
Figure 6.3 Algebraic method of path monitoring.
Figure 6.4 Eigen Spectra of AMP and RIPE Networks.
Figure 6.5 AS degree for RIPE and AMP networks.
Figure 6.6 Problems in estimation of second order link metrics from traceroutes; link correlation matrices for AMP-30-30/Jun/2006. (a) (top) intra-AS links; (b) inter-AS links.
Figure 6.7 L1 error for RIPE and AMP networks as a function of monitored paths.
Figure 6.8 Load balancing inside an AS.
Figure 6.9 Incorrect path inference: some links are missed while other false links are added.
Figure 6.10 Frequency of path variation in AMP networks over 24 hr period.
Figure 6.11 Adjusting path inside AS11537 causes significant delay reduction on path between amp-upenn and amp-hawaii.
Figure 6.12 Load balancing inside AS11096 causes anomalous delay measurements at 6th and last hop on path between amp-fiu and amp-emory.
Figure 6.13 Dynamic Load balancing inside AS11537 for paths to amp-hawaii seems to affect some paths at different times but not others.
Figure 6.14 Comparison of performance of CO estimator for AMP networks.
Figure 6.15 Removal of Routing Matrix Inconsistencies (RMI) using the DWI and DWR Heuristic for removal of false links.
Figure 6.16 Comparison of performance of CO estimator before and after removal of RMI for AMP networks.
Figure 6.17 Computed value of c as the number of sampled paths increases for AMP-50 and RIPE-40.
Figure 6.18 Comparison of the L1-error metric of BL and Robust predictor.
Figure 6.19 Comparison of performance of BL and Robust estimator for AMP networks.
Figure 6.20 Improvement in Variance of Relative Prediction Error using BL-ridge and Robust estimator for AMP networks.
Figure 6.21 Actual, BL, BL-ridge and Robust predictor delay profile for a selected (unmonitored) path in AMP-50-30/Jun/2006.
LIST OF TABLES
Table 2-1 Factors affecting resilience and performance of overlay networks.
Table 3-1 NLANR-AMP and RIPE-NCC Datasets.
Table 4-1 Path stretch incurred by selecting overlay paths based on offline path heuristics (|T|=12).
Table 4-2 Average Performance of offline path heuristics in masking failures (|T|=12). Path outages for AMP-146-30/June/2006 and Performance Failures for AMP-133-31/Aug/2006.
Table 6-1 Dimensions and rank of AMP and RIPE routing matrices.
LIST OF PUBLICATIONS
Journals
S. Qazi and T. Moors, “On the impact of Routing Matrix Inconsistencies on Statistical Path Monitoring in Overlay Networks”, submitted for 2nd round of reviews to Elsevier Computer Networks (ComNet) journal.
S. Qazi and T. Moors, “Finding Alternate Paths in the Internet: A Survey of Techniques for End-to-End Path Discovery”, submitted for 2nd round of reviews to IEEE Communications Surveys and Tutorials journal.

Conferences
S. Qazi and T. Moors, “Practical Issues of Statistical Path Monitoring in Overlay Networks with Large, Rank-Deficient Path Matrices”, In Proceedings of IEEE BROADNETS, 2008.
S. Qazi and T. Moors, “Disjoint-Path Selection in Overlays Networks using Type-of-Relationship (ToR) graphs”, In Proceedings of IEEE GLOBECOM, 2007.
S. Qazi and T. Moors, “A Robust Wide Area Routing Overlay Using Destination-Guided Detouring”, In Proceedings of IEEE ICC 2007.
J. Risson, S. Qazi, T. Moors, A. Harwood, “A Dependable Global Location Service using Rendezvous on Hierarchic Distributed Hash Tables”, In Proceedings of IEEE ICN 2006.
PART I INTRODUCTION AND BACKGROUND
1 INTRODUCTION

1.1 Why Overlay Networks?

The Internet has expanded to a massive scale, incorporating millions of devices belonging to tens of thousands of networks [1]. One feature that has enabled this scaling is its use of hierarchical routing, in which separately administered Autonomous Systems (ASes) can independently choose their own interior routing protocol (e.g. OSPF or IGRP) and are interconnected by a single exterior routing protocol, the Border Gateway Protocol (BGP). Whereas interior routing protocols can choose paths based on performance metrics chosen by the administrator, BGP neglects such performance metrics and considers only routing policies when trying to find a route. This design of BGP is partly a response to the difficulty of reaching consensus across all ASes as to which performance metrics should be used and optimized, partly because merely accounting for service provider policies is sufficiently challenging in itself, and partly because link and device performance are dynamic, and accounting for their variations would limit the scalability of BGP.
Consequently, routes across the Internet are often not optimized for performance. Yet many applications are sensitive to route performance. At one extreme, if a route simply does not work, in that it fails to deliver packets, then that will clearly impinge on applications that communicate across that route. BGP will eventually detect and recover from such faults, but to permit it to scale, BGP does not frequently disseminate path availability information, e.g. it may sometimes take several minutes to learn and apply path updates [2]. As a result, applications may experience lengthy network outages. A less extreme example of sensitivity to performance is real-time applications such as Voice over IP (VoIP) that are sensitive to the delay with which information is transferred across the network. For these applications, the connectivity that BGP provides may be insufficient, since they seek a certain Quality of Service (QoS) in terms of the performance of the route. Fortunately, even though the route offered by BGP may not work (to the level of performance required by an application), often there exist alternate routes in the Internet that do work. The question then is how applications can tap into the existing path diversity in the Internet, which goes unexploited by BGP. This is complicated by the fact that source applications have little control of the route: source routing is often blocked since it poses a security threat and is also incompatible with the Internet routing model in which ISPs set routing policies based on destination addresses [3]. One approach is to use “Resilient Overlay Networks (RONs)”, in which the source does not address its packets directly to the destination, but initially addresses them to a third party (Figure 1.1), in the expectation that the path between it and the third party, and then from the third party to the destination, gives better performance than the direct path. Clearly this can be extended to multiple intermediate parties.

Figure 1.1 Resilient Overlay Networks. Establishing alternate paths via an overlay host when the path between two Internet hosts fails. (In the figure, the direct path between overlay hosts A and B fails, and a one-hop overlay path via intermediate overlay host C is formed from overlay links A,C and C,B.)
The question then becomes how the source determines which intermediate parties to send its packets through. The first pioneering study [4] demonstrated the application of resilient overlay networks to improve the reliability with which the Internet can meet application performance metrics. It involved participating hosts periodically probing the performance of the underlay paths between each other, and so identifying which alternate path provides the best performance between any two hosts via a third host. Such a path between two overlay hosts using a third overlay host as an intermediary is often referred to as a one-hop overlay path. Note that the direct Internet (underlay) path between two overlay hosts is also referred to as an overlay link [5]. Note also the distinction between an overlay link and the one-hop overlay path described earlier: a one-hop overlay path is formed by the concatenation of two overlay links (underlay paths). Throughout this thesis, we use the terms overlay links, underlay paths or just paths interchangeably to denote the end-to-end paths between any two overlay hosts. For the sake of brevity, a mention of an overlay path or simply an alternate path strictly means a one-hop overlay path, even when this is not stated explicitly. While such path probing can ensure that the alternate path does not suffer the same degradation that may affect the primary path chosen by Internet routing protocols, it does require participating hosts to probe performance frequently (so that they can rapidly detect and respond to degradations), and this ultimately limited the scalability of RONs to tens of hosts.
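As an illustration only, the relay selection described above can be sketched as follows. The sketch assumes that each host already holds a table of measured delays for the overlay links; the function name `best_one_hop`, the host labels and the delay values are hypothetical and are not taken from RON [4].

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical delay table (ms) keyed by directed overlay link (src, dst).
Latency = Dict[Tuple[str, str], float]
INF = float("inf")


def best_one_hop(latency: Latency, hosts: Iterable[str],
                 src: str, dst: str) -> Tuple[Optional[str], float]:
    """Return (relay, delay) for the lowest-delay path from src to dst.

    relay is None when the direct overlay link is at least as good as
    every one-hop overlay path.
    """
    best_relay, best_delay = None, latency.get((src, dst), INF)
    for relay in hosts:
        if relay in (src, dst):
            continue
        # A one-hop overlay path is the concatenation of two overlay links.
        delay = latency.get((src, relay), INF) + latency.get((relay, dst), INF)
        if delay < best_delay:
            best_relay, best_delay = relay, delay
    return best_relay, best_delay


hosts = ["A", "B", "C", "D"]
latency = {
    ("A", "B"): INF,                      # direct path has failed
    ("A", "C"): 40.0, ("C", "B"): 35.0,   # one-hop path via C
    ("A", "D"): 60.0, ("D", "B"): 30.0,   # one-hop path via D
}
relay, delay = best_one_hop(latency, hosts, "A", "B")
print(relay, delay)
```

Keeping this table current for every pair of hosts is what makes frequent probing expensive as the overlay grows.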
Figure 1.2. Logical overlay topology (top) and network layer overlay topology inferred from traceroutes (bottom). In the example, the overlay hosts are A, B, C and D, and overlay link A,B corresponds to the underlay path [a b c d e] reported by the traceroute from A to B.
To reduce path probing overheads, an alternate mechanism would be to select paths based on their network layer disjointness. For example, two overlay links may seem disjoint when we view the logical topology of the overlay network (Figure 1.2) but in reality may share many links in the underlay network with other overlay links. The logical topology of the overlay network consists of the set of all overlay hosts and the end-to-end paths between them. To see the extent of underlay link sharing amongst overlay links, one would need to know the network layer (underlay) topology of the overlay network. A snapshot of the full underlay topology is impossible to get, as ISPs rarely make such information publicly available. It is more feasible to map all routing information of paths between overlay hosts and piece this information together to obtain a routing graph (routing topology). In this dissertation, any reference made to the network layer overlay topology strictly refers to an overlay routing graph, $G = (V, E)$, where the vertex set $V = (v_1, v_2, \ldots, v_r)$ refers to IP routers and overlay hosts, and the set of links $E = (e_1, e_2, \ldots, e_s)$, with $e = \langle v_{a,m}, v_{b,n} \rangle$, represents the set of directed underlay links used on paths between overlay hosts as determined by some path measurement technique, such as traceroute (where $v_{a,m}$ refers to the $m$th interface of router $a$). The overlay routing graph can sometimes also be represented in matrix notation as a routing matrix (Chapters 2 and 6). Note that inferring the network layer overlay topology in this manner is sometimes challenging, as it requires information about the underlay, which is outside the overlay's domain of control and so may contain inaccuracies. These issues are described in more detail in Chapter 6. In this dissertation, references made to just the overlay topology (e.g. Chapter 2, Section 2.2) pertain to the logical overlay topology, while references to the network layer overlay topology are made explicit through the terms underlay topology, routing graph or routing matrix.
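A minimal sketch of how such a routing graph and routing matrix could be pieced together from traceroutes is given below. It is an illustration under simplifying assumptions: each hop is identified by a single label rather than by a router interface $v_{a,m}$, the traceroutes are assumed to be complete and correct, and all hop names are hypothetical.

```python
from itertools import combinations

# Hypothetical traceroute results: overlay link -> ordered list of hops.
traceroutes = {
    ("A", "B"): ["A", "a", "b", "c", "d", "e", "B"],
    ("A", "C"): ["A", "a", "b", "f", "C"],
    ("D", "B"): ["D", "g", "d", "e", "B"],
}


def underlay_links(hops):
    """Directed underlay links traversed by one traceroute."""
    return set(zip(hops, hops[1:]))


# Routing graph G = (V, E): every hop seen, and every directed link used.
V = {hop for hops in traceroutes.values() for hop in hops}
E = set().union(*(underlay_links(hops) for hops in traceroutes.values()))

# Routing matrix: one row per overlay link, one column per underlay link;
# an entry is 1 when the overlay link traverses that underlay link.
link_index = sorted(E)
routing_matrix = {
    path: [1 if link in underlay_links(hops) else 0 for link in link_index]
    for path, hops in traceroutes.items()
}


def shared_links(p, q):
    """Underlay links common to overlay links p and q."""
    return underlay_links(traceroutes[p]) & underlay_links(traceroutes[q])


for p, q in combinations(traceroutes, 2):
    print(p, q, sorted(shared_links(p, q)))
```

In this toy input, overlay links A,B and D,B look unrelated in the logical topology yet share their last two underlay links, which is the kind of hidden sharing that a disjointness-based selection tries to avoid.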
The first contribution of this thesis is the implementation of a scalable RON service, DGRON, using a distributed architecture. In classical RON [4], each of the N overlay hosts needs to maintain overlay links with every other RON host, generating $O(N^2)$ overheads, which poses scalability issues. In DGRON, an overlay host typically needs to establish overlay links with a small (fixed) number of overlay hosts, independent of the size of the overlay network. These hosts are chosen with special consideration to their geographical diversity in the network and their past performance in providing good alternate paths. Thus, the path monitoring overheads for an overlay network with N participating hosts can be reduced from $O(N^2)$ to $O(N)$. We evaluate the tradeoffs in performance vis-à-vis topology maintenance and path monitoring overheads. Our results using real-world Internet datasets show that even with a huge reduction in path monitoring overheads, DGRON's performance closely matches that of classical RON in finding alternate paths, matching the performance of the best possible alternate path for a majority (90%) of the path degradations encountered.
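To make the difference in growth rates concrete, the following sketch counts the directed overlay links that must be monitored under the two schemes. The per-host degree `k = 8` is a hypothetical value chosen only for illustration; it is not the value used by DGRON.

```python
def full_mesh_links(n: int) -> int:
    """Directed overlay links probed when every host monitors every other host."""
    return n * (n - 1)


def fixed_degree_links(n: int, k: int) -> int:
    """Directed overlay links probed when each host monitors at most k others."""
    return n * min(k, n - 1)


for n in (50, 500, 5000):
    print(n, full_mesh_links(n), fixed_degree_links(n, k=8))
```

The first count grows quadratically with the number of hosts, while the second grows linearly for any fixed `k`.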
The second contribution of this thesis is to propose heuristics with which disjoint alternate paths can be discovered, thereby reducing the number of candidate alternate paths to be considered. This thesis takes the approach of examining the topology of the underlying network at the AS level so as to estimate viable alternate paths that are likely to be unaffected by a degradation in a direct path. Because the network topology does not vary as frequently as link and device performance, this technique enables RONs to scale to larger populations of participating hosts by lowering path monitoring overheads. Previously proposed techniques such as the Earliest Divergence Rule (EDR) [6] aim to select AS-disjoint paths which separate earliest from the direct path. This can still yield a large number of candidate paths from which a selection needs to be made. We propose graph-based algorithms using ToR (Type-of-Relationship) graphs, which shorten the candidate path list relative to EDR by between a factor of two and an order of magnitude in up to 60-70% of cases, while yielding alternate paths with similar delay benefits to EDR.
The third contribution of this thesis is to establish methods to detect and reduce the effects of topology estimation errors. While path measurement only requires the services of overlay hosts, routing matrix estimation requires information about the underlay network, which is outside the overlay's domain of control and so may contain inaccuracies; e.g. routers may reveal inaccurate or false traceroute information. We first propose a lightweight algorithm to detect false routing information from traceroutes. We also propose heuristics aimed at improving statistical path measurement techniques that depend on the accuracy of such routing matrix estimation. Such techniques leverage topology information inferred from the routing matrix to select a few paths for monitoring, from which path quality can be estimated for unmonitored paths [7-8]. However, if the routing matrix cannot be determined accurately, these techniques can yield large path estimation errors. Our work shows that removal or mitigation of such routing matrix inconsistencies (RMI) using robust statistical methods alone can improve such path metric prediction by 10-20% and yield non-negligible benefits for anomaly detection on unmonitored paths.
1.2 Dissertation Overview

The remainder of this dissertation is organized as follows. Chapter 2 presents an in-depth overview of techniques for alternate path exploration in the Internet, including a rigorous analysis of design criteria for overlay networks. Chapter 3 describes the Internet datasets used for the trace-based simulations throughout this thesis. The next three chapters are divided into two separate parts, addressing scalable architectures for alternate path selection (Chapters 4 and 5) and path monitoring in Resilient Overlay Networks (Chapter 6). Finally, Chapter 7 concludes this dissertation and outlines some future research directions.
2 LITERATURE REVIEW

2.1 Introduction

The Internet seems to work most of the time, but sometimes recovery from failures is painfully slow. For many user-perceived performance failures or faults, e.g. delay in loading a web page or patchy audio in a VoIP session, there exists a possibility that using an alternate path may offer better QoS. Often, such alternate routes remain unexploited due to the scalability objectives of the Border Gateway Protocol (BGP), the de facto Internet inter-domain routing protocol that connects all networks into one giant Internet. BGP is primarily designed for scalable dissemination of network reachability information according to shortest paths compliant with the commercial traffic transit policies of ISPs. Incorporating QoS-based routing decisions in BGP route selection would defeat its primary purpose of scalability, as QoS checks on paths need to be made more frequently and individually than mere reachability checks on aggregate IP blocks. There are also no inter-ISP benchmarks for acceptable levels of QoS, which are defined by individual user applications that may be sensitive in different ways to the levels of delay, throughput and packet loss. Moreover, if such QoS-based routing decisions were incorporated into BGP, they could cause route flapping, a phenomenon in which many path updates are triggered when one of the advertised routes repeatedly updates itself, owing to the distributed nature of BGP for learning global paths. This problem is bad enough in BGP when exchanging network reachability information alone; to prevent it, BGP inhibits frequent path updates, which can sometimes cause BGP to take several minutes to learn and apply path updates [2]. Internet applications such as VoIP need to meet their QoS demands, so they could benefit by tapping into the existing path diversity in the Internet for better paths, which go unexploited by BGP as explained earlier.
Research focuses on several interesting solutions for scalable end-to-end path discovery on the Internet without modifying the underlay framework; these techniques include deployment of overlay networks [4, 9], providing redundant network connections to end users through multi-homing to several ISPs [10], or a combination of the two [11]. Other proposals call for changes to the underlay network routing mechanisms [12-14]. We consider each of these proposals in Section 2.2. These proposals have already been experimentally deployed over the Internet, but it will still be some time before their use becomes widespread. We then review proposals that call for changes to underlay routing mechanisms (Section 2.3). These proposals are still in the early stages, with no experimental deployments. Finally, we review the benefit of multi-homing (Section 2.4), which, although it emerged as the first solution for creating path diversity in the Internet, now faces stagnation.
2.2 Exploiting Path Diversity in the Internet through Overlay Networks

A natural approach to evaluating the extent of path diversity in the Internet would be to see how many different end-to-end paths are possible between all hosts. Figure 2.1 shows the path between an end host in the University of New South Wales (UNSW), Sydney, Australia and a host, www.example.com, located in California, US. UNSW typically uses the services of a bigger provider ISP such as AARNET (Australia's Academic and Research Network) for its connection to hosts in the continental US. Most service providers, like AARNET, follow the hot potato routing principle [15]: they try to hand this traffic off quickly at their nearest inter-domain egress point on its way to its US-based destination. Traceroute shows that the original path uses an egress point of AARNET at Sydney that takes the packets to www.example.com via a router in Honolulu, Hawaii to an ingress point in Los Angeles in the US. Overlay networks can exploit Internet path redundancy by deflecting packets away from the original path if it suffers from an outage. Now consider the situation in which the end host in UNSW and the host www.example.com form part of an overlay network together with another host inside CMU (Carnegie Mellon University), and suppose that CMU is used as the intermediate relay host because of a fiber optic link fault on the default path via Honolulu, or because this path has become congested due to a sudden surge in traffic. The new path uses an AARNET egress point at Sydney as before, but takes the packets to a different ingress point inside the US: Seattle in the northwest instead of Los Angeles in the southwest. Under normal circumstances, the original path has a delay of 150 ms. The one-hop alternate path has a delay of 318 ms (= 234 ms + 84 ms). This is expected, as we span the width of the continental US twice in going via CMU, causing large path inflation.
On the other hand, if we had picked an intermediary host situated very close to www.example.com (instead of CMU), it would have most likely used the same path as the direct one, as it would be highly unlikely to impact the traffic routing policy of AARNET. Thus, in choosing such a host, one must be very careful to get the optimal compromise between achieving path diversity and reducing path inflation.
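The path inflation in this example can be quantified with a short calculation. The sketch below uses only the delays quoted above; expressing the inflation as the ratio of alternate to direct delay is an assumption made here for illustration.

```python
direct_ms = 150.0              # direct path UNSW -> www.example.com
relay_legs_ms = (234.0, 84.0)  # the two overlay links of the path via CMU

alternate_ms = sum(relay_legs_ms)
stretch = alternate_ms / direct_ms  # ratio of alternate to direct delay

print(alternate_ms, round(stretch, 2))  # 318.0 2.12
```

A relay closer to the destination would lower this ratio, but, as noted above, at the risk of following the same underlay path as the direct route.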
Figure 2.1 Direct path between UNSW and example.com and a one-hop overlay path via CMU. (Labels in the figure: Sydney (UNSW), Honolulu (Hawaii), Los Angeles, Seattle, Pittsburgh (CMU) and www.example.com; delays of 150 ms, 234 ms and 84 ms; legend: direct path to example.com from Sydney, alternate path via CMU.)
This simple example demonstrates how alternate path selection via overlay networks can help in tapping into Internet path diversity. It also makes clear that overlay networks exploit path diversity by changing ingress or egress points through ASes, and thus routing through other ASes disjoint from the original path. This will become clearer in the following sections. It also highlights the importance of wisely choosing the intermediate host that acts as a detour. Several independent research findings [16-17] have shown evidence of path diversity in the Internet. Savage et al. [17] showed that for almost 80% of the paths used in the Internet there is an alternate route with a lower probability of packet loss, and that for 15% of the paths there is an alternative that offers an improvement in latency better than 25%. Similarly, Gummadi et al. [16] showed that 54% of random path and performance failures could be masked by detouring packets to an intended destination via an intermediate host. Overlay networks [4] provide a systematic framework for exploiting the path redundancy in the Internet. An overlay network is a group of end hosts in the Internet that agree to route packets between each other to exploit the topological redundancy in the Internet. For example, when the direct Internet path between the source x and destination y fails or suffers a performance failure, it may be possible to use an alternate path by first detouring packets towards an intermediate host z before sending them towards the destination. Such a path is called a one-hop overlay path, as described in Chapter 1. This is possible if the Internet paths between the source and the intermediate host, and between the intermediate host and the destination, are not affected by the failure because they are spatially disjoint from it (Figure 2.2a). The aim of the overlay network is then to find an intermediate (relay) overlay host $z$ ($z \neq x, y$) to act as a relay between source $x$ and destination $y$, such that the composite overlay path $x \to z \to y$ optimizes some path metric, e.g. reduces path delay or packet loss rate, or increases bandwidth or data throughput.
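How the metrics of the composite path $x \to z \to y$ follow from those of its two overlay links can be sketched as below. The sketch rests on assumptions stated here rather than in the text: delays add, losses on the two links are independent, the available bandwidth is that of the slower link, and forwarding cost at the relay is ignored. The class name and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkMetrics:
    delay_ms: float
    loss_rate: float        # fraction of packets lost, in [0, 1]
    bandwidth_mbps: float


def compose(first: LinkMetrics, second: LinkMetrics) -> LinkMetrics:
    """Metrics of a one-hop overlay path built from two overlay links."""
    return LinkMetrics(
        # Delay is additive along the path.
        delay_ms=first.delay_ms + second.delay_ms,
        # A packet survives only if it survives both links (independence assumed).
        loss_rate=1.0 - (1.0 - first.loss_rate) * (1.0 - second.loss_rate),
        # Throughput is limited by the slower of the two links.
        bandwidth_mbps=min(first.bandwidth_mbps, second.bandwidth_mbps),
    )


x_to_z = LinkMetrics(delay_ms=40.0, loss_rate=0.01, bandwidth_mbps=100.0)
z_to_y = LinkMetrics(delay_ms=35.0, loss_rate=0.02, bandwidth_mbps=50.0)
path = compose(x_to_z, z_to_y)
print(path)
```

Because the metrics combine differently, the relay that minimizes delay need not be the one that minimizes loss or maximizes bandwidth.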
Figure 2.2 (a) (top) Possible one-hop overlay paths between end-hosts at the edge of the network (source x, destination y, intermediate hosts z1 and z2) when the direct Internet path suffers from outage/service degradation. (b) (bottom) Overlay tunnel establishment: a non-overlay source x and a non-overlay destination y at the edge of the network use an alternate tunnel through RON hosts when the direct Internet path suffers from outage/service degradation.
Overlay networks may be used to find and use such one-hop alternate paths to route around path failures. Several factors (Table 2-1) affect the resilience and performance of an overlay network, as described in the following subsections. The degree to which such alternate paths can be spatially disjoint from the original paths between hosts is a function of the physical geometry (spatial characteristics) of the overlay network relative to the underlay network (Internet). For example, a one-hop overlay path via an intermediate host may seem disjoint from the direct Internet path but may share several links with it in the underlay network. Similarly, the efficacy with which one of several alternate paths is selected depends on the ability to monitor the metrics of all one-hop overlay paths in the network. Note that the architecture just described assumes that selecting alternate paths to avoid path degradation is limited to intra-overlay paths. This poses an obvious question: “Can non-overlay source-destination pairs benefit from such path diversity?” For non-overlay sources and/or destinations, the alternate path computation described earlier could take the form of alternate tunnel computation between the overlay hosts closest to the source and destination (Figure 2.2b). Non-overlay hosts intending to optimize their path selection would then have to subscribe to such a RON service, and the packet forwarding along an alternate tunnel would be handled by that service. The first decision in designing an overlay network is where to place overlay hosts. Often hosts cannot control their location, so the next decision is which hosts to select for a one-hop overlay path. After overlay construction comes the main (and intertwined) task of overlay link monitoring and path selection.
Sometimes the path monitoring and path selection decisions are application centric, as different Internet applications may have different QoS needs for which specialized packet detouring techniques need to be addressed.
Table 2-1 Factors affecting resilience and performance of overlay networks.

Overlay network property | Techniques discussed in literature | Why important?
Overlay topology | (i) Full-mesh (clique) topology [4]; (ii) Tree-based topologies [18-19]; (iii) Bottom-up approaches [20] | Overlay resilience, scalability of monitoring paths (Section 2.2.1)
Monitoring overlay links | (i) Topology-unaware approaches [4]; (ii) Topology-aware approaches [8, 21-26] | Knowledge of path performance to make timely decisions for switching to better paths (Section 2.2.2)
Selecting overlay paths | (i) Disjoint paths [6, 27]; (ii) Path ranking based on performance metrics [4, 28]; (iii) Using path diversity in large CDNs [29-31]; (iv) Using paths preferred by large CDNs [32] | To select the maximally-disjoint path in the overlay network with the least probability of failing when there is a failure on the primary path between two hosts (Section 2.2.3)
Detouring packets | (i) Active and reactive schemes [4, 28]; (ii) Multi-path routing schemes [9, 31] | Meet application-specific QoS demands (e.g. latency, throughput, loss rate, multicasting such as for gaming, video conferencing) (Section 2.2.4)
2.2.1 Overlay Topology

The topology of an overlay plays an essential role in the scalability of path monitoring and the accuracy of predicting alternate paths. An overlay network basically starts out as a group of participating hosts willing to route traffic for each other. A logical topology is formed based on decisions to establish links between some or all hosts. Such links, often described as overlay links, may traverse several underlay links, and two overlay links may share underlay links. This section surveys several proposals that have been made in this regard, ranging from the full-mesh topology, i.e. establishing a link between all overlay hosts, to more scalable tree-based and distributed approaches.

Full-Mesh (Clique) Topology

RON [4] used a full-mesh architecture, in which individual overlay hosts are connected with all other hosts in a logical mesh. Each peer probes the overlay links connecting it with all other hosts, and the measured path characteristics are disseminated in the network through link-state flooding (Figure 2.3(a)). This architecture is ‘ideal’ in the sense that each individual peer can find an alternate path with high probability by knowing the current performance of all overlay links. However, the associated overheads in such an architecture are $O(N^2)$ for N overlay hosts, which limits the scale of such overlay networks to 50 hosts [4].

Tree-based topologies

Alternate overlay topologies have been proposed [18-19, 23] for achieving scalable overlay link monitoring. Monitoring overlay links between all pairs of overlay hosts is clearly inefficient when we observe that a large number of underlay links may actually be shared amongst overlay links due to the power-law topology of the Internet [33], which suggests that a few links are used by many paths. Tang and Nakao [19, 24] showed that it is possible to prune the overlay topology to remove redundant links. For example, when two overlay links have a large number of underlay links in common, one of them can be removed; so can an overlay link that is unlikely to be selected by the overlay routing algorithm. For instance, several overlay links between hosts in North America and Europe may traverse the same intercontinental fiber optic link. Monitoring only one such path could yield bounded performance estimation on all paths, since the major portion of the path delay on all such paths would be encountered on the intercontinental fiber optic link. Using the same argument, Li [18] and Nakao [19] proposed that mesh topologies can be reduced to a single tree or multiple sub-trees by pruning redundant overlay links. Overlay links are redundant when they overlap with each other at the network layer, as outlined by our previous example.
Figure 2.3 (a) (top) Full-mesh overlay topology (overlay hosts A to F) and corresponding network layer topology. (b) (bottom) Constructing a minimum-weight spanning tree over the weighted physical topology (nodes A to J; non-overlay nodes excluded for clarity) to prune the overlay topology by removing edges.
Li and Mohapatra [18] used a minimum-weight spanning tree (MST) algorithm to connect all overlay hosts in a way that minimizes the overall connection cost, i.e.

Minimize $\sum_{e \in E} c_e$    (2-1)

where $E = (e_1, e_2, \ldots, e_k)$ is the set of overlay links $e = \langle v_i, v_j \rangle$ with $v_i, v_j \in V$; $V = (v_1, v_2, \ldots, v_n)$ is the set of overlay nodes; and $c_e$ is the sum of the weights of the edges, representing any desirable metric, e.g. latency, as shown in Figure 2.3(b).

However, removing overlay edges may achieve the desired scalability at the cost of resilience, as some crucial overlay link information is lost while pruning edges. Topology-aware heuristics can play a crucial role in the decision to remove or retain an overlay link when constructing such trees. For example, Eriksson et al. [34] provided evidence that it is possible to cluster hosts that share network paths, which can help towards constructing sparser spanning trees. Another problem with MST construction is its dependence on accurate link costs, which may vary with differing levels of network congestion. This would require path probing on all overlay links, even if infrequent, to update the link costs for recomputing the MST.
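The objective in (2-1) can be illustrated with a small sketch. The source does not state which MST algorithm Li and Mohapatra [18] used; Kruskal's algorithm is chosen here only because it is compact, and the four hosts and link costs are hypothetical.

```python
def minimum_spanning_tree(nodes, weighted_links):
    """Kruskal's algorithm. weighted_links is an iterable of (cost, u, v)."""
    parent = {n: n for n in nodes}

    def find(n):
        # Find the representative of n's component, with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for cost, u, v in sorted(weighted_links):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # adding this link creates no cycle
            parent[root_u] = root_v
            tree.append((u, v, cost))
    return tree


nodes = ["A", "B", "C", "D"]
# Full mesh of overlay links with hypothetical costs c_e (e.g. latency).
links = [(4, "A", "B"), (1, "A", "C"), (5, "A", "D"),
         (3, "B", "C"), (7, "B", "D"), (2, "C", "D")]

tree = minimum_spanning_tree(nodes, links)
total_cost = sum(cost for _, _, cost in tree)
print(tree, total_cost)
```

Only three of the six overlay links remain to be monitored, which also shows the loss of resilience noted above: the failure of any one retained link disconnects the tree.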
Distributed topologies

Both mesh and tree-based topologies aim to connect all or the majority of overlay hosts together. This may sometimes not be feasible for very large networks. A more scalable approach would be to adopt a distributed architecture like that of CDNs [35-36], where each overlay node has a low degree, of the order of $\lg N$ for a network with N overlay hosts. Another architecture is proposed by Lee et al. [37] and Rakotoarivelo et al. [38], where overlay hosts report their path measurements to a few super-hosts in the network and the super-hosts maintain a database of network path measurements. This database can later be queried by all hosts seeking to optimize QoS between them and other hosts. Load balancing concerns may also warrant a careful choice of super nodes. An obvious caveat with an architecture such as that proposed in [37] is its shift from the aggressive path monitoring approach of RON to a more passive one. For example, querying a database may waste valuable time, and there is also the issue of staleness of the path information fetched: a database that has recorded a path as good may not have registered it going bad by the time a query is made. We present a distributed architecture in Chapter 4 where super nodes and detour sets are selected using a combination of a landmark-based approach and data mining. We show that it is possible to tactically choose a small set of detouring nodes in order to find a reasonably good QoS-optimized path with high probability, reducing the $O(N^2)$ path monitoring overheads for N overlay hosts to just $O(N)$.
Topology based on Evolutionary Approach

Early works (e.g. [4]) chose arbitrary locations/sites for overlay hosts, which already gave them remarkable performance gains. Anderson et al. [4] were already able to recover successfully from around 60% of Internet failures. The authors of some studies (e.g. [39-40]) focused on optimizing overlay node selection and proposed bottom-up strategies. Chun [39] considered overlay construction as a ‘non-cooperative game’ played by selfish hosts, where each tries to minimize the number of overlay links it establishes by utilizing links established by other hosts. Slight modifications to the rules of the game result in wide-ranging overlay topologies, from complete meshes to trees, and node-degree distributions that range from exponential to power-law. Han et al. [40] also considered a bottom-up approach, addressing the problem of picking overlay hosts for maximum path diversity in the overlay network. They found that for minimal sharing amongst overlay links, overlay hosts should be in diverse ISPs that have no peering relationships with each other.

2.2.2 Monitoring Overlay Links

Dynamic overlay link monitoring is essential in order to recover quickly from a failure in the underlay network through the use of an alternate one-hop overlay path. The literature [4, 28] suggests that path quality is best monitored using dynamic online algorithms. However, the overheads of such techniques are large and do not scale beyond a modest overlay size. There are a few proposals [18-19, 23-24] to reduce such overheads using topology-aware approaches.
Topology-Unaware Approaches

The pioneering work in RON [4] connected overlay hosts in a full-mesh topology. Path quality is monitored by probing all overlay links between hosts in the network (Figure 2.4a) and distributing such measurements between hosts using link-state protocols (Figure 2.4b). Probing all overlay links aggressively and the subsequent link-state flooding generate a large overhead. The routing overhead in an overlay topology with n hosts and average node degree d is [18]:

$n \times d \times \text{(number of probing messages)} + n \times (n - 1) \times \text{(number of link-state messages)}$    (2-2)
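Expression (2-2) can be evaluated directly, as in the sketch below. The function and parameter names are hypothetical, and the message counts of one probe and one link-state message per link are chosen purely for illustration.

```python
def routing_overhead(n: int, d: float,
                     probing_msgs: int, link_state_msgs: int) -> float:
    """Message overhead from (2-2) for n hosts with average node degree d."""
    probing = n * d * probing_msgs
    dissemination = n * (n - 1) * link_state_msgs
    return probing + dissemination


# Full mesh of 50 hosts (d = n - 1) versus a sparser topology with d = 5.
print(routing_overhead(50, 49, probing_msgs=1, link_state_msgs=1))  # 4900
print(routing_overhead(50, 5, probing_msgs=1, link_state_msgs=1))   # 2700
```

Lowering the node degree shrinks only the probing term; the dissemination term stays quadratic in n as long as measurements are flooded to all hosts.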
Anderson et al. [4] found that the probing overhead for 50 hosts (in a mesh topology) is approximately 30 Kbits/s of outgoing bandwidth per node when path probing interval is 12 seconds. We will use two Internet datasets RIPE [41] and AMP [42] (more details, Chapter 3) to evaluate the heuristics presented in this thesis where end-to-end path measurements (path delays) are made at average intervals of 30 seconds and 1 minute, respectively. Other approaches for monitoring paths include distributed approaches [37] (described earlier) where overlay hosts report their path measurements to super hosts, which can be queried later by other overlay hosts.
Figure 2.4 (a) (top) Probing overlay links. Each overlay host probes paths to all other overlay hosts to measure path metrics such as latency, throughput and loss rates. (b) A link-state dissemination protocol is used to share such measurements between all overlay hosts.

Topology-Aware Approaches
Several research papers [5, 8, 19, 21, 23-24, 43-44] aim to reduce path monitoring overheads in overlay networks by leveraging network layer topology information. Several works propose graph-based approaches to reduce the mesh topology to a tree-based overlay topology with fewer overlay links to monitor. Tang and McKinley [23] proposed monitoring overlay links based on the application of normal and weighted variants of the set-cover algorithm, i.e. selecting overlay links which include as many unshared underlay links as possible. Finding the set cover is a known NP-hard problem [23], so they used a greedy algorithm to obtain an approximate solution. This allows performance to be estimated for a large number of overlay links while actually monitoring only a small subset. A similar approach has been used by Madhyastha et al. in the iPlane project [45-47] to develop a distributed path monitoring system that predicts path metrics based on shared components between paths, clusters end hosts based on BGP atoms [48], and builds a compact library of Internet measurements for peer-to-peer applications. Such techniques can yield good upper bounds on path estimation while reducing overall path-monitoring overheads. Chen et al. [21] developed an approach to find a set of k paths which can be used to calculate performance of all n² end-to-end paths between overlay hosts (overlay links) for n overlay hosts with k 100 because of the power-law topology of the Internet. Chua et al. [8] provided evidence that the set of r paths carries disproportionate amounts of information, so that a small subset of the r paths can be used to statistically predict the path metrics of all remaining unmonitored paths to predefined tolerance levels. Similarly, Song et al. [26] also reported substantial gains when using Bayesian estimation. Naidu et al. [49] claimed that, since the main aim of overlay path monitoring is anomaly detection, further reduction is possible beyond the set of paths required by Chen et al. [21].
They showed that up to 50% path reduction was possible by formulating an LP problem for selecting paths based on knowledge of the joint probability distribution of link delays. Coates et al. [22] studied the problem of path reduction in further detail and found that the number of paths required by [8] could be reduced by an order of magnitude if certain signal compression techniques, e.g. diffusion wavelets, were applied to incorporate both temporal and spatial path correlation.
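The greedy set-cover idea attributed above to Tang and McKinley [23] can be illustrated with a minimal sketch. It is the textbook greedy approximation, not the authors' exact (weighted) algorithm, and the overlay links and underlay link names below are hypothetical.

```python
def greedy_link_cover(overlay_links):
    """Pick overlay links to monitor until every underlay link is covered.

    overlay_links maps an overlay link name to the set of underlay links it
    traverses. At each step the overlay link covering the most still-uncovered
    underlay links is chosen (ties broken alphabetically).
    """
    uncovered = set().union(*overlay_links.values())
    chosen = []
    while uncovered:
        best = max(sorted(overlay_links),
                   key=lambda name: len(overlay_links[name] & uncovered))
        chosen.append(best)
        uncovered -= overlay_links[best]
    return chosen


# Hypothetical overlay: four overlay links sharing five underlay links.
OVERLAY_LINKS = {
    "A-B": {"u1", "u2", "u3"},
    "A-C": {"u1", "u4"},
    "B-C": {"u2", "u3", "u4", "u5"},
    "C-D": {"u5"},
}
monitored = greedy_link_cover(OVERLAY_LINKS)
print(monitored)
```

In this example two of the four overlay links cover all five underlay links, so the remaining overlay links could be estimated rather than probed, provided the underlay mapping is accurate.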
One shortcoming of many of the above approaches is that, while routing matrices and link and path characteristics may be easy to obtain accurately for the individual large ISPs and overlay test beds used in their case studies, they are not easy to obtain for overlay networks with hosts deployed across different ISPs [50]. As mentioned earlier, path measurements require coordination between overlay hosts only, whereas topology estimation requires participation by non-overlay elements, e.g. routers. As a consequence, topology estimation is often inaccurate or incomplete. In Chapter 6, we review the impact of incorrect topology estimation on such techniques [8, 21] using evidence from real-world Internet datasets, and propose ways to identify and alleviate such errors.
2.2.3 Selecting Overlay Paths
As discussed in the previous section, monitoring overlay links can help in alternate path selection. In the worst case, path decisions may need to be made with stale link performance information or none at all. Here we highlight a few key ideas used for end-to-end path selection using overlay-based techniques.
Disjoint Overlay Paths
Several researchers [6, 27] have argued that, since Internet paths are often stable on time-scales of days [51], maintaining complete topology information of the overlay network allows one to select the most disjoint alternate path without the need for path monitoring. This approach may work for path outages but may not be very efficient for ensuring strict application-specific metrics such as delay and throughput. For example, path delay is not always a simple function of fiber delay but a combination of fiber delays, congestion on individual links and packet queuing delays in routers. This makes path monitoring to meet application-specific QoS demands more difficult than merely ensuring spatial diversity. Nevertheless, the bulk of new research is centered on improving design heuristics to choose disjoint overlay paths, which is a key factor in reducing overheads and improving resilience at the same time. However, such disjointness needs to be established at the network layer; two overlay links that are seemingly disjoint at the overlay layer could still share a link in the underlying IP layer. If the shared IP link fails, both overlay links become useless. A previous study [6] showed that an Earliest Divergence Rule (EDR) (Figure 2.6) can work well by selecting the alternate path which diverges from the default path at the earliest point near the source. This technique assumes availability of AS-level paths (from source overlay hosts to detouring overlay hosts). In Chapter 6, we show that traceroutes and other tools used for mapping
paths are known to reveal path information inaccurately [50, 52-57]. A second assumption of this technique is that the one-hop overlay paths that diverge earliest will also be the ones that converge latest with the direct paths. In Chapter 5, we present a more flexible Maximum Divergence Rule that picks the alternate path most divergent from both the source and the destination parts of the original path, using an AS Type-of-Relationship (ToR) graph that can be built with partial AS path information. Chapter 5 reveals that such an approach can reduce the number of candidate paths compared to using EDR [6].

New directions in research focus on making the overlay ‘topology aware’. One study [58] proposed utilizing routing underlays to give better information about the underlying IP topology of the overlay network, so that only a subset of the overlay hosts (with orthogonal IP links) would be probed and considered for disjoint path selection. Interestingly, instead of using dynamic online algorithms to monitor overlay paths, offline processing of path measurements can reveal spatial relationships (disjointness) between paths. Cui et al. [59] proposed a method which establishes performance-related correlations among the behavior of overlay links, e.g. link latency. Such correlations can then be used to find, for a given primary path between two overlay hosts, the backup path with the least correlated-failure probability by solving the following optimization problem:
$$\text{Minimize} \sum_{(i,j)\in E_0} \; \sum_{(m,n)\in E_0} x_{ij}\, y_{mn}\, \Pr(L_{ij}, L_{mn}) \tag{2-4}$$

where:
$E_0$ is the set of all overlay links;
$x_{ij}$ and $y_{mn}$ are flows on the primary and backup paths, respectively, which are set to 1 if overlay links $L_{ij}$ and $L_{mn}$ are used by the primary and backup paths, respectively, and to 0 otherwise;
$\Pr(L_{ij}, L_{mn})$ is the joint failure probability of overlay links on the primary and backup paths.

Figure 2.6 Earliest-Divergence Heuristic to select disjoint alternate paths (figure labels: default Internet path; alternate path via an overlay host whose path diverges earliest from the direct path).
The above minimization problem can be coupled with other constraints such as delay bounds on the backup path. Such optimization problems become NP-hard for a large number of variables and constraints and are suitable for small networks. Moreover, the technique requires synchronization of participating hosts which may be somewhat difficult to achieve in large networks. A similar idea with a slightly different objective has been pursued by Antonova et al. [44], with the aim of finding the optimal way to split a video stream over multiple paths with bounded delay requirements.
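For small networks, the objective in (2-4) can be evaluated by simple enumeration. The sketch below is not the method of Cui et al. [59]; it assumes that a short list of candidate backup paths is already known, scores each by the summed joint failure probability with the primary path, and optionally enforces a delay bound. All link names, probabilities and delays are hypothetical, and link pairs without an entry are assumed to have zero joint failure probability.

```python
def pair_probability(joint_fail, a, b):
    # Joint failure probability is symmetric; missing pairs are ASSUMED to be 0.
    return joint_fail.get((a, b), joint_fail.get((b, a), 0.0))


def correlated_failure_cost(primary, backup, joint_fail):
    """Objective of (2-4) for one fixed primary/backup pair of paths."""
    return sum(pair_probability(joint_fail, p, b) for p in primary for b in backup)


def pick_backup(primary, candidates, joint_fail, delay=None, max_delay=None):
    """Return the feasible candidate with the lowest cost, or None if none fits."""
    feasible = [
        path for path in candidates
        if max_delay is None or sum(delay[link] for link in path) <= max_delay
    ]
    return min(feasible,
               key=lambda path: correlated_failure_cost(primary, path, joint_fail),
               default=None)


# Hypothetical example: primary overlay link A-B, two one-hop detours.
PRIMARY = ["A-B"]
CANDIDATES = [["A-C", "C-B"], ["A-D", "D-B"]]
JOINT_FAIL = {("A-B", "A-C"): 0.05, ("A-B", "C-B"): 0.02,
              ("A-B", "A-D"): 0.01, ("A-B", "D-B"): 0.01}
DELAY_MS = {"A-C": 10, "C-B": 10, "A-D": 40, "D-B": 40}

print(pick_backup(PRIMARY, CANDIDATES, JOINT_FAIL))
print(pick_backup(PRIMARY, CANDIDATES, JOINT_FAIL, DELAY_MS, max_delay=50))
```

Without a delay bound the detour via D is preferred because it is less correlated with the primary path; with a 50 ms bound only the detour via C remains feasible, which shows how added constraints can change the choice.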
Path-Ranking based on Performance Metrics
A large amount of research discusses the choice of an appropriate performance metric, such as latency, throughput or loss rate, for selecting backup paths in the overlay. Paths are ranked on the basis of these metrics using scoring functions; these range from weighted moving averages over finite temporal windows to statistical approaches [4, 28, 59]. RON [4] distinguished between different paths on the basis of latency, throughput and loss rates, making the ranking of paths application-specific. Similarly, Kawahara et al. [60] and Uchida et al. [61] proposed selecting alternate paths by ranking the overlay nodes in order of the frequency with which they provide an optimal path by acting as a relay node. Zhu [28] used available bandwidth for alternate path selection, claiming that latency, loss rate and throughput metrics could be ‘misleading’ as they often depend on protocol implementations, network heterogeneity or temporal effects. The study argues that throughput is a function of TCP parameters, and that thresholds set for allowable loss rate and latency could be misleading because of the dynamism and heterogeneity experienced by the network. Similarly, Lee et al. [37] measured the capacity of overlay paths and selected paths based on available-bandwidth criteria. Hu and Steenkiste [62] showed that, in contrast to delay estimation on end-to-end paths, bandwidth is often bounded by the bandwidth of bottleneck links. Such bottleneck links are often easy to identify, as they usually lie within three to four IP hops of the end hosts, since links in the core of the Internet tend to be over-provisioned. Measuring the performance of only the bottleneck links, combined with certain rules used in Internet path decisions such as shortest, valley-free paths (Chapter 5) [27, 63], can reduce the O(N²) overheads for an N-host overlay network to linear overheads of O(N).
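As an illustration of a scoring function of the kind mentioned above, the following sketch ranks candidate paths by an exponentially weighted moving average of their latency samples. This is one simple member of the weighted-moving-average family and is not claimed to be the scoring used in [4, 28, 59]; the smoothing factor and the latency samples are assumptions.

```python
def ewma(samples, alpha=0.3):
    """Exponentially weighted moving average; recent samples weigh more."""
    score = samples[0]
    for sample in samples[1:]:
        score = alpha * sample + (1 - alpha) * score
    return score


def rank_paths(latency_ms, alpha=0.3):
    """Return path names ordered from lowest to highest smoothed latency."""
    return sorted(latency_ms, key=lambda path: ewma(latency_ms[path], alpha))


# Hypothetical latency samples (ms), oldest first.
LATENCY_MS = {
    "direct": [80, 85, 200, 220],   # degrades in the most recent samples
    "via C": [100, 98, 102, 99],
    "via D": [150, 150, 150, 150],
}
ranking = rank_paths(LATENCY_MS)
print(ranking)
```

The direct path starts out fastest but is ranked below the detour via C once its recent samples degrade, which is the behavior a window-based score is meant to capture.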
Using Path Diversity in Large CDNs
CDNs [35-36] were motivated by the desire for scalable content distribution using cooperative hosts. Such overlays are often built on logical topologies based on Distributed Hash Tables (DHTs) [36, 64-65]. Every participating node and every content item (file) stored on the network is identified by a unique identifier (key) in the DHT identifier space. Each peer also maintains a small localized routing table with entries for a small number of neighboring hosts (also identified by unique identifiers). Routing involves forwarding a search for a key (query) to a neighboring peer whose identifier is closer to the key value than that of the present node (Figure 2.7). Once the direct Internet path experiences an outage, alternate paths between two edge hosts can thus be found in such CDNs via intermediate peers, in a similar fashion to RON. However, one issue warrants attention: the DHT-based identifier mapping does not ensure that two neighboring hosts are also close in the underlying physical network. A landmark-based approach is proposed in Brocade [30] to counter such problems, using a small number of super-hosts to ensure that overlay routing does not incur large path stretch, by using short-cuts between distant routing domains. New design proposals [29, 64] try both to lower the number of overlay hops and to optimize path metrics such as latency and throughput.
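The key-based routing step described above can be sketched as a greedy walk in which each peer forwards the query to the neighbor whose identifier is closest to the key. The circular identifier space of size 64, the distance metric and the neighbor tables are illustrative assumptions; real DHTs such as those in [36, 64-65] define their own identifier spaces and table structures.

```python
ID_SPACE = 64  # assumed size of the circular identifier space


def ring_distance(a, b):
    """Shortest distance between two identifiers on the ring."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def route(start, key, neighbors):
    """Forward greedily towards the key; stop when no neighbor is closer."""
    path = [start]
    current = start
    while True:
        best = min(neighbors[current],
                   key=lambda peer: ring_distance(peer, key),
                   default=current)
        if ring_distance(best, key) >= ring_distance(current, key):
            return path
        current = best
        path.append(current)


# Hypothetical routing tables: peer identifier -> known neighbor identifiers.
NEIGHBORS = {
    2: [10, 40],
    10: [2, 20, 33],
    20: [10, 33],
    33: [20, 40, 10],
    40: [33, 2],
}
print(route(2, 34, NEIGHBORS))
```

The query for key 34 travels from peer 2 via peer 40 to peer 33. The walk always terminates because the distance to the key strictly decreases at every hop, but nothing in it reflects physical proximity, which is the issue raised in the text.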
Figure 2.7 Using Key-Based Routing (KBR) to find paths between two end-hosts [36] (figure labels: direct Internet path; alternate path through structured overlay).

Using paths preferred by large CDNs to serve content
Studies, e.g. [32], showed that it is possible for small overlay providers to use network observations from large CDNs, e.g. Akamai [7, 66]. One such study [32] shows that a single-hop indirection through an overlay node close to the Akamai server ‘preferred’ for serving content can be effective in establishing an end-to-end path between hosts with desirable performance (Figure 2.8). Large CDNs already optimize the path selection problem, and this can be leveraged by overlays. Some motivating facts found by the same study are that in some instances up to 200 CDN mirror sites were used to serve content over a 48-hour period, and that, due to time-of-day effects, CDN content was sometimes served by a mirror elsewhere even when an Akamai server was close to the source [32]. However, there are two major issues: (i) there is no assurance that an adequate level of service from the CDN is available near all overlay hosts; (ii) large CDNs may use techniques to hide locality information about their servers and served content to prevent exploitation.
2.2.4 Detouring Packets
While the previous section addressed generic alternate path selection problems, path selection decisions could also be driven by application-specific objectives. Different flows in the Internet may have different application-specific QoS demands [67-69]. A packet from a real-time application such as VoIP can tolerate some loss but not delay, and requires a different packet-detouring strategy than a packet in an FTP session, which can tolerate delay. Similarly, applications using different transport mechanisms (UDP or TCP) may require application-specific packet-detouring strategies. The following are the two main schemes which we identified from the published literature.
Figure 2.8 ‘Drafting’ behind Akamai servers. One-hop indirection through an overlay node. The overlay node is selected based on the preference of Akamai to serve content from one of its servers.
Reactive and Proactive Schemes
There have been two popular schemes to detour packets on alternate paths. Primary Internet paths and alternate (overlay) paths between end hosts may be aggressively monitored for performance metrics. Reactive schemes [4] use an alternate path only when the primary Internet path fails to deliver the required QoS. Proactive schemes tend to be ‘selfish’ and may opt for the best path using a greedy approach. While the proactive scheme optimizes path selection for some flows, Zhu [28] showed that it may cause: (i) oscillations in the network due to frequent path swapping, hurting non-overlay traffic; (ii) frequent use of longer paths for minor performance gains, thus increasing the traffic load on the network. This shows that the proactive scheme, while intuitively desirable, appears to be extremely detrimental to global network welfare.
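The difference between the two schemes can be made concrete with a minimal sketch. It assumes latency is the only metric and that a fixed threshold expresses the required QoS; the function names, threshold and measurements are hypothetical and are not taken from [4] or [28].

```python
def reactive_choice(latency_ms, primary, max_latency_ms):
    """Stay on the primary path unless it misses the QoS target."""
    if latency_ms[primary] <= max_latency_ms:
        return primary
    return min(latency_ms, key=latency_ms.get)


def proactive_choice(latency_ms):
    """Always take the currently best path, however small the gain."""
    return min(latency_ms, key=latency_ms.get)


# Hypothetical measurement rounds (latency in ms per candidate path).
ROUNDS = [
    {"direct": 50, "via C": 48},
    {"direct": 49, "via C": 51},
    {"direct": 120, "via C": 55},
]
reactive = [reactive_choice(r, "direct", max_latency_ms=100) for r in ROUNDS]
proactive = [proactive_choice(r) for r in ROUNDS]
print(reactive)
print(proactive)
```

In this example the proactive scheme swaps paths in every round for gains of a millisecond or two, while the reactive scheme leaves the direct path only in the third round, when the target is actually missed.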
Multi-path Routing Schemes
Research [4] indicates that alternate paths between end hosts may fail independently of each other, since routing domains which are independently administered rarely share underlay links. Some studies [9, 31] investigated the reduction in path probing overheads possible by sending redundant packets along multiple overlay paths. Assuming the probability of packet loss on the i-th such path to be $p_i$, the probability that a packet will be lost if sent on N redundant paths is:

$$P_{\text{redundant}} = \prod_{i=1}^{N} p_i \tag{2-5}$$
To further reduce the probability of packet loss, advanced encoding schemes, e.g. Forward Error Correction (FEC), may be used to detect and correct errors, and hence tolerate packet loss. While Zhao [31] claimed positive results from using constrained multicast to maintain end-to-end paths in the face of failures, Anderson et al. [9] concluded that such schemes can only prove useful when links are suffering from low levels of congestion. Moreover, another alarming finding of the same study is that failures on alternate paths in an overlay network are often more correlated than previously imagined; once a packet is lost on one path, the conditional probability that the redundant packet succeeds on an alternate path drops to about 60 percent. Even packet-encoding schemes such as FEC lose their effectiveness when path failures are correlated. In addition, a large number of redundant packets sent on the network unnecessarily consume network resources, increase network load and rob other flows (overlay or non-overlay) of their fair share. This technique requires critical information about the underlying IP-level structure of the overlay topology in order to achieve optimum benefits.
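Equation (2-5) is straightforward to evaluate. The sketch below does so for hypothetical per-path loss probabilities; it relies on the same independence assumption as the equation, so when losses are correlated, as discussed above, the value it returns should be read as an optimistic lower bound on the actual loss.

```python
import math


def redundant_loss_probability(path_loss):
    """Probability that copies of a packet are lost on all paths, as in (2-5).

    Assumes losses on different paths are independent.
    """
    return math.prod(path_loss)


# Hypothetical loss probabilities on two and three redundant paths.
print(redundant_loss_probability([0.1, 0.2]))
print(redundant_loss_probability([0.1, 0.2, 0.5]))
```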
2.2.5 (In-)Feasibility of Selfish-Routing on Overlay-Networks There are several commercial concerns regarding widespread use of overlay networks: ISPs do not want users to participate (as overlay hosts) due to concerns that overlay networks may impact the underlay routing policies such as Traffic Engineering [20], hurt non-overlay based traffic due to greedy utilization of network resources, or introduce oscillations in the Internet due to interaction of several overlay networks whose traffic rapidly switches paths based on performance benefits [70]. One study [20] observed that selfish-routing using overlays can harm traffic-engineering goals. Overlays choose paths which are longer than direct Internet paths and may prefer certain links more than others. This increases network load and increases congestion on some links as investigated by [20].
Debates [39, 70] on the coexistence of multiple overlays, and on their coexistence with (non-overlay) Internet traffic, have raised doubts about the effectiveness of overlays in the long term. It is now well understood that overlay routing networks can provide the required performance benefits by leveraging the inherent path redundancy in the Internet. However, they actually transfer traffic from one subset of paths to another. Keralapura et al. [70] claimed that multiple overlays performing the same function, each using its own greedy and selfish routing metrics to select overlay paths, could introduce race conditions leading to unwanted routing oscillations (Figure 2.9). The study finds that the probability with which two overlay networks become synchronized increases if the interacting overlays are aggressive, i.e. have short path probing intervals, or have path outage detection times close to each other. The latter can happen if the overlay hosts of multiple overlay networks are situated close to each other, leading to similar path round-trip times being used for probe timeouts, an indicator of path failure. The more dissimilar the overlay networks are in terms of locality of hosts and path probing parameters, the smaller the probability of routing oscillations [70].
Figure 2.9 Contention for the same set of underlay links. Three overlay networks decide to use the same set of underlay links to improve QoS on end-to-end paths, increasing network load (congestion) on those links and possibly leading to oscillations in the quest for better paths.

2.2.6 Open Research-issues with Overlay-Networks
Most major research related to the study of overlay-network behavior revolves around simulations using Internet-like topology generators [33, 71-72] or a few overlay test beds [4, 73]. A majority of these topology generators use the hierarchical power-law model [72]. However, some works [74-75] provided substantial evidence that such static power-law models may not capture the Internet topology accurately enough, because Internet evolution is a dynamic process shaped by several interconnected variables; thus the results derived from them could be inaccurate and misleading. For example, Chang et al. [74] showed that Internet topology arises as a multi-parameter optimization problem that incorporates AS geography, AS-specific business models and AS evolution history. Similarly, Jaiswal et al. [75] dispel the notion that ASes ranked higher in the tier structure always have higher connectivity than those in the lower tiers. This thesis uses Internet datasets to avoid the problems of artificial simulated topologies and of testbeds that are too small.
2.3 Proposals To Modify Underlay Routing Mechanisms
2.3.1 Re-Engineering BGP-4
Overlay networks aim to overcome the shortcomings of BGP by leveraging the native path redundancy present in the Internet. Some studies [76-82] argue that, instead of turning to new avenues to address the Internet's inefficient handling of failures, BGP-4 could be modified to meet the requirements. Concerns [2] about delayed BGP routing convergence after failures mainly stem from: (i) complicated path exploration through several paths which may already have been invalidated by a single failure (Figure 2.10a); (ii) suppression of new route updates [12, 83] to prevent routing oscillations, or “route flapping”. The authors of a few papers [81, 84] suggest that path-withdrawal or other route-update messages should be appended with cause-of-failure tags (Figure 2.10b) to simplify path exploration by invalidating all defunct routes. Similarly, Bremler-Barr et al. [77] proposed that, in the event of failure, path-withdrawal messages can be expedited through the whole network to rid it of unreachable routes and speed up convergence.
Figure 2.10 (a) (top) A single link-failure invalidates several valid routes (shown by bold arrows). (b) Appending path-withdrawal messages with ‘cause-of-failure’ tags helps eliminate all invalid routes and converge to a valid route quickly.

Subramanian et al. [12] proposed a Hybrid Link-state Path-vector (HLP) protocol, which makes several architectural design changes to BGP to counter its churn. HLP uses a hierarchical approach instead of the flat architecture of BGP: the network is divided into several domains and sub-domains, and each sub-domain uses a link-state protocol, which has much better convergence properties than path-vector protocols. The sub-domains then use a path-vector protocol to disseminate routing information amongst themselves. HLP also specifies a routing granularity at the AS level rather than the IP-prefix level used by BGP. The paper shows that by adopting
their architecture, BGP churn could be improved by a factor of 400. The previous proposals may reduce BGP churn, but they still leave open the debate on alternate path discovery through explicit mechanisms. Kushman et al. [85] specifically tackle this problem and propose an architecture in which alternate disjoint failover routes are also announced by BGP, ensuring quick failover (if possible) and guaranteed BGP convergence without any routing loops. They provide detailed insight into this problem and explain which failover routes are appropriate to announce and where in the AS hierarchy they should be announced. Similarly, Quoitin et al. [86] propose that several of the BGP inter-domain path selection parameters could be used for traffic engineering purposes, e.g. forced selection of one of several alternate paths. This could be achieved by selectively advertising destinations on different paths based on IP prefixes, by artificially inflating the cost of one of the paths (AS path-prepending) to discourage its selection, or by explicitly advertising preference for a path to a neighboring AS through the MED (multi-exit discriminator) attribute. Similarly, the Local-preference attribute, which BGP uses to assign fixed weights to paths through dissimilar inter-domain bandwidth links, could be made more sensitive to dynamic performance through active path measurements. Another technique for an AS to exploit inter-domain path diversity is to tweak its own Interior Gateway Protocol (IGP), which is used to select the inter-domain path with the least internal (intra-domain) cost. This could end up constantly selecting one of several egress points towards other ASes; more granular IGP weight tuning could exploit path diversity by choosing other paths. While individual works have addressed single problems using individual solutions, Multi-Path Inter-domain Routing (MIRO) [78] addressed all of these issues, proposing several architectural modifications to BGP.
The architecture shows how ASes can advertise multiple routes for destination prefixes through on-demand path announcements, i.e. pull-based route retrieval. Pull-based route retrieval consists of two main steps: (i) a route-negotiation step, in which an interested BGP speaker floods a route request query and the requested hosts may return such paths through selective export policies, so that other hosts stay oblivious to this information exchange; and (ii) routing-tunnel establishment, where hosts flood information amongst themselves for any successfully negotiated route (Figure 2.11). This technique ensures that all such negotiated paths meet BGP policy constraints through selective export policies. Not only does the architecture meet all design objectives, but it also proposes an evolutionary design approach, offering attractive incentives to network administrators adopting MIRO while at the same time making it possible for native-BGP users to co-exist.
Yang [14] proposed a New Internet Routing Architecture (NIRA) in which users have the flexibility of choosing inter-domain routes by using a new IP addressing scheme that includes intra-domain and inter-domain sub-addressing. However, it leaves open the question of what revenue model ISPs will need to adopt in order to benefit when users have the power to choose inter-domain routes.
Figure 2.11 MIRO routing example [78]

2.3.2 Enhancing network level packet forwarding decisions to exploit path diversity
The authors of one study [3] proposed that, instead of BGP (and ISPs) deciding the complete inter-domain and intra-domain sections of paths, packet forwarding decisions made at the router level could be augmented to allow choosing one of multiple potential next-hop candidates, providing more ‘choice’ for exploiting path diversity (Figure 2.12). Path deflection is possible while forwarding packets at routers by selecting one of the candidate choices. Moreover, the study shows that such deflections are possible while selecting shorter loop-free paths without violating ISP rules. Routers only need to consider a few simple deflection rules while forwarding packets. Similarly, Motiwala et al. [87] proposed path splicing, where the main underlying idea is that, instead of deciding upon packet deflection hop by hop, a more scalable approach is to do it at the granularity of path segments and allow traffic to switch paths at intermediate hops. Such (alternate) path segments are often known but not used, e.g. BGP records multiple paths between two points but selects only one based on routing policies. For other protocols, e.g. OSPF and IGRP, which recompute new paths after a failure, multiple paths could be recorded by running multiple instances of the routing protocol after altering the network parameters used for path computation, e.g. by slightly perturbing link costs. Both of these techniques require packets to carry a shim header (in between the network and transport headers) in order to inform path deflection decisions, which potentially incurs non-negligible packet processing overhead. Such questions are left open by these studies [3, 87], and hence the scalability of such techniques needs to be investigated. Also, such studies have so far only investigated the feasibility of exploiting path diversity in a few large ISPs, e.g. Sprint and Abilene, where their results for path diversity might be exaggerated. The practical benefits and deployment issues over the wide-area Internet remain a challenge when we consider that, due to the power-law structure, there is a large degree of link sharing amongst paths [8, 21, 33, 72], indicating that there may not be as many path deflection choices as the studies indicate.
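The idea of recording multiple paths by perturbing link costs, as described above for path splicing, can be sketched as follows. This is a minimal illustration and not the implementation of [87]: it runs Dijkstra's algorithm once on the original costs and then on randomly inflated copies, and the topology, the perturbation range and the fixed random seed are assumptions. The graph is assumed to be connected.

```python
import heapq
import random


def shortest_path(graph, src, dst):
    """Dijkstra's algorithm; graph[u][v] is the cost of link u -> v."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def perturbed(graph, rng, spread=0.5):
    """Copy of the graph with every link cost inflated by up to `spread`."""
    return {u: {v: c * (1 + rng.uniform(0, spread)) for v, c in nbrs.items()}
            for u, nbrs in graph.items()}


def spliced_paths(graph, src, dst, k, seed=0):
    """One path on the original costs plus k - 1 paths on perturbed copies."""
    rng = random.Random(seed)
    paths = [shortest_path(graph, src, dst)]
    for _ in range(k - 1):
        paths.append(shortest_path(perturbed(graph, rng), src, dst))
    return paths


# Hypothetical four-router topology with two nearly equal routes from s to t.
GRAPH = {
    "s": {"a": 1.0, "b": 1.2},
    "a": {"s": 1.0, "t": 1.0},
    "b": {"s": 1.2, "t": 1.0},
    "t": {"a": 1.0, "b": 1.0},
}
for path in spliced_paths(GRAPH, "s", "t", k=4):
    print(path)
```

Whether the perturbed runs actually yield a different path depends on the random draws and on how close the alternatives are in cost, which mirrors the concern in the text that heavy link sharing may leave few real deflection choices.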
Figure 2.12 Path deflection decisions made at the router level can exploit the path diversity in the underlay network.
2.3.3 Fast Re-Route (FRR) construction to reduce failover times
The previous section dealt with exploiting path diversity by adding flexibility to routers in forwarding packets, e.g. by adding randomization in selecting the next-hop neighbor to forward a packet to. This may help in exploring alternate paths, but it does not address whether those alternate paths are disjoint from the native route and thus effectively bypass the failed element (link/router). This issue can be addressed by knowing the topological diversity of the paths and pre-computing all possible alternate paths that allow bypassing of the failed elements. This technique, known as FRR (Fast Re-Route) construction [88], aims at quick recovery from faults through pre-computed failover paths. Shand and Bryant [88] highlight several key challenges in FRR construction for purely IP networks. The first is how to choose failover paths which can be utilized by the router that first detects the fault, without consulting its neighbors or waiting for the protocol (e.g. IGP) to converge towards newer paths reflecting the topology change. The second is the computational complexity of computing such paths without overloading routers. The question is then how to achieve an optimal tradeoff between the two. Such FRR techniques can be implemented at both the intra-domain level (IP-FRR for IGP) and the inter-domain level (MPLS-FRR) (Francois and Bonaventure [89]).

IP-FRR for IGP
Link-state protocols (e.g. OSPF/IS-IS) used as IGPs (Interior Gateway Protocols) converge much faster than BGP, a path-vector protocol, owing to the small scale of the network. Recovery times of under 200 ms are not uncommon [89]. Such small delays often go unnoticed even by VoIP customers demanding quick failover times. Interestingly, the majority of this period of hundreds of milliseconds is spent not on detecting the failure, flooding new routing information (updates) and recomputing routes, but on loading the revised forwarding tables into the router’s Forwarding Information Base (FIB) [88]. Having pre-computed alternate path information which avoids failed components can therefore help in quick recovery. Failover paths inside a domain are considered so that individual routers can try alternate paths instead of waiting to send/receive routing updates to/from neighboring routers. For example, routers could identify Shared Risk Link Groups (SRLGs), i.e. sets of links that fail together owing to a physical commonality between them, e.g. being adjacent to the same router. Various proposals have been made for selecting such paths, including Equal Cost Multi-Paths (ECMP), loop-free alternate paths and multi-hop repair paths [88]. ECMPs are paths that do not traverse the failure, while loop-free alternate paths are established through a direct neighbor of a router adjacent to the failure. Multi-hop paths are more complex to compute. Such paths often cannot be
computed/decided wholly by one router alone; for example, they can be specified using a loose-hops approach, or by multiple routers using their repair FIBs and employing label-based mechanisms for path discovery (label-based path switching is described in more detail in the next section, MPLS-FRR). Often the majority of destinations are reachable using the first two basic path selection techniques, with multi-hop path construction methods required for the remainder [88]. In fact, not only can fast recovery be obtained, but traffic engineering information can also be gleaned and paths selected accordingly to meet QoS requirements or to balance load on the links. For example, some IGP protocols build up a Traffic Engineering Database. This database is typically used to optimize utilization of links inside the domain and minimize the cost of inter-domain traffic traversing the network towards an outside destination. However, optimizing these intra-domain parameters may lead to a sub-optimal inter-domain path, e.g. by handing packets off to an inter-domain segment which is experiencing congestion. Even if the primary intra-domain path satisfies the QoS requirements for its share of the inter-domain path, this does not guarantee that its chosen failover path will too, due to the constraints of the other external domains contributing to the inter-domain path. Pre-computing such failover paths and apprising neighboring domains can yield quick and optimal failover.

MPLS-FRR
MPLS (Multi-Protocol Label Switching) is another increasingly popular solution to inter-domain traffic engineering for appropriate path selection using IGP FRR. Instead of
Figure 2.13. Inter-domain MPLS path construction. [Diagram labels: src, dst, head-end nodes, LSRs, PCC, PCE, TED. PCC = Path Computation Client; TED = Traffic Engineering Database; PCE = Path Computation Element; LSR = Label Switching Router.]
switching (routing) packets at the network layer based on inspection of destination addresses, routes are negotiated at the outset according to the demands of the application. Once such a path has been found, the negotiated path segments and all packets belonging to the application are assigned specific labels, and routing takes place on the basis of these labels. Although this proposal is not new and is similar in concept to previous solutions like ATM [90], current efforts are dedicated to improving its scalability and extending MPLS solutions to the inter-domain level. The proposed technique [91] uses an infrastructure-based approach to exploit path diversity in accordance with user-specified path performance demands (Figure 2.13). A separate entity known as a Path Computation Element (PCE) [91-92] handles this task. The head-end node, also called the Path Computation Client (PCC), sends the PCE a request for a primary (and possibly a back-up) Label Switched Path (LSP) satisfying the user-specified path constraints. The PCE responds with the criteria that the LSRs (Label Switching Routers) should apply to search for the paths. Searching for paths is somewhat similar to tweaking protocol parameters, such as IGP weight tuning (as explained in the previous section), to exploit path diversity inside a domain. Note that not all implementations of IGP/IS-IS may provide for such tuning, and a PCE may help in such circumstances. The primary novelty of MPLS-TE lies in three areas: (a) extending these concepts to the inter-domain level; (b) considering more dynamic path properties than just path diversity; and (c) computing back-up LSPs for when primary LSPs fail. To cater for the extension to inter-domain LSP computation, it incorporates a special crankback mechanism [91-92].
Put simply, each domain (AS) is responsible for computing the segment of the LSP that passes through it, using the services of a PCE and without revealing its internal structure or routing policies. Large domains may have more than one PCE. When one of the Next Hop (NH) domains (ASes) is unable to find such a path, it may return a failure message to the adjacent predecessor domain (AS). This message is then conveyed to the PCE of the predecessor domain, which re-computes the path selection criteria so as to exploit different egress point(s) to different NH domain(s). To select a path conforming to the QoS requirements of the LSP request, the PCE uses the TED (Traffic Engineering Database) maintained by IGP/IS-IS protocols with TE extensions. The PCE may also return primary and backup LSPs for failover if requested.
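The crankback behaviour described above can be viewed as a backtracking search over domains. The following Python sketch is a toy model only, not the mechanism specified in [91-92]: the function name compute_lsp, the next_hops map, the can_carry predicate and the AS names are all hypothetical, and each domain is reduced to a single yes/no answer on whether it can provide a segment that meets the request.

```python
from typing import Callable, Dict, List, Optional, Set


def compute_lsp(
    domain: str,
    dst: str,
    next_hops: Dict[str, List[str]],
    can_carry: Callable[[str], bool],
    visited: Optional[Set[str]] = None,
) -> Optional[List[str]]:
    """Return a domain-level path from `domain` to `dst`, or None.

    A None result plays the role of the failure message that is
    returned to the predecessor domain (crankback).
    """
    visited = (visited or set()) | {domain}
    if not can_carry(domain):
        return None  # this domain cannot provide its segment
    if domain == dst:
        return [domain]
    # The PCE of `domain` tries each egress (next-hop domain) in turn.
    for nh in next_hops.get(domain, []):
        if nh in visited:
            continue  # avoid domain-level loops
        tail = compute_lsp(nh, dst, next_hops, can_carry, visited)
        if tail is not None:
            return [domain] + tail
        # Crankback: nh reported failure, so a different egress is tried.
    return None


# Hypothetical topology: AS1 can leave via AS2 or AS3; both reach AS4.
topology = {"AS1": ["AS2", "AS3"], "AS2": ["AS4"], "AS3": ["AS4"]}
congested = {"AS2"}  # assume AS2 cannot meet the requested constraints
path = compute_lsp("AS1", "AS4", topology, lambda d: d not in congested)
print(path)
```

In this example the request first fails in AS2, the failure is returned to AS1, and AS1 then selects its other egress towards AS3. A real PCE would return path selection criteria and consult its TED rather than a simple predicate.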
2.3.4 Open Research-issues with proposals to modify underlay routing mechanisms
Proposals to modify underlay routing mechanisms seem attractive at the outset; however, they pose some challenges. For example, are path deflection decisions as proposed by [3, 87] able to
scale well enough at the level of individual packets? Other core issues relate to the feasibility of implementing the proposed changes to routers to support path deflection decisions. Also, these studies address the exploitation of the path diversity of the Internet, but a core problem is monitoring path quality, which has hampered the deployment of large overlay networks due to scalability concerns. Another area of practical concern is that redesigning underlay routing mechanisms, such as suggested by [3, 87], including changes to BGP [77, 79-81], exposes underlay routing to several security vulnerabilities [3]. At present, end systems do not exercise any control over the paths their packets take, which are determined solely by the network routers. Equipping end systems with the power to influence paths may open the network to compromise by an adversary, or cause breaches of commercial traffic transit policies between ISPs, leading to conflicts over revenue. The primary motivation of the MPLS-TE solutions is not only to exploit inter-domain path diversity but also to find paths that fulfill specific QoS requirements. It is based on the premise that neighboring domains can establish trust for finding such QoS optimized paths. Since each individual domain does not have to reveal its internal structure, this trust will be weak unless there is some monetary incentive attached. Another related issue is that if the primary LSP fails, each domain may have its own priorities in computing a restoration path, which may not be acceptable to other participating domains [92].
2.4 Multi-Homing Solutions
Multi-homing refers to solutions which allow hosts at the edge, or transit providers in the core of the Internet, to maintain redundant connections to the Internet which can be exploited for the purposes of fault tolerance, traffic engineering or optimizing QoS. Thus, multi-homing can be categorized into two types: site multi-homing and ISP multi-homing. Figure 2.14 shows an example of site multi-homing. End-host A, which is multi-homed via three distinct ISPs, stands a higher chance of remaining reachable in the event of a failure on one of its access links than end-host B, which is single-homed. Site multi-homing is more challenging than ISP multi-homing due to the scalability issues arising from the huge number of Internet hosts compared with transit providers. Another challenging issue is switching the paths of longer packet flows so that the path changeover remains transparent to the flow without resetting the connection, i.e. maintaining transport-layer survivability.
Figure 2.14 Single-homing Vs Multi-homing. [Diagram labels: Core (Tier-1 ISPs); ISP A, ISP B, ISP C; end-hosts A and B.]
Site domain multi-homing can take one of several forms. Host (stub) domains may announce single or multiple connections to single or multiple ISPs over single or multiple IP addresses [93]. Previously, the approach towards multi-homing was more liberal. Stub domains could acquire special Provider Independent (PI) addresses from the Regional Internet Registry (RIR). PI addresses are globally unique IP addresses which are not assigned by transit providers from their own assigned address blocks. For example, if a stub domain multi-homed to two provider networks is assigned a PI address, then it can advertise this to both of its transit providers, which will propagate it to their own upstream providers, from where it will reach other parts of the Internet, giving the host domain dual connectivity. Using PI addresses was a simple approach to multi-homing. However, this led to scalability issues, together with the problem of depleting IP address space in IPv4. Presently, stub domains are only allowed to use Provider Aggregatable (PA) addresses. Stub domains thus consider one of their immediate provider networks to be their primary ISP and the remainder as secondary. The PA address is then advertised to the secondary ISPs. However, using PA addresses becomes less useful since, due to scalability issues, BGP routers do not accept destination prefixes for address blocks smaller than a /24. This means that although the secondary ISPs would advertise the PA address of the multi-homed site separately in addition to their own (as it cannot be merged with their own aggregates), the address block advertised by the primary ISP would be a stronger match for the destination, since the Internet uses longest prefix matching when routing to destinations. Thus the primary ISP will be used to reach the stub network for inbound packets until there is something wrong with the connection between the stub domain and the primary ISP, at which point the secondary ISPs will be used. Thus, the redundant paths cannot be used simultaneously to meet Traffic Engineering (TE) objectives, or to achieve quick failover as dictated by the stub domain, as this traditional approach to multi-homing again depends on BGP reaction time to provide a failover path. Also, note that even using PI addresses introduces one additional routing entry per multi-homed host. Huston [94] and Bu et al.
[95] note that the number of BGP routing entries in the Internet increased by an order of magnitude between 1995 and 2005. Learning from the shortcomings of multi-homing approaches in IPv4, the research community has considered many new proposals for multi-homing in IPv6, as surveyed by De Launois and Bagnulo [96], with the goals of providing fault tolerance, traffic engineering, route aggregation and multi-homing independence. These include middle-box tunneling approaches using NAT or MHTP (Multi-homing Translation Protocol) boxes, which convert PA addresses to PI addresses, and newer transport protocols like SCTP, TCP-MH and DCCP [97], which can use the multiple IP addresses associated with multi-homed hosts to ensure transport layer survivability.
2.4.1 Open Research-issues with Multi-homing
While multi-homing can improve availability at the edge of the Internet, overlay networks can also improve availability within the core, as well as improving the performance of end-to-end paths. Effective multi-homing only requires that the customer network be reachable through two or more topologically diverse ISPs, so that it can connect to the outside ‘world’ with reasonable assurance. Akella et al. [10] and Tao et al. [98] considered performance using key path metrics (delay (RTT), loss rate and throughput) when edge hosts are multi-homed via multiple providers and also have a choice of overlay paths when the direct path undergoes degradation. The results from such studies may be somewhat biased, as they report results from the ISPs which gave the best results across all destinations considered. Akella et al. [10] reported that the performance advantage is 20-40% for delay and 15-25% for throughput when the edge host is multi-homed via three providers; increasing the number of providers beyond three yields only marginal benefits. The same study [10], however, also concluded that multi-homing has only limited benefits compared to when end-hosts have a choice of overlay paths between them. This is because end-to-end path diversity in the core of the Internet can be leveraged effectively through the use of overlay networks. Another paper [99] reported similar results when considering the number of shared routers and underlay links on alternate paths provided by multi-homing solutions but, interestingly, also shows that overlay paths may not offer as much path diversity as previously thought. It reveals that even if the edge ASes, where overlay links most likely merge, were removed from consideration, there are still many overlay links which share physical routers and links with other overlay links. Randomly selecting overlay hosts for disjoint backup paths therefore has little probability of success.
Multi-homing provides physical redundancy while working within the BGP framework: multi-homed hosts announce their multiple routes through different upstream provider ISPs. Multi-homing has been blamed as
one of the leading factors in the exponential increase in the size of BGP routing tables since 1999 [95, 100]. Multi-homing creates ‘holes’ in the routing table [95] because IP sub-blocks already contained within the prefix set of one provider of a multi-homed AS are announced again, separately, by one of that AS’s providers for the purpose of fault tolerance.
2.5 Chapter Summary
In this chapter we provided a rigorous literature review discussing the three main approaches for providing QoS to end users, namely overlay network approaches, proposals to modify the underlay routing mechanism, and multi-homing. Although the main aim of all three is to identify a path anomaly and switch over to a better alternate path, their implementation methods differ. Multi-homing has limited benefits, and proposals to modify underlay routing mechanisms are still in their infancy and require the efforts of the broader community. This leaves overlay networks as the promising area for tapping into the path diversity of the Internet. This thesis looks into two core issues, namely selecting disjoint paths and reducing path monitoring overheads by exploiting overlay topology information, and overcoming the challenges posed when such information is not available or is inaccurate.
3 DESCRIPTION OF INTERNET DATASETS USED IN THIS DISSERTATION
3.1 Datasets considered and methodology for obtaining the datasets
The main focus of this dissertation is to present scalable heuristics for monitoring and selecting alternate overlay paths when the direct underlay path fails. To analyze the performance of these heuristics, we require only the relevant end-to-end path metrics and topology information. Fortunately, records of such information are publicly available from several experimental overlay networks already deployed throughout Europe and North America. Throughout the remainder of this thesis (Chapters 4-6) we analyze the performance of overlay networks using real Internet datasets, so it is important that the methodology for obtaining these datasets is explicitly described before proceeding any further. Our datasets come from two experimental networks. The first is a US-based project, the Active Measurement Project (AMP) [42], managed by the National Laboratory for Applied Network Research (NLANR), and the second is a European project managed by RIPE-NCC (Réseaux IP Européens Network Co-ordination Center) [41]. Starting July 2006, CAIDA [101] took over operational stewardship of all NLANR machines and data. Our choice of these two datasets is driven by two main reasons. Both datasets provide (a) end-to-end measurements, e.g. path delays, at small intervals (of the order of 30 seconds to a minute); and (b) network layer path information from traceroutes. Another popular overlay network dataset, PlanetLab’s All Pair Ping project [73], only provides regular end-to-end measurements; traceroutes are conducted only if an end-to-end measurement registers a path fault. Also, All Pair Ping’s end-to-end path measurements are made at 15 minute intervals (2005), which makes accurate path selection using this dataset alone infeasible.
This is because both path outages and performance failures occur on much smaller time scales. A path outage may be defined as an extended period of disconnection, lasting a few minutes, between two Internet hosts due to a major event like a link failure (e.g. a fiber cut), while a performance failure may be defined as a minor transient failure (e.g. due to congested router queues) leading to a degradation in latency, throughput or loss rate by a factor of two or three [4]. Research shows that following an outage BGP may take up to 15 minutes [2] to converge to alternate paths, while the AMP dataset shows that most path delay degradations last less than a minute. NLANR’s Active Measurement Project (AMP) performs active measurements between hosts connected by high performance IPv4 networks. 150 AMP monitors take site-to-site measurements. AMP monitors are mainly deployed throughout the United States (Figure 3.1). Some monitors are
however located outside the US, in Taiwan, Switzerland, Chile and Korea. The hosts considered are connected in two virtual mesh topologies. One is the AMP-HPC (High Performance Connection) network, comprising AMP hosts located in US academic institutions, and the second is the AMP-International network, comprising hosts external to the US. These datasets provide one round-trip time (RTT) delay measurement per minute for each pair of hosts, and IP traceroute information obtained around once every ten minutes. AMP avoids probing outside its own network. An IPv6 version of AMP performs traceroutes between eleven sites. The datasets used in this dissertation are from 30th June 2006 and 31st August 2006, when this work was undertaken, and report data for 146 and 133 AMP hosts respectively. The dataset for an available 24-hr snapshot can be obtained as compressed .gz files, with delays and traceroutes between pairs of AMP hosts (Table 3-1). RIPE-NCC’s Test Traffic Measurement (TTM) service measures key parameters of the connectivity between a given site and other test boxes. Like NLANR AMP, the RIPE NCC TTM system performs probing only inside its own network. It also provides routing vectors at both the AS level and the IP level from traceroutes, but does not report hop-wise delays. In addition to the routing vector information, the TTM system also records, among others, one-way delay, packet loss and bandwidth. This is possible as each box in the system is equipped with GPS. Measurements have been made approximately twice a minute, starting October 2002. RIPE monitors are mainly deployed throughout Europe, with a few in the United States and Asia (Figure 3.2). These datasets, however, are not available as individual 24-hr snapshots as with AMP, but are available through user-supplied queries for a particular pair of RIPE hosts and a date/time tuple.
Hence, to obtain the delay and traceroute data in bulk, we implemented automated “GET http://” queries using shell scripts. We downloaded a 24-hr snapshot (5th September 2007) for 40 selected RIPE hosts (mostly from
Figure 3.1 Location of AMP monitors in North America [102].
Europe). Both datasets suffer from some missing data, e.g. a probe being lost during an RTT measurement in AMP or a one-way delay measurement in RIPE. Both datasets register missing data with specific flags and timestamps. Similarly, missing traceroute hops are marked with asterisks (*). We filter the data to remove the impact of such missing path delay data (Section 3.3) and traceroutes (Chapter 6) by neglecting the affected paths. In this dissertation, we select all or a subset of the AMP and RIPE monitors to behave as virtual RONs; subsets are selected especially where we need to compare results across similar sized RIPE and AMP networks. Such subset selection is random, without preference for any hosts, unless mentioned otherwise. We denote such virtual RONs as AMP-SIZE-dd/mmm/yyyy or RIPE-SIZE-dd/mmm/yyyy, where SIZE specifies the size of the RON and is followed by the date of the dataset.
Table 3-1 NLANR-AMP and RIPE-NCC Datasets.
Network      No of Hosts   Dataset Date
NLANR-AMP    146           30-Jun-06
NLANR-AMP    133           31-Aug-06
RIPE-NCC     40            5-Sep-07
Figure 3.2 Location of RIPE monitors in Europe and the rest of the world [103].
3.2 Network Layer Characteristics of Overlay Paths Vs Direct Paths
In this Section we consider the characteristics of overlay paths vs direct paths as seen in the datasets used in this dissertation. We present the results here for AMP networks behaving as virtual RONs. We look at the network layer properties of direct Internet paths and all possible one-hop overlay paths. Figure 3.3 shows that most of the AMP host-pairs have paths which traverse four or more Autonomous Systems. The corresponding length of the path in the underlying IP network is between 10 and 20 hops, at an average of two to three IP hops per AS. RIPE gives similar results. Note that the RIPE dataset records routing vectors more frequently than AMP, and we have recorded AS and IP level path lengths for all such paths. This is the reason that the number of paths exceeds the actual number of source-destination pairs.
Figure 3.3 Network layer path length at IP level and AS level (AMP-146-30/Jun/2006 (top) and RIPE-40-05/Sep/2007 (bottom)). [Plots of AS path-length and IP path-length; y-axis: Path Length (hops); x-axis: Source-Destination Pairs (top), Paths (bottom).]
Figures 3.4 and 3.5 depict the distribution of one-hop alternate paths between AMP host-pairs via a third AMP host which diverge from the direct path at the nth hop, at AS and IP granularity respectively. A majority of the alternate paths diverge at the fourth or fifth IP hop (Figure 3.5) or the second AS hop (Figure 3.4). This reveals non-negligible path sharing between direct and one-hop overlay paths. Similar results are obtained for AMP-133-31/Aug/2006 (not shown). We neglect RIPE data here because the dataset contains missing routing vectors between RIPE host pairs, as bulk downloading of complete datasets is not possible.
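The divergence point between a direct path and a one-hop overlay path can be computed from two hop sequences as in the following Python sketch. It is an illustration only, under the assumption that each path is available as an ordered list of hop identifiers (IP addresses or AS numbers, depending on the granularity wanted); the function names and router labels are hypothetical and are not taken from the AMP tools.

```python
from typing import Sequence


def divergence_hop(direct: Sequence[str], overlay: Sequence[str]) -> int:
    """Return the 1-based hop index at which `overlay` first differs from `direct`.

    If one path is a prefix of the other, the divergence point is taken
    to be one hop past the shared prefix.
    """
    shared = 0
    for a, b in zip(direct, overlay):
        if a != b:
            break
        shared += 1
    return shared + 1


def percent_diverging_by(
    direct: Sequence[str], overlays: Sequence[Sequence[str]], n: int
) -> float:
    """Percentage of overlay paths that diverge at or before the n-th hop."""
    if not overlays:
        return 0.0
    hits = sum(1 for path in overlays if divergence_hop(direct, path) <= n)
    return 100.0 * hits / len(overlays)


# Hypothetical traceroutes from one source: the direct path and the first
# leg of three one-hop overlay paths via different relay hosts.
direct = ["r1", "r2", "r3", "r4", "r5"]
overlays = [
    ["r1", "r2", "x1", "x2"],        # diverges at hop 3
    ["r1", "y1", "y2"],              # diverges at hop 2
    ["r1", "r2", "r3", "r4", "z1"],  # diverges at hop 5
]
print([divergence_hop(direct, p) for p in overlays])
print(percent_diverging_by(direct, overlays, 3))
```

Evaluating percent_diverging_by for increasing n gives, per source-destination pair, the kind of cumulative "at or before the nth hop" quantity plotted in Figures 3.4 and 3.5.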
Figure 3.4 Percentage of one-hop overlay paths which diverge from the direct path at or before nth AS-hop (AMP-146-30/Jun/2006). [Plot: y-axis % Overlay Paths; x-axis Source-Destination Pairs; curves for n = 1 to n = 4.]
Figure 3.5 Percentage of one-hop overlay paths which diverge from the direct path at or before nth IP-hop (AMP-146-30/Jun/2006). [Plot: y-axis % Alternate Paths; x-axis Source-Destination Pairs; curves for n = 1 to n = 9 and n = 10-20.]
Figure 3.6 CDF of the difference between the mean path delay on direct Internet path and the mean delay on the best one-hop overlay path. [Plot: y-axis Fraction of paths; x-axis Delay (ms); series AMP-30/Jun/2006, AMP-31/Aug/2006 and RIPE-05/Sep/2007.]
Figure 3.6 shows the delay benefit of using an alternate one-hop overlay path even when the direct path has not degraded in performance. For 80% of the paths there is a one-hop alternate path providing a lower mean delay than the direct path, in both RIPE and AMP networks. For AMP, a majority of these alternate paths provide up to 75 ms lower mean delay than the direct path; for RIPE, up to 150 ms. The disparity between these figures arises because most of the AMP hosts are connected by high speed links on the US academic network (AMP-HPC).
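The quantity plotted in Figure 3.6 can be sketched as follows. This is a minimal Python illustration, assuming that the delay of a one-hop overlay path is the sum of the mean delays of its two legs (relay forwarding time is ignored) and that mean delays are held in a dictionary keyed by host pair; the function name, host names and delay values are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def best_one_hop_saving(
    src: str, dst: str, hosts: List[str], mean_delay: Dict[Pair, float]
) -> float:
    """Mean direct delay minus mean delay of the best one-hop overlay path (ms).

    A positive value means some relay offers a lower mean delay than
    the direct path.
    """
    candidates = [
        mean_delay[(src, z)] + mean_delay[(z, dst)]
        for z in hosts
        if z not in (src, dst)
        and (src, z) in mean_delay
        and (z, dst) in mean_delay
    ]
    if not candidates:
        raise ValueError("no one-hop overlay path available")
    return mean_delay[(src, dst)] - min(candidates)


# Hypothetical mean delays in milliseconds.
hosts = ["A", "B", "C", "D"]
mean_delay = {
    ("A", "B"): 100.0,
    ("A", "C"): 30.0, ("C", "B"): 40.0,  # overlay via C: 70 ms
    ("A", "D"): 80.0, ("D", "B"): 60.0,  # overlay via D: 140 ms
}
print(best_one_hop_saving("A", "B", hosts, mean_delay))
```

Collecting this value over all source-destination pairs and sorting the results gives an empirical CDF of the kind shown in Figure 3.6.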
3.3 When is the Direct Internet path degraded?
The direct path between hosts in the Internet is usually chosen to minimize the number of hops (both AS and IP), which often also minimizes delay. Hence, using a one-hop overlay path will usually increase delay, and so only makes sense if the current delay on the direct path is much more than the expected delay on the overlay paths. However, in some instances the Internet path itself may be inflated, as shown by a previous study [104]. In such cases, a one-hop overlay path may provide a lower delay even when the direct Internet path is not actually degraded. However, using one-hop overlay paths in this manner whenever available can lead to oscillations/instability, as explained in Chapter 2. Hence, we might want to add some hysteresis to reduce the switching frequency, as we explain later. We use the same definition of a path anomaly as used by [6]. We define an anomaly as occurring when the path metric (delay) exceeds its average value by more than k times the standard deviation (σ) of the delay values in the previous 60 epochs (one hour for AMP and 30 minutes for RIPE):
$$\text{Path Delay} > \text{Path Delay}_{\text{average}} + k\sigma \tag{3-1}$$
where k = 1, 2, 3, ... is a tunable parameter; increasing values of k trigger an anomaly only for increasingly large delay variations. These values of k and the one-hour window for determining a path anomaly are typical of those used by Chua et al. [8] and Fei et al. [6]. Chua et al. [8] worked with the Abilene network; the authors collected their network path delay measurements using NLANR AMP project measurements, since a subset of AMP hosts are from the Abilene network. Similarly, Fei et al. [6] worked with the RIPE dataset. Note that the criterion for flagging a path anomaly on direct paths does not affect the relative goodness or badness of the one-hop overlay paths that will be chosen to improve performance. Fei et al. [6] conjectured, “…which paths are good alternates to avoid delay degradations is relatively insensitive to the exact definition of delay degradation”. In the remainder of this thesis, we refer to the particular degradations considered as kσ degradations, based on the value of k used. We only select anomalies for which the immediately preceding window of 60 epochs does not contain any missing data. We select k = 3 to emulate performance failures and k = 10 to emulate path outages.
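The anomaly test of Eq 3-1 can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, missing samples are assumed to be represented by None, and the population standard deviation (statistics.pstdev) is assumed, since the text does not state whether the population or sample form is used.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def k_sigma_anomalies(
    delays: Sequence[Optional[float]], k: float, window: int = 60
) -> List[int]:
    """Indices t where delays[t] > mean + k * std of the previous `window` epochs.

    None marks a missing sample; an epoch is tested only when it and all
    of its preceding `window` samples are present.
    """
    flagged = []
    for t in range(window, len(delays)):
        history = delays[t - window:t]
        current = delays[t]
        if current is None or any(d is None for d in history):
            continue  # skip windows containing missing data
        if current > mean(history) + k * pstdev(history):
            flagged.append(t)
    return flagged


# Hypothetical delay profile (ms): 60 epochs alternating between 20 and
# 22 ms, followed by a normal sample and a 30 ms spike.
delays = [20.0, 22.0] * 30 + [21.0, 30.0]
print(k_sigma_anomalies(delays, k=3))   # 3-sigma: performance failures
print(k_sigma_anomalies(delays, k=10))  # 10-sigma: path outages
```

With one sample per minute (AMP) the default window of 60 epochs corresponds to one hour; with two samples per minute (RIPE) it corresponds to 30 minutes.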
Figure 3.7 shows probability plots for some paths on AMP and RIPE networks with thresholds for performance failures and path outages. (The averages and standard deviation are computed over the entire path delay profile). The probability of a performance failure is approximately 1-3% while the probability of a path outage is less than 0.5%.
Figure 3.7 Probability plots for paths to show incidence of path outages and performance failures (RIPE (top) and AMP (bottom)). [Plots: y-axis Probability; x-axis Delay (ms).]
PART II SCALABLE HEURISTICS FOR SELECTING DISJOINT PATHS IN OVERLAY NETWORKS
4 AN ARCHITECTURE FOR SELECTING DISJOINT PATHS - GLOBALLY SCALABLE RON SERVICE
4.1 Introduction
In this Chapter, we first provide evidence of path diversity in the Internet at both the IP and AS level, but show that fully edge (or node) disjoint paths are often not possible between end hosts even using overlay networks. This makes it necessary to choose wisely amongst the available partially disjoint paths. We then describe an architecture for a best-effort RON service, Destination Guided RON (DG-RON), which simplifies the path exploration problem by finding topologically diverse detours using small candidate detour sets. We also present three offline heuristics which complement each other under different spatial distributions of failures in finding available paths via DG-RON with a high probability. We show that landmark based heuristics can work well in power-law networks like the Internet for finding topologically diverse alternate paths. Our analysis using real Internet datasets shows that it is possible to find alternate paths with a high probability while incurring low measurement and maintenance overheads. Before we proceed any further, we give a brief overview of this Chapter. The initial sections describe some findings which motivate the development of the scalable DG-RON architecture. In Section 4.2, we look at the relationship between overlay network size and the path diversity it offers. Section 4.3 discusses whether some overlay hosts are better than others at masking Internet path failures. Sections 4.4-4.6 describe the architecture of DG-RON based on these observations. In Section 4.7 we present scalable landmark based heuristics for selecting an overlay host based on disjointness criteria. In Section 4.8 we evaluate the performance of the proposed architecture through trace based simulations using real Internet datasets. Section 4.9 discusses the findings from this study, and Section 4.10 concludes the chapter.
4.2 Relationship between Overlay Network size and path diversity it offers
The Internet topology evolves as a power-law network [72, 105]. In power-law networks, the outdegree $d_v$ of a node $v$ is proportional to the rank of the node, $r_v$, raised to the power of a constant $R$, i.e. $d_v \propto r_v^{R}$ [105], where $r_v$ is the index of the node when nodes are sorted in decreasing order of outdegree (ties are broken arbitrarily), and a typical value for $R$ is −0.8 [105]. This means that there is a very small minority of well connected nodes which have a
huge outdegree while the majority of the nodes have a very small outdegree. This power-law phenomenon is visible in the AS level topology of the Internet: a few tier-1 ASes alone account for the majority of the inter-AS links in the Internet [105]. Customer networks are unit degree ASes (i.e. connected only to their immediate ISPs if not multi-homed), typically located at the outward fringes of the network with sparse connectivity. We next examine the impact of selecting a small subset of Internet hosts for tapping into this path diversity, as opposed to the billions of hosts possible. Figure 4.1 shows the AS degree distribution of a large number (3828) of ASes from [106] and the degree distribution of ASes sighted on overlay paths (using traceroutes) in average sized overlay networks consisting of a few tens to hundreds of AMP hosts. Notice that even when as few as 20 overlay hosts are selected to form an overlay network, the overlay paths already pass through the largest tier-1 network, AT&T (AS 7018, with a degree of 2351). This shows that even small overlay networks can offer a substantial amount of path diversity, provided the overlay hosts are in diverse ISPs so as to maximize connectivity to the tier-1 and tier-2 networks and expose them to the AS level path redundancy in the Internet. Physically, the ASes comprising the overlay network contribute to a topology that resembles a micro model of the Internet, with a densely connected core and sparse connectivity at the edges. However, due to the power-law model of the Internet, only a few tier-1 ASes with high connectivity are present; a majority of the customer networks are stub networks with a degree of just one, i.e. connected only to their immediate ISPs, which in turn rely on the large tier-1 and tier-2 ASes for connectivity to different parts (IP blocks) of the Internet.
As the number of hosts comprising the overlay network increases, the network layer topology of the overlay network tends towards the crude Internet model depicted in Figure 4.1. From AMP-20 to the crude Internet model, the percentage of ASes with high degree grows smaller and smaller, with a reduction of two orders of magnitude in the share of ASes with degree greater than 1000. This has the effect of stretching the graph towards the left. Because the AMP dataset contains a larger number of hosts, we present the results for AMP here; RIPE would produce similar results.
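The rank-degree power law introduced at the start of this section can be illustrated numerically. The following Python sketch is not derived from the datasets: it assumes that the proportionality constant is fixed by giving the rank-1 node the degree quoted above for AS 7018 (2351), that there are 3828 nodes as in the crude Internet model, and that degrees are floored at one; the function name is hypothetical.

```python
from typing import List


def power_law_degrees(num_nodes: int, top_degree: float, R: float = -0.8) -> List[float]:
    """Degree predicted for ranks 1..num_nodes when degree is proportional to rank**R.

    The proportionality constant is fixed by assuming the rank-1 node
    has degree `top_degree`; predicted degrees are floored at 1.
    """
    return [max(1.0, top_degree * rank ** R) for rank in range(1, num_nodes + 1)]


degrees = power_law_degrees(num_nodes=3828, top_degree=2351.0)
high = sum(1 for d in degrees if d > 100)
print(f"rank 1 degree: {degrees[0]:.0f}")
print(f"rank 10 degree: {degrees[9]:.0f}")
print(f"share of nodes with degree > 100: {100.0 * high / len(degrees):.2f}%")
```

Under these assumptions only a small percentage of nodes have a degree above 100, which matches the qualitative picture of a few highly connected tier-1 ASes and a large fringe of low-degree networks.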
Figure 4.1 Relationship between size of an overlay network and AS degree distributions. X-axis depicts ASes sorted according to their degree (descending order), normalized by total number of ASes. [Plot: y-axis AS degree; series AMP-20-30/Jun/2006, AMP-40-31/Aug/2006, AMP-146-30/Jun/2006 and Crude Internet Model (3828 ASes).]
4.3 Are some overlay paths preferred more often than others?
One previous study [61] has shown that some overlay paths are preferred more often than others. In that case, the overlay network considered was in Japan, with overlay hosts attached to geographically separated ISPs. The study found that only 25% of overlay hosts were preferred more often than others, alleviating around 90% of the total failures. Similarly, Kawahara et al. [60] develop an approach for reducing the number of transit overlay hosts based on their frequency of selection. This approach can help in selecting the optimum overlay path, i.e. the one that provides the maximum performance benefit, in a cost effective and scalable manner. We performed the same analysis on our North American and European datasets to see if this trend holds for other geographically diverse overlay networks. Let the source node be denoted by $\nu_i$ and the destination node by $\nu_j$ ($i, j = 0, 1, 2, \ldots, N$; $i \neq j$), where $N + 1$ is the total number of hosts in the overlay network connected in a mesh topology. Let us denote the intermediate overlay host (i.e. detour) on the path from $\nu_i$ to $\nu_j$ through $\nu_z$ ($z = 0, 1, 2, \ldots, N$; $z \neq i, j$) at time $t$ as $\nu_{t,z,i,j}$, where $z$ denotes the $z$th relay node and $t$ the time at which the direct path between $\nu_i$ and $\nu_j$ becomes degraded according to the criteria explained above. These paths are ranked in descending order of their delay gain metric:
$$\mathrm{Delay}_{\mathrm{gain}} = \frac{\mathrm{Delay}_{\text{Direct-path}} - \mathrm{Delay}_{n^{\text{th}}\,\text{Overlay-path}}}{\mathrm{Delay}_{\text{Direct-path}}} \tag{4-1}$$
where $\mathrm{Delay}_{\text{Direct-path}}$ refers to the delay on the direct Internet path between $\nu_i$ and $\nu_j$, and $\mathrm{Delay}_{n^{\text{th}}\,\text{Overlay-path}}$ refers to the delay on the n-th one-hop overlay path between $\nu_i$ and $\nu_j$ through an intermediate overlay host $\nu_z$. We computed the frequency with which a particular AMP or RIPE host, in AMP-40 and RIPE-40 respectively, was the best relay node for a source-destination pair whose path was degraded. For the results presented next, we use 3σ degradations for AMP-40-31/Aug/2006 and 10σ degradations for RIPE-40-05/Sep/2007 to emulate performance failures and path outages, respectively, based on the definition in Eq 3-1, Section 3.2. Similar values have been used by the authors of [8] for the Abilene network. Most AMP hosts considered in this dissertation are from North America, and are on networks with connections to the Abilene network [8]. Let $H = \{(t, i, j)\}$ denote the set of those source-destination pairs $(\nu_i, \nu_j)$ whose paths were degraded at time t according to our earlier definition, and let $f_z$ denote the frequency with which the overlay host $\nu_z$ ($z = 0, 1, 2, \ldots, N$) is selected between $\nu_i$ and $\nu_j$, as shown below.
$$f_z = \frac{\sum_{(t,i,j) \in H} I(\nu_{t,z,i,j} = \nu_z)}{PD}, \quad (z \neq i, j) \tag{4-2}$$

where PD is the total number of path degradations observed during the 24-hr periods in which the datasets were collected ($|H| = PD$), and

$$I(\nu_{t,z,i,j} = \nu_z) = \begin{cases} 1 & \text{if } \nu_{t,z,i,j} = \nu_z \ (z \neq i, j) \\ 0 & \text{otherwise} \end{cases} \tag{4-3}$$
[Figure 4.2: plot of F[z] (y-axis, 0 to 1) against z (x-axis, 0 to 40), with curves for RIPE-40-05/Sep/2007 and AMP-40-31/Aug/2006.]
Figure 4.2 Overlay hosts sorted in descending order ‘z’ (x-axis) according to percentage of failures masked, and failures masked as cumulative function ‘F[z]’ (y-axis).
In addition, let us denote by $f[z]$ the values of $f_z$ arranged in descending order ($z = 0, 1, 2, 3, \ldots, N$; $N = 39$ for AMP-40 and RIPE-40). Then the cumulative value of $f[z]$ is defined by:

$$F[z] = \sum_{x=0}^{z} f[x], \tag{4-4}$$

where $F[N] = 1$ holds for both AMP-40 and RIPE-40.
We find that F[0] is about 0.1 for both AMP-40 and RIPE-40 (Figure 4.2). This indicates that 10% of the optimal routes can be found using only one transit node. Furthermore, 50% of the optimal routes can be found using only 8 and 6 hosts in AMP-40 and RIPE-40, respectively. Around 90% of the failures can be masked using only 50% of the overlay hosts.
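As a minimal sketch of how Equations 4-1, 4-2 and 4-4 fit together, the Python below picks the best relay for each degradation event by delay gain, counts how often each relay wins, and accumulates the sorted frequencies. The event layout, the function names and the delay values are illustrative assumptions, not the tooling or data behind the results reported here.

```python
from collections import Counter

def delay_gain(direct_delay, overlay_delay):
    """Equation 4-1: relative delay reduction offered by a one-hop overlay path."""
    return (direct_delay - overlay_delay) / direct_delay

def relay_frequencies(events):
    """Equation 4-2: fraction of degradation events for which each relay is best.

    `events` is a list of (direct_delay, {relay_id: overlay_delay}) pairs, one
    per degraded source-destination pair (hypothetical layout).
    Relays that are never the best choice do not appear in the result.
    """
    wins = Counter()
    for direct_delay, overlay_delays in events:
        # The best relay is the one with the largest delay gain.
        best = max(overlay_delays,
                   key=lambda z: delay_gain(direct_delay, overlay_delays[z]))
        wins[best] += 1
    total = len(events)  # PD, the number of path degradations
    return {z: count / total for z, count in wins.items()}

def cumulative_frequencies(freqs):
    """Equation 4-4: running sum of the frequencies sorted in descending order."""
    running, cumulative = 0.0, []
    for f in sorted(freqs.values(), reverse=True):
        running += f
        cumulative.append(running)
    return cumulative

# Toy events with made-up delays in milliseconds.
events = [
    (200.0, {"a": 80.0, "b": 120.0, "c": 150.0}),
    (300.0, {"a": 100.0, "b": 250.0, "c": 260.0}),
    (250.0, {"a": 240.0, "b": 90.0, "c": 200.0}),
    (400.0, {"a": 390.0, "b": 380.0, "c": 150.0}),
]
freqs = relay_frequencies(events)
print(freqs)                          # {'a': 0.5, 'b': 0.25, 'c': 0.25}
print(cumulative_frequencies(freqs))  # [0.5, 0.75, 1.0]
```

In this toy input, relay "a" alone accounts for half of the best detours, which is the kind of skew that the F[z] curve summarizes.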
Although a little less striking, these results are consistent with the results of Uchida et al. [61] and with our findings in Figure 4.1. We attribute the difference to the greater ISP diversity inside the larger geographical regions of North America and Europe as compared to Japan, which allows more overlay hosts to participate in better routes. The results also indicate that, even in large overlay networks, several overlay hosts provide similar levels of path diversity because of the clustering of overlay hosts in the same BGP atoms [48]. This will also be addressed in Chapter 5.
4.4 DG-RON Clients and Services
We assume that DG-RON clients subscribe to the service from the nearest DG-RON edge node, stipulating the services required (e.g. connectivity to popular destinations), but use the services on a ‘pay per use’ basis, where packet detouring requests are made only once the default path suffers a performance or path failure. This is to ensure that overlay-based path switching does not affect non-overlay traffic or cause oscillations by frequently swapping paths for minor performance gains [107]. We assume that a packet to be routed enters the overlay via its nearest overlay proxy after encapsulation and departs at another proxy that is chosen by the path selection algorithm.
4.5 Overlay Infrastructure
The purpose of a resilient routing overlay is to provide improved connectivity between any two arbitrary hosts on the underlay network in the face of failures. Such a service should be scalable, provide satisfactory performance guarantees, be able to handle overlay churn and provide good load balancing on underlay links. Keeping these global objectives in mind, we start with a bottom-up approach to overlay construction. BGP has demonstrated the importance of hierarchy for global scalability, so we use architectural hierarchy to meet this objective. The architecture uses n landmarks to divide the overlay network into n logical zones and, at the same time, into an n-dimensional co-ordinate space for inter-host distance estimation (Figure 4.3). Each landmark is responsible for bootstrapping new hosts in its own logical zone. Landmark hosts only play a role in forming the infrastructure of the overlay and do not participate in routing. The landmarks must be spaced sufficiently far apart for accurate distance estimation and binning of hosts. We choose landmarks based on topological diversity (Section 4.8). In our simulations we set n = 7, i.e. 7 landmarks, which gives the best results for inter-host distance estimation [108-109]. The landmarks could become a performance bottleneck in the system, so a single landmark could actually be a logical abstraction of a group of machines collocated or in close proximity to each other [110].
Each overlay node measures its distance from each of the landmarks as the RTT (in milliseconds) between a ping request and reply, and stores the result as an n-dimensional network vector $[RTT_1\ RTT_2\ RTT_3\ \ldots\ RTT_n]$ (where n is the number of reference landmarks used in simulation). Such network coordinate mechanisms embed a network into a continuous Euclidean space [109]. Each overlay node then contacts its nearest landmark node to join its logical zone and to request a detour set. The members of the detour set of each overlay node are selected in the DG-RON architecture using the binning technique proposed in [110]. Each peer requests a total of T relay hosts from its nearest landmark (as explained previously). In this peer selection method a landmark returns x short distance (intra-zone) hosts from its own logical zone, and the remaining y (= T − x) are long distance (inter-zone) hosts requested from other landmarks. The distance estimation function used by landmarks is similar to the Cartesian distance estimation method IP2GEO [109]. The network distance is estimated between the network vectors of different hosts in different zones for each of the landmarks and the network vector of the requesting node. The network distance in terms of the RTT metric between two arbitrary hosts a and b is estimated from their network vectors as shown by the equation below.
$$\mathrm{Dist} = \left\{ (RTT_{a1} - RTT_{b1})^2 + (RTT_{a2} - RTT_{b2})^2 + \ldots + (RTT_{an} - RTT_{bn})^2 \right\} \tag{4-5}$$
[Figure 4.3: diagram with elements labelled Source RON node, Destination RON node, Landmark, RON nodes, nodes maintained as detour set, Default Underlay Path, Overlay Paths (one-hop indirection via nodes in the detour set), and the detour labels MIN, MIDWAY and MAX.]
Figure 4.3 Finding topologically diverse detours for underlay destinations.
where $RTT_{an}$ is the round trip time of node a in milliseconds from landmark n. Selecting relay hosts using the binning heuristic ensures that overlay connectivity is maintained and that the average routing latency on the overlay is low [110].
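The zone assignment and detour set request described in this section can be sketched as follows. This is a simplified stand-in, not the binning technique of [110]: it assumes a host's zone is the landmark with the smallest RTT, that the Euclidean norm is used for the network distance (the square root does not change the ordering of hosts), and that the closest hosts are preferred within each group. All names and RTT values are hypothetical.

```python
import math

def network_distance(vec_a, vec_b):
    """Distance between two landmark RTT vectors (cf. Equation 4-5).

    Assumption: Euclidean norm over the n landmark RTTs.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))

def nearest_landmark(vector):
    """Index of the landmark with the smallest RTT, used here as the host's zone."""
    return min(range(len(vector)), key=lambda k: vector[k])

def build_detour_set(requester, vectors, total, intra):
    """Return `intra` intra-zone hosts plus `total - intra` inter-zone hosts.

    Hypothetical rule: within each group the hosts closest to the requester
    in the coordinate space are returned first.
    """
    zone = nearest_landmark(vectors[requester])
    others = sorted(
        (h for h in vectors if h != requester),
        key=lambda h: network_distance(vectors[requester], vectors[h]),
    )
    same_zone = [h for h in others if nearest_landmark(vectors[h]) == zone]
    other_zones = [h for h in others if nearest_landmark(vectors[h]) != zone]
    return same_zone[:intra] + other_zones[:total - intra]

# Made-up RTT vectors (milliseconds) to three landmarks.
vectors = {
    "s": [10, 80, 90],
    "p": [12, 85, 88],
    "q": [15, 70, 95],
    "r": [90, 12, 60],
    "u": [85, 95, 9],
}
print(build_detour_set("s", vectors, total=3, intra=2))  # ['p', 'q', 'r']
```

Here T = 3 and x = 2, so the requester receives two hosts from its own zone and one from another zone.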
4.6 Online Path Selection-Dynamic Path Monitoring
To achieve scalability we propose using both online path probing and offline path selection heuristics. By using a small detour set, the path monitoring overheads are reduced from $O(N^2)$ for the whole overlay to $O(Nd)$, i.e. a constant overhead of d monitored paths per overlay node, where d is the average node degree (the size of the detour set). Some additional overheads, such as churn in the overlay network (hosts joining and departing), also need to be catered for, since churn requires reformation of zones, redistribution of hosts to landmarks and updating of detour sets. However, such events are not very common, the associated overhead is very low, and there are established scalable gossip protocols for this [36, 65]. We propose scalable offline mechanisms to find alternate paths where we do not need to address the actual composition of the original underlay path suffering from a performance failure event. Once a peer obtains its detour set from the landmark, it probes this detour set only. Instead of probing aggressively it may use a randomized probing scheme, e.g. monitoring paths that are more prone to changes (degradations) than others. As we highlighted earlier, the motivation of our design is scalability at Internet proportions. Unlike mechanisms such as [4], the randomized probing scheme does not require that overlay hosts probe aggressively for detection of path outages and failover. If a peer from the detour set is deemed to have failed for a considerable interval, a new peer can eventually be requested from the landmark. However, in the simulations we do not implement any repairs to the detour set of the overlay hosts and investigate only the static resilience of the overlay using only the live members of the detour set. This assumption is reasonable in a real deployment of DG-RON with non-aggressive probing epochs.
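To make the scale of this reduction concrete, the short sketch below counts monitored paths for a full mesh and for fixed-size detour sets. It assumes that each ordered pair of hosts counts as one monitored path; the function names are illustrative, and the values N = 139 and d = 12 are taken from the evaluation in Section 4.8.

```python
def full_mesh_paths(n_hosts):
    """Monitored paths when every host probes every other host: N(N-1)."""
    return n_hosts * (n_hosts - 1)

def detour_set_paths(n_hosts, detour_set_size):
    """Monitored paths when every host probes only its detour set: N*d."""
    return n_hosts * detour_set_size

n_hosts, detour_set_size = 139, 12
print(full_mesh_paths(n_hosts))                    # 19182
print(detour_set_paths(n_hosts, detour_set_size))  # 1668
```

Under these assumptions the per-node cost stays at d probes regardless of N, which is the constant overhead referred to above.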
The landmark-based decentralized architecture eliminates the need for any information flooding in the network, as required in link-state protocols, making the design scalable for large overlay networks. Online link probing techniques such as those used by [4] are still required to measure dynamic performance; however, such overheads are significantly reduced owing to the distributed architecture. Overlay links between an overlay peer and hosts in its detour set are monitored for performance characteristics such as latency, throughput and loss rates. Note that we only probe overlay links to candidate detours; [32] shows that predicting good detouring nodes can yield acceptable upper bounds for end-to-end path metrics. We conjecture that the underlying reason for this is the small probability that many Internet links on spatially diverse paths undergo congestion at similar times. Moreover, unlike [32], we can combine disjointness criteria (discussed next) with absolute performance merits to optimize the selection of candidate detouring nodes. To improve scalability further, we propose that probing could be replaced by passive monitoring of traffic traversing overlay links between a peer and its detour set, which would improve dynamic estimation of path performance without introducing any probing traffic and the associated overheads. Techniques for both active network probing and passive traffic monitoring have been studied in the past, e.g. [4].
4.7 Offline Path Selection-Landmark Based Heuristics
Several papers [4, 16, 32] showed that in most cases a performance failure can be bypassed using single-hop indirection via an overlay node. We use the Maximum Divergence Heuristic to find such one-hop detours, in which the peer chooses the next hop based on the Cartesian distance [108, 111-112] of the destination from the eligible next-hop candidate relay hosts. The underlying idea is similar to the Earliest Divergence Rule (EDR) [6], which aims to select a path which diverges from the default path near the source and converges near the destination in order to avoid a failed or congested link. However, EDR assumes the availability of complete AS path information between the source-destination pair and candidate alternate paths. This is sometimes challenging, as it requires accurate information from non-overlay components, e.g. routers. Our architecture relies only on end-to-end path metrics, which require only the cooperation of the overlay hosts. Our divergence criterion is to select, with good probability, an alternate path that diverges from the defunct portion of the default path near the location of the failure, e.g. a congested link. Eligibility of such overlay paths may further be based on underlying network characteristics, e.g. loss rates, latency or throughput, obtained through monitoring (as explained in the next section). We need to capture the entire spectrum of disjoint paths possible from amongst the detour set overlay hosts.

The first heuristic we use in searching for such divergent paths is MAX, in which we choose an overlay peer which has the maximum network distance from the destination. The underlying reason for using MAX is to select, with high probability, an overlay peer which leads to a topologically diverse alternate path to reach the destination. We also search for alternate paths using MIN, where we use overlay hosts close to the destination as detours. The rationale for this rule, in contrast to the MAX rule, is the observation that many paths in the Internet violate the triangle inequality due to routing policies [113]. Thus, it is also possible to find a disjoint path using a peer in proximity to the destination. Instead of choosing the detours based on their distance from the destination we could similarly use their distances from the source, since the underlying idea is to exploit the whole spectrum of available disjoint paths. We refer to the heuristic where we choose an overlay peer roughly midway between the source and destination as MIDWAY. Figure 4.3 shows the underlying idea in the selection of detours. There may be other landmark-based heuristics which we have neglected here and which may work better than the ones presented; our main objective is to investigate whether such schemes can select disjoint paths when the cause and location of the path failure on the primary path are not known in advance. The generic algorithm for offline detour selection is presented in Figure 4.4.

Input: Network coordinates for source S, destination D and detour set T = {T_1, T_2, ..., T_n}
Output: Candidate alternate paths for destination D from source S
Algorithm: (Find_candidate_alternate_paths)
Define: Cost = Distance(ν(A), ν(B)) (where ν(A) and ν(B) are network vectors for any arbitrary hosts A and B in the network coordinate space)
for i = 1 to |T|
    Cost_D(i) = Distance(ν(D), ν(T_i)), Cost_S(i) = Distance(ν(S), ν(T_i))
endfor
A = arg min_i Cost_D(i), ∀i ∈ 1, 2, ..., |T|
B = arg max_i Cost_D(i), ∀i ∈ 1, 2, ..., |T|
C = arg min_i |1 − Cost_S(i) / Cost_D(i)|, ∀i ∈ 1, 2, ..., |T|
If MIN, NextHop = T_A; if MAX, NextHop = T_B; if MIDWAY, NextHop = T_C
Figure 4.4 Offline Detour Selection based on Maximum Divergence Principle.

The offline heuristics for selecting topologically diverse detours only require that the destination be mapped into the network co-ordinate space. This mapping can easily be managed by the landmarks for popular destinations to which DG-RON clients have subscribed. For unfamiliar destinations the landmarks could extrapolate the approximate co-ordinate vector using vectors of other hosts from its nearest landmark, optionally utilizing the services of a third party, e.g. WHOIS servers. Only the knowledge of the destination IP address is required for both and should suffice to find the overlay-based detour. Such information could also be cached as frequent requests to
popular destinations are made, so that the peer can incrementally learn about these. Next we describe three offline schemes for selection of an overlay ‘detour’ node once the position of the destination has been determined in the co-ordinate space. The offline methods based on network co-ordinates (discussed in the previous section) can embed only latency, not failure or congestion information, and thus may not adapt well for dynamic performance estimation on alternate paths. Thus, to supplement the offline path selection process, online path monitoring is necessary in DG-RON.
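The selection procedure of Figure 4.4 can be sketched in Python as follows. This is an illustrative reading of the pseudocode rather than the simulator used in this chapter: it assumes Euclidean distance between network vectors, represents the detour set as a dictionary from host name to vector, and skips the MIDWAY ratio for any relay whose distance to the destination is zero. The coordinates in the example are made up.

```python
import math

def distance(vec_a, vec_b):
    """Network distance between two coordinate vectors (assumed Euclidean)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))

def candidate_next_hops(source, destination, detour_set):
    """Return the relay chosen by each of the MIN, MAX and MIDWAY heuristics.

    `detour_set` maps a relay name to its network vector (hypothetical layout).
    """
    cost_d = {t: distance(destination, v) for t, v in detour_set.items()}
    cost_s = {t: distance(source, v) for t, v in detour_set.items()}

    def midway_score(t):
        # |1 - Cost_S / Cost_D| is smallest when the relay is equidistant
        # from source and destination; guard against division by zero.
        return abs(1 - cost_s[t] / cost_d[t]) if cost_d[t] > 0 else math.inf

    return {
        "MIN": min(cost_d, key=cost_d.get),     # closest to the destination
        "MAX": max(cost_d, key=cost_d.get),     # farthest from the destination
        "MIDWAY": min(cost_d, key=midway_score),
    }

# Toy two-dimensional coordinates.
source, destination = [0.0, 0.0], [10.0, 0.0]
detour_set = {
    "near_dst": [9.0, 0.0],
    "far": [-5.0, 0.0],
    "mid": [5.0, 1.0],
}
print(candidate_next_hops(source, destination, detour_set))
# {'MIN': 'near_dst', 'MAX': 'far', 'MIDWAY': 'mid'}
```

Each heuristic returns one candidate, so a peer can try up to three one-hop detours per destination without any knowledge of where the failure lies on the default path.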
4.8 Performance Evaluation
We use trace-based simulation driven by real-world Internet datasets to validate the DG-RON architecture presented in this chapter. We investigate the performance benefits of DG-RON for finding QoS enhanced paths. For this study, we use measurement data between 146 and 133 AMP hosts (mainly from North and South America) from two 24-hr datasets [114] obtained in 2006, on June 30 and August 31 respectively. The details of these datasets were explained earlier in Chapter 3. We deliberately avoid the RIPE measurement data here because of the small number of hosts for which we collected data (bulk dataset download is not available), and because the purpose here is to investigate the performance of a proposed architecture that aims to enable RONs to scale beyond 50 hosts [4]. We let the AMP networks behave as a virtual RON. The landmarks (n = 7) are selected by choosing 7 AMP hosts which are topologically diverse, to enable good network distance estimation, and which have delay measurements to all other AMP hosts. Accurate distance estimation is not the goal here; the goal is only to predict network distance with sufficient accuracy for selecting topologically diverse detours. We cluster AMP-HPC hosts into 7 geographical regions: North, North East, North West, South, South East, South West and Central US. One landmark is chosen randomly from each of these 7 clusters, so that the landmarks are spread throughout the continental United States. Hosts forming part of the AMP-International network are not connected as a full mesh with all other AMP hosts; these are deliberately excluded from being chosen as landmarks. Selection of detour-set hosts and computation of Cartesian distance are done exactly as explained before (Section 4.5), but using the RTT measurements from the trace files; the only differences are: (1) the overlay comprises all the nodes in the AMP datasets, e.g. N = 139 (146 - 7) hosts for AMP-146-30/Jun/2006; (2) we pick a third of the detour set hosts from the short distance (intra-zone) overlay hosts, another third from the long distance (inter-zone) overlay hosts, and the remaining third randomly from the set of overlay hosts which were responsible for alleviating the majority of the total failures (as discussed in Section 4.3).
Path failures are defined as given in Equation 3-1 (Chapter 3). We pick k = 3 to identify performance failures and k = 10 to identify path outages, as before. Because of the way we define failure, instead of observing only the fraction of underlay failures successfully masked by both schemes, we use the delay gain metric (Section 4.3) to quantify the delay reduction obtained when using the alternate paths in the DG-RON architecture.
4.8.1 Impact of Detour Set Size
Figure 4.5 compares the delay gain of the best possible path in the RON with that of the best path selected from amongst the detour set of a DG-RON node, as the detour set size is varied. For both datasets, when there are path degradations (e.g. on the 19321 (= 139*139) possible paths for AMP-146), there is at least one QoS optimized indirect (one-hop overlay) path in the RON. Using only a carefully selected detour set, as we outlined earlier, each overlay node can find a QoS optimized path for all path outages and performance failures encountered. The results are more impressive for path outages than for performance failures. As explained in Section 3.2, one-hop overlay paths normally have delays much larger than direct Internet paths. If the magnitude of path degradation is larger (path outage), the number of one-hop overlay paths which provide better delay increases. Consequently, even a small detour set of 6 overlay nodes can provide exceptionally good delay gains (Figure 4.5a): a delay gain of 40% or more for 90% of path outages. Figure 4.5b shows that at least 12 detouring options are required to be able to select a path providing delay gains of 40% or more when direct Internet paths suffer from performance failures. As the detour set size is further increased to 48, the additional performance gains are marginal.
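The comparison plotted in Figure 4.5 amounts to evaluating the same delay gain metric over two candidate sets: all relays (RON) and only the relays in a node's detour set (DG-RON). A minimal sketch, with hypothetical names and made-up delays in milliseconds:

```python
def best_delay_gain(direct_delay, overlay_delays, candidates=None):
    """Largest Equation 4-1 delay gain over the allowed relays.

    With candidates=None every relay is considered (full RON); otherwise
    only the relays in the given detour set are considered.
    """
    relays = overlay_delays if candidates is None else candidates
    return max((direct_delay - overlay_delays[z]) / direct_delay for z in relays)

overlay_delays = {"a": 90.0, "b": 120.0, "c": 160.0, "d": 210.0}
ron_gain = best_delay_gain(200.0, overlay_delays)
dgron_gain = best_delay_gain(200.0, overlay_delays, candidates=["b", "c"])
print(ron_gain, dgron_gain)  # 0.55 0.4
```

The gap between the two values shrinks as the detour set grows or is chosen more carefully, which is the effect the detour set size experiment measures.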
[Figure 4.5: two panels plotting Delay Gain (%) (y-axis, 0 to 100) against Path Outages (%) (top panel) and Performance Failures (%) (bottom panel), each from 0 to 100 on the x-axis, with curves for RON, DGRON (|T|=6), DGRON (|T|=12) and DGRON (|T|=48).]
Figure 4.5 Delay Gain Comparison between DGRON and RON with variation in detour set size. (AMP-146-30/Jun/2006 (top) and AMP-133-31/Aug/2006 (bottom).)
4.8.2 Evaluation of Offline Path Heuristics
We also evaluate the efficacy of our offline heuristics. We select three overlay relay hosts using each heuristic: MAX, MIN and MIDWAY. We compare the characteristics of the best-of-three paths, i.e. the three paths selected using each of MAX, MIN and MIDWAY after sorting based on distances. We first measure the physical path stretch on the QoS enhanced one-hop overlay paths selected by each of the offline path selection heuristics.

$$\text{Path Stretch} = \frac{\text{Router level hops (one-hop overlay path selected by offline path heuristic)}}{\text{Router level hops (direct Internet path)}} \tag{4-6}$$

We find that MAX may look for longer paths, with an average path stretch of 2.2 compared to 1.8 for MIN and MIDWAY, for a detour set size of 12 (Table 4-1). Note that physically longer one-hop overlay paths can still provide lower-delay alternate paths if the direct path is suffering from congestion, i.e. a violation of the triangle inequality [60]. We also evaluate the delay benefits obtained on paths selected by the offline path heuristics using Equation 4-1, where the delay of the one-hop overlay path is that through the overlay host selected by the offline heuristic. MAX accounts for finding about 45-60% of QoS enhanced paths for performance failures and path outages, respectively (Table 4-2). MIN and MIDWAY are most efficient in finding good QoS optimized paths, with substantially higher delay gains than MAX, accounting for finding approximately 75-99% of QoS enhanced paths for performance failures and path outages, respectively. These QoS enhanced paths provide delay gains of 40% or higher in all cases. This shows that landmark-based heuristics can aid in the selection of disjoint alternate paths and thus filter good paths from bad ones. In situations where monitoring of all paths is not desirable or feasible due to scalability issues, such heuristics can predict alternate path availability with a very high probability.
Table 4-1. Path stretch incurred by selecting overlay paths based on offline path heuristics (|T|=12). MDW denotes MIDWAY.

Heuristic    Path Stretch    Standard Deviation
MAX          2.18            1.53
MIN          1.78            0.50
MDW          1.84            0.56

Table 4-2. Average performance of offline path heuristics in masking failures (|T|=12). Path outages for AMP-146-30/June/2006 and performance failures for AMP-133-31/Aug/2006.

Heuristic    Failure Type            Average Delay Gain (%)    Percentage Failures Masked
MAX          Performance Failures    41.86                     43.72
MAX          Path Outages            57.13                     59.15
MIN          Performance Failures    46.11                     89.37
MIN          Path Outages            61.15                     95.31
MDW          Performance Failures    40.29                     74.32
MDW          Path Outages            57.60                     99.66

4.8.3 Comparison with SPAD
To investigate the effectiveness of the landmark-based heuristics used in the construction of DG-RON for selecting geographically diverse detours, we compare DG-RON with SPAD [115] (Super-Peer based Alternate Path Discovery). Several related works [8, 21] investigate lowering path monitoring overheads by monitoring a small number of paths and predicting performance on the unmonitored paths, thereby still emulating RON. Very few works, e.g. SPAD, consider the problem of selecting a subset of peers for finding QoS enhanced paths using a landmark-based distributed architecture similar to DG-RON. To emulate SPAD, we follow a scheme similar to that used by the authors of [115]. A new overlay host contacts a super-peer (nearest landmark) for bootstrapping, which gives it a list of 50 candidate hosts (selected randomly from all overlay hosts). From these the new overlay host selects the 12 overlay hosts which are closest to it in terms of RTT. This is done based on minimum network distance in the network coordinate space. To compare DG-RON with SPAD, we compare the performance of the best path from the detour set of each whenever a path outage or performance failure occurred. From Figure 4.6 it is evident that DG-RON can find paths with better delay gains than SPAD, owing to its selection of more geographically diverse detouring options, for both path outages and performance failures.
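The SPAD emulation described above can be sketched as follows. The function name, its parameters and the toy distance function are assumptions made for illustration; only the 50 random candidates and the 12 closest hosts come from the text.

```python
import random

def spad_detour_set(new_host, hosts, distance, offered=50, kept=12, rng=random):
    """Emulate SPAD bootstrapping for one new overlay host.

    The super-peer offers `offered` hosts chosen uniformly at random, and the
    new host keeps the `kept` closest ones according to `distance`, a callable
    returning the network distance between two hosts (hypothetical interface).
    """
    pool = [h for h in hosts if h != new_host]
    candidates = rng.sample(pool, min(offered, len(pool)))
    return sorted(candidates, key=lambda h: distance(new_host, h))[:kept]

# Toy overlay: hosts are integers and distance is their absolute difference.
hosts = list(range(100))
toy_distance = lambda a, b: abs(a - b)
detours = spad_detour_set(0, hosts, toy_distance, rng=random.Random(1))
print(detours)
```

Because every kept host is among the closest of the offered candidates, the resulting detour set tends to be clustered around the new host, in contrast to the mix of intra-zone and inter-zone hosts used by DG-RON.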
[Figure 4.6: two panels plotting Delay Gain (%) (y-axis, 0 to 100) against Path Outages (%) (top panel) and Performance Failures (%) (bottom panel), each from 0 to 100 on the x-axis, with curves for RON, DGRON and SPAD.]
Figure 4.6 Delay Gain Comparison between DGRON and SPAD (|T|=12). (AMP-146-30/Jun/2006 (top) and AMP-133-31/Aug/2006 (bottom).)
4.9 Discussion
The simulation results presented in the previous section reveal that landmark-based offline path searching methods can work well in power-law topologies such as the Internet, and can supplement, or reduce the overheads of, aggressive online path selection algorithms. The results show that there is ample opportunity for finding alternate paths even if overlay hosts are not connected in a full mesh. Considering that performance failures are short-duration events, making it highly unlikely for a large fraction of links to undergo congestion or suffer from other performance degradations at the same time, DG-RON can predict good alternate paths among candidate hosts in the detour set. The proposed design for offline path selection does have some obvious caveats, the most glaring of which is that the path exploration could incur some delay in alternate path discovery. We argue that, to achieve scalability, this problem is unavoidable. BGP has taught us that scalability is only achieved by marching through all possible alternate paths after detection of a failure. The landmark-based architecture can effectively predict the availability of good alternate paths.
4.10 Conclusion
As the Internet continues to grow, so does the diversity of the connectivity between hosts. In this chapter we presented the first contribution of this thesis, investigating the possibility of a globally scalable RON service for discovering the infrastructural redundancy and robustness potentially present in the Internet. RON unnecessarily searches through a large path exploration space, and the overheads associated with aggressive path monitoring pose scalability issues. To address this issue, several previous works [6, 24] have focused on topology-aware heuristics in overlay construction and link monitoring, which make it possible to both monitor and select alternate paths using distributed approaches. Our work is similar to such approaches in that we aim to lower path monitoring overheads and reduce the candidate path exploration space. In addition, our work presents a platform for harnessing the findings of previous literature [6, 16].
5 DISJOINT PATH SELECTION IN OVERLAY NETWORKS USING TOR GRAPHS
5.1 Introduction
In Chapter 4 we highlighted the fact that path diversity in the Internet and overlay networks exists at both the IP and AS levels. IP level paths inside ASes are totally under the domain of the AS. However, tapping into AS level path diversity can also allow us to exploit the IP level path diversity. This chapter presents the second contribution of this thesis, namely the selection of maximally disjoint alternate paths at the AS level by using Type-of-Relationship (ToR) graphs [116]. We again validate our findings using real-world Internet data from the Active Measurement Project (AMP) [2] to quantify the benefits of choosing paths that are disjoint in terms of the ASes they traverse. First, Section 5.2 briefly describes ToR-graphs. In Section 5.3 we present a greedy approach for finding maximal AS-disjoint overlay paths. In Section 5.4 we evaluate the performance of this approach using real-world Internet data. Section 5.5 summarizes the key findings of the study.
5.2 ToR (Type-of-Relationship) Graphs
The Internet is composed of a large number of autonomous networks (ASes). Each AS is independently administered. To be routed from one host to another, a packet must pass via several different ASes. ASes can be characterized into two broad categories, transit ASes and stub ASes (Figure 5.1). Stub ASes are located on the edges of the Internet and typically have few connections to neighboring ASes (usually one, perhaps a few if multi-homed), whereas transit ASes usually have more connections to neighboring ASes. Each sub-network learns about global reachability to different hosts in the network by exchanging route advertisements with immediate neighbors. Gao [63] first showed that within the ‘generic’ transit-stub architecture, three dominant types of commercial relationships occur between ASes, namely customer-provider (C-P), peer-peer (P-P) and sibling-sibling (S-S). Customers depend on their respective provider networks for connectivity (the providers acting as a transit for them), usually in exchange for a fee. Peers (and siblings) are networks which are similar in scope and can exchange traffic (destined for each other’s customers) between each other without a fee, for mutual benefit. C-P, P-P and S-S relationships are all relative, in that a particular AS can have different relationships with different adjacent ASes. Gao [63] found that the percentages of C-P, P-P and S-S relationships are roughly 90.5%, 8% and 1.5% respectively.
[Figure 5.1: diagram with elements labelled Source, Destination, Stub-AS, Transit-AS and Core.]
Figure 5.1 Network layer paths between source-destination at AS level topology.

Gao [63] also showed that the Internet uses “valley-free” paths between hosts, which are defined by policies. The term “valley-free” refers to the hierarchy formed by customer-provider relationships between ASes (as explained below). All ASes are classified into five tiers, with each level of tiers numbered and lower numbers denoting higher tiers (more central ASes). Tier-1 includes ASes belonging to global ISPs and Tier-5 includes ASes from local ISPs. Intuitively, a customer AS belongs to a higher-numbered tier than its provider. ASes with a C-P relationship should ideally be on different tiers, though in practice it is not always possible to create a consistent model of AS relationships which achieves this simple structure. Traffic is permitted to pass up the hierarchy from customers to their providers (i.e. from higher-numbered to lower-numbered tiers) at the source end of the path, but can only pass down the hierarchy (i.e. from lower-numbered to higher-numbered tiers) in order to approach the destination; a provider cannot use one of its customers to connect to another provider, since that would form a valley. This favors the commercial relationships between providers and customers so as to: (a) maximize the provider profit; and (b) avoid routing loops. Figure 5.2 shows some examples of valid valley-free paths. Formally, let $\mathrm{Tier}(AS_i)$ denote the tier number of $AS_i$; then an AS path $(AS_0, AS_1, \ldots, AS_n)$ is said to be valley-free iff there exist $i, j$ ($0 \le i \le j \le n$) satisfying:

$$\mathrm{Tier}(AS_0) \ge \ldots \ge \mathrm{Tier}(AS_{i-1}) > \mathrm{Tier}(AS_i) = \ldots = \mathrm{Tier}(AS_j) < \mathrm{Tier}(AS_{j+1}) \le \ldots \le \mathrm{Tier}(AS_n). \tag{5-1}$$
The maximal uphill path is then $(AS_0, AS_1, \ldots, AS_i)$ and the maximal downhill path is $(AS_j, AS_{j+1}, \ldots, AS_n)$. The AS(es) in the highest tier $(AS_i, \ldots, AS_j)$ are called top AS(es). Type-of-Relationship (ToR) graphs [116-117] show the customer/provider/sibling relationship between adjacent ASes, using directed edges for C-P relationships (directed from customer to provider, Figure 5.2) and undirected edges for P-P and S-S relationships [63]. For consistency, and without loss of generality, P-P and S-S relationships can be represented by two directed edges by introducing a virtual-provider node in between them [5]. We adopt this technique to map P-P and S-S edges in the ToR-graph (Figure 5.2). Note that the ToR graph only depicts whether ASes are connected and (if so) their relationship (C-P, P-P or S-S); it does not depict any performance metrics of the connection, such as delay.
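The tier condition of Equation 5-1 can be checked with a single pass over the tier numbers along a path: the sequence must be non-increasing up to the top AS(es) and non-decreasing afterwards. The sketch below is illustrative only; the function name and the list-of-integers representation are assumptions.

```python
def is_valley_free(tiers):
    """Check Equation 5-1 for a path given as a list of tier numbers.

    Tier 1 is the top of the hierarchy, so an uphill segment has
    non-increasing tier numbers and a downhill segment non-decreasing ones.
    An empty path is treated as trivially valley-free (an assumption).
    """
    if not tiers:
        return True
    i, last = 0, len(tiers) - 1
    # Uphill part (including the plateau of top ASes).
    while i < last and tiers[i + 1] <= tiers[i]:
        i += 1
    # Downhill part.
    while i < last and tiers[i + 1] >= tiers[i]:
        i += 1
    # Valley-free only if both phases together consumed the whole path.
    return i == last

print(is_valley_free([3, 2, 1, 1, 2, 3]))  # True: up, across the top, down
print(is_valley_free([3, 2, 3, 2]))        # False: climbs again after descending
```

The second path fails because, after passing down through a tier-3 AS, it would need a customer to provide transit back up, which is exactly the valley the policy forbids.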
C-P, P-P and S-S relationships are never explicitly revealed because of commercial agreements. By accessing BGP-advertised routes (BGP dumps), one can obtain AS paths (described above), which help in inferring the type of relationship between adjacent AS pairs using simple intuitive rules specified by the valley-free routing model. For example, previous works [63, 117-118] use simple rules to identify valley-free paths as those having either (a) an uphill path, a P-P edge, and a downhill path, in that order; or (b) an uphill path and a downhill path, in that order (Figure 5.2). Existing research finds that intuitive approaches like the Earliest Divergence Rule [6] can help in finding disjoint paths using the knowledge of AS paths between hosts (through trace-routes). We find that only by mapping such AS information into a ToR-graph can we use more elegant algorithms for the computation of AS-disjoint paths, which can give a non-negligible improvement over such approaches.
5.3 Maximally-Disjoint Path Computation Using a Greedy Approach

5.3.1 Finding Valley-Free Edge-Disjoint Paths
To bypass a failure affecting a path, we need an alternate path which is physically disjoint from this primary path. Given a ToR-graph $G = (V, E)$ and two hosts $s$ and $t$, disjoint paths between $s$ and $t$ can be either vertex-disjoint or edge-disjoint. Our focus is on computing edge-disjoint valley-free paths, since this problem has been shown to be solvable in polynomial time, while the corresponding vertex-disjoint variant is NP-hard [116]. Our main purpose is to identify ASes not used on the shortest valley-free paths (selected by BGP) and thus explore alternate disjoint paths. Finding edge-disjoint paths in graphs is a well-known problem and the focus of several previous works [119-120]. Computing all edge-disjoint paths between all possible pairs of vertices in a graph is an NP-complete problem [8]. However, if we are only interested in computing edge-disjoint paths between two hosts $s$ and $t$, then the problem becomes tractable [8].

To search for valley-free edge-disjoint paths in a ToR-graph, Erlebach et al. [116] proposed a two-layer graph $H$, constructed from a ToR-graph $G = (V, E)$ and $s, t \in V$ (see Figure 5.4). $H$ is a directed graph obtained by making two copies of the original graph $G$, called the lower and upper layers. In the upper layer all edge directions are reversed. Every node $v$ in the lower layer is connected by $n$ parallel artificial edges to the corresponding copy of that node, denoted by $v'$, in the upper layer. These edges are directed from $v$ to $v'$.

The justification of Erlebach et al.'s two-layer graph is as follows, and comes from the previously stated view of valid valley-free paths as being the concatenation of a set of forward edges (uphill path) and a subsequent set of backward edges (downhill path). A valid path $p = v_1, \dots, v_r$ in $G$ with $v_1 = s$ and $v_r = t$ is equivalent to a path in the directed graph $H$ in the following way. The forward part of $p$, i.e. all edges $(v_i, v_{i+1}) \in p$ that are directed from $v_i$ to $v_{i+1}$, is routed in the lower layer. Then there is a possible switch to the upper layer (there can be at most one such switch, enforced by the directed artificial links between $G$ and its reverse). The backward part of $p$ is routed in the upper layer (see Figure 5.3). The $n$ parallel artificial edges of type $(v, v')$ going from each node of the lower layer to its corresponding copy in the upper layer have been added to $H$ so as to ensure that an arbitrary number of paths arising from edge-disjoint paths in $G$ can switch from the lower layer to the upper layer.

Figure 5.2 Example of valid and invalid valley-free paths in ToR-graphs [63, 118]. (Panel labels: valid, valid, invalid (valley); annotations: maximal uphill path, maximal downhill path, Tier-1 to Tier-3; legend: Customer, Provider.)

Figure 5.3 (Top) Example of valid valley-free path in the original ToR-graph (G). Dotted lines show concatenation of a set of C-P (forward) and P-C (backward) edges forming a valley-free s-t path. (Bottom) Relaxation using the two-layer model consisting only of forward edges.

The two-layer graph has twice the number of vertices and edges (excluding edges between the layers) compared to the original ToR-graph. This may lead one to believe that the cardinality of the solution could be up to twice that of the optimal solution, i.e. that the model gives only a 2-approximation. However, Erlebach et al. [116] show that the two-layer model yields an optimal solution to finding the maximum number of valley-free edge-disjoint paths. We mention the proof briefly in this dissertation and refer the reader to [116] for the detailed proof. Assume two edge-disjoint paths $p_1$ and $p_2$, and an edge-cut comprising a forward edge $e$ and its backward copy $e'$ (Figure 5.4). Since $e$ and $e'$ form the edge-cut, their removal should disconnect $s$ from $t$, leaving no valley-free path between them. However, if we remove $e$ and $e'$, there is still a valley-free path using the forward edges of path $p_1$ from $s$ to $u$, and backward edges from $u$ to $t$; this contradiction concludes the proof.
Figure 5.4 Optimal solution to the Edge-Disjoint Path problem in the Two-Layer ToR-graph. (Figure labels: G (layer 1), Rev G (layer 2), nodes s, t, u, v, paths p1 and p2, edges e and e'.)
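A minimal sketch of the two-layer construction described above, assuming the ToR-graph is given as a list of (customer, provider) pairs in which P-P and S-S edges have already been mapped through virtual-provider nodes. The names `build_two_layer` and `max_flow`, the node labels `('lo', v)` and `('up', v)`, and the default of one artificial edge per ToR edge are assumptions for illustration and are not taken from [116]. The flow value is computed with unit augmenting paths found by breadth-first search; following the argument above, it is expected to equal the number of edge-disjoint valley-free s-t paths.

```python
from collections import defaultdict, deque


def build_two_layer(cp_edges, n_parallel=None):
    """Return residual capacities of H for a list of (customer, provider) edges."""
    if n_parallel is None:
        # Assumption: |E| artificial edges per node is enough, because there
        # cannot be more edge-disjoint paths than edges.
        n_parallel = len(cp_edges)
    cap = defaultdict(lambda: defaultdict(int))
    nodes = set()
    for customer, provider in cp_edges:
        nodes.update((customer, provider))
        cap[('lo', customer)][('lo', provider)] += 1  # forward edge, lower layer
        cap[('up', provider)][('up', customer)] += 1  # reversed copy, upper layer
    for v in nodes:
        cap[('lo', v)][('up', v)] += n_parallel       # artificial edges v -> v'
    return cap


def max_flow(cap, source, sink):
    """Count unit augmenting paths from source to sink (modifies cap in place)."""
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        v = sink
        while parent[v] is not None:  # push one unit back along the path found
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1


# Toy ToR-graph: s and t are both customers of providers a and b.
edges = [('s', 'a'), ('t', 'a'), ('s', 'b'), ('t', 'b')]
print(max_flow(build_two_layer(edges), ('lo', 's'), ('up', 't')))  # 2
```

The source is taken in the lower layer and the sink in the upper layer, so a purely uphill path reaches the sink through the artificial edge at $t$.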
5.3.2 Finding Maximally-Disjoint Valley-Free Paths
To identify maximally-disjoint valley-free paths between any two hosts using the ToR-graph, we use a greedy approach. The aim of the greedy approach is to identify paths passing through ASes not used by the default Internet path, aiding the selection of disjoint overlay paths. The greedy approach finds the shortest valley-free path between hosts (in each iteration) by initiating an expanding-ring search around the source node towards the target node. Since the Internet selects shortest valley-free paths (dictated by routing policies) between hosts, by eliminating shortest paths first, the path found in the last iteration is most likely to be maximally disjoint from the primary path and identifies ASes not used on the direct path. One point of concern is that selecting overlay paths based on ASes on the most disjoint valley-free path will select more circuitous paths. However, this is not the case, as the ToR-graph is constructed using only the Customer-Provider relationships between ASes which are sighted on paths between overlay hosts. Consequently, the number of disjoint paths between any two hosts in the ToR-graph is not very large (two to three) (Section 5.4.2, Figure 5.6).

Computing the shortest path in the AS graph to approximate the shortest Internet path is a challenging problem, as argued by [19]. This is due to two facts: the Internet sometimes does not select shortest paths because of BGP policies, and there may be more than one shortest path with the same number of AS hops. However, these issues can be resolved as suggested in [19] by using additional criteria, such as making use of the fact that AS-paths are transitive and that 70% of AS-paths are symmetric. Since in this dissertation the ToR-graph is constructed using only AS-paths between overlay end-hosts instead of reading BGP dumps, the ToR-graph is sparse and hence the number of paths between any pair of hosts is not large. Also, note that the aim of the greedy approach is not to predict the shortest path between hosts likely to be used by the Internet but, on the contrary, only to identify the ASes on the most-disjoint valley-free path.

We briefly formalize our technique for searching for edge-disjoint valley-free paths between source-destination hosts. Given a directed ToR-graph $G = (V, E)$ (where $v \in V$, $e \in E$) and two hosts $s$ (the source) and $t$ (the destination), the search algorithm starts out with an empty solution set $S$ and in each subsequent iteration finds the shortest available path between $s$ and $t$. Once a path $p_x$ is found, it is added to $S$, the edges used in the current path are deleted, and the process is repeated on the remaining graph until no further $s$-$t$ path can be found. The path found in the last iteration is taken as the candidate path which is maximally disjoint from the primary (direct) path between hosts.

The time complexity of the implementation of this greedy approach follows that of finding the maximum number of edge-disjoint paths between any two given hosts $s$ and $t$ in a graph $G = (V, E)$ through the Max-flow/Min-cut algorithm [17] and is $O(|E| \times |V|)$, where $|E|$ is the total number of edges and $|V|$ the total number of vertices in the graph. To quantify $|E|$ and $|V|$, we assume an overlay network with $N$ hosts; the number $P$ of AS-paths between hosts is $N^2$. We also assume that the average number of ASes traversed on the AS-path between each pair of overlay hosts is $n$ (equivalently $n - 1$ AS hops); $n$ is a small number, typically three to seven, since most end-hosts are within three to five AS hops of the so-called Tier-1 ISPs in the core of the network (Figure 3.5). The worst case arises when all $N^2$ AS-paths between overlay hosts are completely vertex-disjoint (except at the terminal hosts); since $n$ is a small constant, both $|V|$ and $|E|$ are then $O(P)$ and the time complexity is $O(P^2) = O(N^4)$. In practice, it is much less because of the power-law model of the Internet [18], which shows sparse connectivity for a large number of hosts in the Internet; only about 1-2% of hosts are well connected at the AS level. Chen et al. [21] show that the number of paths $k$ which can be used to monitor the quality of all $N^2$ paths in an $N$-host overlay network is $O(N \lg N)$. Thus, the worst-case time complexity of the greedy approach for finding a maximally-disjoint alternate path between a source-destination pair becomes $O(k^2)$, where $k = N \log N$. We consider this topic further in Chapter 6.
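The greedy search formalized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this dissertation: the ToR-graph is assumed to be a set of (customer, provider) edges with P-P and S-S edges already mapped through virtual-provider nodes, a plain breadth-first search stands in for the expanding-ring search, and the names `shortest_valley_free` and `greedy_disjoint_paths` are hypothetical.

```python
from collections import deque


def shortest_valley_free(cp_edges, s, t):
    """Breadth-first search over (node, phase) states; phase is 'up' or 'down'."""
    up, down = {}, {}
    for customer, provider in cp_edges:
        up.setdefault(customer, []).append(provider)
        down.setdefault(provider, []).append(customer)
    start = (s, 'up')
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, phase = state
        if node == t:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        # Uphill moves are allowed only before the first downhill move.
        moves = [(p, 'up') for p in up.get(node, [])] if phase == 'up' else []
        moves += [(c, 'down') for c in down.get(node, [])]
        for nxt in moves:
            if nxt not in parent:
                parent[nxt] = (node, phase)
                queue.append(nxt)
    return None


def greedy_disjoint_paths(cp_edges, s, t):
    """Repeatedly take the shortest valley-free path and delete its edges."""
    remaining = set(cp_edges)
    solution = []
    while True:
        path = shortest_valley_free(remaining, s, t)
        if path is None:
            return solution
        solution.append(path)
        for a, b in zip(path, path[1:]):
            remaining.discard((a, b))  # edge traversed uphill
            remaining.discard((b, a))  # edge traversed downhill


tor = {('s', 'a'), ('t', 'a'), ('s', 'b'), ('b', 'c'), ('t', 'c')}
paths = greedy_disjoint_paths(tor, 's', 't')
print(paths[-1])  # candidate path from the last iteration: ['s', 'b', 'c', 't']
```

In this toy graph the first iteration removes the two-hop path through `a`, so the last path found is the longer one through `b` and `c`, which shares no AS with the first path other than the end hosts.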
AS-path information can also be obtained by reading BGP dumps [20]. This is a strong motivation for the approach we propose here, since we do not want to trade one type of overhead (probing) for another (trace-routes). As this information is already distributed by routers in the network, obtaining it does not introduce additional traffic. Also, such AS-path information only needs to be updated at infrequent intervals, since the majority of Internet paths are stable [51].
5.3.3 Comparison with Earliest Divergence Rule (EDR)
Fei et al. [6] showed that an Earliest Divergence Rule (EDR) (Chapter 2, Figure 2.5) can work well: from a list of potential alternate paths from the source to the destination, it selects the alternate path which diverges from the default path at the earliest point, nearest the source. This technique assumes availability of AS-level path information (from source overlay hosts to detouring overlay hosts). To show how finding maximally-disjoint paths by using ToR-graphs can yield better performance than EDR, we use anecdotal evidence from one of the datasets (AMP-146-30/Jun/2006). This Internet dataset has been described in detail in Chapter 3. Here we consider the direct path and the 120 possible one-hop overlay paths between two AMP monitors installed at the two extreme ends of the continental US: amp-ucb (at the University of California, Berkeley) and amp-uvm (at the University of Vermont). The direct AS-level path between amp-ucb (Src) and amp-uvm (Dst) is:

25 2152 2914 19548 19094 1351

This path has an average delay of 123 ms. Using the ToR-graph, we find two disjoint (at AS level) paths between amp-ucb and amp-uvm:

a.) 25 2152 3356 19094 1351
b.) 25 2153 11537 10578 1351

Note that the two paths are of equal length in this case, i.e. five ASes each. Also, the direct AS-level path is longer than both of the disjoint paths found by the greedy approach. We present this case specifically to show that even when the underlying assumption that the Internet uses the shortest path is not met, a greedy strategy can still work. If we use the EDR in selecting a one-hop overlay path, we would normally go for paths diverging at the second AS, i.e. paths using AS 2153 instead of AS 2152, which is used in the direct Internet path. However, this turns out to be a poor filter, as only 13 paths go through AS 2152 at the second AS hop and the remaining 107 go through AS 2153 at the second AS hop. If, however, we further distinguish amongst paths based on the second disjoint path shown above and retain only the paths which go through ASes 11537 and 10578, the candidate path set is reduced from 107 to 7. Since these paths are disjoint from the direct path, there is a very high probability that the proportion of good paths among them is higher than with EDR, where we tend to choose almost all one-hop overlay paths. For example, the one-hop overlay path between amp-ucb and amp-uvm via amp-mit (at MIT) is one of these 7 paths. The average delay between amp-ucb and amp-uvm through amp-mit is 127 ms, just 4 ms greater than the (shorter) direct path delay. Thus, it can be expected to provide a good backup path should the direct path become congested.
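To make the divergence criterion concrete, the short sketch below computes the position at which an AS path first departs from the direct path. It is applied, purely for illustration, to the three AS paths listed above; the function name `divergence_index` is an assumption for this example and is not taken from [6].

```python
def divergence_index(direct, alternate):
    """Index of the first AS at which the alternate path differs from the direct path."""
    for k, (a, b) in enumerate(zip(direct, alternate)):
        if a != b:
            return k
    # One path is a prefix of the other: no divergence within the common length.
    return min(len(direct), len(alternate))


direct = [25, 2152, 2914, 19548, 19094, 1351]
path_a = [25, 2152, 3356, 19094, 1351]
path_b = [25, 2153, 11537, 10578, 1351]

print(divergence_index(direct, path_a))  # 2: diverges at the third AS
print(divergence_index(direct, path_b))  # 1: diverges at the second AS

# ASes on path b that the direct path does not use.
print(sorted(set(path_b) - set(direct)))  # [2153, 10578, 11537]
```

Under EDR, a lower index is preferred, so paths sharing the prefix of path b would be favored. The set difference in the last line is the kind of additional information the ToR-graph approach uses to narrow the candidates further.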
5.4 Performance Evaluation

5.4.1 Methodology used to construct ToR-graph
For this study, we use path and delay measurements collected between AMP [2] hosts. This Internet dataset has been described in detail in Chapter 3. While the aim of an overlay network may only be to optimize the one-way delay, which may differ between directions due to asymmetric Internet paths, two-way delay measurements, such as RTTs, have been shown [121] to be strongly correlated (with a correlation coefficient of 0.87) with one-way delays, and so form a reasonable basis for inferring one-way delays. To construct the ToR-graph for the AMP dataset, we first identify all ASes used by paths between all possible pairs of AMP hosts in the AMP-146-30/Jun/2006 and AMP-133-31/Aug/2006 virtual RONs. Note that we used the trace-route information between hosts for the purpose of this study, but it is also possible to obtain this information by reading BGP dumps as explained earlier; the only requirement is to have a reasonably good number of vantage points. AS-paths not found by this method can also be deduced indirectly using the fact that AS-paths are transitive [19].

We identified a total of 4400 unique IP addresses from the IP trace-route information. Only a small fraction (7%) of total paths had incomplete or partially complete trace-routes in the dataset. The next step was to map these IP addresses to AS numbers, for which we used the IP-to-ASN Whois Service from Cymru [122], which can provide mappings for user-specified dates using the GNU netcat utility [123]. Using the results from this service we identified a total of 275 unique ASNs. The RIPE dataset records paths at both the IP and AS level. We identified a total of 118 unique ASNs for the RIPE dataset. To find the relationships between these ASes (C-P, P-P, or S-S), we used the AS-relationships data from CAIDA [106], which is based on RouteViews [124]. We obtained the AS-relationship data from dates close enough to match the datasets: for AMP, the AS-relationship data used was obtained on 5th June 2006; for RIPE, it was obtained on 2nd August 2007. To construct the ToR-graph, we identified all observed AS pairs in the AS-level paths between AMP hosts, and mapped edges between them based on C-P, P-P and S-S relationships. We used a similar procedure when computing the ToR-graphs for the RIPE dataset, except that the extra IP-to-AS mapping step is not needed because the dataset records paths at the AS level as well as the IP level.

One important source of concern is the accuracy of the Customer-Provider (and Peering) relationships inferred from [106] as used in the ToR-graph. The methodology to obtain customer-provider and peering relationships is based on collecting AS-level paths through looking-glass servers recording BGP path advertisements, and assigning customer, provider and peering/sibling relationships to adjacent AS pairs so as to minimize anomalous paths (paths that violate the valley-free routing principle), as shown by Gao et al. [63] and Battista et al. [117]. However, we note that our ToR-graphs are very sparse; they are constructed using customer-provider and peering relationships between only 275 ASes for AMP and 118 ASes for RIPE. This limits the scope for such errors.
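The edge-mapping step described above can be sketched as follows. This is a simplified illustration under stated assumptions: AS-level paths are lists of AS identifiers, the relationship table is a plain dictionary with the hypothetical labels `'c2p'` (first AS is a customer of the second), `'p2p'` and `'s2s'`, and P-P and S-S pairs are attached to a virtual-provider node as in Section 5.2. It does not reproduce the file format of the data in [106].

```python
def build_tor_edges(as_paths, rel):
    """Return a set of directed (customer, provider) edges for the observed AS pairs."""
    edges = set()
    for path in as_paths:
        for a, b in zip(path, path[1:]):
            if (a, b) in rel:
                kind = rel[(a, b)]
            elif (b, a) in rel:
                kind = rel[(b, a)]
                a, b = b, a  # reorder to match the table entry
            else:
                continue     # relationship unknown: the pair is skipped in this sketch
            if kind == 'c2p':
                edges.add((a, b))
            else:
                # 'p2p' or 's2s': both ASes become customers of a virtual provider.
                virtual = ('virt',) + tuple(sorted((a, b)))
                edges.add((a, virtual))
                edges.add((b, virtual))
    return edges


# Hypothetical AS identifiers and relationships, for illustration only.
paths = [['A', 'B', 'C'], ['C', 'B', 'D']]
rel = {('A', 'B'): 'c2p', ('B', 'C'): 'p2p', ('D', 'B'): 'c2p'}
for edge in sorted(build_tor_edges(paths, rel), key=str):
    print(edge)
```

Only AS pairs actually observed on paths between overlay hosts are added, which is what keeps the resulting graph sparse.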
5.4.2 Network layer path characteristics inferred from ToR-graph
Since we use a heuristic approach for finding maximally-disjoint overlay paths, we first look at the AMP and RIPE data to evaluate the effectiveness of our proposed techniques. Chiefly, we are interested in network layer path characteristics between AMP and RIPE hosts, such as the impact of routing policies on path inflation and path diversity, using only the data that can be inferred from the ToR-graph. To see the impact of routing policies on path inflation, i.e. to see whether shortest paths were selected more often than not, we measured path inflation on direct paths. We compute the shortest paths between AMP and RIPE hosts in the ToR-graph and compare them with the actual number of AS hops on the direct path using the trace-route information from the dataset. We find that the majority of paths between AMP hosts (53%) and RIPE hosts (58%) were shortest-possible AS paths. Only 27% of AMP paths and 31% of RIPE paths were inflated by one AS hop (Figure 5.5).

We also measure the total number of edge-disjoint paths found per source-destination pair (Figure 5.6). Around 60% of AMP host pairs and RIPE host pairs have two or more edge-disjoint paths. Note that these figures are very conservative estimates, given that about 10% of the source-destination pairs of the AMP dataset and 20% of the source-destination pairs of the RIPE dataset do not have complete trace-routes and so may have more than one edge-disjoint path. The ToR-graph may have some missing peering links or erroneous customer-provider links, as discussed in the previous section. Consequently, our results for path inflation and number of disjoint paths between source-destination AS pairs may be slightly skewed in certain cases. For example, some source-destination pairs may have shorter paths than those indicated due to missing peering or customer-provider links. Likewise, some source-destination pairs may have more disjoint paths than those identified. However, we reiterate that the scope for such errors is limited by the sparse nature of the ToR-graphs, which are formed with customer-provider and peering relationships between only 275 ASes for AMP and 118 ASes for RIPE.
Figure 5.5 Path inflation between (a) AMP and (b) RIPE hosts (AS-hops). (Panels: AMP-146-30/Jun/2006 and RIPE-40-05/Sep/2007; x-axis: Path Inflation (AS hops); y-axis: % Total number of Paths.)

Figure 5.6 Number of disjoint paths between (a) AMP (top) and (b) RIPE hosts using ToR-graph. (Panels: AMP-146-30/Jun/2006 and RIPE-40-05/Sep/2007; x-axis: No. of Disjoint Paths, with a separate bar for Incomplete Traceroutes; y-axis: Percentage source-destination pairs.)
5.4.3 Performance-Evaluation of the Greedy-Approach
Selection of Alternate Paths

The greedy approach selects alternate paths between source-destination pairs by ranking them on the basis of their degree of disjointness from direct paths. For this, we first use the trace-route information on all possible one-hop indirect paths and compare the number of ASes which are common between the indirect path and the candidate path selected by our algorithm.

We define the degree of disjointness ($\sigma_n$) of the $n$th overlay path as the ratio of the number of ASes that are common to the candidate valley-free disjoint path computed by the greedy approach (cdp) and the $n$th overlay path, to the number of ASes on the $n$th overlay path. We use this degree of disjointness to rank overlay paths. Thus, given the candidate disjoint path (cdp) between two AMP hosts ($s$ and $d$) selected by the greedy approach using the ToR-graph, as the set $AS_{cdp} = [AS_s\ AS_w\ AS_x\ AS_y \dots AS_d]$, and the corresponding $n$th one-hop indirect path between the same host pair as another set $AS^{n}_{1\text{-}hop} = [AS_s\ AS_p\ AS_q\ AS_r \dots AS_d]$, the degree-of-disjointness coefficient ($\sigma_n$) is given by (5-2):

$$\sigma_n = \frac{|AS^{n}_{1\text{-}hop} \cap AS_{cdp}|}{|AS^{n}_{1\text{-}hop}|} \tag{5-2}$$

where $|X|$ denotes the number of elements in a set $X$.
An alternate path $n$ is selected by the greedy approach if its degree of disjointness is greater than or equal to some threshold value $\sigma$, i.e. the $n$th alternate path is selected if $\sigma_n \ge \sigma$. (Note that $\sigma$ used here is different from $\sigma$ in Equation 3-1.) We found that most $\sigma_n$ values were in the range 0.2-0.7. An interesting observation is that if there is only one edge-disjoint path in the ToR-graph between a given source-destination pair (Figure 5.6), the greedy approach may actually choose the shortest path (if the direct path is also not inflated) as opposed to a more circuitous disjoint path; the greedy approach will then select less circuitous indirect paths with better delay characteristics. Note that this does not invalidate the effectiveness of the greedy approach, since even selecting a shorter path between AMP hosts can still yield a path that is disjoint from the primary route if the direct path is inflated (Figure 5.5); if the direct path is not inflated then it will admit almost all overlay paths. Interestingly, we found that for such source-destination pairs showing little or no path diversity, even an intuitive strategy like the EDR [6] was unable to select a small number of candidate alternate paths, because a large number of alternate paths diverged at the same AS hop. In such situations, [6] proposed selecting paths based on additional path-performance criteria such as delay constraints. The focus of this chapter is not to investigate such criteria; the performance is evaluated strictly under the disjointness criteria mentioned previously.
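The ranking in (5-2) and the threshold test can be illustrated with a few lines of code. This is a sketch under stated assumptions: the candidate disjoint path is path (b) from Section 5.3.3, while the two one-hop overlay AS paths, their names `via-x` and `via-y`, and AS 64500 (a number reserved for documentation) are hypothetical and do not come from the datasets.

```python
def degree_of_disjointness(one_hop_path, cdp):
    """Equation (5-2): share of the one-hop path's ASes that also lie on the cdp."""
    one_hop = set(one_hop_path)
    return len(one_hop & set(cdp)) / len(one_hop)


def select_paths(candidates, cdp, threshold):
    """Keep the overlay paths whose sigma_n is at least the threshold."""
    return [name for name, path in candidates.items()
            if degree_of_disjointness(path, cdp) >= threshold]


cdp = [25, 2153, 11537, 10578, 1351]             # path (b) from Section 5.3.3
candidates = {
    'via-x': [25, 2153, 11537, 10578, 1351],     # hypothetical: follows the cdp
    'via-y': [25, 2152, 2914, 64500, 1351],      # hypothetical: shares only the end ASes
}

for name, path in candidates.items():
    print(name, degree_of_disjointness(path, cdp))  # via-x 1.0, via-y 0.4
print(select_paths(candidates, cdp, threshold=0.5))  # ['via-x']
```

A higher value therefore means that the overlay path follows the candidate disjoint path more closely, and hence is more likely to avoid the ASes of the direct path.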
Delay Gain of Selected Paths

We designate a direct path as degraded using the definition of a path anomaly introduced in Section 3.5. We next carried out simulations to analyze the fault-tolerance properties of maximally-disjoint paths when the direct path undergoes an outage. For this final performance evaluation we consider only the AMP dataset because, as mentioned in Chapter 3, the RIPE datasets only provide routing vectors as an aggregate summary (number of times sighted between time intervals, etc.), so it is difficult to ascertain exactly which paths were being used between RIPE hosts at the specific time intervals when an anomaly occurs on a direct path. Knowing this path information is crucial for the framework described here. For all AMP hosts, we observed intervals when the path between them suffered from an outage or path degradation. We consider k = 10 to emulate outages and k = 3 to emulate performance failures, as before (Chapter 4). We investigate which indirect paths offer better performance during the entire period when the direct path is degraded by using the time-stamps in the RTT trace files in the AMP dataset [42].

We show the results in Figures 5.7 and 5.8. Figure 5.7 shows the reduction in the number of alternate paths selected and Figure 5.8 compares the delay gain metric (Chapter 3) of the greedy approach with that of EDR. The first interesting observation is that EDR was unable to find a better alternate path for 10% of the path outages and performance failures. This is because AS path information was not available between all pairs of AMP hosts, due to the asymmetric nature of path probing/measurements between some AMP-HPC and AMP-International hosts (Chapter 3). With a disjointness threshold of σ = 0.5, the greedy approach reduces the number of candidate paths selected compared to EDR for 60% of the degradations encountered (after subtracting the 10% of cases where no path is selected by either technique because of incomplete/unavailable AS path information).

These figures agree with our previous observations in Figure 5.5. We had observed that around 60-70% of the source-destination paths were the shortest paths, or inflated by at most one AS hop. Moreover, we also observed that around the same percentage of source-destination pairs had multiple (more than one) edge-disjoint paths in the ToR-graph. We plot the delay gain for the best path from amongst those selected using the greedy approach. For performance comparison, we also show the corresponding results for the EDR criterion [6], under which those alternate paths are considered whose AS paths separate from the direct path nearest to the source.
Figure 5.7 Number of candidate paths selected by greedy-approach for path outages and performance failures in the AMP-datasets: (a) AMP-146-30/Jun/06 (top) and (b) AMP-133-31/Aug/06. (Panels: Path Outages and Performance Failures; x-axis: No. of candidate paths selected; y-axis: CDF; series: Greedy, EDR.)

Figure 5.8 Delay gain of best path selected for path outages and performance failures in the AMP-datasets: (a) AMP-146-30/Jun/06 (top) and (b) AMP-133-31/Aug/06. (x-axes: Path Outages (%) and Performance Failures (%); y-axis: Delay gain (%); series: Best-Alternate-Path, Best-using-Greedy, Best-Using-EDR.)
Overall, we observe that selecting alternate indirect paths on the basis of AS disjointness not only reduces the number of potential choices drastically, from 144 to fewer than 20 (Figure 5.7) in a large majority of cases, but also finds the paths offering better delay gains in up to 90% of path outages and performance failures (Figure 5.8). Note that both techniques were unable to find an alternate path for around 10-15% of the outages and performance failures (Figure 5.7) because of the incomplete/unavailable AS information (Figure 5.6a). One interesting point worth noting based on the results of Figures 5.7 and 5.8 is that both EDR and the greedy approach are able to find paths offering better delay gains for path outages emulated for AMP-146-30/Jun/2006, indicated by the greater convexity of the curves in Figure 5.8a, but do not perform as well in finding paths for performance degradations (AMP-133-31/Aug/2006, Figure 5.8b). This is because both of these techniques tend to look for more disjoint, and hence more circuitous, paths, which may have higher delay than the degraded direct path if the magnitude of degradation is small. Still, we observe that the greedy approach can select a path with a performance very close to that of EDR for 90% of the performance degradations encountered, while selecting a smaller number of candidate paths.
Correlation of best-selected path with direct path

We also calculate the correlation of the path delays of the best path selected using the greedy approach with those of the direct path. We compute the correlation as:

$$\mathrm{CORR}(X, Y) = \frac{\mathrm{COV}(X, Y)}{\sqrt{\mathrm{VAR}(X)\,\mathrm{VAR}(Y)}} \tag{5-3}$$

where $X$ and $Y$ represent the random variables given by the path delays of the direct path and the best selected path respectively. For each pair of measured end hosts $a$ and $b$, we define $Z_{ab}(t)$ as the path delay between them at time $t$ and $Z_{acb}(t)$ as the delay of the path between $a$ and $b$ through an intermediate host $c$ at time $t$. If the total number of measurements is $K$, then we compute expected values as given below:

$$E[Z_{ab}] = \frac{1}{K} \sum_{t} Z_{ab}(t) \tag{5-4}$$

$$E[Z_{ab} Z_{acb}] = \frac{1}{K} \sum_{t} Z_{ab}(t)\, Z_{acb}(t) \tag{5-5}$$
Since delay measurements on the direct path and the selected alternate path may not be perfectly synchronized, the computation of correlation may have some error. However, the AMP datasets used have timestamps for each recorded value of delay between AMP hosts, so we discard samples which are not within a window of 25 seconds. Figure 5.9 shows the correlation of path-delay characteristics between the actual direct paths between AMP hosts undergoing degradation and the best alternate path selected using the greedy-approach-based path ranking. Results are shown for path outages in AMP-146-30/Jun/2006 and AMP-133-31/Aug/2006. As can be seen from the figure, around 20% of the alternate paths selected exhibit negative correlation with the path-delay characteristics of the direct path and 80% of the alternate paths show a correlation of less than 0.2. Only about 10% of the alternate paths exhibit a correlation of 0.6 and higher; this is due to the fact that some one-hop overlay paths inevitably share underlay links with the direct path.

Figure 5.9 Correlation of path-delay characteristics between direct-path and best-alternate-path selected using Greedy Path Selection (Path Outages for AMP-146-30/Jun/2006 and AMP-133-31/Aug/2006). (x-axis: % Degradations (Normalized); y-axis: Correlation, from -1 to 1; series: Jun-06, Aug-06.)
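The correlation computation in (5-3) to (5-5), together with the 25-second synchronization window, can be sketched as follows. This is an illustration under stated assumptions rather than the code used for Figure 5.9: each trace is assumed to be a time-sorted list of (timestamp in seconds, delay) tuples, every alternate-path sample is paired at most once, and the function names are hypothetical.

```python
import math


def pair_samples(direct, alternate, window=25.0):
    """Pair each direct-path sample with the first unused alternate-path sample in the window."""
    pairs = []
    j = 0
    for t, delay in direct:
        # Skip alternate-path samples that are too old for this timestamp.
        while j < len(alternate) and alternate[j][0] < t - window:
            j += 1
        if j < len(alternate) and abs(alternate[j][0] - t) <= window:
            pairs.append((delay, alternate[j][1]))
            j += 1
    return pairs


def correlation(pairs):
    """Equation (5-3), with expectations taken as sample means as in (5-4) and (5-5)."""
    k = len(pairs)
    e_x = sum(x for x, _ in pairs) / k
    e_y = sum(y for _, y in pairs) / k
    e_xy = sum(x * y for x, y in pairs) / k
    var_x = sum(x * x for x, _ in pairs) / k - e_x * e_x
    var_y = sum(y * y for _, y in pairs) / k - e_y * e_y
    # Raises ZeroDivisionError if either series is constant.
    return (e_xy - e_x * e_y) / math.sqrt(var_x * var_y)


# Hypothetical traces: (timestamp in seconds, delay in ms).
direct = [(0, 100.0), (60, 120.0), (120, 110.0), (180, 300.0)]
alternate = [(10, 130.0), (65, 128.0), (170, 131.0), (190, 129.0)]
pairs = pair_samples(direct, alternate)
print(len(pairs), round(correlation(pairs), 3))
```

In this example the direct-path sample at 120 s has no alternate-path sample within 25 seconds and is discarded, so three pairs remain.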
5.5 Chapter Summary

This chapter presented the second contribution of this thesis, the analysis of computing maximally-disjoint paths in overlay networks using ToR-graphs. Disjoint path computation can be used as an offline heuristic to supplement measurement-based approaches [4], which are not scalable, or for alternate indirect-path computation when the direct path between two hosts is affected by a performance failure or an outage. We proposed and analyzed the performance of a greedy approach for computing such disjoint paths using real-world Internet datasets. Our results show that such heuristics can be used to select alternate paths to bypass path outages or degradations.
PART III PATH MONITORING IN OVERLAY NETWORKS
6 ISSUES OF STATISTICAL PATH MONITORING IN OVERLAY NETWORKS

6.1 Introduction

The previous section of this dissertation discussed scalable architectures that exploit the network layer overlay topology for disjoint path selection, thus reducing or eliminating path monitoring overheads. However, disjoint path selection may not always succeed. Recalling from the previous chapter, EDR and greedy selection failed to select a better alternate path, when one was present, for about 10-15% of outages and performance failures. This is because the best path might not always be the maximally disjoint path. Path monitoring could be used as a fallback in these cases. Moreover, even when alternate paths are selected based on disjointness, this normally leads only to a smaller list of candidate paths (Chapter 5), among which the final selection has to be made on the basis of path monitoring methods. Path monitoring can help in meeting dynamic QoS demands better than merely ensuring path disjointness. For example, a longer and less congested disjoint overlay path between a source and destination host may still give higher delay than a shorter congested direct Internet path. Also, path disjointness may vary with time because the underlay network has a mechanism of its own to rectify problems in the Internet by switching over to alternate paths (even if it does so lazily). Consequently, overlay paths selected based on physical disjointness criteria could have already become congested due to the underlay network switching traffic from congested links to uncongested links available on the selected disjoint overlay path. Revisiting the problem, Andersen et al. [4] showed that when the direct path between two Internet hosts fails, an alternate path between them can be established using an overlay host whose direct paths to the source and destination host have not failed, owing to the spatial diversity of paths (Figure 1.1).
We emphasized the importance of overlay path monitoring in the previous paragraphs. To recap, we go through a simple example which highlights both the importance and the possibility of scalable path monitoring in overlay networks, as we will see later. An overlay can find good detours by aggressive path monitoring. This is necessary because an overlay link is a logical abstraction of multiple underlay links. Two overlay links may seem disjoint at the application layer, yet share a link in the underlying IP layer. The shared IP link renders both useless in the event of its failure. For example, consider the network in Figure 6.1(a). Assume that each link has unit weight and that shortest paths are selected between two nodes. If link $l$ fails, it disconnects source $S$ from destination $D$. It also renders both overlay hosts $R_1$ and $R_2$ useless for $S$ to reach $D$ using a single overlay hop, as $S$ needs $l$ to reach $R_2$ and $R_1$ needs it to reach $D$. In this case $S$ can only reach $D$ through $R_3$ or through the two hop overlay route $S \to R_1 \to R_3 \to D$. This requires that overlay hosts constantly monitor individual overlay links so that traffic can be detoured via an appropriate overlay node in the event of a failure in the underlay network. To be able to establish such alternate paths quickly, it is important to monitor all such possible indirect paths through probing. However, when the overlay network is large, probing generates excessive overhead [4]. Maintaining complete state about all overlay links requires, in the ideal case, that all $N$ hosts be connected as a logical mesh or clique (Figure 6.1(b)). Probing for end-to-end path metrics between overlay hosts and disseminating the results via a link state protocol then incurs maintenance overheads of $O(N^2)$. This poor scalability limits the size of deployed overlay networks. On the other hand, maintaining complete overlay state without knowledge of the topological diversity of individual overlay hosts may be counterintuitive when we consider that the locations of path and performance failures are not known a priori, are often correlated and vary on very small time scales. RON [4] aimed to bypass path failures using application specific metrics (e.g. throughput, loss rate, latency) and routing through any of the possible indirect overlay hosts, which are probed aggressively, incurring large overheads. Such path exploration techniques are not scalable above modest network sizes. Previous works [7, 125] showed that the large degree of underlay link sharing among paths enables an overlay to monitor only a carefully selected subset of the paths and then to statistically predict the path metrics of the remaining paths.
Figure 6.1 (a) (left) How overlay resilience depends on the topology of the underlay network (nodes S, D, R1, R2, R3 and link l). (b) Inferring maximum information about all virtual overlay links.
This chapter presents the third main contribution of this thesis, namely detecting and identifying the cause of statistical path prediction errors. First, Section 6.2 describes the related algebraic notation. Section 6.3 evaluates the degree of independence of paths in AMP and RIPE networks by determining the rank of their Routing Matrices (previously introduced in Section 1.1). In Section 6.4 we present the technique for monitoring a subset of paths and predicting the remaining path metrics using the Best Linear (BL) statistical prediction algorithm (proposed earlier [8]), and apply it to the RIPE and AMP routing matrices. We find that BL statistical path prediction can suffer from errors that are due to inconsistencies in routing matrices. In Section 6.5 we therefore review what causes these Routing Matrix Inconsistencies (RMI), quantify the extent of RMI in the RIPE and AMP datasets, and discover that RMI can be difficult to remove. Consequently, in Section 6.6 we introduce statistical prediction techniques that are robust against the effects of RMI. Section 6.7 reviews the practical improvement in anomaly prediction in the presence of RMI using our proposed technique. Section 6.8 summarizes the key findings of the chapter in a brief discussion. Section 6.9 concludes the chapter.
6.2 Algebraic Notation
We begin by establishing some relevant notation and definitions. Let $G = (\nu, \varepsilon)$ be a strongly connected directed graph, where the vertices in $\nu$ represent network devices (routers and end-hosts) and the edges in $\varepsilon$ represent links between those devices. Additionally, let $\rho$ be the set of all paths between end-hosts in the network (pre-determined by commercial Internet routing policies), and let $n_v = |\nu|$, $n_e = |\varepsilon|$ and $n_p = |\rho|$ denote, respectively, the number of devices, links, and paths. Many network path characteristics are additive over the constituent links of the path; e.g. a path delay can be represented as the sum of its constituent link delays $l_i$ (Figure 6.2).
Figure 6.2 Additive Network Metrics (path delay $P_d$ over links $l_1, l_2, l_3, \ldots, l_m$).
Path delay $P_d = \sum_{i=1}^{m} l_i$, where $l_i \in P$    (6-1)
Packet loss rates, on the other hand, are not additive but multiplicative in nature. If each constituent link $i$ on a path drops packets with probability $p_i$, then the probability $P_r$ with which packets are dropped on the path satisfies $1 - P_r = \prod_{i=1}^{m} (1 - p_i)$. However, such multiplicative metrics can be converted into additive metrics by taking logarithms on both sides, i.e.
$\lg(1 - P_r) = \sum_{i=1}^{m} \lg(1 - p_i)$.
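The logarithmic conversion can be checked numerically. The following is a minimal sketch; the per-link drop probabilities are hypothetical values chosen for illustration, and the natural logarithm is used because any base gives the same result.

```python
import math

def path_loss_from_links(link_loss):
    """Path drop probability obtained through the additive (logarithmic) form."""
    # log(1 - Pr) = sum_i log(1 - p_i): the per-link terms simply add up.
    log_delivery = sum(math.log(1.0 - p) for p in link_loss)
    return 1.0 - math.exp(log_delivery)

link_loss = [0.01, 0.02, 0.05]          # hypothetical per-link drop probabilities

# Direct multiplicative form: 1 - Pr = prod_i (1 - p_i)
multiplicative = 1.0 - math.prod(1.0 - p for p in link_loss)
additive = path_loss_from_links(link_loss)

print(f"multiplicative: {multiplicative:.5f}, additive: {additive:.5f}")
```

Both routes give the same path loss rate, which is what allows loss to be handled by the same linear algebra as delay.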
Other network characteristics are concave in nature; bandwidth is an example. The bandwidth available on a path is that of the bottleneck link, i.e. the link with the least bandwidth, and so cannot be expressed in the algebraic manner explained above. The statistical path estimation approaches outlined in this chapter are primarily concerned with additive network characteristics, where the sole objective is to predict end-to-end network characteristics while measuring only a subset of end-to-end paths. Non-additive network characteristics such as bandwidth require measurements at a finer granularity than end-to-end path measurements, which is outside the scope of this chapter. Other studies, e.g. iPlane [126], have developed techniques for estimating the bandwidth of a path using vantage points inside the network that measure link attributes by probing paths from the vantage points to intermediate routers in the network. If we use the vector $b \in \mathbb{R}^{n_e}$ to denote the measurement of a metric on each edge $j \in \varepsilon$ of the graph, then the vector $y \in \mathbb{R}^{n_p}$ of path measurements is given by:
$y = Mb$    (6-2)
where $M \in \{0,1\}^{n_p \times n_e}$ is a routing matrix in which $M_{i,j} = 1$ if path $i$ traverses link $j$, and $M_{i,j} = 0$ otherwise.
Figure 6.3 gives an example of a network and the corresponding routing matrix and measurement vectors. The measurements could be of any performance metric, such as delays or loss rates. The column (or row) rank of a matrix such as $M$ is the number of linearly independent columns (or rows) in that matrix. If one measures $r = \mathrm{Rank}(M)$ linearly independent paths, then the path metrics of the entire network can be determined exactly. Section 6.3 will show that the routing matrices for large Internet overlay networks are 'rank deficient', in the sense that their rank is smaller than either dimension of the matrix, i.e. $r < \min(n_p, n_e)$. For such networks, it is only necessary to measure as many paths as the rank of the routing matrix [7]. When limited resources force measurement of fewer than $r$ paths, the performance of the other paths can be estimated statistically to predefined tolerance levels [125].
[Figure: network with nodes A, B, C, links $l_1, l_2, l_3$ carrying link metrics $\beta_1, \beta_2, \beta_3$, and paths $y_1, y_2, y_3$, with
$M = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}$ (columns $l_1, l_2, l_3$), $D = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}$, $Y = (y_1, y_2, y_3)^T$, $b = (\beta_1, \beta_2, \beta_3)^T$, $Y = Mb$.]
Figure 6.3 Algebraic method of path monitoring
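The relation $y = Mb$ can be tried out on the three-path example of Figure 6.3. The sketch below uses the routing matrix from that figure; the link delays in `b` are hypothetical values introduced only for illustration.

```python
import numpy as np

# Routing matrix of Figure 6.3: rows are paths y1..y3, columns are links l1..l3.
M = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])

b = np.array([5.0, 2.0, 7.0])    # hypothetical link delays in ms (not from the thesis)
y = M @ b                         # Equation (6-2): path delays as sums of link delays

r = np.linalg.matrix_rank(M)      # number of linearly independent paths
D = M.T @ M                       # the matrix labelled D in Figure 6.3

# M is square and full rank here, so measuring all r paths recovers b exactly.
b_recovered = np.linalg.solve(M, y)

print("y =", y, "rank =", r)
print("D =\n", D)
```

This toy matrix has full rank; the routing matrices of real overlays are much larger and rank deficient, which is what makes monitoring only a subset of paths possible.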
Table 6-1 Dimensions and rank of AMP and RIPE routing matrices.

Dataset                Paths (np)   Links (ne)   Rank (r)   RD = log(min(np,ne)-r)
RIPE-40-05/Sep/2007    1499         2690         673        2.92
RIPE-30-05/Sep/2007    622          1693         385        2.37
AMP-50-30/Jun/2006     1700         1239         485        2.88
AMP-40-31/Aug/2006     935          812          350        2.66
AMP-30-30/Jun/2006     594          747          249        2.55

6.3 Routing matrices and Eigen Spectra of AMP and RIPE data sets
We use path and delay measurements collected between AMP and RIPE hosts. For estimating the routing matrix from traceroutes, we treat the virtual IP interface-pair links as real router-to-router links. The details have been described in Chapter 3. The datasets considered in this chapter were collected during three 24-hr periods on June 30 and August 31, 2006 (AMP) and September 5, 2007 (RIPE). RIPE uses (i) one way path delay values, owing to the provision of GPS synchronization in its hosts, whereas AMP uses RTT estimates for path delays, and (ii) dedicated software for estimation of the routing vectors (IP and AS level), whereas AMP relies on traceroute estimation. As a result, RIPE yields far superior results to the AMP datasets for the prediction of unmonitored path properties, which suggests that ordinary traceroutes may yield less than satisfactory results in computing a routing matrix, as we will see later.
6.3.1 Extent of rank-deficiency
Table 6-1 shows the dimensions of the routing matrices in terms of the number of paths and underlay links, together with their ranks. The Rank Deficiency (RD) of a routing matrix is defined as:
$RD = \log(\min(n_p, n_e) - r)$, where $r = \mathrm{Rank}(M)$    (6-3)
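As a quick check of Equation (6-3), the sketch below recomputes RD from the dimensions and ranks listed in Table 6-1. The base of the logarithm is not stated in the text; base 10 is assumed here because it reproduces the tabulated values to within 0.01.

```python
import math

def rank_deficiency(n_p, n_e, r):
    """Equation (6-3), with a base-10 logarithm assumed."""
    return math.log10(min(n_p, n_e) - r)

# dataset: (paths n_p, links n_e, rank r), copied from Table 6-1
table_6_1 = {
    "RIPE-40-05/Sep/2007": (1499, 2690, 673),
    "RIPE-30-05/Sep/2007": (622, 1693, 385),
    "AMP-50-30/Jun/2006": (1700, 1239, 485),
    "AMP-40-31/Aug/2006": (935, 812, 350),
    "AMP-30-30/Jun/2006": (594, 747, 249),
}

for name, (n_p, n_e, r) in table_6_1.items():
    print(f"{name}: RD = {rank_deficiency(n_p, n_e, r):.2f}")
```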
To get a feel for the extent to which the number of measured paths can be reduced below $r$, we can consider the eigen-spectrum of the routing matrix, which indicates the degree of linear dependence between the rows of a matrix. The eigen-spectrum is obtained through Singular Value Decomposition (SVD) of the matrix $D = M^T M$, and the spectra for the two Internet datasets AMP and RIPE are shown in Figure 6.4.
The diagonal elements of the matrix $D$ are precisely the numbers of paths routed over the respective links, referred to as the betweenness of the links. Likewise, the off-diagonal elements measure the number of paths routed simultaneously over pairs of links, referred to as the co-betweenness of the links. The co-betweenness $D_{i,j}$ of any two edges $i$ and $j$ is always bounded above by the smaller of the two edges' betweennesses, i.e. $D_{i,j} \le \min(D_{i,i}, D_{j,j})$. Chua et al. [8] showed that the behavior of the eigen-spectrum is related to the diagonal; the spectral decay of $M$ at worst parallels the edge betweenness in the graph $G$. The rapid decay of the spectrum shows the degree of non-trivial link sharing amongst paths; the knee occurs when only 1% of the rank $r$ paths have been included, and it is interesting to note that only 20-50% of the rank $r$ paths (note the log scale) suffice to draw meaningful inferences about the path metrics. Also note that the eigen-spectra of AMP networks show faster decay than those of similarly sized RIPE networks. Subsets of AMP and RIPE hosts are selected to make the comparison more meaningful, as discussed earlier in Section 3.1. This means that the amount of linear dependence amongst paths in AMP networks is greater than in RIPE networks. To further support this point, Figure 6.5 shows the degree of the ASes in the RIPE and AMP datasets, on a normalized scale to cater for the difference in the number of ASes in the two datasets. The AS degrees for AMP fall more sharply than those for RIPE, showing that there is more path sharing in AMP networks than in RIPE networks. As we see later, routing matrix inconsistencies can amplify the effects of statistical path prediction errors in AMP networks because of this greater degree of path sharing.
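The quantities discussed above can be computed directly from a routing matrix. The sketch below uses a small hypothetical routing matrix (not one of the AMP or RIPE matrices) to obtain the betweenness, check the co-betweenness bound, and form a normalized eigen-spectrum of $D = M^T M$.

```python
import numpy as np

# Hypothetical routing matrix: 4 paths (rows) over 3 links (columns).
M = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 1, 0]])

D = M.T @ M
betweenness = np.diag(D)          # number of paths routed over each link

# Co-betweenness bound: D[i, j] <= min(D[i, i], D[j, j]) for every pair of links.
n_e = D.shape[0]
bound_holds = all(D[i, j] <= min(D[i, i], D[j, j])
                  for i in range(n_e) for j in range(n_e))

# Eigen-spectrum of D in decreasing order, normalized by the largest eigenvalue.
eigvals = np.linalg.eigvalsh(D)[::-1]
spectrum = eigvals / eigvals[0]

print("betweenness:", betweenness, "bound holds:", bound_holds)
print("normalized spectrum:", spectrum)
```

The eigenvalues of $D$ are the squared singular values of $M$, so the same spectrum can be read off an SVD of $M$ itself.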
Figure 6.4 Eigen Spectra of AMP and RIPE Networks. Two panels (RIPE-30-05/Sep/2007 versus AMP-30-30/Jun/2006, and RIPE-40-05/Sep/2007 versus AMP-40-31/Aug/2006); x-axis: fraction of rank (log scale); y-axis: eigenvalues of M'M, normalized (log scale).
Figure 6.5 AS degree for RIPE and AMP networks (RIPE-40-05/Sep/2007 and AMP-40-31/Aug/2006). x-axis: ASes sorted according to degree (normalized); y-axis: AS degree.
6.4 Selecting a Subset of Paths for Monitoring and Predicting the Unmonitored Paths Using Best Linear Predictor
As described in the previous section, in order to completely infer network performance one needs to monitor the paths corresponding to the $r$ largest (i.e. all non-zero) singular values (the square roots of the eigenvalues). To save monitoring overheads, we can monitor a subset of $k$ of the rank $r$ paths ($k \le r$), corresponding to the $k$ largest singular values. From this subset of paths we can estimate the link metrics vector, from which we can in turn estimate the metrics of the remaining paths. Finding such a subset of paths is an NP-complete problem; however, approximation algorithms [8, 127] exist for selecting paths that approximate the $k$ largest singular dimensions. We use the same algorithm as [8], which is an adaptation of the subset selection algorithm to the case where the path metrics are sums of link metrics. Let $M$ denote the routing matrix and let $C$ be a factor of the link covariance matrix $\Sigma$, i.e. $CC^T = \Sigma$, which serves to assign higher weights to paths that are more variable. The algorithm first factorizes $MC \in \mathbb{R}^{n_p \times n_e}$ using SVD, which yields two orthogonal matrices $U$ and $V$:
$\mathrm{SVD}(MC) = USV^T$    (6-4)
where $U \in \mathbb{R}^{n_p \times n_p}$ and $V \in \mathbb{R}^{n_e \times n_e}$ are such that
$U^T (MC) V = S = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_p) \in \mathbb{R}^{n_p \times n_e}$, $p = \min(n_p, n_e)$ and $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_p \ge 0$.
The left singular vectors (i.e. the columns of $U = [u_1, u_2, \ldots, u_{n_p}]$) that correspond to non-zero singular values form an orthonormal basis for the range of $MC$, and the magnitudes of the corresponding singular values indicate their relative importance. Note that these singular values are the square roots of the eigenvalues of $(MC)^T MC$. The algorithm makes heuristic use of QR-factorization with column pivoting to find $k$ ($k \le r$) rows of $M$ that approximate the span of the first $k$ left singular vectors of $MC$:
$U_k^T P_k = QR$    (6-5)
where $U_k \in \mathbb{R}^{n_p \times k}$ is formed by the first $k$ columns of $U$, and $P_k \in \mathbb{R}^{n_p \times n_p}$ is the permutation matrix.
$M_s$ is then the submatrix formed by the first $k$ rows of $P_k^T M$. The complete algorithm is described in Algorithm 1. The generalized least squares (GLS) based estimate of the link metrics vector is used in the Best Linear (BL) prediction of unmonitored path delays, as in [8]. We use the following equation from [8] to obtain the estimated delays on unmonitored paths (see the Appendix for its derivation from the estimated link-metrics vector (A-7)).
$E(l_r^T y_r \mid y_s) = l_r^T V_{rs} V_{ss}^{-1} y_s$    (6-6)
where $l_r$ is a column vector that selects one particular unmonitored path, $V_{rs} = M_r \Sigma M_s^T$ is the covariance between the unmonitored and the monitored paths, and $V_{ss} = M_s \Sigma M_s^T$ is the covariance among the monitored paths. $\Sigma$ is the link covariance matrix.
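A minimal numerical sketch of the BL predictor of Equation (6-6) follows. It assumes an identity link covariance matrix by default, returns the predictions for all unmonitored paths at once (equivalent to applying each selector $l_r$ in turn), and uses a pseudo-inverse in place of the plain inverse as a defensive choice of this sketch. The routing rows and link delays are hypothetical.

```python
import numpy as np

def bl_predict(M_s, M_r, y_s, Sigma=None):
    """Best linear prediction of the unmonitored path metrics y_r from y_s."""
    Sigma = np.eye(M_s.shape[1]) if Sigma is None else Sigma
    V_rs = M_r @ Sigma @ M_s.T       # covariance between unmonitored and monitored paths
    V_ss = M_s @ Sigma @ M_s.T       # covariance among monitored paths
    # pinv instead of a plain inverse is a choice of this sketch, to tolerate a singular V_ss.
    return V_rs @ np.linalg.pinv(V_ss) @ y_s

b = np.array([5.0, 2.0, 7.0, 1.0])             # hypothetical link delays (ms)
M_s = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])   # monitored paths
M_r = np.array([[1, 1, 1, 1], [1, 0, 1, 0]])   # unmonitored paths
y_s = M_s @ b                                   # what the monitors would measure
y_r_hat = bl_predict(M_s, M_r, y_s)

print("predicted:", y_r_hat, "true:", M_r @ b)
```

The first unmonitored path is the sum of the two monitored rows and is predicted exactly (15 ms). The second is not in the row space of `M_s`, so its prediction (7.5 ms) is only an estimate of the true 12 ms.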
Algorithm 1 (Based on Algorithm 12.2.1 [127]). Given a path matrix $M \in \{0,1\}^{n_p \times n_e}$ and the corresponding path delay vector $y \in \mathbb{R}^{n_p}$, where $n_p$ and $n_e$ are respectively the number of paths and links in the network, the following algorithm computes a submatrix $M_s$ of the path matrix $M$ consisting of the $k$ rows that approximate the span of the first $k$ left singular vectors.
Compute the Singular Value Decomposition (SVD) of $MC$, where $CC^T = \Sigma$ ($\Sigma$ is the link covariance matrix):
  $\mathrm{SVD}(MC) = USV^T$ ($U$ and $V$ hold the left and right singular vectors, and $S$ is a diagonal matrix whose diagonal elements hold the singular values in sorted order.)
for k = 1:1:r
  Apply QR factorization with column pivoting to $U_k^T$, where $U_k = U(:, 1:k)$ (i.e. the first $k$ columns):
    $QR = U_k^T P_k$
  $M_{new} = P_k^T M$ and $y_{new} = P_k^T y$
  $M_s = M_{new}(1:k, :)$ and $M_r = M_{new}(k+1:n_p, :)$
  $y_s = y_{new}(1:k, :)$ and $y_r = y_{new}(k+1:n_p, :)$
  where $y_s$ and $M_s$ refer to the monitored paths and path matrix rows, and $y_r$ and $M_r$ refer to the unmonitored paths and path matrix rows
endfor
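One iteration of Algorithm 1, for a single value of $k$, can be sketched as follows. The function name `select_paths` and the example matrix are hypothetical. To stay dependency-free, the sketch replaces a library QR with column pivoting by a greedy Gram-Schmidt pivoting loop, which picks at each step the column of $U_k^T$ with the largest remaining norm; an identity factor $C$ is assumed unless one is supplied.

```python
import numpy as np

def select_paths(M, k, C=None):
    """Indices of k rows of M approximating the span of the first k left singular vectors."""
    n_e = M.shape[1]
    C = np.eye(n_e) if C is None else C            # identity link covariance factor by default
    U, _, _ = np.linalg.svd(M @ C, full_matrices=False)
    A = U[:, :k].T.copy()                           # U_k^T: one column per path
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(A, axis=0)
        j = int(np.argmax(norms))                   # pivot on the largest remaining column
        chosen.append(j)
        q = A[:, j] / norms[j]
        A -= np.outer(q, q @ A)                     # project the pivot direction out of all columns
    return chosen

# Hypothetical routing matrix; rows 1 and 3 (zero-based) are identical, so its rank is 3.
M = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [1, 1, 0]], dtype=float)

monitored = select_paths(M, k=3)
M_s = M[monitored]                                  # rows to monitor
print("monitored rows:", monitored, "rank of M_s:", np.linalg.matrix_rank(M_s))
```

Because the two duplicate rows contribute identical columns to $U_k^T$, choosing one removes the other from contention, so the selected rows remain linearly independent.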
Using only path information obtained from traceroutes, it is difficult to infer second order link characteristics such as link covariance or link correlation. Figure 6.6 presents the link correlation matrices for AMP-30, for all links exhibiting a correlation of 0.25 or more. Figure 6.6a shows the correlation matrix for intraAS links (with links inside one AS grouped together). Figure 6.6b shows the correlation between interAS links; apart from the main diagonal, where each element is one, links in different ASes and the interAS links (the off-diagonal elements) appear, erroneously, to be significantly correlated because of insufficient traceroute information. There is more correlation between intraAS links than between interAS links. The RIPE datasets report only routing vectors, so a similar analysis of RIPE is not possible. Thus, the performance of the BL predictor is evaluated under an identity link covariance matrix for both the AMP and RIPE datasets. Chua et al. [8] find that an identity link covariance matrix gives satisfactory results.
Figure 6.6 Problems in estimating second order link metrics from traceroutes; link correlation matrices (link i versus link j) for AMP-30-30/Jun/2006. (a) (top) intraAS links; (b) interAS links.
To quantify the accuracy of BL path prediction (Equation (6-6)), we use the L1 error metric, which is defined as:
$L_1\text{-error} = \dfrac{\| \text{actual delay vector} - \text{predicted delay vector} \|_1}{\| \text{actual delay vector} \|_1}$    (6-7)
where $\| \cdot \|_1$ represents the $l_1$-norm of a vector.
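The metric of Equation (6-7) is straightforward to compute. A minimal sketch, with hypothetical delay vectors:

```python
def l1_error(actual, predicted):
    """Relative l1 error between actual and predicted delay vectors, Equation (6-7)."""
    numerator = sum(abs(a - p) for a, p in zip(actual, predicted))
    denominator = sum(abs(a) for a in actual)
    return numerator / denominator

actual = [10.0, 20.0, 30.0]        # hypothetical measured path delays (ms)
predicted = [12.0, 18.0, 30.0]     # hypothetical predicted path delays (ms)

print(f"L1-error = {l1_error(actual, predicted):.4f}")
```

A perfect prediction gives an L1 error of zero, and the value is relative to the total measured delay, so it can be compared across datasets of different sizes.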
Figure 6.7 shows the L1 error for RIPE and AMP networks as the number of monitored paths is increased. While the L1 error for RIPE appears to be a monotonically decreasing function, AMP shows anomalous behavior: contrary to expectations, erratic spikes appear as the number of monitored paths is increased. This is due to errors in the estimation of the routing matrices for AMP networks, which we explain in detail in the next section.
Figure 6.7 L1 error for RIPE and AMP networks as a function of monitored paths. Two panels (RIPE-40-05/Sep/2007 and RIPE-30-05/Sep/2007; AMP-30-30/Jun/2006 and AMP-50-30/Jun/2006); x-axis: number of monitored paths; y-axis: L1-error.
Figure 6.8 Load balancing inside an AS.
6.5 Routing Matrix Inconsistencies
6.5.1 How does RMI occur?
Traceroute is the simplest and most common tool for inferring topological information about the network. However, it is also notorious for revealing inaccurate or even false information about the topology of the IP network, as found by the previous Internet topology mapping projects Skitter (now Ark) [52], RocketFuel [128] and Mercator [55]. Specialized probing combined with heuristics such as MaxDelta [55] and Maximum Likelihood Estimation [54] can resolve many of the topology mapping errors, but requires intensive network measurements. We also note that estimating a topology is a different problem from estimating a routing matrix, because mismapping even a few links can cause algebraic/statistical methods for path prediction to return large prediction errors, as we show later.
Consider two simple examples. In Figure 6.8, an AS with six routers employs load balancing. This AS forwards probes between two edge routers S and D using either the path SABD or the path SXYD, according to its internal routing policies, which are based on internal link congestion. In Figure 6.9, the traceroute infers the incorrect path SAYD. This is attributed to load balancing decisions by routers inside the AS: while the probes with TTL=1 and TTL=3 are sent on one path, the probe with TTL=2 is sent on a different path. This leads to the insertion of a false link AY in the routing matrix. Load balancing decisions are typically based on packet headers. Traceroute is known to modify the Destination Port field when sending UDP probes, and the Sequence Number field when sending ICMP Echo probes, so that it can match router responses with the probes that elicited them. Some newer routers, e.g. Juniper, allow up to 16 equal cost paths for load balancing inside ASes [129]. The path inference problem in the presence of load balancing routers is further exacerbated when traceroute sends multiple probes per hop; Augustin et al. [129] found that up to 79% of the paths were incorrectly inferred in their study due to the effects of multiple probing. We refer to all such issues as Routing Matrix Inconsistencies (RMI) in the remainder of this chapter. Figure 6.10 shows the frequency of path changes observed at 10 minute intervals over a 24 hr period in AMP networks. While most paths are stable, around 30 and 100 paths exhibit high variation for AMP-30 and AMP-50 respectively. Note that the AS level paths do not vary here; only the hops inside one (or more) of the ASes vary. This suggests that load balancing may be employed in some of the networks.
Figure 6.9 Incorrect path inference (inferred path S, A, Y, D): some links are missed while other false links are added.
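The way a false link enters the routing matrix can be reproduced with a toy simulation. The sketch below is illustrative only: the labels `upper` and `lower` for the two paths of Figure 6.8 are hypothetical, and the assignment of TTL values to paths follows the description in the text (TTL 1 and 3 on one path, TTL 2 on the other).

```python
def traceroute(paths, path_for_ttl, max_ttl):
    """Hop reported at each TTL when the probe with that TTL follows its assigned path."""
    return [paths[path_for_ttl[ttl]][ttl] for ttl in range(1, max_ttl + 1)]

def links(path):
    """Consecutive hop pairs of a path."""
    return list(zip(path, path[1:]))

paths = {"upper": ["S", "A", "B", "D"],    # path SABD of Figure 6.8
         "lower": ["S", "X", "Y", "D"]}    # path SXYD of Figure 6.8

# Load balancing as described in the text: TTL 1 and 3 on one path, TTL 2 on the other.
path_for_ttl = {1: "upper", 2: "lower", 3: "upper"}

inferred = ["S"] + traceroute(paths, path_for_ttl, max_ttl=3)
real_links = set(links(paths["upper"])) | set(links(paths["lower"]))
false_links = [link for link in links(inferred) if link not in real_links]

print("inferred path:", inferred)
print("false links:", false_links)
```

The inferred path is S, A, Y, D, and the link (A, Y) does not exist in either real path, which is the false link that would appear as a spurious column entry in the routing matrix.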
Figure 6.10 Frequency of path variation in AMP networks over 24 hr period (AMP-30-30/Jun/2006 and AMP-50-30/Jun/2006). x-axis: paths (log scale); y-axis: number of path variations.
Figure 6.11 shows anecdotal evidence of RMI from the AMP June dataset. Consider the first example, where an RMI can occur on the path between amp-upenn and amp-hawaii. Possibly due to some load balancing mechanism inside AS11537, the total number of hops decreases from 17 to 16. At the same time, the path delay decreases from 154 ms to 122 ms. This 32 ms decrease could be attributed to the selection of a lower delay path inside AS11537 or to a different egress point from AS11537 towards AS7575. Note that we could not ascertain the AS number of the IP hop 207.231.240.4, which may be a router inside either AS. This example illustrates a hazard of using path measurements to infer link measurements: if the path changes, the delay can change, which may lead to incorrect inference of link measurements until it is recognized that the path has changed (by the traceroute run every 10 minutes). It also shows a case of the diamond anomaly [129], caused by traceroute probes probing multiple paths between two routers inside a load balanced AS (Figure 6.9).
amp-upenn->amp-hawaii, Fri Jun 30 12:18:04 PDT 2006
(Hop)  (IP address)       (AS)     (delay1)      (delay2)      (delay3)
1      128.91.40.1        55       0.453 ms      0.348 ms      0.245 ms
2      128.91.240.37      55       0.447 ms      0.417 ms      0.416 ms
3      128.91.10.2        55       0.500 ms      0.555 ms      0.574 ms
4      128.91.9.1         55       0.526 ms      0.551 ms      0.480 ms
5      198.32.42.249      10466    0.711 ms      0.489 ms      0.604 ms
6      216.27.100.221     10466    0.754 ms      0.752 ms      0.762 ms
7      216.27.100.22      10466    2.918 ms      2.922 ms      2.884 ms
8      198.32.8.82        11537    27.168 ms     22.923 ms     23.003 ms
9      198.32.8.77        11537    26.731 ms     26.865 ms     26.859 ms
10     198.32.8.81        11537    36.031 ms     36.413 ms     39.260 ms
11     198.32.8.13        11537    51.321 ms     46.827 ms     46.684 ms
12     198.32.8.1         11537    71.453 ms     78.490 ms     71.323 ms
13     198.32.8.94        11537    83.078 ms     79.029 ms     78.806 ms
14     207.231.241.4      ?        103.852 ms    103.839 ms    103.754 ms
15     202.158.194.109    7575     154.747 ms    154.626 ms    154.645 ms
16     128.171.64.102     6360     154.646 ms    154.824 ms    154.580 ms
17     205.166.205.222    6360     154.480 ms    154.543 ms    154.520 ms

amp-upenn->amp-hawaii, Fri Jun 30 12:28:02 PDT 2006
(Hop)  (IP address)       (AS)     (delay1)      (delay2)      (delay3)
1      128.91.40.1        55       0.460 ms      0.343 ms      0.251 ms
2      128.91.240.37      55       0.588 ms      0.466 ms      0.629 ms
3      128.91.10.2        55       0.518 ms      0.684 ms      0.504 ms
4      128.91.9.1         55       0.596 ms      0.496 ms      0.900 ms
5      198.32.42.249      10466    0.658 ms      0.500 ms      0.518 ms
6      216.27.100.221     10466    0.687 ms      0.698 ms      0.804 ms
7      216.27.100.22      10466    2.933 ms      2.892 ms      2.947 ms
8      198.32.8.82        11537    23.459 ms     22.920 ms     35.553 ms
9      198.32.8.77        11537    35.045 ms     37.135 ms     26.826 ms
10     198.32.8.81        11537    35.984 ms     39.350 ms     36.221 ms
11     198.32.8.13        11537    48.276 ms     50.821 ms     46.609 ms
12     198.32.8.49        11537    72.286 ms     72.243 ms     72.234 ms
13     207.231.240.4      ?        72.282 ms     72.313 ms     72.383 ms
14     202.158.194.109    7575     123.036 ms    122.996 ms    123.036 ms
15     128.171.64.102     6360     140.257 ms    123.085 ms    123.163 ms
16     205.166.205.222    6360     122.984 ms    122.960 ms    122.990 ms
Figure 6.11 Adjusting path inside AS11537 causes significant delay reduction on path between amp-upenn and amp-hawaii
amp-fiu->amp-emory, Fri Jun 30 03:50:37 PDT 2006
(Hop)  (IP address)       (AS)     (delay1)     (delay2)     (delay3)
1      131.94.191.2       3681     0.428 ms     0.630 ms     0.271 ms
2      131.94.192.10      3681     0.496 ms     0.269 ms     0.715 ms
3      198.32.155.77      11096    0.775 ms     0.709 ms     0.705 ms
4      198.32.155.5       11096    7.689 ms     7.567 ms     7.571 ms
5      198.32.155.65      11096    7.700 ms     7.648 ms     7.598 ms
6      198.32.155.66      11096    13.719 ms    13.678 ms    13.744 ms
7      170.140.14.37      10490    13.893 ms    13.839 ms    13.827 ms
8      170.140.127.97     3591     13.980 ms    13.851 ms    13.838 ms

amp-fiu->amp-emory, Fri Jun 30 04:00:23 PDT 2006
(Hop)  (IP address)       (AS)     (delay1)     (delay2)     (delay3)
1      131.94.191.2       3681     0.436 ms     0.630 ms     0.271 ms
2      131.94.192.10      3681     0.466 ms     0.270 ms     0.437 ms
3      198.32.155.77      11096    1.049 ms     0.698 ms     0.684 ms
4      198.32.155.5       11096    7.742 ms     7.565 ms     7.598 ms
5      198.32.173.125     11096    7.781 ms     7.607 ms     7.594 ms
6      198.32.173.126     11096    14.330 ms    14.068 ms    14.078 ms
7      199.77.193.2       10490    13.700 ms    13.871 ms    13.692 ms
8      170.140.14.37      10490    13.887 ms    13.824 ms    13.970 ms
9      170.140.127.97     3591     13.855 ms    13.819 ms    13.798 ms
Figure 6.12 Load balancing inside AS11096 causes anomalous delay measurements at 6th and last hop on path between amp-fiu and amp-emory
Our second example, in Figure 6.12, shows a traceroute snippet for the path between amp-fiu and amp-emory. Here load balancing inside AS11096 introduces an anomalous measurement at the sixth hop, whose round-trip delay is greater than that to the seventh hop. This could be due to the case highlighted in Figure 6.9. Note that the two IP addresses 198.32.173.125 and 198.32.173.126 are contiguous and might have belonged to the same router, but the large difference in delay measurements between the two suggests otherwise. The third example (Figure 6.13) is a more classic case of dynamic load balancing inside AS11537, where the path through this AS differs within the same 10 minute window (12:20 to 12:30) used for conducting traceroute measurements. While several paths to amp-hawaii (from amp-bu, amp-upenn, amp-princeton, etc.) flipped to the newer paths inside AS11537 at differing times, and seemingly stayed that way for the remainder of the day according to the traceroute data (at 10 min intervals), the path between amp-nyu and amp-hawaii seemed to be immune to this change. Apparently the load balancing decision here incorporates some routing policy.
amp-bu -> amp-hawaii
Fri Jun 30 12:12:31: 1) 128.197.160.1 2) 128.197.254.161 3) 128.197.254.122 4) 192.5.89.201 5) 192.5.89.10 6) 198.32.8.82 7) 198.32.8.77 8) 198.32.8.81 9) 198.32.8.13 10) 198.32.8.1 11) 198.32.8.94 12) 207.231.241.4 13) 202.158.194.109 14) 128.171.64.102 15) 205.166.205.222
Fri Jun 30 12:22:26 and Fri Jun 30 12:32:30 (identical traces): 1) 128.197.160.1 2) 128.197.254.161 3) 128.197.254.122 4) 192.5.89.201 5) 192.5.89.10 6) 198.32.8.82 7) 198.32.8.77 8) 198.32.8.81 9) 198.32.8.13 10) 198.32.8.49 11) 207.231.240.4 12) 202.158.194.109 13) 128.171.64.102 14) 205.166.205.222

amp-princeton -> amp-hawaii
Fri Jun 30 12:13:41 and Fri Jun 30 12:23:53 (No change!): 1) 140.180.128.1 2) 128.112.12.6 3) 198.32.42.65 4) 216.27.100.22 5) 198.32.8.82 6) 198.32.8.77 7) 198.32.8.81 8) 198.32.8.13 9) 198.32.8.1 10) 198.32.8.94 11) 207.231.241.4 12) 202.158.194.109 13) 128.171.64.102 14) 205.166.205.222
Fri Jun 30 12:33:44: 1) 140.180.128.1 2) 128.112.12.6 3) 198.32.42.65 4) 216.27.100.22 5) 198.32.8.82 6) 198.32.8.77 7) 198.32.8.81 8) 198.32.8.13 9) 198.32.8.49 10) 207.231.240.4 11) 202.158.194.109 12) 128.171.64.102 13) 205.166.205.222

amp-upenn -> amp-hawaii
Fri Jun 30 12:18:04: 1) 128.91.40.1 2) 128.91.240.37 3) 128.91.10.2 4) 128.91.9.1 5) 198.32.42.249 6) 216.27.100.221 7) 216.27.100.22 8) 198.32.8.82 9) 198.32.8.77 10) 198.32.8.81 11) 198.32.8.13 12) 198.32.8.1 13) 198.32.8.94 14) 207.231.241.4 15) 202.158.194.109 16) 128.171.64.102 17) 205.166.205.222
Fri Jun 30 12:28:02 and Fri Jun 30 12:38:10 (identical traces): 1) 128.91.40.1 2) 128.91.240.37 3) 128.91.10.2 4) 128.91.9.1 5) 198.32.42.249 6) 216.27.100.221 7) 216.27.100.22 8) 198.32.8.82 9) 198.32.8.77 10) 198.32.8.81 11) 198.32.8.13 12) 198.32.8.49 13) 207.231.240.4 14) 202.158.194.109 15) 128.171.64.102 16) 205.166.205.222

amp-nyu -> amp-hawaii
Fri Jun 30 12:14:51, Fri Jun 30 12:24:46 (No change!) and Fri Jun 30 12:34:58 (No change!): 1) 192.76.177.177 2) 199.109.4.21 3) 199.109.7.97 4) 199.109.7.9 5) 199.109.2.2 6) 198.32.8.77 7) 198.32.8.81 8) 198.32.8.13 9) 198.32.8.1 10) 198.32.8.94 11) 207.231.241.4 12) 202.158.194.109 13) 128.171.64.102 14) 205.166.205.222

Figure 6.13 Dynamic Load balancing inside AS11537 for paths to amp-hawaii seems to affect some paths at different times but not others
To demonstrate the effects of routing matrix inconsistencies we formulated the problem as a linear optimization problem to estimate the link metric vector as explained below.
Link-Metric Vector Estimation based on $l_1$-norm minimization (Least Norm / Sparse Solution)
Coates et al. [22] showed that estimation of the link-metrics vector can be based on the idea that only a few links in the network have significant delays, while the remaining links have insignificant delays close to zero. Previous works, e.g. [130], showed that such combinatorial problems can be relaxed to an optimization problem, and one approach to obtaining a sparse (least norm) estimate of $\beta$ is to solve an $l_0$ optimization problem of the form
$\hat{\beta} = \arg\min_{\beta} \|\beta\|_0$ subject to $y_s = M_s \beta$    (6-8)
where $y_s$ and $M_s$ respectively denote the rows of $y$ and $M$ to be monitored, and $\|\beta\|_0$ counts the number of non-zero entries of $\beta$. It is well known that this problem is NP-hard, requiring one to enumerate all possible subsets of non-zero coefficients. Candes et al. [131] showed that if certain conditions on $M_s$ and $\beta$ are met, the $l_0$ optimization problem is equivalent to the following simpler $l_1$ optimization problem:
$\hat{\beta} = \arg\min_{\beta} \|\beta\|_1$ subject to $y_s = M_s \beta$    (6-9)
where $\|\beta\|_1 = \sum_{i=1}^{n_e} |\beta_i|$.
Because the l1 optimization is convex, it is computationally tractable, and a solution can be obtained using linear programming. In addition to the constraints of (6-6) we also impose positivity constraints on the estimation of β i.e. β > 0 for 1 < i < ne . This is because if the routing matrix does not contain any inconsistencies then ideally the optimizer should allow for all links to attain positive values. In addition, Donoho [132] further comments on l1 -optimization, “ in “most” applications in science and technology, of course, the underlying model will not be perfectly correct and measurements will not be perfectly accurate. It is essential to use procedures which are robust against the effects of measurement noise and modelling error.” He further comments that when matrices underlying underdetermined systems have a sufficiently sparse near-solution, “…the nearsolution with minimal l1 norm is a good approximation to it”. Bruckstein et al. in [133] show that if
we further impose non-negativity constraints on the solution in addition to its sparsity, we get a solution that is unique. For AMP networks (Figure 6.14) we observe that, as the number of monitored paths increases and the constraints on the CO estimator become more stringent, sharp spikes appear where the predictor yields a high prediction error because the optimizer fails to assign non-negative delays to all links and terminates prematurely. Moreover, the L1 error does not reach zero even when all rank r paths are selected for monitoring, and the algorithm diverges for AMP-50 after 150 paths are selected for monitoring. This strengthens our initial suspicion that the cause is the presence of routing matrix inconsistencies.
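As a worked note on why linear programming applies here: once every entry of β is constrained to be non-negative, ||β||_1 is simply the sum of the entries, so (6-9) can be written as a standard-form linear program. The all-ones vector $\mathbf{1}$ is notation introduced only for this illustration.

```latex
\begin{aligned}
\min_{\beta \in \mathbb{R}^{n_e}} \quad & \mathbf{1}^{T}\beta \\
\text{subject to} \quad & M_s \beta = y_s, \\
& \beta_i \ge 0, \quad i = 1, \dots, n_e .
\end{aligned}
```

If no β satisfies both sets of constraints the program is infeasible, which is one way the premature termination reported for the CO estimator could arise.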
[Figure 6.14: two panels, AMP-30-30/Jun/2006 and AMP-50-30/Jun/2006; x-axis: Number of monitored paths; y-axis: L1-error.]
Figure 6.14 Comparison of performance of CO estimator for AMP networks.
6.5.2 Can RMI be eliminated?
The next question is whether we can identify the rows of the routing matrix M that are the source of RMI. Finding such rows is an NP-hard problem, since it would require enumerating all possible subsets of rows and plugging each into the CO estimator to find the rows containing the inconsistencies. Since it is very difficult to infer the actual topology by identifying RMI using only a traceroute snapshot of the network, we propose a straw-man algorithm. Our algorithm for inferring a consistent routing matrix is centered around the removal of false links, as highlighted earlier. We first tabulate all link delays over each 10 min interval as recorded by the traceroutes. Of the three values reported for the nth hop, the link delay between the (n-1)th and the nth hop is calculated using the smallest non-negative value. This is because it is well known that ICMP replies by routers to TTL-expired packets sent by traceroute are often rate limited. Similarly, when all three yield a negative value for the link delay, we take the least negative value (the one closest to zero). We then map the topology discovered by the traceroutes into a directed graph G=(V,E), where the vertex set V represents routers (using the router interface IP address) and the edge set E represents directed edges between two routers. We introduced the concept of false links in the preceding section (Figure 6.9). Some of these false links connect the source to a vertex for which a (different) path already exists; others connect two vertices between which there is no path in the actual graph (and the real network!). Such false links can be detected easily when a link is observed with a seemingly negative delay in the majority of the traceroutes, so that we can be almost sure that it is not due to a router delaying an ICMP response.
Such negative-delay links are removed by checking whether there exists another set of links joining the same two vertices without traversing a negative-delay link; if so, the false link is deleted and that alternative takes its place. We call this the Deletion With Replacement (DWR) heuristic. If not, we simply delete the false link ⟨ν_i, ν_{i+1}⟩ and insert a new edge between the previous vertex ν_{i−1} and ν_{i+1}, yielding a longer link ⟨ν_{i−1}, ν_{i+1}⟩ with a non-negative delay, as shown in Figure 6.15. We call this the Deletion With Insertion (DWI) heuristic. We take care not to detect or delete any inter-AS link in this manner so as not to destroy the connectivity of the graph. We use an iterative greedy algorithm for the detection and removal of such false links exhibiting negative delay values, removing in turn the links that lead to the greatest reduction in anomalous paths until all anomalies have been resolved. We find that this naïve algorithm works for the smaller AMP-30 network but fails for AMP-50 (Figure 6.16). Statistical techniques are introduced in Section 6.6 to mitigate the effects of RMI.
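The two heuristics can be illustrated with a minimal Python sketch. It is not the implementation used for the results in this chapter: the edge dictionary, the function names and the rule that the inserted link's delay is the sum of the two delays it spans are all assumptions made for illustration, and the greedy ordering over links and the inter-AS check are omitted.

```python
from collections import deque

def has_clean_path(edges, src, dst, skip):
    """Breadth-first search from src to dst over non-negative-delay edges, ignoring `skip`."""
    adjacency = {}
    for (u, v), delay in edges.items():
        if (u, v) != skip and delay >= 0:
            adjacency.setdefault(u, []).append(v)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def repair_false_link(edges, link, prev_vertex):
    """Try DWR on a negative-delay link, then DWI; returns (new_edges, heuristic_used).

    edges       -- dict mapping (u, v) to the measured link delay (hypothetical layout)
    link        -- the suspected false link (v_i, v_{i+1})
    prev_vertex -- v_{i-1}, the hop before v_i on the traceroute
    """
    u, v = link
    repaired = dict(edges)
    if has_clean_path(edges, u, v, skip=link):
        # DWR: an alternative set of non-negative links already joins u and v.
        del repaired[link]
        return repaired, "DWR"
    # DWI (assumption): the longer link's delay is the sum of the two spanned delays.
    merged_delay = edges[(prev_vertex, u)] + edges[link]
    if merged_delay < 0:
        return repaired, "unresolved"
    del repaired[link]
    repaired[(prev_vertex, v)] = merged_delay
    return repaired, "DWI"
```

In a full version, this repair step would be applied repeatedly, each time to the link whose removal resolves the most anomalous paths.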
Figure 6.15 Removal of Routing Matrix Inconsistencies (RMI) using the DWI and DWR Heuristic for removal of false links
[Figure 6.16: two panels, AMP-30-30/Jun/2006 and AMP-50-30/Jun/2006, each plotting "CO original" and "CO removal of RMI"; x-axis: Number of monitored paths; y-axis: L1-error.]
Figure 6.16 Comparison of performance of CO estimator before and after removal of RMI for AMP networks.
6.5.3 Quantification of RMI
Rosen et al. [134] derived both necessary and sufficient conditions for estimating the correct solution x_c of an over-determined system of algebraic equations. Let
$$Ax \approx b \tag{6-10}$$
with A ∈ ℜ^{m×n} of full column rank n (m > n) and b ∈ ℜ^m, where there are large errors in some rows of [A b], the underlying assumption being that there is a correct (but unknown) matrix A_c and vector b_c. They find that the probability P that the calculated solution x* will be close to the correct solution x_c depends largely on the size of the measurement data, through the parameter m − n. Using an empirical model they find [134], as an upper bound,
$$P = 1 \quad \text{when} \quad \frac{m - n}{n} \ge \frac{2k}{\sigma} \tag{6-11}$$
where
k is the number of rows of [A b] containing large errors (independent of the number of errors in any particular row of A), and σ > 0 is the lower bound on the singular values related to A. A probability P = 0.995 can be achieved with m − n ≥ 22 + 2k [134]. Although, as highlighted earlier, the main goal is to predict unmonitored paths as accurately as possible rather than to estimate the link metrics vector accurately, estimating the correct link metrics vector helps towards this goal. We see later that most of our algebraic systems of equations are underdetermined. This is due to partial network observations: for our analysis we only select complete traceroutes for which each probe received a response (a lack of response is often shown as stars). However, the routing matrix M is rank-deficient, so m − n relates to the quantity n_p − r (r being the rank of the routing matrix M) in our situation. The probability P is closely tied to avoiding the selection of any row of M, b with a large error in the subset selected for monitoring, so that the link metrics vector can be estimated accurately. If careful techniques are not employed to mitigate RMI, problems can be encountered in the estimation of the link-metrics vector. We saw earlier that the BL predictor returns large prediction errors if the link metrics vector is not estimated carefully to cater for RMI. These measurement artifacts of routing matrix estimation using traceroutes necessitate a procedure to infer the correct (or a more consistent) routing matrix, as the methods described previously may break down completely or return large path estimation errors that could offset any benefit of monitoring fewer paths than the rank of the routing matrix. We analyze and propose methods to deal with the mitigation of such errors.
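As a purely hypothetical numerical illustration of the P = 0.995 condition (the value k = 5 is assumed here and is not taken from the AMP or RIPE datasets):

```latex
m - n \;\ge\; 22 + 2k \;=\; 22 + 2 \cdot 5 \;=\; 32
\quad\Longrightarrow\quad
n_p - r \;\ge\; 32 \ \text{ in the routing-matrix setting.}
```

Under this rule each additional row with large errors raises the required surplus of measurements by two.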
Since our knowledge of the routing matrix is limited to the traceroute measurements conducted between the AMP and RIPE hosts, with no external vantage points for measurements, it is not always possible to remove all inconsistencies from the routing matrix. In statistical systems involving a large number of variables, or modeling errors due to RMI, collinear relationships can develop between correlated variables, a phenomenon often referred to as multicollinearity. Such problems can be mitigated by regularizing the linear statistical model, a technique often referred to as Ridge Regression or Tikhonov regularization. For example, in our case, collinear relationships could exist between parallel paths selected as a result of load balancing employed by large ASes (Figures 6.6 and 6.7). Thus, when the variables in M_s are correlated amongst themselves, multicollinearity is said to exist [135]. In this case V_ss = M_s Σ M_s^T has a determinant that is very close to zero, and this will cause:
(a) Round-off errors in the intermediate stages of the matrix calculations. These are especially serious when the number of predictor variables is large.
(b) In the extreme case, a breakdown of the computations in the intermediate stages of the matrix calculations if V_ss becomes singular to within the precision of the calculation, making it impossible to compute its inverse (V_ss)^{-1}.
Such errors also impact the accuracy of path predictions. They can be mitigated by adding a small bias term to the equation for the BL prediction. Since the collinear relationships between variables change as different subsets of paths are selected using Algorithm 1, we use regularization (ridge regression) of the statistical model estimate β so as to mitigate the effects of multicollinearity and RMI. Here a small bias term, in the form of the diagonal matrix cI, is added to the V_ss matrix before taking its inverse.
$$E(l_r^T y_r \mid y_s) = l_r^T V_{rs} (V_{ss} + cI)^{-1} y_s \tag{6-12}$$
where I_{t×t} is an identity matrix; 0 ≤ c ≤ 1, V_ss ∈ ℜ^{t×t}, and t = number of monitored paths.
For the linear system of monitored paths, we estimate β_R using:
$$\beta_R = M_s^T (V_{ss} + cI)^{-1} y_s \tag{6-13}$$
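Equations (6-12) and (6-13) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: Σ is taken to be the identity (so V_ss = M_s M_s^T and V_rs = M_r M_s^T), the function name and default c are hypothetical, and all unmonitored paths are predicted at once, which corresponds to stacking every selector l_r.

```python
import numpy as np

def bl_ridge_predict(M_s, M_r, y_s, c=0.05):
    """Ridge-regularised BL prediction of unmonitored path metrics.

    M_s -- routing-matrix rows of the monitored paths (t x n_e)
    M_r -- routing-matrix rows of the unmonitored paths
    y_s -- measured metrics on the monitored paths (length t)
    c   -- ridge constant, 0 < c <= 1
    """
    V_ss = M_s @ M_s.T                 # assumption: Sigma = I
    V_rs = M_r @ M_s.T
    t = M_s.shape[0]
    # Solve (V_ss + cI) z = y_s instead of forming the inverse explicitly.
    z = np.linalg.solve(V_ss + c * np.eye(t), y_s)
    beta_R = M_s.T @ z                 # equation (6-13)
    y_r_hat = V_rs @ z                 # equation (6-12), one entry per unmonitored path
    return y_r_hat, beta_R
```

With Σ = I the two outputs are linked by y_r_hat = M_r β_R, so the regularised link estimate and the path prediction stay consistent even when V_ss itself is singular.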
To calculate the value of the constant c, we follow the usual judgmental procedure based on the analysis of ridge traces [135-136]. We increment c from 0 to 1 in steps of 0.01 and select the value of c at which the coefficients β_R of equation (6-13) become stable, i.e. we stop increasing c once we reach the stop condition:
$$\frac{\left|\, \|\beta_R^{old}\|_2 - \|\beta_R^{new}\|_2 \,\right|}{\|\beta_R^{old}\|_2} \le 0.01 \tag{6-14}$$
where ||.||_2 represents the l2-norm of a vector. We find that c increases almost monotonically from 0.02 to 1 (for AMP-50) and only from 0.02 to 0.56 (for RIPE-40) as the number of monitored paths increases beyond 10% and 50% of the rank (r) paths respectively (Figure 6.17). This indicates that AMP networks suffered more severely from multicollinearity and RMI than RIPE networks, as was observed from Figures 6.7 and 6.14.
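The ridge-trace search for c can be sketched as follows. This is a minimal illustration rather than the exact procedure behind Figure 6.17: Σ is assumed to be the identity, a pseudo-inverse is used so that the starting point c = 0 remains computable when V_ss is singular, and the function name and its defaults are hypothetical.

```python
import numpy as np

def select_ridge_c(M_s, y_s, step=0.01, tol=0.01):
    """Increase c from 0 in fixed steps until ||beta_R||_2 changes by at most `tol` (relative)."""
    t = M_s.shape[0]
    V_ss = M_s @ M_s.T                        # assumption: Sigma = I

    def beta_for(c):
        # Equation (6-13); pinv keeps c = 0 usable when V_ss is singular.
        return M_s.T @ np.linalg.pinv(V_ss + c * np.eye(t)) @ y_s

    c = 0.0
    old_norm = np.linalg.norm(beta_for(c))
    while c + step <= 1.0 + 1e-12:
        c = round(c + step, 10)
        new_norm = np.linalg.norm(beta_for(c))
        # Stop condition (6-14), written without the division.
        if old_norm > 0 and abs(old_norm - new_norm) <= tol * old_norm:
            return c, beta_for(c)
        old_norm = new_norm
    return 1.0, beta_for(1.0)
```

Writing the test as |old − new| ≤ tol · old avoids a division by zero when the previous coefficient norm vanishes.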
[Figure 6.17: curves for RIPE-40-05/Sep/2007 and AMP-50-30/Jun/2006; x-axis: Number of monitored paths as fraction of rank (r); y-axis: value of ridge-coefficient.]
Figure 6.17 Computed value of c as the number of sampled paths increases for AMP-50 and RIPE-40
6.6 Statistical Techniques to Mitigate the Effects of RMI
Note that measuring even a subset of the paths reveals information about end-to-end path metrics such as path delay or loss rate, but does not necessarily reveal any information about the individual link metrics on those paths. Thus the problem is to estimate both the monitored and the unmonitored link metrics so as to minimize the prediction error on the unmonitored paths. Moreover, the rank-deficient system of linear equations does not have a unique solution for the link metrics vector, so it has to be estimated. The literature [8, 22] proposes several estimation techniques; the two most common are based on the minimum-norm (sparse) solution (Min ||β||_0 or Min ||β||_1) and on the minimization of the l2-norm of the error, i.e. Min ||y_s − M_s β||_2, using the Least Squares (LS) method. The minimum-norm (sparse) solution used earlier can help find a solution that reduces the overall path prediction error, but it may not track individual path properties efficiently. For mitigating the effects of RMI, a more robust estimation of the link metric vector is required.
Link-Metric Vector Estimation based on Iteratively Re-weighted Least-Squares method
Statistical theory for Best Linear Prediction (BLP) [136] suggests estimating β by solving the following generalized least-squares problem.
$$\min_{\beta}\; (y_s - M_s \beta)^T V_{ss}^{-1} (y_s - M_s \beta) \tag{6-15}$$
where y_s and M_s respectively denote the rows of y and M to be monitored and V_ss = M_s Σ M_s^T is the covariance between the selected paths, Σ being the link covariance matrix. The solution to the above is the Generalized Least-Squares (GLS) estimate β̂, given by [136]:
$$\hat{\beta} = (M_s^T V_{ss}^{-1} M_s)^{-} M_s^T V_{ss}^{-1} y_s \tag{6-16}$$
where R^- denotes the generalized inverse of matrix R. One drawback of the GLS-based estimation of β̂ in [8] for BL prediction, for
$$y_s = M_s \beta + \varepsilon \tag{6-17}$$
is that it gives equal weight to all observations, including the outliers, thus penalizing each outlier equally. Robust regression techniques such as Iteratively Re-weighted Least Squares (IRLS) attempt to assign small weights to the outliers. Thus, instead of the GLS-based estimation proposed in [8] for the BL estimator, we use a weighted version of generalized least squares minimization which
can yield superior results in such cases. We use a variant of the method of Daubechies et al. [137] for IRLS in estimating the link metrics vector (Eq. 6-18). The algorithm iterates (iterations are labelled t = 1, 2, ..., 50) until it converges. Each iteration finds the new solution β_{t+1} at the (t+1)th iteration:
$$\beta_{t+1} = D_t M_s^T (M_s D_t M_s^T)^{-1} y_s \tag{6-18}$$
where D_t is an n_e × n_e diagonal matrix at the tth iteration. We denote the jth diagonal entry of D_t as w_t^j. Once β_{t+1} is found, the new weight w_{t+1} is found by:
$$w_{t+1}^{j} = \left( (\beta_{t+1}^{j})^2 + \varepsilon_{t+1}^2 \right)^{-1/2}, \quad j = 1, 2, 3, \dots, n_e \tag{6-19}$$
Here
$$\varepsilon_{t+1} = \min\left( \varepsilon_t, \; \frac{r(\beta_{t+1})_{K+1}}{n_e} \right) \tag{6-20}$$
and r(β_{t+1}) ∈ ℜ^{n_e} is the non-increasing rearrangement of the absolute values of the entries of β_{t+1}. Thus, r(β_{t+1})_i is the ith largest element of the set {|β_{t+1}^j|, j = 1, 2, 3, ..., n_e}. The algorithm terminates when ε_{t+1} = 0 or ε_{t+1} stabilizes at some non-negative value. At the start of the algorithm, w_0 = (1, 1, 1, ..., 1) and ε_0 = 1. To initialize K, we compute the number of non-zero elements p in the initial solution β_0 (using w_0) and set K = cp. We find that the algorithm converges better when 0.5 ≤ c ≤ 0.6.
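The iteration in (6-18) to (6-20) can be sketched in NumPy as below. Several choices are assumptions rather than details confirmed by the text: the diagonal of D_t is taken as the reciprocal 1/w of the weights in (6-19), which is the usual convention in IRLS for sparse recovery (if D_t = diag(w) is meant literally, only the line computing `d` changes); M_s is assumed to have full row rank; "stabilizes" is read as both ε and β changing by less than a small tolerance; and the function and parameter names are hypothetical.

```python
import numpy as np

def irls_link_estimate(M_s, y_s, c_frac=0.5, max_iter=50, tol=1e-9):
    """Sparse link-metric estimate via the IRLS updates (6-18) to (6-20)."""
    n_e = M_s.shape[1]

    def weighted_solution(d):
        # (6-18) with D = diag(d): beta = D M^T (M D M^T)^{-1} y
        DMt = d[:, None] * M_s.T
        return DMt @ np.linalg.solve(M_s @ DMt, y_s)

    beta = weighted_solution(np.ones(n_e))            # beta_0 from w_0 = (1, ..., 1)
    p = int(np.count_nonzero(np.abs(beta) > 1e-12))   # non-zero entries of beta_0
    K = min(n_e - 1, max(1, int(c_frac * p)))         # K = c * p, kept inside the index range
    eps = 1.0                                         # eps_0
    for _ in range(max_iter):
        r = np.sort(np.abs(beta))[::-1]               # non-increasing rearrangement
        new_eps = min(eps, r[K] / n_e)                # (6-20); r[K] is the (K+1)-th largest
        if new_eps == 0.0:
            break
        d = np.sqrt(beta ** 2 + new_eps ** 2)         # assumption: D holds 1 / w from (6-19)
        new_beta = weighted_solution(d)
        settled = (eps - new_eps) <= tol and np.linalg.norm(new_beta - beta) <= tol
        beta, eps = new_beta, new_eps
        if settled:
            break
    return beta
```

Every iterate satisfies M_s β = y_s by construction, so the re-weighting only moves the estimate within the set of solutions that reproduce the monitored path metrics.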
There are other robust regression techniques besides IRLS-based estimation, such as LMS (Least Median of Squares), which aims to minimize the median of the squared errors instead of their sum (or average). However, unlike IRLS, LMS does not have a closed-form expression and requires a brute-force search over combinatorial subsets of solutions (obtained by removing rows from the set of linear equations that may be the cause of large overall estimation errors), and thus is not feasible for regression problems of large dimensions. We refer to the predictor that uses IRLS-based robust regression for link metrics prediction as the Robust Predictor in the remainder of this chapter, to distinguish it from the BLP [8]. When estimating the link metrics vector using the IRLS-based method, the estimate defined in Equation 6-18 is used. We call this the Robust Predictor because the method works iteratively, minimizing the residual errors by improving on the previous estimate of the link metrics vector and so removing larger outliers more aggressively. By computing a sparse solution it mimics ||β||_1 minimization, albeit without the positivity constraints of the Convex Optimizer used earlier (Section 6.6).
Link-Metric Vector Estimation based on Least-Squares method after regularization of the statistical model
Regularization of the statistical model (Section 6.5.3) can also act as a simple tool to mitigate the effects of large statistical errors. We use the estimate of the link metrics vector obtained with Ridge Regression (Tikhonov Regularization) in the BL predictor. We call this predictor BL-ridge to differentiate it from the BLP [8].
6.7 Improvement in Path Prediction and Anomaly Detection for AMP and RIPE networks after application of Robust Statistical Techniques
Figures 6.18 and 6.19 show the L1 error for RIPE and AMP networks after application of robust statistical prediction techniques. The L1 error for RIPE networks decreases more sharply. For AMP networks, the overall L1-error is reduced and the spikes are diminished in magnitude when using the Robust estimator. The iterative nature of robust prediction using the IRLS-based estimate of the link metric vector may raise concern about its path tracking properties. We show that the robust prediction technique outlined not only lowers the overall path prediction error on unmonitored paths but also improves the prediction of individual paths. Figure 6.20 shows the improvement in the variance of the Relative Prediction Error (RPE), defined below, as the number of monitored paths increases (for AMP-50 and AMP-30).
$$\mathrm{RPE} = \frac{\left| \text{actual delay} - \text{predicted delay} \right|}{\text{actual delay}} \tag{6-21}$$
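For concreteness, the RPE of (6-21) and its variance over a set of predictions can be computed as in the short sketch below. The function names are hypothetical, and the use of the population variance (rather than the sample variance) is an assumption, since the text does not say which is plotted in Figure 6.20.

```python
from statistics import pvariance

def relative_prediction_error(actual_delay, predicted_delay):
    """Equation (6-21); actual_delay must be non-zero."""
    return abs(actual_delay - predicted_delay) / actual_delay

def rpe_variance(actual_delays, predicted_delays):
    """Population variance of the RPE over paired actual and predicted delays."""
    rpes = [relative_prediction_error(a, p)
            for a, p in zip(actual_delays, predicted_delays)]
    return pvariance(rpes)
```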
We next select the subset of paths which resulted in large prediction errors based on our results from Figures 6.7 and 6.14. Figure 6.21 shows a sample variation of path delays on one unmonitored path and its prediction using the BL, BL-ridge and Robust estimators. We observe that all three predictors (BL, BL-ridge and Robust) are good at tracking path anomalies, showing peaks (in either direction) corresponding to major path variations even though the granularity of path measurements on the monitored paths is of the order of 60 second intervals. Furthermore, since the path monitoring is not GPS synchronized in the datasets considered, we estimate y_s(t) (y_r(t)) as belonging to one of the windows of successive one-minute intervals. Hence, the peaks of the predicted path metrics are sometimes offset by one such window interval (either side) from the actual path anomaly. We see that the BL-ridge and Robust estimators are more sensitive to path anomalies than BL prediction.
[Figure 6.18: two panels, RIPE-30-05/Sep/2007 and RIPE-40-05/Sep/2007, each plotting BL and Robust; x-axis: Number of monitored paths; y-axis: L1-error.]
Figure 6.18 Comparison of the L1-error metric of BL and Robust predictor.
[Figure 6.19: two panels, AMP-30-30/Jun/2006 and AMP-50-30/Jun/2006, each plotting BL and Robust; x-axis: Number of monitored paths; y-axis: L1-error.]
Figure 6.19 Comparison of performance of BL and Robust estimator for AMP networks.
[Figure 6.20: two panels, AMP-30-30/Jun/2006 (BL vs. BL-Ridge) and AMP-50-30/Jun/2006 (BL vs. Robust); x-axis: Number of monitored paths as fraction of rank (r); y-axis: Variance of Relative Prediction Error.]
Figure 6.20 Improvement in Variance of Relative Prediction Error using BL-ridge and Robust estimator for AMP networks
[Figure 6.21: curves for Actual Path Delay, BL, BL-Ridge and Robust; x-axis: Time (sec since start, ×10^4); y-axis: Delay (msec).]
Figure 6.21 Actual, BL, BL-ridge and Robust predictor delay profile for a selected (unmonitored) path in AMP-50-30/Jun/2006.
6.8 Discussion
In this section we discuss the impact of routing matrix inconsistencies on algebraic and statistical path prediction methods and ask the question: do inconsistencies in the routing matrices pose a real problem? The combined work of [7, 125] showed that, because of extensive underlay link sharing among Internet paths, one can determine completely, or estimate to predefined tolerance levels, the path metrics on all unmonitored paths by probing only a small subset S of paths. However, Chen et al. [7] showed that maximum benefits only occur when the number of overlay hosts N exceeds 100, so that |S| is in the range O(N lg N). We have seen how statistical path prediction errors due to RMI begin to appear when the network size is much smaller (50 hosts).
We saw from Figures 6.7 and 6.14 that the effects of RMI are most pronounced after approximately 30% of the linearly-independent rank-r paths have been included in the path monitoring set. From Figure 6.7 this roughly corresponds to an L1-error of 0.2 for both AMP and RIPE in spite of the rapidly decaying trend; this is clearly not good enough for accurate path prediction or anomaly detection. For example, in AMP-50, estimating path metrics to within 10% L1-error requires that at least 62% of the linearly-independent rank-r paths be monitored. The first major spike due to routing matrix inconsistencies occurs when monitoring a small fraction of the linearly-independent rank-r paths, and the problem worsens as more paths are selected for monitoring. This shows that by the time we are able to achieve good path prediction, routing matrix inconsistencies begin to cause randomly large path prediction errors. These in turn can cause problems in predicting path anomalies (Figure 6.19), which is one of the prime objectives of RONs: to alleviate path outages and degradations before a user can detect them. Thus the techniques described in this chapter for removing RMI are essential in order to allow only a subset of paths to be monitored and so reduce the monitoring overheads that would otherwise limit the scalability of RONs. This chapter concludes the third contribution of this thesis, namely an investigation of the practical problems in the area of algebraic and statistical path monitoring when applied to practical networks. We presented a constrained convex optimization technique to show how RMI can be identified, and showed how it is related to inaccurate routing knowledge of the network, providing anecdotal evidence from network traceroutes. In addition, we quantified the statistical prediction errors due to RMI through regularization of the linear model.
We also studied the impact of RMI on path prediction; the use of robust statistical techniques reduces the path prediction error (L1-error) by 10-20% over BL estimation. Anomaly detection is also improved through robust statistical techniques.
6.9 Conclusion
Research aimed at reducing path monitoring overheads by leveraging topological knowledge seems at first sight to be the most promising area of research at the moment [7-8, 21-22], but unfortunately the performance benefits claimed are based only on limited deployments over a few selected ISPs, e.g. Abilene and Sprint [8, 22, 87], PlanetLab [26] or simulated topologies [5, 23-24]. The underlying assumption is that the routing (network layer) topology of the network is accurately known. These issues need to be addressed in detail using real heterogeneous overlay deployments in the Internet with limited topological knowledge [50] to fully ascertain their benefits beyond the theoretical claims. Our primary aim in this chapter was to investigate the source of practical problems in the area of algebraic and statistical path monitoring. These mainly stem from incorrect topology estimation due
to the measurement artifacts of traceroutes. These can result in inaccurate estimation of path metrics. More advanced topology estimation techniques using more robust route path tracing, e.g. [129], or exploiting techniques to correct such inaccurate path information [52, 138] can help towards improving such statistical path estimation techniques.
7 CONCLUSIONS AND PROPOSALS FOR FUTURE DIRECTIONS OF RESEARCH
7.1 Reviewing the Goal
BGP can suffer from delayed convergence after failure, and Internet flows seeking QoS guarantees may seek alternate paths to mask such failures. Resilient Overlay Networks can quickly provide such alternate paths. However, this requires large path monitoring overheads in order to select the best alternate routes. The thesis of this dissertation is to investigate heuristics that make Resilient Overlay Network management more scalable. We established this thesis in terms of three intertwined yet competing aspects of scalability: architecture, path selection and path monitoring overheads.
7.1.1 Architecture
RON suffers from scalability problems. Aggressive path probing on all end-to-end paths between overlay hosts (overlay links) does not scale well beyond tens of hosts [4]. In Chapter 4, we showed a landmark based distributed architecture that can enable overlay networks to scale well while using a very sparse topology, with O(N) instead of O(N^2) overlay links. The sparse topology equates to an equal reduction in path monitoring overheads. We presented techniques for determining how overlay hosts should select a small set of geographically diversified detours. We showed that in spite of such a sparse topology, it can find a good working path with a very high probability.
7.1.2 Path Selection
Path selection in Resilient Overlay Networks is directly tied to path monitoring overheads. These path monitoring overheads could be traded for heuristics that enable disjoint path selection. Previous studies, e.g. [6], show that an intuitive method of selecting disjoint paths is simply to select the one which diverges earliest from the direct path. However, this should be based on AS level paths, which are easier to obtain than IP level paths. In Chapters 3 and 5, we showed that a significant percentage of one-hop overlay paths shared similar levels of path disjointness, thus making the process of path selection even more challenging. In Chapter 5, we presented a more elegant graph based algorithm to cater for this problem, i.e. to filter out a small set of disjoint paths to make path
selection easier. We then presented our technique of greedy selection using a ToR graph [27, 116]. We showed that this not only brought the number of candidate paths down to a small number but also retained the good paths, picking a path performing close to the best possible path in a large majority of cases.
7.1.3 Path Monitoring
Previous research has shown the possibility of statistical techniques [8] of monitoring paths based on network tomography principles [7]. Such techniques depend on an accurate snapshot of the routing topology of the overlay network. Previous works [8] have investigated such statistical path prediction techniques for networks whose topology was well known e.g. Abilene. In Chapter 6, we highlighted how such an accurate snapshot of large networks using only commodity tools, e.g. traceroutes, is impossible to obtain. We then presented methods to reduce or eliminate the effects of such topology estimation errors by (a) identifying and fixing topology estimation errors; and (b) harnessing techniques in statistics, e.g. robust estimation, to deal with them when they cannot be identified and removed completely.
7.2 Future Research Directions
Future research in overlay networks will revolve around the same three aspects of enhancing and improving RON management described above.
7.2.1 More accurate overlay topology ‘modeling’
Research aimed at reducing path monitoring overheads by leveraging topological knowledge seems at first sight to be the most promising area of research at the moment [7-8, 21-22], but unfortunately the performance benefits claimed are based only on limited deployments over a few selected ISPs, e.g. Abilene and Sprint [8, 22, 87], PlanetLab [26] or simulated topologies [23-25]. The underlying assumption is that the routing (network layer) topology of the network is accurately known. These issues need to be addressed in detail using real heterogeneous overlay deployments in the Internet with limited topological knowledge [50] (Chapter 6) to fully ascertain their benefits beyond the theoretical claims.
7.2.2 Accurate depiction of Internet failure models
Due to the unavailability of real Internet failure information, some studies employ analytical models for generating failure scenarios on Internet paths, e.g. the LM1 model [23] and exponentially distributed failures [139]. This may lead to an overestimation of the efficiency of overlay networks
in computing alternate paths. Naidu et al. [49] claim that anomalies are far rarer events in the Internet than suggested by prior studies. Exploiting this fact could lead to a non-negligible reduction in path monitoring overheads compared with the conservative methods of other researchers [8, 21]. Also, it will be difficult to compare results across different studies unless accurate modeling of Internet failure occurrence is dealt with seriously.
7.2.3 Investigation of synergy between competing overlays
There has been overt criticism in the research community directed against selfish routing by overlay networks [39, 70] (Chapter 2, Section 2.2.5). However, such claims have again been made on emulated hypothetical situations, in which path monitoring and path switching decisions in two or more overlays cause them to synchronize, i.e. switch traffic to the same path simultaneously. There is an urgent need for large scale deployment of multiple overlays to see the impact on the underlay network mechanisms when they compete for bandwidth on the same set of underlay links. For example, previous studies [140-141] have shown that content distribution overlays that are locality aware do not hurt ISP objectives, as they are optimized to fetch content from the nearest location, e.g. within the ISP, something an ISP would also prefer from a commercial point of view. Such locality-aware RONs will try to shift traffic within a certain radius in the network, e.g. by choosing a relay node (detour) very close to the source or destination; thus they may not cause appreciable harm to other traffic flows. In addition, routing overlays such as RONs could be made to sense the presence of other overlays around them by monitoring the behavior of their frequently occurring path switching cycles, and could employ a randomized hysteresis algorithm to vary their anomaly detection and path switching algorithms to prevent any type of synchronization with other overlays. There is also an urgent need to study the business models that will evolve out of competing overlays. A large RON operator may actually be willing to provide its services to smaller RON operators in exchange for a fee.
APPENDIX
We introduce some matrix notation before deriving the equation for the BL-estimator. We first sort the rows of the matrix M according to the largest singular values using a row permutation (details later). The values of the column vector y are sorted in the same way. Let us denote the new matrix and column vector as M_new ∈ [0,1]^{n_p × n_e} and y_new ∈ ℜ^{n_p} respectively. Let M_s represent the rows (paths) of M_new which are selected for monitoring because they capture the largest singular dimensions and so approximate the complete path matrix M well enough for reasonably predicting the unmonitored paths.
$$M_s = (1:k, :)\,M_{new}, \qquad y_s = (1:k, :)\,y_{new} \tag{A.1}$$
where the notation (a:b, :)J and (:, a:b)J refers to rows a through b and columns a through b (a and b inclusive) respectively of matrix J. Similarly, the unmonitored paths and path metrics are the remaining rows of M_new and y_new, as shown below.
$$M_r = (k+1:n_p, :)\,M_{new}, \qquad y_r = (k+1:n_p, :)\,y_{new} \tag{A.2}$$
The vectors y, y_s and y_r will vary over time, so references to y_s and y_r relate to the values of y_s(t) and y_r(t) at some instant t. If we let β and Σ be the mean and covariance of link delays respectively, then the mean (ν) and covariance (V) of y can be expressed as:
$$\nu = \begin{bmatrix} \nu_s \\ \nu_r \end{bmatrix} = \begin{bmatrix} M_s \beta \\ M_r \beta \end{bmatrix} \tag{A.3}$$
$$V = \begin{bmatrix} V_{ss} & V_{sr} \\ V_{rs} & V_{rr} \end{bmatrix} = \begin{bmatrix} M_s \Sigma M_s^T & M_s \Sigma M_r^T \\ M_r \Sigma M_s^T & M_r \Sigma M_r^T \end{bmatrix} \tag{A.4}$$
Chua et al. [125] found the link-covariance matrix Σ to be dominated by the diagonal elements (the variances of the link delay values) for the Abilene network they considered, with the other elements mainly zero. For the datasets we consider, we find that the link covariance cannot be calculated efficiently by using traceroutes alone to infer link delays, as some traceroutes anomalously report a smaller path delay to the (n+1)th hop than to the nth hop, implying a negative link delay at the (n+1)th IP hop. Due to these measurement artifacts, we assume Σ to be an identity matrix, as in [142]. However, we
157
find that some nontrivial interrelationships between link properties can arise in practical situations as we discuss in the next section. The BL estimator for an unknown parameter y given x [136] is:
$$E(y \mid x) = \mu_y + (x - \mu_x)\, c^* \tag{A.5}$$
where $\mu_x = E(x)$, $\mu_y = E(y)$, and $c^*$ is the solution to $V_{xx} c = V_{xy}$, with $V_{xx} = \mathrm{Cov}(x)$ and $V_{xy} = \mathrm{Cov}(x, y)$ ([136], Section 6.3). Similarly, the BL-estimator for the path metrics on the unmonitored paths ($y_r$) given the path metrics on the monitored paths ($y_s$) is given by:
$$E(l_r^T y_r \mid y_s) = l_r^T M_r \beta + l_r^T c^* (y_s - M_s \beta) \tag{A.6}$$
where $c^*$ is any solution to $c^* V_{ss} = V_{rs}$, and $l_r$ is a column vector with one element set to 1 (and the others to 0) so as to select the row of $M_r$ corresponding to a particular unmonitored path. Since the BL-estimator in (A.6) cannot be realized without knowledge of $\beta$, one natural solution is to estimate $\beta$ from the data. Statistical theory [136] suggests estimating $\beta$ by solving the following generalized least-squares problem:
$$\min_{\beta}\; (y_s - M_s \beta)^T V_{ss}^{-1} (y_s - M_s \beta) \tag{A.7}$$
The generalized least-squares estimate $\hat{\beta}$ is given by [136]:
$$\hat{\beta} = (M_s^T V_{ss}^{-1} M_s)^{-} M_s^T V_{ss}^{-1} y_s \tag{A.8}$$
where $R^{-}$ denotes the generalized inverse of a matrix $R$. After substituting $\hat{\beta}$ into (A.6) and simplifying, the BL estimator becomes:
$$E(l_r^T y_r \mid y_s) = l_r^T V_{rs} (V_{ss})^{-1} y_s \tag{A.9}$$
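The simplification from (A.6) and (A.8) to (A.9) can be checked numerically on a toy example. The sketch below assumes $\Sigma = I$ as in the text, a monitored matrix `M_s` with full row rank (so that $V_{ss}$ is invertible), and the Moore-Penrose pseudo-inverse as one particular choice of generalized inverse. The matrices and link delays are made-up values.

```python
import numpy as np

# Assumed toy inputs: the unmonitored path is the sum of monitored paths 0 and 1.
M_s = np.array([[1, 1, 0, 0],
                [0, 0, 1, 1],
                [0, 1, 1, 0]], dtype=float)
M_r = np.array([[1, 1, 1, 1]], dtype=float)
link_delays = np.array([2.0, 3.0, 1.0, 4.0])  # used only to synthesize y_s
y_s = M_s @ link_delays                       # measurements on monitored paths

Sigma = np.eye(4)                             # identity link covariance, as assumed above
V_ss = M_s @ Sigma @ M_s.T
V_rs = M_r @ Sigma @ M_s.T
W = np.linalg.inv(V_ss)                       # V_ss^{-1}; exists because M_s has full row rank

# (A.8): generalized least-squares estimate of beta (pinv as the generalized inverse).
beta_hat = np.linalg.pinv(M_s.T @ W @ M_s, rcond=1e-10) @ M_s.T @ W @ y_s

# (A.6) with beta replaced by beta_hat; c_star solves c* V_ss = V_rs.
c_star = V_rs @ W
est_a6 = M_r @ beta_hat + c_star @ (y_s - M_s @ beta_hat)

# (A.9): the simplified estimator, which needs only V_rs, V_ss and y_s.
est_a9 = V_rs @ W @ y_s
```

In this example both expressions give the same estimate, and because the unmonitored path lies in the row space of `M_s`, that estimate equals the true path delay `M_r @ link_delays`. For an unmonitored path outside the row space the estimate would only be an approximation.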
REFERENCES

[1] The AS Number Report. See http://www.potaroo.net/tools/asn32/.
[2] C. Labovitz, et al., "Delayed Internet routing convergence," in SIGCOMM '00: Proceedings of the conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, 2000, pp. 175-187.
[3] X. Yang and D. Wetherall, "Source selectable path diversity via routing deflections," in SIGCOMM '06: Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, 2006, pp. 159-170.
[4] D. Andersen, et al., "Resilient overlay networks," in SOSP '01: Proceedings of the eighteenth ACM symposium on Operating systems principles, 2001, pp. 131-145.
[5] C. Tang and P. K. McKinley, "Improving multipath reliability in topology-aware overlay networks," in Distributed Computing Systems Workshops, 2005. 25th IEEE International Conference on, 2005, pp. 82-88.
[6] T. Fei, et al., "How to Select a Good Alternate Path in Large Peer-to-Peer Systems?," in INFOCOM '06, Barcelona, Spain, 2006.
[7] Y. Chen, et al., "Tomography-based overlay network monitoring," in IMC '03: Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, 2003, pp. 216-231.
[8] D. B. Chua, et al., "Network Kriging," Selected Areas in Communications, IEEE Journal on, vol. 24, pp. 2263-2272, 2006.
[9] D. Andersen, et al., "Best-path vs. multi-path overlay routing," in IMC '03: Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, 2003, pp. 91-100.
[10] A. Akella, et al., "A comparison of overlay routing and multihoming route control," in SIGCOMM '04: Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications, 2004, pp. 93-106.
[11] D. G. Andersen, et al., "Improving Web Availability for Clients with MONET," in 2nd Symposium on Networked Systems Design and Implementation (NSDI), Boston, MA, 2005.
[12] L. Subramanian, et al., "HLP: a next generation inter-domain routing protocol," in SIGCOMM '05: Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, 2005, pp. 13-24.
[13] W. Xu and J. Rexford, "MIRO: multi-path interdomain routing," SIGCOMM Comput. Commun. Rev., vol. 36, pp. 171-182, 2006.
[14] X. Yang, "NIRA: a new Internet routing architecture," in FDNA '03: Proceedings of the ACM SIGCOMM workshop on Future directions in network architecture, 2003, pp. 301-312.
[15] R. Teixeira, et al., "Network sensitivity to hot-potato disruptions," in SIGCOMM '04: Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications, 2004, pp. 231-244.
[16] K. Gummadi, et al., "Improving the Reliability of Internet Paths with One-hop Source Routing," in OSDI '04, 2004, pp. 183-198.
[17] S. Savage, et al., "Detour: a Case for Informed Internet Routing and Transport," IEEE Micro, vol. 19, no. 1, pp. 50-59, January 1999.
[18] Z. Li and P. Mohapatra, "The Impact of Topology on Overlay Routing Service," in INFOCOM, Hong Kong, 2004.
[19] A. Nakao, et al., "Scalable routing overlay networks," SIGOPS Oper. Syst. Rev., vol. 40, pp. 49-61, 2006.
[20] H. H. Song, et al., "NetQuest: a flexible framework for large-scale network measurement," SIGMETRICS Perform. Eval. Rev., vol. 34, pp. 121-132, 2006.
[21] Y. Chen, et al., "Algebra-based scalable overlay network monitoring: algorithms, evaluation, and applications," IEEE/ACM Trans. Netw., vol. 15, pp. 1084-1097, 2007.
[22] M. Coates, et al., "Compressed network monitoring for IP and all-optical networks," in IMC '07: Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, 2007, pp. 241-252.
[23] C. Tang and P. K. McKinley, "On the cost-quality tradeoff in topology-aware overlay path probing," in Network Protocols, 2003. Proceedings. 11th IEEE International Conference on, 2003, pp. 268-279.
[24] C. Tang and P. K. McKinley, "A distributed approach to topology-aware overlay path monitoring," in Distributed Computing Systems, 2004. Proceedings. 24th International Conference on, 2004, pp. 122-131.
[25] C. Tang and P. K. McKinley, "Improving Multipath Reliability in Topology-Aware Overlay Networks," in Proceedings of the Fourth International Workshop on Assurance in Distributed Systems and Networks (ADSN 2005) (in conjunction with IEEE ICDCS), Columbus, Ohio, USA, 2005.
[26] H. H. Song, "Scalable and Flexible Network Measurement (Masters Thesis)," Department of Computer Science, University of Texas at Austin, 2006.
[27] S. Qazi and T. Moors, "Using Type-of-Relationship (ToR) Graphs to Select Disjoint Paths in Overlay Networks," in GLOBECOM 2007, pp. 2602-2606.
[28] Y. Zhu, et al., "Dynamic overlay routing based on available bandwidth estimation: a simulation study," Comput. Networks, vol. 50, pp. 742-762, 2006.
[29] G. Kwon and K. Ryu, "BYPASS: topology-aware lookup overlay for DHT-based P2P file locating services," in Parallel and Distributed Systems, 2004. ICPADS 2004. Proceedings. Tenth International Conference on, 2004, pp. 297-304.
[30] B. Y. Zhao, et al., "Brocade: Landmark Routing on Overlay Networks," in IPTPS '02, MIT Faculty Club, Cambridge, MA, USA, 2002.
[31] B. Y. Zhao, et al., "Exploiting Routing Redundancy via Structured Peer-to-Peer Overlays," in IEEE International Conference on Network Protocols (ICNP 2003), Atlanta, Georgia, USA, 2003.
[32] A.-J. Su, et al., "Drafting behind Akamai (travelocity-based detouring)," in SIGCOMM '06: Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, 2006, pp. 435-446.
[33] M. Faloutsos, et al., "On Power-Law Relationships in Internet topology," in SIGCOMM '99, Cambridge, MA, USA, 1999.
[34] B. Eriksson, et al., "Network discovery from passive measurements," SIGCOMM Comput. Commun. Rev., vol. 38, pp. 291-302, 2008.
[35] S. Ratnasamy, et al., "A Scalable Content Addressable Network," in SIGCOMM '01, San Diego, USA, 2001.
[36] I. Stoica, et al., "Chord: a scalable peer-to-peer lookup protocol for Internet applications," Networking, IEEE/ACM Transactions on, vol. 11, pp. 17-32, 2003.
[37] S.-J. Lee, et al., "Bandwidth-Aware Routing in Overlay Networks," in INFOCOM 2008. The 27th Conference on Computer Communications. IEEE, 2008, pp. 1732-1740.
[38] T. Rakotoarivelo, et al., "A Super-Peer based Method to Discover QoS Enhanced Alternate Paths," in Communications, 2005 Asia-Pacific Conference on, 2005, pp. 454-458.
[39] B.-G. Chun, et al., "Characterizing Selfishly Constructed Overlay Routing Networks," in Proceedings of the 23rd IEEE International Conference on Computer Communications (INFOCOM 2004), 2004.
[40] J. Han, et al., "Topology aware overlay networks," in INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, 2005, pp. 2554-2565 vol. 4.
[41] RIPE, Test Traffic Measurements (TTM) Home Page. See http://www.ripe.net/projects/ttm/data.html.
[42] Active Measurement Project (AMP). See http://watt.nlanr.net/.
[43] D. Andersen, et al., "Best Path Vs Multi-path Overlay Routing," in IMC '03, Miami Beach, Florida, USA, 2003.
[44] D. Antonova, et al., "Managing a portfolio of overlay paths," in NOSSDAV '04: Proceedings of the 14th international workshop on Network and operating systems support for digital audio and video, 2004, pp. 30-35.
[45] H. Madhyastha, et al., "iPlane: an information plane for distributed services," in OSDI '06: Proceedings of the 7th symposium on Operating systems design and implementation, Seattle, Washington, 2006, pp. 367-380.
[46] H. V. Madhyastha, et al., "A Structural Approach to Latency Prediction," presented at IMC 2006, 2006.
[47] H. V. Madhyastha, et al., "iPlane Nano: Path Prediction for Peer-to-Peer Applications," in NSDI 2009, 2009.
[48] A. Broido and k. Claffy, "Analysis of RouteViews BGP data: policy atoms," presented at the Network Resource Data Management Workshop, 2001.
[49] K. V. M. Naidu, et al., "Detecting Anomalies Using End-to-End Path Measurements," in INFOCOM 2008. The 27th Conference on Computer Communications. IEEE, 2008, pp. 1849-1857.
[50] S. Qazi and T. Moors, "Practical Issues of Statistical Path Monitoring in Overlay Networks with Large, Rank-Deficient Routing Matrices," in Broadnets, London, UK, 2008.
[51] Y. Zhang and N. Duffield, "On the constancy of internet path properties," in IMW '01: Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement, 2001, pp. 197-211.
[52] The Skitter Project (CAIDA), 2002. http://www.caida.org/tools/measurement/skitter/.
[53] C.-M. Cheng, et al., "Path probing relay routing for achieving high end-to-end performance," in Global Telecommunications Conference, 2004. GLOBECOM '04. IEEE, 2004, pp. 1359-1365 vol. 3.
[54] M. Coates, et al., "Maximum likelihood network topology identification from edge-based unicast measurements," SIGMETRICS Perform. Eval. Rev., vol. 30, pp. 11-20, 2002.
[55] R. Govindan and H. Tangmunarunkit, "Heuristics for Internet map discovery," in INFOCOM 2000. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, 2000, pp. 1371-1380 vol. 3.
[56] F. Viger, et al., "Detection, understanding, and prevention of traceroute measurement artifacts," Comput. Netw., vol. 52, pp. 998-1018, 2008.
[57] M. Luckie, et al., "Traceroute Probe Method and Forward IP Path Inference," presented at the Internet Measurement Conference (IMC '08), Vouliagmeni, Greece, 2008.
[58] A. Nakao, et al., "A Routing Underlay for Overlay Networks," in SIGCOMM '03, Karlsruhe, Germany, 2003.
[59] W. Cui, et al., "Backup path allocation based on a correlated link failure probability model in overlay networks," in Proceedings of 10th IEEE International Conference on Network Protocols (ICNP '02), Paris, France, 2002, pp. 236-247.
[60] R. Kawahara, et al., "On the Quality of Triangle Inequality Violation Aware Routing Overlay Architecture," in INFOCOM 2009. The 28th Conference on Computer Communications. IEEE, Rio de Janeiro, 2009, pp. 2761-2765.
[61] M. Uchida, et al., "QoS-Aware Overlay Routing with Limited Number of Alternative Route Candidates and Its Evaluation," IEICE Trans. Commun., vol. E89-B, pp. 2361-2374, 2006.
[62] N. Hu and P. Steenkiste, "Exploiting internet route sharing for large scale available bandwidth estimation," in IMC '05: Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement, Berkeley, CA, 2005, pp. 16-16.
[63] L. Gao, "On inferring autonomous system relationships in the internet," IEEE/ACM Trans. Netw., vol. 9, pp. 733-745, 2001.
[64] F. Dabek, et al., "Designing a DHT for Low Latency and High Throughput," in NSDI '04, 2004, pp. 85-98.
[65] S. Ratnasamy, et al., "A scalable content-addressable network," in SIGCOMM '01: Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, 2001, pp. 161-172.
[66] Akamai, Fast Internet Content Delivery with FreeFlow, 2000. See www.cs.washington.edu/homes/ratul/akamai/freeflow.pdf.
[67] Z. Li and P. Mohapatra, "QRON: QoS-aware routing in overlay networks," Selected Areas in Communications, IEEE Journal on, vol. 22, pp. 29-40, 2004.
[68] S. D. Patek, et al., "Enhancing aggregate QoS through alternate routing," in Global Telecommunications Conference, 2000. GLOBECOM '00. IEEE, 2000, pp. 611-615 vol. 1.
[69] L. Subramanian, et al., "OverQoS: offering Internet QoS using overlays," SIGCOMM Comput. Commun. Rev., vol. 33, pp. 11-16, 2003.
[70] R. Keralapura, et al., "Race Conditions in Coexisting Overlay Networks," Networking, IEEE/ACM Transactions on, vol. 16, pp. 1-14, 2008.
[71] H. Tangmunarunkit, et al., "Network Topology Generators: Degree based vs Structural," in SIGCOMM '02, Pittsburgh, Pennsylvania, USA, 2002.
[72] S. Zhou and R. J. Mondragon, "The rich club phenomenon in internet topology," IEEE Communication Letters, vol. 8, pp. 180-182, March 2004.
[73] PlanetLab. See http://www.planet-lab.org/.
[74] H. Chang, et al., "Internet connectivity at the AS-level: an optimization-driven modeling approach," in MoMeTools '03: Proceedings of the ACM SIGCOMM workshop on Models, methods and tools for reproducible network research, 2003, pp. 33-46.
[75] S. Jaiswal, et al., "Comparing the structure of power-law graphs and the Internet AS graph," in Network Protocols, 2004. ICNP 2004. Proceedings of the 12th IEEE International Conference on, 2004, pp. 294-303.
[76] S. Agarwal, et al., "OPCA: robust interdomain policy routing and traffic control," in Open Architectures and Network Programming, 2003 IEEE Conference on, 2003, pp. 55-64.
[77] A. Bremler-Barr, et al., "Improved BGP Convergence via Ghost Flushing," in INFOCOM '03, San Francisco, USA, 2003.
[78] W. Xu and J. Rexford, "MIRO: multi-path interdomain routing," in SIGCOMM '06: Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, 2006, pp. 171-182.
[79] J. Chandrashekar, et al., "Limiting path exploration in BGP," in INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, 2005, pp. 2337-2348 vol. 4.
[80] J. Chandrashekar, et al., "Fixing BGP, one AS at a time," in NetT '04: Proceedings of the ACM SIGCOMM workshop on Network troubleshooting, 2004, pp. 295-300.
[81] D. Pei, et al., "BGP-RCN: improving BGP convergence through root cause notification," Comput. Netw. ISDN Syst., vol. 48, pp. 175-194, 2004.
[82] O. Bonaventure, et al., "Achieving sub-50 milliseconds recovery upon BGP peering link failures," IEEE/ACM Trans. Netw., vol. 15, pp. 1123-1135, 2007.
[83] C. Labovitz, et al., "Delayed Internet routing convergence," Networking, IEEE/ACM Transactions on, vol. 9, pp. 293-306, 2001.
[84] J. Luo, et al., "An Approach to Accelerate Convergence for Path Vector Protocol," in GLOBECOM '02, Taipei, Taiwan, ROC, 2002.
[85] N. Kushman, et al., "R-BGP: Staying Connected in a Connected World," in 4th USENIX Symposium on Networked Systems Design & Implementation, 2007, pp. 341-354.
[86] B. Quoitin, et al., "Interdomain traffic engineering with BGP," Communications Magazine, IEEE, vol. 41, pp. 122-128, 2003.
[87] M. Motiwala, et al., "Path splicing," in SIGCOMM '08: Proceedings of the ACM SIGCOMM 2008 conference on Data communication, Seattle, WA, USA, 2008, pp. 27-38.
[88] M. Shand and S. Bryant, "IP Fast Reroute Framework," draft-ietf-rtgwg-ipfrr-framework-10, work in progress, Feb 27 2009.
[89] P. Francois and O. Bonaventure, "An evaluation of IP-based fast reroute techniques," in CoNEXT '05: Proceedings of the 2005 ACM conference on Emerging network experiment and technology, Toulouse, France, 2005, pp. 244-245.
[90] S. Singh, et al., "Asynchronous Transfer Mode (ATM) over Layer 2 Tunneling Protocol Version 3 (L2TPv3), RFC 4454," May 2006.
[91] A Path Computation Element (PCE)-Based Architecture, IETF RFC 4655, 2006.
[92] M. Yannuzzi, et al., "On the challenges of establishing disjoint QoS IP/MPLS paths across multiple domains," Communications Magazine, IEEE, vol. 44, pp. 60-66, 2006.
[93] I. v. Beijnum, A Look at Multihoming and BGP, 2002. Available: http://www.oreillynet.com/pub/a/network/2002/08/12/multihoming.html
[94] G. Huston, BGP Routing Table Analysis Reports, 2004. Available: http://bgp.potaroo.net/
[95] T. Bu, et al., "On characterizing BGP routing table growth," Comput. Netw., vol. 45, pp. 45-54, 2004.
[96] C. De Launois and M. Bagnulo, "The paths toward IPv6 multihoming," Communications Surveys & Tutorials, IEEE, vol. 8, pp. 38-51, 2006.
[97] O. Antonova, "Introduction and Comparison of SCTP, TCP-MH, DCCP protocols," 2004.
[98] S. Tao, et al., "Exploring the performance benefits of end-to-end path switching," in Network Protocols, 2004. ICNP 2004. Proceedings of the 12th IEEE International Conference on, 2004, pp. 304-315.
[99] J. Han, et al., "An Experimental Study of Internet Path Diversity," Dependable and Secure Computing, IEEE Transactions on, vol. 3, pp. 273-288, 2006.
[100] G. Huston, The growth of the BGP table - 1994 to present. Available: http://bgp.potaroo.net
[101] CAIDA, The Cooperative Association for Internet Data Analysis. See http://www.caida.org/home/.
[102] NLANR-AMP, "Location of AMP monitors." See http://watt.nlanr.net/.
[103] RIPE-NCC, "Location of RIPE monitors." See http://www.ripe.net/projects/ttm/Plots/locations.cgi.
[104] S. Savage, et al., "The end-to-end effects of Internet path selection," in SIGCOMM '99: Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, 1999, pp. 289-299.
[105] M. Faloutsos, et al., "On power-law relationships of the Internet topology," in SIGCOMM '99: Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, 1999, pp. 251-262.
[106] CAIDA AS Relationships Dataset. See http://www.caida.org/data/active/as-relationships/.
[107] R. Keralapura, et al., "Can ISPs Take the Heat from Overlay Networks?," presented at HotNets '04, San Diego, CA, USA, 2004.
[108] T. S. E. Ng and H. Zhang, "Predicting Internet network distance with coordinates-based approaches," in INFOCOM 2002. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, 2002, pp. 170-179 vol. 1.
[109] V. Padmanabhan and L. Subramanian, "An investigation of geographic mapping techniques for internet hosts," in SIGCOMM '01: Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, 2001, pp. 173-185.
[110] S. Ratnasamy, et al., "Topologically-Aware Overlay Construction and Server Selection," in INFOCOM, New York, NY, USA, 2002.
[111] M. Costa, et al., "PIC: practical Internet coordinates for distance estimation," in Distributed Computing Systems, 2004. Proceedings. 24th International Conference on, 2004, pp. 178-187.
[112] P. Francis, et al., "IDMaps: A Global Internet Host Distance Estimation Service," 2000.
[113] L. Tang and M. Crovella, "Virtual Landmarks for the Internet," in IMC '03, Miami Beach, Florida, USA, 2003.
[114] G. Mohan, et al., "Efficient algorithms for routing dependable connections in WDM optical networks," Networking, IEEE/ACM Transactions on, vol. 9, pp. 553-566, 2001.
[115] T. Rakotoarivelo, et al., "A structured peer-to-peer method to discover QoS enhanced alternate paths," in Information Technology and Applications, 2005. ICITA 2005. Third International Conference on, 2005, pp. 671-676 vol. 2.
[116] T. Erlebach, et al., "Cuts and Disjoint Paths in the Valley-Free Path Model," presented at the First Workshop on Combinatorial and Algorithmic Aspects of Networking (CAAN), 2004.
[117] G. Di Battista, et al., "Computing the types of the relationships between autonomous systems," in INFOCOM 2003. Twenty-Second Annual Joint Conference of the IEEE Computer and Communications Societies. IEEE, 2003, pp. 156-165 vol. 1.
[118] J. Xia and L. Gao, "On the evaluation of AS relationship inferences [Internet reachability/traffic flow applications]," in Global Telecommunications Conference, 2004. GLOBECOM '04. IEEE, 2004, pp. 1373-1377 vol. 3.
[119] J. W. Suurballe and R. E. Tarjan, "A Quick Method for Finding Shortest Pairs of Disjoint Paths," Networks, vol. 14, pp. 325-336, 1984.
[120] J. Kleinberg, "Approximation Algorithms for Disjoint Paths Problems," PhD thesis, Dept. of EECS, MIT, 1996.
[121] T. Rakotoarivelo, et al., "Enhancing QoS Through Alternate Path: An End-to-End Framework," in ICN 2005, 4th International Conference on Networking, Reunion Island, France, 2005, pp. 125-132.
[122] Cymru IP to ASN Whois Service. http://www.cymru.com/.
[123] GNU netcat. See http://netcat.sourceforge.net.
[124] RouteViews. Available: http://www.routeviews.org/
[125] D. B. Chua, et al., "Efficient monitoring of end-to-end network properties," in INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, 2005, pp. 1701-1711 vol. 3.
[126] H. V. Madhyastha, et al., "iPlane: An Information Plane for Distributed Services," in Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation, Seattle, WA, 2006, pp. 367-380.
[127] G. H. Golub and C. F. Van Loan, Matrix Computations, Third ed.: Johns Hopkins, 1996.
[128] N. Spring, et al., "Measuring ISP topologies with rocketfuel," in SIGCOMM '02: Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications, 2002, pp. 133-145.
[129] B. Augustin, et al., "Avoiding traceroute anomalies with Paris traceroute," in IMC '06: Proceedings of the 6th ACM SIGCOMM on Internet measurement, 2006, pp. 153-158.
[130] D. Dobson and F. Santosa, "Recovery of blocky images from noisy and blurred data," SIAM J. Appl. Math., vol. 56, pp. 1181-1198, 1996.
[131] E. J. Candes, et al., "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," Information Theory, IEEE Transactions on, vol. 52, pp. 489-509, 2006.
[132] D. Donoho, "For most large underdetermined systems of equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution," Communications on Pure and Applied Mathematics, vol. 59, pp. 907-934, 2006.
[133] A. Bruckstein, et al., "On the Uniqueness of Nonnegative Sparse Solutions to Underdetermined Systems of Equations," IEEE Transactions on Information Theory, vol. 54, pp. 4813-4820, 2008.
[134] Rosen, et al., "Accurate Solution to Overdetermined Linear Equations with Errors Using L1 Norm Minimization," Computational Optimization and Applications, vol. 17, pp. 329-341, 2000.
[135] J. Neter, et al., Applied Linear Regression Models, Third ed.: Irwin, 1996.
[136] R. Christensen, Plane Answers to Complex Questions: The Theory of Linear Models, Third ed.: Springer, 2002.
[137] I. Daubechies, et al., "Iteratively Re-weighted Least Squares minimization: Proof of faster than linear rate for sparse recovery," in Information Sciences and Systems, 2008. CISS 2008. 42nd Annual Conference on, 2008, pp. 26-29.
[138] The Archipelago Project (CAIDA). http://www.caida.org/projects/ark/.
[139] W. Cui, et al., "Backup path allocation based on a correlated link failure probability model in overlay networks," in Network Protocols, 2002. Proceedings. 10th IEEE International Conference on, 2002, pp. 236-245.
[140] V. Aggarwal, et al., "Can ISPs and P2P users cooperate for improved performance?," SIGCOMM Comput. Commun. Rev., vol. 37, pp. 29-40, 2007.
[141] T. Karagiannis, et al., "Should internet service providers fear peer-assisted content distribution?," in IMC '05: Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement, Berkeley, CA, 2005, pp. 6-6.
[142] D. B. Chua, et al., "A Statistical Framework For Efficient Monitoring Of End-to-End Network Properties," CoRR, vol. abs/cs/0412037, 2004.