TOOL to GENERATE a LOCAL INTERNET WEATHER REPORT
Fabio González, Nivedita Sumi Majumdar, King-Ip Lin and Dipankar Dasgupta
Division of Computer Science, Department of Mathematical Sciences
The University of Memphis, Memphis, TN 38152, USA
{fgonzalz, nmajumdr, davidlin, ddasgupt}@memphis.edu
Abstract
This paper reports the results of experiments conducted to measure Internet congestion from a local perspective. A user-friendly graphical tool was developed to reflect the “local Internet weather”. For a typical user this information is more relevant than global characteristics; however, existing tools for measuring Internet traffic are not designed with the individual non-technical user in mind.
1 Introduction
The Internet is a large, complex and continuously growing system. Its complexity arises because many different elements constantly interact in different ways. This complex structure and behavior is analogous to a natural system like the weather. Understanding of the weather system has allowed meteorologists to build complex models. These models in turn enable the understanding and prediction of relevant variables such as temperature, pressure and wind velocity. While their accuracy is not necessarily high on a day-to-day basis, these models are very useful and have become indispensable to everyday life.
The same principle can be applied to the Internet. There have been some initiatives to study the Internet as a whole, trying to model, measure and predict some relevant metrics. The main goal of these projects is to characterize the data flow and traffic performance. As a product of these projects, there are now tools to monitor the behavior of the Internet: topology (Internet Tomography [1]), traffic (Internet Weather [3]), etc. These tools have been designed from a global point of view, that is, they try to reflect the global behavior of the net. This effort requires a large infrastructure that provides permanent measurements at different points of a growing and already huge network. It can only be accomplished by large organizations with deep resources, such as ISPs, government organizations or private companies whose business is Internet measurement.
On the other hand, there are tools that allow the user to monitor the quality of the connection between two computers: ping, traceroute, tracepath, etc. These tools are used daily by network administrators to locate connectivity problems and to estimate the quality of connections. They are very specific, and a technical background is needed to use them properly. Thus, currently there is not much a typical user can find between the very general information provided by the Internet weather or tomography initiatives and the very specific, technically oriented network administration tools.
Our goal was to develop an application to measure the traffic congestion of the Internet from a local point of view, that is, to reflect the "local Internet weather". For a typical user this information is more relevant than the global characteristics. He/she is interested in questions such as "How is the Internet working today? Is it a good time to download a video file from site X?" This application provides the user with information about the traffic on the part of the Internet he/she is interested in: the user's local computer and the connections from it to the web sites the user visits most frequently. The application uses traceroute (a freely available tool for network monitoring) to analyze the current state of the connection from the local computer to the web hosts specified by the user. This information is processed and an index that reflects the overall congestion is calculated.
2 Previous work
The approaches to measuring Internet traffic can be classified into two categories: local and global. In the local case, the main goal is to determine the characteristics of the traffic in a small subset of the Internet that generally represents a local network or a small set of local networks. In the global case, the purpose is to characterize the Internet as a whole, taking into account all the elements that constitute it. Local monitoring usually collects statistics like response time, packet loss, number and type of packets, etc. This is usually performed using network administration software and tools such as tcpdump, netmon, pathchar and traceroute [4,5].
Global monitoring demands more infrastructure since the data have to be gathered from different points on the net. This information is obtained either directly from routers, packet sniffers or dedicated hardware (passive measurement) or by injecting packets to measure the network behavior (active measurement). Examples of active monitoring are HEPnet, I2 (Abilene) and Skitter [4]. Examples of passive monitoring are CoralReef and tcpdump [4].
The amount of information generated by Internet monitoring can be enormous. This necessitates tools that help to analyze, visualize and understand it. An important effort to develop Internet measurement techniques is the Cooperative Association for Internet Data Analysis (CAIDA) [2].
Another interesting project is the MIDS Internet Weather Report (IWR) [3]. It reports on conditions inside the Internet itself rather than on real-world meteorology. Its maps show round trip times from the MIDS local offices to approximately 4,500 domains worldwide, currently every four hours, six times a day, seven days a week, using ICMP ECHO (ping).
3 Proposed Solution
As stated, our goal is to measure the performance of the Internet from a certain viewpoint. In order to do that, we define a subnet of the Internet consisting of the local host, a set of the most important hosts on the Internet, and the intermediate connections (hops). This set of hosts represents the part of the Internet that a user is interested in and that is local to the user in the sense that he/she is a regular visitor of those sites. The traffic conditions on the routes to those hosts are important information for the user, affecting decisions such as whether to browse there or not, or whether to download from those sites or not, at any given time. We apply techniques to characterize the topology of the connections from the local host to every member of that subnet and estimate the quality of those connections in terms of some standard metrics like bandwidth and latency. To begin, we outline the major steps of the entire procedure. In subsequent sections, we explain in more detail the particulars of the index calculation algorithm.
3.1 Overall procedure
Figure 1 provides the overall picture of the procedure. Each of the steps is described in the following subsections.
Figure 1. Flow chart for the overall procedure: Determination of Routes to the Hosts → Route Tree Construction → Statistics Calculation → Internet Weather Indices
3.1.1 Determination of Routes to the Individual Hosts
In this step we determine the routes taken from the local host to each individual host on the list. First we obtain reachability information. We do this by trying to establish an HTTP connection to every member of our host list, one by one. The success or failure of this attempt provides us with important information, such as whether a host name is valid, whether the host is alive, and whether the host will allow connections to it at all. While this can be done by traceroute, using HTTP packets allows us to infer more information. For instance, HTTP packets are typically not stopped by firewalls, whereas the ICMP or UDP packets used by traceroute often are. This enables us to distinguish whether the host is behind a firewall or is actually unreachable. Therefore we used HTTP packets to obtain more complete information.
After connectivity is established, we use traceroute [5] to obtain round trip time information. It collects not only the round trip time from source to destination, but also timing information for each hop, which will be used in subsequent steps. Figure 2 shows a sample call to traceroute and its output.
    > traceroute nis.nsf.net
    traceroute to nis.nsf.net (35.1.1.48), 30 hops max, 56 byte packet
     1  helios.ee.lbl.gov (128.3.112.1)  19 ms  19 ms  0 ms
     2  lilac-dmc.Berkeley.EDU (128.32.216.1)  39 ms  39 ms  19 ms
     :
    11  nic.merit.edu (35.1.1.48)  239 ms  239 ms  239 ms
Figure 2. A sample call to traceroute and its output
3.1.2 Route Tree Construction
We used the information obtained by traceroute to build a route tree. The route tree combines all the information about individual routes to form a single tree, describing all the paths that are encountered when a site is accessed from the local host. Nodes in the tree correspond to the intermediate hops that are visited along the route. Each edge of the tree contains information about the time taken to travel between the two nodes. The next section describes how the numbers are obtained. Notice that traceroute returns three numbers in milliseconds as the measures of the round trip times of the ICMP or UDP packets sent out to a certain hop. By default traceroute makes three successive attempts and returns the value for each of them. We build a conventional prefix tree with those paths and store at each node the average of the round trip time values returned by traceroute.
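The route tree construction described above can be sketched as follows. This is a minimal illustration, not the authors' Java implementation: the names `parse_hops`, `RouteNode` and `add_route` are hypothetical, the regular expressions assume output shaped like Figure 2, and keeping one averaged RTT sample per route at each node is an assumption made so that later steps can combine estimates from several routes.

```python
import re
from dataclasses import dataclass, field
from statistics import mean

# Assumed hop-line shape, as in Figure 2: "1 helios.ee.lbl.gov (128.3.112.1) 19 ms 19 ms 0 ms".
HOP_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s*\(([\d.]+)\)\s+(.*)$")
RTT_VALUE = re.compile(r"(\d+(?:\.\d+)?)\s*ms")


def parse_hops(output):
    """Return [(hop_name, average_rtt_ms), ...] for every parsable hop line."""
    hops = []
    for line in output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header line, elided hops, or a hop that only timed out
        rtts = [float(value) for value in RTT_VALUE.findall(match.group(4))]
        if rtts:
            hops.append((match.group(2), mean(rtts)))
    return hops


@dataclass
class RouteNode:
    """One hop in the prefix tree of routes (hypothetical structure)."""
    name: str
    rtt_samples: list = field(default_factory=list)  # one averaged RTT per route
    children: dict = field(default_factory=dict)     # hop name -> RouteNode

    def average_rtt(self):
        return mean(self.rtt_samples) if self.rtt_samples else None


def add_route(root, hops):
    """Insert one route into the tree, sharing nodes with common prefixes."""
    node = root
    for name, rtt in hops:
        node = node.children.setdefault(name, RouteNode(name))
        node.rtt_samples.append(rtt)
    return node


SAMPLE = """traceroute to nis.nsf.net (35.1.1.48), 30 hops max, 56 byte packet
 1  helios.ee.lbl.gov (128.3.112.1)  19 ms  19 ms  0 ms
 2  lilac-dmc.Berkeley.EDU (128.32.216.1)  39 ms  39 ms  19 ms
 :
11  nic.merit.edu (35.1.1.48)  239 ms  239 ms  239 ms
"""

root = RouteNode("Local Host")
leaf = add_route(root, parse_hops(SAMPLE))
print(leaf.name, leaf.average_rtt())
```

Routes that share their first hops reuse the same nodes, so calling `add_route` once per host in the list yields a single tree rooted at the local host.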
Figure 3. Route taken by the ICMP or UDP packets may reach only as far as the network boundaries.
One assumption we made is that in cases where traceroute cannot reach the host itself, we are still able to trace the route up to the network boundaries. At that point the presence of firewalls prevents any further exploration by filtering the UDP packets. This is a very common situation, illustrated schematically in Figure 3. Recall that we use HTTP packets to confirm the existence (and liveness) of the site. In such a case traceroute collects information only up to the network boundaries.
3.2 Statistics Calculation
3.2.1 Weather Indices
As mentioned, our goal is to present the results of all our measurements and calculations to the Internet surfer in a very compact way. The weather indices have been designed to do just that. In this section we discuss how we calculate these indices and give some insight into why we consider that these numbers reflect the Internet traffic situation at any given instant. A graph of these indices versus time is drawn to give the user a good idea of the web traffic conditions over a certain period of time.
3.2.2 Algorithm for Construction of Indices
Let us consider any route from Local Host to a certain other Host i. Diagrammatically it will look like Figure 4, where rtt is the round trip time. The round trip time for each node is the time taken for a packet to travel to that node and back along the route, measured from the local host. Referring to the results returned by traceroute as illustrated in Figure 2, we see that for each hop three estimates are returned. We have already averaged these three numbers, and the a_rtt considered here is exactly that average.
We denote the segment time, S(a, b), as the round trip time from node "a" to node "b". Then, segment times are calculated for each segment as:
\[ S(\text{Local Host}, a) = a_{rtt}, \qquad S(a, b) = b_{rtt} - a_{rtt}, \qquad S(b, c) = c_{rtt} - b_{rtt}, \qquad \ldots \]
and so on. Consider the routes to Host 1 and Host 2:
Host 1: a, b, c, d, e
Host 2: a, b, e, f, g
Now, node "a" will clearly have two estimates for the segment time from Local Host to node "a", obtained from the Host 1 and Host 2 routes respectively, and these values are not necessarily in agreement with each other. In fact, when the tree is completely built, any segment will have as many estimates of its segment time as there are leaves in the sub-tree rooted at the terminal node of that segment. Therefore, for each route i that has the segment <a, b> in it, we obtain an estimate S(a, b)_i. With m such routes, we can estimate S_avg(a, b) as follows:
\[ S_{avg}(a, b) = \frac{1}{m} \sum_{i=1}^{m} S(a, b)_i \]
The averaged values yield a better estimate of the segment times.
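As a rough sketch of this averaging step, the snippet below derives the per-route estimates S(a, b)_i from cumulative round trip times and averages them per segment. The function names, the `LOCAL_HOST` label and the sample numbers are hypothetical; it also assumes that a segment is identified by its pair of node names, which holds only if a node name does not appear on two different branches of the tree.

```python
from collections import defaultdict
from statistics import mean

LOCAL_HOST = "Local Host"  # hypothetical label for the root of every route


def segment_estimates(route):
    """Yield ((parent, child), segment_time) for one route.

    `route` is a list of (node, cumulative_rtt) pairs ordered from the first
    hop to the destination, with RTTs measured from the local host.
    """
    previous_node, previous_rtt = LOCAL_HOST, 0.0
    for node, rtt in route:
        # S(parent, child) = child_rtt - parent_rtt; may be negative.
        yield (previous_node, node), rtt - previous_rtt
        previous_node, previous_rtt = node, rtt


def average_segments(routes):
    """Average the per-route estimates S(a, b)_i into S_avg(a, b)."""
    samples = defaultdict(list)
    for route in routes:
        for segment, estimate in segment_estimates(route):
            samples[segment].append(estimate)
    return {segment: mean(values) for segment, values in samples.items()}


# Invented measurements in milliseconds, for illustration only.
route_1 = [("a", 10.0), ("b", 30.0), ("c", 25.0)]
route_2 = [("a", 14.0), ("b", 26.0), ("e", 40.0)]
s_avg = average_segments([route_1, route_2])
print(s_avg[(LOCAL_HOST, "a")], s_avg[("a", "b")])
```

Segments shared by both routes, here <Local Host, a> and <a, b>, receive two estimates each, while <b, c> and <b, e> keep their single estimate.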
Note that sometimes a route may report that the time to reach node "b" is greater than the time to reach node "c", even though "c" comes later than "b" in the same route. This leads to some negative estimates for the segment time. Our argument is that, since we will have a number of estimates of the segment times obtained from different routes, on average these oddities will be evened out. Now, we calculate the round trip time to each host with our improved estimates of the segment round trip times.
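As a hypothetical illustration (the numbers are invented, not measured), suppose two routes share the segment <b, c>. Route 1 reports b_rtt = 40 ms and c_rtt = 35 ms, while route 2 reports b_rtt = 30 ms and c_rtt = 42 ms. Then:

```latex
\[
S(b, c)_1 = 35 - 40 = -5, \qquad
S(b, c)_2 = 42 - 30 = 12, \qquad
S_{avg}(b, c) = \frac{-5 + 12}{2} = 3.5 \ \text{ms}
\]
```

The negative estimate from route 1 is offset by the estimate from route 2. A segment that lies on only one route (m = 1) keeps its single estimate, so no such correction is available there.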
Figure 4. Round trip time for different hosts in the route to host i.
\[ rtt^{i}_{host} = S_{avg}(\text{Local Host}, a) + S_{avg}(a, b) + S_{avg}(b, c) + \ldots \]
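The sum above could be computed along these lines. The helper name `host_rtt`, the dictionary layout keyed by (parent, child) name pairs, and the numbers are all assumptions made for illustration.

```python
def host_rtt(route_nodes, avg_segment, local_host="Local Host"):
    """Sum S_avg over consecutive segments from the local host to the host.

    `route_nodes` lists the hops in order, ending at the destination host.
    `avg_segment` maps (parent, child) name pairs to averaged segment times.
    """
    total = 0.0
    previous = local_host
    for node in route_nodes:
        total += avg_segment[(previous, node)]
        previous = node
    return total


# Invented averaged segment times in milliseconds, for illustration only.
avg_segment = {
    ("Local Host", "a"): 12.0,
    ("a", "b"): 16.0,
    ("b", "c"): -5.0,
    ("b", "e"): 14.0,
}
rtt_host_1 = host_rtt(["a", "b", "c"], avg_segment)
rtt_host_2 = host_rtt(["a", "b", "e"], avg_segment)
print(rtt_host_1, rtt_host_2)
```

Because the shared segments use averaged values, the reconstructed round trip time of a host can differ from the raw value that traceroute reported on that host's own route.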
Now using these round trip values, we can calculate our weather indices. We have the following options.
1. We sum the round trip times to each host and divide the sum by the total number of hosts, n; the result is our Mean weather index.
\[ \text{Mean Weather Index} = \frac{1}{n} \sum_{i=1}^{n} rtt^{i}_{host} \]
This is sometimes disadvantageous, because an occasional high round trip time to any one host, for whatever reason, can raise the index considerably. This would mean a bad forecast even though the traffic conditions to most of the other hosts on the subnet are fine.
2. We calculate the median round trip time and use that as an index. This keeps any single unusually large or small round trip time from affecting the result in any considerable way. This is a more reliable metric.
3. We cluster the round trip time data. We found that some of the hosts always take abnormally long times. We have classified the hosts into three categories depending on the stability of their connection times. A host that always gives high RTT values might indicate that something is wrong on that particular connection route rather than on the entire subnet. The user can be given the option to think twice about trying to connect to that site in particular.
4 Implementation Details
The application was built in Java. This allows us to take advantage of its graphical capabilities and its easy portability to different platforms. The application calls the traceroute program for each element in the host list. The information returned is used to build the prefix tree, which is shown to the user graphically. The statistics and the local weather index are calculated and presented to the user. A glimpse of how it looks is shown in Figure 5.
Figure 5. Snapshot of application interface
5 Conclusions
A novel approach to measuring local Internet congestion was proposed. Here, the “local” Internet is understood as the user's computer, the websites that the user visits most frequently, and the intermediate hops that allow the connection.
Conventional tools for determining the connection characteristics between two hosts were used, and the information obtained was combined to produce the local Internet topology and connection metrics. This information was further used to calculate a set of indices that reflect the local Internet weather. Experiments were performed; however, space limitations prevent us from reporting them here. They showed that the proposed approach works and that the results are consistent. Further experiments are necessary to determine the real potential of the method. One important real-world application of this technique seems to lie in the multicast community, where a high-level characterization of the “local subnet”, interpreted as “the members of a particular multicast community”, could be usefully exploited.
The following are some of the limitations we think need to be resolved in order to apply the technique to real working environments:
• Multiple path problems (route alternation). The Internet path between two hosts is not fixed; it changes dynamically, oscillating over a small set of possibilities.
• Multiple IP addresses. For big Internet sites (such as www.yahoo.com, www.msn.com, etc.) the host name does not map to a single IP address, which makes it difficult to measure the quality of the connection from our host to such a site. We have left traceroute to deal with this and have not handled the issue ourselves.
6 References
[1] K. C. Claffy, T. E. Monk and D. McRobb, Internet tomography, Nature, Web Matters, Jan 7 1999.
[2] The Cooperative Association for Internet Data Analysis, http://www.caida.org/
[3] The MIDS Internet Weather Report, http://www.mids.org/weather/
[4] R. Periakaruppan, CAIDA Internet measurement tool taxonomy, http://www.caida.org/tools/taxonomy
[5] Van Jacobson, Traceroute source code and documentation, ftp://ftp.ee.lbl.gov/traceroute.tar.gz