Curing the amnesia: Network memory for the Internet

Zhenyun Zhuang, Cheng-Lin Tsao, and Raghupathy Sivakumar
Georgia Institute of Technology, Atlanta, Georgia 30332

ABSTRACT


The Internet today is for all practical purposes memoryless. The only forms of memorization the Internet performs are application-level caching used by solutions such as web-caches and content-distribution networks. Meanwhile, studies have shown the presence of considerable amounts of redundancy in Internet traffic content. Exploiting redundancies in content can potentially provide better network delivery performance and benefit end-users, network operators, and content-providers. In this context, we propose a Memory Control Protocol (MCP) for the Internet. MCP, in its simplest form, involves a memory at certain network elements that allows store/retrieve operations. The store operation is used when particular data flows through the element for the first time. The retrieve operation is used when the same data needs to be retrieved from that element at a later point. We propose to realize MCP at layer 3.5 for the sake of application transparency, and present realizations of the several design components that constitute MCP. We evaluate the performance improvement of MCP using trace-driven evaluations.

1. INTRODUCTION

The Internet today is for all practical purposes memoryless. The only forms of memorization the Internet performs are application-level caching used by solutions such as web-caches [7, 9], content-distribution networks (CDNs) [11, 12, 22], and peer-to-peer (P2P) applications [16, 18, 21]. We elaborate later in the paper on why such solutions are limited in scope in what they exploit through their caching. The underlying fabric of the Internet, consisting of the content servers1, routers, and clients, performs very little, if any, memorization and re-use of memorized content. The focus of this work is to rethink this aspect of the Internet as we look forward to the (re-)design of the future Internet architecture. This is especially relevant today as the Internet is seeing tremendous increases in the amount of heavy traffic content such as high-definition video. Meanwhile, several studies have inferred the presence of considerable amounts of redundancy in Internet traffic content [6, 10, 17, 19, 20]. While these studies are admittedly narrow in the portion of the Internet or the set of applications they consider, they nevertheless provide interesting insights into the promise the Internet holds in being able to exploit redundancies. In other words, redundancies in content can and should be explicitly leveraged to reduce the actual overhead of communication. Generically, equipping the Internet with a Memory Control Protocol will enable memorization of content as it flows naturally (or by design) through the network and, more importantly, the use of the memorized content to lower the actual cost of delivering any content to its intended destination. In this context, we propose the design of a memory control protocol for the future Internet. The Memory Control Protocol (MCP), in its simplest form, involves a memory at every network element that allows store/retrieve operations. The store operation is used when particular data flows through the element for the first time. The retrieve operation is used when the same data needs to be retrieved from that element at a later point. The primary goal of using such a protocol is to minimize the bit-hop measure for delivering any content from its server to any client. We propose MCP as a new layer 3.5 for the future Internet, residing beneath the transport layer and above the network layer. Several questions abound with respect to the design of MCP. A subset of the questions that we address through this work includes the following:

• What is the granularity at which data is memorized in MCP? Should it be sub-packet, packet-level, or even super-packet granularity?
• How can the memorized information be effectively utilized to transmit new data?
• How does MCP work? Should it be an entirely new transport protocol?

The rest of the work is organized as follows: Section 2 presents motivating results highlighting the levels of redundancy in Internet traffic that can potentially be exploited by MCP, and the limitations of some obvious related works such as web-caches, CDNs, and P2P applications. Section 3 introduces the concept of MCP and presents the framework of our design. Sections 4 and 5 present the design of the solution. Section 6 evaluates our design. Section 8 presents related work and Section 9 concludes the work.

1 We do not exclude traditional clients functioning as servers in the use of this term.


2. MOTIVATION: TRAFFIC REDUNDANCY

In this section we motivate our design of MCP. The use of MCP helps only when content stored in the memory will be referenced in future communications. Consequently, a necessary condition for MCP to provide benefits is redundancy in traffic content. Though an extensive study of the nature of redundancies in Internet traffic is a non-trivial task, we present some preliminary indicators of traffic redundancy that motivate the use of MCP. Moreover, we argue that some obvious solutions that exist today do not necessarily exploit these redundancies. Briefly, the solutions that exist today that are relevant to MCP include web-caches, content-distribution networks (CDNs), and peer-to-peer applications. Each of these solutions relies on caching and using cached information to ostensibly improve performance. There are several key differences between MCP and these solutions, and we present these differences later in this section. As we delve into the preliminary results and data in this section, we also discuss why these alternate solutions either do not exploit the redundancies in content that exist or have too narrow a scope to benefit from them. Thus, the goals of this section are two-fold:

• to highlight the considerable amounts of redundancy that naturally exist in Internet traffic;

• to show that the few solutions that exist today to leverage redundancy are merely at the fringes of achieving the true benefits of MCP, and hence either have too narrow a scope to harness redundancies along some of the dimensions, or simply do not exploit redundancies even when some of the dimensions fall within scope.

2.1 Six dimensions of redundancy

We now briefly elaborate on the six dimensions: (i) Time: we consider redundancies in content from the content servers to the clients across time; (ii) Direction: we consider redundancies in content between the upload and download directions between clients and content servers; (iii) Content: we consider redundancies between different parts of the content at content servers; (iv) Applications: we study the potential for traffic redundancy between different applications; (v) User-data: we identify redundancies between content prepared and manipulated by different users; and (vi) Space: we study the potential for exploiting redundancies due to the same content being delivered to different points in space. For all the discussions, when presenting redundancy levels between two instances of content X and Y, we measure redundancy as Redundancy(X, Y)/|Y|.

2.1.1 Time

The application that we use to study the time dimension is the WWW. We consider ten popular websites [1], gather snapshots of the content available at the servers across time over a period of 30 days, and study for different time intervals the degree of redundancy between the different snapshots. We use the wget tool to fetch the content available at the web servers. wget takes the depth to which content should be fetched as a parameter, and automatically traverses HTML links up to that depth and fetches the corresponding objects required by the HTML file. Figure 1(a) shows the redundancy levels between the snapshots on the tenth day and the first day. The results shown under "MCP" are obtained by performing a segment-level comparison where the segments are created using deterministic delimiters, which will be elaborated later. It can be observed that considerable levels of redundancy exist between the snapshots. Redundancy levels vary from just over 55% to over 85%. This directly represents the savings in communication that can be attained if memory of earlier communications is maintained. What is perhaps more interesting is that the redundancy levels actually exploited by web-caching and CDNs were substantially smaller. A closer inspection of the data revealed that a good portion of the redundancy was at the sub-object level and hence went undetected by the caching mechanisms. While we do not show the trends across time in the figures, those trends were as expected: redundancy levels go down with time. However, even for the 30th-day snapshot, redundancy levels were still non-trivial.

2.1.2 Direction

The application we use to study this dimension is video mash-ups over the web. Video mash-ups are one of the more popular applications that fall under the broader Web 2.0 umbrella, and involve users manipulating an original video and uploading the edited version of the video for other users to consume. A user might upload one edited version or more than one version. Some of the leading mash-up sites are eyespot.com and jumpcut.com. We study content uploaded by five different users from each site, and for every video mash-up uploaded by those users we study the redundancy between the original video and the edited video. This redundancy level represents the savings that could be attained every time the user previews an uploaded edited video, given that the user would have downloaded the original video in the first place. Figure 1(b) shows the results for the redundancy levels thus studied. The redundancy levels vary between 51% and 99% across the different users. Note that we do not discuss web-caches and CDNs in the context of this dimension because such approaches do not apply when the objects are different, as is the case in the scenario considered. Hence, those solutions will not be able to leverage the potential benefits.
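To make the redundancy measure of Section 2.1 concrete, the following is a small illustration (our sketch, not the measurement code used in the study) that computes Redundancy(X, Y)/|Y| over content-defined blocks; the rolling-hash delimiter here is a simple stand-in for the deterministic delimiters detailed in Section 4.

```python
import hashlib

def blocks(data: bytes, prob: int = 64) -> list[bytes]:
    """Split data into variable-size blocks using a simple value-based
    delimiter: a position ends a block when a hash of a small byte
    window falls below a threshold (stand-in for Rabin delimiters)."""
    out, start = [], 0
    for i in range(3, len(data)):
        window = data[i - 3:i + 1]
        if int.from_bytes(hashlib.md5(window).digest()[:4], "big") % prob == 0:
            out.append(data[start:i + 1])
            start = i + 1
    out.append(data[start:])
    return [b for b in out if b]

def redundancy(old: bytes, new: bytes) -> float:
    """Fraction of bytes in `new` covered by blocks already seen in
    `old`, i.e., Redundancy(X, Y) / |Y| from Section 2.1."""
    seen = {hashlib.sha1(b).digest() for b in blocks(old)}
    hit = sum(len(b) for b in blocks(new)
              if hashlib.sha1(b).digest() in seen)
    return hit / len(new)
```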

[Figure 1: Redundancy at dimensions of Time, Direction, and Content. Three panels plot Redundancy (%): (a) the time dimension and (c) the content dimension across the ten studied websites (MySpace, CNN, Weather, YouTube, NYTimes, IMDb, Amazon, Apple, Walmart, Microsoft), with panel (a) comparing network memory against application-level caching; (b) the direction dimension across the studied mash-up users.]

Table 1: Redundancy at dimensions of Application, User, and Space

(a) Application dimension
Site ID   Content          Applications
1         Mandriva Linux   HTTP, FTP, P2P
2         Arch Linux       HTTP, FTP, P2P
3         GoblinX          HTTP, FTP, P2P
4         FreeBSD          HTTP, FTP, P2P
5         Fedora           HTTP, P2P
6         IP Database      HTTP, FTP

(b) User dimension
File ID   Size (MB)   Versions
1         841         10
2         944         3
3         720         5
4         1,070       9
5         912         6
6         737         12

(c) Space dimension (Ustream.tv)
Video ID   Size (MB)   View Times
1          184         908,776
2          96          489,567
3          15          183,108
4          83          172,606
5          130         153,394
6          173         117,404

2.1.3 Content

We use the web again as the application to study this dimension. We use the same top ten websites for the analysis, but study the redundancy levels within a single snapshot to capture the redundancy between different portions of the content within the same server. We monitor this redundancy for different snapshots of the sites, and present the average results in Figure 1(c). The results show that, surprisingly, there is a good amount of redundancy even across portions of content within the same website. The redundancy levels vary from 40% to over 90%. While we show the results only for a fixed depth of 2 for all the websites, redundancy increased with depth. The redundancy levels shown are indicative of MCP benefits that can be achieved even for a single user accessing different portions of a website. Note that we do not discuss CDNs or web-caches in this context because such solutions provide no benefit given the intra-object nature of the redundancy.

Thus far, we have discussed redundancy trends, and hence performance benefits achievable when using MCP. We have also discussed why CDNs and web-caches do not necessarily leverage the redundancies available. In the following we consider redundancy along three more dimensions that are completely out of scope as far as web-caches and CDNs are concerned, but that can still be harnessed by MCP.

2.1.4 Applications

Application-level caching solutions (web-caches, P2P applications, and CDNs fall within this broad category) by definition work within the scope of individual applications, and most often operate at the granularity of application-level objects (e.g., a bundle in P2P, web objects in CDNs and web-caches). Hence, if there is a basis for expecting redundancy in traffic generated by different applications, a more fundamental strategy for leveraging redundancy becomes justifiable. We now identify one such scenario in which the exact same data can be served through a variety of applications.

We consider Linux and other OS download sites from which users can fetch different distributions. Table 1(a) shows the different sites. Interestingly, to make the content more accessible to users, the distributions are all offered for download using multiple applications, as seen in Table 1(a). However, traditional caching mechanisms, including P2P applications, CDNs, and web-caching, cannot leverage redundancy across different applications because of their restricted scope. As we discuss in the next section, MCP, when realized as a platform solution, will be application-agnostic and can leverage redundancies even across applications.

2.1.5 User-data

The application we consider for this dimension is the popular peer-to-peer application BitTorrent. In BitTorrent, content is distributed in the form of bundles. A bundle will typically have the main content as a file, and a signature created by the user who created the bundle as a distinct file. However, the identity of content within the BitTorrent application is a function of the entire bundle and not the component parts. Hence, if two users have different bundles with the exact same data content but different signatures, they cannot both assist in the delivery of the content to another user, since their bundle identities are different. What this means is that even if a user merely changes the name of the content file, the new bundle will no longer be considered to have similar content. This is a reflection, again, of the application/user-level information that P2P applications rely on to leverage redundancies. In order to study how prevalent such scenarios are, we perform a search for bundles on BitTorrent for six popular movies (Casino Royale, Transformers, Meet the Spartans, Rambo, Untraceable, and Snow Buddies) and study the number of "distinct" versions we are able to access. Table 1(b) shows the number of such "nearly identical" versions that are considered to be "distinct" by BitTorrent. Once again, MCP, being a platform solution, should be agnostic to application-level semantics and be able to exploit redundancies even across such data.

2.1.6 Space

We consider real-time live streaming video as the application of interest for this dimension. Specifically, we consider content from ustream.tv [2] that is being streamed live by different users, and investigate the redundancy across space. In other words, we collect statistics on the number of users who consume a single stream. Note that a key difference between the progressive download used by sites such as YouTube and streaming is that in the latter the information being sent to individual users can be "tailored" to the performance characteristics of those users, whereas in the former the same content is downloaded reliably by all users. The "tailoring" can range from simple prioritized packet drops to more sophisticated multi-level transcoding of the content. The implication of the adaptive delivery of content is that traditional caching schemes such as CDNs, web-caches, and P2P techniques can no longer be employed effectively. Table 1(c) shows the stream size and the number of viewers for different user-streams generated on ustream.tv. Even though the streams are being delivered to a large number of users, true streaming cannot leverage the spatial redundancy because of the reasons explained above.

2.2 Limitations of current approaches

We now comment on the key limitations of related current solutions in the Internet, including web-caching, CDNs, P2P, compression, and delta-encoding.

2.2.1 Application-layer caching

Web-caches, compression, delta-encoding, CDNs, and P2P all fall under the category of application-layer caching. The caching is essentially realized within the implementation of the application. There are three key limitations to such solutions. First, the solutions are not broad-based enough to leverage the multiple dimensions of redundancy identified in Section 2, since not all of those dimensions fall within the scope of application-level caching. Second, because they are application-level approaches, they tend to have only the content server and the client (which actually run the application) as constituents in the caching solution. In other words, entities that do not run instances of the application, be they routers or other peers, cannot participate in the caching. Third, the solutions have to be realized independently for every single application that can benefit from exploiting redundancies, and our preliminary results show that redundancies are broad-based, thus requiring a broad-based solution.

2.2.2 Application-object granularity

Perhaps related to the aforementioned application-level caching property of these solutions, another property of web-caches, CDNs, compression, delta-encoding, and P2P is that they cache at the granularity of application objects. For example, web-caches and CDNs cache HTTP objects, while P2P applications such as BitTorrent cache at the granularity of bundles. While this design makes sense given that they are implemented within the design context of specific applications, it is not the best strategy to leverage all possible redundancies that exist within a network. As an example, Figure 1(a) shows the difference in exploitable redundancy when using a lower-level memory unit rather than an HTTP-object-level memory unit for web traffic.

2.2.3 Failure to exploit all dimensions of redundancy

Existing approaches such as compression and delta-encoding can exploit redundancy in some dimensions, but they often fail to exploit the redundancy in others. For instance, typical compression algorithms fail to address dimensions other than the content dimension; specifically, they do not utilize information in the temporal dimension. Similarly, delta-encoding techniques can partly utilize information in the temporal dimension, but they are still unable to address redundancy in dimensions such as user-data, applications, and space. Web caching typically only works in the direction from the web server to web clients; it does not help in the opposite direction, from clients to servers. In summary, current solutions have limitations in leveraging redundancies in network traffic, in applying to a wider range of applications, and in coupling the location of application intelligence with the location of the data when exploiting redundancy.

Thus far, we have seen multiple dimensions along which redundancies in traffic content exist, and that the levels of redundancy are substantial for the limited data set presented, ranging from 40% to 100% depending upon the dimension. This clearly motivates a need for mechanisms that can leverage redundancy to enhance communication performance. We have further argued that schemes such as web-caching, CDNs, and P2P applications, while in principle trying to leverage redundancy, either do not harness the redundancy available or are too narrow in scope to even attempt leveraging redundancies along these dimensions. We thus argue for a fundamental rethinking of the solution strategy to leverage redundancies, and propose MCP.

3. CONCEPT OF MCP

3.1 Concept

The Memory Control Protocol (MCP) aims to improve network performance by reducing the number of bits transmitted over the network. MCP achieves this by eliminating the redundancy exposed along various dimensions inside network traffic. MCP memorizes information relevant to traffic redundancy on communicating hosts. The operation of MCP in its simplest form consists of the following components: storage at network elements, which allows the elements to remember/store data that passes through them or is destined to them, and later recollect/retrieve the data; and an intelligence engine that drives the use of MCP appropriately for performance enhancement and for realizing new functionalities.

Figure 2 illustrates the simplest fashion in which MCP can be leveraged. In Figure 2(a), the sender S has to deliver certain data D1 to the client C at time T1. That data is memorized by S and C. Later, at time T2, S sends other information that contains a data piece present in the memory; the data piece can then be retrieved from the memory without S actually sending it. Data pieces that are not available in memory are sent directly. Also, whenever actual data is present in a datagram, the store directive is turned on so that any network element that processes the datagram remembers the data contained within. The simplest form of the protocol header for the MCP layer would consist of an operation field (store/retrieve), a data field to notify the receiver what data piece to store if the operation field is set to store, and a memory tag that identifies the data piece if the operation field is set to retrieve. Thus, as shown in Figure 2(a), the datagrams from S to the memory elements have the format (H, OP=store, MT=T1, Data), where H represents the other datagram headers, MT is the memory tag, and Data is the raw data. In Figure 2(b), the datagrams have the format (H, OP=retrieve, MT=T1).

[Figure 2: Simple illustration of MCP. (a) At time T1, the sender S ships (H, OP=store, MT=T1, Data) and the data D1 is memorized along the path to the receiver C; (b) at time T2, S ships only (H, OP=retrieve, MT=T1).]
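The store/retrieve exchange in Figure 2 can be sketched as follows. This is a minimal illustration under an assumed message format; the actual MCP header layout is given in Section 5, and the tag_of helper is hypothetical.

```python
import hashlib

class MemoryElement:
    """A network element's MCP memory supporting store/retrieve."""
    def __init__(self):
        self.memory = {}              # memory tag -> raw data

    def on_datagram(self, op, tag, data=None):
        if op == "store":             # first time this data flows through
            self.memory[tag] = data
            return data
        elif op == "retrieve":        # later, reference by tag only
            return self.memory[tag]

def tag_of(data: bytes) -> bytes:
    """Content-derived memory tag (an assumption; MCP leaves the
    tagging scheme to the delimitation component of Section 4)."""
    return hashlib.sha1(data).digest()

# Time T1: sender ships (H, OP=store, MT=T1, Data); element memorizes it.
elem = MemoryElement()
d1 = b"some content"
t1 = tag_of(d1)
elem.on_datagram("store", t1, d1)

# Time T2: sender ships only (H, OP=retrieve, MT=T1); data comes from memory.
assert elem.on_datagram("retrieve", t1) == d1
```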

[Figure 3: Framework of MCP, showing five interacting components: MCP connection management, MCP memory and state, knowledge learning, redundancy identification, and redundancy reference.]

3.2 Benefits

The overall benefits of using MCP are better network delivery performance and higher network utilization, through the exploitation of redundancies that naturally exist in Internet traffic. One of the primary objectives in using MCP is to reduce the bit-hops taken to deliver any given content from the content-provider to a client. Specifically, MCP benefits the following three types of users: (i) End-users: The key benefit for end-users is improved performance in terms of throughput and response time; the reduced bit-hops lessen the impact of bottlenecks and hence improve performance. (ii) Network operators: While public peering links in the Internet have free transit arrangements, most other peering links are subject to a usage-based tariff. By reducing the amount of real data that traverses these peering links, network operators that support MCP can reduce their operating costs. (iii) Content-providers: Content-providers, interestingly, benefit both from being able to offer better performance to their consumers (leading to higher user retention) and from lower costs on the access links to their service providers (most access links are again charged on a usage basis).
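For reference, the bit-hop measure that MCP seeks to minimize can be written as follows (our formalization of a term the paper uses informally):

```latex
% Bit-hops for delivering a content item: the bits actually transmitted
% on each hop of the delivery path, summed over all hops.
\mathrm{BitHops}(\text{content}) \;=\; \sum_{h \,\in\, \text{path}} b_h ,
\qquad b_h = \text{bits transmitted on hop } h .
```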

3.3 Framework and Focus

Though MCP can be realized at more than one layer in the network protocol stack, in this work we assume a design residing at layer 3.5, between the transport and network layers, and use an MCP header for one network element to use the store/retrieve capabilities of another. We justify this choice from an application-transparency standpoint: realization at a layer below the transport layer can be transparent to applications and hence is less intrusive.

The design framework of MCP is shown in Figure 3. As shown, the operation of MCP requires five components that interact with each other. (i) MCP is bound to layer-4 connections and has a connection management component to manage connection and disconnection. (ii) MCP maintains certain memory and states. (iii) MCP contains a knowledge learning component for learning the memory states on both sides of a communication. (iv) MCP needs a redundancy identification component to recognize redundancy embedded in the data traffic. (v) MCP also needs a redundancy reference component to allow the retrieval of the redundant data on the remote side.

Thus, the high-level operation of MCP works as follows. MCP works on both sides of a communicating pair. It monitors the layer-4 connections and creates MCP connections when layer-4 connections are observed. It then synchronizes the memory and state information with the other side. After that, MCP monitors the data being sent out by layer 4 and identifies the traffic redundancy. The identified redundancy, as well as the raw data, is encoded appropriately by referencing the memory entries on the other side. The encoded information is sent to the other side and decoded correspondingly. Meanwhile, the memory information is also updated so that later communication can be optimized by utilizing the appropriate memory information.

Though all the components identified earlier are indeed required for MCP to work, in this work we specifically provide insights into the design of three of the components: redundancy identification, redundancy reference, and memory updating.

We will present the detailed design of these components in the next section. Briefly, (i) the redundancy identification component is realized by an Adaptive Granularity (AG) design principle. AG can identify the appropriate granularity of redundancy when heterogeneous data are present. (ii) The redundancy reference component is realized by an Operator-Based Communication (OBC) design principle. OBC relies on operators to manipulate the data and memory entries so that the original data can be transmitted and recovered in a bit-effective fashion. (iii) The knowledge learning component is realized with a Tell-Me-What-You-Have (TMWYH) design principle. TMWYH extends the default use-based model with backhaul knowledge dissemination so that more redundancy can be identified. The primary reasons for selecting only these three components for detailed presentation are that we believe our design provides non-trivial contributions there, while the other design components have comparatively straightforward realizations. As necessary parts of any realization, the other two components are also presented in Section 5.

[Figure 4: Example of AG. A 20-byte sequence (a) is delimited (b) using a fixed delimiter probability into blocks tagged T1-T4, (c) at a coarser granularity, and (d) at a finer granularity; blocks are labeled with memory tags (T1, T2, ...) and raw data pieces (D5, D6, ...).]

4. CRITICAL COMPONENTS OF MCP

We now present the three critical design principles of the Memory Control Protocol (MCP): AG, OBC, and TMWYH. Though all three design principles run on both sides of a communication, the functionalities performed on each side are different. For each design principle, one side of the communication performs the major set of functions of the principle, while the other side performs only a minor set of functions that complements the major set. Specifically, AG and OBC mainly operate on the data sender side, whereas TMWYH mainly operates on the data receiver side. We refer to the major set of functions as the principle-Body and the complementary set of functions as the principle-Shadow.

4.1 Adaptive Granularity (AG)

Determining the ideal granularity of data that should be memorized by MCP to provide the best possible effectiveness is an important issue. As shown in Section 2, addressing this issue can have a significant impact on the performance achievable using the protocol.

To effectively identify redundancies, appropriate units of data pieces need to be used by the protocol. We denote such units of data as blocks, which are essentially sequences of bytes.2 Blocks are extracted by a delimitation process using delimiters, which decide the block boundaries. Though it is straightforward to use fixed-size blocks whose boundaries are location-based, it has been shown that value-based boundaries have many advantages over location-based ones. With value-based boundaries, whether a byte is a boundary or not depends on the values of the byte itself and nearby bytes.

The tuning of the delimiter to exploit the trade-off between the number of blocks (and hence the reference/addressing overhead) and the exploitation of redundancy (hit-rates) is non-trivial. For example, coarser delimiters result in larger blocks and smaller addressing overhead, but they might fail to identify finer-granularity redundancies. Similarly, finer delimiters can identify finer-granularity redundancies but incur higher addressing overhead. Existing value-based approaches simply use fixed delimiter probabilities, so the block sizes are approximately the same. Such fixed delimiters fail to achieve optimal performance due to the following challenges. (i) Heterogeneity of applications. Applications expose heterogeneous patterns of redundancy. For example, text-based web browsing exposes very fine granularities of redundancy, while P2P applications may have coarse granularities of redundancy. (ii) Application evolution. The properties of Internet data keep changing as applications evolve. On the one hand, the data properties of particular applications keep evolving; for instance, though SMTP was initially used to transmit only plain-text messages, today it is increasingly used to carry large attachments such as pictures and video clips. On the other hand, new applications with different data properties continually emerge; one relatively new application is video mash-ups, which exhibit different redundancy granularities from existing applications. (iii) Super-packaging. In addition to the above two challenges, the delimitation mechanism needs to accommodate redundancies not just at sub-packet granularities, but also at super-packet granularities spanning multiple packets. For example, if the same 10MB file is downloaded twice using the FTP application, the ideal solution should tag the entire 10MB file as a block with a single memory tag.

We present a simple example scenario to illustrate the necessity of adaptive granularity. As shown in Figure 4(a), a byte-sequence of 20 bytes is first sent from a sender A to a receiver B.3 Later, a new packet that is similar to the previous one is sent, as shown in Figure 4(b). Delimitation using a fixed delimiter probability generates four blocks in total. Two of the blocks are unchanged, while the other two are changed. For this particular example, a delimitation algorithm using fixed granularity is ineffective in two ways. First, the two consecutive unchanged blocks are treated as two separate redundant blocks and require two tags to represent them. If, however, a coarser granularity is used, these two separate blocks can be viewed as one single redundancy and represented using only one tag, as shown in Figure 4(c). Second, with fixed granularity, the two changed blocks are not identified as redundant data and thus are sent as-is. However, finer redundancies occur at the sub-block level, and a finer granularity may be able to capture such redundancies and perform more effectively, as shown in Figure 4(d).

Motivated by these necessities, we propose the design of Adaptive Granularity, which addresses the challenges described above by both adapting the delimiter probability and applying super-packaging. Briefly, the delimiter probability pd is not fixed, but adapts to the traffic properties. AG also intelligently enables the capability of performing super-packaging.

2 A brute-force approach is to treat every possible byte sequence as a block, but the processing/storage overhead prevents this approach from being practically usable.
3 Though we use bytes as the base units in the illustrative example, in practice the base units can be much larger than bytes.

4.1.1 Operations

To help understand the operations of AG, we begin by describing the delimitation process using a fixed delimiter probability pd.

Delimitation with the current delimiter probability pd. The value-based delimitation process uses Rabin delimiters, which occur with probability pd. Specifically, given a data packet of N bytes, for byte Bi we first compute its Rabin value Ri as

Ri = (Bi·K^r + Bi+1·K^(r-1) + ... + Bi+r) mod M,

where r is a small integer, M is a modulus, and K is a prime (e.g., 11). Ri is then compared to pd·M: if Ri < pd·M, byte Bi is a delimiter; otherwise it is not. An example of delimitation using different delimiter probabilities is shown in the Appendix. The value of pd controls the average block length. Blocks are extracted after the delimiters are identified. Specifically, if Bi is a delimiter, then it serves as the starting byte of a new block, and the block contains all the bytes from Bi to the byte right before the next delimiter. The first and last blocks of a data packet contain the bytes left over by the other blocks. Thus, with d delimiters, the number of blocks is d + 1. Consequently, with a delimiter probability pd and a packet of size m, the number of blocks, denoted by s, is approximately m·pd + 1.
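A sketch of this delimitation step follows (our rendering of the formula above; the constants are illustrative):

```python
def rabin_delimit(data: bytes, pd: float, K: int = 11, r: int = 3,
                  M: int = 1 << 20) -> list[bytes]:
    """Split `data` into blocks at value-based boundaries.
    Byte i is a delimiter when its windowed Rabin value R_i < pd * M."""
    delimiters = []
    for i in range(len(data) - r):
        # R_i = (B_i*K^r + B_{i+1}*K^(r-1) + ... + B_{i+r}) mod M
        ri = 0
        for j in range(r + 1):
            ri = (ri + data[i + j] * pow(K, r - j, M)) % M
        if ri < pd * M:
            delimiters.append(i)
    # A delimiter starts a new block; leading/trailing bytes form edge blocks.
    bounds = [0] + delimiters + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
```

In a real implementation the Rabin value would be maintained incrementally as a rolling hash rather than recomputed per byte.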

Delimiter probability updating. The requirements of an adaptive granularity design are accommodated by MCP through three mechanisms. First, MCP is designed to be connection-based, and each connection is associated with a separate delimiter probability; the details of the connection management are described in Section 5. Second, within an MCP connection, the delimiter probability adapts in response to significant changes in the granularity pattern. AG identifies such changes using an adaptation probing mechanism, which tests whether the current pd value is optimal in terms of the amount of identified redundancy. Specifically, the mechanism keeps track of the statistics of three delimiter probability values: pd, pd/2, and 2pd. Whenever data arrives, the mechanism performs delimitation based not only on pd but also on the other two values, and obtains the effectiveness achieved with each of the three probabilities. The effectiveness can be expressed as the percentage of saved bits relative to the original data size. If the largest effectiveness is significantly higher (e.g., by 10%) than the value corresponding to the current pd, then pd is changed to the better delimiter probability value. The probing mechanism is performed periodically, triggered after a certain amount of data Dt. In addition, there is the design issue of the initial value of pd. Due to the heterogeneity of applications, it is hard to choose a universally optimal value for all applications. Instead, AG determines the first value of pd based on the statistics of an initial amount (e.g., 10 packets) of data, selecting the best pd value from a set of pre-defined delimiter values; this pre-defined set can be the historical optimal values across all connections.

Super-packaging. The third mechanism of AG is super-packaging, which is needed when the redundancy spans multiple packets. Given a packet, if the last consecutive blocks are identified as redundant, then it is possible that the redundancy spans into the succeeding packet. AG enables super-packaging by explicitly waiting for succeeding packets. Specifically, if the second half of the current packet belongs to an identified redundancy unit, then AG waits Tw for a possible redundancy extension. If no packet arrives within Tw, then the first packet is processed without further waiting. If a second packet arrives within Tw, then the redundancy is expanded into the second packet in a best-effort fashion. If the entire second packet falls into the expanded redundancy, then AG waits for a third packet by setting a timer whose value is half of the previous timer. This process continues until a timer expires, which implies that no succeeding packet arrives promptly. The value of Tw requires careful consideration. Introducing the timer effectively inflates the RTT experienced by the applications, so Tw should not be set too large. On the other hand, if Tw is too small, it might fail to catch the succeeding packet. In practice, we assign 5% of the current RTT value to Tw; hence the maximum waiting time for any packet before transmission is bounded by 10% of the RTT.4

Memory operations. AG also performs several memory-related operations. First, AG enqueues the corresponding blocks into the memory; note that all the blocks extracted using the three delimiter probabilities (i.e., pd, 2pd, and pd/2) are enqueued. This is performed right after the data is delimited. Second, AG performs block-relationship measurements on both super-blocks and sub-blocks. This relationship information is used by the other components, OBC and TMWYH; the details of these operations are elaborated when the corresponding component is presented.

So far, we have presented the main operations on the data sender host; together they comprise the AG-Body component. On the data receiver host, a subset of these operations (referred to as AG-Shadow) is performed: specifically, the memory enqueueing operations. These operations are necessary for the data receiver to maintain the same memory as the sender.

AG can identify redundancies both efficiently and effectively. By efficiency, we mean that AG's processing and storage overheads are limited: quantitatively, for a data stream of n bytes, AG requires n·pd tag computations, while a brute-force algorithm requires n(n + 1)/2 tag computations. By effectiveness, we mean that AG identifies as much redundancy as possible: by adapting the granularity it performs better than an approach with a fixed delimiter probability, and its upper bound of effectiveness is the same as that of the brute-force approach.

4 The total waiting time is bounded because (1 + 1/2 + 1/4 + ...) × 5% = 10% of the RTT.
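The probing mechanism can be sketched as follows. This is a simplified illustration reusing the hypothetical rabin_delimit helper from the previous sketch; the 10% switch threshold follows the text, and the rest is assumed.

```python
def probe_and_adapt(data: bytes, pd: float, memory: set,
                    threshold: float = 0.10) -> float:
    """Delimit `data` under pd, pd/2, and 2*pd, score each by the
    fraction of bytes covered by blocks already in `memory`, and
    switch pd if a candidate is significantly more effective."""
    def effectiveness(p: float) -> float:
        blks = rabin_delimit(data, p)
        saved = sum(len(b) for b in blks if b in memory)  # saved bits
        return saved / max(len(data), 1)

    candidates = {p: effectiveness(p) for p in (pd, pd / 2, 2 * pd)}
    best = max(candidates, key=candidates.get)
    if candidates[best] - candidates[pd] >= threshold:
        return best      # adopt the better delimiter probability
    return pd            # keep the current one
```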

4.2 Operator-based Communications (OBC)

The data on the sender host needs to be transmitted to the receiver. On the sender side, the data needs to be represented in a bit-efficient way so that the bit-hop metric, and in turn application performance, can be improved. On the receiver side, the same raw data needs to be recovered. This information encoding and decoding process utilizes both the sender's and the receiver's memory states. The design of these encoding and decoding operations needs to address the question of how to represent and recover the original data on the receiver side in a byte-efficient way.

With a simple memory elimination model, once the redundant blocks are identified, they are represented by the corresponding tags and transmitted to the receiver, which replaces these tags with the raw data. Though such a process is straightforward to understand and use, it represents a very simplified approach and results in limited effectiveness. Delving deeper into the problem, we believe the process can be substantially optimized by using more sophisticated operators to further improve the bit savings. To this end, we propose an OBC approach that allows more sophisticated remote operations to be performed on data, so that the sender needs to transmit fewer bits.

Our proposed OBC consists of six operators: Store, Retrieve, Value, Concat, Extract, and Replace. Briefly, (i) the Store operator tells the receiver to record a block in the memory; (ii) the Retrieve operator extracts the raw data from the memory; (iii) the Value operator is used to transmit raw data; (iv) the Concat operator is used to merge consecutive blocks; (v) the Extract operator is used to eliminate the partial redundancy in a block: if only part of a block has changed, Extract tells the receiver to extract the unchanged parts from a known block; and (vi) the Replace operator is used to replace part of the data in a block with certain new data.

The usage of the Store/Retrieve/Value/Concat operators can be illustrated with Figure 4. In Figure 4(b), the first two blocks are represented with two tags, and the next two blocks are sent as raw data since they are not identified as redundant blocks. Thus, the encoding process results in the command {Retrieve(T1,T2), Store(Value(D5)), Store(Value(D6))}. Similarly, Figure 4(c) can be represented as {Retrieve(T5), Store(Value(D7))}. Using the Extract operator, however, Figure 4(c) can also be represented as {Retrieve(T5), Concat(Value(D8), Extract(T3,2,3), Extract(T4,0,2), Value(D9), Extract(T4,4,2))}.

The power of the Extract and Replace operators can be seen in the following example. Assume a byte-sequence that has m blocks, out of which n blocks have changed into new blocks. Without Extract and Replace, OBC needs a total of m−n Retrieve and n Value operators to represent this byte-sequence. With the Extract operator, OBC may need only m−n Retrieve and n Extract operators; since an Extract operator (and its parameters) is likely to be smaller than a Value operator (and its data), Extract improves on Value. With the Replace operator, OBC might need only one Retrieve operator (representing the entire byte-sequence) and n Replace operators. A Replace operator has a similar size to an Extract operator; thus Replace can further improve on Extract. Note that the Extract operator brings benefit by replacing the Value operator, since Extract transmits a super-block tag and some offset information rather than the raw data block. The Replace operator brings benefit only when more than one consecutive Extract operator uses the same super-block, because the saved bits come from the reduced number of tags.
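A sketch of the basic encoding path using the Retrieve and Value operators follows (the Extract/Replace optimizations are omitted; the operator representation and the tag_of helper from the earlier sketch are our assumptions):

```python
def obc_encode(blocks: list[bytes], memory: dict) -> list[tuple]:
    """Encode a delimited byte-sequence as OBC operators: Retrieve for
    blocks already in memory, Store(Value(...)) for new blocks.
    Consecutive Retrieves are merged, mirroring the Concat idea."""
    ops: list[tuple] = []
    for blk in blocks:
        tag = tag_of(blk)
        if tag in memory:
            if ops and ops[-1][0] == "retrieve":
                ops[-1] = ("retrieve", ops[-1][1] + [tag])  # merge tags
            else:
                ops.append(("retrieve", [tag]))
        else:
            memory[tag] = blk            # sender mirrors receiver state
            ops.append(("store", ("value", blk)))
    return ops
```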

4.2.1 Operations

We now present the operations performed when using OBC. OBC takes delimited data as input, and outputs the operators that the receiver can use to recover the data. The operations are performed in four steps: the first step involves the use of Retrieve/Value; the second and third steps optimize the output of the previous step using a new operator (Extract and Replace, respectively); and finally, the Concat and Store operators are used to form the appropriate granularity of blocks and to notify the receiver to memorize the blocks.

Retrieve, Value. After delimitation, each block is represented by a tag, and the tag is looked up in the memory. If it is found, a Retrieve operator is formed for that particular block; if the tag is not found in memory, a Value operator is formed instead.

Extract, Replace. Extract optimizes the output of the previous step in two scenarios. First, Extract can potentially optimize any single Value operator. Extract takes a Value operator (and the corresponding data) as input and looks for another block in memory that contains the particular data. If such a block is found, OBC changes the Value operator into an Extract operator, and puts the tag of the identified block and certain offset information in as the parameters of the operator. Second, the Extract operator can potentially optimize multiple consecutive operators (Retrieve/Value/Extract). If any two consecutive operators refer to the same block and the extracted data are consecutive inside that block, these operators can be merged into one single Extract operator. Replace works by identifying a super-block that contains multiple blocks of the current data stream but also has multiple places of change. For consecutive Extract operators produced in the previous step, Replace checks whether they use the same super-block; if so, the consecutive Extract operators are changed into a new Replace operator and the corresponding operations are performed. Both the Extract and Replace operators work by extracting certain byte sequences from a block, and are only used when a super-block contains the current block. Hence they require the capability of locating a super-block that contains the current block. We propose to use pointers that opportunistically point to the most relevant blocks: for each block, the most recent Mr super-blocks that contain the block are linked to it. When this particular block later results in a cache miss, the linked super-blocks are located to perform a new cache lookup.

Concat, Store. If a block is intended for memorization on the receiver side, then a Store operator is formed. After receiving the Store operator, the receiver simply enqueues the block using the encoded tag. Given multiple consecutive blocks that are redundant, Concat combines them to form a larger redundancy unit; when receiving a Concat operator from the sender, the receiver recovers the concatenated data and enqueues the block.
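The receiver-side decoding is the mirror image (a sketch under the same assumed operator encoding used in the earlier obc_encode sketch):

```python
def obc_decode(ops: list[tuple], memory: dict) -> bytes:
    """Recover the original byte-sequence from OBC operators, updating
    the local memory so both sides stay synchronized."""
    out = bytearray()
    for op in ops:
        if op[0] == "retrieve":
            for tag in op[1]:
                out += memory[tag]           # raw data comes from memory
        elif op[0] == "store":
            _, (_, blk) = op                 # ("store", ("value", data))
            memory[tag_of(blk)] = blk        # memorize for later retrievals
            out += blk
    return bytes(out)
```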

Figure 5: Pseudo-code: (a) AG, (b) OBC, and (c) TMWYH

(a) Adaptive Granularity
Variables: pd: current delimiter probability
1 Monitor packets received
2 Delimit using three delimiter probabilities; obtain tags
3 Look up the tags in memory
4 Decide whether to perform super-packaging
5 Compute the utilities of the three delimiter probabilities
6 Check for possible delimiter adjustment
7 Other operations
Memory operations:
1 Enqueue, replacement
2 Data correlation operations
3 Sub-block to super-block pointers

(b) Operator-based Communication
1 Retrieve, Value
2 Extract, Replace
3 Concat, Store
4 Send out

(c) Tell Me What You Have
Variables: B: amount of knowledge for the next period
1 Obtain the initial value of B
2 Choose knowledge of size B
3 Transmit
4 Upon receiving new data
5 Update B

4.3 Tell Me What You Have (TMWYH)

The default memory elimination model can only utilize memory information built up during past communications between the same two communicating hosts. Figure 6 illustrates an example. In the example, Host A previously sent certain data (e.g., D1 and D2) to B, which were memorized. Now assume Host C is sending B certain data that includes D1 and D2. Even though these data pieces are in B's memory, by default C cannot reference them and has to send the raw data, because C is unaware of B's memories, as shown in Figure 6(a).

An obvious approach to improving MCP performance is to employ knowledge pre-population strategies that help achieve better bit-hop performance for content delivery. Consider the same example and assume Host C is sending data to Host B. If B disseminates knowledge of its memory to C beforehand, then Host C can encode the data better: C learns that B already has D1 and D2, and can therefore have these data pieces retrieved by transmitting the corresponding tags T1 and T2, as shown in Figure 6(b).

A naive knowledge dissemination approach would simply transmit the memory (i.e., the cached data entries) of the receiver to the sender. However, such an approach will hardly bring benefit, since the size of the transmitted memory will be larger than the bits saved by using it. A slightly better approach is to transmit only the tags of the memory entries. Transmitting only the tags achieves the same benefit as transmitting the memory, since the data sender uses a tag only when the corresponding data is available.

One important property of any pre-population approach is non-intrusiveness, because aggressively transmitting information on the reverse path might hurt the actual data transmission throughput. In this context, we consider strategies whereby unused Internet resources can be employed for the pre-population of memory. The unused resources can be in the form of unused bandwidth or other protocol inefficiencies. For example, the upstream bandwidth of most broadband users remains under-utilized relative to their downstream bandwidth and hence can be used for propagating information about the memory contents of those users. Similarly, in uni-directional TCP connections, the ACKs on the reverse path are sent as pure ACKs. Pure ACKs tend to under-utilize bandwidth, as router capacities along the path are typically limited by packet processing rates (as opposed to bit transmission rates). This inefficiency can be leveraged by appropriately piggybacking information about the memory contents of the client.

To this end, we propose a Tell Me What You Have (TMWYH) principle for improving throughput performance by utilizing under-utilized resources along the communication path to disseminate memory information. TMWYH intelligently disseminates memory information from the data receiver to the data sender, so that the data sender can better encode the transmitted data using this memory information. For clarity, from now on we refer to such memory information as "knowledge". TMWYH contains two critical design components that help it perform better than simple approaches. First, it prioritizes the dissemination of more relevant knowledge over less relevant knowledge. Second, it disseminates the knowledge in an intelligent way so that the achieved data-download throughput is maximized.
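Following the pseudo-code in Figure 5(c), a single knowledge-dissemination step might look as follows (a sketch; the most-recently-used prioritization heuristic and the memory layout are our assumptions):

```python
def tmwyh_step(memory: dict, budget_bytes: int,
               ack_payload_room: int) -> list[bytes]:
    """Pick up to `budget_bytes` worth of memory tags to piggyback on
    reverse-path packets (e.g., otherwise-pure ACKs), most-recently-used
    entries first, so the sender learns what this receiver already has."""
    # memory: tag -> (data, last_used_time); disseminate tags only, never data.
    ranked = sorted(memory.items(), key=lambda kv: kv[1][1], reverse=True)
    knowledge, used = [], 0
    for tag, _ in ranked:
        if used + len(tag) > min(budget_bytes, ack_payload_room):
            break
        knowledge.append(tag)
        used += len(tag)
    return knowledge
```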

[Figure 6: TMWYH example. (a) Default: C is unaware of B's memory (D1, D2) and must send the raw data D1 and D2; (b) TMWYH: B disseminates knowledge about D1 and D2, so C can send only the tags T1 and T2.]

5. DESIGN OF MCP

5.1 Overview

5.1.1 Software structure

The software architecture of MCP is shown in Figure 7. MCP works between layer 4 (i.e., TCP/UDP) and layer 3 (i.e., IP) in the protocol stack; hence we refer to it as a layer-3.5 protocol. MCP is bound to layer-4 connections and is designed to be a stateful protocol. By working at layer 3.5, MCP understands the semantics of TCP/UDP; this semantic information, including TCP connection setup and disconnection, is used by MCP for connection maintenance. The operations performed on the data sender and receiver differ in that the sender encodes the data and the receiver decodes it. We show these two types of functionality in Figure 7 as an MCP-Body block and an MCP-Shadow block. Specifically, the MCP-Body block encodes the incoming data from layer 4 and sends it to the remote host, whose MCP-Shadow block decodes the data. Similarly, the remote host's MCP-Body block encodes its outgoing data, and the encoded data is recovered by the MCP-Shadow block on the local host.
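The Body/Shadow split can be summarized as a small interface sketch (our illustration, reusing the hypothetical rabin_delimit, obc_encode, and obc_decode helpers from the Section 4 sketches):

```python
class MCPBody:
    """Sender-side layer-3.5 processing: delimit outgoing layer-4 data
    (AG) and encode it against the peer's memory (OBC)."""
    def __init__(self, memory: dict):
        self.memory = memory

    def send(self, payload: bytes, pd: float = 0.01) -> list[tuple]:
        return obc_encode(rabin_delimit(payload, pd), self.memory)

class MCPShadow:
    """Receiver-side layer-3.5 processing: decode operators back into
    the original layer-4 payload and mirror the memory updates."""
    def __init__(self, memory: dict):
        self.memory = memory

    def receive(self, ops: list[tuple]) -> bytes:
        return obc_decode(ops, self.memory)
```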

[Figure 7: Software Architecture. Between TCP/UDP and IP, MCP processes outgoing data through AG and OBC to produce encoded data, and through TMWYH to produce knowledge and memory digests, all backed by the shared memory and state.]

5.1.2 Header format

MCP has two types of packets, whose header formats are shown in Figure 8. The first type of packet is relevant to MCP connection maintenance, as shown in Figure 8(a). These packets are sent to notify the other host of important information, including the MCP version (1 byte), step (1 byte), host ID (6 bytes), and memory digest. The second type of packet is the MCP data/knowledge packet, whose header format is shown in Figure 8(b).

MCP can piggyback on the original layer-4 protocols, and MCP messages can either be sent as separate MCP packets above IP or be piggybacked on layer-4 packets. Specifically, the MCP layer recognizes MCP messages by checking the layer-4 src/dst ports. If piggybacking is needed, the MCP sender rewrites the src/dst ports to 0x0000, but stores the original port information in its payload. When the receiver receives the packet, it first recognizes the MCP packet and performs the corresponding operations; it then restores the original layer-4 packet using the stored information. Note that, except in the case of super-packaging, MCP does not perform packet partitioning and re-assembly, for simplicity. In other words, given a layer-4 packet, if the tentative MCP packet would be larger than or equal to the original packet, MCP does not transform the packet but keeps the original packet unchanged; the other side's MCP layer can recognize this by checking the TCP src/dst ports. The super-packaging case requires the receiver to restore the original data.

[Figure 8: MCP Packet format. (a) MCP Connection Setup Packet: IP header; src/dst ports set to 0x0000; Version, Step, Host ID, Memory digest; the original TCP src/dst ports and other TCP header information are carried in the payload. (b) MCP Data and Knowledge Packet: IP header; src/dst ports set to 0x0000; Version, Type, Seq.; data or knowledge; the original TCP src/dst ports and other TCP header information.]
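A sketch of the port-rewriting piggyback described above (the 0x0000 marker and the carrying of the original ports follow the text; the exact byte layout is our assumption):

```python
import struct

MCP_PORT_MARKER = 0x0000  # src/dst ports rewritten to 0x0000 mark an MCP packet

def piggyback(tcp_header: bytes, mcp_payload: bytes) -> bytes:
    """Rewrite the TCP src/dst ports to the MCP marker and stash the
    original ports inside the MCP payload so the receiver can restore them."""
    src, dst = struct.unpack("!HH", tcp_header[:4])
    marked = struct.pack("!HH", MCP_PORT_MARKER, MCP_PORT_MARKER) + tcp_header[4:]
    return marked + struct.pack("!HH", src, dst) + mcp_payload

def restore(packet: bytes, tcp_header_len: int = 20) -> tuple[bytes, bytes]:
    """Undo piggyback(): put the original ports back and return
    (restored_tcp_header, mcp_payload)."""
    header, rest = packet[:tcp_header_len], packet[tcp_header_len:]
    src, dst = struct.unpack("!HH", rest[:4])
    restored = struct.pack("!HH", src, dst) + header[4:]
    return restored, rest[4:]
```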

5.2 Data structure

MCP maintains both protocol-common state that is shared among all MCP connections and per-connection state that is relevant only to a particular MCP connection. There are three main types of protocol-common state: (i) Memory state, consisting of a list of memory entries. Each memory entry comprises the raw data of a block, a tag used to represent the block, a set of correlation pointers that point to correlated blocks, a set of super-block pointers that point to other entries containing this particular block, the time of most recent use, the frequency of use, and a list of peers that have this particular block. (ii) A list of recently contacted peers; each entry contains the unique peer ID and the most recent communication time. (iii) Other state, including a unique host ID and the data-amount threshold Dt. The per-connection state includes the delimiter probability pd and B, the amount of knowledge to disseminate in the next period. The underlying reason for per-connection state is that different connections may be carrying different types of applications, and hence the redundancy granularities might differ. In addition, each side of a connection maintains a separate pd, because the two directions of a particular application might expose different granularities of traffic redundancy. The arguments for maintaining separate B values on each host and for each connection are similar.

[Figure 9: MCP state machine. CLOSED → ESTABLISH-WAIT when a layer-4 connection is opened; ESTABLISH-WAIT → ESTABLISHED after state exchange and memory jump-start; AG/OBC/TMWYH operate in the ESTABLISHED state; ESTABLISHED → CLOSE-WAIT when the layer-4 connection is closed; CLOSE-WAIT → CLOSED on close.]

5.3 MCP Operations

The state transition and the high-level operations of MCP are shown in Figure 9. Briefly, MCP has four states: Closed, Establish-wait, Established, and Close-wait. The transitions are as follows. MCP monitors the layer-4 connection states and transit between states correspondingly. Specifically, when the layer-4 protocol is TCP, MCP monitors the TCP state transition and enter each state accordingly. For UDP connections, MCP checks the UDP 4-tuples, and infer the beginning of a new connection by monitoring the header information. Whenever a new connection is found, MCP will set up a new connection on layer-3.5, and transit from “Closed” state to “Establish-wait” state. It then may enter “Established” state if the layer-4 connection is successful, or it may go back to “Closed” state otherwise. Before moving from “Establishwait” to “Established” stat, MCP perform certain information exchange steps, one of which is referred to as Memory jump-start and will be described later. In “Established” state, MCP performs the normal data and knowledge-related operations. MCP also monitors layer-4 states and transits “Closewait” state if necessary. Specifically, if TCP FIN messages are detected or disconnection timer expires, it moves to “Closewait” state. The high-level operations include: (i) Memory jump-start; (ii) Connection establishment and disconnection; (iii) Datarelated operations (AG, OBC) and (iv) Knowledge-related operations (TMWYH). Memory Jump-start MCP enables a jump-start memory synchronization process to boost up the performance by using the previously accumulated memory states. This is achieved by recording a list of recent peers. The use of recent-peer list is to limit the storage overhead of communicated peers 11

Memory jump-start: MCP enables a jump-start memory synchronization process that boosts performance by reusing previously accumulated memory state. This is achieved by recording a list of recent peers; the recent-peer list limits the storage overhead, since a host may not be able to record the full information for all of its past peers. If the new remote host is on the recent-peer list, the memory states between the two hosts can be jump-started. Specifically, the two peers exchange memory-state information by sending each other a digest of the memory state each maintains for the other. If a received digest matches the locally maintained memory state, the two states are synchronized, and later communication can use them to better encode information. If the two digests do not match, the hosts will not use the memory states, to avoid the adverse effects of cache misses (the decision is sketched below). To keep two communicating hosts synchronized, MCP employs an explicit acknowledgement mechanism: the data sender performs memory-update operations only after confirming that the other host has received the data packet (and hence finished the corresponding memory updates); similarly, the knowledge sender updates the memory state corresponding to the data sender only after the latter has done so. Confirmation is carried in a special "sequence number" field in the MCP header of data and knowledge packets: after receiving a piece of data or knowledge, a host acks back the received sequence number. To avoid additional transmission overhead, acks are not sent as separate packets but are piggybacked on other regular packets.
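The jump-start decision described above can be sketched as follows; the recent_peers structure and function names are our assumptions, with MD5 used for the digest to match the tag design MCP uses elsewhere in the paper.

```python
import hashlib

def memory_digest(state_blob: bytes) -> bytes:
    """Digest over the memory state kept for one peer (MD5, as MCP's tags use)."""
    return hashlib.md5(state_blob).digest()

def try_jump_start(recent_peers: dict, peer_id: bytes, peer_digest: bytes) -> bool:
    """Jump-start only when the peer is on the recent-peer list and the
    digest it sent matches the digest of our locally kept state for it."""
    local_state = recent_peers.get(peer_id)
    if local_state is None:
        return False   # not a recent peer: start with empty shared memory
    if memory_digest(local_state) != peer_digest:
        return False   # states diverged: skip, to avoid cache-miss effects
    return True        # synchronized: later packets can encode against this state
```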

Connection setup and disconnection: After leaving the "Establish-wait" state towards the "Established" state, MCP begins the connection setup operations. The MCP connection setup process can be piggybacked on layer-4 connection setup packets; Figure 8 illustrates the piggybacking on TCP packets. MCP sets the TCP src/dst ports to all zeros and records the original TCP port information in its payload. When the other host receives such a packet, it extracts the MCP information and restores the original TCP packet, so the TCP connection above is not affected. MCP connection setup consists of four steps (a sketch appears at the end of this subsection). In Step 1, one host notifies the other of important information including the MCP version, its host ID, and other parameters. In Step 2, the other host acks back with its host ID, and with the memory digest for this particular pair if the first host is on its recent-peer list. In Step 3, the first host sends back its memory digest for this particular pair if the second host is on its recent-peer list. If the two memory digests match, then in Step 4 the second host acks back the memory match, and the MCP connection is set up. Disconnection occurs either on observing the corresponding TCP disconnection messages or on a timer expiring.

Data-related operations: Since layer-4 connections allow bi-directional data exchange, MCP data-related operations are performed on both sides of a communication. On each side, MCP-Body first captures the incoming data and applies the AG-Body operations; OBC-Body then encodes the data, which is sent to the MCP receiver. On the other side, the corresponding OBC-Shadow and AG-Shadow operations recover the data and update the MCP state.

Knowledge-related operations: Similarly, knowledge-related operations are performed on both ends of a communication. TMWYH-Body determines the knowledge to disseminate and disseminates it appropriately to the other side, which extracts the knowledge and enqueues it for later use.
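The four-step setup described above can be summarized in code; the host objects and their attributes below (version, host_id, recent_peers, digest_for) are hypothetical stand-ins for the MCP connection state, not the actual implementation.

```python
def mcp_setup(a, b):
    """Four-step MCP connection setup between hosts a and b (a sketch)."""
    # Step 1: a announces its MCP version, host ID, and other parameters.
    hello = {"version": a.version, "host_id": a.host_id}  # sent to b
    # Step 2: b acks with its host ID, plus its memory digest for this
    # pair if a is on b's recent-peer list.
    b_digest = b.digest_for(a.host_id) if a.host_id in b.recent_peers else None
    # Step 3: a sends back its digest for this pair if b is a recent peer.
    a_digest = a.digest_for(b.host_id) if b.host_id in a.recent_peers else None
    # Step 4: b acks the comparison; memory jump-start happens only on a match.
    jump_start = a_digest is not None and a_digest == b_digest
    return jump_start  # the MCP connection is established either way
```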

6. EVALUATION

6.1 Environment and methodology

The evaluation is performed with simulation. We choose simulation for its large scale and its ease of protocol incorporation: prototyping on an Internet-scale testbed is a daunting task, and emulation testbeds such as PlanetLab do not allow the protocol incorporation that NM requires. An evaluation of NM requires substantiating the following components: (i) data (i.e., the information served from the server); (ii) an Internet topology (e.g., the nodes, links, etc.); and (iii) client requests (i.e., which data is requested).

Data: All evaluation environments use real data traces. The data used for evaluation come from 11 popular web sites. We primarily rely on Web data because it is easier to obtain, and web browsing is the most pervasive activity when people access the Internet. We collected more than 4 months of data, with a total size of more than 700 GB. (Elaborate on how to collect the data.)

Internet topology: We primarily use Transit-Stub topologies.

Client requests: We model the request pattern by attaching probabilities to user requests. Specifically, we assume a request accessing a web site always starts from the main html file, and all the objects required for the display of the main html are also fetched. The links contained in the main html (referred to as level-1 links) are accessed by users with certain probabilities, and links contained within those pages (referred to as level-x links, where x > 1) are accessed with decreasing probabilities. To model this behavior, we introduce access probabilities $P_{L_i}$ and let $P_{L_i} = c \cdot P_{L_{i-1}}$, where $c < 1$.

Metrics: The performance metrics we primarily consider are the bit-hop metric and throughput.

Simulation experiment: We first generate the Internet topology. Then, for each web site, the servers that host the data are selected from among the nodes. Each request is modeled as described above; in the process, the four design elements operate. We then collect the metrics of interest.
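The client-request model above lends itself to a short sketch. The constants (the level-1 probability and c) and the site-tree layout are illustrative assumptions, not values taken from our traces.

```python
import random

def access_probability(level: int, p_level1: float = 0.5, c: float = 0.5) -> float:
    """P_{L_i} = c * P_{L_{i-1}}, i.e. p_level1 * c**(level - 1), with c < 1.
    The values of p_level1 and c here are illustrative."""
    return p_level1 * c ** (level - 1)

def simulate_request(site: dict) -> list:
    """Model one user request: always fetch the main html and its embedded
    objects, then follow level-i links with geometrically decreasing
    probability. The 'site' dict layout is a stand-in for the trace data."""
    fetched = ["index.html"] + site.get("objects", [])
    for level, links in enumerate(site.get("levels", []), start=1):
        p = access_probability(level)
        fetched += [url for url in links if random.random() < p]
    return fetched
```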

6.2 Experiment results and insights

We evaluate the three design principles (AG, OBC, and TMWYH) as well as the integrated system. The variables considered are: (i) data (web sites, time); (ii) network (topology, number of flows, queuing discipline type/length); (iii) NM deployment status (number of users); (iv) parameters of the design principles (such as the adaptation parameters); and (v) microscopic results. The baseline approaches are non-compression, compression, and delta-encoding. We use three web sites to show the results: cnn.com (news), apple.com (software), and amazon.com (e-business). Each web site is shown in a figure, and each figure has four lines: the three baseline approaches and NM.

6.2.1 Number of flows

The queuing discipline is DropTail with 200 routers, and all flows are NM-enabled. Results are shown in Figure 10.

6.2.2 Impact of deployment status

The queuing discipline is DropTail with 200 routers. Results are shown in Figure 11.

6.2.3 Impact of queuing discipline

The queuing discipline is RED with 200 routers. Results are shown in Figure 12.

6.2.4 NM-enabled flows only

The topology has 200 routers. Results are shown in Figure 13.

7. ISSUES AND DISCUSSIONS

Other deployment models: Though the MCP presented in this paper is realized as a layer-3.5 pair-wise solution, we envision other possible deployment models. One such model is to deploy MCP at layer 4.5 or even at the application layer, where MCP eliminates traffic redundancy before the data is delivered to the transport layer. The advantage of such a model is that the data size is reduced even before the transport layer; the disadvantages include the violation of application transparency and deployment difficulties. Another possible model is to deploy MCP as a multi-point solution that involves not only the data sender and receiver but also other entities such as routers and other hosts. Interestingly, [3] studies this possibility and proposes to cache redundant packets on Internet routers. We leave these deployment models for future work.

Figure 10: Impact of number of flows. (a) Aggregate Bit-Hop (Mbit-Hop) and (b) Aggregate Throughput (Mbit) versus number of flows; 200 nodes, DropTail, all NM flows; curves: Default and NM.

Figure 11: Impact of deployment status. (a) Aggregate Bit-Hop (Mbit-Hop) and (b) Aggregate Throughput (Mbit) versus NM deployment (%); 200 nodes, DropTail; curves: Default and NM.

Consistency checking: When redundant byte sequences are identified, they are represented by a corresponding tag. Most tag designs in existing works that exploit redundancy use a hash digest (e.g., MD5) as the tag, and MCP also uses MD5 digests. A natural concern is consistency: what if there is a collision? If two data pieces produce the same MD5 digest, the collision may cause incorrect data to be delivered. A straightforward remedy is an additional round of consistency checking, for instance another hash over the entire packet. MCP does not explicitly perform this, partly due to its layer-3.5 deployment model: since the layer-4 protocol performs correctness checking and MCP works below layer 4, MCP relies on the protocols above it for consistency checking. Moreover, MD5 collisions are extremely rare, so we believe the cost of a separate consistency-checking mechanism in MCP would significantly outweigh the benefit it could possibly achieve.

Security: With MCP, different pieces of data for any given information can remain stored in the memory of network elements for long periods of time. Moreover, MCP maintains a single memory space shared by all MCP connections (and hence all layer-4 connections). Thus, security is an important issue that needs to be addressed. An important aspect of MCP is that it does not violate any security mechanisms that might already exist: any security measure already in place will not be broken by MCP, as it operates at a layer beneath typical implementations of security functionality. However, the performance benefits achievable through network memory will be compromised for encrypted traffic, since redundancies no longer exist across different instances (sessions) of the same content as long as different keys are used (which is the norm). We will explore opportunities for MCP and security strategies to be jointly realized to overcome this conflict.

Adoption curves: One important property of MCP is that it does not require simultaneous deployment on all hosts. Though MCP is a pair-wise solution and requires both communicating network elements to deploy it in order to achieve benefit, it does not disturb the operation of other protocol layers even when only one side has deployed MCP, since MCP works transparently and is not enabled unless both sides are MCP-ready. More importantly, the benefits of MCP grow as more communicating hosts become MCP-enabled: as the evaluation results show, with more MCP-ready flows, the throughput achieved by each flow is larger.
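As a concrete illustration of the tag mechanism discussed under consistency checking, the sketch below shows MD5-tag store/retrieve over a host's memory. The function names and the dict-based memory are our assumptions, not the MCP implementation.

```python
import hashlib

memory = {}  # tag -> byte sequence: the shared MCP memory of one host

def store(chunk: bytes) -> bytes:
    """First pass of a chunk through the element: remember it by its tag."""
    tag = hashlib.md5(chunk).digest()  # a 16-byte tag stands in for the chunk
    memory[tag] = chunk
    return tag

def retrieve(tag: bytes) -> bytes:
    """Later transfers carry only the tag; the receiver expands it locally.
    A tag collision would silently yield the wrong chunk, which MCP leaves
    to the layer-4 and application checks above it to catch."""
    return memory[tag]
```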

8. RELATED WORKS

Figure 12: Impact of queuing discipline. (a) Aggregate Bit-Hop (Mbit-Hop) and (b) Aggregate Throughput (Mbit) versus NM deployment (%); 200 nodes, RED; curves: Default and NM.

Figure 13: Aggregate results of NM-enabled flows only. (a) Aggregate NM Bit-Hop (Mbit-Hop) and (b) Aggregate NM Throughput (Mbit) versus NM deployment (%); 200 nodes; curves: DropTail and RED.

• New Internet architecture: Many approaches have been proposed over the past several years to overcome the drawbacks of the current Internet architecture. A data-oriented network architecture [8] provides an anycast service by naming data such that requests can be routed to optimal locations that hold a copy of the data. A flat-label routing architecture [5], based on virtual ring routing, shows that addressing and routing on flat labels can still be scalable and efficient at Internet scale. A new data naming architecture [4] is suggested to achieve benefits such as persistence, mobility, multi-homing, and support for middleboxes. Compared to these, network memory is motivated by an entirely different observation, namely pervasive traffic redundancy, and it explicitly decouples the data source from the information source.

• Traffic redundancy: Various approaches have been developed to eliminate traffic redundancy for various applications (particularly HTTP) by identifying similar contents. [6] proposes an efficient algorithm for selecting similar objects as references. Value-based web caching [17] is motivated by the observations that web files may change gradually and may be aliased, and proposes to split files into blocks. [10] focuses on the HTTP protocol and proposes an approach to eliminate transfers on the sender side (the HTTP server). A protocol-independent technique [20] detects repetitive traffic on a communication link

and eliminates the repetitive segments. [19] uses packet digests to directly suppress redundant transfers, using a proxy on either end of a low-bandwidth connection. [3] proposes to deploy packet-level memories on Internet routers and to change routing protocols so as to explicitly remove redundancies. These works provide observations on traffic redundancy and propose mechanisms to eliminate it; however, the insight into redundancy and the solutions they propose are limited in various ways.

• Cooperative caching: Motivated mainly by redundancy along the time dimension, several approaches exploit such redundancy to reduce users' response time. Squirrel [7] provides a decentralized, peer-to-peer web cache by enabling web browsers on desktop machines to share their local caches and form an efficient and scalable web cache. A churn-resistant peer-to-peer web caching system [9] is designed to resist churn attacks. [18] develops a novel caching algorithm for P2P traffic. These works address the shortcomings of a single caching location by providing multiple locations and organizing them transparently; however, caching is performed at the file level, which significantly limits its effectiveness.

• CDN and server farms: To optimize data delivery from popular servers, particularly WWW servers, technologies such as CDNs and server farms have been designed and


widely deployed, and research related to them has been carried out extensively. Technologies such as CDNs aim at redirecting user requests to optimal (i.e., closest, low-overhead) servers. They do not fully decouple the information source from the data source, in that the servers remain mirrors of each other; thus, the benefits they can achieve are limited. In particular, they cannot effectively eliminate traffic redundancy along dimensions such as direction and application.

• Multiple-source downloading: Most apparent in P2P networks such as BitTorrent, multi-source downloading is one way to reduce download time on the user side. Studies such as [14, 15] examine the benefits and properties of multi-source downloading in the context of P2P applications. In addition, [13] presents a way to identify similar or identical contents using "handprints" and proposes multi-source downloading to exploit the similarities.

9. CONCLUSION

In this work, we study the nature of the redundancies that reside in today's network traffic. We propose a solution suite called Network Memory that fully decouples the information source from the data source and is transparent to applications. Network memory can deliver better performance by exploiting these redundancies.

APPENDIX

A. Rabin-based delimitation

We show an example of Rabin-based delimitation in Figure 14. (The raw bytes, exact function, values, etc.)

Byte values:    7 9 4 6 5 2 8 7 3 5 9 6
Rabin values:   9 7 0 6 4 1 5 0 2 4 1 7
K=10, p_d=10%:  9 7 0 6 4 1 5 0 2 4 1 7
K=10, p_d=20%:  9 7 0 6 4 1 5 0 2 4 1 7

Figure 14: Rabin-based delimitation example
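To make the delimitation rule concrete, here is a minimal sketch. It assumes the selection rule implied by Figure 14: a byte position becomes a chunk boundary when its fingerprint value mod K falls below p_d * K (so, with K = 10, presumably value 0 delimits at p_d = 10%, and values 0 and 1 at p_d = 20%). The per-byte fingerprint update below is a toy stand-in, not the exact Rabin function used in the paper.

```python
def rabin_delimiters(data: bytes, K: int = 10, p_d: float = 0.10) -> list:
    """Mark a chunk boundary wherever the fingerprint mod K falls below
    p_d * K, so boundaries appear with probability of about p_d. A real
    implementation uses a rolling Rabin fingerprint over a sliding window."""
    threshold = int(p_d * K)              # K=10, p_d=10% -> only value 0 delimits
    boundaries = []
    fp = 0
    for i, byte in enumerate(data):
        fp = (fp * 256 + byte) % 1000003  # toy rolling value, not a Rabin polynomial
        if fp % K < threshold:
            boundaries.append(i)          # expected boundary rate is about p_d
    return boundaries
```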

REFERENCES

[1] Comscore media metrix top 50 online property ranking.
[2] Ustream.tv.
[3] A. Anand, A. Gupta, A. Akella, S. Seshan, and S. Shenker. Packet caches on routers: the implications of universal redundant traffic elimination. SIGCOMM Comput. Commun. Rev., 38(4):219–230, 2008.
[4] H. Balakrishnan, K. Lakshminarayanan, S. Ratnasamy, S. Shenker, I. Stoica, and M. Walfish. A layered naming architecture for the internet. In SIGCOMM '04: Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications, pages 343–352, New York, NY, USA, 2004. ACM.
[5] M. Caesar, T. Condie, J. Kannan, K. Lakshminarayanan, and I. Stoica. ROFL: routing on flat labels. In SIGCOMM '06: Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, pages 363–374, New York, NY, USA, 2006. ACM.
[6] M. C. Chan and T. Y. C. Woo. Cache-based compaction: A new technique for optimizing web transfer. In Proceedings of IEEE INFOCOM '99, New York, NY, USA, 1999.
[7] S. Iyer, A. Rowstron, and P. Druschel. Squirrel: a decentralized peer-to-peer web cache. In PODC '02: Proceedings of the twenty-first annual symposium on Principles of distributed computing, pages 213–222, New York, NY, USA, 2002. ACM.
[8] T. Koponen, M. Chawla, B.-G. Chun, A. Ermolinskiy, K. H. Kim, S. Shenker, and I. Stoica. A data-oriented (and beyond) network architecture. SIGCOMM Comput. Commun. Rev., 37(4):181–192, 2007.
[9] P. Linga, I. Gupta, and K. Birman. A churn-resistant peer-to-peer web caching system. In SSRS '03: Proceedings of the 2003 ACM workshop on Survivable and self-regenerative systems, pages 1–10, New York, NY, USA, 2003. ACM.
[10] J. C. Mogul, Y. M. Chan, and T. Kelly. Design, implementation, and evaluation of duplicate transfer detection in HTTP. In NSDI '04: Proceedings of the 1st Symposium on Networked Systems Design and Implementation, pages 4–4, Berkeley, CA, USA, 2004. USENIX Association.
[11] G. Pallis and A. Vakali. Insight and perspectives for content delivery networks. Commun. ACM, 49(1):101–106, 2006.
[12] K. Park and V. S. Pai. Scale and performance in the CoBlitz large-file distribution service. In NSDI '06: Proceedings of the 3rd Symposium on Networked Systems Design & Implementation, pages 3–3, Berkeley, CA, USA, 2006. USENIX Association.
[13] H. Pucha, D. G. Andersen, and M. Kaminsky. Exploiting similarity for multi-source downloads using file handprints. In Proceedings of the 4th USENIX NSDI, Cambridge, MA, Apr. 2007.
[14] Y. Qiao and F. E. Bustamante. Structured and unstructured overlays under the microscope: a measurement-based view of two P2P systems that people use. In ATEC '06: Proceedings of the USENIX '06 Annual Technical Conference, pages 31–31, Berkeley, CA, USA, 2006. USENIX Association.
[15] D. Qiu and R. Srikant. Modeling and performance analysis of bittorrent-like peer-to-peer networks. In SIGCOMM '04: Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications, pages 367–378, New York, NY, USA, 2004. ACM.
[16] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Schenker. A scalable content-addressable network. In SIGCOMM '01: Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, pages 161–172, New York, NY, USA, 2001. ACM.
[17] S. C. Rhea, K. Liang, and E. Brewer. Value-based web caching. In WWW '03: Proceedings of the 12th international conference on World Wide Web, pages 619–628, New York, NY, USA, 2003. ACM.
[18] O. Saleh and M. Hefeeda. Modeling and caching of peer-to-peer traffic. In ICNP '06: Proceedings of the 2006 IEEE International Conference on Network Protocols, pages 249–258, Washington, DC, USA, 2006. IEEE Computer Society.
[19] J. Santos and D. Wetherall. Increasing effective link bandwidth by suppressing replicated data. In ATEC '98: Proceedings of the USENIX Annual Technical Conference, pages 18–18, Berkeley, CA, USA, 1998. USENIX Association.
[20] N. T. Spring and D. Wetherall. A protocol-independent technique for eliminating redundant network traffic. SIGCOMM Comput. Commun. Rev., 30(4):87–95, 2000.
[21] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM '01: Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer communications, pages 149–160, New York, NY, USA, 2001. ACM.
[22] A.-J. Su, D. R. Choffnes, A. Kuzmanovic, and F. E. Bustamante. Drafting behind Akamai (travelocity-based detouring). SIGCOMM Comput. Commun. Rev., 36(4):435–446, 2006.