A Framework For Separating Server Scalability and Availability From Internet Application Functionality

by

Armando Fox

B.S. (Massachusetts Institute of Technology) 1990
M.S. (University of Illinois at Urbana-Champaign) 1992

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the GRADUATE DIVISION of the UNIVERSITY of CALIFORNIA at BERKELEY

Committee in charge:
Professor Eric A. Brewer, Chair
Professor David E. Culler
Professor Paul K. Wright
1998
The dissertation of Armando Fox is approved:
University of California at Berkeley 1998
A Framework For Separating Server Scalability and Availability From Internet Application Functionality
Copyright 1998 by Armando Fox
Abstract

A Framework For Separating Server Scalability and Availability From Internet Application Functionality

by

Armando Fox

Doctor of Philosophy in Computer Science

University of California at Berkeley

Professor Eric A. Brewer, Chair

To meet the service demands created by the Internet's exponential growth, operators are scrambling to deploy application-level services, including Web caches, commerce servers, and intelligent transformation proxies for mobile "thin clients." On the one hand, the Internet's growth rate places unprecedented scalability and robustness demands on these services; on the other hand, that same growth rate demands that new services be developed, deployed, and evolved at a pace that is precipitous even by the standards of today's desktop software development cycles.

We demonstrate that for a certain class of applications, these apparently conflicting goals can be reconciled by completely separating the application logic from the runtime support for scalability and high availability. To this end, we first describe TACC, a composition-based application structuring model that captures four elements that we have empirically found to form the basis of a wide variety of Internet applications: transformation, aggregation, caching, and customization of Internet content. We then present SNS (Scalable Network Server), which was co-evolved with TACC to run applications on a cluster of commodity workstations. Because TACC chooses a specific extreme point on the well-known spectrum that trades high availability for data consistency, SNS can exploit a variety of simple and robust scalability and availability mechanisms, inspired by similar techniques used to make mission-critical systems failsafe and Internet protocols robust and scalable.

Using TACC and SNS as the primary motivating examples, we then generalize the techniques that made TACC and SNS robust in practice and tractable to engineer. We argue that the cost of an unwise tradeoff between consistency and availability is magnified by the large scale of Internet applications, and we propose various alternative approaches to consistency and state management. We describe several production applications that demonstrate the flexibility and ease of use of TACC, the robustness and scalability of SNS, and the concrete advantages of the state management and application structuring strategies we advocate. Finally, we attempt to extract lessons for the design and implementation of both Internet-scale interactive applications in particular and complex highly-available software systems in general.
Professor Eric A. Brewer Dissertation Committee Chair
Contents

List of Figures                                                      vii
List of Tables                                                      viii

I   Overview and Motivation                                            1

1   Vision: The Content You Want                                       2
    1   Background: From Bit Plumbing to Information Delivery          2
    2   Four Trends Point the Way                                      4
        2.1   Internet Growth and Diversity                            6
        2.2   Device and Network Convergence                           6
        2.3   Application-Level Infrastructure Services                7
        2.4   Toward an Internet Services Integration Strategy         8
    3   How To Get There: Hypothesis and Contributions                 9
        3.1   TACC: Composable Services                               11
        3.2   Robust Deployment: Cluster-Based TACC Server            12
    4   Thesis Map                                                    13

2   The Case For Infrastructure Proxies                               15
    1   Infrastructure Applications In General                        16
    2   An Application of Infrastructure Services: Adaptation By Proxy  18
    3   Benefits of Infrastructure Proxies                            20
        3.1   Incremental Deployment                                  20
        3.2   Legacy Support                                          20
        3.3   Aggressive Application Partitioning                     21
    4   Arguments Against Proxies                                     21
        4.1   End-to-End Arguments                                    22
        4.2   Client Evolution                                        23
        4.3   Layered and Prefix Encodings                            24
        4.4   Latency Trade-Off Will Go Away                          25
        4.5   Proxies Induce Pessimal Routing                         25
        4.6   End-to-End Security                                     26
    5   Summary                                                       27

3   Adaptation-Centric Internet Services                              28
    1   Datatype-Specific Distillation and Refinement                 28
    2   Performance of Distillation and Refinement On Demand          30
        2.1   Images                                                  31
        2.2   Rich-Text                                               34
    3   Dynamic Network Adaptation                                    36
    4   Summary of Adaptation                                         37

II  A Programming Model and Scalable Server for Internet Services     40

4   Internet Application Semantics                                    41
    1   Infrastructure Service Challenges: "The Internet Utility Company"  41
    2   ACID Confronts Its Discontents                                42
        2.1   Traditional ACID Applications                           42
        2.2   Internet Applications as ACID on a BASE Substrate       44
    3   The CAP Principle and Approaches to Consistency               46
        3.1   ACID–BASE Decomposition                                 49
        3.2   Best Effort Internet Applications                       54
        3.3   Expiration-Based Consistency                            56
        3.4   Summary of Consistency Mechanisms                       57
    4   Discussion                                                    57
        4.1   Make The Common Case Fast                               58
        4.2   The Right Tool For the Job                              59
    5   Summary                                                       59

5   TACC: A Programming Model For Internet Services                   61
    1   Targeting the Infrastructure Service Challenges               61
    2   TACC Model Overview                                           62
        2.1   Modular Applications                                    62
        2.2   Parameterizable Workers                                 64
        2.3   TACC Functionality                                      65
        2.4   Composition                                             66
        2.5   Failure Semantics                                       67
    3   TACC Mechanics                                                69
        3.1   Worker Model                                            69
        3.2   Worker Logic                                            69
        3.3   Dispatch Rules                                          70
        3.4   TACC Cache                                              73
        3.5   Customization Database                                  74
        3.6   Access to Per-Node Resources                            75
    4   State Management and ACID–BASE Decomposition                  76
    5   TACC Open Issues                                              77
        5.1   Formal Composition Rules                                77
        5.2   Degree of Worker Modularity                             77
        5.3   Cluster Dependency                                      78
    6   Summary                                                       79

6   A Cluster-Based, Scalable TACC Server                             80
    1   The Case For Clusters                                         81
        1.1   Non-Cluster Scalability and Availability                81
        1.2   Harnessing Clusters for Availability and Partition Resilience  83
    2   Architecture of SNS, A Cluster-Based TACC Server              84
    3   SNS Operation Logistics                                       87
        3.1   Workers                                                 87
        3.2   Installation                                            88
    4   Analysis of the TranSend Implementation                       89
        4.1   Failure Management                                      89
        4.2   Load Balancing                                          92
        4.3   Scalability                                             93
    5   SNS and the CAP Principle                                     96
        5.1   Constraints On Workers                                  96
        5.2   Session State and the SNS Cache                         97
        5.3   User Semantics and Common Case Failures                 97
    6   Summary                                                      101

7   Application Case Studies                                         103
    1   TranSend Web Accelerator                                     104
        1.1   Operation                                              105
        1.2   Customization and Caching in TranSend                  105
        1.3   TranSend Extensions                                    106
        1.4   Error Handling and Crash Recovery                      107
        1.5   TranSend Summary                                       107
    2   Top Gun Wingman                                              108
        2.1   Adaptation                                             108
        2.2   Wingman as a Partitioned Application                   110
        2.3   Customization, Caching, and Crash Recovery in Wingman  111
        2.4   Client Competence and Client Responsiveness            111
        2.5   Wingman Summary                                        112
    3   Top Gun MediaPad                                             112
        3.1   Leveraging Scalable Reliable Multicast                 113
        3.2   Orthogonal State Management                            114
        3.3   Adaptation                                             114
    4   Charon: Indirect Authentication for Thin Clients             116
    5   Group Annotation                                             116
        5.1   Adaptation                                             118
        5.2   State Management                                       119
    6   "Screen Scrapers" and Aggregators                            120
    7   Summary                                                      122

III Related and Future Work                                          123

8   Related Work                                                     124
    1   Internet Gateways, Intermediaries, and Composable Applications  125
        1.1   Composable Proxy-Like Services                         125
        1.2   Semi-Transparent Agents For Semantic Transformation    126
        1.3   Proxies and Groupware                                  126
    2   Application Partitioning and Adaptation                      127
        2.1   Application Partitioning for Small Devices             127
        2.2   Dynamic Client Adaptation By Proxy Interposition       128
        2.3   Network Adaptation By Proxy                            129
    3   Scalable Reliable Servers For Interactive Workloads          130
        3.1   Scalable Servers                                       130
        3.2   Trading Consistency For Simplicity, Availability, or Robustness  130
    4   Wider Impact                                                 131

9   Lessons and Ongoing Work                                         133
    1   The KISSO Principle                                          133
        1.1   Simple Mechanisms                                      134
        1.2   Soft State With Refresh                                134
        1.3   Orthogonality Is Better Than Layering                  137
    2   Other Lessons From TACC and SNS                              139
        2.1   Separation of Concerns and Front-End Scalability       139
        2.2   Keep Clients Simple                                    140
        2.3   Sociology: "Build It and They Will Come"               142
        2.4   Incremental Deployment                                 143
    3   Non-Goals, Non-Contributions, and Open Problems              143
    4   Long-Term Research Agenda                                    144

10  Conclusions                                                      147

Bibliography                                                         149

List of Figures

1   Distillation example                                              31
2   End-to-end latency for images with and without distillation. Each group
    of bars represents one image with 5 levels of distillation; the top bar
    represents no distillation at all. The y-axis number is the distilled
    size in kilobytes (so the top bar gives the original size). Note that
    two of the undistilled images are off the scale; the Soda Hall image is
    off by an order of magnitude.                                     33
3   Screen snapshots of our rich-text (top) versus ghostview (bottom). The
    rich-text is easier to read because it uses screen fonts.         39
1   Architecture of a cluster-based TACC server. Components include front
    ends (FE), a pool of TACC workers (W) some of which may be caches ($),
    a user profile database, a graphical monitor, and a fault-tolerant load
    manager, whose functionality logically extends into the manager stubs
    (MS) and worker stubs (WS).                                       85
2   Worker queue lengths observed over time as the load presented to the
    system fluctuates, and as workers are manually brought down.      93
3   An enlarged view of figure 2.                                     94
1   Screenshot of the Top Gun Wingman browser. This screenshot is taken
    from the "xcopilot" hardware-level Pilot simulator [36].         109
2   The Multivalent Document (MVD) editor/browser Java applet, which
    provides the user interface to the GrAnT service.                118
3   Group annotation of the Web as a TACC service. TACC modules control
    storage and retrieval of annotation files from a separate annotation
    database, merging annotation information with HTML pages on the fly.  119
4   The Wingman aggregator user interface. The example aggregator shown is
    for the HotBot search engine. The search terms in the "Aggregator
    commands" text field will be passed to the HotBot aggregator when the
    OK button is pressed, the site-specific handler will convert these to
    an HTML form submission suitable for the HotBot server, and HotBot's
    results will be transformed and sent back to the Wingman client. 121

List of Tables

1   Physical variation among clients                                  19
2   Typical Network Variation                                         19
3   Capabilities of desktop PC's costing about $2,500 in unadjusted
    dollars, between 1994 and 1998. According to Survey.Net's second
    Internet survey started in March 1997 [129], both the mean and median
    connection speed for home users are hovering around 28.8 Kb/s, and for
    those users that have access from work (81% of the same group of
    respondents), only 26% report access speeds of 56 Kb/s or greater.  23
1   Three important types and the distillation axes corresponding to each.  30
2   Distillation latency (seconds of wall clock time) and new sizes as a
    percent of the original, for three sets of distillation parameters and
    four images. Each column is independent, i.e. the last column gives the
    results for performing all three operations. There is an implicit
    requantization back to the original color palette in the first
    conversion, accounting for its higher times.                      32
3   Features for PostScript distillation                              35
1   Data semantics required by each component of the fictitious consumer
    shopping site, implementation requirements for each, and effect on the
    overall service when each component fails. Note that in each case, the
    site can continue to offer some service.                          53
1   Results of the scalability experiment. "FE" refers to front end.  95
Acknowledgments

From its inception, this project has benefited from the detailed and perceptive comments of countless anonymous reviewers, users, and collaborators. Ken Lutz and Eric Fraser configured and administered the test network on which the TranSend scaling experiments were performed. Cliff Frost of the UC Berkeley Data Communications and Networks Services group allowed us to collect traces on the UC Berkeley dialup IP network and worked with us to deploy and promote TranSend within UC Berkeley. Undergraduate researchers David Gourley, Anthony Polito, Benjamin Ling, Andrew Huang, David Lee, and Tim Kimball helped implement various parts of TranSend and Top Gun Wingman. Ian Goldberg and David Wagner helped debug TranSend (both the implementation and the design), especially through their implementation of the so-cool Anonymous Rewebber, and Ian implemented major parts of the client side of Top Gun Wingman (especially the 2-bit-per-pixel hacks) as well as the rendezvous mechanism for nonblocking TACC inter-worker calls. Paul Haeberli of Silicon Graphics contributed image processing code for Top Gun Wingman. Thanks also to the patient students of UCB Computer Science 294-6, Internet Services, Fall 97, for being the first real outside developers on our TACC platform and greatly improving the quality of the software and documentation. In particular, the Group Annotation Service described in the "Applications" chapter was prototyped as a class project by Marcel Kornacker and Ray Gilstrap.

Thanks are due for all the valuable feedback from my UC Berkeley colleagues, especially those outside my thesis committee, particularly Eric Anderson, Trevor Pering, Hari Balakrishnan, and Mark Stemm. Within the committee, Randy and David have provided continuous valuable feedback, especially during and since my quals. Thanks also to my patient thesis readers: Yatin Chawathe, Steve Gribble, David Wagner, Drew Roselli (especially for helping me fine-tune the title and abstract), and Kristie Sallee. As well, the anonymous referees of the many papers that have come out of this project have sharpened the project's focus as well as improved the quality of our reporting.

Eric Brewer, my thesis advisor, has been a valuable role model in more than just the traditional academic sense. His consistently broad vision ("You're not thinking big enough!"), his ability to deliver targeted, high-value, concise feedback on short notice, and his ability to identify innovative and high-leverage solutions to difficult problems are certainly abilities I'd like to develop. As of this writing, he is also the only role model I've ever had whose current net worth exceeds my expected cumulative lifetime salary.
:-)
My discussions about Internet services with Dr. Murray Mazer at Curl (formerly at the Open Group Research Institute) have ranged from the philosophical to the nuts-and-bolts, and to the extent this work sheds any new light on the complex topic of characterizing Internet applications, a lot of credit and thanks is due Murray for the focused yet wide-ranging discussions he facilitated.

Bob Miller and Terry Lessard-Smith always seemed to be one step ahead of us, knowing where our next paychecks were going to come from, what time the retreat buses departed, and all other manner of arcana resulting from the frequent headaches we doubtless caused them. Their performance as grant administrators and general good Samaritans has been nothing less than heroic.

The folks at ProxiNet, Inc. were very understanding every time I griped about writing my thesis. Now it's written so I'll have to find something new to gripe about.

Because of Kathryn Crabtree, I have been spared from visiting Sproul more than one or two times during my entire stay at UC Berkeley. I don't know how she does it, but thanks for helping us all stay sane.

That's the official stuff. Personal kudos goes first to the residents of 443 and 445 Soda, especially Ian, Dave, Steve, Yatin, and Mark. This project has benefited from the collective talents of the finest group of people I've ever worked with, and I don't look forward to competing for funding against any group that includes them. A great deal of
what is reported in this thesis wouldn't have happened without them. From the shared all-nighters to the monthly media circuses to ReciprocityWare, life in 445 has been nothing if not interesting. I think we've all learned a lot of things they don't teach you in graduate school.

Sally Shepard, who will succeed at any business endeavor she touches in addition to being one of the warmest and most giving people I know, provided unwavering love and support when it was needed most. I'm sure our paths will cross again in the future and I look forward to it.

Finally, my parents and family belong on any acknowledgments list. However esoteric my professional work, and however complicated my personal life becomes, they are always there to help me in ways I never imagined. Thanks, Mom and Dad, for pulling out one surprise after another.

And lest we forget: This research was supported by DARPA contracts #DAAB07-95-C-D154 and #J-FBI-93-153, the California MICRO program, the UC Berkeley Chancellor's Opportunity Fellowship, the NSERC PGS-A fellowship, Hughes Aircraft Corp., and Metricom Inc.
Part I
Overview and Motivation
Chapter 1
Vision: The Content You Want

We would like to evolve the Internet from its present incarnation into a "smarter" system that delivers the content you want. This phrase reflects the observation that despite its recent explosive growth and diversity of content and services, the Internet, and especially the Web, remains a primitive place. "One size fits all" is the rule for servers providing content and services to users, and no overarching data schema exists to facilitate automated data aggregation and transformation from different Web sites. There is no reason the future Internet cannot evolve into a system that delivers content in a way that depends on how you get to it: the client device (PDA, smart phone, laptop, public telephone system) and combination of networks (public telephone system, public Internet, cellular network), and what specific content you are interested in. To understand why this is a problem that needs solving, and how we got here, it is instructive to briefly review the evolution of the Internet from its early days.
1 Background: From Bit Plumbing to Information Delivery

The Internet began as "bit plumbing": the ability to route arbitrary bits among identical computers (the venerable Interface Message Processors) using uniform mechanisms.
The major contribution of the Internet Protocol, which we now take for granted, was the ability to send data packets across different networking technologies by providing a uniform network-level communication protocol and a uniform set of mechanisms for routing and naming. After some experimentation, it was decided that IP would provide only a best-effort packet delivery service. A reliable, ordered stream abstraction (TCP) quickly evolved as a higher layer, followed by facilities to copy files (ftp), conduct remote interactive terminal sessions (telnet), and so on. For the purposes of this thesis, the interesting story begins with the early infrastructure services: high-level applications hosted entirely in the network infrastructure,
often on special-purpose or dedicated hardware, providing value-added functionality to the infrastructure. For example, the Domain Name Service (DNS [98]) added a useful level of indirection to the Internet's naming scheme, by allowing Internet hosts to be named according to a symbolic hierarchy; full-sail.cs.berkeley.edu is certainly easier to remember than "128.32.33.29". DNS servers, arranged into a hierarchy, periodically exchanged information about new registered hosts, and observed a simple expiration-based caching protocol to bound the "staleness" of information in any one server. User programs accessed this infrastructure service through a simple standardized remote procedure call (RPC)-like mechanism, via a library-based interface provided by the freely-distributed Berkeley Internet Name Daemon (BIND [46]). This architecture, an infrastructure-based service accessible via a well-known interface, establishes DNS as the first widely-used Internet infrastructure application. Another early infrastructure service was electronic mail: the Simple Mail Transfer Protocol (SMTP [122]) (and its first widely distributed instantiation, the now-ubiquitous sendmail program) provided simple but reliable point-to-point email delivery. SMTP was
certainly an improvement over earlier systems such as UUCP: in that system, users had to explicitly specify the hop-by-hop mail route as part of the recipient address, and if any
node in the path was down, the mail could not be delivered. By relying on DNS's ability to name the destination server symbolically, and IP's ability to route to arbitrary addresses and automatically use alternate routes during transient failures, SMTP avoided both of these drawbacks. SMTP was a sign of things to come in that it achieved simplicity and robustness in part by transparently exploiting existing lower-level protocols: TCP for reliable delivery, DNS for symbolic destination addressing, IP routing for automatic forwarding.

This same strategy was a key factor in the success of CERN's World Wide Web protocol suite in 1994. The suite proposed HTTP as a simple transport protocol for rich content, exploiting TCP for reliability and DNS for naming; the MIME type system for describing the content itself; and the HTML markup language, a simple variant of the earlier Standard Generalized Markup Language (SGML), as a way of describing the structure of rich text with embedded objects such as images and fill-out forms.

CERN's proposed protocols underwent almost immediate (by typical technology-transfer timescales) mass adoption followed by explosive growth; today, for many users, the Web is the Internet. Just as the Web was itself enabled by a set of underlying protocols, it is now serving as a building block for increasingly sophisticated interactive services, including electronic commerce, software distribution, massive information indexing (e.g. search engines), and most recently, full-service "Internet portals" providing email access, personalized news feeds, Web site bookmarks, and a variety of other features, all via the now-familiar Web interface. It is this evolution, combined with hardware and communication growth trends, that motivates the present work.
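To make the expiration-based consistency idea concrete, the following is a minimal sketch (in Python; TTLCache and its methods are our own illustrative names, not part of BIND or any DNS implementation) of a cache whose staleness is bounded by a per-entry time-to-live, in the style DNS resolvers use:

    import time

    class TTLCache:
        """Expiration-based cache in the DNS style: every entry carries a
        time-to-live, so the worst-case staleness of any answer is bounded
        by the TTL rather than by an explicit invalidation protocol."""

        def __init__(self):
            self._entries = {}  # name -> (value, absolute expiration time)

        def put(self, name, value, ttl_seconds):
            self._entries[name] = (value, time.time() + ttl_seconds)

        def get(self, name):
            entry = self._entries.get(name)
            if entry is None:
                return None  # miss: caller must consult an authoritative server
            value, expires_at = entry
            if time.time() >= expires_at:
                del self._entries[name]  # expired: treat as a miss
                return None
            return value

    # Hypothetical usage: cache a name-to-address binding for five minutes.
    cache = TTLCache()
    cache.put("full-sail.cs.berkeley.edu", "128.32.33.29", ttl_seconds=300)
    print(cache.get("full-sail.cs.berkeley.edu"))

The appeal of this discipline, which recurs later in the thesis as "soft state with refresh," is that no failure of the cache or of the refresh path can leave permanently wrong data behind; at worst an answer is briefly stale or must be re-fetched.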
2 Four Trends Point the Way

This thesis proposes a first cut at a general programming model for building services and applications in the wide-area Internet. Four specific trends motivate the need for
such a model, two of which fall out of the above discussion.

1. The explosive, exponential growth of the Internet, according to almost any metric: number of users, number of connected sites, total bandwidth consumed, total bytes moved. With growth comes diversity: as we discuss in Chapter 3, users are connecting to the Internet at speeds ranging from hundreds of bits per second to a gigabit per second, using devices ranging from cellular phones to server-class desktop workstations.

2. Network convergence: Separate voice and data networks are converging, and wireless connectivity (via paging, cellular, and proprietary RF networks) is becoming ubiquitous and inexpensive.

3. Device convergence powered by Moore's Law: computers are getting smaller and communication-enabled (e.g. Seiko Instruments' RuPuter, a one-way pager and personal organizer worn as a wristwatch), and communication devices are starting to look like general-purpose programmable computers (e.g. Nokia's Communicator 9000, a cellular phone with a keyboard and pixel-addressable display, based on the GeoWorks operating system).

4. Infrastructure services, providing access to the increasing fraction of interesting data that by its nature requires an Internet connection: email and news, event notification, search engines (indexing the rapidly-changing Web), and group-oriented services such as shared calendars.

We now discuss each trend in detail to determine what design requirements it induces.
2.1 Internet Growth and Diversity

The Internet's recent explosion has been enabled largely by the mass-market adoption of the World Wide Web. Technically, the important contributions were a simple set of open standards enabling the representation and exchange of rich content, a particular type system (MIME) and provisions in the transport protocol for encoding type information along with data, and the specification of a universal namespace that directly leverages DNS and IP. By any metric, the Internet is undergoing exponential growth, which is expected to continue for some time [95]. Thus, Internet services must be engineered not only to handle today's already-heavy demands of millions of hits per day, but also to scale to tomorrow's demands and beyond. Furthermore, due to device and network convergence, Internet services will have to serve an unprecedented variety of clients, with widely varying communication and computation capabilities.
2.2 Device and Network Convergence

In part because they specify a universal user interface, Web technologies have played a significant role in speeding up device convergence. Devices that have languished on the drawing board and projects that have progressed in fits and starts now have a concrete design target: World Wide Web access. The target is appealing not only because of its universality, but because of the ever-growing body of content and services available there, which results in increased user demand, which provides a larger audience for the development of new services, and so on. This positive feedback effect is captured by Metcalfe's Law, which states roughly that the utility of a networked service grows as the square of the number of connected users. Fueled by transistor curves and the lure of leveraging the Web, the spectrum of devices is impressive and growing: palmtop computers with wireless adapters, "smart" cellular phones (essentially phones with embedded palmtop computers), two-way pagers with
keyboards and pixel-addressable screens, set-top Web boxes, standalone Internet kiosks, watches with embedded pagers and PIM functions. Not surprisingly, device convergence is being accompanied by network convergence: data can be carried on cable and voice-telephony networks, voice calls can be carried on packet-switched IP networks, wireless data and voice services are growing, and in general the distinctions between wireless/wireline and voice/data networks are rapidly blurring.

Unfortunately, Internet protocols and Web content in particular grew up around essentially a single design point: the desktop workstation with a fast connection to the Internet. The unprecedented heterogeneity introduced by device and network convergence has been a major stumbling block for Internet enablement of convergent devices. And because the devices ride transistor curves, hardware obsolescence will accelerate and we will see a continuing increase in the fraction of "legacy" clients being used to access the Internet. (This phenomenon is already clearly visible in the desktop PC arena: a PC purchased in 1994 cannot even run Web browsers released in late 1995.) Thus, the assumption that heterogeneity will remain a pervasive part of the Internet landscape should be a foundation for any Internet service strategy.
2.3 Application-Level Infrastructure Services

Besides the increased reliance on email and news, interactive infrastructure services have seen a corresponding rise in popularity. Publicly-accessible databases (shopping catalogs, index services), consumer-oriented electronic commerce (online merchandise sale, home banking, travel reservations), and group-oriented services (shared calendars and discussion groups) are all accessible using the Web's universal user interface and protocols. In many cases these services represent increasingly aggressive "innovation behind the interface": sophisticated, fundamentally infrastructure services that target existing devices and interfaces.
The advantages of infrastructure services include potential economy of scale, well-connected access to the rest of the Internet (particularly important for services that provide timely data such as news feeds), the centralization of reliability and administration concerns, and the ability to enhance the service "behind the interface" without requiring end users to go through an upgrade process. (The voice telephony market experienced a similar phenomenon when services such as call waiting and conference calling had to be accommodated with existing telephone user interfaces.) However, these very virtues become a liability in light of the enormous user loads on such services, and the users' expectations of 24-hour, 7-day-a-week availability for services that they consider to have become "mission critical". The engineering of a robust and scalable infrastructure service is nontrivial, and since we expect these requirements to apply to future infrastructure services, any proposed strategy for evolving the Internet must address this challenge.
2.4 Toward an Internet Services Integration Strategy

The conclusions suggested by the above trends are tantalizing: it is cheaper than ever to be wirelessly connected, and there are more services than ever available to both wireless and wireline network users. The discovery by mobile users that access to online data is the "killer app" is motivating infrastructure data and computation services on large scales. Unfortunately, because the trends have evolved independently, the resulting amalgam of services suffers from an almost complete lack of integration, with respect to both handsets and services. Pagers can receive email, but at a different maildrop and with severe length restrictions. Fax machines can be used as crude printers, but a local PC with fax software or a well-known fax gateway is necessary. Modern handsets typically have some rich-text and graphics display capability, but existing Internet content is authored in a "one size fits all" fashion that is largely indigestible by such devices. Existing ad hoc solutions to these problems do not interoperate, because they represent points in design spaces that
have evolved independently from disparate views of the communications and computation infrastructure. A long-term integration strategy is needed to bring this collection of services together, just as IP internetworking brought together a disparate collection of networking technologies. As we discuss in Chapter 3, our experience has shown that application-level adaptation is the most effective technique for addressing the device and network heterogeneity
implied by trends #2 and #3. So any successful strategy must enable adaptive services, by providing abstractions for adapting to network and client heterogeneity, a framework for describing the structure of adaptive applications, and a means of "intermediating" access between a disparate set of clients and servers. This last requirement can be met by providing adaptation in the middle, between the client and server endpoints, a technique that also addresses the legacy client problem.

Trends #1 and #4 require that Internet applications (adaptive and otherwise) be deployed in a manner that provides high availability as well as scalability to millions of users. Given that scaling to such levels requires a potentially large and complex system, the ability to handle transient hardware and software faults automatically will be instrumental in making the system as a whole cost-effective to deploy and administer. Handling software faults is particularly relevant because one of the effects of the rapid growth of the Internet has been to further compress the software development cycle, from "desktop time" to "Internet time"; for example, the HotMail service [32] deploys bug fixes and site improvements roughly every two weeks.
3 How To Get There: Hypothesis and Contributions

The above material motivates the need for a solid design philosophy for scalable and highly available infrastructure services; client and network adaptation and Internet growth are just two of the major areas where such services are valuable. Through deployment of customizable infrastructure services, the Internet can be any experience the user wants it to be, and can deliver the content the user really wants. However, the rapid evolution of the Internet requires that such services be developed and deployed on rapid cycles. This requirement appears to be at odds with the intricate engineering required to deploy an infrastructure service: on the one hand robustness and scalability are paramount, but on the other we require the flexibility to modify or extend service functionality once a service is deployed.

In this thesis, we argue that the two goals can in fact be reconciled. We offer an existence proof using a layering strategy that enables the creation of new infrastructure services without rebuilding the "infrastructure service" engineering for each new service. In our strategy, incremental scalability, load balancing, automatic recovery from partial failures, and high availability are isolated in a lower layer that can be re-used as new services are built. This layer is sufficiently robust that new services can be created by composing existing services or even by importing legacy code that was never designed to be run in a hostile Internet environment. The isolation between layers makes the difference between a system that is tractable to engineer and admits of easy extension, and a complex and brittle system that cannot be extended without risking stability of the service.

In this thesis, isolation is achieved by reexamining traditional decisions about the data semantics required by specific applications and how the correctness of those applications' behavior is characterized, and identifying how the resulting observations might be exploited to enforce the separation between the "infrastructure layer" and the "application layer" of our model. Examples of specific observations and their implications include the following:
For some applications, the function that measures the usefulness of the application's output may be a smoothly-degrading function rather than a step function. For example, returning a partial list of search results may be better than returning nothing at all. In many cases, a smoothly-degrading utility function can be mapped directly onto engineering mechanisms that provide continuous but degraded performance (rather than temporary unavailability) under transient faults; see the sketch following this list.
Many applications do not require the strong consistency semantics of database-like applications, and at Internet scale, the cost of providing such semantics is prohibitive and needs to be justified. Applications compatible with weaker semantics allow the deployment of simple but robust software failure handling mechanisms, and can easily exploit the inherent redundancy of a cluster of commodity workstations for high availability.
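As an illustration of the first observation, consider a search service whose index is spread over several worker nodes. The following is a minimal sketch (Python; the shard callables and the deadline policy are illustrative assumptions, not the dissertation's actual code) of a partial-answer query that degrades smoothly instead of failing:

    from concurrent.futures import ThreadPoolExecutor, wait

    def partial_search(shards, query, deadline_seconds):
        """Fan the query out to every index shard, but answer with whatever
        arrives before the deadline: a slow or crashed shard shrinks the
        result list instead of making the whole service unavailable."""
        pool = ThreadPoolExecutor(max_workers=len(shards))
        futures = [pool.submit(shard, query) for shard in shards]
        done, _ = wait(futures, timeout=deadline_seconds)
        results = []
        for f in done:
            if f.exception() is None:  # a failed shard simply contributes nothing
                results.extend(f.result())
        pool.shutdown(wait=False)      # do not block on stragglers
        return results

Here each shard is assumed to be a callable returning a list of hits. The utility of the answer degrades smoothly with the number of shards that respond in time, which is exactly the mapping from a smoothly-degrading utility function onto a mechanism for continuous but degraded service.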
3.1 TACC: Composable Services

In many cases, new and useful services can be created by conceptually composing two or more existing services. For example, consider a Web-based interactive map service (such as TripQuest) on the one hand, and a transformation service that adapts Web content for pocket-sized PC's on the other; composing the two would yield a map service usable from a pocket-sized device. However, the machinery required to compose the two services robustly is not readily available. Even when a new service cannot be synthesized from existing services, the fundamental technology building blocks required to implement the new service may already exist. For example, the mapping service could be voice-controlled if it were possible to attach an existing speech-to-text converter to its input. Although there are generally no full-blown services that specialize in speech-to-text conversion, there are certainly programmatic building blocks for doing this. Composition is especially hard when the building blocks consist of legacy code; in addition to the usual vagaries, incorporating legacy code into a robust infrastructure service presents an even more severe set of challenges, which we outline in the next section.

As a first step toward solving the composition problem, one contribution of this thesis is a simple building-block-based programming model for Internet applications. The TACC model provides abstractions for creating and composing application modules that
Transform, Aggregate, Cache, and Customize (on a per-user basis) content from Internet servers. The TACC model aims to facilitate composition at both the module and whole-service level, in a way that insulates the programmer from the implementation details of deploying a highly-available scalable Internet service. TACC's composition-centric view of Internet services directly enables rapid application prototyping and support for new or legacy clients, including the large and useful class of adaptation-based applications.
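A minimal sketch of what module-level composition looks like from the programmer's point of view (Python; the worker functions are hypothetical stand-ins, and the real TACC mechanics, including dispatch rules and caching, are described in Chapter 5):

    def compose(*workers):
        """Chain TACC-style workers: each worker consumes the previous
        worker's output, so a new service is assembled from existing
        building blocks rather than written from scratch."""
        def pipeline(data, **params):
            for worker in workers:
                data = worker(data, **params)
            return data
        return pipeline

    # Hypothetical workers standing in for real transformation modules.
    def fetch_directions(request, **params):
        return "<html>...driving directions and maps...</html>"

    def shrink_for_pda(html, **params):
        # e.g. distill images to 2 bits/pixel, strip frames, shorten tables
        return "<!-- adapted for a 320x200, 2-bit screen -->" + html

    pda_map_service = compose(fetch_directions, shrink_for_pda)
    print(pda_map_service("Soda Hall to SFO"))

The point of the abstraction is that the scalability and availability machinery (the SNS layer of Chapter 6) sits below the composition mechanism and never appears in worker code.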
3.2 Robust Deployment: Cluster-Based TACC Server

The TACC "runtime" must be provided by a TACC server: a hardware and software platform that hosts TACC modules and instantiates the TACC API's and composition mechanisms. In designing such a server, we must observe significant additional constraints. With respect to Internet connections and services, users have come to expect 24×7 service and predictable performance regardless of load, as well as the ability to "tune" services such as My Yahoo [78] to their individual needs as they have become accustomed to doing in desktop applications. Finally, the service must also be reasonably cost-effective to operate, lest it be priced beyond the reach of most users. In terms of building services, the above constraints translate into the following engineering requirements for a TACC application server:
Scalability: Ability to track exponential growth in number of users and amount of traffic. Since incremental growth is economically preferable to "forklift upgrades", a successful service should also support incremental scaling using low-cost building blocks.
High Availability: Users expect 24×7 service, despite inevitable transient failures in hardware and software.
Ease of Management: Large growing services are fundamentally complex. To avoid becoming a costly administrative nightmare, any framework for deploying them must support simple monitoring and management even when dealing with hundreds of nodes.

In this thesis, we orthogonalize scalability, reliability, and management (which we refer to collectively as the "Internet utility requirements") with respect to application structure and content. In other words, mechanisms that address the Utility Requirements are provided transparently to all applications running on the server; no interface is provided, nor is any necessary, for applications to exploit the mechanisms. The software that encapsulates the Utility Requirements consists of a runtime layer for a cluster of commodity PC's; we will argue that cluster-based servers provide some inherent advantages for robustness, redundancy, and cost-effectiveness, and that the TACC model is a particularly good fit for clusters. As we will show, orthogonalizing the Utility Requirements greatly simplifies the design, implementation, and overall robustness of the server, provided that the consequences on application behavior can be well characterized.
4 Thesis Map

In the remainder of the first part, we motivate infrastructure services in general and the proxy-based interposition model for such services in particular (Chapter 2). We also discuss in detail a particular class of applications, focused on network and client adaptation, that are of immediate practical interest and can be efficiently implemented by infrastructure proxies (Chapter 3).

In the second part, we motivate a composition-based programming model called TACC for structuring general Internet services, including proxy-based adaptation services. Chapter 4 raises trade-offs between the engineering cost of the Utility Requirements and data semantics guarantees for Internet services, arguing that the traditional trade-offs must be re-examined for Internet applications. A main conclusion of that chapter is that a useful design point for many such applications involves transparent provisioning of high availability, scalability, and resilience to network partitions, but places state management concerns directly under the control of application modules. This motivates the composition-based TACC model, which is described in detail in Chapter 5, along with its limitations and the implications for designing applications that do not fit well into the TACC model. In Chapter 6 we describe and analyze a scalable cluster-based TACC server called SNS (Scalable Network Server), whose architecture exploits the TACC trade-offs to achieve high availability and excellent scalability. Chapter 7 then provides case studies of several existing applications deployed on SNS, some with thousands of users.

Finally, the third part discusses the rich body of related work (Chapter 8), extracts lessons from the present work, and proposes some avenues for future research (Chapter 9). In particular, in addition to specific open questions suggested by this research, we will argue that the generalization of the design techniques underpinning SNS should be applicable to the design of complex reliable software systems, Internet or otherwise. We conclude with a summary of the contributions of the thesis in Chapter 10.
Chapter 2
The Case For Infrastructure Proxies

Roughly speaking, we take the network infrastructure to consist not only of bit-carrying hardware such as wires and routers, but also of centrally-managed servers, either at the workgroup/intranet scale or the carrier-class/Internet scale. Under this definition, a departmental NFS server, a central switch or router bank, a large shared Internet cache, and a public Web server all constitute infrastructure elements. An infrastructure service is a (typically substantial) application the bulk of whose functionality resides in the network infrastructure, and which is accessed interactively or non-interactively via some well-defined client interface. Although routing and traffic management are infrastructure services, we are concerned primarily with application-level infrastructure services.

Of particular interest in this thesis are "Internet scale" services designed to run very powerful applications with extremely high (millions of users) workloads; for example, the HotBot search engine serves about 10 million hits per day, and Inktomi Traffic Servers are serving tens of millions of connections per day at major cache sites. This emerging class of applications presents challenges of scale and robustness far beyond the
engineering of typical software systems. In addition to the pure engineering challenges of large scale [104], infrastructure services must be highly available, even in the face of transient hardware and software failures. High availability is even more challenging to achieve at large scales, since transient faults become essentially inevitable as systems grow to encompass large numbers of components. Furthermore, Internet applications must be publicly accessible, which implies more stringent requirements for robustness and capacity management compared to "closed" controlled-access infrastructure services such as private banking networks [65]. Not surprisingly, it is extremely difficult to engineer such services, so their existence requires some justification.

In this chapter we motivate the desire for infrastructure services generally, and also identify a particular class of services of broad and immediate relevance, namely those that provide transparent adaptation to heterogeneous clients and networks through the use of infrastructure proxies. We also evaluate a variety of arguments against proxy-based services.
1 Infrastructure Applications In General Especially given the explosive growth of the Internet (which is expected to continue [95]), a number of compelling motivations for infrastructure services are immediately evident.
Access to data. Especially for mobile professionals, an increasing fraction of the data relevant to them is not static or locally stored on their portable devices, but is inherently in the network; email, news, and the Web are obvious examples from recent history, but even early services such as DNS were manipulating and serving data that is generated by the infrastructure itself. Much of this data, such as stock quotes and time-sensitive email, is not useful if stale. A service running at an infrastructure computing center is likely to have higher-quality, nearly-continuous connectivity to
17 the servers providing these kinds of data, especially compared to the connectivity typically available to mobile or home users.
Economy of scale. Basic queueing theory shows that a large central (virtual) server is more efficient in both cost and utilization (though less predictable in per-request performance) than a collection of smaller servers; standalone desktop systems represent the degenerate case of one "server" per user. To the extent that infrastructure applications do work that can be leveraged by multiple users, this supports one of the arguments for Network Computers [75] and suggests that infrastructure services provide one way to achieve effective economies of scale. (A worked example of the queueing argument follows this list.)
Reduced requirements on clients. Some services are simply too large and complex to host on individual clients. For example, a number of Web sites provide access to \interactive atlases" that can generate detailed driving directions, including graphical maps, for virtually any address in the United States. Even on a powerful desktop workstation, it would be cumbersome and expensive, at best, to store both the (nontrivial) application and all the map data required.
Rapid prototyping during turbulent standardization cycles. Software development on \Internet time" has aggravated the already-chaotic process of distributing software upgrades to millions of clients. Infrastructure services are centrally upgraded and managed, and the recent crop of Web-based services generally requires no client software changes, thanks to the use of the now-ubiquitous Web browser as the universal graphical user interface, and HTTP/HTML as the universal programmatic interface, to a wide range of applications.
Network externality effect due to large user communities. The well-known network externality effect, due to Bob Metcalfe, suggests that the usefulness of a network service grows roughly as the square of the number of connected users. Although
the Web itself supports this claim, a specific example of leveraging this effect is illustrated by Firefly Network [103], in which intelligent software agents cross-correlate the entertainment preferences of large groups of users and recommend video titles based on the patterns of users with similar renting histories. Having a single centralized service serving millions of users, rather than a large number of instances of the service each serving fewer users, allows such collaborative filtering to take place on an unprecedented scale. Electronic commerce providers also leverage this ability to cross-correlate the buying patterns of their customers; for example, the DoubleClick service can track users across multiple sites and use the information to improve advertisement targeting. Which leads to...
Mass customization. Many online services, including the Wall Street Journal, the Los Angeles Times, and C/Net, have deployed "personalized" versions of their service as a way to increase loyalty and the quality of the service. According to a recent Wired poll, customization is one of the few Internet features users seem to want more of. Such "mass customization" requires the ability to track users and keep profile data for each user.
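To make the economy-of-scale argument concrete, here is the textbook M/M/1 pooling calculation it alludes to (a standard result, not taken from the dissertation). If each of $k$ separate servers sees requests at rate $\lambda$ and serves them at rate $\mu$, the mean response time at each is

    \[ T_{\mathrm{separate}} = \frac{1}{\mu - \lambda}. \]

A single pooled server absorbing all the traffic (arrival rate $k\lambda$) but with $k$ times the capacity (service rate $k\mu$) runs at the same utilization $\rho = \lambda/\mu$ yet responds $k$ times faster:

    \[ T_{\mathrm{pooled}} = \frac{1}{k\mu - k\lambda} = \frac{T_{\mathrm{separate}}}{k}. \]

This is the sense in which one large (virtual) server is more efficient than many small ones at equal total capacity and cost.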
2 An Application of Infrastructure Services: Adaptation By Proxy

The current Internet infrastructure includes an extensive range and number of clients and servers. Clients vary along many axes, including screen size, color depth, effective bandwidth, processing power, and ability to handle specific data encodings, e.g., GIF, PostScript, or MPEG. As shown in tables 1 and 2, each type of variation often spans orders of magnitude. High-volume devices such as smart phones [31] and smart two-way pagers will soon constitute an increasing fraction of Internet clients, making the variation even more pronounced.
Platform      SPEC92/Memory   Screen Size   Bits/pixel
High-end PC   200/64M         1280x1024     24
Midrange PC   160/32M         1024x768      16
Typ. Laptop   110/16M         800x600       8
Typical PDA   low/2M          320x200       2

Table 1: Physical variation among clients

Network          Bandwidth (bits/s)   Round-Trip Time
Local Ethernet   10-100 M             0.5-2.0 ms
ISDN             128 K                10-20 ms
Wireline Modem   14.4-56 K            350 ms
Cellular/CDPD    9.6-19.2 K           0.1-0.5 s

Table 2: Typical Network Variation

These conditions make it difficult for servers to provide a level of service that is appropriate for every client. Application-level adaptation is required to provide a meaningful Internet experience across the range of client capabilities. Despite continuing improvements in client computing power and connectivity, we expect the high end to advance roughly in parallel with the low end, effectively maintaining a gap between the two and therefore the need for application-level adaptation.

The remainder of this chapter therefore motivates a specific subset of infrastructure services, namely those that focus on adaptation. We argue for a proxy-based approach to adaptation, in which proxy agents placed between clients and servers perform aggressive computation and storage on behalf of clients. Implementing proxies as infrastructure services allows them to realize the advantages of Section 1: high-quality access to data (particularly important in mobile computing, where connectivity may be intermittent or weak); economy of scale (by allowing many end users and many servers to share the proxy); reduced
requirements on clients (also relevant to mobile computing, where client capabilities may be sacrificed to reduce physical size or power consumption); and rapid prototyping during turbulent standardization cycles (particularly relevant now, as the standards for wireless and mobile computing are in flux).
3 Benefits of Infrastructure Proxies

We believe the proxy approach directly confers three advantages over client-centric and server-centric adaptation approaches: incremental deployment, legacy support, and the ability to do aggressive application partitioning.
3.1 Incremental Deployment

The enormous installed infrastructure, and its attendant base of existing content, is too valuable to waste; yet some clients cannot handle certain data types effectively. A compelling solution to the problem of client and network heterogeneity should allow interoperability with existing servers, thus enabling incremental deployment while evolving content formats and protocols are tuned and standardized for different target platforms. A proxy-based approach lends itself naturally to transparent incremental deployment, since an application-level proxy appears as a server to existing clients and as a client to existing servers. Proxy-based adaptation provides a smooth path for rapid prototyping of new services, formats, and protocols, which can be deployed to servers (or clients) later if the prototypes succeed.
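To make this two-faced transparency concrete, the following is a minimal sketch in modern Python (ours, not a component of the system described in this dissertation): the proxy speaks HTTP as a server toward unmodified clients and as a client toward unmodified origin servers. The port and the adaptation hook are illustrative assumptions.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

class TransparentProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # A browser configured to use a proxy sends the absolute URL.
        # Act as a *client* toward the unmodified origin server:
        with urlopen(self.path) as upstream:
            body = upstream.read()
            ctype = upstream.headers.get("Content-Type",
                                         "application/octet-stream")
        # (Datatype-specific adaptation of `body` would be inserted here.)
        # Act as a *server* toward the unmodified client:
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    ThreadingHTTPServer(("", 8080), TransparentProxy).serve_forever()
```

Neither the client nor the server needs to change for this intermediary to exist, which is precisely what makes incremental deployment possible.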
3.2 Legacy Support

In the client-based approach to adaptation, all clients are expected to provide a least-common-denominator level of functionality (e.g., text-only, HTML-subset compatibility for thin-client Web browsers). The server-based approach, in contrast, attempts to insert
adaptation machinery at each end server to compensate for varying client abilities. Proxies can provide legacy support "at both ends": shielding servers from dealing with legacy clients, and providing interfaces to legacy servers. (In a sense, Web front-ends to SQL databases already perform a variant of the latter function, enabling much more widespread access to those databases.)
3.3 Aggressive Application Partitioning

The partitioning of applications between a small client device and a back-end server is a powerful idea that has been explored in many past projects, in particular as the focus of the Wit [132] project and Xerox PARC's Ubiquitous Computing projects [133]. In those projects, partitioning was used as the mechanism by which slow networks or less-powerful devices were made peers in an existing infrastructure. As we will show in the example applications in Chapter 7, the proxy-based approach is a natural fit for partitioning, and in particular allows us to observe two rules of thumb for application partitioning: client competence (letting the client do only what it does well, moving all other work to the
proxy) and client responsiveness (keeping the user interface responsive by not requiring communication with the server for every UI action).
4 Arguments Against Proxies

In this section we evaluate several arguments against infrastructure services in general, and proxy-based services in particular. In examining the arguments, it is important to remember that proxy is a generic term that has only recently come to have the connotation "application-level proxy," with which we are primarily concerned. Many of the arguments have historically been advanced against lower-level proxy services (and have been justified in such cases), but lose force when applied to application-level proxies.
4.1 End-to-End Arguments

End-to-end arguments [113] have proven to be a valuable design guideline for evaluating the design of systems with multiple layers of functionality. Roughly speaking, the end-to-end argument suggests that it is redundant and often misguided to implement functionality in lower layers that needs to be present in upper layers (the "endpoint" layers) anyway. For example, the engineering cost of completely reliable link-level packet delivery for the Internet is not justified, because the fact that the Internet must interoperate with unreliable networks means that applications will have to perform end-to-end reliability checks anyway. Because proxies are elements of a network infrastructure that naturally divides into layers, they are certainly candidates for evaluation by end-to-end arguments. We argue that application-level proxies (as opposed to network-level or lower) actually reinforce the end-to-end argument rather than violating it. This is because the "ends" in the end-to-end argument refer to the logical placement of functionality in layers, not necessarily to the physical endpoints of the application, although until the widespread deployment of proxies, physical endpoints happened to be the only place where higher-layer behaviors were implemented. (Consider Web servers: intermediate routers and caching proxies all move traffic, but only the client renders content in a datatype-specific way, and only the server generates application-level metadata such as MIME type and expiration date, based on the contents of each object served.)

In Chapter 3 we will motivate specific adaptation techniques that are well implemented by proxies. Because of the datatype-specific nature of these techniques, the end-to-end argument suggests that the functionality be placed at the application level, rather than the network level. The decision to place the functionality at a proxy as opposed to within clients or servers can then be motivated by other observations, but clearly a proxy implementation is consistent with the end-to-end argument.
4.2 Client Evolution

Client devices continue to evolve at an almost alarming rate; today's (1998) palm-sized devices carry computation, memory, and display resources comparable to a desktop PC of 1994, all powered by two small batteries. It is reasonable to wonder whether the continuation of this trend will eventually obviate the need for proxies. What is relevant to consider in answering this objection is the gap between low-end/pocket devices and high-end/desktop devices. If today's pocket devices compare favorably to a 1994 PC, then a 1998 PC would trounce pocket devices in almost every dimension, as Table 3 shows. Sure enough, Web site content has become increasingly sophisticated, featuring animations, downloadable sound clips, applets embedded on pages, and other such bells and whistles that exploit the extra memory, better graphics, and faster CPUs.[1] In such a world, proxies alleviate the management of legacy clients and even allow new features and encodings to be handled in one place rather than many, avoiding the need to redistribute client software to end users or accommodate every new device at every server.
Feature               1994 PC                   1998 PC
Processor             66 MHz Pentium-class      400 MHz Pentium II-class
Memory                8 MB                      64 MB
Display               800x600, 16 bits/pixel    1600x1200, 24 bits/pixel
Permanent store       720 MB                    8 GB
Internet connection   14.4-28.8 Kb/s            14.4-33.6 Kb/s

Table 3: Capabilities of desktop PCs costing about $2,500 in unadjusted dollars, between 1994 and 1998.

[1] According to Survey.Net's second Internet survey, started in March 1997 [129], both the mean and median connection speeds for home users are hovering around 28.8 Kb/s, and among those users who have access from work (81% of the same group of respondents), only 26% report access speeds of 56 Kb/s or greater. Interestingly, as device evolution accelerates, the area of slowest change involves quality of Internet connectivity. Just as desktop software seems capable of absorbing exponentially-increasing hardware capacity, so Internet content and services seem capable of absorbing increasing bandwidth, although, sadly, bandwidth is increasing at a far slower rate. This observation reinforces the argument for proxies as a means of high-level network adaptation, as we discuss in Chapter 3.
4.3 Layered and Prefix Encodings

Prefix encodings, such as Progressive JPEG [80], have the property that any prefix of the entire object is itself a valid object, in particular a reduced-quality version of the original. True prefix encodings are bit-granular; in practice, the more common layered encodings divide the object into a discrete and usually small number of layers, such that
receiving any prefix set of layers is sufficient to render the object at reduced quality. For example, in the Progressive JPEG encoding, later layers contain successively higher-frequency components, so that a blurred image can be rendered quickly after receiving the first few layers and "refined" progressively as more layers arrive. It has been argued that such encodings would obviate the need for Web-acceleration proxies, since users would simply be able to click the "Stop downloading" button once a given object had been rendered at satisfactory quality, essentially trading download time for quality.

However, prefix encodings cannot in general be used on data types in which the whole is not a layered composition of structurally-similar components, because the desired semantic "layering" is not inherent in the object's structure. For example, a reasonable semantic layering for a technical paper might involve producing first a table of contents, then successively more detailed subsection headings, and finally complete text, which could be selectively retrieved for arbitrary noncontiguous sections of the document. To date, no common document encoding embodies this layered structure directly, although encodings such as HTML admit of application-level filtering techniques to "extract" the layers from the source document (a sketch of such a filter appears at the end of this section).

Even when layerings are available, they do not always provide the desired granularity of layering or provide semantic degradation along the desired axis. For example, since Progressive JPEG layers are frequency-based, the first-layer image is necessarily blurry. For images composed largely of line art, such as advertising banners, this layering is not useful; what is really desired is a "layering" that preserves contrast at the expense of some other image feature, such as color depth. The Netscape LOWSRC image tag was introduced as a workaround for this very problem.

Finally, for high-latency connections (including consumer modems and many wireless networks), it is quite possible that by the time the user clicks the "Stop downloading" button, most of the remaining packets may already have been sent from the server. The effect of clicking the Stop button will therefore be to discard the data after significant network resources (in the infrastructure, anyway) have already been committed. Even when this is not the case, the capability for "choreographing" the download of multiple large images is limited because the data is being pushed by the server, not pulled by the client.
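As an illustration of the application-level filtering mentioned above, the following minimal sketch (ours, under our own assumptions, not a component of the systems described in this dissertation) extracts a coarse "table of contents" layer from an HTML document using Python's standard html.parser; successively richer layers would simply admit deeper heading levels.

```python
from html.parser import HTMLParser

class HeadingExtractor(HTMLParser):
    """Extract a 'table of contents' layer: heading text only."""
    def __init__(self, max_level=2):
        super().__init__()
        self.max_level = max_level   # deepest heading level to keep
        self.depth = None            # level of the heading currently open
        self.layer = []              # extracted (level, text) pairs

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            level = int(tag[1])
            if level <= self.max_level:
                self.depth = level

    def handle_endtag(self, tag):
        if self.depth is not None and tag == "h%d" % self.depth:
            self.depth = None

    def handle_data(self, data):
        if self.depth is not None and data.strip():
            self.layer.append((self.depth, data.strip()))

p = HeadingExtractor(max_level=2)
p.feed("<h1>Title</h1><p>body...</p><h2>Intro</h2><p>more body...</p>")
print(p.layer)   # [(1, 'Title'), (2, 'Intro')]
```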
4.4 Latency Trade-Off Will Go Away

For complex formats such as PostScript, analysis and format conversion can take significant time (we present some prototype PostScript-to-HTML timing measurements in Chapter 3). As a result, in some cases no time savings will be realized, since the reduced transmission latency will have been more than compensated for by the analysis and conversion latency. This objection really amounts to an implementation and policy issue, not an argument against proxies in general. We will provide constructive proof in Chapters 3 and 6 that in many cases significant savings can in fact be achieved; devising a policy that makes the trade-off more intelligent is already recognized as an area for further study (see, for example, [93]).
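The break-even condition is easy to state with a back-of-the-envelope model (our illustration, ignoring round-trip effects). Writing $t_d$ for the distillation time, $S_o$ and $S_d$ for the original and distilled sizes, and $B$ for the bottleneck bandwidth, distillation reduces end-to-end latency whenever

\[
t_d + \frac{S_d}{B} \;<\; \frac{S_o}{B}
\qquad\Longleftrightarrow\qquad
t_d \;<\; \frac{S_o - S_d}{B}.
\]

Using the Soda Hall image reported in the next chapter ($S_o$ = 492 KB, $S_d$ = 17 KB, $t_d$ = 6 s) over a 28.8 Kb/s modem, the right-hand side is roughly (475 x 8)/28.8, or about 132 seconds, so 6 seconds of distillation buys roughly a 126-second reduction in end-to-end latency. Heavyweight conversions such as PostScript analysis are precisely the cases where this inequality can fail and a per-request policy is needed.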
4.5 Proxies Induce Pessimal Routing

Under the proxy model, all user requests are routed through the proxy before going to the origin server, and all server replies are routed through the proxy before being returned to the client. This may cause pathological data routing; the extreme case occurs when a client requests content from a server that is in the next room, but the proxy through which
the request is made is far away (either geographically or from a bandwidth/performance perspective). This is similar to the "dogleg routing" or "triangle routing" problem in Mobile IP [82]. The solution, of course, is to deploy proxies at multiple strategically-selected locations; however, the choice of such locations, and the application-level algorithms that would discover and select the appropriate proxy, remain open questions relative to the present work. IBM's commercial WomPlex product [28] performs a similar function for redirecting traffic to mirror servers; first steps in this direction for proxied architectures would likely involve leveraging tools such as SPAND [118], which provides servers with information about the network conditions seen by the requesting client.
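As a deliberately naive illustration of what a first step might look like (our sketch, not part of SPAND or WomPlex; the hostnames are hypothetical), a client could simply pick whichever candidate proxy answers a TCP connection fastest:

```python
import socket
import time

CANDIDATES = [("proxy-west.example.com", 8080),   # hypothetical proxies
              ("proxy-east.example.com", 8080)]

def connect_time(host, port, timeout=2.0):
    """Crude proximity probe: time to open (and close) a TCP connection."""
    start = time.monotonic()
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return time.monotonic() - start
    except OSError:
        return float("inf")   # unreachable proxies sort last

def pick_proxy(candidates):
    return min(candidates, key=lambda hp: connect_time(*hp))

print(pick_proxy(CANDIDATES))
```

A real deployment would amortize such measurements across users, as SPAND does, rather than probing per client.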
4.6 End-to-End Security

End-to-end security (such as secure HTTP access to e-commerce sites or sites that serve sensitive information) is an example of a requirement that makes an application awkward to proxy. In particular, if the benefits of data transformation are to be exploited, the proxy must in general be trusted to examine plaintext (although it can re-encrypt the transformed plaintext before forwarding it to the client). This scenario is incompatible with today's end-to-end security models. One could locate the proxy inside the server's trust perimeter, but that is tantamount to making the proxy part of the server. Perhaps a more likely scenario is that trusted third-party proxy operators will emerge, in whose financial interest it is to maintain high standards of integrity. Even under such conditions, many users would be unwilling to trust any intermediary with sensitive plaintext; we conclude that for such sensitive applications, non-proxied solutions will have to be found. As we report in Chapter 7, it is fortunate that a wide range of useful applications do not have this property.
5 Summary

In this chapter we outlined the general advantages of infrastructure services. Proxy-based adaptation applications, especially those focusing on mobile or thin clients, are natural candidates for implementation as infrastructure services: their design requirements (high-quality connectivity, reduction of client-side requirements, and shielding of clients and servers from fluctuating standards and legacy support) are exactly fulfilled by infrastructure services. In the next chapter we identify some specific principles for the design of proxy-based adaptation services. We will show that the engineering requirements suggested by these design principles strengthen the case for implementing proxies as infrastructure services; the middle four chapters will then focus on solving the engineering challenges posed by such services, including measurements of a prototype infrastructure application server and examples of several deployed adaptation-centric applications.
Chapter 3
Adaptation-Centric Internet Services

In the previous chapter we motivated the proxy-based approach to client and network adaptation. The tables comparing current Internet clients and networks revealed order-of-magnitude differences in client capabilities and network connectivity. In this chapter we describe specific design principles for addressing this variation, and show that these principles exploit application-level knowledge to enable aggressive adaptation. This in turn suggests that application-specific logic is needed to apply the design principles across a range of applications; "generic" network-level approaches such as simple lossless compression are simply not sufficient. We thereby motivate the need for an infrastructure application server capable of handling the nontrivial application-level adaptation tasks outlined in this chapter.
1 Datatype-Specific Distillation and Refinement

We propose three design principles that we believe are fundamental for addressing client variation most effectively.
1. Adapt to client and network variation via datatype-specific lossy compression. Datatype-specific lossy compression mechanisms can achieve much better compression than "generic" compressors, because they can make intelligent decisions about what information to throw away based on the semantic type of the data. For example, lossy compression of an image requires discarding color information, high-frequency components, or pixel resolution. Lossy compression of video can additionally include frame rate reduction. Less obviously, lossy compression of formatted text requires discarding some formatting information but preserving the actual prose. In all cases, the goal is to preserve information that has the highest semantic value. We refer to this process generically as distillation. A distilled object allows the user to decide whether it is worth asking for a refinement: for instance, zooming in on a section of a graphic or video frame, or rendering a particular page containing PostScript text and figures without having to render the preceding pages. (A code sketch following Table 1 below illustrates distillation and refinement for images.)

2. Perform adaptation on the fly. To reap the maximum benefit from distillation and refinement, a distilled representation must target specific attributes of the client. The measurements we report in Section 2 show that even on today's hardware, the distillation time for typical images and rich-text is small in practice, so that end-to-end latency is reduced because of the much smaller number of bytes transmitted over low-bandwidth links. On-demand distillation provides an easy path for incorporating support for new clients, and also allows distillation aggressiveness to dynamically track (e.g.) significant changes in network bandwidth, as might occur in vertical handoffs between different wireless networks [128]. We have successfully implemented useful distillation "workers" that serve clients spanning an order of magnitude in each area of variation.

3. Move complexity away from both clients and servers. Application partitioning arguments have long been used to keep clients simple [132]. However, adaptation through a shared infrastructural proxy enables incremental deployment and legacy client support, as we argued in Chapter 2. Therefore, on-demand distillation and refinement should be done at an intermediate proxy that has access to substantial computing resources and is well-connected to the rest of the Internet.

Table 1 lists the "axes" of compression corresponding to three important datatypes: formatted text, images, and video streams. We have found that order-of-magnitude size reductions are often possible without destroying the semantic content of an object (e.g., without rendering an image unrecognizable to the user).
Semantic Type   Specific encodings             Distillation axes
Image           GIF, JPEG, PPM, PostScript     Resolution, color depth, color palette
Text            Plain, HTML, PostScript, PDF   Richness (heavily formatted vs. simple markup vs. plaintext)
Video           NV, H.261, VQ, MPEG            Resolution, frame rate, color depth, progression limit (for progressive encodings)

Table 1: Three important types and the distillation axes corresponding to each.
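As promised in design principle 1, the following minimal sketch illustrates distillation along two of the image axes of Table 1 (pixel resolution and color palette) together with refinement of a subregion. It is our illustration using the Pillow imaging library, not the gifmunch distiller evaluated below; the filenames and crop coordinates are hypothetical.

```python
from PIL import Image  # Pillow imaging library (assumed available)

def distill_image(path, out_path, scale=0.25, colors=16):
    """Distill along two axes of Table 1: resolution and color palette."""
    img = Image.open(path)
    w, h = img.size
    # Axis 1: resolution -- shrink both dimensions.
    img = img.resize((max(1, int(w * scale)), max(1, int(h * scale))))
    # Axis 2: color palette -- collapse to a small set of grays.
    img = img.convert("L").convert("RGB").quantize(colors=colors)
    img.save(out_path)

def refine_region(path, out_path, box):
    """Refinement: re-extract one subregion (left, upper, right, lower)
    of the *original* at full resolution, e.g. to read text on a sign."""
    Image.open(path).crop(box).save(out_path)

# Hypothetical usage, mirroring the Figure 1 example below:
distill_image("soda-hall.gif", "soda-distilled.gif")   # small 16-gray overview
refine_region("soda-hall.gif", "soda-sign.gif", (300, 200, 620, 420))
```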
2 Performance of Distillation and Refinement On Demand

We now describe and evaluate datatype-specific distillers for images and rich-text.[1] The goal of this section is to support our claim that in the majority of cases, end-to-end latency is reduced by distillation; that is, the time to produce a useful distilled object on today's workstation hardware is small enough to be more than compensated for by the savings in transmission time for the distilled object relative to the original.

[1] A distiller for real-time network video streams is described separately, in [2].
2.1 Images

We implemented an image distiller called gifmunch, which performs distillation and refinement for GIF [64] images, and consists largely of source code from the NetPBM Toolkit [111]. Figure 1 shows the result of running gifmunch on a large color GIF image of the Berkeley Computer Science Division's home building, Soda Hall. The image of Figure 1a measures 320x200 pixels (about 1/8 the total area of the original 880x610) and uses 16 grays, making it suitable for display on a typical handheld device.
Figure 1: Distillation example. (a) is a distilled image of Soda Hall; (b) illustrates refinement. (a) occupies 17 KB at 320x200 pixels in 16 grays, compared with the 492 KB, 880x600 pixel, 249-color original (not shown). The refinement (b) occupies 12 KB. Distillation took 6 seconds on a SPARCstation 20/71, and refinement took less than a second.

Due to the degradation of quality, the writing on the building is unreadable, but the user can request a refinement of the subregion containing the writing, which can then be viewed at full resolution. Table 2 shows the latencies of distilling a number of GIF images with three different sets of distillation parameters, and the resulting size reductions. The measurements were taken on a lightly loaded SPARCstation 20/71 running Solaris 2.4. The three sets of
Original GIF size (KB): 48, 153, 329, 492
Reduce to 8KB