Consistency and Locking for Distributing Updates to Web Servers Using a File System

Randal C. Burns and Robert M. Rees
Department of Computer Science
IBM Almaden Research Center
[email protected]

Darrell D. E. Long†
Department of Computer Science
University of California, Santa Cruz
[email protected]

† The work of this author was performed while a Visiting Scientist at the IBM Almaden Research Center.

Abstract

Distributed file systems are often used to replicate a Web site's content among its many servers. However, for content that needs to be dynamically updated and distributed to many servers, file system locking protocols heavily utilize the network between the Web servers, which results in high latency when accessing content. Poor performance arises because the Web-serving workload differs radically from the workload that file systems assume. To address the shortcomings of file systems for distributing Web content, we introduce the publish consistency model, which is well suited to the Web-serving workload. We implement this consistency model in the producer-consumer locking protocol and compare this protocol against other file system protocols by simulation. Simulation results show that, for the Web-serving workload, producer-consumer locking removes almost all latency due to protocol overhead and significantly reduces network load.

1 Introduction

Distributed file systems have become a key technology for sharing content among many Web servers. They can be used to provide scalability and availability [13]. In this work we are particularly interested in large-scale systems that serve dynamically changing Web content with a distributed file system: files that contain HTML and XML that are concurrently updated (written) and served to Web clients (read). A good example of such a system is IBM's Olympics Web site [27], which uses the DFS file system [19] to replicate changing data, such as race results, at a global scale. The distributed file system takes responsibility for providing synchronized access and consistent views of shared data, shielding Web servers from complexity by moving these tasks into the file system.



The file system's responsibilities include data consistency and cache coherency. In a file system, the data consistency guarantee describes the interaction between distributed clients that concurrently read and write the same file. Most often, file systems implement sequential consistency [21], where all processes see changes to data as if they were sharing a single memory and cache. While this model is correct for most applications, Web servers do not require sequential consistency because the data that they serve through the HTTP protocol require no instantaneous consistency guarantees. Web servers benefit from weakening the consistency guarantee in order to achieve better performance. Parallelism in the workload leads to performance problems when using an unmodified distributed file system to share dynamically updated data among Web servers. File systems assume that file data has locality to clients: a given file has affinity to a few clients at any one time. However, when serving Web content, if we assume that load is balanced uniformly among all Web servers, all Web servers have equal interest in all the files concurrently. For content that changes frequently, file system consistency protocols incur significant network utilization and exhibit high latency when replicating these changes. Performance concerns aside, sequential consistency is the wrong model for updating Web content because it can result in errors when Web clients parse the HTML or XML content the files contain. When a reader (Web server) and a writer (content publisher) share a sequentially consistent file, the reader sees each and every change to file data. Consequently, a Web server can see files that are in the process of being modified or rewritten. Depending upon the update model, such files are either incomplete or contain data from both the new and old version and cannot generally be parsed. The Web server distributes these files out to its Web clients, where parsing errors occur. This happens whenever a Web server distributes an incompletely written file. However, before the writer begins and after the writer finishes, the file contains valid content. A more suitable update model allows the Web server to continue serving the old version of the file until the writer is finished; then the file system atomically updates the readers' view to the new version. We call this publish consistency. We present a more formal definition of publish consistency based on the concepts of sessions and views. In a file system, the open and close function calls define a session against a file. Sessions are divided into read sessions that do not modify file data and write sessions that do. Associated with each session is that session's image of the file data, which we call a view. Data is publish consistent if (1) all writes occur in the context of serialized write sessions, (2) write views are sequentially consistent, (3) a reader's view does not change during a session, (4) a reader creates a session view consistent with the close of the most recent write session of which it is aware, and (5) readers eventually become aware of all write sessions.

When opening a file, a reader obtains a recent image of the file data and uses it for a whole session. Modifications to files are not propagated to all readers instantaneously. However, all clients learn about all modifications eventually. Publish consistency not only suits the semantics of serving Web data; the loose synchronization between readers and writers also allows us to implement cache-coherency data locks that improve performance. We took the Web-serving workload into account when building a publish consistency locking protocol and devised producer-consumer locks. These locks efficiently replicate changes from a computer holding a producer lock (for writing) to the many computers holding consumer locks (for reading). Producer-consumer locks eliminate almost all protocol latency because each Web server that holds a consumer lock always has a recent view that it can serve directly out of its cache. Producer-consumer locks also reduce network utilization by reducing the number of messages sent between a client and server when updating files to reflect recent changes.
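To make the session-and-view definition concrete, the sketch below (Python, with hypothetical class and method names such as PublishConsistentFile and open_read) models a file whose read sessions pin the view that was current at open, while a write session builds a new image that becomes visible only when the session closes. It illustrates the five conditions above; it is not the Storage Tank implementation.

    import itertools

    class PublishConsistentFile:
        """Toy model of publish consistency: read sessions pin a view;
        a write session installs a new view atomically on close."""

        def __init__(self, data=b""):
            self._versions = {0: data}      # version id -> file image
            self._current = 0               # most recently published version
            self._writer_open = False
            self._ids = itertools.count(1)

        def open_read(self):
            # (3)(4): a reader's view is fixed at the most recent published version.
            return ReadSession(self, self._current)

        def open_write(self):
            # (1): write sessions are serialized; only one writer at a time.
            if self._writer_open:
                raise RuntimeError("write sessions must be serialized")
            self._writer_open = True
            # (2): the writer starts from the latest published image.
            return WriteSession(self, bytearray(self._versions[self._current]))

        def _publish(self, image):
            # (5): on close, the new view is what future readers will see.
            vid = next(self._ids)
            self._versions[vid] = bytes(image)
            self._current = vid
            self._writer_open = False

    class ReadSession:
        def __init__(self, f, vid):
            self._image = f._versions[vid]  # view never changes during the session
        def read(self):
            return self._image

    class WriteSession:
        def __init__(self, f, image):
            self._f, self._image = f, image
        def write(self, offset, data):
            self._image[offset:offset + len(data)] = data
        def close(self):
            self._f._publish(self._image)

    f = PublishConsistentFile(b"old results")
    r = f.open_read()
    w = f.open_write()
    w.write(0, b"new results!")
    assert r.read() == b"old results"                  # open reader still sees the old view
    w.close()
    assert f.open_read().read() == b"new results!"     # new sessions see the published view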

1.1 A Direct Access File System

A brief digression into the file system architecture in which we implement locking helps to motivate our solution and its advantages. In the Storage Tank project at IBM Research, we are building a distributed file system on a storage area network (SAN). A SAN is a high-speed network designed to allow multiple computers to have shared access to many storage devices. The goal of Storage Tank is to take advantage of SAN hardware to provide computers with the scalability and management benefits of a distributed file system without the performance penalty expected from a client/server file system. For our distributed file system on a SAN, clients access data directly over the storage area network. Most traditional client/server file systems [28, 16, 19, 7] store data on the server's private disks. Clients dispatch all data requests to a server that performs I/O on their behalf. Unlike traditional file systems, Storage Tank clients perform I/O directly to shared storage devices on a SAN (Figure 1). This direct data access model is similar to the file system for Network-Attached Secure Disks [10], which uses shared disks on an IP network, and the Global File System [25] for SAN-attached storage devices. Clients communicate with Storage Tank servers over a logically separate network, called a control network, to obtain file metadata. In addition to serving file system metadata, the servers run distributed protocols for cache coherency, authentication, and the allocation of file data. In our architecture, metadata and data are accessed and stored separately [23, 5, 14]. Metadata, including the location of the blocks of each file on shared storage, are kept on high-performance storage at the server.

Figure 1: Schematic of the Storage Tank client/server distributed file system. Clients communicate with a server cluster over a control network and access shared disks and tape directly over the storage area network (SAN).

The shared disks contain only the blocks of data for files, and do not contain metadata. In this way, the shared devices on the SAN can be optimized for data traffic (block transfer of data), and the server's private storage can be optimized using database techniques to provide high transaction rates for a metadata workload of frequent small reads and writes. We will show that the architecture of our SAN file system and producer-consumer locks have synergy. In particular, by separating metadata traffic from data traffic, we efficiently distribute new versions of files by sending only metadata about the new version to the client rather than transmitting a whole new version of the file. Clients obtain file data only when they require it to service HTTP requests, and then only need to acquire the changed portions, rather than re-reading the whole file. This differs from non-SAN producer-consumer implementations in which the whole contents of the new file must be pushed to the consumers. Our experiments present comparisons of producer-consumer locking and other locking systems for both SAN and traditional distributed file systems.
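As an illustration of why a metadata-only publish suffices for selective cache refresh, the Python sketch below compares the old and new block maps and invalidates only blocks whose physical location changed; because modified blocks are written out of place, an unchanged location implies unchanged data. The FileVersion and ClientCache names and the map layout are assumptions for illustration, not the Storage Tank metadata format.

    from dataclasses import dataclass
    from typing import Dict

    @dataclass
    class FileVersion:
        version: int
        block_map: Dict[int, int]   # logical block number -> physical block address

    class ClientCache:
        """Per-file client cache keyed by logical block number."""
        def __init__(self, version: FileVersion):
            self.version = version
            self.blocks: Dict[int, bytes] = {}       # cached block contents

        def apply_publish(self, new: FileVersion):
            """Handle a publish message that carries only the new block map."""
            old_map = self.version.block_map
            for lbn, paddr in new.block_map.items():
                # Out-of-place writes mean a changed address implies changed data.
                if old_map.get(lbn) != paddr:
                    self.blocks.pop(lbn, None)        # invalidate only changed blocks
            # Blocks that disappeared from the file are invalid too.
            for lbn in set(old_map) - set(new.block_map):
                self.blocks.pop(lbn, None)
            self.version = new                        # later reads fetch from the SAN on demand

    old = FileVersion(1, {0: 100, 1: 101, 2: 102})
    new = FileVersion(2, {0: 100, 1: 205, 2: 102})    # only block 1 was rewritten
    cache = ClientCache(old)
    cache.blocks = {0: b"a", 1: b"b", 2: b"c"}
    cache.apply_publish(new)
    assert 1 not in cache.blocks and 0 in cache.blocks and 2 in cache.blocks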

2 Locking for Wide Area Replication

The limitation of using a file system for the wide-area replication of Web data is performance. Generally, file systems implement data locks that provide sequential consistency between readers and writers. For replicating data from one writer to multiple readers, sequential consistency produces lock contention, resulting in a flurry of network messages to obtain and revoke locks. Contention loads the network and results in slow application progress. We present an example Web server application to illustrate lock contention, propose a new set of locks that address this problem, and show how our SAN file system architecture takes unique advantage of these locks.

Figure 2: Example configuration for replicating data among many Web servers. A DBMS receives posts over SQL/ODBC and writes new content through its file system client; Web servers answer HTTP requests by reading through their file system clients and caches, which the file system server's locking protocol keeps consistent.

2.1 Sequential Consistency

We again use posting race results on an Olympics Web site [27] as an example application to show how sequential consistency locking works for replicating Web servers. A possible configuration of the distributed system (Figure 2) has hundreds or thousands of Web servers integrated with a DBMS. Race results and other frequently changing data are inserted into the database through an interface like SQL or ODBC. The insertion updates database tables and sets off a database trigger. The trigger causes the database to author new content in a file. The file system is responsible for distributing the new version of the content to the remainder of the servers in a consistent fashion. Note that I/O is performed out of the file system caches that the locking protocol keeps consistent. Servers take HTTP requests that result in data reads from the file system. Poor performance occurs for a sequential consistency locking protocol when updates occur to active files. For this example, we assume that the file is being read concurrently by multiple Web servers before, during, and after new results are posted and written. For race results during the Olympics, a good example would be a page that contains a summary of how many gold, silver, and bronze medals have been won by each country. This page has wide interest and changes often. We use sequential consistency locks on whole files for this example, the same locks used by the DFS [19] and Calypso [7] file systems.


The system's initial configuration has the clients at all Web servers holding a shared lock (S) for the file in question. Figure 3(a) displays, in a timing diagram, the locking messages required to update file data. The best possible performance of sequential consistency locking occurs when the file system client at the database is not interrupted while it updates the content. In this case, the writing client requests an exclusive lock (X) on the file. The exclusive lock revokes all concurrently held read locks. When the writer completes, the client at each Web server must request a shared lock on the file to reobtain the privilege to read and serve the Web content. All messages to and from the Web server occur for every server. In the best case, four messages (revoke, release, acquire, and grant) go between each Web server and the file system server. For large installations, multiple messages between Web servers and the file server consume time and network bandwidth prohibitively. The situation becomes much worse when the file system at the DBMS is interrupted while updating the file. In file system protocols, data locks are preemptible so that the system is responsive when multiple clients share file data. In our example, clients that request read locks on the file data revoke the DBMS's exclusive lock. The file system at the DBMS continues to request a write lock until it completes writing the file. However, reobtaining the lock may take some time, and the file system may be interrupted multiple times while attempting to write a file. The more highly replicated the file data are, and the more Web servers there are, the more likely the update will be interrupted. Contention for the lock can stall the update indefinitely. The example application presents a non-traditional workload to the file system. The workload lacks the properties a file system expects and therefore operates inefficiently. Specifically, the workload does not have client locality [18], the affinity of a file to a single client. Instead, all clients are interested in all files. Performance concerns aside, sequential consistency is the wrong model for updating HTML and XML. As we discussed in Section 1, reading clients cannot understand intermediate updates and byte-level changes, so sequential consistency leaves content invalid when Web servers see modifications to files at a fine granularity. The problem is that sequential consistency has no way to synchronize the file view that the Web server sees with application state. Publish consistency addresses this problem by allowing writers to create a session against a file and not distribute updated views until the session ends. The intermediate update problem aside, sequential consistency is still not useful for serving Web content. Since the Internet client/server protocol (HTTP) does not implement consistency between the browser cache and the server, a sequential consistency guarantee at the server cannot be passed on to the ultimate consumers of the data at their browsers. Therefore, a weaker consistency model appears identical to Web users and can be more efficiently implemented.

Figure 3: Locking protocols for distributing data to Web servers: (a) sequential consistency; (b) AFS consistency.

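A minimal sketch of the whole-file shared/exclusive locking pattern of Figure 3(a), written in Python with hypothetical names, shows where the per-server message cost comes from; the message accounting is an approximation of the protocol described above, not the DFS or Calypso implementation.

    class SeqConsistencyLockServer:
        """Toy whole-file S/X lock manager in the style of Figure 3(a).
        Counts protocol messages; client identifiers are arbitrary."""

        def __init__(self):
            self.shared = set()      # clients holding a shared (S) lock
            self.exclusive = None    # client holding the exclusive (X) lock, if any
            self.messages = 0

        def acquire_shared(self, client):
            self.messages += 1                    # acquire(S)
            if self.exclusive is not None:
                self.messages += 2                # revoke + release(data) to the writer
                self.exclusive = None
            self.shared.add(client)
            self.messages += 1                    # grant(S, data)

        def acquire_exclusive(self, writer):
            self.messages += 1                    # acquire(X)
            for _ in list(self.shared):           # synchronously revoke every reader
                self.messages += 2                # revoke + release
            self.shared.clear()
            self.exclusive = writer
            self.messages += 1                    # grant(X, data)

    # One update cycle with N Web servers: the writer takes X, every server re-acquires S.
    server, N = SeqConsistencyLockServer(), 100
    for c in range(N):
        server.acquire_shared(c)
    steady_state = server.messages
    server.acquire_exclusive("dbms")
    for c in range(N):
        server.acquire_shared(c)
    print(server.messages - steady_state)   # roughly 4 messages per Web server, as in the text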

2.2 AFS Consistency

For existing distributed systems, the Andrew file system (AFS) comes closest to publish consistency. AFS does not implement sequential consistency; rather, it synchronizes file data between readers and writers when files opened for write are closed. Previous research [18] argues that data are shared infrequently to justify this decision. DFS [19], the successor product to AFS, reversed this decision. DFS found that distributed applications sometimes require sequential consistency for correctness. While sequential consistency is now the de facto standard for file systems, AFS consistency remains useful for environments where concurrent action and low overhead override the need for sequential consistency. Figure 3(b) shows the protocol used in AFS to handle the dynamic updates in our example. At the start of our timing diagram, all Web servers hold the file in question open for read. In AFS, an open instance registers a client for call-backs: messages from the server invalidating their cached version. The DBMS opens the file for writing, writes the data to its cache, and closes the file. On close, the DBMS writes its cached copy of the file back to the server. The file server notifies the Web servers of the changes by sending invalidation messages to all file system clients that have registered for a call-back.


When compared with the protocol for sequential consistency locking, AFS saves only one message between client and server. However, this belies the performance difference. In AFS, all protocol operations are asynchronous; i.e., operations between one client and a server never wait for actions on another client. Using the AFS protocol, the DBMS obtains a write instance from the server directly and does not need to wait for a synchronous revocation call to all clients. Another significant advantage of AFS is that the old version of the file is continuously available at the Web servers while it is being updated at the DBMS. The disadvantage of AFS is that it does not correctly implement publish consistency. The actual policy implemented at the client is to forward changes to a file to reading clients whenever a writing client submits modifications to the file system server. In general, AFS clients write modified data to the server when closing a file, which results in publish consistency. However, sometimes AFS writes data to a server before the file has been closed. This occurs in two ways. First, when a file has been open for more than 30 seconds, a timer writes modified file data to a server. Second, if a client's cache becomes full, it may send modified file data to a server to make free space in its cache. In either case, reading clients can see partial updates, which is a violation of publish consistency.
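The sketch below (Python, hypothetical names) captures the call-back behavior described above and shows how an early write-back, whether from the 30-second timer or from cache pressure, can expose a partially written file to readers; it is a toy model, not AFS code.

    class AfsLikeServer:
        """Toy call-back model: clients register on open; any store from the
        writer breaks call-backs, even if the writer has not closed the file."""

        def __init__(self, data=b""):
            self.data = data
            self.callbacks = set()            # clients with a cached copy

        def register_callback(self, client):
            self.callbacks.add(client)

        def store(self, data):
            # Called on close, but also by a 30-second timer or under cache
            # pressure at the writing client, before the write session ends.
            self.data = data
            for c in self.callbacks:
                c.invalidate()                # call-back break
            self.callbacks.clear()

    class Reader:
        def __init__(self, server):
            self.server = server
            self.cached = None
        def read(self):
            if self.cached is None:
                self.cached = self.server.data
                self.server.register_callback(self)
            return self.cached
        def invalidate(self):
            self.cached = None

    server = AfsLikeServer(b"<results>old</results>")
    r = Reader(server)
    r.read()
    server.store(b"<results>ne")     # early write-back of a half-written file
    print(r.read())                  # the reader now fetches and serves a partial update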

2.3 Publish Consistency

We offer a locking option that improves performance when compared with AFS and sequential consistency, implements publish consistency correctly, and takes advantage of the SAN architecture. Our locking system reduces the protocol latency that a Web server sees when accessing data to nearly nothing. There is a one-time cost to access data initially, but all subsequent reads are immediate. Furthermore, for the Web-serving workload, our protocol lessens the network utilization by reducing the number of messages needed to keep a file consistent among many Web servers. We capture publish consistency in two data locks: a producer lock (P) and a consumer lock (C). Any client can obtain a consumer lock at any time, and a producer lock is held by a single writer. Clients holding a consumer lock can read data and cache data for read. Clients holding a producer lock can write data and allocate and cache data for writing, with the caveat that all modifications use copy-on-write semantics to retain a read-only version available to readers and an active version used by the writer. (Copy-on-write places new modifications to a block of a file in a new storage location, leaving the existing blocks in their current location. Subsequent modifications to the same block are written to the existing storage location. File systems can use copy-on-write for efficient write performance and as part of a point-in-time snapshot feature [15].)

Figure 4: Producer-consumer locking protocol for distributing data to Web servers: (a) with a SAN; (b) without a SAN.

Because clients hold P and C locks concurrently, the protocol must ensure that the P lock does not overwrite any of the old versions of the file. Copy-on-write permits the client holding the P lock to proceed with writing while the remainder of the system reads the old version. File publication occurs when the P lock holder releases its lock. Upon release, the server notifies all clients of the new location of the file. Recall that clients read data directly from the SAN; servers do not transmit data to the clients. Having told all clients about the new version of the file, the server reclaims the blocks that belong to the old version that are not in the new version. Clients that receive a publication invalidate the old version of the file and, when interested, read the new version from the SAN. The client need not invalidate the whole new file. Instead, it only needs to invalidate the portions of the file that have changed. Because changed blocks have been written out-of-place, the client knows that any portions of the file for which the location of the physical storage has not changed have not been written and that the client's cached copy is valid.

Looking at our example using C and P locks (Figure 4(a)), we see that the protocol uses fewer messages between the writing client and server than both AFS and DFS, and, like AFS, all operations are asynchronous. The DBMS requests and obtains a producer (P) lock from the server. The lock is granted immediately. While the DBMS holds the P lock, Web servers continue to read and serve the current version of the file. When the DBMS completes its updates, it closes the file and releases the P lock. The writer must release the P lock on close to publish the file. The server sends updates that describe the location on the SAN of the changed data to all clients that hold consumer locks. The clients immediately invalidate the affected blocks in their cache. At some later time, when clients generate interest in the changed data, they read the data from the SAN.

The producer-consumer model has performance advantages when compared to sequential consistency and AFS locking, and it also correctly implements publish consistency. Producer-consumer locking saves a message when distributing updates to readers. In producer-consumer locking, changes are pushed to clients in a single update message. In contrast, AFS and sequential consistency locking require data to be revoked and then later obtained. With C and P locks, the Web servers are not updated until the P lock holder releases the lock. This means that the P lock holder can write data to the SAN and even update the location metadata at the server without publishing the data to reading clients. C and P locks dissociate updating readers' cache contents from writing data and therefore correctly implement publish consistency.

Producer-consumer locking also leverages the SAN environment to improve performance. First, network operations are distributed across two networks: an IP network for protocol messages and a SAN for data traffic. This mitigates the network bottleneck that arises when servers transmit data over the same network on which they conduct the locking protocol. The two networks also allow clients to update data in their cache asynchronously and at a fine granularity. Unlike AFS, the whole file does not need to be replaced at once. Instead, because C lock updates invalidate cached data at a fine granularity, changes to data are read on a block-by-block basis from storage. This reduces the total amount of data involved in publishing the new file version and spreads the acquisition of that data over more time. For our SAN file system, this design makes the hidden assumption that storage devices cache data for read. The Web servers read data directly from storage, and they all read the same blocks. A storage controller services reads out of its cache at memory speed, rather than at disk speed. Without a cache, all reads would go to disk, and any communication savings of producer-consumer locking could potentially be lost by repeatedly reading the same data with slow I/O operations. We assert that this assumption is reasonable because SAN storage requires a controller card for the SAN network interface that generally implements reliability (RAID [8], striping [5], and mirroring) and caching.

Producer-consumer locking is not restricted to SANs, and analogous protocols are possible for traditional file systems (Figure 4(b)). We will show that even for traditional architectures producer-consumer locking reduces network utilization and has near-zero read latency. However, producer-consumer locking on traditional file systems exacerbates the one performance problem with this system, a phenomenon we call overpublishing. Overpublishing occurs when the server publishes versions that clients do not read.

The publish message has network costs that are never recovered by avoiding future lock requests. Publishing assumes that clients read each version of the file. If this assumption holds, the cost of publishing saves network bandwidth when compared with the client later requesting the data from the server. On a SAN, overpublishing only sends location information, a small amount of data. The client reads the published file data only when requested. However, on a traditional file system with a write-dominated workload, the whole file contents are transmitted, and the network load from overpublishing could potentially become a performance problem. The overpublishing phenomenon is not of concern for Web serving in general, and in particular for Web serving with Storage Tank. Producer-consumer locking is designed to address a specific, yet important, workload. The Web-serving workload consists mostly of reads on many Web servers: writes are less frequent and write-sharing non-existent. The frequency of reads prevents this workload from exhibiting overpublishing. Furthermore, producer-consumer locking is well suited to the SAN environment, where updates to files consist only of metadata. A SAN-based file system mitigates any negative effect of overpublishing. As an aside, in Storage Tank we provide other file data locking options to meet the needs of other workloads. In particular, we offer two classes of sequential consistency locking for the standard file-system workload and database workload respectively [4].
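A compact sketch of the producer-consumer protocol described above, in Python with hypothetical names, is shown below: consumer locks are granted freely, a single producer writes out of place against a private copy of the block map, and releasing the producer lock publishes only the new block map to every consumer. It is a model of the message pattern in Figure 4(a), not the Storage Tank server.

    class ProducerConsumerLockServer:
        """Minimal sketch of producer-consumer locking: consumer (C) locks are
        granted freely, a single producer (P) lock writes out of place, and the
        release of the P lock publishes the new block map to all C holders."""

        def __init__(self, block_map):
            self.block_map = dict(block_map)   # published version: lbn -> physical address
            self.consumers = set()
            self.producer = None

        def acquire_consumer(self, client):
            self.consumers.add(client)
            return dict(self.block_map)        # the client may now read and cache

        def acquire_producer(self, writer):
            assert self.producer is None, "only one producer at a time"
            self.producer = writer
            # The writer builds a new map with copy-on-write; readers keep the old one.
            return dict(self.block_map)

        def release_producer(self, new_block_map):
            self.block_map = dict(new_block_map)
            self.producer = None
            for c in self.consumers:               # publish: one message per consumer,
                c.publish(dict(self.block_map))    # carrying only metadata on a SAN

    class Consumer:
        def __init__(self, server):
            self.block_map = server.acquire_consumer(self)
        def publish(self, new_map):
            self.block_map = new_map               # a real client invalidates only changed blocks

    server = ProducerConsumerLockServer({0: 100, 1: 101})
    readers = [Consumer(server) for _ in range(3)]
    draft = server.acquire_producer("dbms")
    draft[1] = 205                                 # copy-on-write: block 1 goes to a new location
    server.release_producer(draft)
    assert all(r.block_map[1] == 205 for r in readers)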

2.4 Implementation Comments

Producer-consumer locking requires support from both the file system client and server to implement views and guarantee session semantics. In addition to copy-on-write, the file system client must permit multiple open views of the same file concurrently. This need arises when a client initiates overlapping sessions before and after it receives a publish message. The file system server must maintain metadata and allocated data on an old version of a file as long as active sessions on that version persist. The server can garbage collect metadata and deallocate data only after all sessions close. AFS and sequential consistency maintain an instantaneous concept of the contents of a file and do not require these features.
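One way a server might track version lifetimes is sketched below (Python, hypothetical names): each published version carries a count of open sessions, and a superseded version's metadata and blocks are reclaimed only when its last session closes. This is an assumption about one possible bookkeeping scheme, not a description of the Storage Tank server.

    class VersionStore:
        """Sketch of server-side version lifetime management: an old version's
        metadata and blocks can be reclaimed only after its last open session
        closes and a newer version has been published."""

        def __init__(self):
            self.sessions = {0: 0}    # version id -> count of open read sessions
            self.current = 0

        def publish(self):
            self.current += 1
            self.sessions[self.current] = 0
            self._collect()

        def open_session(self):
            vid = self.current        # new sessions always see the latest publish
            self.sessions[vid] += 1
            return vid

        def close_session(self, vid):
            self.sessions[vid] -= 1
            self._collect()

        def _collect(self):
            # Garbage collect superseded versions with no active sessions.
            for vid in [v for v in self.sessions if v != self.current and self.sessions[v] == 0]:
                del self.sessions[vid]   # a real server would also free the version's blocks

    store = VersionStore()
    s = store.open_session()      # a long-running read session on version 0
    store.publish()               # version 1 published; version 0 must survive
    assert 0 in store.sessions
    store.close_session(s)        # last session on version 0 closes
    assert 0 not in store.sessions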

3 Simulation Results

We constructed a discrete event simulation of the presented locking protocols in order to verify our design of producer-consumer locking and understand the performance of the various protocols for a Web-serving workload. The goal of this simulation is to compare the network usage and latency of sequential consistency, AFS consistency, and producer-consumer locking.


In an attempt to get the fairest comparison possible, we assumed that all protocols run on the same hardware. We present results both for file systems running on traditional hardware without a SAN and for the locking protocols running on a SAN file system architecture. For simulations on SAN hardware, we had to reformulate the AFS and sequential consistency protocols to separate data and metadata. The discrete event simulation was constructed using the YACSIM toolkit [17], which was developed to model parallel computer systems. We simulated the locking protocols through four different components: a reading client, a writing client, a network, and a protocol server. These components conform with Figure 2. The network component of our simulation is modeled as an ideal fast Ethernet, carrying 100 megabits per second. Ideal means that the network is fair and carries its full bandwidth up to the channel capacity. Fair indicates that messages are delivered in the order in which computers present them to the network. In reality, a network is not guaranteed to be fair, but this is the expected behavior, particularly when network load is light. Furthermore, unfair behavior varies in type and frequency depending upon the network hardware. The locking protocols themselves can be run on any network, so we factor hardware specifics out by modeling the network as ideal and fair. We also restrict our study to lower network utilization, where these factors have little or no effect. These decisions allow us to simulate the protocol rather than network behavior. The protocol server manages the distribution of locks by communicating with the reading and writing clients over the network. The protocol server implements locking protocols for sequential consistency, AFS consistency, and publish consistency through producer-consumer locks. The writing client modifies a file which is then replicated among all of the reading clients. The writing client modifies the file periodically. Periodic updates represent dynamic data to be served that changes at fixed intervals. Examples of data that are modified in this fashion include stock quotes, news wires, and sports scores. The reading clients request data according to a Poisson process: an exponential random variable describes the time between read requests on a given client. Experiments with other random variables, such as heavy-tailed distributions for network traffic [24, 6] and uniform distributions, indicate that our results are invariant among different client random processes. Our choice to drive the simulation with an artificial workload reflects availability of data. There are no Web traces known to the authors that address dynamically updated data where activity against a single page can be isolated. Most known traces measure a large body of static content [20]. We conduct multiple experiments varying the write interval and number of clients while keeping the read rate constant. Write intervals are varied to explore the effects of lock contention on the system.

At lower write intervals, clients refresh their cache with new versions more frequently. At higher write intervals, clients obtain a new version of a file less frequently and service many read requests against that version out of their cache. For all clients, reads occur according to a Poisson process with a mean rate of 1 read per second at each client. We vary write intervals among 1, 10, and 60 seconds. For the 1 second case, each client is expected to read each new version once. Consequently, we expect that each read request requires a network message. We obtain results for both traditional file systems (without SAN hardware) and the Storage Tank file system (with SAN hardware). SAN and non-SAN results are presented side-by-side for the same experimental parameters and therefore can be compared directly. For the SAN experiments, results express latency and network utilization on the control network only, not the storage network. In this way we isolate the behavior of the locking protocol. The SAN results show drastically lower network utilization and latency. However, we note that all data tasks are performed on a separate network that our simulation does not model. Therefore, these results are not a comparison of SAN file systems versus traditional file systems. Rather, the results describe the relative performance of locking protocols in different hardware environments.
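The following Python sketch shows how such a workload driver could be structured, with Poisson reads per client and periodic writes; it uses a simple event queue rather than the YACSIM toolkit, and the parameter names and defaults are illustrative assumptions.

    import heapq
    import random

    def simulate(num_clients=10, read_rate=1.0, write_interval=10.0, duration=600.0, seed=1):
        """Drive reads (Poisson per client) and periodic writes; count the reads
        that find a stale cached version and would therefore need protocol traffic."""
        rng = random.Random(seed)
        events = []                                 # (time, kind, client)
        for c in range(num_clients):
            heapq.heappush(events, (rng.expovariate(read_rate), "read", c))
        t = write_interval
        while t < duration:
            heapq.heappush(events, (t, "write", None))
            t += write_interval

        version, cached = 0, [0] * num_clients      # version each client has cached
        reads = stale_reads = 0
        while events:
            now, kind, c = heapq.heappop(events)
            if kind == "write":
                version += 1                        # a new version is published
            else:
                reads += 1
                if cached[c] != version:            # cache miss: needs lock/protocol traffic
                    stale_reads += 1
                    cached[c] = version
                heapq.heappush(events, (now + rng.expovariate(read_rate), "read", c))
        return reads, stale_reads

    reads, misses = simulate(write_interval=60.0)
    print(reads, misses)   # with a 60 s write interval most reads are served from the cache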

3.1 Latency

The latency results (Figure 5) describe the time interval between an incoming read request and the file being available at the client for reading. When the file is not immediately available at the client, the client is assessed the time required to communicate with the server, for the server to obtain the data through the protocol, and to deliver the data. When the data is already at the client, the read occurs in no time. Because we do not charge any time for processing the file on the local system, we measure protocol latency rather than overall read latency. Results (Figure 5) show latency in seconds as a function of the number of reading clients in the system for write intervals of 1, 10, and 60 seconds. Each graph contains curves for publish consistency (PC), sequential consistency (SC), and AFS consistency (AFS). For publish consistency, latency is negligible at all times. In this protocol, a read lock is obtained once at the start of the simulation and is never revoked. A reading client always has data available. Consequently, this result is trivial. When the writing client modifies data, the server sends the reading client an update, which replaces the old version. The client always holds a valid copy of a file. For the sequential consistency and AFS protocols, latency increases super-linearly with the number of reading clients. As the number of clients increases, the protocols for invalidating data and re-acquiring the file at the client take more time.

Figure 5: Read latency as a function of the number of clients for write intervals of 1, 10, and 60 seconds, with and without a SAN; each panel shows curves for publish consistency (PC), AFS consistency (AFS), and sequential consistency (SC).

When the writing client modifies the file, each reading client must have its cache invalidated and then must request a new lock from the server to reobtain data after an invalidation. All of these messages share the same network resource and all want to use the resource at the same time. Because messages are submitted to the network at the same time, some must wait for the resource to become available before being transmitted. More clients result in more resource contention and therefore more latency. Contention accounts for the super-linear growth of the latency curves for the AFS and sequential consistency protocols, and synchronization accounts for the difference between these curves. Sequential consistency invalidates read copies before writing data, whereas AFS updates data and then invalidates the readers' data. This means that sequential consistency adds latency while the clients wait for the writer to obtain its write lock, modify the file, and release the lock. This effect is more visible when lock contention is high, in the 1 second trial. For lower lock contention in the 60 second trial, more reads are serviced out of the cache, latency is lower for the same number of clients, and network contention is a more important factor than synchronization. Our simulation favors sequential consistency protocols slightly. We implemented an idealized sequential consistency protocol in which we did not model livelock: reading and writing clients acquiring and revoking locks without making any progress towards reading and writing data. This simulation of sequential consistency should be considered a "best case" result. The behavior of livelock depends upon many subtle factors, including how data are written and the order in which locking and file system routines execute on a client. Modeling these factors fairly and trying to capture the conditions under which livelock occurs is outside the scope of a protocol simulation. Reduced read latency is the key performance advantage of publish consistency when compared with sequential and AFS consistency. Producer-consumer locking removes all read latency. This dramatic improvement is most important when the number of clients becomes large because read latency grows faster than linearly in the number of clients for AFS and sequential consistency locking.

3.2 Network Usage

The network usage results (Figure 6) describe the utilization of the network resource shared by Web servers. Utilization can be considered the fraction of the network's total capacity that a protocol uses or, equivalently, for ideal and fair networks, the percentage of time that the network is busy. For all locking protocols, network usage increases linearly with the number of reading clients; this relationship holds until the network reaches capacity.

Figure 6: Network utilization for data locking protocols between reading and writing clients as a function of the number of reading clients, for write intervals of 1, 10, and 60 seconds, with and without a SAN.

Unlike latency, contention has no effect on the network simulation results. Protocols always send the same number of messages to distribute updates. The message complexity of the different protocols accounts for the differences in the results. The AFS protocol uses one less message between the server and the writing client (see Figure 3). Since AFS and sequential consistency differ only in the writer's protocol, the difference is constant over any number of readers for a given write rate. This difference can be seen more clearly in the 1 second write interval experiments, where write traffic makes up a more significant portion of the network load. The producer-consumer locking protocol saves messages between the servers and reading clients (see Figures 3 and 4) when compared with AFS and sequential consistency. Therefore, as the number of clients increases, the savings increase commensurately. The network usage of producer-consumer locking therefore grows at a different rate than the AFS and sequential consistency protocols. In traditional file systems, not all messages are equal. The messages saved by AFS and publish consistency locking are protocol-only messages and do not contain data. They are not as expensive as the publish, grant, and release messages that contain file data. However, in the SAN environment, no messages contain data, and the reduced message complexity has a more significant effect. In other words, the SAN file system architecture exposes the differences between file system protocols by removing the more expensive operation of transmitting data between clients and servers.
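The slope differences can be seen in a back-of-the-envelope count of the control-network messages exchanged with the reading clients for one published update, taken from the message flows in Figures 3 and 4. The constants below are read off those diagrams and should be treated as approximations rather than simulator output.

    PER_READER_MESSAGES = {
        "SC": 4,    # revoke, release, acquire, grant    (Figure 3(a))
        "AFS": 3,   # invalidate, reacquire, respond     (Figure 3(b))
        "PC": 1,    # a single publish/update message    (Figure 4)
    }

    def reader_messages_per_update(num_readers, protocol):
        """Control-network messages exchanged with the reading clients for one
        published update; writer-side messages are a small constant and omitted."""
        return PER_READER_MESSAGES[protocol] * num_readers

    for n in (10, 100, 1000):
        print(n, {p: reader_messages_per_update(n, p) for p in ("SC", "AFS", "PC")})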

3.3 Overpublishing

Producer-consumer locks exploit the properties of the Web-serving workload to reduce latency and network utilization. However, as we observed in Section 2.3, when workload assumptions are incorrect, producer-consumer locks exhibit a behavior called overpublishing that causes them to use more network bandwidth than AFS or sequential consistency locks. Overpublishing occurs when the server publishes versions that clients do not read. The publish message has network costs that are never recovered by avoiding future lock requests. To simulate overpublishing, we varied the rate at which client read requests are serviced, focusing on read rates that are slower than the rate at which a file is being written and published. We ran the simulation for 100 reading clients. In Figure 7, we see that when reads occur less frequently than writes in non-SAN file systems, producer-consumer locking utilizes the network resource more heavily. Sequential consistency and AFS locking exhibit network usage that grows in proportion with the increasing read rate. However, producer-consumer locking uses about the same amount of network resource for all read rates. This matches our intuition.

Figure 7: Network utilization as a function of the per-client read rate (normalized by the write rate), with and without a SAN.

For a given write rate and a fixed number of clients, the amount of network bandwidth should be the same. The only network traffic, excepting startup costs, is the cost of publishing data to the clients. For a fixed number of clients, the number of protocol messages is approximately the same regardless of how often the clients read. Overpublishing is insignificant for SAN file systems because the fixed cost of publishing just metadata is so small. Producer-consumer locking, while clearly the best choice for the wide-area replication workload, does not perform well on all workloads. In particular, this solution does not apply to workloads that are dominated by writes. Exploring overpublishing is important so that we understand the properties of producer-consumer locking, but it has little practical effect on our system because we are targeting the read-dominated Web-serving workload.
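A small calculation makes the overpublishing regime explicit. Assuming Poisson reads at each client, as in the simulation, the probability that a published version is never read by a given client before it is superseded is exp(-read rate / write rate); those publishes are pure overhead on a non-SAN system. The sketch below is just this formula, not output from the simulator.

    import math

    def wasted_publish_fraction(read_rate, write_rate):
        """With Poisson reads at read_rate and versions published every
        1/write_rate, the chance that a version is never read before it is
        superseded is exp(-read_rate/write_rate)."""
        return math.exp(-read_rate / write_rate)

    for ratio in (0.25, 0.5, 1.0, 2.0, 3.0):
        print(f"read/write ratio {ratio}: "
              f"{wasted_publish_fraction(ratio, 1.0):.0%} of publishes unread per client")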

3.4 Discussion

The authors made many decisions about design and parameter selection in this simulation in an effort to most accurately characterize system performance. We briefly discuss some of our decisions and their effect on simulation results. We found that the conclusions we draw from our simulation do not depend on the absolute values of the write interval and the read rate for clients. In fact, the values are not nearly as important as their ratio. The ratio of these quantities determines the number of read requests that can be serviced out of the cache and therefore the amount of lock traffic per read request. We chose the absolute values of read rates and write intervals so that they both matched our intuition of the Web-serving workload and produced suitable results in the described hardware environment.

Experiments at different absolute values with the same ratios produce results of the same form; i.e., the shapes of the graphs are the same and the same conclusions can be drawn. Simulation results characterize the performance of distributed locking protocols against a single file, whereas a real system has many files open concurrently. It is hard to deduce from simulation results the exact effect that multiple files would have on latency. However, simulation results do give some intuition as to the scalability of network usage for these protocols. Through simulation we determine the amount of network resource we require to keep a given file consistent for a specified workload. Observing that network usage does not vary as a function of lock contention (see Section 3.2), the aggregate network utilization for many files should be the sum of the utilization of individual files. However, to draw conclusions about scalability beyond this statement requires measurement of a live system or a trace-driven simulation against many files.

4 Related Work

Consistency describes how the different processors in a parallel or distributed system view shared data. Lamport defined the problem for a multiprocessor and introduced sequential consistency [21]. Subsequent work on multiprocessors and shared memories introduced many other similar but more liberal models [1]. Not all distributed systems require instantaneous consistency guarantees, and updates to data views can be propagated without time constraints. This includes weak consistency [11], which guarantees that view updates are propagated eventually. Consistency in file systems is derived directly from multiprocessor research. Most modern file systems, including DFS [19], Calypso [7], Frangipani [26], xFS [2], and CIFS [22], implement sequential consistency. Other file systems implement consistent views in a less strict manner. NFS updates views based on time [28]. Other systems like AFS [18] and Ficus [12] update views whenever the server is aware of changes, but do allow distributed data views to be inconsistent at any point in time. However, no file systems known to the authors provide delayed update propagation based on session views as we do with producer-consumer locking. Research on view management in database systems also has similarity to publish consistency because databases implement multiple versions with session semantics. However, an inspection of the details reveals that the goals, data structures, and techniques share little in common.


This area of database research is broad and well developed. Bernstein, Hadzilacos, and Goodman present a nice overview of concurrency control and view management [3].

5 Conclusions

We have identified and remedied the semantic and performance shortcomings of using a file system for distributing dynamic updates to Web content in a large-scale system. With the sequential consistency implemented by most file systems, Web servers can see incomplete changes to file data that can result in parsing errors at Web clients. We have developed the publish consistency model, which gives Web servers session semantics on files and allows file views to change atomically. Publish consistency prevents parsing errors. From publish consistency, we derived the producer-consumer cache coherency locks that reduce network utilization among Web servers and eliminate protocol latency. Simulation results verify these claims. Producer-consumer locks reduce the number of messages required to distribute updates to file system clients and thereby reduce the load on the server network by the cost of those messages. With existing file system protocols, the protocol latency grows more than linearly with the number of reading clients. Producer-consumer locking eliminates all protocol latency from the reading clients, excepting startup costs to obtain the first lock.

References

[1] S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. IEEE Computer, 29(12), December 1996.
[2] T. E. Anderson, M. D. Dahlin, J. M. Neefe, D. A. Patterson, D. S. Roselli, and R. Y. Wang. Serverless network file systems. ACM Transactions on Computer Systems, 14(1), February 1996.
[3] P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley Publishing Company, 1987.
[4] R. C. Burns. Data management in a distributed file system for storage area networks. PhD thesis, University of California at Santa Cruz, 2000.
[5] L.-F. Cabrera and D. D. E. Long. Swift: Using distributed disk striping to provide high I/O data rates. Computing Systems, 4(4), 1991.
[6] M. E. Crovella and A. Bestavros. Self-similarity in World Wide Web traffic: Evidence and possible causes. Performance Evaluation Review, 24(1), 1996.
[7] M. Devarakonda, D. Kish, and A. Mohindra. Recovery in the Calypso file system. ACM Transactions on Computer Systems, 14(3), August 1996.

[8] A. L. Drapeau, K. W. Shirriff, J. H. Hartman, E. L. Miller, S. Seshan, R. H. Katz, and D. A. Patterson. RAID-II: A high-bandwidth network file server. In Proceedings of the 21st International Symposium on Computer Architecture, 1994.
[9] P. Druschel and G. Banga. Lazy receiver processing (LRP): A network subsystem architecture for server systems. In Proceedings of the 2nd Symposium on Operating Systems Design and Implementation, 1996.
[10] G. A. Gibson, D. F. Nagle, K. Amiri, F. W. Chang, H. Gobioff, E. Riedel, D. Rochberg, and J. Zelenka. File systems for network-attached secure disks. Technical Report CMU-CS-97-118, School of Computer Science, Carnegie Mellon University, July 1997.
[11] R. A. Golding. Weak-consistency group communication and membership. PhD thesis, University of California at Santa Cruz, 1992.
[12] R. G. Guy, J. S. Heidemann, W. Mak, T. W. Page, Jr., G. J. Popek, and D. Rothmeier. Implementation of the Ficus replicated file system. In Proceedings of the 1990 Summer USENIX Conference, 1990.
[13] S. Hannis. AFS as part of the IBM WebSphere performance pack. In Proceedings of Decorum '99, March 1999.
[14] J. H. Hartman and J. K. Ousterhout. The Zebra striped network file system. In Proceedings of the 16th ACM Symposium on Operating System Principles. ACM, 1993.
[15] D. Hitz, J. Lau, and M. Malcom. File system design for an NFS file server appliance. In Proceedings of the USENIX San Francisco 1994 Winter Conference, January 1994.
[16] J. H. Howard, M. L. Kazar, S. G. Menees, D. A. Nichols, M. Satyanarayanan, R. N. Sidebotham, and M. J. West. Scale and performance in a distributed file system. ACM Transactions on Computer Systems, 6(1), February 1988.
[17] J. R. Jump. YACSIM reference manual. Technical Report available at http://wwwece.rice.edu/˜rsim/rppt.html, Rice University Electrical and Computer Engineering Department, March 1993.
[18] M. L. Kazar. Synchronization and caching issues in the Andrew file system. In Proceedings of the USENIX Winter Technical Conference, February 1988.
[19] M. L. Kazar, B. W. Leverett, O. T. Anderson, V. Apostolides, B. A. Bottos, S. Chutani, C. F. Everhart, W. A. Mason, S. Tu, and R. Zayas. DEcorum file system architectural overview. In Proceedings of the Summer USENIX Conference, June 1990.
[20] T. M. Kroeger and J. C. Mogul. Digital's web proxy traces. Technical Report http://ftp.digital.com/pub/DEC/traces/proxy/webtrace.html, Digital Western Research Lab, 1998.
[21] L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers, C-28(9), 1979.
[22] P. J. Leach. A common Internet file system (CIFS/1.0) protocol. Technical report, Network Working Group, Internet Engineering Task Force, December 1997.
[23] K. Muller and J. Pasquale. A high performance multi-structured file system design. In Proceedings of the 13th ACM Symposium on Operating Systems Principles, 1991.

[24] V. Paxson and S. Floyd. Wide-area traffic: The failure of Poisson modeling. In Proceedings of SIGCOMM '94, 1994.
[25] K. W. Preslan, A. P. Barry, J. E. Brassow, G. M. Erickson, E. Nygaard, C. J. Sabol, S. R. Soltis, D. C. Teigland, and M. T. O'Keefe. A 64-bit, shared disk file system for Linux. In Proceedings of the 16th IEEE Mass Storage Systems Symposium, 1999.
[26] C. A. Thekkath, T. Mann, and E. K. Lee. Frangipani: A scalable distributed file system. In Proceedings of the 16th ACM Symposium on Operating System Principles, 1997.
[27] Transarc's DFS provides the scalability needed to support IBM's global network of Web servers. http://www.transarc.com/Solutions/Studies/EFS Solutions/DFSOly/dfsolympic.htm, 1999.
[28] D. Walsh, B. Lyon, G. Sager, J. Chang, D. Goldberg, S. Kleiman, T. Lyon, R. Sandberg, and P. Weiss. Overview of the Sun network file system. In Proceedings of the 1985 Winter USENIX Technical Conference, January 1985.
