File Access Patterns in Coda Distributed File System

Yevgeniy Vorobeychik
[email protected]
Computer Science Department, McCormick School of Engineering and Applied Sciences
Northwestern University, Evanston, IL 60201
Abstract
Distributed file systems have long used file caching techniques to improve performance. In many DFSs, clients are allowed to update the cached replicas of files, necessitating a variety of mechanisms to keep the other replicas of these files consistent across the network. This problem becomes complicated if there are many unstable files on the network, and especially so if there are no central servers. Surprisingly, there has not been much research into the access patterns of shared files. While some researchers have found that there are relatively few unstable files, this claim has been disputed by others, and the last such study that I am aware of dates back to 1992. Even more strikingly, I am aware of no detailed studies of file reading and writing patterns at all. In an attempt to fill this gap, I analyzed file access patterns in the Coda Distributed File System using traces collected at Carnegie Mellon University over a period of approximately two years. I found that 1) most files analyzed are stable; 2) most unstable files tend to be updated and read by only one computer, but the computer that reads a file tends to be different from the one that updates it; and 3) the vast majority of files are read by the same computer that created them.

1. Introduction
Ever since the beginning of distributed systems, caching of all sorts of data has become increasingly common. Indeed, it is now common knowledge that most types of distributed systems use some form of caching, be it a web browser caching the pages we visit or a distributed file system making local replicas of recently accessed files. Both are examples of file caching, that is, caching chunks of data that are logically organized into what we call files. There is not much difficulty involved in read-only caching, such as that of a web browser. A DFS, however, does not normally impose such a restriction, and consequently the system may often have to propagate updates made to one replica across the whole network. Most DFSs deal with this scenario through some combination of writing updates back to the file server and a file versioning or invalidation scheme. This may be sufficient in most situations, but the increasing popularity of peer-to-peer networks presents a grave difficulty for this approach: there is no server to write back to. In addition to this important limitation, there is the possibility of frequent updates to a file by different computers on the network, which would significantly reduce system performance due to frequent write-backs. Finally, if there are many unstable files on the network, file server performance may be compromised, as the server will waste many CPU cycles simply
processing updates and keeping track of inconsistencies throughout the system. The mere possibility of these difficulties evokes a natural question: what research has been done to date to explore the frequency of such scenarios and to provide more information about them? There have been numerous studies that appraise caching, including [1] and [3], to which I will return in Section 2 of this paper. These two studies, as well as [5] and [6], also address the frequency (or, rather, the infrequency) of write accesses within a given system. Beyond that, I have failed to discover much work relevant to the problems posed here. As I am aware of no study that analyzes create, read, and write access patterns on shared files, the goal of this project is to extract detailed information about these access modes. More specifically, I perform a case analysis, discussed in Section 3, of the frequency and properties of create, write, and read file accesses. In addition to the case analysis, Section 3 describes how the file access data was extracted from the Coda traces collected at Carnegie Mellon University. In Section 4 I present the results from the data gathered thus far, analyze them, and apply intuition to justify them, while in Section 5 I suggest several important implications of the results for improving file caching techniques. Section 6 then exposes the flaws and limitations of this research in its current state, and Section 7 presents ideas for future work. Before moving on to Section 2, I would like to note that this project deals only with whole-file caching. Some distributed file systems cache only parts of files; that situation is beyond the scope of this paper due to current time constraints, though it should in the future be studied in a similar manner.
2. Related Work

While there has not been much work done to characterize file access patterns in distributed file systems, there are several important related studies worth mentioning. The earliest of these, [3], studied UNIX 4.2 BSD trace data at the University of California, Berkeley. It found that the majority of accesses are in read mode, though it provided no information regarding the fraction of shared files that are accessed only in read mode, nor about how frequently files are updated. That project also investigated disk block caching performance, finding that a delayed-write policy is optimal and that the miss ratio decreases with cache size and block size. These findings can hardly be generalized into tenable conclusions regarding file sharing in DFSs, but they are a good start. A more recent study, done in 1991 at Berkeley [1], analyzed file access patterns and caching in the Sprite DFS. This study concludes, as did [3], that most file accesses are read-only (88% of all accesses and 80% of all bytes). This suggests that there may be very few unstable files and, possibly, many files that are updated very infrequently. However, the authors do not actually measure per-file access patterns, concentrating instead on global characteristics. [1] also found that caching leads to smaller than expected improvements in read hit ratios and virtually no reduction in write traffic. This conclusion suggests that the caching mechanism in Sprite (which is similar to that of many other DFSs) is not smart enough to deal with file writes. It does not, however, provide enough information to suggest any improvements that could remedy this problem. The most recent paper dealing with file access patterns on a network is [5]. This study questions the generalizations implied by [1] and [3], suggesting that those studies only apply to small-project environments. [5] presents a study of a large engineering firm, finding that
unstable sharing of files is much more common than predicted by the earlier studies. However, they also found that files tend to be updated by only one cluster and read by many. The latter observation led them to the idea of FROLIC [6], a cluster-based file replication scheme. Essentially, instead of caching files locally, FROLIC replicates them across clusters. This approach reduces inter-cluster communication, which is expected to be slower than communication within each cluster. [4] found that FROLIC indeed significantly reduces average file access times. None of the work discussed thus far addresses per-file access patterns, information that I believe would suggest improvements to file caching techniques. The goal of my study, therefore, is to provide this additional information, which I broke up into several cases and sub-cases discussed in detail in the next section.
3. Methodology

As mentioned above, the study described in this paper focused on extracting and analyzing access patterns for individual files. The trace data was collected from the Coda Distributed File System at Carnegie Mellon University over a span of more than two years, and although it has aged somewhat, it is the most recent set of such data that I was able to access. The next three subsections describe how the study was organized: the first contains a high-level description of the framework I used, the second provides further detail on how the information was collected from the traces, and the third summarizes the cases that I looked for within the data.
3.1. Framework

After extracting all the files that were created, written to, or read within the trace set (only the first 5 of the 38 CDs of traces were examined due to time constraints; the remaining 33 should certainly be examined in the future), I used a very small Perl library, consisting of a total of four classes, to analyze the resulting sub-traces and to organize the data into two summary files. The first summary file contains information on access patterns for each file that was opened within the trace set analyzed. The second summary file compresses all of this information into a short case analysis, described in Section 3.3.

3.2. Examining the Data

Before searching the data for instances of file access, I used the replay program packaged as part of DFSLib to convert each trace, one by one, into ASCII, so that it could be easily parsed by Perl scripts. There were several reasons for undertaking this extra step instead of using the provided library and writing additional C modules for custom data analysis. The main reason is the issue of C standardization. Unfortunately, while there is a set of standards for C, there are still significant differences between individual compilers, and the same piece of relatively simple code will often compile under one compiler but not another. To exacerbate the situation, the evolution of even a single line of compilers has left older code in the dust, as compilers are not always completely backwards compatible. The latter problem manifested itself with DFSLib, which refused to compile with a newer version of gcc and was rebuilt successfully only after several changes to the code. Perl, on the other hand, has in my experience been relatively stable across versions and systems, and, with the obvious exception of system calls, Perl code is very portable.
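To give a flavor of this parsing step, the sketch below shows a filter in the spirit of the scripts used here. It is only a sketch: the record layout it assumes (an "open" keyword followed by a flags field and a path, one record per line) is hypothetical, and the actual ASCII output of replay is richer than this; the real parsing logic lives in the Tracefile class described below.

    #!/usr/bin/perl
    # extractopens.pl -- sketch of the sub-trace extraction step.
    # Assumes a hypothetical ASCII record layout: "open <flags> <path>",
    # one record per line; the real output of replay differs.
    use strict;
    use warnings;

    my $computername = shift @ARGV
        or die "usage: extractopens.pl computername < trace.txt\n";

    while (my $line = <STDIN>) {
        chomp $line;
        next unless $line =~ /^open\s+(\S+)\s+(\S+)/;
        my ($flags, $path) = ($1, $2);

        # Classify the open into the three access modes of interest.
        my $mode = $flags =~ /CREAT/ ? 'create'
                 : $flags =~ /WR/    ? 'write'
                 :                     'read';

        # Emit one line per relevant access: mode, path, tracing host.
        print join("\t", $mode, $path, $computername), "\n";
    }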
In order to analyze the data, I created two sets of Perl classes: one for retrieving only the files accessed through the three aforementioned open modes, and another for analyzing the access patterns of all files thus retrieved. The two classes used for data retrieval are TracefileSet and Tracefile. The Tracefile class contains the name of the trace file, its extension, and the name of the computer on which it was collected. It provides an interface for "processing" the trace file, that is, retrieving the relevant data for that file and writing it to a summary file named <computername>.sum.txt. "Computername" refers to the computer on which the trace file was collected; this information is readily available as part of the name of the trace file. Each time a trace file is processed, the appropriate information is appended to the summary file for the computer on which that trace was collected. An alternative would have been to store all the information in a single file and append the computer name to the end of each record; I chose the first approach for its greater clarity. The TracefileSet class internally uses Tracefile for each of the trace files that are part of the "set", i.e., the group of trace files analyzed at a time, be it a physical CD or a CD's worth of files downloaded from the web. (There is currently no option for reading a set from a specified directory. The reason for this shortcoming is the limited time allotted for this study and my anticipation that there is little use for such an option, since automatically downloading the files from the web in bulks of one CD per set is much more convenient. Should the need ever arise, the option would be simple to add, as its interface would be nearly identical to that for reading files from a CD.) The TracefileSet class is invoked from a small Perl script called gettracedata.pl. This script accepts -cdrom or -web as switches specifying the location of the trace files and the method for their retrieval. In addition, the -ncp switch prevents the program from copying the trace files from the CD-ROM onto the hard drive (this switch is not relevant with the -web option).

After the file access information is retrieved from the traces, the classes ComputerSet and File analyze the access patterns of each file within the retrieved sub-traces. A File object's properties are the name of the file and, for each access mode, a hash with computername as the key and the number of times the file was accessed in that mode as the value. The interface to a File object consists primarily of the SetName, AddCreator, AddWriter, and AddReader methods, the last three each accepting a computername as their argument. The ComputerSet class analyzes each file in the set of sub-traces, using the File class to store the per-file information. Once all the trace information is processed, it summarizes the access patterns of all files in two summary files: accesssumfile.txt, which contains the access information for all files, and accesstally.txt, which presents the detailed "case analysis" described in the next subsection.
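As a concrete illustration of the analysis classes, here is a minimal sketch of what the File class might look like. Only the method names (SetName, AddCreator, AddWriter, AddReader) come from the actual interface described above; the internals are my plausible reconstruction, not the code itself.

    package File;
    # Sketch of the File class: one hash per access mode, keyed by
    # computername, with access counts as values.
    use strict;
    use warnings;

    sub new {
        my ($class) = @_;
        my $self = { name => undef, creators => {}, writers => {}, readers => {} };
        return bless $self, $class;
    }

    sub SetName    { my ($self, $name) = @_; $self->{name} = $name; }
    sub AddCreator { my ($self, $c) = @_; $self->{creators}{$c}++; }
    sub AddWriter  { my ($self, $c) = @_; $self->{writers}{$c}++;  }
    sub AddReader  { my ($self, $c) = @_; $self->{readers}{$c}++;  }

    1;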
3.3. Case Analysis

Since creating and writing to files have the greatest impact on file caching, and since they are likely to have different patterns associated with them, I chose to determine the access patterns for each of these two modes separately. When a file is created, it may be very helpful to know how it is subsequently accessed and whether there is any relationship between the creation details and subsequent accesses. Writing to a file has even greater implications for caching, as it can mean invalidation of other cached copies of the file, increased network traffic, and potential multiple-writer access. In contrast, read access is simple, as it has no consequences for computers other than the one reading the file.

There are three possibilities for file creation that I consider. The first is that no creator has been found for the file within the trace set examined; I refer to this as the "No Creators" case. The second is that exactly one computer created the file: the "1 Creator" case. Finally, a file that has been deleted and later created under the same name by a different computer falls into the "Many Creators" category. As these three categories are all-encompassing, every file examined falls into one of them. In the "No Creators" case, no further investigation is done, since no creation access patterns can be collected without knowing the file's creator. The "Many Creators" case is an anomaly; it should be very rare and affect mostly special types of files, such as temporary files, to which this discussion does not apply. The "1 Creator" case is the one that should give some insight into how future access patterns are related to creation, and this case is therefore subdivided to examine those future patterns: one track follows future writes, while the other follows future reads. The "future writes" track has three categories of its own: "No Writers", for files that have not been written to since creation within the trace set; "Very Few Writers", for files written to by a "very small" set of computers; and "Many Writers", for files modified by many computers. In both the "Very Few Writers" and "Many Writers" cases, I check whether the creator belongs to the set of computers that write to the file. Similarly, the "future reads" track is split into the "No Readers", "Very Few Readers", and "Many Readers" sub-cases, and in the latter two I check whether the creator is one of the computers that subsequently read the file.

The second set of cases covers access patterns related to file writes. As with file creation, there are three all-encompassing cases. In the first, the "No Writers" case, the file is never accessed in write mode within the trace set. In the second, the "Very Few Writers" case, the file is accessed in write mode by "very few" computers. Finally,
if many computers access the file in write mode, the file becomes part of the "Many Writers" case. Since there is not much to discuss when no evidence of write access is found, the "No Writers" case has no subdivisions. The "Many Writers" case, on the other hand, is one of the motivations for this study; more specifically, it is my hope that such an occurrence is rare and can thus be dealt with by simply disabling caching of the files that fall into this bin. The "Very Few Writers" case is further broken down into three sub-cases: "No Readers", "Very Few Readers", and "Many Readers". A file belongs to the "No Readers" sub-case if it has not been read within the trace set even though it has been written to by "very few" computers. A file read by "very few" computers falls into the "Very Few Readers" sub-category, and a file read by many computers becomes a member of the remaining "Many Readers" sub-case.

Before moving on to the next section, I need to explain the meaning of "very few", since it carries the implicit assumption that this unspecified number is either known or easily determined. That may not be the case: "very few" may differ from system to system depending on many factors, such as network speed and average congestion, how the network is used, and, of course, fundamental network characteristics such as topology. In general, I believe that a good metric for "very few" is a number small enough that the additional communication among the computers that write to the file remains justified. As part of this study, I test the hypothesis that "very few" is one for Coda; in other words, I expect to discover that most files that are modified in the system are modified by only one computer. If this is the case, the communication costs within the "writers" group are zero.
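Given File objects of the sort sketched in Section 3.2, the tallying performed by ComputerSet reduces to counting the distinct computers recorded for each access mode. The sketch below is a reconstruction under the same assumed internals, not the actual ComputerSet code, with the "very few" threshold exposed as a variable (hypothesized above to be one for Coda):

    # Sketch of the case analysis applied to each File object.
    use strict;
    use warnings;

    my $VERY_FEW = 1;    # the "very few" threshold hypothesized for Coda

    sub creation_case {
        my ($file) = @_;
        my $n = scalar keys %{ $file->{creators} };
        return $n == 0 ? 'No Creators'
             : $n == 1 ? '1 Creator'
             :           'Many Creators';
    }

    sub writer_case {
        my ($file) = @_;
        my $n = scalar keys %{ $file->{writers} };
        return $n == 0         ? 'No Writers'
             : $n <= $VERY_FEW ? 'Very Few Writers'
             :                   'Many Writers';
    }

    # For the "1 Creator" sub-cases: does the creator later read the file?
    sub creator_reads {
        my ($file) = @_;
        my ($creator) = keys %{ $file->{creators} };
        return defined $creator && exists $file->{readers}{$creator};
    }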
                 No Creators   1 Creator   Many Creators
Files examined         23507        6593              26

Table 1: Creation Access Patterns (numbers of files)
                 No Writers   1 Writer   Many Writers
Files examined        29987        136              3

Table 2: Write Access Patterns (numbers of files)
                               No Readers   1 Reader   Many Readers
Creator one of readers               2710       3871             10
Creator not one of readers              -          2              0

Table 3: File Reading Patterns for files with 1 Creator (numbers of files)

                               No Writers   1 Writer   Many Writers
Creator one of writers               6593          0              0
Creator not one of writers              -          0              0

Table 4: File Writing Patterns for files with 1 Creator (numbers of files)

                               No Readers   1 Reader   Many Readers
Writer one of readers                 122          0              0
Writer not one of readers               -         13              1

Table 5: File Reading Patterns for files with 1 Writer (numbers of files)
4. Results and Analyses

Now that I have delineated the framework used to group the access patterns of the files within the traces, it is time to present the actual results. To make the results and their meaning clearer, I display them in several case tables. The first two tables present the number of files that fall into each of the major categories described in the previous section: the "No Creators", "1 Creator", and "Many Creators" cases are summarized in Table 1, while the "No Writers", "1 Writer", and "Many Writers" cases are summarized in Table 2. Tables 3 and 4 summarize the access patterns of files created by one computer, while Table 5 summarizes the read access patterns of files written to by only one computer. A total of 30126 files were examined.

Several conclusions can be reached immediately from Tables 1 and 2. First of all, it is clear that files are not created very frequently. Another observation is that very few files are unstable: only 6619 files were created within the trace set and, even more dramatically, only 139 files were written to. While it is possible that most of these 139 files are unstable,
this is still a very small percentage of all files. It is even more important to note that only 3 of these files were updated by more than one computer, and it is possible that with a higher "very few" threshold the "Many Writers" case would become negligible. In any event, the hypothesis that most files are modified by only one computer does appear to hold for Coda. Looking at Table 3, it can be seen that the computer that created a file is very likely to read it later. Table 4, unfortunately, does not provide much information, which is largely a testament to the insufficient amount of data examined so far; however, the number of cases described by this table can be predicted to be quite small, as very few files were created and even fewer were updated within the traces. The results presented in Table 5 appear somewhat counterintuitive. First of all, most of the files that have one writer are not read within the trace set. Intuitively, this should not be the case, as files are generally updated for the purpose of being read in the future. Since I only examined the first five CDs of the data, this result may be attributed to insufficient information, as the data on future reads may be (and probably is) contained in the traces not yet
examined. It is still instructive to look at the more interesting result in Table 5: the computer that writes to a file is never its future reader. As more trace data is analyzed, there will no doubt be cases that break this "never", but it seems fair to generalize that the writer is unlikely to read the file in the future. How can this be? It is important to remember that we are dealing with shared files, not local files; for local files this result would make no sense. With sharing, however, it is entirely possible that in a small-project environment there are designated file "producers" and "consumers". For example, one could envision a student who works at the "producer" computer, intermittently updating a file, and an advisor who acts as the "consumer", reading the file from time to time on his own computer. Indeed, this pattern can be inferred, on a cluster scale, from [5], which notes that only one cluster tends to write to a file while many generally read it. At this point another observation can be made: only one file in this table has many readers. One explanation may be a predominance of projects worked on in groups of two (the advisor/student relationship fits into that category). Of course, more data is necessary before drawing any conclusions from this particular result, and even if it holds, it cannot be generalized to other environments.
5. Implications

As noted in Section 4, it is safe to assume that, at least within Coda, most files are updated by a single computer. Consequently, it makes sense to modify caching techniques to take advantage of this behavior. Specifically, it may be possible in many systems to avoid server write-backs altogether, instead storing the set of computers that write to a file (a set of one in this case) and having other clients contact a member of this set directly for file updates. A minimal sketch of this idea follows.
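The sketch below makes the idea concrete in the same Perl register as the rest of the study. Everything in it (note_writer, fetch_update, request_from_peer, and the per-file writer table) is a hypothetical illustration of the proposal; none of it is a Coda or DFSLib mechanism.

    # Sketch: per-file writer sets consulted in place of server write-backs.
    use strict;
    use warnings;

    my %writers_of;    # path => { computername => 1, ... }

    # Record that a computer is known to write to a file.
    sub note_writer {
        my ($path, $computer) = @_;
        $writers_of{$path}{$computer} = 1;
    }

    # Refresh a cached file by asking a member of its writer set directly,
    # bypassing the server entirely.
    sub fetch_update {
        my ($path) = @_;
        my @writers = keys %{ $writers_of{$path} || {} };
        return unless @writers;    # no known writers: cached copy is current
        # The writer set is usually a singleton, so any member will do.
        return request_from_peer($writers[0], $path);
    }

    # Placeholder for the actual peer-to-peer transfer.
    sub request_from_peer {
        my ($peer, $path) = @_;
        return "<contents of $path as provided by $peer>";
    }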
Avoiding write-backs in this way would take a load off the server, allowing it to perform other tasks more efficiently. Moreover, it would be a tremendous help to peer-to-peer networks, as the file serving and cache maintenance mechanisms could be distributed throughout the network. This technique would also increase the resilience of the network: the server becomes less important to its critical functions and is therefore no longer a single point of failure. Another observation made in Section 4 was that the computer that creates a shared file is very likely to read it later on. Consequently, it makes a lot of sense to cache a file on the computer that creates it. Finally, it has been observed that cached files tend to be read by few computers. If this result holds after more data is analyzed, it may be useful to keep track of the computers that read each file for further optimizations to distributed file system performance.

Having said all this, I want to note that one must be very careful about generalizing the results presented here. The access patterns for the Coda file system at CMU are not necessarily indicative of other DFSs or other environments; it is very possible that the limitations of Coda drive the observed file access behavior. This study is meant to "get the ball rolling", so to speak, but it is very important to look at data from other DFSs before making broad generalizations about shared file access patterns. In the next section, I discuss this and other limitations of my study.
6. Flaws and Limitations

The most notable limitation of my study is the insufficiency of the data analyzed so far. First of all, I analyzed only the first 5 of the 38 CDs of Coda traces. While I found the conclusions to be stable (those that can be reached after analyzing only 1 CD of data are similar to the ones discussed in this paper), it is still difficult to make generalizations based on such a small proportion of the available data. Furthermore, as briefly discussed at the end of Section 5, there is an inherent problem with generalizing results collected from Coda traces: they were collected only at Carnegie Mellon University, and they apply only to Coda. The traces are also nearly ten years old, and shared file access patterns have surely shifted somewhat since then. Nevertheless, the information about Coda is still necessary; it is just not sufficient. Another important shortcoming of this study is that only a very small percentage of the data was analyzed in detail: while most files belong to the "No Creators" and "No Writers" cases, no additional information is provided about these files. These limitations should be addressed in follow-up research, and possible remedies are discussed in Section 7.
7. Future Work

Future work follows directly from the flaws and limitations discussed in the previous section. First and foremost, the remainder of the CD set of Coda traces needs to be analyzed; this will create a clearer picture of per-file access patterns in Coda. While this research alone may provide information for improving caching techniques within Coda, generic improvements in caching strategies will require data from other distributed file systems, such as Sprite. The next step, therefore, should be to analyze other DFS data in a similar manner. Finally, I noted in the previous section that the "No Creators", "No Writers", and "No Readers" cases were not analyzed in detail, despite the observation that the vast majority of files belong to them. In the future these cases should be subdivided as well, since the information thus retrieved may be helpful in further DFS optimizations, and the "Many Creators", "Many Writers", and "Many Readers" cases should also be broken down further. Additionally, it may be necessary to determine the appropriate values of "very few" for other distributed file systems, should the patterns observed for Coda not quite apply to them.
References

[1] M. Baker, J. Hartman, M. Kupfer, K. Shirriff, and J. Ousterhout. "Measurements of a Distributed File System." Proc. of the 13th ACM Symposium on Operating System Principles, 25(5):198-212, ACM, October 1991.

[2] L. Mummert and M. Satyanarayanan. "Long Term Distributed File Reference Tracing: Implementation and Experience." Software: Practice and Experience, 26(6), 1996.

[3] J. Ousterhout, H. Da Costa, D. Harrison, J. Kunze, M. Kupfer, and J. Thompson. "A Trace-Driven Analysis of the UNIX 4.2 BSD File System." Proc. of the 10th ACM Symposium on Operating System Principles, 15-24, ACM, December 1985.

[4] Y.C. Pang, D.S. Gill, and S. Zhou. "Implementation and Performance of Cluster-Based File Replication in Large-Scale Distributed Systems." Proc. of the Second Workshop on the Management of Replicated Data, 100-103, Monterey, CA, November 1992.

[5] H.S. Sandhu and S. Zhou. "A Case Study of File System Workload in a Large-Scale Distributed Environment." Technical report, University of Toronto, 1992.

[6] H.S. Sandhu and S. Zhou. "Cluster-Based File Replication in Large-Scale Distributed Systems." Proc. of the ACM SIGMETRICS '92 Conference, 20(1), May 1992.