Policies for Proxied vs Direct Data Access in Parallel NFS

Akshay Shah, Sachin Bhamare, Swaroop Choudhari, Zainul Abedin Abbasi
15-712: Advanced Operating Systems & Distributed Systems, Carnegie Mellon University

Abstract: Parallel NFS (pNFS) improves file access scalability by providing the NFSv4 client with support for direct storage access. Parallel NFS allows NFSv4 servers to optionally return to a client a map of where data resides on disk, instead of performing the access on behalf of the client. We believe that direct I/O is not always advantageous: sometimes it is faster, more efficient, or better load balanced to have the server perform the access, and at other times it is better for the client to access storage directly using the map. In this paper, we use the U. Michigan/CITI pNFS over PVFS2 prototype to experiment with different policies for selecting the right path for each access.

Keywords: Parallel NFS, NFSv4, performance, Parallel File System, Parallel I/O, Operating System, Distributed File Systems.

1. Introduction

Traditionally, granting shared access to a set of storage resources was achieved via distributed file systems such as NFS (Figure 1).

[Figure 1: NFS Architecture]

The basic NFS architecture is shown in Figure 1. In this model, all client requests are serialized by the central NFS server. This model is adequate for a limited number of clients with modest data requirements. However, in the face of a large number of data-hungry clients, the central NFS server tends to become a performance bottleneck. With the advent of storage networking, storage nodes became first-class network entities and it became possible for clients to access the storage nodes directly. This led to the development of a new paradigm known as "Out-of-Band data access". The Out-of-Band data access paradigm alleviates this bottleneck and allows clients to scale with the aggregate bandwidth of the storage system. It achieves this by separating the control and data flows. This separation provides a straightforward framework for high scalability by allowing data transfers to proceed in parallel directly from many clients to many data storage endpoints; this form of data access is known as direct data access. Control and file management operations, inherently more difficult to parallelize, can remain the province of a single NFS server, inheriting the simple management of today's NFS file service.

[Figure 2: pNFS Architecture [1]]

Parallel NFS (Figure 2), based on the above paradigm, aims to solve the scalability problem while remaining an open standard. Data transfer may be done using NFS or other protocols, such as iSCSI, under the control of an NFSv4 server with parallel NFS extensions. Such an approach protects the industry's large investment in NFS, since the bandwidth bottleneck no longer drives users to adopt proprietary alternatives, and it leverages SAN storage infrastructures, all within a common architectural framework. The parallel extension to NFS enables parallel access to data by returning to the client a layout, or map, of where data is in the file system, instead of performing the access on behalf of the client. However, direct data access may not always be the best option. Sometimes it is faster, more efficient, or better load balanced to have the NFS server do the access (proxied data access), and other times it is better for a client to access the data directly after obtaining a layout. In this paper we describe the methodology and experimental results that helped us define different policies for choosing the optimal access path in the reference implementation, and we evaluate some of these policies.

The remainder of this paper is organized as follows. Section 2 discusses related work. Section 3 discusses the methodology used for defining policies and the parameters of interest. Section 4 describes the experimental setup and resources used for experimentation. Section 5 presents the results obtained and the analysis based on which we define data access policies. Section 6 discusses the validation of a subset of these policies. Section 7 discusses future work, followed by the summary and conclusion in Section 8.

2. Related Work

GPFS [18] supports fully parallel access to both metadata and file data and deploys a shared-disk architecture to achieve scalability. PVFS2 [13] is a user-space parallel file system. It allows users to spread file data and metadata across different nodes in a cluster. By spreading out files in this manner, large files can be created, potential bandwidth is increased, network bottlenecks are minimized, and single points of failure are avoided. The Lustre [17] file system supports parallel I/O through the use of a metadata server, which handles file namespace operations while directing actual file I/O requests to Object Storage Targets (OSTs) that manage the physical storage on underlying Object Based Disks (OBDs). All of [13], [17], [18] and [19] are optimized for large data transfers and neglect small-I/O optimizations. pNFS [14], [15], [16] provides a platform-independent open standard that supports parallel file system operations, allowing proprietary file systems to pay more attention to their core features.

Studies in [1] demonstrate that NFSv4 can increase write throughput to parallel data stores by overcoming parallel file system inefficiencies. They also show how parallel NFS can improve overall write performance by using direct, parallel I/O for large write requests and indirect I/O (NFSv4) for small request sizes. However, in that work the policies for data access are governed solely by write access size, and the corresponding workloads involve a single client.

From these studies they concluded that the optimal write threshold value depends on several factors, including server capacity, network performance and capability, and the NFSv4 and pNFS file systems in use. They suggest that a good way to determine a threshold is to compare execution times for NFSv4 and pNFS with various write sizes and see where the curves cross.

In our work, we define various policies based on request/syscall size, file size, and access pattern (randomness of reads and writes), and use the methodology of [1] to define the thresholds in each case for single and multiple clients. Studies in [11] compare performance for different file sizes, access patterns (randomness of reads and writes) and workloads on PanFS and NFSv3. Policies for each of these parameters were exclusive of each other, and some of their results are not inferable because they were affected by client-side caching. We consider some of their future work, e.g. concurrent access, as one of the access policies for evaluation.

3. Methodology

There are a number of quality attributes that could define a policy for a particular system, for example performance, load balancing, and type of network. Policies could be defined by looking at each attribute individually and also by looking at a combination of attributes.

Two major quality attributes that affect system behavior are performance and load balancing. The behavior of these attributes can help in formulating policies for the system.

3.1. Performance

Parallel file systems are used in scientific applications and large supercomputing clusters [2], [5]. Performance is an important quality attribute for these applications and clusters, and parallel file systems are used to satisfy their large and performance-intensive data needs. Performance can be measured in terms of read/write throughput, latency, number of operations per unit time, etc. In our system, we use read/write throughput to measure performance.

3.2. Load Balancing

Load balancing can be measured by looking at the utilization of the server and comparing it to the utilization of the storage nodes. The goal is to ensure that all storage nodes are equally utilized and that no storage node becomes a bottleneck. The idea here is that the NFS server can act as a slack process and alleviate the load on the storage servers if they are being heavily utilized. The parameters that affect the load on the storage nodes and the metadata server are the number of clients and the amount of data that is being accessed from the storage nodes concurrently. For the scope of this paper, we have not considered load balancing as an attribute of interest.

3.3. Parameters Affecting Performance

A few parameters that have an effect on performance are:
- File size
- Access size / request size
- Access method (sequential or random)
- Caching
- Request overhead or data/metadata ratio
- Number of clients (concurrency)

In our study, we defined policies based on individual parameters, such as the access size and file size of the workload, as well as on combinations of parameters.

3.4. Determining Threshold

The methodology that we used to determine thresholds and define policies is similar to that used in [1]. By plotting the performance of pNFS and NFSv4, each running an identical workload, we determine the threshold point where pNFS starts to outperform NFSv4. The threshold value defines the policy for that specific parameter (Figure 3).

In our study, we show that this methodology can be extended to define policies for multiple parameters. After determining the optimum value for parameter 1, we can then find the optimum value for other parameters by keeping parameter 1 constant.
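To make the threshold search concrete, the following is a minimal sketch (in C, not taken from the pNFS prototype) of how the crossover point could be located from two measured throughput curves for a single parameter; the sample data points, array layout, and function name are illustrative assumptions, not measured results.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Illustrative sketch: given NFSv4 and pNFS throughput measured at the same
 * increasing access sizes (other parameters held constant), return the
 * smallest access size at which pNFS matches or exceeds NFSv4 (the
 * "threshold"), or 0 if it never does.
 */
static size_t find_threshold(const size_t *access_size,
                             const double *nfs_mbps,
                             const double *pnfs_mbps,
                             size_t npoints)
{
        for (size_t i = 0; i < npoints; i++) {
                if (pnfs_mbps[i] >= nfs_mbps[i])
                        return access_size[i];
        }
        return 0;
}

int main(void)
{
        /* Access sizes in bytes, 4KB to 256KB (hypothetical sample data). */
        size_t asize[] = { 4096, 16384, 65536, 131072, 262144 };
        double nfs[]   = { 28.0, 29.5, 30.0, 30.0, 30.0 };  /* placeholder */
        double pnfs[]  = { 10.0, 22.0, 31.0, 45.0, 58.0 };  /* placeholder */

        size_t t = find_threshold(asize, nfs, pnfs,
                                  sizeof(asize) / sizeof(asize[0]));
        if (t)
                printf("pNFS overtakes NFSv4 at access size %zu bytes\n", t);
        else
                printf("no crossover found in the measured range\n");
        return 0;
}
```

The same search can be repeated for a second parameter while the first is held at its optimum, mirroring the multi-parameter extension described above.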

4. Experimental Setup

Our experimental setup consists of four pNFS clients, four storage nodes, and one metadata server. Each machine has a 2.4 GHz Intel Pentium 4 processor with 1 GB of memory. All nodes are connected via Gigabit Ethernet through a single HP ProCurve 2848 switch. All nodes run Linux kernel version 2.6.17. Both pNFS and NFSv4 use PVFS2 1.5.1 as the underlying file system. We use the pNFS prototype implemented by U. Michigan/CITI. We use IOzone v3.263 for running file system benchmarking experiments. Multiple readings were obtained for each experiment and average values were used for analysis. The experimental setup is depicted in Figure 4.

[Figure 3: Determining threshold. Sample plot of performance versus a parameter for NFS and pNFS/PVFS, marking the crossover (threshold) point]

[Figure 4: Experimental Setup]

4.1. Assumptions and Caveats

For the purpose of this paper, our taxonomy of large, medium and small files is as follows. Files larger than or equal to 512MB are considered large files. Small files are files with sizes less than or equal to 4MB. Files that lie between 4MB and 512MB are considered medium-sized files.

The code base used is as described in Section 4. Hence, all of the policies and their corresponding threshold values are specific to this code base and may vary for different implementations and versions of pNFS.

5. Experimental Analysis and Results

In this section we describe the experiments crafted to help determine a policy for different parameters using the IOzone benchmark. For each of these experimental runs, we compared the aggregate throughput of pNFS and NFSv4. We then set the threshold for the parameter under test at the point where the pNFS and NFSv4 performance graphs intersect. In Section 6, we evaluate the determined policy by introducing the threshold value and running the same workloads; the client can then switch the access path dynamically depending on the threshold set. We broadly classify our experiments based on the number of clients.

5.1 Single Client Benchmarking

Experiments in this section consist of a single client issuing various file system workloads.

5.1.1 Write Threshold

Workload: Large files, variable access size. In this experiment, we compared the sequential write performance of pNFS and NFS for large files. Figure 5 shows the results obtained for a 1GB file.

[Figure 5: Sequential Write, Large files, variable access size]

Analysis: In Figure 5, we see that for large-file sequential write access from a single client, NFS achieves flat performance: it reaches a maximum bandwidth of 30MB/s and remains constant with increasing access size. pNFS, on the other hand, begins lower than NFS but gradually surpasses NFS's bandwidth until it saturates at 60MB/s. With increasing access size, pNFS achieves greater bandwidth due to direct parallel access of the data file from the storage nodes, whereas for NFS the maximum throughput remains limited by the server capacity. From the above graph, using the methodology explained in the previous section, we determined a threshold of 64KB, where the pNFS and NFS performance graphs intersect. For any access size above 64KB, pNFS outperforms NFS.

Workload: Small files, variable access size.

In this experiment, we ran the same workload as above on a small file with variable access size. Figure 6 shows the graph obtained for a file of size 128KB.

[Figure 6: Sequential Write, small files, variable access size]

Analysis: In Figure 6, we observe pNFS and NFS behavior quite similar to that in Figure 5. NFS sequential write performance is stable and never really scales with increasing access size. pNFS, on the other hand, starts low and overtakes NFS for access sizes greater than 64KB. Hence, for sequential writes on small files we determined the write threshold to be 64KB, above which pNFS performs better than NFS. We can also see that the threshold value remains the same for both large and small files. Hence, for a sequential write access pattern on a single client, the write threshold is not a function of file size; it depends solely on the access size of the request.

Policy: Based on our experiments for a single client, we conclude that for any sequential write with access size greater than 64KB, it is better to do direct access rather than proxied access, irrespective of file size.
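As an illustration only (not code from the CITI prototype), this single-client write policy can be captured in a small predicate; the constant name and function name below are assumptions made for the sketch, with the 64KB value taken from the experiments above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Write access-size threshold observed for this setup (64KB). */
#define WRITE_DIRECT_THRESHOLD (64 * 1024)

/*
 * Illustrative policy check: returns true if a sequential write of
 * 'access_size' bytes should take the direct (pNFS) path, false if it
 * should be proxied through the NFSv4 server. File size is deliberately
 * ignored, matching the single-client finding above.
 */
static bool write_should_go_direct(size_t access_size)
{
        return access_size > WRITE_DIRECT_THRESHOLD;
}
```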

5.1.2 Read Threshold

Workload: Sequential reads, large files, variable access size. In this experiment we compared the performance of NFS and pNFS for sequential reads on large files, varying the access size from 4KB to 16MB. Figure 7 shows the results for a 1GB file.

[Figure 7: Sequential read, large files, variable access size]

Analysis: As seen in Figure 7, for the sequential read workload, as with the write workload, NFS performance is almost stable. It reaches a maximum bandwidth of 35MB/s and remains roughly constant for most access sizes. pNFS, however, scales linearly with access size up to 4MB, where it saturates due to network limitations. The pNFS and NFS plots intersect between access sizes of 16KB and 32KB. In this case, we choose the lower value of 16KB as the threshold above which pNFS performs better than NFS.

Policy: For any read access greater than 16KB in workloads involving sequential reads of large files from a single client, use direct access; otherwise use proxied access.

We ran similar experiments with the sequential read workload for various small file sizes. In those experiments we observed that NFS consistently outperformed pNFS by a substantial margin. So, we decided to plot a graph of bandwidth against varying file sizes for the sequential read workload.

[Figure 8: NFS caching effect]

Figure 8 plots bandwidth against varying file sizes with the access size kept constant at 4MB. From the plot, we observe that for file sizes up to 512MB, NFS consistently outperforms pNFS. However, for files greater than 512MB, pNFS performs better than NFS. We attribute this behavior of NFS to its effective use of the file system buffer cache, which increases read throughput. NFS performance drops when the file size becomes larger than the memory available at the server. pNFS performance is independent of file size, as caching is not implemented in the pNFS prototype used for the experiments.
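A minimal sketch, assuming hypothetical names and the thresholds observed above, of how a client-side read policy might combine the 16KB access-size threshold with the file-size effect caused by server-side caching; this is not the prototype's actual logic, and the constants are specific to our testbed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Thresholds observed for this setup; values are testbed-specific. */
#define READ_DIRECT_THRESHOLD  (16 * 1024)       /* 16KB access size */
#define SERVER_CACHE_LIMIT     (512ULL << 20)    /* ~512MB file size */

/*
 * Illustrative read policy: small requests, and sequential reads of files
 * that still fit comfortably in the server's buffer cache, stay on the
 * proxied NFSv4 path; everything else goes direct via the pNFS layout.
 */
static bool read_should_go_direct(uint64_t file_size, size_t access_size,
                                  bool sequential)
{
        if (access_size <= READ_DIRECT_THRESHOLD)
                return false;                    /* proxied: small request */
        if (sequential && file_size <= SERVER_CACHE_LIMIT)
                return false;                    /* proxied: cache-friendly */
        return true;                             /* direct: use the layout */
}
```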

5.1.3 Random Write Threshold

Workload: Random write, large files, variable access size. In this experiment, we compared the large-file write performance of pNFS and NFS for a random access pattern, varying the access size from 4KB to 16MB. We ran this experiment for different large file sizes. Figure 9 is a plot for a file size of 1GB.

[Figure 9: Random write, large files, varying access size]

Analysis: From Figure 9, we obtain the same threshold value of 64KB above which pNFS outperforms NFS. Thus, the access pattern does not affect the threshold value.

Workload: Random write, varying file size, constant/variable access size. In this experiment, we ran workloads similar to the previous one, but for varying file sizes while keeping the access size constant in each run. This experiment validates some of our previous findings.

[Figure 10: Random write, varying file size, 4k access size]

[Figure 11: Random write, varying file size, 64k access size]

[Figure 12: Random write, varying file size, 128k access size]

Analysis: Figures 10-12 show a random write performance comparison of NFS and pNFS for access sizes of 4KB, 64KB and 128KB respectively. We observed that for access sizes less than 64KB, NFS performs better than pNFS, and for access sizes greater than 64KB, pNFS performs better, confirming our threshold value of 64KB.

Policy: For random write accesses, irrespective of file size, the access path should be direct access for any access size greater than 64KB.

5.2 Multiple Clients Benchmarking

Experiments in this section consist of multiple clients issuing file system workloads as described below.

5.2.1 Write Threshold

Workload: Constant/variable file size, variable access size, sequential writes. In this experiment, we ran various workloads on a 4MB file with varying access sizes.

[Figure 13: File size 4MB, access size 16 KB]

[Figure 14: File size 4MB, access size 64 KB]

[Figure 15: File size 4MB, access size 128KB]

Analysis: As we can see from Figures 13-15, pNFS starts performing better than NFS when the cumulative access from all clients is greater than 64KB. For multiple clients, the threshold set for a single client is not effective. With four clients each making an access of 64KB, the total access is 256KB, which is much higher than the threshold, yet all these requests hit the NFS server; as the graphs show, using the 64KB client-side threshold does not improve performance. Hence, we should look at a way in which the NFS server can also set its own internal policies. Depending on those policies, it might reject a request that comes to it and instead return a map to the client for direct data access, thereby overriding the client policy.

Policy: For collective access sizes of 64KB onwards, direct data access outperforms proxied data access. This policy is determined by the NFS server's capability and should override the individual client policy.
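The server-side override described above could, conceptually, look something like the following sketch. This is a hypothetical illustration only: the types, the load metric, and the idea of answering a proxied request with a "fetch a layout instead" indication are our assumptions, not behavior defined by the pNFS drafts or implemented in the prototype.

```c
#include <stddef.h>

/* Hypothetical per-server bookkeeping (illustrative only). */
struct server_state {
        size_t bytes_in_flight;     /* proxied I/O currently being served */
        size_t proxied_capacity;    /* how much proxied I/O the server accepts */
};

/* Hypothetical decision returned to the NFS request handler. */
enum server_decision {
        SERVE_PROXIED,              /* server performs the I/O itself */
        REDIRECT_TO_LAYOUT,         /* tell the client to go direct */
};

/*
 * Sketch of a server-side policy: if the aggregate proxied load from all
 * clients (including this request) would exceed what the server is willing
 * to carry, hand back a layout so the client accesses storage directly,
 * overriding the client's own threshold-based choice.
 */
static enum server_decision server_policy(const struct server_state *s,
                                          size_t request_size)
{
        if (s->bytes_in_flight + request_size > s->proxied_capacity)
                return REDIRECT_TO_LAYOUT;
        return SERVE_PROXIED;
}
```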

6. Evaluation

To evaluate our policies, we modified the reference implementation to allow the client to dynamically switch between the direct data access path and the proxied data access path based on thresholds set on the read/write access size.

[Figure 16: Single client sequential write, 1GB file, varying access size]

[Figure 17: Single client sequential read, 1GB file, varying access size]

The two graphs (Figures 16, 17) show results for the hybrid implementation, where the client uses the NFS route for all write accesses below the write access-size threshold of 64KB and for all read accesses below the read access-size threshold of 16KB.

In these graphs, we see that the modified implementation follows NFS for write access sizes less than 64KB and the pNFS route for accesses of 64KB onwards. This dynamic switch between NFS and pNFS gives us the best of both worlds and hence provides higher overall performance than either NFS or pNFS alone. Similar results can be seen for reads.
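To make the dynamic switch concrete, here is a minimal sketch of how such a per-request dispatcher might be structured. It is not the CITI prototype's code; the operation enum, the handler callbacks, and the way thresholds are wired in are assumptions introduced for illustration, reusing the thresholds measured in Sections 5.1.1 and 5.1.2.

```c
#include <stddef.h>

/* Thresholds measured for this setup (see Sections 5.1.1 and 5.1.2). */
#define WRITE_DIRECT_THRESHOLD (64 * 1024)
#define READ_DIRECT_THRESHOLD  (16 * 1024)

enum io_op { IO_READ, IO_WRITE };

/* Hypothetical transport hooks standing in for the two access paths. */
struct io_paths {
        int (*proxied)(enum io_op op, void *buf, size_t len);  /* via NFSv4 server */
        int (*direct)(enum io_op op, void *buf, size_t len);   /* via pNFS layout  */
};

/*
 * Sketch of the hybrid client: route each request to the proxied or the
 * direct path depending on the per-operation access-size threshold.
 */
static int hybrid_io(const struct io_paths *p, enum io_op op,
                     void *buf, size_t len)
{
        size_t threshold = (op == IO_WRITE) ? WRITE_DIRECT_THRESHOLD
                                            : READ_DIRECT_THRESHOLD;

        if (len > threshold)
                return p->direct(op, buf, len);   /* fetch layout, access storage directly */
        return p->proxied(op, buf, len);          /* let the NFSv4 server do the access    */
}
```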

pNFS," May 2006. [20th ACM International Conference

on

Supercomputing

(ICS06),

Cairns, Australia, (June 2006).] [2] D. Kotz and N. Nieuwejaar, "Dynamic FileAccess Characteristics of a Production Parallel Scientific

Workload,"

Proceedings

in

of

Supercomputing '94, 1994. [3] A. Purakayastha, C. Schlatter Ellis, D. Kotz, N.

7. Future Work We are currently investigating load balancing policies with custom workloads to analyze and characterize server performance. We are also investigating other parameters such as metadata/data ratio, effect of hot files and effect of layout caching at clients on the overall performance to come up with corresponding policies.

Nieuwejaar, and M. Best, "Characterizing Parallel File-Access Patterns on a Large-Scale Multiprocessor," in Proceedings of the Ninth International Parallel Processing Symposium, 1995. [4] N. Nieuwejaar, D. Kotz, A. Purakayastha, C. Schlatter Ellis, and M. Best, "File-Access Characteristics

of

Parallel

Scientific

Workloads," IEEE Transactions on Parallel and Distributed Systems, (7)10, pp. 1075-1089,

8. Conclusion In this paper we have demonstrated that by subjecting computing environments to diverse file access patterns and researching effect of combination of parameters on overall performance helps us to determine policies to decide optimal data access path. Our evaluation results for single client scenario confirmed the validity of our data access policies. We also conclude that a single client sees the system in isolation and cannot determine system wide policies. Hence in multiple client scenarios, which is normally the case, it would be beneficial to have server side policies. These policies will be governed by the resources at server. Since the server has overall view of the system, it should be able to override client side policies to improve the performance of the system.

1996. [5] E.

Smirni

and

D.A.

Reed,

"Workload

Characterization of Input/Output Intensive Parallel Applications," in Proceedings of the Conference on Modeling Techniques and Tools for Computer Performance Evaluation, 1997. [6] P.E. Crandall, R.A. Aydt, A.A. Chien, and D.A. Reed, "Input/Output Characteristics of Scalable Parallel Applications," in Proceedings of Supercomputing '95, 1995. [7] M.G. Baker, J.H. Hartman, M.D. Kupfer, K.W. Shirriff, and J.K. Ousterhout, "Measurements of a Distributed File System," in Proceedings of the Thirteenth Symposium on Operating Systems Principles, 1991. [8] D. Hildebrand and P. Honeyman, "Exporting

Storage Systems in a Scalable Manner with pNFS," in Proceedings of the 22nd IEEE - 13th

2002. [19] Panasas Inc., "Panasas ActiveScale File System

NASA Goddard Conference on Mass Storage

Datasheet," www.panasas.com, 2003.

Systems and Technologies, Monterey, CA,

[20] S. Shepler, B. Callaghan, D. Robinson, R.

2005.

Thurlow, C. Beame, M. Eisler, and D. Noveck,

[9] IOR Benchmark,

"Network File System Version 4 Protocol

http://www.llnl.gov/asci/purple/benchmarks/li mited/ior [10] W.D.

Norcott

and

D.

Capps,

"IOZone

Filesystem Benchmark," 2003. [11] Amber Palekar† , Rahul Iyer† , “ A C ase fo r N etw o rk A ttached S to rage” M ay 2 0 0 5 C M U CS-05-138 [12] Sun Microsystems Inc., "NFS: Network File System Protocol Specification," RFC 1094, 1989. [13] PVFS2, http://www.pvfs.org/documentation.html [14] pNFS Problem Statement. Garth Gibson, Peter Corbett.

Internet

Draft,

July,

2004.

http://bgp.potaroo.net/ietf/idref/draftgibson-pnfs-problem-statement/ [15] Parallel NFS Requirements and Design Considerations. G. Gibson, B. Welch, G. Goodson, P. Corbett. Internet Draft, October 18, 2004. http://bgp.potaroo.net/ietf/idref/draftgibson-pnfs-reqs/ [16] B. Welch, B. Halevy, D. Black, A. Adamson, and D. Noveck, "pNFS Operations Summary," Internet Draft, draft-welch-pnfs-ops-00.txt, 2004. [17] Cluster File Systems Inc., "Lustre: A Scalable, High- Performance File System," 2002. [18] F. Schmuck and R. Haskin, "GPFS: A SharedDisk File System for Large Computing Clusters," in Proceedings of the USENIX Conference on File and Storage Technologies,

Specification," RFC 3530, 2003.
