Interaction of Access Patterns on dNFSp File System

Rodrigo V. Kassick, Francieli Z. Boito, Philippe O. A. Navaux
Universidade Federal do Rio Grande do Sul, Instituto de Informática
Porto Alegre, Brazil
{rvkassick, francieli.zanon, navaux}@inf.ufrgs.br

Abstract: HPC applications executed in cluster environments often produce large quantities of data that need to be stored for later analysis. An important issue for the performance of these applications is the transport and storage of this data. To provide an efficient data storage facility, cluster systems may provide a parallel file system in which applications can store their data. These systems aggregate the disk and network bandwidth of several individual machines in order to improve the performance of data storage and retrieval. Such storage systems are often shared by the different applications running in the cluster environments in which they are deployed. Concurrent access to the storage service may influence the I/O performance of these applications due to spatial and temporal access patterns. This paper presents a study of the interaction between some well-known access patterns over an NFS-based parallel file system.

Keywords: HPC; Parallel File Systems

I. Introduction

In recent years, cluster architectures have arisen as the de facto standard for High Performance Computing (HPC) due to their versatility, scalability and low cost. Applications developed for such environments are usually distributed among the available nodes and employ message-passing mechanisms to communicate among their instances. This need for communication led to the development and deployment of high-performance networks in clusters, allowing faster communication and, as a consequence, better performance.

In some situations, an application may need not only to exchange data among its instances, but also to store it on permanent media, whether for use in future executions, for analysis with external tools, or for other purposes. For such cases, a distributed file system (DFS) may be suitable. DFSs try to provide a large, high-throughput storage area that can be accessed by a large number of clients, such as the nodes of a cluster. The tasks related to data storage and management are distributed among a set of servers (hence the name distributed file systems) in order to increase data throughput and the system's scalability.

Many distributed file systems have been studied and developed in recent years, such as Lustre [14] and PVFS [5], [22]. While these systems use special protocols to allow the clients to communicate with the distributed servers, dNFSp [1] is a file system that aims to keep compatibility with standard NFS clients while allowing dynamic server configuration and good I/O performance.

When using a DFS, huge amounts of data must be transmitted between the clients and the servers. Moreover, when different applications are executed concurrently on the same cluster, using the same shared parallel file system, their behavior regarding the usage of the common storage area will affect each other. One common behavior of a parallel application is to alternate between a CPU-bound phase, in which it generates data, and an I/O-bound phase, in which it flushes this data to the storage subsystem. In such cases, there is a relatively constant time between two consecutive I/O phases, during which the node processes the data it has read or is about to write. During this time, other clients may be performing their own I/O operations.

This paper studies the I/O performance of two concurrent applications executing over the same shared set of storage nodes using dNFSp. We use two groups of tests to evaluate this behavior. The first one evaluates the impact of differently sized applications running concurrently. The second one simulates the bursty I/O phases separated by idle periods commonly observed in HPC applications that generate a great amount of output data.

The paper is organized as follows: Section II presents related work. Section III gives a brief overview of the temporal access patterns used in the tests. Section IV presents dNFSp, the distributed file system used in our experiments. In Section V we present the tests and the test environment. Section VI presents the results and draws some conclusions from them. Finally, in Section VII we present our conclusions.

II. Related Work

Parallel file systems have long been studied in the area of HPC. Systems like Bigfoot-NFS [11], Vesta [6] and the Zebra file system [10] are early examples of distributed I/O systems. More recently, systems like GPFS [17], Lustre [14] and PVFS [5] have been developed with improved protocols, management tools and access APIs, and have become widely used in large HPC clusters.

Several works have studied the I/O performance of parallel applications with different access patterns on this kind of system. Nieuwejaar et al. [15] used traces from different I/O workloads to understand the performance of two HPC architectures. Their main goal was to discover the common trends in the workloads and to understand which performance characteristics were influenced by architecture-specific features. Seelam et al. [18] used a similar approach on the Blue Gene architecture and used the information obtained to improve the performance of the BTIO benchmark [21], [2] over the GPFS file system.

Tran and Reed [19] studied temporal access patterns present in HPC applications and developed a system able to detect the bursty I/O periods of an application and adjust the system's prefetching policies to improve performance. In Oly and Reed [16], detection of spatial access patterns of distributed applications was used to fine-tune prefetching mechanisms in the file system, adapting it to specific I/O needs. More recently, Byna et al. [4] studied guided prefetching for a parallel file system based on access characteristics of the application; in their work, information about such characteristics was extracted from high-level I/O APIs like MPI-IO.

Gu, Wang and Ross [9] studied a mechanism for data grouping and prefetching on the PVFS file system. In their prototype, this grouping took place only during idle intervals on the I/O server, to avoid competing with clients' I/O requests. This strategy, while profiting from the spatial distribution of data to improve performance, relied on the temporal access patterns of the application to allow the system to reorganize data without severely impacting the I/O performance of running applications.

These works focus on the behavior and performance of a single application over the distributed storage. In this paper, we study the I/O performance of applications running concurrently on a shared DFS and how the performance of one influences the performance of the other.

III. Temporal Access Patterns

Scientific applications usually need access to large data sets used as input and, after processing such input, generate a considerable amount of data that may be written to a high-performance storage system. This access to the data may happen in several ways, according to the needs of the application and the choices made during development. Applications may read portions of the input as they finish processing previously accessed data, as in the case of mpiBLAST, an MPI-based parallel implementation of BLAST for genomic sequence search [12], [7]. Input data may also be accessed only during the process-creation phase, so that all the computation is done over an initial dataset. The output, in this case, may be generated along the whole execution. This is the case for ESCAT, a parallel implementation of the Schwinger multichannel method for the study of low-energy electron-molecule collisions [20].
Some applications may choose to write partial results from time to time, creating an execution checkpoint that can later be recovered to continue a simulation, as in FLASH, a code for astrophysical simulations [8]. In such cases, access to the file system may be bursty during some periods of time, followed by periods of little or no activity from the nodes of the application. The history of the I/O periods of an application is called its temporal access pattern.

Figure 1 shows an access trace taken from dNFSp while a synthetic benchmark executes writes to the file system, each followed by a pre-defined idle time. Each cross indicates a request sent from one client to the file system. While the client with ID 1 performs few accesses to the storage system, the other clients present bursts of I/O requests separated by intervals in which the application executes other activities. This trace is an example of an application with a well-defined temporal access pattern (I/O followed by a constant idle time) as seen from the point of view of the file system. Although requests are generated in a regular fashion on the clients, their reception and processing on the I/O servers is less regular. With this in mind, it becomes important to study how much other applications may profit from the idle intervals on the server to perform their own I/O requests with better performance.

When more than one application shares the storage system, such temporal patterns may influence the I/O performance of all the applications in execution. Applications with a large I/O inactivity time may have little influence over the overall performance of the system, while long-running I/O-bound applications may degrade the performance of less data-intensive applications.
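To make this pattern concrete, the following sketch shows the overall shape of such a checkpoint-style loop. It is a minimal illustration under assumed parameters (the phase lengths, file names and the compute_step kernel are hypothetical), not code from any of the applications cited above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical compute kernel standing in for the CPU-bound phase. */
static void compute_step(double *data, size_t n) {
    for (size_t i = 0; i < n; i++)
        data[i] = 0.5 * (data[i] + 1.0);
}

int main(void) {
    enum { N = 1 << 20, STEPS = 100, CKPT_EVERY = 10 };
    double *data = calloc(N, sizeof *data);
    if (!data) return 1;

    for (int step = 0; step < STEPS; step++) {
        compute_step(data, N);            /* CPU-bound phase: no DFS traffic */

        if (step % CKPT_EVERY == 0) {     /* I/O-bound phase: write burst */
            char path[64];
            snprintf(path, sizeof path, "ckpt-%04d.dat", step);
            FILE *f = fopen(path, "wb");
            if (!f) return 1;
            fwrite(data, sizeof *data, N, f);  /* burst as seen by the DFS */
            fclose(f);
        }
    }
    free(data);
    return 0;
}
```

From the file system's point of view, each pass through the checkpoint branch produces a burst of write requests, and the compute phase between checkpoints produces the idle intervals visible in Figure 1.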

Figure 1. I/O requests over time on the dNFSp file system (x-axis: time in ms; y-axis: client ID).

Figure 2. dNFSp architecture.

IV. dNFSp

dNFSp (distributed NFS parallel) is an extension of the traditional NFS implementation that distributes the functionality of the server over several processes on the cluster. The idea behind dNFSp is to improve performance and scalability and, at the same time, keep the system simple and fully compatible with standard NFS clients.

Similarly to PVFS, dNFSp makes use of I/O daemons (IODs) to access data on the disks. The role of the NFS server is played by the meta-data servers, or meta-servers. These are seen by the clients as regular nfsd servers; when a request for a given piece of data is received, the meta-server forwards it to the corresponding IOD, which performs the operation and answers to the client.

dNFSp distributes the meta-server function onto several computing nodes. The I/O daemons are shared by all meta-servers, but each meta-server is associated with only a subset of the clients. Thus, if all the clients access the file system at the same time, the load is distributed among the meta-servers. This is especially important for write operations, since clients send whole data blocks to the meta-servers.

Figure 2 illustrates a typical read operation in dNFSp. The client sends a normal NFS read request to its associated meta-server. The meta-server then forwards the request to the IOD storing that piece of the file. After processing the request, the IOD answers directly to the client. Write operations follow the same process, but since the whole block of data needs to be sent from the client to the meta-server and then to the IOD, this double copy makes writes significantly slower than reads.

The synchronization of the meta-servers is done via an algorithm based on LRC (Lazy Release Consistency), in which data is updated on a node only when that node needs access to it. The employment of LRC allows nodes to have outdated information, leaving the system partially unsynchronized. Meta-data information for each file is assigned to a specific meta-server via a hashing function. The designated meta-server is responsible for keeping track of the changes made to this file, and every time another server needs up-to-date information, it requests a copy of the meta-data from the designated server.

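As an illustration of this last point, the snippet below sketches how a file's meta-data could be mapped to a meta-server by hashing its file handle. The paper does not specify dNFSp's actual hash function, so the FNV-1a hash and the opaque-handle representation used here are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* FNV-1a over an opaque NFS file handle. The hash choice and handle
 * representation are assumptions; dNFSp's real function is unspecified. */
static uint64_t fnv1a(const unsigned char *buf, size_t len) {
    uint64_t h = 14695981039346656037ULL;  /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 1099511628211ULL;             /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Deterministically pick the meta-server that owns a file's meta-data,
 * so every node agrees on the owner without extra communication. */
static int metaserver_for(const unsigned char *fhandle, size_t len,
                          int n_metaservers) {
    return (int)(fnv1a(fhandle, len) % (uint64_t)n_metaservers);
}
```

The point of such a scheme is that ownership is a pure function of the file handle: any meta-server can compute which peer to ask for up-to-date meta-data.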
V. Two-Fold Application Interaction on dNFSp

The temporal access patterns of applications running concurrently on a cluster may influence their I/O performance. It is important to understand this interaction on the dNFSp file system, due to the inherently lower performance of write operations, and to identify the point at which this influence becomes visible to other applications.

To study the performance of dNFSp under the load of applications with distinct behaviors, we executed two sets of tests using MPI-IO Test 21 [13]. This synthetic benchmark uses MPI and MPI-IO to simulate an application writing a specified number of objects of a given size. The tests consist of two instances of the benchmark, called foreground and background respectively, over a shared dNFSp system. The performance of the foreground test was measured as the configuration of the background test changed. This allows us to evaluate the interference of the I/O behavior of the background instance on the foreground one.

We evaluated two different scenarios with this approach. In the first one, we executed the two instances of the test concurrently over independent sets of clients, changing the number of nodes involved in the I/O. The objective of this test is to evaluate the performance of an I/O-bound execution under the influence of concurrency on the shared system, and how dNFSp scales over two independent sets of nodes.

The second scenario evaluates the I/O performance of the two instances of the test in the presence of different temporal access patterns. The foreground instance executes a modified version of MPI-IO Test 21 that allows configuring a time to wait between reading or writing two consecutive objects. This wait time simulates the time that a real application would spend operating on the data stored in the file system. Both instances are executed for a fixed time, to make sure that both make concurrent accesses to the file system. During this period, both applications write as many objects as the execution time allows. A sketch of this modified benchmark loop is shown below.

The tests were executed on the Pastel cluster, part of the Toulouse site of Grid'5000 [3]. All the nodes are equipped with 2.6 GHz AMD Opteron processors, 8 GB of RAM and 250 GB SATA hard disks. The nodes are interconnected by a Gigabit Ethernet network. dNFSp was configured with 6 nodes, each acting as both meta-server and IOD. 24 nodes acted as clients, evenly distributed among the meta-servers. These nodes were divided into two sets, each executing one instance of the benchmark. When all clients make use of the file system, each meta-server is responsible for the requests of at most 4 clients. Each configuration of the tests was executed 6 times. The bandwidth presented in the graphs is the average write bandwidth over these executions.

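The sketch below shows the shape of the modified benchmark loop described above: each rank writes its objects to a shared file, sleeping between consecutive writes, and reports the fraction of wall time spent in I/O (the time ratio analyzed in Section VI). It is a hedged reconstruction of the described behavior, not the actual MPI-IO Test 21 source; the object size, object count, wait time and file name are illustrative parameters.

```c
#include <mpi.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Illustrative parameters: 128 KB objects, 1024 per rank, wait in us. */
    const size_t obj_size = 128 * 1024;
    const int n_objs = 1024;
    const useconds_t wait_us = 500 * 1000;   /* inter-write idle time */

    char *obj = calloc(obj_size, 1);
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    double t_io = 0.0, t0 = MPI_Wtime();
    for (int i = 0; i < n_objs; i++) {
        /* Each rank writes to its own strided region of the shared file. */
        MPI_Offset off = ((MPI_Offset)i * nprocs + rank) * obj_size;
        double w0 = MPI_Wtime();
        MPI_File_write_at(fh, off, obj, (int)obj_size, MPI_BYTE,
                          MPI_STATUS_IGNORE);
        t_io += MPI_Wtime() - w0;
        usleep(wait_us);                 /* simulated compute phase */
    }
    MPI_File_close(&fh);

    double total = MPI_Wtime() - t0;
    if (rank == 0)
        printf("time ratio (I/O / total) = %.2f\n", t_io / total);
    free(obj);
    MPI_Finalize();
    return 0;
}
```

In the first scenario the wait time is simply zero, so the loop degenerates into a continuous burst of writes.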
operations. It’s also important to understand the point at which this influence becomes visible to other applications. To study the performance of dNFSp under the load from applications with distinct behaviors, we executed two sets of tests using MPI-IO Test 21[13]. This synthetic benchmark uses MPI and MPI-IO to simulate an application writing an specified number of objects of a given size. The tests are constituted of two instances of the benchmark, called foreground and the background respectively, over a shared dNFSp system. The performance of the foreground test was measured as the configuration of the background test changed. This allows us to evaluate the interference of the I/O behavior of the background instance over the foreground. We evaluated two different scenarios with this approach. In the first one, we executed the two instances of the test concurrently over independent sets of clients, changing the number of nodes involved in the I/O. The objective of this test is to evaluate the performance of an IO-bound execution under the influence of concurrency in the shared system and how dNFSp scales over to independent sets of nodes. The second scenario evaluates the I/O performance of the two instances of the test on the presence of different temporal access patterns. The foreground instance executes a modified version of MPI-IO Test 21 that allows configuring a time to wait between reading or writing two objects. This wait time simulates the time that a real application would spend operating over the data stored in the file system. Both instances are executed during a fixed time to make sure that both are making concurrent access to the file system. During this period, both applications write as many objects as the execution time allows them to. The tests were executed on the Pastel cluster, part of the Toulouse site of Grid5000 [3]. All the nodes are equipped with AMD Opteron processors of 2.6GHz, 8Gb of RAM and 250GB SATA hard disks. The nodes are interconnected by a Gigabit Ethernet network. dNFSp was configured with 6 nodes, each acting as both meta-server and IOD. 24 nodes acted as clients, being evenly distributed among the meta-servers. These nodes were divided in two sets, each executing an instance of the benchmark. When all clients make use of the file system, each meta-server is responsible for the requests of at most 4 clients. Each configuration of the tests was executed 6 times. The bandwidth presented in the graphs is the average write bandwidth taken from these executions. VI. R ESULTS In this section we present the results obtained with the benchmarks described in the previous section. Each section describes the results obtained in one of the scenarios described previously. A. I/O Bound Applications Interaction In this test, the objective is to evaluate the performance of the foreground instance of the benchmark for different number of nodes both in the foreground and background instances. Figure 3 presents the write bandwidth of the foreground instance of the benchmark. Each line represents the bandwidth for a given number of nodes in foreground. The X-axis represents the number of nodes used in the background instance of the benchmark. Figure 3a shows the results for when the number of objects of the benchmark is constant per node - i.e., the more nodes executing the benchmark, the more data is generated in the file system. In figure 3b, the number of objects written by each node was set as to generate the same final amount of data. 
For this test, object size was set to 128Kb. For the constant number of objects case, the number of objects was set to 1024, resulting in 128Mb for each node. In the second case, the total amount of data was set to 1Gb for each application. The results indicate that an application with few nodes performing intensive I/O do not suffer significantly with the interference of other applications. For the 2-processors foreground instance, the performance fell from 25.75M b/s to 17.83M b/s (69%) when the number of processors in the background ranged from 2 to 12 in (a) and from 24.99M b/s to 24.32M b/s (97%) in (b). The total bandwidth of the file system (bandwidth from both instances), on the other hand, went from 51.49M b/s to 127.16M b/s in (a) and 49.82M b/s to 143.88M b/s in (b). This stability in the performance of the small-sized benchmark is mainly due the NFS client-side caching policies. Even on bursty I/O phases, the write operations are delayed on the client, avoiding that they make full use of the available bandwidth on the file system. This way, the background instance of the benchmark can scale in size, improving it’s performance and the overall performance of the system. As the foreground benchmark size grows, sensibility to concurrency begins to stand out more clearly, as we see the bandwidth fall from 109M b/s to 76.87M b/s for 12 processors in case (a). While the absolute difference becomes more noticeable, the fall ratio from 2 to 12 is close to 70% for all cases in (a). The same does not happen when the total data size is constant: the ratio ranges from 97% for foreground with 2 processors to 66% for 6 and 74% for 12. Since more clients perform better on dNFSp and the amount of data per node decreases with

Figure 3. Interaction between two identical applications with different numbers of nodes: (a) constant number of objects; (b) constant total data generated. (Each line shows the foreground write bandwidth for a given foreground size; x-axis: number of nodes in the background instance.)

Figure 4. Baseline and combined bandwidth for the 12-clients-foreground and background instances.

This way, the slice of time in which both instances execute concurrently gets smaller as the numbers of nodes of the two instances diverge. This explains the higher bandwidth obtained in this case.

The results also showed sensitivity to the alignment of nodes over the meta-servers. Since the distribution of clients over the file system is static, different numbers of clients in different applications may leave a few meta-servers overloaded. This can be observed in Figure 3a for a foreground with 6 and 12 clients: when the background has 6 clients, the number of clients on each meta-server is the same. As the number of clients in the background instance grows to 8, the distribution becomes unbalanced and the performance drops more significantly.

Figure 4 shows the combined bandwidth of the two instances of the benchmark for case (a) with 12 clients in the foreground instance. This combined bandwidth is compared with a baseline execution, i.e. the bandwidth of an execution of MPI-IO Test 21 with the same total number of nodes as the two instances of the concurrent benchmark. As we can see, the baseline bandwidth grows rapidly until it reaches 18 clients (3 clients per meta-server). After that, it continues growing at a smaller rate. Since the clients in the baseline execution are evenly distributed over the meta-servers, its performance does not suffer from alignment problems. The combined bandwidth, on the other hand, drops once the number of clients gets too unbalanced, reaching the same performance as the baseline once a good alignment is reached again (24 clients, 4 per meta-server). These results show that, while dNFSp is able to scale the overall bandwidth of the system in the presence of concurrent applications, performance may be negatively influenced by bad client-to-server alignment.
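The alignment effect can be reproduced with simple arithmetic. The sketch below assumes, for illustration, that each application's clients are assigned round-robin to the meta-servers independently of the other application (the paper only states that the distribution is static); it prints the per-server client counts for the 12-plus-8 case discussed above.

```c
#include <stdio.h>

enum { N_META = 6 };

/* Clients of each application assigned round-robin to meta-servers,
 * independently of the other application (assumed policy). */
int main(void) {
    const int fg = 12, bg = 8;       /* foreground and background clients */
    int load[N_META] = {0};

    for (int c = 0; c < fg; c++) load[c % N_META]++;
    for (int c = 0; c < bg; c++) load[c % N_META]++;

    for (int m = 0; m < N_META; m++)
        printf("meta-server %d: %d clients\n", m, load[m]);
    /* fg=12, bg=6 gives 3 clients everywhere (balanced);
     * fg=12, bg=8 gives 4,4,3,3,3,3: two meta-servers are overloaded. */
    return 0;
}
```

Under this assumed policy, two of the six meta-servers carry an extra client each, which is consistent with the performance drop observed when the background grows from 6 to 8 clients.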

B. Temporal Pattern Interaction


The objective of this test is to study the bandwidth of both instances of the benchmark in the presence of a well-defined temporal access pattern. While the background instance executes the standard benchmark, the foreground instance was configured to wait a predetermined amount of time between its I/O operations.

Figure 5 shows the performance of the I/O portion of the benchmark under different wait intervals. The x-axis represents the amount of time that the foreground instance waits before writing each of its objects. The tests were executed with object sizes of 128 KB, 2 MB and 4 MB, to allow us to study the relation between the amount of data written and the amount of idle I/O time. The foreground and background instances of the benchmark were executed with 12 clients each. The graph presents only the I/O bandwidth, not including the time spent between writes.

As we see in Figure 5a, with intervals ranging from 10 to 500 ms, the write bandwidth for the three object sizes remained roughly constant. The 128 KB object size apparently takes better advantage of the client's caching subsystem, since it outperformed the cases with larger write units. As the interval reaches 5 seconds, the bigger object sizes show a significant decrease in performance, while the small object size stays close to its original performance.

In Figure 5b, we see the performance of the background instance of the benchmark. Since this instance does not wait for the specified interval, it profits from the idle times of the foreground instance. With 128 KB objects, the background instance shows a significant performance improvement as the interval increases. This is due to the very short I/O bursts from the foreground, leaving the system free to process the requests from the background. For 2 MB and 4 MB objects, on the other hand, the I/O bursts from the foreground instance are longer, and even with long periods of exclusive access, the background instance gets a more modest increase in its write bandwidth.

Figure 5. Bandwidth for different wait times: (a) foreground; (b) background. (Lines: object sizes of 128 KB, 2 MB and 4 MB; x-axis: wait interval.)

One question that arises from these tests is how much the bandwidths of the two instances diverge for the different intervals, and at what point this difference starts to become noticeable. Figure 6 shows the write bandwidth of the foreground and background instances of the benchmark, and their combined bandwidth, for object sizes of 128 KB, 2 MB and 4 MB. Immediately below each case is the time ratio of the foreground instance, i.e. the ratio between the time spent on I/O and the total execution time, including the waits.

As seen in Figure 6a, the bandwidth of both instances is very close for intervals of 1 and 10 ms, diverging for larger intervals. The time ratio is 0.9, 0.74 and 0.41 for intervals of 1, 10 and 50 ms respectively. In Figure 6b, the divergence starts at intervals of 500 ms, when the time ratio drops from 0.87 to 0.61. Figure 6c shows that with 4 MB objects the divergence starts at the same point as with 2 MB, with the ratio dropping from 0.93 at 100 ms to 0.74 at 500 ms.

This leads us to infer that the minimum time ratio at which both instances still have similar performance is proportional to the size of the objects. This is due to the relation between the wait interval and the amount of data written after it. With small objects, the idle interval can become significantly longer than the time to write one single object. Since such an interval occurs between every two object writes, the total execution time becomes dominated by the idle time, and the ratio tends to fall more quickly with longer intervals. With bigger objects, small idle times are less significant to the total execution time, and so the ratio takes longer to fall.
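This argument can be stated compactly. As a rough model (an assumption of ours, consistent with the reasoning above), suppose each of the n objects takes a write time t_w(s) that grows with the object size s, and a fixed interval t_i elapses between consecutive writes. The time ratio is then

```latex
r = \frac{n\, t_w(s)}{n\, t_w(s) + (n-1)\, t_i} \approx \frac{t_w(s)}{t_w(s) + t_i}
```

For a fixed interval t_i, a larger object size increases t_w(s) and keeps r close to 1, matching the observation that the ratio falls later for 2 MB and 4 MB objects than for 128 KB ones.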

Nonetheless, as the idle time increases, the background profits from the higher bandwidth available during these periods of inactivity and, as a result, increases its I/O bandwidth. The reason for the decrease in the bandwidth of the foreground instance with longer intervals, seen for object sizes of 2 and 4 MB, remains to be studied.

The results show that dNFSp suffers some contention in the presence of temporal access patterns. The combined bandwidth for 128 KB objects was shown in Section VI-A to reach up to 150 MB/s, but peaked at only 106 MB/s in these tests. This degradation was seen in both the foreground and background instances of the benchmark. The lower performance of the foreground was expected, since it issues bursty requests to an already overloaded file system, and the idle times could influence the production of ready-to-send NFS requests in the client's cache. Since the background instance also had lower performance (from the tests shown previously, the background was expected to reach around 80 MB/s), we can affirm that the dNFSp servers themselves are influenced by the presence of the temporal access pattern.

Figure 6. Bandwidth divergence and time ratio: (a) 128 KB; (b) 2 MB; (c) 4 MB. (Foreground, background and combined bandwidth per wait interval; the time ratio of the foreground instance is shown below each case.)

VII. Conclusions and Future Work

In this paper, we have shown the behavior of the dNFSp file system when two different applications make heavy use of it. We tested two distinct scenarios that we consider relevant to the use cases where dNFSp might be employed.

The results for the first scenario indicate that the performance of an individual application is affected by the execution of other applications over the shared system, but that the performance of the file system as a whole is not degraded by the individual executions. The results also highlighted performance issues when the distribution of clients over the servers is uneven, decreasing the combined bandwidth of the system due to the overloaded servers.

For the second scenario, we observed that long idle I/O periods in one application allow for increased I/O performance in a concurrent execution. On the other hand, the performance of both applications was below that observed in the first scenario, indicating that dNFSp suffers deterioration as a whole under spaced, bursty I/O periods. We also observed that the time ratio at which the performance of the two applications diverges is proportional to the size of the objects written to the system.

References

[1] R. B. Ávila, P. O. A. Navaux, P. Lombard, A. Lebre, and Y. Denneulin. Performance evaluation of a prototype distributed NFS server. pages 100–105.
[2] D. H. Bailey. The NAS parallel benchmarks. International Journal of Supercomputer Applications, 5(3):63–73, 1991.
[3] R. Bolze, F. Cappello, E. Caron, M. Dayde, F. Desprez, E. Jeannot, Y. Jegou, S. Lanteri, J. Leduc, N. Melab, G. Mornet, R. Namyst, P. Primet, B. Quetier, O. Richard, E.-G. Talbi, and I. Touche. Grid'5000: A large scale and highly reconfigurable experimental grid testbed. International Journal of High Performance Computing Applications, 20(4):481–494, 2006.
[4] S. Byna, Y. Chen, X.-H. Sun, R. Thakur, and W. Gropp. Parallel I/O prefetching using MPI file caching and I/O signatures. In SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, pages 1–12, Piscataway, NJ, USA, 2008. IEEE Press.

[5] P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur. PVFS: A parallel file system for Linux clusters. In Proceedings of the 4th Annual Linux Showcase and Conference (ALS'00), page 28, Berkeley, CA, USA, 2000. USENIX Association.
[6] P. F. Corbett and D. G. Feitelson. The Vesta parallel file system. ACM Transactions on Computer Systems, 14(3):225–264, 1996.
[7] A. E. Darling, L. Carey, and W.-C. Feng. The design, implementation, and evaluation of mpiBLAST. In Proceedings of ClusterWorld 2003, 2003.
[8] B. Fryxell, K. Olson, P. Ricker, F. X. Timmes, M. Zingale, D. Q. Lamb, P. MacNeice, R. Rosner, J. W. Truran, and H. Tufo. FLASH: An adaptive mesh hydrodynamics code for modeling astrophysical thermonuclear flashes. The Astrophysical Journal Supplement Series, 131(1):273–334, 2000.
[9] P. Gu, J. Wang, and R. Ross. Bridging the gap between parallel file systems and local file systems: A case study with PVFS. In Proceedings of the 37th International Conference on Parallel Processing (ICPP '08), pages 554–561, September 2008.
[10] J. H. Hartman and J. K. Ousterhout. The Zebra striped network file system. ACM Transactions on Computer Systems, 13(3):274–310, 1995.
[11] G. H. Kim, R. Minnich, and L. McVoy. Bigfoot-NFS: A parallel file-striping NFS server (extended abstract), 1994.
[12] H. Lin, X. Ma, P. Chandramohan, A. Geist, and N. Samatova. Efficient data access for parallel BLAST. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, April 2005.
[13] Los Alamos National Laboratory (LANL). MPI-IO Test 21, 2009. Available at http://institutes.lanl.gov/data/software/#mpi-io.
[14] Sun Microsystems. Lustre file system. White Paper, 2008. Available at http://www.sun.com/software/products/lustre/docs/lustrefilesystem_wp.pdf.
[15] N. Nieuwejaar, D. Kotz, A. Purakayastha, C. S. Ellis, and M. L. Best. File-access characteristics of parallel scientific workloads. IEEE Transactions on Parallel and Distributed Systems, 7(10):1075–1089, 1996.
[16] J. Oly and D. A. Reed. Markov model prediction of I/O requests for scientific applications. In Proceedings of the 16th International Conference on Supercomputing (ICS '02). ACM Press, 2002.
[17] F. Schmuck and R. Haskin. GPFS: A shared-disk file system for large computing clusters. In FAST '02: Proceedings of the 1st USENIX Conference on File and Storage Technologies, page 19, Berkeley, CA, USA, 2002. USENIX Association.
[18] S. Seelam, I.-H. Chung, D.-Y. Hong, H.-F. Wen, and H. Yu. Early experiences in application level I/O tracing on Blue Gene systems. In IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2008), pages 1–8, April 2008.
[19] N. Tran and D. A. Reed. ARIMA time series modeling and forecasting for adaptive I/O prefetching. In Proceedings of the 15th International Conference on Supercomputing (ICS '01), pages 473–485, New York, NY, USA, 2001. ACM.
[20] C. Winstead, H. Pritchard, and V. McKoy. Parallel computation of electron-molecule collisions. IEEE Computational Science & Engineering, 2(3):34–42, 1995.
[21] P. Wong and R. F. Van der Wijngaart. NAS parallel benchmarks I/O version 2.4. Technical Report NAS-03-002, NASA Advanced Supercomputing (NAS) Division, Moffett Field, CA, January 2003.
[22] Y. Zhu, H. Jiang, X. Qin, D. Feng, and D. R. Swanson. Design, implementation and performance evaluation of a cost-effective, fault-tolerant parallel virtual file system. In International Workshop on Storage Network Architecture and Parallel I/Os, held with the 12th International Conference on Parallel Architectures and Compilation Techniques, 2003.
