Distributed Computing Infrastructure as a Tool for e-Science

Jacek Kitowski¹,², Kazimierz Wiatr¹,³, Łukasz Dutka¹, Maciej Twardy¹, Tomasz Szepieniec¹, Mariusz Sterzel¹, Renata Słota¹,², and Robert Pająk¹

¹ AGH University, ACC Cyfronet AGH, Kraków, Poland
[email protected]
² Department of Computer Science, AGH University, Kraków, Poland
³ Department of Electronics, AGH University, Kraków, Poland
Abstract. For several years now, scientists in Poland have been able to use the resources of the PLGrid distributed computing infrastructure. It is a flexible, large-scale e-infrastructure, which offers homogeneous, easy-to-use access to organizationally distributed, heterogeneous hardware and software resources. It is built in accordance with good organizational and engineering practices, taking advantage of international experience in this field. Since scientists need assistance and close collaboration with service providers, the e-infrastructure is based on users' requirements and needs coming from different scientific disciplines, and is equipped with specific environments, solutions and services suitable for various disciplines. All these tools help to lower the barriers that hinder researchers from using the infrastructure.

Keywords: Distributed infrastructure · IT tools and services · Computing platforms · Clouds and grids

1 Introduction

The main goal of research is the scientific discovery of unknown phenomena. Among the three typical methodologies that make new findings possible – theoretical approaches using sophisticated analytical methods, experimental investigations with (usually) big and expensive installations, and computational studies making wide use of information technology (IT) – the last has been growing in popularity. Due to the complexity of most current problems, it is natural to harness the IT approach both for basic research, especially at extreme dimension/time scales, and for versatile analysis of big data that already exists or is derived from experiments. Hence, computing infrastructures have made an ever-increasing contribution to e-Science research, while confronting users with demanding technological obstacles due to the complexity of the IT stack. To shield users from thorny technical problems and to offer them the most efficient and convenient way of doing research on the frontiers and challenges of current science, the creation of a more flexible and easy-to-use ecosystem is required.

© Springer International Publishing Switzerland 2016. R. Wyrzykowski et al. (Eds.): PPAM 2015, Part I, LNCS 9573, pp. 271–280, 2016. DOI: 10.1007/978-3-319-32149-3_26


In this paper we present assumptions and foundations of the distributed computing e-infrastructure as a tool for e-Science. The presented use case covers its implementation for Polish scientists.

2 Issues for e-Infrastructure Creation

Creation of an e-infrastructure requires a synergistic effort in several dimensions:

1. Meeting user demands in the field of grand-challenge applications. The activity toward a new e-infrastructure should be supported by a significant group of users with real scientific achievements and wide international collaboration, as well as by well-defined requirements.

2. Organizational, which is probably the most important dimension, though the most difficult in reality. Two perspectives are significant – horizontal and vertical – equally important and complementing each other. In the horizontal perspective, a federation of computer centres is proposed, supporting the e-infrastructure with different kinds of resources and competences to cover the interests of different groups of users. Some evident topics are to be addressed, like policy, collaboration rules, and the duties and privileges of each participant, to ensure smooth and secure operation. Another feature to be attained is efficient use of federation resources, by evaluating computational projects from the community in order to grant them the most appropriate software and hardware environments. In the vertical perspective, the organizational involvement of computer, computational and domain-specific experts in e-infrastructure operations is to be introduced, for the development of hardware and software environments best suited to the users and directly dedicated to their needs. This kind of collaboration provides the scientific community with the necessary expertise, support from a structured, multi-level helpdesk, and training for easy and efficient research using the e-infrastructure. A good example of such an organization is the Gauss Centre for Supercomputing [1].

3. Technological, which covers several issues, including different computing hardware and software supported with scientific libraries, as well as a portfolio of middleware environments (e.g. gLite, UNICORE, QCG, or generic cloud stacks like OpenNebula) and user-friendly platforms and portals. On that basis, more sophisticated, tailored programming solutions can be developed.

4. Energy awareness, a relatively recent concern. The problems faced include optimal scheduling strategies for computing jobs among federation resources so as to minimize energy consumption as a whole. As the scale of resources and the number of jobs increase, this problem becomes more critical than ever (e.g. [2]). Energy awareness also influences the selection of computing hardware.
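The energy-aware scheduling problem mentioned above can be illustrated with a toy greedy heuristic. This is a hypothetical sketch, not PLGrid's actual scheduler: the sites, their per-core power draw and the simple energy model are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    watts_per_core: float  # assumed average power draw per core
    free_cores: int

@dataclass
class Job:
    name: str
    cores: int
    hours: float

def schedule_energy_aware(jobs, sites):
    """Greedily place each job on the site with the lowest estimated
    energy cost (power per core x cores x hours) that still has
    enough free cores; largest jobs are placed first."""
    placement = {}
    for job in sorted(jobs, key=lambda j: j.cores * j.hours, reverse=True):
        candidates = [s for s in sites if s.free_cores >= job.cores]
        if not candidates:
            placement[job.name] = None  # no site can accept the job now
            continue
        best = min(candidates,
                   key=lambda s: s.watts_per_core * job.cores * job.hours)
        best.free_cores -= job.cores
        placement[job.name] = best.name
    return placement
```

A real federation scheduler would additionally account for queue waiting times, data locality and dynamic power states, but the core trade-off – cheapest energy among feasible sites – stays the same.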

3 Case Study of e-Infrastructure Conceptualization and Implementation

Due to a large funding initiative in Poland, and in response to the requirements of scientists, the Polish Grid Consortium was established in 2007, involving five
of the largest Polish supercomputing centres: ACC Cyfronet AGH in Kraków (the coordinator), ICM in Warsaw, PCSS in Poznań, CI TASK in Gdańsk and WCSS in Wrocław. Members of the Consortium agreed to work as a federation to commence and jointly participate in the PLGrid Programme, to create a nationwide e-infrastructure and significantly extend the amount of computing resources provided to the scientific community. To date, the PLGrid Programme has been realized in several stages, completed in subsequent projects:

– PL-Grid Project (2009–2012), aimed at providing the scientific community with basic IT platforms and computing services offered by the Consortium, initiating the realization of the e-Science model in various scientific fields. One measure of success was the ranking of all partners' resources on the TOP500 list (with a total performance of 230 Tflops) in fall 2011, with the Zeus cluster in Cyfronet placed at the 81st position.

– PLGrid Plus Project (2011–2015), focused on users and involving three kinds of contractors: computer, computational and domain-specific experts. It resulted in the introduction of 13 scientific domains with specialized software and hardware solutions, together with portals and environments. The total computational power offered by the Consortium was increased by an additional 500 Tflops.

– PLGrid NG Project (2014–2015), targeting further development of the scientific domains by including 14 subsequent scientific areas in the project, due to the rapid increase in demand for services for researchers in other fields. The new domain-specific services cover a wide range of specialties – including provision of specialized software, mechanisms of data storage, and modern platforms integrating new types of tools and specialized databases – to speed up obtaining scientific results as well as streamline and automate the work of research groups.

– PLGrid Core Project (2014–2015), which affirmed the recognition of Cyfronet as a National Centre of Excellence, constituting the next step towards cloud computing and handling big data calculations. It aims not only at an extension of the hardware and software portfolio, but also at dedicated accompanying facilities; one of them, a new backup Data Centre, is on the agenda. A new HPC asset called Prometheus, with 1.7 Pflops, was installed and put into operation for the community in May 2015.

It is worth mentioning that the number of users is currently close to 4000; they publish regularly in highly ranked international journals, often with international collaborators, and many international projects, funded by FP6, FP7, RFCS, EDA and other international agencies and collaborations, are ongoing with the help of the PLGrid infrastructure. Two books on the computing environments, portals, solutions and approaches developed during the Programme have been published by Springer [3,4].

4 PLGrid Platforms – Selected Examples

The computing infrastructure offered by PLGrid is not limited to high performance computing clusters and large storage resources. A set of platforms, tools and services is provided; these hide the complexity of the underlying IT infrastructure and, at the same time, expose the actual functions that matter for performing research. This section describes the capabilities of selected tools.

4.1 GridSpace – Web-Enabled Platform for Reproducible, Reviewable and Reusable Distributed Scientific Computing

GridSpace2 [5], built on top of the provided computing capabilities, enables scientists to easily create and run so-called in silico experiments, which are characterized by: (a) reproducibility – the ability to effortlessly run the experiment at another time, by another researcher or user, on other computing capacity or using alternative software; (b) reviewability – the ability to effectively examine, verify, assess, test and scrutinize the experiment; (c) reusability – the ability to smoothly apply the experiment to another case, for another purpose or in another context.

Fig. 1. GridSpace2 platform layers

GridSpace2 experiments are fully immersed in the World Wide Web and structured as workflows composed of code and data items (see Fig. 1). Code items can be written in diverse programming languages and are interpreted by so-called interpreters, which are implemented as executables and executed through executors on the underlying e-infrastructure. Executables installed on the e-infrastructure carry out computations, while executors manage and orchestrate computation and data flow. Data items are simply file system elements that are read and/or written when executing code items. In the web layer, code and data items are embeddable as HTML iframe elements, which makes it possible to create mash-up web pages integrating content of various types and sources, including interactive GridSpace2 experiment items.

GridSpace2 is a generic and versatile platform that has been applied in experiments from various scientific domains, such as chemistry, material and urban engineering, physics and medicine. It was also adopted as the technology behind executable scientific papers, namely the Collage Authoring Environment [6], which was integrated with the Elsevier ScienceDirect portal and powered the first scientific journal issue featuring executable papers.
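The code-item/interpreter model described above can be mimicked in a few lines. The sketch below is not the GridSpace2 API; it only illustrates the idea that an experiment is an ordered list of code items, each run by its own interpreter, exchanging data items through shared files.

```python
import pathlib
import subprocess
import sys

def run_experiment(code_items, workdir):
    """Execute each code item with its interpreter, in order.
    Code items share 'data items' (plain files) through workdir,
    so rerunning the same items yields the same results."""
    workdir = pathlib.Path(workdir)
    for i, (interpreter, source) in enumerate(code_items):
        script = workdir / f"item_{i}.src"
        script.write_text(source)
        subprocess.run(interpreter + [str(script)], cwd=workdir, check=True)

# Two Python code items: the first writes a data item,
# the second reads it and produces a derived data item.
experiment = [
    ([sys.executable],
     "import pathlib; pathlib.Path('numbers.txt').write_text('1 2 3')"),
    ([sys.executable],
     "import pathlib; "
     "vals = pathlib.Path('numbers.txt').read_text().split(); "
     "pathlib.Path('sum.txt').write_text(str(sum(map(int, vals))))"),
]
```

In the real platform the interpreter list would include entries for other languages, and executors would dispatch the items to remote e-infrastructure resources rather than running them locally.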

4.2 InSilicoLab – Science Gateway Framework

InSilicoLab [7] is a framework for building application portals, also called Science Gateways. The goal of the framework is to create gateways that, on the one hand, expose the power of large distributed computing infrastructures to scientists and, on the other, allow users to conduct in silico experiments in a way that resembles their usual work. Scientists using such an application portal can treat it as a workspace that organizes their data and allows for complex computations in a manner specific to their domain of science.

An InSilicoLab-based portal is designed as a workspace that gathers everything a researcher needs for in silico experiments. This means: (a) the capability of organizing data that is a subject or a product of an experiment, i.e., facilitating the preparation of input data for computations, describing and categorizing the input and output data with meaningful metadata, and searching and browsing through all the data based on that metadata; (b) seamless execution of large-scale, long-lasting data- and computation-intensive experiments.

Every gateway based on the InSilicoLab framework is tailored to a specific domain of science, or even to a class of problems within that domain. The core of the framework provides mechanisms for managing users' data – categorizing it, describing it with metadata and tracking its origin – as well as for running computations on distributed computing infrastructures. Every InSilicoLab gateway instance is built from the core components, but is provided with data models, analysis scenarios and an interface specific to the actual domain it is created for (see Fig. 2).
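The metadata-driven data organization described in point (a) can be sketched as a toy catalog. This is an illustration only; the class and method names are invented and do not reflect InSilicoLab's actual interfaces.

```python
class MetadataCatalog:
    """Toy metadata store: register data items (e.g. file paths) with
    key/value metadata, then search by matching metadata fields."""

    def __init__(self):
        self._items = []  # list of (path, metadata) pairs

    def register(self, path, **metadata):
        """Describe a data item with arbitrary metadata fields."""
        self._items.append((path, metadata))

    def search(self, **query):
        """Return all items whose metadata matches every query field."""
        return [path for path, md in self._items
                if all(md.get(k) == v for k, v in query.items())]
```

A production catalog would persist the metadata, track provenance (which computation produced which output) and support richer queries, but the register/search cycle is the essence of organizing experiment data by metadata.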

4.3 DataNet – Data and Metadata Management Service

Fig. 2. Architecture of the InSilicoLab framework: domain layer, mediation layer with its core services, and resource layer. In the resource layer, Workers ('W') of different kinds (marked with colors) are shown.

DataNet [8] is a service built on top of the PLGrid high-performance computing infrastructure to enable lightweight metadata and data management. It allows users to create data models consisting of files and structured data, and to deploy them as dedicated repositories within seconds.

One of the main goals of DataNet is to make it usable from the largest possible set of languages and platforms. That is why the HTTP protocol was used as

a basis for transferring data between computing nodes and the service, together with the REST methodology applied to structure the messages sent to and from the repositories. DataNet is fully integrated with PLGrid's authentication and authorization system, so existing users can quickly gain access to the service through a fully automated registration process.

To ensure user data separation, each repository is deployed on a dedicated PaaS platform, which provides scaling and database service provisioning for structured data. For high-throughput scenarios, the system can be configured to expose several instances of a given repository to increase the request processing rate.

Another aspect of using DataNet for data management is collaborative data acquisition, which is possible because a given repository is identified by a unique URL. The URL can be shared among many computing infrastructures, software packages and different users or groups of users to acquire and process data within a single data model. For collaboration efforts involving large numbers of files, this introduces structure and a means to search the file space. Figure 3 shows the layered architecture of the service.
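Because access is plain HTTP plus REST, any language with an HTTP client can talk to a repository. The sketch below only illustrates that idea; the repository URL, entity name and endpoint layout are hypothetical, not DataNet's actual API.

```python
import json
import urllib.request

# Hypothetical repository URL: in practice each deployed DataNet
# repository is identified by its own unique URL, which can be shared.
REPO_URL = "https://datanet.example.org/repo/abc123"

def build_insert(entity, record):
    """Build an HTTP POST storing one structured record in the
    repository under the given entity (endpoint layout assumed)."""
    return urllib.request.Request(
        f"{REPO_URL}/{entity}",
        data=json.dumps(record).encode(),
        headers={"Content-Type": "application/json"},
        method="POST")

def build_query(entity, **filters):
    """Build an HTTP GET retrieving records that match the filters."""
    qs = "&".join(f"{k}={v}" for k, v in filters.items())
    return urllib.request.Request(f"{REPO_URL}/{entity}?{qs}", method="GET")
```

The point of the design is visible even in this sketch: since every operation is an ordinary HTTP request against a shared URL, many infrastructures and collaborators can feed the same data model concurrently.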

4.4 Scalarm – a Platform for Data Farming

Executing a computer simulation many times, each time with different input parameter values, is a common approach to studying complex phenomena in various scientific disciplines. Data farming is a methodology of conducting scientific research, considered an extension of the task farming approach, combined with Design of Experiment (DoE) methods for parameter space reduction and with output data exploration techniques [9].
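The simplest design of experiment, a full factorial sweep over the parameter space, can be sketched as follows; real DoE methods exist precisely to prune this exhaustive enumeration. The function names are invented for illustration.

```python
import itertools

def design_full_factorial(parameter_space):
    """Enumerate every combination of input parameter values
    (the naive DoE baseline that reduction methods improve on)."""
    names = sorted(parameter_space)
    for values in itertools.product(*(parameter_space[n] for n in names)):
        yield dict(zip(names, values))

def run_sweep(simulation, parameter_space):
    """Run the simulation once per design point and pair each
    point with its result, i.e. a minimal data farming experiment."""
    return [(point, simulation(**point))
            for point in design_full_factorial(parameter_space)]
```

With parameter space `{"a": [1, 2], "b": [10, 20]}` and a toy simulation `lambda a, b: a * b`, the sweep yields four design points; a data farming platform distributes exactly these independent runs across the infrastructure.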


Fig. 3. DataNet architecture

A crucial requirement for efficient application of the data farming methodology is the use of dedicated tools supporting each phase of the in silico experiments that follow the methodology. Scalarm [10] is a complete platform supporting all the above-mentioned data farming experiment phases, from experiment design, through simulation execution, to results analysis. All Scalarm functions are available to the user via a GUI in a web browser (cf. Fig. 4).

Fig. 4. Basic experiment progress view in Scalarm

To perform a data farming experiment in Scalarm, a user prepares a simulation scenario, with input parameter types and an application specified. Through the use of so-called adapters, any application can be run without modification, which allows Scalarm to be used in a wide range of scientific disciplines, such as metal processing technologies [11] or complex multi-agent simulations [12]. In addition to various scheduling systems for Grids, Scalarm supports several Cloud services [13] and user-defined servers. It also supports different results analysis methods with graphical presentation (see Fig. 4), as well as autonomous input space exploration methods that can change the parameter space without user intervention in order to satisfy a user-defined experiment goal.

4.5 Onedata – Uniform and Efficient Access to Your Data

Grid infrastructures consist of many types of heterogeneous, locally managed distributed storage systems [14]. Taking into account the possibly different requirements of users with regard to data access [15], it is beneficial to provide a variety of storage systems, which in turn poses challenges for unifying data access. Due to the independence of the centres in a grid, the management of storage systems (storage services) is decentralized.

The Onedata system [16] provides unified and efficient access to data stored in organizationally distributed environments, e.g. Grids and Clouds, and is a complete response to the requirements of end users, developers and administrators. To offer the required functionality, Onedata [17] merges and extends: (1) a data hosting service, (2) a high performance file system, (3) a data management system and (4) middleware for developers.

When perceived as a high performance file system, Onedata provides access to data via a standard file system interface, offering a coherent and uniform view of all data, which may be distributed across the infrastructure of a geographically distributed organization. As a data management system, Onedata makes it possible to manage various storage systems in a cost-effective manner without abandoning the uniform view of data or high performance. Its data management environment consists of: (a) monitoring systems, which gather information about storage utilization, (b) rule definitions for automatic data management, and (c) event-driven automatic data management based on those rules.

To provide high performance and scalability, Onedata is implemented in Erlang and C, with a NoSQL database. Information about metadata and the system state is stored in a fault-tolerant, high-performance, distributed NoSQL database to avoid performance bottlenecks and guarantee data security.
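The rule-based, event-driven management in points (a)–(c) can be sketched as a tiny rule engine; the example policy, event fields and thresholds are invented for illustration and do not reflect Onedata's actual rule language.

```python
class RuleEngine:
    """Event-driven data management sketch: each rule pairs a predicate
    over a monitoring event with the name of an action to trigger."""

    def __init__(self):
        self.rules = []

    def add_rule(self, predicate, action):
        self.rules.append((predicate, action))

    def handle(self, event):
        """Return the actions triggered by one monitoring event."""
        return [action for predicate, action in self.rules
                if predicate(event)]

engine = RuleEngine()
# Assumed example policy: react to storage utilization reported
# by the monitoring system, in order of increasing severity.
engine.add_rule(lambda e: e["utilization"] > 0.9, "migrate-cold-data")
engine.add_rule(lambda e: e["utilization"] > 0.99, "block-new-writes")
```

In the real system the predicates would cover many event types (transfers, failures, access patterns) and the actions would be replication or migration jobs, but the monitoring-event-to-rule-to-action flow is the same.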

5 Conclusions

The realization of the PLGrid Programme fits well with the need to develop an advanced IT infrastructure designed for conducting modern scientific research. The well-tailored PLGrid e-infrastructure fulfills researchers' needs for suitable computational resources and services. It also enables Polish scientific units to collaborate with international research organizations, as a vast range of services contributes to increased cooperation between Polish scientists and international groups of specialists from twenty-seven different scientific domains of e-Science.

An essential fact is that anyone performing scientific research can become a user of the infrastructure. Access to huge computational power, large storage resources and sophisticated services at a global level is free for Polish researchers and all those engaged in scientific activities associated with any university or research unit in Poland. To obtain an account in the PLGrid infrastructure, enabling access to its computing resources, one only needs to register in the PLGrid Portal [18].

Since 2010, the PLGrid infrastructure has been part of the European Grid Infrastructure (EGI), which aims to integrate the national Grid infrastructures into a single, sustainable production infrastructure. Further strong collaboration and exchange of ideas with EGI is foreseen.

Acknowledgements. This work was made possible thanks to the following projects: PLGrid Plus (POIG.02.03.00-00-096/10), PLGrid NG (POIG.02.03.00-12-138/13) and PLGrid Core (POIG.02.03.00-12-137/13), co-funded by the European Regional Development Fund as part of the Innovative Economy programme, including a special purpose grant from the Polish Ministry of Science and Higher Education.

References

1. Gauss Centre for Supercomputing (GCS) (2015). http://www.gauss-centre.eu/gauss-centre/EN/AboutGCS/aboutGCS_node.html
2. Gienger, M.: Towards energy aware scheduling between federated Data Centres. Presentation at the eChallenges International Conference, Bristol (2014)
3. Bubak, M., Szepieniec, T., Wiatr, K. (eds.): Building a National Distributed e-Infrastructure - PL-Grid: Scientific and Technical Achievements. LNCS, vol. 7136. Springer, Heidelberg (2012)
4. Bubak, M., Kitowski, J., Wiatr, K. (eds.): eScience on Distributed Computing Infrastructure. LNCS, vol. 8500. Springer, Heidelberg (2014)
5. Ciepiela, E., Wilk, B., Harężlak, D., Kasztelnik, M., Pawlik, M., Bubak, M.: Towards provisioning of reproducible, reviewable and reusable in-silico experiments with the GridSpace2 platform. In: Bubak, M., Kitowski, J., Wiatr, K. (eds.) eScience on Distributed Computing Infrastructure. LNCS, vol. 8500, pp. 118–129. Springer, Heidelberg (2014)
6. Collage Authoring Environment (2015). https://collage.elsevier.com/
7. Kocot, J., Szepieniec, T., Wójcik, P., Trzeciak, M., Golik, M., Grabarczyk, T., Siejkowski, H., Sterzel, M.: A framework for domain-specific science gateways. In: Bubak, M., Kitowski, J., Wiatr, K. (eds.) eScience on Distributed Computing Infrastructure. LNCS, vol. 8500, pp. 130–146. Springer, Heidelberg (2014)
8. Harężlak, D., Kasztelnik, M., Pawlik, M., Wilk, B., Bubak, M.: A lightweight method of metadata and data management with DataNet. In: Bubak, M., Kitowski, J., Wiatr, K. (eds.) eScience on Distributed Computing Infrastructure. LNCS, vol. 8500, pp. 164–177. Springer, Heidelberg (2014)


9. Kryza, B., Król, D., Wrzeszcz, M., Dutka, Ł., Kitowski, J.: Interactive cloud data farming environment for military mission planning support. Comput. Sci. 13(3), 89–99 (2012)
10. Król, D., Kryza, B., Wrzeszcz, M., Dutka, Ł., Kitowski, J.: Elastic infrastructure for interactive data farming experiment. In: Procedia Computer Science, Proceedings of ICCS 2012 International Conference on Computer Science, vol. 9 (special issue), pp. 206–215, Omaha, Nebraska (2012)
11. Król, D., Słota, R., Rauch, Ł., Kitowski, J., Pietrzyk, M.: Harnessing heterogeneous computational infrastructures for studying metallurgical rolling processes. In: Proceedings of the eChallenges 2014 Conference, 29–30 October 2014. IIMC, Belfast (2014)
12. Laclavik, M., Dlugolinsky, S., Seleng, M., Kvassay, M., Schneider, B., Bracker, H., Wrzeszcz, M., Kitowski, J., Hluchy, L.: Agent-based simulation platform evaluation for human behavior modeling. In: Proceedings of ITMAS/AAMAS 2011, pp. 1–15, Taipei (2012)
13. Król, D., Słota, R., Kitowski, J., Dutka, Ł., Liput, J.: Data farming on heterogeneous clouds. In: Proceedings of the 7th IEEE International Conference on Cloud Computing (IEEE CLOUD 2014). IEEE (2014)
14. Słota, R., Dutka, Ł., Wrzeszcz, M., Kryza, B., Nikolow, D., Król, D., Kitowski, J.: Storage management systems for organizationally distributed environments - PLGrid PLUS case study. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds.) PPAM 2013, Part I. LNCS, vol. 8384, pp. 724–733. Springer, Heidelberg (2014)
15. Słota, R., Nikolow, D., Skalkowski, K., Kitowski, J.: Management of data access with quality of service in PL-Grid environment. Comput. Inform. 31(2), 463–479 (2012)
16. Dutka, Ł., Słota, R., Wrzeszcz, M., Król, D., Kitowski, J.: Uniform and efficient access to data in organizationally distributed environments. In: Bubak, M., Kitowski, J., Wiatr, K. (eds.) eScience on Distributed Computing Infrastructure. LNCS, vol. 8500, pp. 178–194. Springer, Heidelberg (2014)
17. Wrzeszcz, M., Dutka, Ł., Słota, R., Kitowski, J.: VeilFS - a new face of storage as a service. In: Proceedings of the eChallenges 2014 Conference, 29–30 October 2014. IIMC, Belfast (2014)
18. PLGrid Portal (2015). https://portal.plgrid.pl
