ASTRONOMICAL DATA ANALYSIS SOFTWARE AND SYSTEMS XIV
ASP Conference Series, Vol. 347, 2005
P. L. Shopbell, M. C. Britton, and R. Ebert, eds.

Prototypes of a Computational Grid for the Planck Satellite

Giuliano Taffoni, Giuliano Castelli, Riccardo Smareglia, Claudio Vuerli, Andrea Zacchei, and Fabio Pasian
National Institute for Astrophysics, OATs, Trieste, Italy

Davide Maino
University of Milano, Milan, Italy

Giancarlo Degasperis
University of Rome “Tor Vergata”, Rome, Italy

Salim G. Ansari and Jan Tauber
European Space Agency, ESRIN, Holland

Thomas Ensslin
Max Planck Institute for Astrophysics, Garching, Germany

Roberto Barbera
National Institute for Nuclear Physics, Catania, Italy

Abstract. A prototype of a Computational Grid has been designed to assess the possibility of developing a pipeline setup for processing Planck satellite data. The amount of data collected by the satellite during its sky surveys requires an extremely high computational power, both for reduction and for analysis. For this reason a Grid environment represents an interesting layout to be considered when processing those data.

1. Introduction

The ESA Planck satellite mission will fly in 2007. The experiment aims to map the microwave sky, performing at least two complete sky surveys with an unprecedented combination of sky and frequency coverage, accuracy, stability, and sensitivity (Tauber 2000). Planck is composed of a number of microwave and sub-millimeter detectors, grouped into a High Frequency Instrument (HFI) and a Low Frequency Instrument (LFI) (Pasian 2002), and covers a frequency range from 30 up to 850 GHz. All levels of data processing are assigned to the two Data Processing Centers (DPCs), one for the LFI centralized at OAT in Trieste, Italy, and one for the HFI distributed between Paris, France, and Cambridge, UK. Both DPCs share a site producing an Early Release Compact Source Catalog (IPAC, Pasadena, USA) and a site gathering and documenting the final results of the mission, located at MPA, in Garching, Germany.

Figure 1. Structure of the Planck@Grid application deployment: Planck Grid applications run inside an application-specific environment built on top of the Grid middleware, which gives access to the user interface, computing resources, and storage resources.

Handling the amount of data produced by the whole mission and by the necessary post-processing is a challenging task, both in terms of storage and of computational needs (Bond et al. 1999); the LFI DPC alone is in charge of processing 100 GB of data. PlanckGrid is a project whose main goal is to verify the possibility of using Grid technology to process Planck satellite data (Smareglia et al. 2004). The project is exploring the scientific and technical problems that must be solved in order to develop Grid data reduction applications and make them available to the Planck community. In this paper we describe the prototype of a specialized environment, based on the Grid middleware, to support Planck applications (see Figure 1). This environment must guarantee: retrieval of data from a storage located outside the Grid via the http or ftp protocols; distribution of the Planck software (LevelS, Level1 and Level2) and libraries; and storage and replication of raw and reduced data with a secure access policy (a minimal data-staging example is sketched after the following list). The project coordinates two main initiatives:

• an ESA and INAF-OATs joint collaboration;

• INAF/GILDA (a test-bed Grid infrastructure set up to host test-bed applications that at a later stage will be proposed as a test-bed for EGEE).
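A minimal sketch of such a data-staging step, assuming the standard LCG data-management client tools, is given below; the archive URL, Storage Element hosts, VO name ("planck"), and logical file name are hypothetical placeholders, not taken from the paper.

#!/bin/sh
# Hypothetical sketch: stage a file from an external (non-Grid) archive and
# register it on a Grid Storage Element. URL, hosts, VO and LFN are placeholders.
set -e

# Authenticate: create a short-lived Grid proxy certificate.
grid-proxy-init

# Retrieve the raw data from an archive outside the Grid via http.
wget http://planck-archive.example.org/raw/tod_lfi_70GHz_001.fits

# Copy the file to a Storage Element and register it in the replica catalog
# under a logical file name, so that any Planck VO member can access it.
lcg-cr --vo planck -d se.example.org \
       -l lfn:tod_lfi_70GHz_001.fits \
       file:$PWD/tod_lfi_70GHz_001.fits

# Create a second replica on another Storage Element for redundancy.
lcg-rep --vo planck -d se2.example.org lfn:tod_lfi_70GHz_001.fits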

2. The GRID Environment

Grid computing enables the virtualization of distributed computing and data resources, such as processing and storage capacity, to create a single system image, granting users and applications seamless access to vast IT capabilities. The software that underlies the fundamental Grid services, such as information services, resource discovery and monitoring, job submission and management, brokering, data management, and resource management, constitutes the Grid middleware. The middleware builds upon a number of open source solutions like the Globus Toolkit 2 (GT2, Foster & Kesselman 1997) and the EDG libraries (Ghiselli 2002).


Figure 2. Structure of the Planck simulation workflow (User Interface, Resource Broker, Worker Nodes and Storage Elements of a Grid site, Replica Manager, and metadata repository; the pipeline is submitted via RSL, data are located through a Web Service, and files are stored via gsiftp). The application environment checks on a metadata repository whether a simulation has already been performed, using an XML description of the cosmological and instrumental parameters. If the data exist, the application downloads and/or reduces them; otherwise a new simulation is started. The simulated data are stored on the Grid. The application then gets the metadata that describe the simulated data and stores them in the metacatalog repository for future download or post-processing.

In the case of the ESA-INAF collaboration, the Grid middleware is based on GT2, and ESA supplied a proprietary workflow tool, GridAssist¹. This workflow tool acts as a resource broker for the computational and storage resources, and it is already well tested on the GAIA Grid (Ansari 2004). In the case of the INAF-GILDA collaboration, the production Grid is supplied by the Istituto Nazionale di Fisica Nucleare (INFN) of Catania. The Grid INFN Laboratory for Dissemination Activities (GILDA²) was developed as part of the Italian INFN Grid project and of the Enabling Grids for E-science in Europe (EGEE³) project as a testbed for the EGEE applications. The EGEE middleware, based on LCG (Robertson 2001), provides some basic services: a User Interface (UI), a fully equipped job submission environment, and a data Replica Manager (RM, Kunszt et al. 2003). The UI is a computer of the Grid system that allows users to submit jobs, access databases, and store/replicate files. Storage Elements and Worker Nodes are the Grid resources in charge of data management and computing. Both parallel and scalar computing are supplied by the system.

¹ http://tphon.dutchspace.nl/grease/public
² https://gilda.ct.infn.it
³ http://egee-intranet.web.cern.ch/
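As an illustration of these UI services, a typical command-line session might look like the following sketch; the JDL file name, job-identifier file, output directory, logical file name, and VO name ("planck") are assumptions, and the commands are the standard EDG/LCG client tools rather than anything specific to this project.

#!/bin/sh
# Hypothetical sketch of a User Interface session (standard EDG/LCG-2 tools assumed).

# Authentication: create a Grid proxy from the user certificate.
grid-proxy-init

# Submit a job described in JDL; the returned job identifier is saved to a file.
edg-job-submit --vo planck -o jobids.txt levels.jdl

# Monitor the job until it reaches the Done status.
edg-job-status -i jobids.txt

# Retrieve the files listed in the job's OutputSandbox.
edg-job-get-output -i jobids.txt --dir ./results

# Replica management: list the replicas of a logical file registered on the Grid.
lcg-lr --vo planck lfn:tod_lfi_70GHz_001.fits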

Figure 3. A map of the sky simulated via Grid for a 70 GHz LFI channel.

3. Planck@Grid

One of the primary issues for the DPCs is to define, design, and run a complete simulation of the Planck mission in order to test the data analysis pipelines. The simulation software must mimic, in a realistic way, the Planck observing procedure and every source of systematic effects related to the detectors. As a first test for the PlanckGrid project we concentrate on the simulation software (SW). The ESA/INAF project set up a Grid of three sites (ESTEC, OATs and ESRIN), managed by GT2 and by GridAssist, which also acts as the application environment to run the Planck applications. In the INAF/GILDA Grid, a workflow is built upon the EGEE middleware. Interaction with the Grid middleware is required for authentication, data movement, Planck SW distribution, resource selection, and computing (see Figure 2); a sketch of the resulting check-or-simulate workflow is given below.
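The following shell sketch outlines the check-or-simulate logic of Figure 2 under stated assumptions: the metadata query client (here a placeholder command planck-metadata-query), the XML parameter file, the logical file name, and the VO name are hypothetical, while the data-management and job-submission calls assume the standard LCG/EDG client tools.

#!/bin/sh
# Hypothetical sketch of the Figure 2 workflow: reuse an existing simulation
# if its parameters are already in the metadata repository, otherwise run it.

PARAMS=run_70GHz.xml          # XML description of cosmological/instrumental parameters
LFN=lfn:sim_70GHz_run001.fits # logical name of the simulated data (placeholder)

# Query the metadata repository (Web Service client; placeholder command).
if planck-metadata-query --xml "$PARAMS" > /dev/null 2>&1; then
    # The simulation already exists: download it from the Grid for post-processing.
    lcg-cp --vo planck "$LFN" file:$PWD/sim_70GHz_run001.fits
else
    # No matching simulation: submit a new LevelS run to the Grid.
    edg-job-submit --vo planck -o jobids.txt levels.jdl
    # Once the job is done, its output is stored on a Storage Element and the
    # describing metadata are registered in the metacatalog for future reuse.
fi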

3.1. Working Testbed

We successfully ported the Planck mission simulation SW (LevelS) to the LCG/EGEE Grid. The SW is supported by a set of Linux shell scripts that interface the simulation pipeline with the Grid services. We use the Grid Job Description Language (JDL, Pacini 2003) to submit the numerical calculations and to access the RM data service. Our prototype distributes the LevelS SW on the Grid (through the RM), selects the available resources, runs the pipeline, stores the simulated data, and ensures data access for the Planck users (all the users joining the Planck Virtual Organization, VO). A metadata schema is used to describe the output (parameters, date, size, etc.) and to make data recovery and post-processing easy. An example of the simulation results is shown in Figure 3. We also tested the reduction SW on the simulated data. As an example we used the destriping procedure described by Maino et al. (2002). Our input files are the simulated Time Ordered Data distributed on the Grid.
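A minimal JDL description of the kind used to submit a LevelS run, wrapped in one of the shell scripts mentioned above, might look as follows; the wrapper script name, parameter file, and requirement expression are hypothetical, and only standard JDL attributes (Pacini 2003) are assumed.

#!/bin/sh
# Hypothetical sketch: build a JDL description for a LevelS simulation run and
# submit it from the User Interface. The wrapper script run_levels.sh, the XML
# parameter file, and the CPU-time requirement are placeholders.

cat > levels.jdl <<'EOF'
Executable          = "run_levels.sh";
Arguments           = "run_70GHz.xml";
StdOutput           = "levels.out";
StdError            = "levels.err";
InputSandbox        = {"run_levels.sh", "run_70GHz.xml"};
OutputSandbox       = {"levels.out", "levels.err"};
VirtualOrganisation = "planck";
Requirements        = other.GlueCEPolicyMaxCPUTime > 720;
EOF

# Submit the job; the Resource Broker matches the requirements against the
# available Computing Elements and dispatches the run to a Worker Node.
edg-job-submit --vo planck -o jobids.txt levels.jdl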


We designed a pipeline that interacts with the RM to locate the raw data, identifies the computing and storage resources suitable for its needs (using the Grid Resource Broker), and finally processes the raw data (using the Grid job submission tools). The output is stored on the Grid and registered in the metadata repository, and it can be accessed by the Planck users (VO).
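A hedged sketch of this reduction step is given below; the destriper executable, logical file names, Storage Element host, and metadata-registration client are hypothetical placeholders, while locating and storing the data rely on the standard LCG replica-management commands.

#!/bin/sh
# Hypothetical sketch of the reduction pipeline run on a Worker Node:
# fetch the simulated Time Ordered Data, destripe them, store and register the map.

# Locate and download the raw TOD registered on the Grid.
lcg-cp --vo planck lfn:tod_lfi_70GHz_001.fits file:$PWD/tod.fits

# Run the destriping code (placeholder executable for the Maino et al. 2002 procedure).
./destripe tod.fits map_70GHz.fits

# Store the reduced map on a Storage Element and register it in the replica catalog.
lcg-cr --vo planck -d se.example.org -l lfn:map_70GHz_001.fits file:$PWD/map_70GHz.fits

# Register the describing metadata in the repository (placeholder client).
planck-metadata-register --xml map_70GHz.xml --lfn lfn:map_70GHz_001.fits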

4. Conclusions and Future Work

We successfully ran a simulation of the Planck mission for the LFI and stored the simulated mission data on the Grid. These data are described by a set of XML files. The metadata description is still at a prototype stage, and more work is required to define the final semantics. The metadata description for the simulation files is stored on the Grid, and the data files are available to all the Planck users together with their XML description. We plan to port the whole simulation architecture to the EGEE Grid in order to simulate the whole mission also for the HFI. This requires deploying a stable “application specific layer” and defining the metadata description. We also plan to run simulations for different values of the (cosmological and instrumental) parameters and to test the reduction of the simulated raw data on the Grid. This implies porting the Level2 reduction software to the Grid as well and extending the metadata semantics to the reduced data.

Acknowledgments. This work was done with the financial support of the Italian Government, and in particular of MIUR.

References

Ansari, S. G. 2004, in ASP Conf. Ser., Vol. 347, ADASS XIV, ed. P. L. Shopbell, M. C. Britton, & R. Ebert (San Francisco: ASP), 429

Bond, J. R., Crittenden, R. G., Jaffe, A. H., & Knox, L. 1999, Computers in Science & Engineering, 1, 21

Foster, I., & Kesselman, C. 1997, Intl. J. Supercomputer Applications, 11, 115

Ghiselli, A. 2002, in TERENA Networking Conference, 503

Kunszt, P., Laure, E., Stockinger, H., & Stockinger, K. 2003, in Lecture Notes in Computer Science, Vol. 3019, ed. R. Wyrzykowski et al. (Heidelberg: Springer-Verlag), 848

Maino, D., Burigana, C., Górski, K. M., Mandolesi, N., & Bersanelli, M. 2002, A&A, 387, 356

Pacini, F. 2003, DataGrid-01-TEN-0142-0_2

Pasian, F. 2002, MmSAI, 74, 502

Robertson, L. 2001, CERN/2379/rev

Smareglia, R., Pasian, F., Vuerli, C., & Zacchei, A. 2004, in ASP Conf. Ser., Vol. 314, ADASS XIII, ed. F. Ochsenbein, M. Allen, & D. Egret (San Francisco: ASP), 674

Tauber, J. A. 2000, in IAU Symposium, 204, 40
