Astronomical Data Analysis Software and Systems XV ASP Conference Series, Vol. 351, 2006 C. Gabriel, C. Arviset, D. Ponz and E. Solano, eds.
ALMA Binary Data Transport Mechanism using VOTable Headers

Andreas Wicenec, Holger Meuss
European Southern Observatory

Jim Pisano
National Radio Astronomy Observatory

Abstract. ALMA will produce very large data rates and volumes. In full operation the correlator will generate up to 60 MB/s of visibility data. These data have to be transported from the correlator at the high site (5000 m) to the ALMA archive, the telescope calibration and the quicklook subsystems, which are all located at the low site (2500 m) some 40 km away. A dedicated fiber connection between the sites is under construction and the interfaces between the subsystems are under development. The transport format produced by the correlator has been defined and implemented and is described in more detail in this paper. The format is derived from SOAP with Attachments [1], but instead of the SOAP XML envelope it uses a slightly modified VOTable [2] to hold the description of the binary data. The VOTable uses content ID pointers (CID, RFC2111 [3]) to refer to the binary parts contained in the same Multipart/Related (RFC2387 [4]) container. Such Multipart/Related containers are constructed for each ALMA integration and sent through a multimedia streaming connection implemented in CORBA (TAO [5, 6]).
1. Implementation
The VOTable format has been defined as a transport format for tabular astronomical data [2]. For small to medium-sized tables the current version 1.1 of the VOTable standard has proved to be efficient and is already implemented in many VO applications. For more demanding applications, such as tables exceeding a size of about 10 MB, the performance of the existing VOTable-aware tools reaches its limit. In particular, if the tabular data is used in a well-known environment, such as the dataflow system of a single observatory, these limitations can be removed by a small modification of the VOTable standard that allows the FIELD element to contain a LINK element, as shown in the example (Figure 1). This extension is described in appendix A.4 of the VOTable document. However, this solves only part of the actual problem, since references to external binary data would not address the requirement to keep the data together with its description (the metadata). One way out has already been described as part of [1]: using the well-defined MIME container model of RFC2387 [4] and internal references to individual parts as described in RFC2111 [3]. The use of XML allows an implementation that conforms to the RFC standards; the use of VOTable provides in addition an IVOA-compliant description of the fields, including semantic guidelines in the form of UCDs.

Figure 1. Example of a multipart container as constructed by the ALMA correlator. The MIME-related sections as well as the binary parts are shown in black, the VOTable lines in grey, and the LINK elements are highlighted in white.
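The container layout of Figure 1 can be approximated with Python's standard email package. The following is only a minimal sketch under stated assumptions, not the correlator implementation: the content IDs, the FIELD name and the float payload layout are hypothetical, and the binary part uses the library's default base64 transfer encoding rather than raw binary.

```python
import struct
import xml.etree.ElementTree as ET
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical content IDs, chosen only for this illustration.
HEADER_CID = "header@example.invalid"
DATA_CID = "visibilities@example.invalid"

# Reduced VOTable header: the FIELD carries a LINK pointing at a binary part.
VOTABLE = f"""<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE>
      <FIELD name="visibilities" datatype="float" arraysize="*">
        <LINK href="cid:{DATA_CID}"/>
      </FIELD>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def build_container(values):
    """Pack the VOTable header and one binary attachment into Multipart/Related."""
    container = MIMEMultipart("related", type="text/xml")
    header = MIMEText(VOTABLE, "xml")
    header["Content-ID"] = f"<{HEADER_CID}>"
    container.attach(header)
    # Assumed payload layout: big-endian 32-bit floats.
    payload = struct.pack(f">{len(values)}f", *values)
    binary = MIMEApplication(payload, "octet-stream")
    binary["Content-ID"] = f"<{DATA_CID}>"
    container.attach(binary)
    return container.as_bytes()


def read_container(raw):
    """Resolve every cid: LINK in the VOTable header to its binary part."""
    message = message_from_bytes(raw)
    parts = {p["Content-ID"].strip("<>"): p for p in message.get_payload()}
    header = parts[HEADER_CID].get_payload(decode=True)
    resolved = {}
    for field in ET.fromstring(header).iter("FIELD"):
        href = field.find("LINK").get("href")
        body = parts[href[len("cid:"):]].get_payload(decode=True)
        resolved[field.get("name")] = struct.unpack(f">{len(body) // 4}f", body)
    return resolved


raw = build_container([1.0, 2.5, -3.0])
print(read_container(raw))  # {'visibilities': (1.0, 2.5, -3.0)}
```

Because the metadata and the binary parts travel in one container, a receiver needs nothing beyond the container itself to interpret the data, which is the requirement stated above.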
2. A/V Streaming
Moving large amounts of data efficiently and reliably from one machine to many others in close to real time is the natural domain of streaming technologies. The most widely used are the various forms of A/V streaming. A/V streaming is in principle a subscription-based mechanism, i.e. a server 'offers' a stream and clients can 'subscribe' to that stream. In this scenario the server does not take care of the individual connections, but just makes sure that the data is put correctly into the stream. Usually the streaming server implementation keeps adjustable memory buffers for each of the clients to allow for short network outages or variable bandwidth. The streaming client also keeps a buffer, which provides an uninterrupted stream to the application that 'consumes' it. A/V streaming also offers support for multicasting, although this is usually implemented on top of UDP and thus, depending on the actual streaming protocol implementation, could potentially lose data packets.
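The role of the per-client server buffers can be illustrated with a small sketch. It is not the TAO A/V streaming API: the class and method names are hypothetical, and dropping the oldest frame on overflow is an assumed policy chosen to keep the example short.

```python
from collections import deque


class StreamServer:
    """Toy subscription-based server with one bounded buffer per client."""

    def __init__(self):
        self._buffers = {}

    def subscribe(self, client, buffer_frames):
        # The buffer size is adjustable per client.
        self._buffers[client] = deque(maxlen=buffer_frames)

    def publish(self, frame):
        # The server only puts data into the stream; it does not wait for clients.
        for buffer in self._buffers.values():
            buffer.append(frame)  # a full deque silently drops its oldest frame

    def deliver(self, client, max_frames):
        # Called when the client's connection is able to take data again.
        buffer = self._buffers[client]
        return [buffer.popleft() for _ in range(min(max_frames, len(buffer)))]


server = StreamServer()
server.subscribe("archive", buffer_frames=4)
server.subscribe("quicklook", buffer_frames=2)

# Three frames are published while both clients are briefly unreachable.
for frame in range(3):
    server.publish(frame)

print(server.deliver("archive", 10))    # [0, 1, 2]
print(server.deliver("quicklook", 10))  # [1, 2]
```

In this sketch the client with the larger buffer survives the outage without loss, while the smaller buffer loses the oldest frame; the buffer size therefore bounds the outage a client can tolerate.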
3. Application to ALMA
We have implemented an A/V streaming based mechanism to encapsulate the data produced by the ALMA correlator. Part of these data will be used by the telescope calibration and the quicklook pipelines; in addition, all the data will be sent to the ALMA archive. This means that there is one sender and at least three consumers of the correlator data stream, which can grow to 60 MB/s. Since the ALMA Common Software (ACS, [7]) uses CORBA as its underlying deployment and communication layer, this interface is also implemented using CORBA. Plain CORBA interfaces are neither designed nor tuned for bulk data transport, and thus we are using the A/V streaming service as implemented in TAO [5, 6] in order to minimize the overhead. The current implementation provides multiplexing instead of multicasting because of packet loss problems with the UDP-based multicasting in TAO.

Figure 1 shows an example of a multipart container as constructed by the ALMA correlator. In order to show only the relevant techniques, some parts of the VOTable and some attachments have been removed. Such a container is fully compliant with the respective standards, and it is thus even possible to open it with some of the more advanced e-mail clients; in this case the VOTable is shown as plain text and the binary attachments can simply be saved to files. In the figure the MIME-related sections as well as the binary parts are shown in black, the VOTable lines in grey, and the LINK elements are highlighted in white.

The ALMA correlator and the control and processing cluster for the correlator will be located at the high site at about 5000 m above sea level; the archive as well as most of the other ALMA computing subsystems will be located at the low site close to San Pedro de Atacama at around 2500 m. There will be a dedicated high-speed fiber network connection between the two sites, but the load on the correlator and the network should be limited to what is absolutely necessary. In order to achieve this goal our current plan is to connect the correlator at the high site with a so-called bulk distributer at the low site. This latter component will then redistribute the data to the final subscribers. The schematic deployment outline is shown in Figure 2.

Figure 2. Due to the special deployment situation of the correlator as the stream sender with respect to the stream clients we have introduced another component, the BulkDistributer, in order to keep the load on the correlator side and the complexity of the correlator sender as low as possible.
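The motivation for the deployment in Figure 2 can be made concrete with a rough sketch. It assumes that multiplexing means one copy of the stream per subscriber, so a sender feeding three consumers directly would put three copies on the link between the sites. The class below is a plain Python stand-in, not the CORBA component; apart from the name BulkDistributer, all names are hypothetical.

```python
PEAK_RATE_MB_S = 60  # peak correlator output quoted in the text
CONSUMERS = ["archive", "telescope calibration", "quicklook"]


def link_rate_mb_s(stream_rate_mb_s, copies_on_link):
    """Traffic on the inter-site link for a given number of stream copies."""
    return stream_rate_mb_s * copies_on_link


class BulkDistributer:
    """Stand-in for the low-site component that re-distributes the stream."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, name, callback):
        self._subscribers[name] = callback

    def receive(self, container):
        # One container arrives over the link and is fanned out locally.
        for callback in self._subscribers.values():
            callback(container)


received = {name: [] for name in CONSUMERS}
distributer = BulkDistributer()
for name in CONSUMERS:
    distributer.subscribe(name, received[name].append)

distributer.receive(b"container for one integration")

print(link_rate_mb_s(PEAK_RATE_MB_S, len(CONSUMERS)))  # 180, direct multiplexing
print(link_rate_mb_s(PEAK_RATE_MB_S, 1))               # 60, via the distributer
print(all(len(items) == 1 for items in received.values()))  # True
```

Under these assumptions the correlator sends each container once, and the fan-out to the final subscribers happens entirely at the low site.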
References

[1] http://www.w3.org/TR/SOAP-attachments
[2] http://www.ivoa.net/Documents/latest/VOT.html
[3] http://www.ietf.org/rfc/rfc2111.txt
[4] http://www.ietf.org/rfc/rfc2387.txt
[5] http://www.cs.wustl.edu/~schmidt/TAO.html
[6] http://www.cs.wustl.edu/~schmidt/av.html
[7] http://www.eso.org/~almamgr/AlmaAcs