Prototype for a Generic Thin-Client Remote Analysis Environment for CMS

C. D. Steenberg, J. J. Bunn, T. M. Hickey, K. Holtman, I. Legrand, V. Litvin, H. B. Newman, A. Samar, S. Singh, R. Wilkinson (for the CMS Collaboration)
California Institute of Technology, Mail Code 356-48, Pasadena, CA 91125, USA

Abstract

The multi-tiered architecture of the highly distributed CMS computing systems necessitates a flexible data distribution and analysis environment. We describe a prototype analysis environment which functions efficiently over wide area networks, using a server installed at the Caltech/UCSD Tier 2 prototype to analyze CMS data stored at various locations from a thin client. The analysis environment is based on existing HEP (Anaphe) and CMS (CARF, ORCA, IGUANA) software technology on the server, accessed from a variety of clients. A Java Analysis Studio (JAS, from SLAC) plug-in is being developed as a reference client. The server is operated as a "black box" on the proto-Tier 2 system. ORCA Objectivity databases (e.g. an existing large CMS muon sample) are hosted on the master and slave nodes, and remote clients can request processing of queries across the server nodes and have the histogram results returned and rendered in the client. The server is implemented in pure C++ and uses XML-RPC as a language-neutral transport. This has several benefits, including much better scalability, better integration with CARF/ORCA, and, importantly, it makes the work directly useful to other non-Java general-purpose analysis and presentation tools such as Hippodraw, Lizard, or ROOT.

Keywords: remote analysis, distributed data access, grid

1 Introduction

The Compact Muon Solenoid (CMS) experiment analysis environment provides a rich set of tools for sophisticated analysis of persistent events stored in the object-oriented Objectivity database. The full set of packages and their dependencies is documented at [3]. The bulk of the analysis work is expected to be carried out either at analysis clusters or on powerful desktop workstations running one of the supported platforms, namely Linux and Solaris. Analyses can be done using a) compiled C++ code, b) the interactive or batch scripting environment Lizard, with a graphical environment provided by IGUANA, or a combination of the above. Using these applications over a wide area network (WAN) is currently simple in case a), and in case b) when using Lizard scripting, but may be hampered by network latencies where interactive graphical analysis, e.g. 3-D detector and event visualization, is performed. Although this limitation is expected to diminish over time as networks get progressively faster, there may always be resistance to using these applications from bandwidth- or resource-constrained sites. This paper describes a prototype alternative interface to the CMS analysis environment, called Clarens [2]. This interface aims to provide access to the power of the abovementioned tools over a WAN by minimizing the data flow between the analysis site (or server) and the presentation site (or client).

2 Implementation

The first implementation detail to consider is how communication between the client and server should be conducted. Client/server computing is well established, with a great variety of tools and communication protocols to choose from. For this prototype we considered the well-standardized Common Object Request Broker Architecture (CORBA) [4], XML Remote Procedure Calls (XML-RPC) [11], and the Simple Object Access Protocol (SOAP) [10], which also relies on XML (the extensible mark-up language).

XML-RPC was subsequently chosen because of its lightweight and stable implementations in numerous programming languages, including C/C++, Java (used by Java Analysis Studio [7]), and Python (used by Lizard [9] and Hippodraw [6]). A Data Interface Module (DIM) or plug-in for JAS [8] is being developed as a reference client implementation, with similar client implementations planned for Lizard and Hippodraw, using Python, as the exported interface matures. A C++ client interface is used for initial testing and debugging.

The XML-RPC protocol relies on XML-encoded data being exchanged over the HTTP protocol generally used for serving web pages. For the Clarens prototype, a multi-process C/C++ XML-RPC server [12] is spawned by a commodity Apache [1] web server in response to procedure calls, using the Common Gateway Interface (CGI) for server-side applications. This obviates the need to implement a dedicated HTTP server, avoiding the security risks associated with a newly implemented server using untested code. It also allows the server to perform user authentication and encryption of network traffic without any change in the server code. The XML-RPC implementation used also allows a multi-threaded server architecture, but this cannot be utilized at present due to the non-threaded nature of the Objectivity libraries.
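To illustrate the protocol from the client side, the following is a minimal sketch using Python's standard xmlrpc.client module. The endpoint URL and CGI path are hypothetical; getDBs is one of the methods described in Section 3.

```python
# Minimal sketch of an XML-RPC call to a Clarens-style server.
# The endpoint URL is an assumption; getDBs is described in Section 3.
import xmlrpc.client

# With the Apache/CGI setup, the server endpoint is an ordinary HTTP URL.
server = xmlrpc.client.ServerProxy("http://tier2.example.edu/cgi-bin/clarens")

# Each call is serialized as an XML <methodCall> document, POSTed over HTTP,
# and the XML <methodResponse> is deserialized back into native types.
databases = server.getDBs()
print(databases)
```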

3 Exported interface

The currently implemented interface allows remote access to persistent histograms and tags stored in an Objectivity database federation. The available methods are listed in Table 1.

Table 1: Methods

Method             Description
getDBs             Provides a list of databases in the federation
getHistos          Lists the histograms in a database
getHisto1D         Provides the number of bins and min/max range of a 1D histogram
getHisto1Dcol      Sends the 1D histogram's bins as an array
getHisto2D         Provides the number of bins and min/max range of a 2D histogram
getHisto2Dcol      Sends one of the columns of the 2D histogram as an array of values
getTags            Lists the tags in a database
getColumns         Lists the column names in a tag
getTagColumnInfo   Provides information on a tag column, including data type and number of values
getTagColumn       Sends the tag column as an array of values

Note that three of the methods, getHisto1Dcol, getHisto2Dcol, and getTagColumn, may send possibly large arrays of values to the client. It is therefore possible to obtain information about the requested tag or histogram before the actual object is requested, using the getHisto1D, getHisto2D, and getTagColumnInfo calls, as sketched below. The performance of the server implementation is discussed in Section 5.
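The metadata-first access pattern might look as follows in a Python client. The method names come from Table 1; the endpoint URL and the argument and return conventions are assumptions for illustration.

```python
# Hedged sketch of the metadata-first access pattern, assuming the method
# names of Table 1; endpoint, arguments, and return structures are assumptions.
import xmlrpc.client

server = xmlrpc.client.ServerProxy("http://tier2.example.edu/cgi-bin/clarens")

db = server.getDBs()[0]          # pick some database in the federation
histo = server.getHistos(db)[0]  # pick some histogram in that database

# Cheap metadata call first: number of bins and min/max range only.
meta = server.getHisto1D(db, histo)
print("histogram metadata:", meta)

# Only now request the potentially large bin array itself.
bins = server.getHisto1Dcol(db, histo)
print("received", len(bins), "bins")
```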

4 The JAS client

As mentioned before, the JAS client DIM is seen as a reference implementation; it therefore evolves along with the server and is the most usable client implementation. Note that the standard RPC mechanism used by JAS, Java Remote Method Invocation (RMI), is not used because it is Java-specific. Instead, the open source Helma Java XML-RPC implementation is used [5]. The JAS DIM provides an AnnotatedEventSource implementation through the LocalDIM interface, which presents a browsable tree of 1D and 2D histograms, as well as rebinnable tags in a database.

Downloading of histogram and tag data is done in worker threads, with status feedback dialogs, so as not to "freeze" the main GUI. A sample JAS session is shown in Figure 1.

Figure 1: An example JAS session with persistent tags from a remote database. Rebinning is done via the slider in the tool bar.

All the analysis facilities provided by JAS, including rebinning of tags, calculation of statistics, and Java analysis "scripts", can be applied to downloaded tag and histogram objects.
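The worker-thread download pattern described above can be sketched as follows; the sketch uses Python rather than the Java of the actual DIM, and the endpoint, database, histogram names, and callback signature are all hypothetical.

```python
# Hedged sketch of the worker-thread download pattern: the blocking XML-RPC
# fetch runs off the GUI thread and reports back via a callback when done.
import threading
import xmlrpc.client

def fetch_histogram(url, db, histo, on_done):
    """Blocking download, intended to run in a worker thread."""
    server = xmlrpc.client.ServerProxy(url)
    bins = server.getHisto1Dcol(db, histo)  # potentially large transfer
    on_done(bins)                           # hand the result back to the GUI

# The GUI thread only starts the worker and stays responsive.
worker = threading.Thread(
    target=fetch_histogram,
    args=("http://tier2.example.edu/cgi-bin/clarens", "muons", "pt",
          lambda bins: print("downloaded", len(bins), "bins")),
)
worker.start()
```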

5 Performance

The choice of a thin client reduces the software distribution and installation overheads for users while still giving them access to the powerful CARF framework and other CMS software. An additional benefit of this approach is that it obviates the need for a remote X11 session, which requires continuous network traffic to maintain the display on the user's desktop, with the associated problems of network latency and bandwidth. Using the Clarens server, tag and histogram objects can be downloaded once and cached by the client, drastically reducing network performance requirements.

Two factors currently limit the responsiveness of the server to client requests: the database connection initialization time of around 0.8 s per connection, and the potentially large size of the XML messages that clients receive from the server; retrieving a double-precision value requires an average of 78 bytes to be transferred to the client. The first limitation may be overcome by using a pool of persistent connections for each federated database. This can be achieved using the Apache server's module architecture, which allows one server process to service multiple connections instead of having to spawn a new process for each request [1]. The second limitation may be overcome by compressing the XML messages produced by the server. This feature is part of the HTTP 1.1 protocol standard, but is poorly supported by XML-RPC client applications. Currently, the web server performance and the server-side processing by the XML-RPC libraries (serializing binary objects into XML representations) are not performance bottlenecks, as the server can sustain around 40 connections per second on a PC test-bed when no database connection is initiated. This number is expected to be higher for properly performance-tuned web server setups, and should increase further over time, considering the resources devoted to web server performance optimization in the computing industry.
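The per-value XML overhead and the potential gain from compression can be estimated with a short sketch. It uses Python's standard-library XML-RPC serializer rather than the C/C++ library used by the server, so the exact byte counts are illustrative only.

```python
# Rough illustration of XML-RPC message size per double and the effect of
# gzip compression; figures will differ from the server's own serializer.
import gzip
import xmlrpc.client

values = [0.001 * i for i in range(10000)]  # stand-in for a histogram column

# Serialize as an XML-RPC method response, the way a getHisto1Dcol-style
# reply would travel over HTTP.
payload = xmlrpc.client.dumps((values,), methodresponse=True).encode()

print("bytes per double, uncompressed:", len(payload) / len(values))
print("bytes per double, gzip:        ", len(gzip.compress(payload)) / len(values))
```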

6 Conclusion

We presented details of the Clarens client/server remote analysis prototype for accessing histograms and tags in CMS Objectivity database federations. This system minimizes network traffic by using on-demand transfer of the raw objects to be visualized by the client, as well as client-side caching to avoid duplicate data transfers. Currently, database access is read-only, but work is underway to expand this functionality to include executing database queries on the server. An interface to so-called Grid tools, for migrating objects from remote Tier N servers to the server running Clarens, is also planned as soon as these tools become available. Multiple client implementations are envisioned, thereby maximizing the usefulness of the server to the heterogeneous analysis environments in use by the physics community. More information, including a web-based client implemented in the PHP language, is provided on the Clarens web page at [2].

References

[1] http://httpd.apache.org/
[2] http://heppc22.hep.caltech.edu/
[3] http://cmsdoc.cern.ch/cmsoo/cmsoo.html
[4] http://www.omg.org/
[5] http://xmlrpc.helma.org/
[6] http://www.slac.stanford.edu/grp/ek/hippo/
[7] http://jas.freehep.org/
[8] http://jas.freehep.org/Documentation/UsersGuide/DIM/default.shtml
[9] http://wwwinfo.cern.ch/asd/lhc++/Lizard/
[10] http://www.w3.org/TR/SOAP/
[11] http://www.xml-rpc.org/
[12] http://xmlrpc-c.sourceforge.net/
