Towards Problem Solving Environments for High Performance Computing Marius Cornea-Hasegan, Calin Costian, Dan C. Marinescu, Ioana Maria Martin, and John R. Rice Computer Sciences Department Purdue University West Lafayette, IN 47907, USA Email: ( cornea, costian, dcm, boier, jrr ) @cs.purdue.edu Ph: 1-317-494-6010, FAX: 1-317-494-0739, 1-317-496-1640
Abstract
A typical computing environment for solving Grand Challenge Problems consists of a high performance graphics workstation acting as a front-end, connected to a variety of remote supercomputers. A Problem Solving Environment (PSE) is a system of programs designed to assist a scientist in solving his/her problems without detailed understanding of parallel machines, and to guide him/her through the maze of computing options and mountains of data. A PSE for Structural Biology, SB, is under development at Purdue University to help biologists manage such computations. This paper discusses the functions supported by SB and proposes it as a generic model for such environments.
1 Introduction
Using a massively parallel processing system (MPP) is a fairly challenging task which should not be undertaken if the scientist or engineer has a problem which can be solved using traditional systems. The scientist or engineer working on problems that require MPPs is faced not only with the scientific and engineering difficulties inherent to the problems being solved, but also with the endless challenges of using the present generation of MPPs. The programming environments supported by MPPs are not user friendly, and concurrent program execution is inherently more complex than sequential execution. The amount of data generated is often so large that one needs to carefully organize and filter it. A user often uses several MPPs, and yet moving hundreds of Mbytes of data from one location to another over medium speed links may require more time than the actual computation itself.

In this paper we discuss a problem solving environment, SB, which provides better access to MPPs for a typical user solving a very large problem in structural biology and which provides guidance in solving his/her problem. This application is typical of computing environments for solving Grand Challenge Problems using high performance graphics workstations acting as front-ends for a variety of remote MPPs connected via medium speed networks (10 to 100 Mbps). The PSE is a system of programs running on the front-end and designed to assist a scientist in solving his/her problems without detailed understanding of parallel machines or of the idiosyncrasies of the systems involved. It is capable of providing transparent access to remote facilities and of monitoring and probing the execution of the program. The environment should assist the end user in making complex decisions like:
(a) Which one of the remote MPPs with different architectures the user has access to is best suited for solving problem Q;
(b) How many processors should be used to solve problem Q on machine M;
(c) How to actually run program P needed to solve problem Q on machine M, assuming that such a program exists and that machine M is available;
(d) Which sequence of programs P1, P2, ..., Pn needs to be executed to solve problem Q;
(e) Where to restart the execution if the computational process is interrupted (e.g., because of hardware failure, expiration of allocated running time, etc.).

SB provides a model for building PSEs for such applications; its second generation is now being tested by structural biologists at Purdue.
2 The General Structure of a PSE for Managing Complex Computations
Such a problem solving environment is a facilitator, an entity which allows a user to navigate through a maze of existing programs and to extract relevant data from a huge amount of information obtained as a result of computations carried out using MPPs. Consistent with this view, we do not discuss the extremely important issue of designing and implementing programs capable of running on a variety of architectures and producing identical or consistent results; in this paper we assume that such programs already exist. The infrastructure of these parallel programs is fixed, i.e., the data partitioning, the work allocation, and the communication among compute nodes or between compute and I/O nodes.
A primary function of the PSE is to provide transparent access to programs and data scattered over a set of sites, in other words to act like a super file system. This function is accomplished by defining a basic entity called object. An object consists of an underlying file and a stub. The underlying file can be located on a local machine or on a remote host, whereas the stub is a local structure, a piece of information which identifies the object as an entry in a catalog of objects. Each user has his own catalogs, one for each project he is working on. Objects in different catalogs and from different users may point to the same physical file. For example, different users may share the same executable program. Only the owner of an object has the right to modify that object. Programs, data, control input objects (see Section 2.2), access rights, and user preferences are examples of objects.

While heterogeneous in nature, all objects handled by the user interface have a common look, being characterized by a series of attributes valid for all of them. These attributes are used to ensure consistent manipulation of the objects. For example, the primitives to copy an object from one location to another should be sensitive to the architecture and type attributes of the object. A data object may need to undergo some form of data conversion if the source and the target architecture are different (e.g., byte swapping). A source program object can be moved only under certain circumstances.

Another function of the PSE is to keep a log of all actions performed by the user for solving a certain problem. Solving a Grand Challenge class problem usually spans a fairly long interval: weeks, months and possibly years. In any scientific experiment, one often explores false avenues and needs to backtrack. The PSE has mechanisms to retrieve from a raw log file the computational path which led to the current data and ignore the unsuccessful attempts.
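The retrieval of a computational path from a raw log can be illustrated with a minimal Python sketch. It is not SB's actual mechanism: it assumes a hypothetical log format in which each entry records the objects a step consumed and the object it produced, and all names (`LogEntry`, `computational_path`, the object names) are invented for illustration. Walking backward from the current data collects only the steps that contributed to it, so abandoned attempts drop out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """One hypothetical log record: an action, its inputs, and its output."""
    time: int        # time stamp (any increasing number)
    action: str      # e.g. the program that was run
    inputs: tuple    # names of the objects consumed
    output: str      # name of the object produced


def computational_path(log, current):
    """Return the entries that led to `current`, in chronological order."""
    # For each object, remember the latest entry that produced it.
    producer = {}
    for entry in sorted(log, key=lambda e: e.time):
        producer[entry.output] = entry

    needed, stack, seen = [], [current], set()
    while stack:
        name = stack.pop()
        if name in seen or name not in producer:
            continue  # already visited, or raw input data with no producer
        seen.add(name)
        entry = producer[name]
        needed.append(entry)
        stack.extend(entry.inputs)
    return sorted(needed, key=lambda e: e.time)


raw_log = [
    LogEntry(1, "P1", ("raw",), "a"),
    LogEntry(2, "P2", ("a",), "b_bad"),   # false avenue, later abandoned
    LogEntry(3, "P2", ("a",), "b"),
    LogEntry(4, "P3", ("b",), "result"),
]
path = computational_path(raw_log, "result")
print([entry.time for entry in path])  # [1, 3, 4]
```

The entry at time 2 never feeds into `result`, so it is left out of the recovered path even though it remains in the raw log.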
The PSE should provide a window based user interface and graphics support for data visualization and performance monitoring. Constant performance monitoring is necessary to ensure that computing resources are used efficiently. Only a window based system (SB uses X-windows) is able to allow the user to perform and monitor multiple actions in a heterogeneous environment. Further, the PSE must be able to help the user define the problem and provide guidance and expert advice for determining a nearly optimal (or, at least, good) sequence of computational steps. To provide this support, the PSE should consist of:
(a) a kernel to manage the computation,
(b) a problem-definition subsystem,
(c) an expert system and knowledge-base to help the user,
(d) a graphics subsystem.
The general characteristics of these components are presented here; Section 3 elaborates on these for SB.
2.1 The Kernel
The main functions of the kernel are: (a) to facilitate access to a number of remotely located MPPs with different architectures, (b) to keep track of the user's actions and the state of all resources used, (c) to provide a user interface, (d) to allow the other components of the interface to work with one another. It maintains information pertinent to the objects managed by the PSE and allows the user to perform operations on them using the interface. There is no clear delimitation of what information is stored in the stub (locally) or in the file (remotely). For example, for some objects the stub may include a small file header of a large data file; for others the stub may include the entire object, as in the case of control input objects.

Another function of the kernel is to determine if all objects needed in a session are in a consistent state. An object is in an inconsistent state if the information in the stub is inconsistent with the actual state of the system. For example, the program object residing on machine M, pointed to by a stub located on machine N, may be modified, so that the creation date of the new executable program is no longer consistent with the information recorded in the stub. This may cause serious problems, and the user needs to be alerted that the program which produced correct results "yesterday" has been modified. The kernel enforces consistent manipulation of the objects.
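The consistency check described above can be sketched as follows. This is only an illustration under stated assumptions: the `Stub` record and the helper names are hypothetical, the file is local (a remote file would need a remote query instead of `os.stat`), and the modification time and size stand in for whatever information an actual stub records.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Stub:
    """Hypothetical local record describing a file managed by the PSE."""
    path: str
    recorded_mtime: float   # modification time seen when the stub was made
    recorded_size: int


def make_stub(path):
    info = os.stat(path)
    return Stub(path, info.st_mtime, info.st_size)


def is_consistent(stub):
    """True if the file still matches what the stub recorded."""
    try:
        info = os.stat(stub.path)
    except FileNotFoundError:
        return False
    return (info.st_mtime == stub.recorded_mtime
            and info.st_size == stub.recorded_size)


with tempfile.TemporaryDirectory() as workdir:
    program = os.path.join(workdir, "envelope")
    with open(program, "w") as handle:
        handle.write("version 1")
    stub = make_stub(program)
    before = is_consistent(stub)

    # Simulate someone rebuilding the program at a later time.
    with open(program, "w") as handle:
        handle.write("version 2, modified")
    later = stub.recorded_mtime + 60
    os.utime(program, (later, later))
    after = is_consistent(stub)

print(before, after)  # True False
```

A kernel built along these lines would run such a check for every object needed in a session and alert the user about any stub that no longer matches its file.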
2.2 The Problem-Definition Subsystem
A PSE should allow the user to state and start with a description of the physical problem to be solved. The PSE mechanism to define problems uses familiar terminology and checks the consistency of this definition if the user so desires. Control input objects allow the user to define the parameters of his problem in a natural and easy to understand way.

Grand Challenge problems often require a complex execution sequence. For example, to solve problem Q, one may need to run program P1 followed by P2, P3, ..., Pn, then check if a condition C is satisfied (e.g., if some form of convergence is achieved), and if not, repeat the cycle P1 to Pn, else execute another sequence of programs Pq, Pq+1, etc. Some of the programs in the sequence P1 to Pn must run sequentially while other programs, say Pa, Pb, Pc, Pd, may run concurrently.

Thus, the PSE should provide problem-definition and solution control mechanisms to allow: (a) the creation and modification of control input objects, (b) the definition of the sequence of computational steps in a powerful and yet easy to understand language.
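The execution pattern described in this section (a cycle of sequential programs, a group of independent programs, and a convergence test C) can be sketched in Python. This is an illustration only and does not reflect how SB schedules work: the "programs" are stand-in functions, the names `run_cycle` and `solve` are hypothetical, and running the independent steps once per cycle is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def run_cycle(state, sequential_steps, concurrent_steps):
    """Run the sequential steps in order, then the independent steps together."""
    for step in sequential_steps:
        state = step(state)
    with ThreadPoolExecutor() as pool:
        # Independent steps all read the same state; map keeps their order.
        results = list(pool.map(lambda step: step(state), concurrent_steps))
    return state, results


def solve(state, sequential_steps, concurrent_steps, converged, max_cycles=100):
    """Repeat the cycle until `converged` holds or `max_cycles` is reached."""
    for cycle in range(1, max_cycles + 1):
        state, results = run_cycle(state, sequential_steps, concurrent_steps)
        if converged(state):
            return state, results, cycle
    raise RuntimeError("no convergence within max_cycles")


# Stand-in "programs": P1 and P2 each halve an error estimate.
def p1(err):
    return err / 2


def p2(err):
    return err / 2


# Stand-in independent programs that only read the current state.
def pa(err):
    return ("report_a", err)


def pb(err):
    return ("report_b", err < 0.1)


final, reports, cycles = solve(1.0, [p1, p2], [pa, pb], lambda err: err < 0.1)
print(final, reports, cycles)
# 0.0625 [('report_a', 0.0625), ('report_b', True)] 2
```

A specification language serves the same purpose as the `solve` driver here: the user states the order, the concurrency, and the stopping condition, and the environment takes care of running the programs.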
2.3 The Knowledge Base
The knowledge base system should provide three services. First is a set of tutorials on using the PSE. Second is an expert system to assist the user in managing the computation. Third is access to a knowledge base of past experience in solving similar problems and data from the history of the current problem.
2.4 The Graphics Subsystem
Interpretation of results obtained at each step of solving a Grand Challenge problem usually requires the ability to visualize a very large amount of data. A specialist may interpret, compare or even model the behavior of a large volume of data by just looking at it. Visual inspection of a large volume of data is also a powerful debugging mechanism which provides direct checking for blatant inconsistencies. The PSE should provide an interface to the graphics subsystem, as well as support for data conversion. The actual functionality of the graphics subsystem depends, of course, on the application.
3 The SB Problem Solving Environment
SB is a PSE for structural biology under development at Purdue University. Structural biology is a branch of biology involved mainly in the recognition, description and classification of biological molecules. Of particular interest nowadays is the determination of the 3D atomic structure of proteins and viruses. The goal is to determine the location of the hundreds of thousands or millions of atoms in such molecules. Viruses are somewhat spherical in nature and the surface structure is of special interest as that is where, for example, a drug interacts with a virus to neutralize it. The computation starts with a massive amount of data in which the structure information is very well hidden.

Determining these structures requires a huge amount of computing which is spread over weeks or months and which involves many programs and several machines, including parallel supercomputers. The goal of the SB problem solving environment is to manage the complexity of these diverse computations.
3.1 The SB Kernel
3.1.1 Catalog and Objects
The kernel (SBK) is organized around a central entity called the catalog, which is a collection of objects accessible to the user (e.g., executable programs, data files, control input/output files, directories, etc.). Each object has an underlying file on the local machine or on a remote host. Every object has associated with it a set of predefined attributes and, possibly, a set of user-defined attributes that are used to describe in more detail its structure. The predefined attributes include the name of the object, its type, location, persistency, owner, creation date, size, dependencies, and some other parameters specifying, for instance, if the underlying file is executable, if it is ASCII, if it is a directory, etc. In addition to the predefined attributes, the user may choose to define his own attributes to describe more thoroughly the structure of the object or the information contained in it.
3.1.2 Predefined Attributes
The name of an object uniquely identifies it to the user: all objects have distinct names and they are listed in alphabetical order. The object name must be specified by the user when the object is created (incorporated in the catalog), and can be changed at any time.

The type of an object is a concatenation of several subtypes separated by dots (e.g., 'Prog.IPSC860.ED.Envelope') describing the nature of the file underlying the object. Each subtype can be chosen from a predefined set (the set of all possible types is fixed). The first subtype shows the class to which the object belongs, for instance: 'Prog' (the object is an executable program), 'Data' (data file), 'CtrlIn' (control input file for a program), 'CtrlOut' (control output file), 'SBLP' (specification language program), or 'Directory'. The second subtype can be chosen by convention to be the architecture of the machine on which the object is located. The following subtypes describe in further detail the nature of the object. The full type of an object must be specified by the user when the object is created; some subtype attributes can be changed during the processing.

The location of an object comprises the machine name, the login of the user on that machine, and the full path of the file that underlies the given object. All objects must have different locations (no aliasing is allowed). Occasionally, objects can point to non-existing files; these could have been deleted by the user or be expected to be created as a result of the user's subsequent actions. The three components of the location must be specified by the user at creation time and they can be modified at any time. Each time the components are provided, the interface checks that the given file exists and is accessible on the specified machine, under the specified user id. SB allows abbreviations for paths and automatically updates locations as objects are moved.

The persistency of an object can be of two types: permanent or temporary. Temporary objects exist until the end of the SB session, when they are deleted as physical files and also removed from the catalog. Permanent objects continue to reside in the system until explicitly deleted or removed from the catalog by the user. The persistency must be specified by the user when the object is created and can be changed at any time.

The attributes owner, creation date (last time the file was modified) and size of an object are automatically inferred by SB and never need to be specified explicitly by the user.

The dependencies describe the relationships among objects in the catalog. Object A is said to depend on object B if a reference to B is made in the file underlying object A. For example, a control input file (which describes the execution parameters for a program) can depend on the data file(s) used by that program, whose locations are specified in the control input file. In this case, we say that the control input object depends on the data file object, and this is called a forward dependency. Conversely, if object A has a forward dependency to object B, then object B will have a backward dependency to object A.

The dependencies are consulted every time the user wants to remove an object from the catalog, and if any other objects depend on it, a warning is issued. The interface also automatically updates the forward and backward dependencies when the object names are changed. As all these dependencies are automatically handled by the interface, the user never has to specify them directly.
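The bookkeeping for forward and backward dependencies can be sketched with two mappings that are kept in step. The following Python is a minimal illustration, not SB's implementation: the `Catalog` class, its methods, and the object names are hypothetical. The sketch also assumes that an object's dependencies are added to the catalog before the object itself, and that removal proceeds after the warning is produced.

```python
class Catalog:
    """Hypothetical catalog tracking forward and backward dependencies."""

    def __init__(self):
        self.forward = {}    # object name -> names it refers to
        self.backward = {}   # object name -> names that refer to it

    def add(self, name, depends_on=()):
        self.forward[name] = set(depends_on)
        self.backward.setdefault(name, set())
        for target in depends_on:
            self.backward.setdefault(target, set()).add(name)

    def rename(self, old, new):
        """Rename an object and update both dependency directions."""
        self.forward[new] = self.forward.pop(old)
        self.backward[new] = self.backward.pop(old)
        for target in self.forward[new]:
            self.backward[target].discard(old)
            self.backward[target].add(new)
        for source in self.backward[new]:
            self.forward[source].discard(old)
            self.forward[source].add(new)

    def remove(self, name):
        """Remove an object; return a warning naming its dependents, if any."""
        dependents = sorted(self.backward.get(name, ()))
        warning = None
        if dependents:
            warning = f"{name} is still referenced by: {', '.join(dependents)}"
        for target in self.forward.pop(name, ()):
            self.backward[target].discard(name)
        self.backward.pop(name, None)
        for source in dependents:
            self.forward[source].discard(name)
        return warning


catalog = Catalog()
catalog.add("frames.dat")
catalog.add("envelope.ctl", depends_on=["frames.dat"])

catalog.rename("frames.dat", "frames_v2.dat")
after_rename = set(catalog.forward["envelope.ctl"])
print(after_rename)   # {'frames_v2.dat'}

warning = catalog.remove("frames_v2.dat")
print(warning)        # frames_v2.dat is still referenced by: envelope.ctl
```

Keeping the backward mapping explicit makes the removal check a single lookup, at the cost of updating two structures on every rename.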
3.1.3 User-Defined Attributes
Besides the mandatory predefined attributes, the user may choose to define his own attributes to describe more thoroughly the structure of the object or the information contained in it. The form of a user-defined attribute must be as flexible as possible. In SB the user-defined attributes have the form:
<attribute> = <value>
For example, a user-defined attribute of a data file containing the scanned intensities from an optical density frame could be:
Film Type = Kodak
Forcing each user-defined attribute to have a left-hand side and a right-hand side has some advantages. For example, the user can "filter" objects according to their attributes and their values: from a catalog containing hundreds of objects, the user can easily pick only those that have 'Film Type = Kodak', or for an even broader selection, all those that have the attribute 'Film Type' defined.
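The two kinds of selection just described (by attribute and value, or by attribute alone) can be sketched in a few lines of Python. The catalog is modeled here as a plain dictionary, which is an assumption made for illustration; the function name, the object names, and every attribute other than 'Film Type = Kodak' are hypothetical.

```python
def filter_objects(catalog, attribute, value=None):
    """Return names of objects that define `attribute`.

    If `value` is given, keep only the objects whose attribute equals it.
    """
    selected = []
    for name, attributes in catalog.items():
        if attribute not in attributes:
            continue
        if value is None or attributes[attribute] == value:
            selected.append(name)
    return sorted(selected)


# Illustrative catalog: object name -> user-defined attributes.
catalog = {
    "frame_001": {"Film Type": "Kodak", "Resolution": "3.0"},
    "frame_002": {"Film Type": "Agfa"},
    "envelope.ctl": {"Mode": "batch"},
}

print(filter_objects(catalog, "Film Type", "Kodak"))  # ['frame_001']
print(filter_objects(catalog, "Film Type"))           # ['frame_001', 'frame_002']
```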
3.1.4 Catalog Functions
The catalog functions are the basic operations the user can perform on the objects. They include object creation, displaying/changing the object attributes, removing an object from the catalog (without deleting the underlying file), removing the object from the catalog and deleting the underlying file, copying an object (which means creating a new object whose underlying file will be a copy of the first object in the location specified by the user), viewing the underlying file of an object (if it is ASCII), and listing the objects selectively.
3.1.5 Other Parts of the Kernel
The kernel also includes the environment and the log file. The environment is a set of variables that define the conditions under which the interface is running and how certain operations should be performed. For example, the name of the currently used line printer is a variable which is part of the environment. Some of these variables have default values, others have to be explicitly set by the user.

The log file keeps a record of all the relevant user actions. Each entry in the log file is time stamped and describes concisely the action performed by the user. For example, all catalog manipulations like object creation, deletion, change of attributes, etc., as well as the steps taken for the execution of a certain program (choosing the control input/output objects, etc.) are recorded. The user has the option of viewing, printing, editing or clearing (restarting) the log file.
3.2 The SB Problem Definition Subsystem
3.2.1 Creation and Modification of Control Input Objects
Most programs for macromolecular structure determination use a control input file to specify the parameters of the execution (e.g., name of the input data file, execution mode, etc.). The user can perform operations on objects of type control input via the interface provided by the kernel. Control input objects can be created, modified and/or deleted interactively by the user.
3.2.2 The Problem Specification Language
The problem specification language SBL of SB allows a simple description of an execution sequence for programs in SB. SBL is limited to a set of language constructs that allow structured programming and is similar to the high-level Algol-like languages. Two categories of programs from the SB environment can be invoked from an SBL program: (a) processing programs, and (b) auxiliary programs, designed to support the appropriate sequencing of the programs in the first category, to test some of their output, and to transform part of the control input data for these programs. SBL also includes a minimal set of data types, operations and sequencing constructs that allow a concise description of the processing flow. The main advantages of using SBL are:
- a given processing sequence can be specified in a very simple way; the user is relieved of the task of knowing the commands necessary to run the program on a given machine (the machines currently supported are two Intel iPSC/860s, two Intel Paragons, and several Sun, SGI and IBM workstations).
- the user can stop the computation at any time, and then restart it; the same holds true for the case when the computational process was interrupted, e.g., because of a hardware failure, or because the allocated running time is exhausted: it can be resumed at any later time, without any loss other than possibly re-running part of one SB program.
- the user can back up a number of computational steps (up to one innermost cycle for iterative computations) by editing a log file that is generated by the compiled SBL program.
- the execution of a sequence of SB programs stops as soon as one of them fails, ensuring data integrity in this way; the user can then determine and remove the cause of the failure, and restart execution as if nothing had happened.
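The stop-on-failure and restart behavior listed above can be illustrated with a small Python sketch. It is not SBL and not the log format that a compiled SBL program generates: the JSON progress file, the function `run_sequence`, and the step names are all assumptions made for illustration.

```python
import json
import os
import tempfile


def run_sequence(steps, log_path):
    """Run named steps in order, skipping those already logged as done.

    Stops at the first failing step so later steps never see bad data.
    Returns the list of step names executed in this call.
    """
    done = []
    if os.path.exists(log_path):
        with open(log_path) as handle:
            done = json.load(handle)

    executed = []
    for name, action in steps:
        if name in done:
            continue                      # finished in an earlier run
        if not action():
            break                         # failure: stop the whole sequence
        done.append(name)
        executed.append(name)
        with open(log_path, "w") as handle:
            json.dump(done, handle)       # record progress after every step
    return executed


with tempfile.TemporaryDirectory() as workdir:
    log_file = os.path.join(workdir, "progress.json")
    machine_up = {"ok": False}
    steps = [
        ("P1", lambda: True),
        ("P2", lambda: machine_up["ok"]),  # fails until the machine is back
        ("P3", lambda: True),
    ]
    first_run = run_sequence(steps, log_file)
    machine_up["ok"] = True                # cause of the failure removed
    second_run = run_sequence(steps, log_file)

print(first_run, second_run)  # ['P1'] ['P2', 'P3']
```

In this sketch, backing up a number of steps corresponds to deleting the last few names from the progress file before restarting, so that those steps run again.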
3.3 The SB Knowledge-Base
SBX is a knowledge-based system for SB. This system enhances the range of services offered by the SB tool by incorporating learning strategies and reasoning techniques. It consists of a database and a set of mechanisms for search, retrieval, classification, inference and maintenance.

3.3.1 The Advisory Component (XADC). Based on information extracted from the database, XADC assists the user in finding the most appropriate computational environment for processing his/her data. The computational environment of a given problem with given data sets is defined by all the hardware and software resources needed to process the data and give an answer to the problem. The role of the advisory component is to provide the user with advice concerning the usage of these resources in the form of SB objects. For example, the user can ask XADC for advice in choosing a path through the available execution options. The assistance uses both rule-based and exemplar methodologies for helping select the computational environment. There are many simple rules to assist the biologists or to automate the selection completely. The database also contains historical data collected by SB, and the characteristics of the current computations can be compared to previous ones. This methodology is described in detail in [Hou 91], [Hou 94] for another application area. Note that for SB the exemplar based approach is particularly suitable since the analysis of one molecule requires the repeated application of several programs with similar input data.

3.3.2 The Monitoring And Estimation Component (XMEC) retrieves information about past executions from the database and, based on it, estimates future configurations. XMEC provides help in making decisions such as where to start the computation, on what machine to execute the programs, or how many processors will be involved. The same methodology is used as for XADC.
3.3.3 The Learning Component (XLEC) provides a set of tutorials from which the user of the SB system can learn about SB. Two learning strategies are considered:
- learning from formal definitions
- learning from samples
All objects are formally defined and their definitions are stored in the database and retrieved upon request. In addition to these formal definitions, a group of sample problems that can be solved using SB are provided. The user can learn how to use the services provided in SB by requesting demonstrative solutions to the sample problems.
3.3.4 The Maintenance Component (XMNC) is concerned with maintaining the information existing in the database. This information can be divided into two main categories:
- general information
- specific information
The first category comprises information about different approaches to solving problems using the PSE and also the material used by XLEC in the form of tutorials. The second category refers to individual programs supported by the environment, and to specific problems for each user and for each computational environment. The role of XMNC is to add new information to the database, to update the existing information and to remove outdated information from the database, without altering its consistency.
3.4 The SB Graphics
The determination of the 3D atomic structure of proteins and viruses requires large amounts of data at various steps in the execution sequence. Interpretation of results obtained at each step requires the ability to visualize this data in some representation. SBG is a graphics package for SB. Its goal is to provide the crystallographer with a tool which allows him to interpret, compare and model the behavior of a large volume of data by just looking at it.
3.4.1 Interpretation of Crystallographic Data
Crystallographic data is collected from X-ray diffraction experiments. The diffraction pattern obtained by exposing the crystals to X-ray beams is recorded on film and subsequently used to compute electron densities at the grid points of a 3D lattice. These electron densities are used to produce electron density contour maps at different resolutions. These maps allow the crystallographer to trace the amino acids that form a polypeptide chain. A grid point in the 3D lattice also has a mask which specifies whether the corresponding point is situated in the solvent, in the protein or in the nucleic acid. After the polypeptide chain is delineated, the structure obtained is compared with the wire models, and a molecular model is produced.
3.4.2 Data Display in SBG
SBG has a set of routines for displaying electron density contour maps and mask maps in 2D. The data is displayed in sections, parallel to one of the principal coordinate planes. A conversion routine extracts from the input data file only those sections which are being processed. The electron density contour maps can be displayed separately from the masks or they can be superimposed on the mask maps. SBG offers the following capabilities to the user: (a) edit the color of the masks interactively, (b) identify positions on the plot in grid coordinates, (c) obtain histograms for masks and electron density in a given section, (d) switch the plot between several consecutive sections, (e) zoom and unzoom.

A distinct part of SBG is the mask editor, which allows the user to interactively edit the shape of the masks, based on the information obtained from the electron density contour map. Collision checks must be performed to avoid overlapping of particles. The checks must be done not only at the point of change, but also at all the points related to it by the symmetries of the molecule.
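The symmetry-aware collision check can be sketched as follows. This is a loose illustration rather than the SBG mask editor: it assumes a 2D section stored as a dictionary from grid points to particle labels, and it uses a single hypothetical two-fold rotation about the grid point (4, 4) as the only non-trivial symmetry. Real molecular symmetries and the actual mask representation in SBG may differ.

```python
def related_points(point, symmetry_ops):
    """Return the point together with all of its symmetry-related images."""
    return {op(point) for op in symmetry_ops}


def can_assign(mask, point, particle, symmetry_ops):
    """True if `point` and its images are free or already belong to `particle`."""
    return all(mask.get(image, particle) == particle
               for image in related_points(point, symmetry_ops))


def assign(mask, point, particle, symmetry_ops):
    """Assign `point` and its images to `particle` unless that would collide."""
    if not can_assign(mask, point, particle, symmetry_ops):
        return False
    for image in related_points(point, symmetry_ops):
        mask[image] = particle
    return True


def identity(point):
    return point


def two_fold(point):
    # Hypothetical two-fold rotation about the grid point (4, 4).
    return (8 - point[0], 8 - point[1])


ops = [identity, two_fold]
mask = {(7, 7): "B"}                     # grid point already owned by particle B

print(assign(mask, (1, 1), "A", ops))    # False: the image (7, 7) belongs to B
print(assign(mask, (2, 1), "A", ops))    # True: (2, 1) and (6, 7) are both free
print(sorted(mask.items()))
# [((2, 1), 'A'), ((6, 7), 'A'), ((7, 7), 'B')]
```

The first edit is rejected even though the edited point itself is free, which is the case the text warns about: the collision occurs at a symmetry-related point, not at the point of change.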
4 Conclusions
Complex problems in different domains of science often require a complex and long term execution sequence. A problem solving environment is a necessity in order to make the work easier and more efficient. The SB problem solving environment is a model for such applications. This model is organized around a catalog of objects which allows the user to keep track of and to manipulate a relatively large, heterogeneous collection of objects, located at different sites and using different computers. SB also provides a problem specification mechanism, a knowledge-base, and support for data visualization, all of which assist the scientist in solving his/her problems. Experience in using the second generation of SB shows that it substantially reduces the difficulty of using MPPs for structural biology.
5 References
[For 93] Forbus, K.D., Johan de Kleer, Building Problem Solvers, MIT Press, Cambridge, MA, 1993.
[Hou 91] C.E. Houstis, E.N. Houstis, T.S. Papatheodorou, J.R. Rice, P. Varodoglou, Athena: A Knowledge Base System for //ELLPACK. In Symbolic-Numeric Data Analysis and Learning (E. Diday and Y. Lechevallier, eds.), Nova Science, New York, 1991, pp. 459-467.
[Hou 94] E.N. Houstis, J.R. Rice, S. Weerawarana, C.E. Houstis, Pythia: A Computationally Intelligent Paradigm to Support Smart Problem Solving Environments for PDE Based Applications, to appear, 1994.
[Mar 92] Marinescu, D.C., M.A. Cornea-Hasegan, R.E. Lynch, J.R. Rice, M.G. Rossmann, Macromolecular Electron Density Averaging on Distributed Memory MIMD Systems, Concurrency: Practice and Experience, Vol. 5, 1993, pp. 635-657.
[Nie 93] Nielsen J., Noncommand User Interfaces, Communications of the ACM, Vol. 36, 1993, pp. 83-99.
[Rew 90] Rew, R.K., G.P. Davis, NetCDF: An Interface for Scientific Data Access, IEEE Computer Graphics and Applications, Vol. 10, 1990, pp. 76-82.
[Ros 89] Rost, R.J., J.D. Friedberg, P.L. Nishimoto, PEX: A Network-Transparent 3D Graphics System, IEEE Computer Graphics and Applications, Vol. 9, 1989, pp. 14-26.
[Sch 87] Schneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, Addison-Wesley, Reading, MA, 1987.
[Win 84] Winston, P.H., Artificial Intelligence, Addison-Wesley, Reading, MA, 1984.