A System for Remote Data Visualization

Allan Tuchman, David Jablonowski, and George Cybenko
Center for Supercomputing Research and Development
305 Talbot Lab
University of Illinois
Urbana, Illinois 61801

Abstract

Vista is a system for simulation-time visualization of data. Vista provides a window into the application by showing program data automatically during execution. The system architecture is designed for a distributed or remotely executing application; however, the Vista model allows a data or trace file to replace the executing application, providing a visualization "data browser" for existing data or simulation runs. The data to be displayed and the type of display to be used are chosen interactively while the application is executing. It is not necessary to specify the data or graphics technique before compilation as with conventional graphics tools. With minimal and possibly automatic instrumentation, an application run in the Vista environment will have its data (variables and data structures) made available to a visualization system on a remote workstation. Any data display can be enabled or disabled at any time. The application may execute locally, on a remote supercomputer, on several clusters of a shared memory computer, or even across a network of distributed computers. Designed primarily for scientific visualization, Vista also offers an environment for more effective debugging and program development, and a tool for viewing results.

Keywords: data visualization, distributed computing, algorithm animation

Contents

1 Introduction
2 Previous Work
3 Design
4 The Vista Architecture
5 Implementation
    5.1 Network Interface
    5.2 Visualization Manager
        5.2.1 Functionality
        5.2.2 The User Interface
        5.2.3 Data Translation
    5.3 Application Executive
    5.4 Data Manager
    5.5 Data Files
    5.6 Interprocess Messages
    5.7 Process Control
    5.8 Synchronization
    5.9 Special Topics
        5.9.1 Library Sizes
        5.9.2 Array Shape
        5.9.3 Heterogeneous Data
        5.9.4 Parallel Programming
        5.9.5 Performance Impact
    5.10 Porting Vista to Other Computer Systems
        5.10.1 Machine-Dependencies in Modules
        5.10.2 Networking and Data Representation
        5.10.3 Interlanguage Communication
6 Using Vista
    6.1 Debugging
    6.2 Application Routines
        6.2.1 Fortran Program Calls
        6.2.2 C Program Calls
7 Project Status
8 Future Work
9 Conclusion

1 Introduction

The graphical display of information is important to every aspect of computing. Even applications which are not necessarily centered around visual display benefit greatly from computer generated graphics. Graphics are used in algorithm development, program development, and display of intermediate and final computational results.

In the past, the typical application running on any host computer executed for a while, then called a graphics subroutine to show data or results. The subroutine may have been internal to the application or may have been from a graphics library. Such subroutines may control a graphics device directly or may output a file for later viewing.

This model does not promote the usability of graphics. The application developer must specify in the program before compilation which data values will be displayed, when these values will be displayed, and to some extent, how and where they will be displayed. This specification often requires major modification to the application program and gives the application user little control over what data can be displayed. The notion of any distributed graphics processing or the network environment must be explicit in the application or the graphics library in use. This all becomes a part of the application. For each run, internal or external parameters may control the path of execution, and therefore the specific graphics routines that are executed, but this is the extent of the control.

Other solutions provide post-processed high-quality graphics from data files created by the application. But we stress the importance of viewing program data as the program is executing. We term this simulation-time animation. The term reflects both the dynamic display update rather than single-image snapshots as well as the immediate feedback with the potential for steering the course of the application rather than a post-processed animation produced sometime after the application is complete.
In the domain of scientific and parallel computation, our goals of easy-to-access graphics, remote display, simulation-time animation, and dynamic selection of data and display method are each fundamental. The start-up time or learning effort is always a problem in getting people to use a system. The cost, generally poor graphics capabilities, and geographic constraints of supercomputers dictate a design which allows a graphics workstation to manage graphics functions and user interface. Simulation-time graphics are the best way to get insights into the method of solution for a problem. Watching a solution converge while the application is executing provides information on rate of convergence as well as anomalies that may show up only in a time sequence of images. As new insights are realized, the ability to display new data dynamically or to display already visible data in a new form is very important. Individually these goals have been achieved by others, but not together.

In this paper we discuss some previous work in visualization systems. We then describe functionality for a distributed display visualization system with dynamic data and display selection that fills a gap in existing systems and is immediately useful to a large class of users. In the bulk of the paper we present the implementation of the Vista system, which addresses some of the weaknesses discussed above. We will discuss its current status and list the directions that we will pursue with Vista. Other papers on Vista are one which provides an overview of Vista and its ease of use [TJC+91b], another which provides a discussion on run-time visualization of program data [TJC91a], and the Vista user's manual [JT91].

2 Previous Work

There are a few well-known systems promoted for visualization. Each shares some of the characteristics of our work, but most address a slightly different problem.

The most well known systems for scientific visualization are AVS [UTK+89], developed by Stardent Computer, and aPE [Dye90] from Ohio State University. Both of these systems provide a visual programming interface which allows the user to specify a sequence or network of processes to manipulate data. The individual processes can execute on a local or a remote computer. The starting point for most users of these systems is one or more data files. It is possible though, with some small modification to the program, to have the application program write directly to the start of the visualization network. This provides an excellent tool to manipulate predetermined data from an application. However, neither system provides direct access to arbitrary program data.

The Balsa-II [Bro88] system for algorithm animation provides the same simulation-time animation capabilities that we strive for, yet requires considerable instrumentation of the application source program. It is not easy to dynamically show new data and the system is not inherently distributed. Aladdin [HHR89] has similar use in algorithm animation, and uses a graphical interface to minimize the amount of graphics programming required to visualize the algorithm.

Johnston and others at Lawrence Berkeley Laboratory have developed a system for distributed scientific movie making using a Sun workstation and a Cray supercomputer [JHH+88]. By modifying the source program to include calls to their Scry library, an application can have data rendered on the supercomputer or the workstation, displayed on the workstation, then recorded on a low-cost animation system. Their work is most significant in distributed image display and in the low-cost animation solution. Scry does not address dynamic data access.
Our own work in this area began as we tried to abstract the necessary control and data access requirements for a distributed simulation-time animation system (STAS) [NT89]. This system was unwieldy and required considerable additions and modifications to the program source. In later work we separated data access from data display in a remote visualization environment for looking at matrix data in linear algebra applications [TB90].

One way to provide distributed graphics is with a distributed window system such as the X Window System or NeWS. Both allow remote display of graphics, but graphics algorithms still execute in the application (client) process. Forthcoming work in PEX (PHIGS Extensions to X) will shift more of the low-level graphics tasks to the remote workstation, making this a more attractive solution. Using PEX, the graphics algorithms will still execute in the client process, but graphics transformations and rendering functions will move to the server and can potentially be processed by hardware. The window systems still do not address the application access to graphics or issues of dynamic data access.

[Figure 1: Abstract Model of the Vista System. Application Support on the Application Host is connected to the Visualization Server on the Graphics Workstation.]

3 Design

In this section we describe the conceptual design of Vista and show how our requirements and goals motivated that design. The specific modular architecture of its implementation is discussed in Section 4.

Several features are useful in a simulation-time animation system, or even one which shows only run-time snapshots of data. The system should

- have easy-to-access graphics,
- require minimal modification of the application program,
- transparently provide a distributed environment with remote visualization,
- allow dynamic association of the display method and data,
- encourage dynamic selection of what to show, and when,
- make use of existing graphics software,
- use the simulation-time animation model, the only model which allows computational steering.

Vista's logical design, shown in Figure 1, has two parts: a visualization server which runs in the workstation and an application support module which runs on the application host. The two parts communicate through any inter-machine communications mechanism. At this level of abstraction, we do not yet specify how either part is implemented, or the communications pathway.

[Figure 2: Architecture of the Vista System. Host side: Application Source, Application Object, Application Executive, Data Manager, defaults. Workstation side: Data Manager, Visualization Manager, Display Tools, defaults and resources, remote session login. The two sides are joined by the Network / Communication Interface.]

The visualization server must provide a user interface to access the system functionality, associate data with graphical display methods, and manage the display of graphics from these methods.

The application support module must be able to access application data at execution time and provide a mechanism for visualization break points. We also refer to these as vis-points for short. The practical implication of the first requirement is that a table must be maintained containing, for each accessible data item, its name, address, and data type. This information may be obtained by a source preprocessor which builds a symbol table, by automatic or user-inserted calls to enter this information, or by reading a compiler-generated symbol table. The vis-points are needed because, while it is possible to interrupt the program at intervals to access and extract data, this is extremely dangerous. It is possible that the data structures of interest are being modified and that in their state of transition they may even be temporarily corrupt. The vis-point requirement, then, is the user's way of giving the application support module permission to access data.
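The table of accessible data items described above can be pictured with a small C sketch. This is only an illustration under our own assumptions, not the Vista source: the names `SymType`, `Symbol`, `sym_register`, and `sym_lookup`, the fixed table size, and the element count field are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags; the text does not list the actual set. */
typedef enum { SYM_INT, SYM_FLOAT, SYM_DOUBLE } SymType;

/* One entry per accessible data item: name, address, and data type.
 * The element count is an added assumption so arrays can be described. */
typedef struct {
    const char *name;   /* identifier as written in the program */
    void       *addr;   /* address of the data in the application */
    SymType     type;   /* element type */
    size_t      count;  /* number of elements (1 for a scalar) */
} Symbol;

#define MAX_SYMBOLS 64
static Symbol table[MAX_SYMBOLS];
static size_t n_symbols = 0;

/* Enter one data item. Returns 0 on success, -1 if the table is full. */
int sym_register(const char *name, void *addr, SymType type, size_t count)
{
    assert(name != NULL && addr != NULL);
    if (n_symbols == MAX_SYMBOLS)
        return -1;
    table[n_symbols].name  = name;
    table[n_symbols].addr  = addr;
    table[n_symbols].type  = type;
    table[n_symbols].count = count;
    n_symbols++;
    return 0;
}

/* Look a name up. Returns NULL when the name is not accessible. */
const Symbol *sym_lookup(const char *name)
{
    size_t i;
    for (i = 0; i < n_symbols; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}
```

In this sketch the registration calls stand in for the "automatic or user-inserted calls" option; a preprocessor or a reader for a compiler-generated symbol table would fill the same table by other means.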

4 The Vista Architecture

Certain aspects of the architecture were determined by the distributed nature of our design. The simple design of two modules shown in Figure 1 might be satisfactory, but would limit our flexibility in distributing function, hinder overlapped processing of the two machines, and possibly allow the visualization system to compete with the application for host system resources. For example, a single process on the host would impose some intermachine communication overhead on the application process. Initially this overhead also included data conversion from one machine-dependent format to another when necessary. Although the application support will normally process requests for data at well-defined vis-points, for some requests we may want to interrupt the application and access information immediately. This latter situation led us to insert an additional support module on the application host.

There were less compelling reasons to provide a similar module on the visualization host, but long term plans for connecting many hosts running a distributed application showed that such a module would be useful as a multiplexor if for no other reason. In fact, the symmetric design shown in Figure 2 has worked out very well and includes most synchronization logic in the auxiliary processes labeled Data Manager.

The underlying Vista model shown in Figure 2 identifies each major component of the system. As each module is presented in this section, only its important functions and its interface to other processes are described. Section 5 will discuss the implementation details of our existing system and present alternative ways to build each module.

Each of the four large boxes in Figure 2 is a separate process. The user interface is a part of the visualization server process that we call the Visualization Manager, or VM. To the user the VM appears to be the bulk of Vista. The VM is already running on one computer, usually a graphics workstation, when the application program begins execution on the same or another computer, generally a more powerful machine.
Once the application starts, the VM maintains a list of available program symbols and a list of available graphics techniques (widgets or other tools, for example). At any time the user can choose one or more names, associate them with a graphical method, and have a picture show up. The image will be dynamically updated as the program variables change (subject to vis-points discussed below).

The three primary responsibilities of the VM are shown as emboldened boxes in Figure 3a. The interface includes the communication routines and data transmission. The graphical user interface allows the user to perform data selection and display selection, and provides control of the Vista system. This is the display shown as the lower left window in Figure 4. Finally, the VM serves to direct incoming data from the application to the graphics display method or process. Optional transformations on the data may be performed, for example, for type coercion or range adjustment.

A Display Method is a module that takes transformed data and displays it graphically on the user's screen. A display method may be an internal module of the VM, for example, a widget in a window system toolkit. More generally, a display method may be a separate process (the display method hanging below the dotted box in Figure 3a). An interprocess communication (IPC) mechanism is used to exchange data and control information between the display method and the VM. The use of external display methods is a most important capability as it allows almost any graphics function to be added to Vista without recompilation or changes to the Vista source program. In fact, this external method may be placing data into the beginning of a visualization pipeline for an existing visualization system such as AVS or aPE.

[Figure 3: (a) Components of the Visualization Manager: interface to the rest of Vista and the application control, graphical user interface (selection), data filtering and transformation, multiplexing of data to display methods, and the display interface via internal routine or IPC to the Display Methods. (b) Components of the Application Executive: application data, data access routines, symbol table scanner, AE symbol table, C interpreter, and interface to the rest of Vista.]

[Figure 4: Workstation Screen During a Vista Session]

The other part of Vista that the user may be aware of is the application support module associated with the application program. The program is loaded with and linked to an Application Executive, or AE. The AE is mostly independent of the application program, but by sharing the task address space has access to all program data. The main components of the application process when linked with the AE are shown as emboldened boxes in Figure 3b. The interface is not unlike that of the VM. The AE maintains a symbol table in a data structure whose entries each hold a variable name, its address, and its type. This information may be supplied by the application via procedure calls or automatically determined by reading the symbol table from the object file.

When presented with a request from the VM, the AE searches its tables, possibly accesses data, and returns an answer. For example, the is name message asks the AE if a name is a valid program identifier. The return message is either type and size information for the variable or a negative response. At predetermined vis-points, the program transfers control to the AE, which services any outstanding requests and returns to the application.

The C interpreter in the AE is not necessary for routine access of simple data types, including arrays with non-unity stride. The interpreter will allow us to access and traverse more complex data types in the future. This interpreter is partially implemented and has aided us in debugging Vista.

Neither the AE nor the VM communicates directly with another computer. Each has an associated task, a Data Manager, or DM, running on the same multiprogramming system. The DM connected to the VM is called the VDM; the DM adjoining the AE is the ADM.
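The vis-point behavior described above, where requests may arrive at any time but application data is touched only when the program hands control to the AE, can be sketched as follows. This is a simplified illustration and not the actual AE: `Request`, `post_request`, and `vis_point` are hypothetical names, a request here is just a byte copy, and real requests would arrive as messages from the Data Manager.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 16

/* A pending request: copy nbytes of application data into out. */
typedef struct {
    const void *src;     /* address of the application data */
    void       *out;     /* where the snapshot is placed */
    size_t      nbytes;  /* size of the data item */
} Request;

static Request pending[MAX_PENDING];
static size_t  n_pending = 0;

/* Record a request when it arrives. Application data is NOT read here,
 * because the program may be in the middle of updating it. */
int post_request(const void *src, void *out, size_t nbytes)
{
    assert(src != NULL && out != NULL);
    if (n_pending == MAX_PENDING)
        return -1;
    pending[n_pending].src    = src;
    pending[n_pending].out    = out;
    pending[n_pending].nbytes = nbytes;
    n_pending++;
    return 0;
}

/* Called by the application at a point where its data is consistent.
 * Services every outstanding request and returns how many were served. */
size_t vis_point(void)
{
    size_t i, served = n_pending;
    for (i = 0; i < n_pending; i++)
        memcpy(pending[i].out, pending[i].src, pending[i].nbytes);
    n_pending = 0;
    return served;
}
```

The point of the design is visible in the sketch: a request posted while a variable is being modified sees nothing until the next call to `vis_point`, at which time it receives the value the program considers consistent.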
The primary function of the Data Managers is to pass data between computers. This includes maintenance of communication channels and overall synchronization of the system. The ADM and VDM are symmetric in several layered tasks, discussed below. They differ in other functions, though. For example, the VDM may accept connections from several different hosts in one session; the ADM may need to send an interrupt to the application if it needs information immediately, before the next program vis-point. One of the ADM's most important tasks is to maintain two lists: one of work items (such as data access) that must be performed at all vis-points, and another, prioritized list of new messages that have arrived since the previous vis-point.

The Data Manager design includes other functions, not yet implemented: for example, caching non-changing data, and common data selection filters such as extracting a two-dimensional cross section from a three-dimensional array of data.

The lowest level of all modules is the networking or communication interface. This handles the inter-process and inter-machine communication. Above this is the transport layer, which packages messages and data. The Data Managers provide a data layer for transparent exchange of data across heterogeneous computers.

The VM may use an application-specific configuration file to automatically request certain variables or pre-associate certain name-graphic combinations. Graphics configuration files may specify preferences or parameters for each graphics tool used (for example, the X windows application resources files). Examples of auxiliary files which may be used by the AE are the program symbol table and preprocessor directives for automatic instrumentation of name definition and vis-point calls.

Our goals were to provide a distributed system, as machine-independent and portable as possible. We built on standards or commonly available tools such as X windows and TCP where possible. Finally, the system is modular to facilitate development, and provides some tailored modules for different environments, machines, or applications.

5 Implementation

Each Vista module may be implemented in many ways. In this section we describe our initial approach, including the network interface, the visualization manager, application executive, data managers, and supplementary data files.

An early decision was to make the entire system asynchronous. This allows the user to formulate several requests without waiting for each to be satisfied. In fact, a user may make several requests before the application sees any of them at the next vis-point, providing a good processing overlap. Similarly, it allows a simple request to be satisfied before a more time-consuming one. The interactivity can remain as high as possible. At times we must synchronize or wait for a specific event before proceeding, but this is handled in the software.

We also decided to use a message passing scheme for all inter-module communications. The use of messages provides more error checking in data transfer. An important use of this method is to order messages, associate responses with specific queries, and keep track of outstanding messages in need of a response.

At first it may appear that the problem of transparent data transfer between heterogeneous machines has been solved by the Remote Procedure Call (RPC) protocol. This is certainly the case for transferring relatively small amounts of data between machines as procedure parameters. Subroutine parameters and types are known in advance; however, the Application Executive can access any data item in the program at any time. Due to this requirement and the large amount of data we need to access, we were not willing to pay the penalty of adhering to RPC. We have found (so far) that converting even binary floating point data is not too difficult. We may have to reassess our decision if we later need to exchange more complicated user-defined data structures across machines.

5.1 Network Interface

We have implemented a layer above the Berkeley socket mechanism for network communication. Sockets use the TCP/IP protocols for reliable message transmission. Our network layer isolates the implementors from the intricacies of using sockets and isolates the implementation of the network routines from the rest of Vista. Our access is through a small number of routines: Net connect, init server, Net close, Net write, Net Read. These perform the standard functions of opening, closing, writing to, and reading from a possibly remote channel. To allow non-blocking messages, we have Net waitforsomething, which allows a timeout if no network events have occurred in a certain period. The return indicates which network connections have signaled the presence of data. So that a module can check whether it actually wants to do the Net Read, Net inputpending returns the number of characters ready to be read.

This network layer can be replaced with another underlying mechanism with little programming change to the rest of the system. For example, shared memory mailboxes could be used to implement the network instead of sockets. Above this lowest level is our transport layer, which packages the message content with headers.
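As an illustration of how a wait-with-timeout call and a pending-input query can be layered on Berkeley sockets, the following C sketch uses the standard `select` and `ioctl(FIONREAD)` facilities. It is not the Vista network layer: the function names `net_wait` and `net_input_pending` and their signatures are our own assumptions, and the paper does not say which system calls its routines use.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_ms for data on any of n descriptors.
 * Sets ready[i] to 1 for each descriptor with data to read.
 * Returns the number of ready descriptors, 0 on timeout, -1 on error. */
int net_wait(const int *fds, int n, long timeout_ms, int *ready)
{
    fd_set readset;
    struct timeval tv;
    int i, maxfd = -1, rc;

    assert(fds != NULL && ready != NULL && n > 0);
    FD_ZERO(&readset);
    for (i = 0; i < n; i++) {
        FD_SET(fds[i], &readset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
        ready[i] = 0;
    }
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(maxfd + 1, &readset, NULL, NULL, &tv);
    if (rc <= 0)
        return rc;              /* timeout (0) or error (-1) */
    for (i = 0; i < n; i++)
        ready[i] = FD_ISSET(fds[i], &readset) ? 1 : 0;
    return rc;
}

/* Number of bytes that can be read without blocking, or -1 on error. */
int net_input_pending(int fd)
{
    int nbytes = 0;
    if (ioctl(fd, FIONREAD, &nbytes) < 0)
        return -1;
    return nbytes;
}
```

Because callers see only these two functions, the body could be swapped for another mechanism, such as the shared memory mailboxes mentioned above, without changing the rest of the system.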

5.2 Visualization Manager

5.2.1 Functionality

The primary job of the Visualization Manager, VM, is to take raw data from the Data Manager, transform it according to parameters specified by the user, and graphically display it using Display Methods chosen by the user. The translation of data and the Display Methods form the core of VM.

The Data Manager is VM's only interface to the rest of Vista. From VM's perspective, the Data Manager provides control information and data to VM. The most important control information includes the application symbols that are in the current scope and text messages concerning the status of the executing application or the data or trace files; it is relatively transparent to VM whether the Data Manager is passing information from an actually executing application or from data or trace files. VM expects the control and data information from the Data Manager to arrive asynchronously.

A Display Method is a module that takes transformed data and displays it graphically on the user's screen. The part of the Display Method that actually performs the graphics may be a widget (described below). A Display Method also includes the controls to define the translation and, optionally, controls to adjust parameters specific to the Display Method.

To understand some of the design decisions and implementation of VM, it is important to understand abstractly what a widget is. A widget is a code module associated with a window. What appears in a widget's window depends on the basic functionality intended for the widget. For example, a Push Button widget provided by the Motif Widget Set displays in its window the text for the button and a 3D border to simulate whether or not the button is pressed. A widget's code must have functions, or methods in widget terminology, to handle redisplaying of the window, resizing of the window, destruction of the window, and, optionally, input events in the window. So a widget provides a window whose contents are provided by a widget method that knows about the specific purpose of the widget. But a widget must also be painfully aware of the various things that can happen to a window.

5.2.2 The User Interface

VM uses the X Window System and the Motif widget set. A single main window is displayed throughout the lifetime of a VM invocation (lower left of Figure 4). This window has five main areas:

1. Across the top of the window are several Push Buttons. Functionality provided by these buttons includes exiting VM and flow control of data coming into VM from the Data Manager.

2. The large area in the left center of the main window lists the data filters VM currently is aware of. The user explicitly creates data filters from a set of primitive data filter elements provided by VM. When a symbol is bound to a Display Method, a data filter can optionally be bound to the symbol. The purpose of a data filter is to perform a transformation on the raw data that comes into VM from the AE.

3. The large area in the center of the main window lists the symbols currently known to VM. These symbols are a subset of those defined by the current context of the application. This subset of symbols can change either through explicit user control or as the application changes contexts. The interface allows the user to select a symbol and view some attributes of the symbol, including its type and size.

4. The large area in the right center of the main window lists the Display Methods VM currently knows about. In the current implementation of VM, the available Display Methods are fixed; that is, the user cannot decide to construct a Display Method and add it to VM without intimate knowledge of VM's program structure. An icon is used to represent a Display Method. Selecting an icon provides some details about the Display Method, including its basic functionality and the types and number of symbols it requires.

5. Across the bottom of the main window is the status area. The primary purpose of the status area is to notify the user of some of the asynchronous messages coming into VM from the Data Manager.

5.2.3 Data Translation

VM receives raw data from the Data Manager; raw data means that the data values received by VM are exactly those that are generated by the application. A data translation is simply a function whose domain is the set of all possible values the raw data of a specified symbol can assume. As mentioned above, the user can define data filters that perform translations on raw data. Translation of raw data is performed for two reasons:

1. For the user's benefit. The user may linearly transform a range of values into a wider, narrower, and/or more palatable range of values. The log of data values may be taken to exaggerate smaller values. This might be used, for example, in error analysis in numerical algorithms.

2. For a widget's benefit. Our experience is that most existing X Windows widgets require integer data as input. On reflection, this makes sense since a widget's graphical output must eventually map to pixel values in an X Window's coordinate system, which is, naturally, 0-relative on both the x and y axes. Our first inclination was to create widgets that took real values as data input, since our first applications generated some real data that was either in a very small or very large range relative to the number of pixels in a dimension of the widget's window. If possible, we want to avoid writing new widgets because of the complexities of widget writing described above. Therefore, if an existing widget met our needs for a new Display Method we would use it, regardless of the type and range of values it required for input. Data is then simply transformed prior to being passed off to the widget. For a widget's benefit, data translation is sometimes performed internally by VM and is transparent to the user.

Section 8 discusses some of the planned enhancements to the Visualization Manager.
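The second kind of translation, turning real values of arbitrary range into the 0-relative integers a widget expects, amounts to a clamped linear map. The sketch below shows one way to write it in C; the function name `to_pixel` and the choice to clamp and round to nearest are our assumptions, not details taken from VM.

```c
#include <assert.h>

/* Linearly map v from the range [lo, hi] to an integer pixel index in
 * [0, npixels - 1]. Values outside the range are clamped to the ends.
 * Rounding is to the nearest pixel. */
int to_pixel(double v, double lo, double hi, int npixels)
{
    double t;

    assert(hi > lo && npixels > 0);
    t = (v - lo) / (hi - lo);       /* normalize to [0, 1] */
    if (t < 0.0)
        t = 0.0;
    if (t > 1.0)
        t = 1.0;
    return (int)(t * (npixels - 1) + 0.5);
}
```

For instance, with a window 101 pixels wide and data known to lie in [0, 1], the value 0.5 lands on pixel 50. A logarithmic filter of the kind mentioned in the first item could be composed in front of this map by taking the log of `v`, `lo`, and `hi` before the call.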

5.3 Application Executive

The Application Executive is a large piece of code by itself. In fact, its author has used it for other applications. We presented an overview of the AE in Section 4, which described the program at the functional level. Section 6.2 describes the user interface to the AE from the application. We will not otherwise describe the implementation of the AE in this paper. Another CSRD report to be issued in the future will discuss this program.

5.4 Data Manager

[Figure 5 diagram: the Application Data Manager and the Visualization Data Manager are each drawn as a stack of four layers: control, data, transport, network.]

Figure 5: Layers of the Data Managers

The layers of both Data Managers are shown in Figure 5. The network, transport, and data layers are symmetric, but the highest layers differ. The highest layer provides sequencing control, synchronization (if necessary), and automatic generation of requests.

5.5 Data Files

Two files are used by Vista currently. The AE reads the symbol table from the application object file. The VM uses the X Window System application resources mechanism to get many programming defaults. In the future, configuration files and playback files are planned.

5.6 Interprocess Messages

Interprocess messages in Vista are sent with a message header and data. All aspects of the message, including the message structure and values, are defined in the include file .../include/mgr.h under the main Vista directory. It is important to remember that this entire section, and particularly this subsection, applies only to the implementation of Vista, and not to the user running an application. The messages are entirely internal, and should never be seen by users of the Vista software. The message header has the following structure:


    struct {
        int msg;          /* a message type described below */
        int from, to;     /* from and to modules, described below */
        int refID;
        int uniqueID;
        int length;       /* length of header plus any data */
    }

The uniqueID value is a unique message number assigned to this message. No other message in the system has this value. The refID field is used in sending a response to a message. The uniqueID of the original message becomes the reference ID (refID) of the response. The from and to fields identify the sender and receiver of each message as a short integer value. They are symbolically known as:

    No_Module
    VM_Module
    AV_Module
    DM_Module
    DM_ModuleLocal     /* The part closest to VM */
    DM_ModuleRemote    /* The part closest to AV */
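The pairing of a response with its request through uniqueID and refID can be sketched as follows. The enumeration values, the tag msg_header, and the helpers make_request and make_reply are hypothetical; the real definitions live in mgr.h and are not reproduced in this paper. The sketch assumes that a refID of zero marks a message that is not a response.

```c
#include <assert.h>

/* Hypothetical values; the real ones are defined in mgr.h. */
enum module_id  { No_Module, VM_Module, AV_Module, DM_Module };
enum message_id { NoMessage, VM_IsName, DM_Ack };

struct msg_header {
    int msg;          /* message type */
    int from, to;     /* sending and receiving modules */
    int refID;        /* uniqueID of the message being answered, or 0 */
    int uniqueID;     /* unique number of this message */
    int length;       /* length of header plus any data */
};

static int next_unique_id = 1;   /* 0 is reserved for "no reference" */

/* Build the header of a new request carrying data_len bytes of data. */
struct msg_header make_request(int msg, int from, int to, int data_len)
{
    struct msg_header h;

    assert(data_len >= 0);
    h.msg = msg;
    h.from = from;
    h.to = to;
    h.refID = 0;
    h.uniqueID = next_unique_id++;
    h.length = (int)sizeof h + data_len;
    return h;
}

/* Build the header of a response: sender and receiver are swapped and
 * the request's uniqueID becomes the response's refID. */
struct msg_header make_reply(const struct msg_header *req, int msg,
                             int data_len)
{
    struct msg_header h = make_request(msg, req->to, req->from, data_len);

    h.refID = req->uniqueID;
    return h;
}
```

The receiver of the reply only needs the refID to find the request it answers, so responses may arrive in any order.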

By compile-time choice, the maximum message length is 1024 bytes including the header. This is set by a constant in the include file.

The messages themselves are presented here without explanation. Many of the message names will be somewhat self-explanatory to another systems implementor. The purpose of listing them is to provide some idea of the sort of information passed through the system without a total explanation (which can be found in the C code and comments).


NoMessage VM_Proceed VM_Init VM_Quit VM_IsName VM_SendNames VM_SendValue VM_SendValueCont VM_SendValueFail DM_Fail DM_Ack DM_Connect DM_NameBad DM_RemoveName DM_NameList DM_AddNameList DM_NameListFail DM_Value DM_ValueAdd DM_ValueFail

DM_IsName DM_SendNames DM_SendValue DM_SendValueCont DM_SendValueFail DM_Quit VM_Batch VM_EndBatch VM_KillApp AV_Init AV_Quit AV_NameBad AV_RemoveName AV_NameList AV_AddNameList AV_NameListFail AV_Value AV_ValueAdd AV_ValueFail AV_SendPid

AV_Break AV_Appl VM_ExecFile DD_EOQ DD_Sync VM_NoSendCont AV_Text DD_NextMsg DD_AutoProceed DD_Interrupt DD_EmptyQueue DD_Timer VM_Text DD_LostVM DD_LostAE DD_Ready VM_ScanSymbols VM_ScanMode

In the include file, there are also attributes associated with each message type, including its print-name for debugging and whether it normally expects or requires a reply. Messages that require a reply are saved in a list maintained by the sender until a reply is received.

Each module can use the ddlib SetCallback procedure to set a default action for any message received. The call

    SetCallback(msg, func)

has all arriving messages of type msg passed immediately to procedure func. This is implemented much like many window system callback mechanisms. The callback function is invoked with essentially all the header information of the message passed as parameters.

All messages are issued by the various Vista modules by using the routines in ddlib, the data-layer library. The procedure

    IssueMessage(chan, msg, from, to, refID, cb, parms, length, response)

allows the module to send a message (msg) through a given communications channel (chan), using from, to, and refID as described above. The uniqueID is assigned by the IssueMessage procedure. parms is length bytes long and contains parameters specific to the message being transmitted. The sender can indicate whether a response is expected with the final parameter. This is usually set to the message attribute value unless one wants to unconditionally ignore a response.

The most interesting parameter is cb, which specifies a callback for any response to this message. The callback is saved with the uniqueID value. When a message arrives at this module with a non-zero refID, the callback procedure is executed, with the header information of the response message passed as parameters. A callback for a specific message takes precedence over the general callback for a message type mentioned above.

There are other functions supported in ddlib. It manages most of the structures passed globally through Vista modules: the messages, their headers, event structures, and a work queue containing events that must be processed at certain points in execution (for example, values to send at each visualization breakpoint). It also traps and handles program exceptions (interrupts) and provides miscellaneous utility functions.

Underlying ddlib is lsocket, the IPC library based on Berkeley sockets. Although we have not tried another protocol, we believe that lsocket could be replaced by some (any) other IPC mechanism.
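The precedence rule between the two kinds of callback can be sketched as below. This is an illustration only, not the ddlib implementation: every name here (set_type_callback, remember_reply_callback, dispatch, the table sizes, and the two example handlers) is hypothetical, and discarding a per-message callback after its first use is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TYPES   64   /* assumed bound on message type codes   */
#define MAX_PENDING 16   /* assumed bound on outstanding requests */

typedef void (*callback_fn)(int msg, int from, int to, int refID);

static callback_fn type_callbacks[MAX_TYPES];     /* per-type defaults */
static struct { int uniqueID; callback_fn cb; } pending[MAX_PENDING];

/* Two example handlers that only count their invocations. */
static int default_hits = 0, specific_hits = 0;
void on_default(int msg, int from, int to, int refID)
{ (void)msg; (void)from; (void)to; (void)refID; default_hits++; }
void on_specific(int msg, int from, int to, int refID)
{ (void)msg; (void)from; (void)to; (void)refID; specific_hits++; }

/* Register the default action for every message of type msg. */
void set_type_callback(int msg, callback_fn fn)
{
    assert(msg >= 0 && msg < MAX_TYPES);
    type_callbacks[msg] = fn;
}

/* Save a callback under the uniqueID of a request just issued.
 * Returns 0 on success, -1 if the table is full. */
int remember_reply_callback(int uniqueID, callback_fn fn)
{
    int i;

    assert(uniqueID != 0);               /* 0 marks a free slot */
    for (i = 0; i < MAX_PENDING; i++) {
        if (pending[i].uniqueID == 0) {
            pending[i].uniqueID = uniqueID;
            pending[i].cb = fn;
            return 0;
        }
    }
    return -1;
}

/* Deliver one message. Returns 2 if a per-message callback ran,
 * 1 if the per-type default ran, 0 if nothing was registered. */
int dispatch(int msg, int from, int to, int refID)
{
    int i;

    if (refID != 0) {
        for (i = 0; i < MAX_PENDING; i++) {
            if (pending[i].uniqueID == refID && pending[i].cb != NULL) {
                callback_fn cb = pending[i].cb;
                pending[i].uniqueID = 0;     /* one-shot: free the slot */
                pending[i].cb = NULL;
                cb(msg, from, to, refID);
                return 2;
            }
        }
    }
    if (msg >= 0 && msg < MAX_TYPES && type_callbacks[msg] != NULL) {
        type_callbacks[msg](msg, from, to, refID);
        return 1;
    }
    return 0;
}
```

Looking up the reference ID first and falling back to the message type second is what gives the per-message callback its priority.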

5.7 Process Control

There are four main modules in Vista. This section describes how each is started and stopped.

The user starts VM in an xterm window on a workstation. There is normally another window open to a remote machine where the application runs. A Vista application resource may have Vista issue a command to start this remote xterm if the user desires.

The VM starts the VDM (Visualization-side Data Manager) using a Unix fork and exec call. Actually, on Berkeley systems, vfork is used, as it consumes fewer resources. The VDM disconnects itself from the controlling terminal, sets its callbacks (Section 5.6), and goes into a polling loop. This loop waits for input messages on any channel (only the VM at this point) or a connection request from a new channel (soon to be the ADM). The VM is also in a polling loop. Each iteration processes outstanding X Window System events and checks for messages on any channel (only VDM).

With the Visualization Manager underway, we look to the application side. The user has compiled and linked the application with the Application Executive (AE) library. In the remote xterm, the user starts the application program. At the first call to an AE routine, the AE starts the ADM (Application-side Data Manager) process, much as VM started VDM. ADM immediately tries to connect to VDM (see below) and, upon successful connection, exchanges a few messages allowing VM to recognize that an application has joined the system. ADM sends an OK message back to AE, then goes into a polling loop similar to VDM's. The AE has no event loop. Its input messages are sampled only when the application calls an AE procedure.

When ADM starts, it needs to know where to find the remote VDM, that is, the host machine it is on. If the environment variable VM is defined in the environment of the application, then its value is the host to connect to. If not, then (in a Unix-specific manner) the file /etc/utmp is scanned for the current session. If it is found, then we can often determine whether this session is an rlogin from another host. If so, then that remote host is used. Finally, we give up and use the local machine. These heuristics handle most of the cases, with the fallback of allowing the user to set the environment variable.
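The order of these heuristics can be summarized in a short sketch. The function names are hypothetical, the /etc/utmp scan is deliberately left out (its result is passed in as rlogin_host, or NULL when no remote login was detected), and the literal "localhost" stands in for whatever name the implementation uses for the local machine.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Pick the host on which to look for the VDM.
 * env_host:    value of the VM environment variable, or NULL.
 * rlogin_host: host found by the (omitted) utmp scan, or NULL. */
const char *choose_vm_host(const char *env_host, const char *rlogin_host)
{
    if (env_host != NULL && strlen(env_host) > 0)
        return env_host;             /* explicit user choice wins  */
    if (rlogin_host != NULL && strlen(rlogin_host) > 0)
        return rlogin_host;          /* guess: where we logged in from */
    return "localhost";              /* give up: use local machine */
}

/* Convenience wrapper that reads the environment variable itself. */
const char *find_vm_host(const char *rlogin_host)
{
    return choose_vm_host(getenv("VM"), rlogin_host);
}
```

Keeping the environment variable as the first test is what lets the user override a wrong guess.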

5.8 Synchronization

All messages are asynchronous, meaning that messages can be sent or arrive at any time. A response to one message may precede a response to an earlier message. All this is handled by the message layer and is transparent to the modules of Vista. The association of a response with its request message is similarly handled.

There are times, however, when the modules must synchronize among themselves at a level above specific messages. This can be handled by passing flags (or tokens) around as messages, and ignoring or saving input until a certain flag is reset or a condition is satisfied.

One place where it is important to synchronize is between the application and the visualization of the application state. An application that executes much faster than the graphics can be generated (or the network can communicate) will cause a backlog of data for images. In the worst case, the application may complete execution long before all the pictures are drawn. This defeats the notion of a graphical window into the application. At the other extreme, if we delay the application until the graphics are drawn, then we may waste valuable time during which application computation could overlap with visualization.

Our current solution is to send the data from AE to VM, and proceed with the application once we know that VM has taken hold of the values. The exact sequencing of messages may appear confusing, but it incurs little overhead and works well. The messages are shown in Figure 6.

    Module  State                         Action
    VDM     RECEIVE(DD_Ready)             if (not autoproceed) SEND(VM, DD_Text)
                                          else SEND(VM, DD_Sync)
    VDM     RECEIVE(reply)                SEND(AE, VM_Proceed)
    ADM     RECEIVE(AVBREAK)              SEND(AV, queued_messages)
                                          SEND(AV, work_queue)
                                          SEND(AV, DD_EOQ)
    ADM     RECEIVE(reply)                SEND(VDM, DD_Ready)
    AE      at visualization breakpoint   SEND(ADM, AV_BREAK)
    AE      RECEIVE(DD_EOQ)               REPLY(ADM, DM_Ack)

Figure 6: Synchronization Messages

5.9 Special Topics

This section mentions several miscellaneous implementation issues not covered elsewhere.

5.9.1 Library Sizes

For use with Vista, the AE can be built three ways:

    sym    With symbol table data structure only
    stab   With symbol table data structure and symbol table scanner
    all    With symbol table data structure, symbol table scanner, and C language interpreter

The size of the resulting library file (libae.a) and the size of a typical small application during execution are provided in Table 1. All numbers are given in bytes.

             Library   Memory
    no ae              5864K
    sym      483K      6168K
    stab     615K      6192K
    all      962K      7568K

Table 1: AE Size and Memory Requirements

5.9.2 Array Shape

Fortran and C compilers allocate array storage in a different order for multi-dimensional arrays. We want this to be transparent to the user of Vista. It does not matter how arrays are actually stored as long as they are accessed correctly. VM will always use C's storage mechanism internally since it dynamically allocates space for arrays. VM needs to keep (in the type descriptor) the source language of each array so it can display the proper information in response to a user request to display a data type.

For some display methods we care about the array's shape. An example is a matrix display that shows each element as a small square colored according to its value. In this case it would disturb the user greatly to see the array shown transposed. In other cases, we care only about the array's data content. For example, when displaying a finite element mesh, a two-dimensional array may be used to provide element connectivity information. In this case, we do not care about shape, but still need to know the base element for the indices in the array. The base element will usually be 1 for Fortran and 0 for C arrays. Vista maintains all this information and handles it all automatically.
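The storage-order difference can be made concrete with a small sketch. The descriptor layout and the names array2d_desc and element_offset are hypothetical and are not taken from Vista's type descriptor; the sketch only shows how source language and base element together determine where element (i, j) lives.

```c
#include <assert.h>

enum source_lang { LANG_C, LANG_FORTRAN };

/* Hypothetical descriptor for a two-dimensional array. */
struct array2d_desc {
    enum source_lang lang;   /* language the array was declared in */
    int lower1, upper1;      /* bounds of the first dimension      */
    int lower2, upper2;      /* bounds of the second dimension     */
};

/* Offset, in elements, of element (i, j) within the application's
 * storage. Fortran stores arrays column by column, C row by row. */
long element_offset(const struct array2d_desc *d, int i, int j)
{
    long n1 = (long)d->upper1 - d->lower1 + 1;   /* extent of dim 1 */
    long n2 = (long)d->upper2 - d->lower2 + 1;   /* extent of dim 2 */
    long r = (long)i - d->lower1;                /* 0-based row     */
    long c = (long)j - d->lower2;                /* 0-based column  */

    assert(r >= 0 && r < n1 && c >= 0 && c < n2);

    if (d->lang == LANG_FORTRAN)
        return r + c * n1;       /* column-major */
    return r * n2 + c;           /* row-major    */
}
```

A display method that cares about shape would use the descriptor to place each value at its logical (i, j) position, while one that cares only about content could read the elements in storage order.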

5.9.3 Heterogeneous Data

This is designed, but unimplemented. The two data managers will negotiate the specific form of the data to be transferred between host computers when they first join up. For example, suppose Vista's Visualization Manager is running on a Sun workstation with IEEE floating point data. An application running on a remote Cray X-MP has a different floating point representation. The VDM will propose a set of several types of data it can understand. For floating point, this may be IEEE binary 32-bit, IEEE binary 64-bit, and ASCII-formatted numbers. The Cray's ADM will match this set against the formats it can support, through native data types or conversion routines. It may reply with any of the proposed formats. This information is then propagated up to the AE. Any actual conversion will occur in the AE at data access time. This saves additional data movement in later steps. The implementation will be by a set of pointers to functions in the AE.

The negotiation between DMs will actually contain several sets: one for each type supported by the VM. This will initially include integer, single precision floating point, double precision floating point, complex, and logical data. The DMs may be unable to agree on a format for some data type. If this happens, ASCII may be used, or they may just agree not to transfer data of this type.

Non-program data is also passed. This is all in integer or byte format. We use network byte order throughout the system.
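Since this negotiation is designed but not implemented, the following is only a sketch of one way the matching step could look. The enumeration, the function negotiate, and the policy of taking the first proposed format that the other side supports are all assumptions; the design above only requires that the reply be one of the proposed formats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format codes for floating point data. */
enum float_format { FMT_NONE = 0, FMT_IEEE32, FMT_IEEE64, FMT_ASCII };

/* Return the first format in proposed[] that also appears in
 * supported[], or FMT_NONE if the two sets have nothing in common. */
enum float_format negotiate(const enum float_format *proposed,
                            size_t n_proposed,
                            const enum float_format *supported,
                            size_t n_supported)
{
    size_t i, j;

    assert(proposed != NULL && supported != NULL);

    for (i = 0; i < n_proposed; i++)
        for (j = 0; j < n_supported; j++)
            if (proposed[i] == supported[j])
                return proposed[i];
    return FMT_NONE;
}
```

One such call would be made per data type. A result of FMT_NONE corresponds to the case where the data managers cannot agree and either fall back to ASCII or skip data of that type.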

5.9.4 Parallel Programming

Vista has been tested only on sequential machines. However, it was designed for small to medium scale parallelism. At the coarsest level the application, Data Managers, and Visualization Manager all execute in parallel. The VDM can accept multiple connections from many AE/ADM pairs on one or more machines, as shown in Figure 7. Also, the AE was designed so a single copy per process would be usable on a shared memory system.

One of the biggest hurdles in implementation is designing a user interface that can display data coming in from several processes simultaneously and clearly identify each source. While our current mechanism can be used with two, or even four processes, it does not scale well beyond that.

[Figure 7 diagram: three hosts (Host 1, Host 2, Host 3), each with an Application Source, Application Object, Application Executive, and Data Manager, connect through the Network / Communication Interface to a workstation running the Workstation Data Manager, the Visualization Manager (with defaults resources), and several Display Tools.]

Figure 7: Vista with a Parallel or Distributed Application

5.9.5 Performance Impact

The architecture of our system reflects decisions to maintain a high level of performance. The visualization tasks are all performed in remote processes, freeing the application from all but data access (and format conversion) tasks.

All communication between tasks is through asynchronous message passing. No process is required to block or wait for a reply in general. However, a process may decide to block on a certain event if it makes no sense to continue until that event occurs. The networking routines may block if the system buffers cannot accept the data being transferred.

The application only has the opportunity to process messages at visualization breakpoints. All other modules process messages frequently as a part of their main event loop. The application is frequently executing during user think-time and selection time. This overlap achieves our coarsest level of parallelism.

The VM ensures that only needed data is requested. When a display is removed, the VM tells the AE (actually, the ADM) to no longer transfer any data items that are not needed by other display methods.

All data is processed in its binary form. As we have discussed, these values may be manipulated by the AE to convert them to an acceptable format for the VM.

Vista is designed to minimize performance impact at breakpoints. Data access does take some time, and this cannot be removed. When no data is accessed, the call overhead to the breakpoint is minimal, but may impact compiler-detected parallelism in the program if, for example, the breakpoint call is within a loop. In some compilers, generating a compiler symbol table automatically disables optimization. Even where this is not true, we run the risk of variables being maintained in registers or optimized out of existence.

In the long term, the performance issues open up several good research questions, and Vista is an appropriate mechanism to use to investigate them. We need to study the effect of network bandwidth limitations on such a visualization system. An interesting study is of the tradeoffs of executing graphics algorithms on the host versus a graphics workstation. The parameters in such a study would include processor speeds, supplemental hardware, data amplification or reduction, network bandwidth, data access times, and parallelism. The application also has a kind of global knowledge about the data that gives it better or more efficient access to complex data structures than may be possible on a remote system.

5.10 Porting Vista to Other Computer Systems

Vista was developed using a Unix platform. Where possible, we adhered to standards or de facto standards to support portability. In this section we describe the main issues that would be involved in a port of Vista to a new machine, operating system, or architecture.

5.10.1 Machine-Dependencies in Modules

Vista has four main modules: the Visualization Manager (VM), Visualization Data Manager (VDM), Application Data Manager (ADM), and Application Executive (AE). All use the same interprocess communications (IPC) support. The IPC issues are described in the next section. These are described in some detail in the Vista technical report. All modules are standard (K&R) C. The main machine dependencies of each are:

VM
    The VM provides the graphical user interface. It uses OSF's Motif and X11R4 to provide these functions. Although it is possible to convert to a non-Motif platform, this would amount to a virtual rewrite of the VM.

    Support routines start the VDM process. They use vfork, a BSD version of fork, for efficiency. A standard fork is also possible, and the VDM could also be started by any other means.

VDM
    There are very few machine dependencies in this process.

ADM
    The ADM should have the ability to interrupt the application process. This is not essential, but allowed in our implementation. As part of the initialization sequence, the AE sends its process identification (PID) to the ADM.

AE
    This is really a subroutine library that is linked with the application. The AE provides the visualization breakpoints and tremendous underlying support. It is also the most machine-dependent module.

    Support routines start the ADM process. They use vfork, a BSD version of fork, for efficiency. A standard fork is also possible, and the ADM could also be started by any other means.

    The data defining routines for Fortran depend on the types available in the machine's language implementation.

    The Fortran run-time library is assumed to save away the name of the program being executed.

    The AE tries to connect to the VM by first examining a shell variable. If the variable is set, its value is used as the remote machine name. If not, then a file of currently active sessions (/etc/utmp on Unix) is searched. We find the current session and see if it looks like a remote login. If so, we use the machine we did the remote login from as the target (in many cases this is a very reasonable guess). This algorithm depends both on the existence of shell variables and on some "user file" with the appropriate information. In a pinch, you could just prompt for the remote machine.

    The symbol table scanner depends on the a.out file (in Unix), or in general, the symbol table format for the specific machine.

    The C interpreter is not a required component of the AE, but is included in the distributed version. It can be very machine-dependent, for example in passing parameters to subroutines.

5.10.2 Networking and Data Representation

Data representation is currently implemented as IEEE floating point and integers with the high-order byte at the left (as on the Sun Sparc and Motorola 68000 architectures). Actually, any consistent format should work. However, representations that differ in any way between the application host and visualization host will not work. Facilities for different representations could be added (we have a good design for some future release).

The networking library depends heavily on BSD sockets as implemented in Berkeley Unix 4.2 or later. An old dependency was left in this code by accident: C `ASM' directives are used to generate a few no-ops in a delay loop. These can just be removed, or some other short delay added. This library has been run successfully on other machine architectures (although in one case it revealed a bug in the socket implementation on the new machine).

5.10.3 Interlanguage Communication

The Fortran-C interface makes certain assumptions about linkage between the two languages. We have tried to make this as portable as possible at the expense of some awkward looking manipulations.

For example, character data such as variable names has an explicit (declared) length in Fortran, but an implicit length determined by a special (NULL) terminating character in C. Fortran compilers need to resort to hidden conventions to pass strings even to other Fortran subroutines so that the string length can be known. Some compilers add extra parameters somewhere in the parameter list (beginning, end, or with each character parameter). Others use registers to point at a special information vector. We chose to pass Fortran character string parameters to C only via COMMON blocks. A special routine is then used to NULL-terminate the strings before C processes them.

All user-callable Fortran routines are indeed implemented in Fortran. This moves the issue of Fortran-C compatibility to a level below the user. Underlying routines soon transfer control to C. This may require work by the party doing the porting and installation, but should isolate the user from any changes. The portability issues then are:

1. Names of Fortran COMMON blocks to be accessed by C.

2. Case of COMMON block and subroutine names for Fortran/C linkage.

3. We use the (at least BSD) Unix Fortran library routine CALL GETARG (0, string) to access argv[0], the name of the executing program. If this is not available, a similar method should be used if possible. If not, then the Fortran user should use AVPROG instead of AVINIT to specify the program name.
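The string conversion mentioned in this section can be sketched as follows. This is not the routine used in Vista: the name fortran_to_c_string and its interface are hypothetical, and stripping the trailing blanks that pad a Fortran CHARACTER variable is an assumption about what such a routine would want to do.

```c
#include <assert.h>
#include <string.h>

/* Copy a fixed-length, blank-padded Fortran string of src_len
 * characters into dst as a terminated C string, dropping the
 * trailing blanks. Returns the resulting length, or -1 if dst
 * (of dst_size bytes) is too small. */
int fortran_to_c_string(const char *src, int src_len,
                        char *dst, int dst_size)
{
    assert(src != NULL && dst != NULL && src_len >= 0);

    while (src_len > 0 && src[src_len - 1] == ' ')
        src_len--;                       /* strip blank padding */

    if (src_len + 1 > dst_size)
        return -1;                       /* no room for terminator */

    memcpy(dst, src, (size_t)src_len);
    dst[src_len] = '\0';
    return src_len;
}
```

Because the declared length is passed explicitly, the routine does not depend on any compiler's hidden string-length convention.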

6 Using Vista

This section groups together topics in the use of Vista. Some are described in more detail in the user's manual.

6.1 Debugging

For debugging, it is useful to see all message traffic that is sent between processes in Vista. There is an environment variable that can be set to show this without recompilation of any module:

    setenv VISTA_DEBUG string

where string can have any number of the following tokens:

    VM  VMDM  AEDM  AVDM  AV  AE  MSG  ALL

The tokens must be upper case. The AV* and AE* ones are synonymous. To see, for example, messages going through ae_dm (the application-side Data Manager), use either of the following:

    setenv VISTA_DEBUG AEDM
    setenv VISTA_DEBUG ALL

ALL means all debugging in all modules. MSG means message debugging in all modules, and for now ALL is identical to MSG. To debug many modules, you can use ALL or several tokens:

    setenv VISTA_DEBUG "VM AVDM AE,VMDM"

The strings can be separated by any white space or punctuation. The message output for each module is written to its error file. For the Visualization Manager and Application Executive, this is the file stderr of the process run by the user. For the data managers, this is the file set by the environment variable DM_FILE.
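The token rules above (upper case only, any white space or punctuation as separator, AV* and AE* synonymous, ALL identical to MSG) can be sketched as a small parser. The bit assignments and the name parse_debug_tokens are hypothetical and are not taken from the Vista source; ignoring unrecognized tokens is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical debug flags, one bit per module. */
enum {
    DBG_VM   = 1,
    DBG_VMDM = 2,
    DBG_AEDM = 4,
    DBG_AE   = 8,
    DBG_EVERYTHING = DBG_VM | DBG_VMDM | DBG_AEDM | DBG_AE
};

/* Turn a debug string into a set of flags. Tokens are maximal runs
 * of letters and digits; anything else separates them. */
unsigned parse_debug_tokens(const char *s)
{
    static const struct { const char *name; unsigned bits; } table[] = {
        { "VM",   DBG_VM   }, { "VMDM", DBG_VMDM },
        { "AEDM", DBG_AEDM }, { "AVDM", DBG_AEDM },   /* synonyms */
        { "AE",   DBG_AE   }, { "AV",   DBG_AE   },   /* synonyms */
        { "MSG",  DBG_EVERYTHING }, { "ALL", DBG_EVERYTHING }
    };
    unsigned mask = 0;
    size_t i, len;

    assert(s != NULL);

    while (*s != '\0') {
        while (*s != '\0' && !isalnum((unsigned char)*s))
            s++;                                  /* skip separators */
        len = 0;
        while (isalnum((unsigned char)s[len]))
            len++;                                /* measure token   */
        for (i = 0; i < sizeof table / sizeof table[0]; i++)
            if (strlen(table[i].name) == len &&
                strncmp(s, table[i].name, len) == 0)
                mask |= table[i].bits;            /* case-sensitive  */
        s += len;
    }
    return mask;
}
```

Each module would then test its own bit before writing a message trace to its error file.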

6.2 Application Routines

This section presents the calls used by the application developer to instrument the application program.

6.2.1 Fortran Program Calls

This section describes the Fortran-callable "av*" routines: those routines that may be called by the application to interact with the Application Executive. There are routines that help control the Application Executive, and those that add names into the symbol table. In using Vista, the only routine that must be used is avbreak. The others are optional and used either for convenience or in exceptional circumstances. The control routines are:

avbreak()
    This is the only required av routine. It should be called to indicate that the application is at a visualization breakpoint, and any graphics displays will be updated. The user is provided the option to interact with the application while it is halted. If this is the first av routine called, the Application Executive will be initialized.

avinit()
    Optional. Causes immediate initialization, making the connection to the remote Visualization Manager and scanning the application symbol table if possible. This routine is normally not used, but called implicitly when other av routines are called.

avprog(exec-file-name)
    Optional. A Fortran application is unlikely to use this routine. If the Application Executive cannot determine the name of the file being executed (for example, a.out), the user may call this routine to provide the file name. avprog may be used instead of avinit.

The remainder of the routines allow the application to force symbols to be added to the symbol table and immediately transmitted to the Visualization Manager. It is not necessary to enter symbols in this way, as much of the time Vista can read similar information from the symbol table of the program file. These then become convenience routines. You may want to use these routines to have certain variables always available in the Visualization Manager without having to enter them manually (described in a later section).

The first two parameters in each of the following routines are the same and are described here.

    name:  A character string (or variable) giving the name of the variable you are defining.
    addr:  The address of the variable, which is often supplied by just using the name of the variable for this parameter.

The name definition routines listed below have incomplete names, designated by an xx at the end for missing characters. They are completed by supplying the data type of the variable name being defined. The following one, two, or three character abbreviations are used to suffix the routines described below:

    c     complex
    dc    double complex
    dp    double precision
    i     integer
    l     logical
    r     real
    v     (for functions which do not return a value)

Many machine- or compiler-dependent non-standard types are supported also. The following suffixes are used only if supported by a particular Fortran compiler:

    b     1-byte integer
    c8    complex*8
    c16   complex*16
    i2    integer*2
    i4    integer*4
    l1    logical*1
    l2    logical*2
    l4    logical*4
    r4    real*4
    r8    real*8

av0xx(name, addr)
    Identify a scalar variable to Vista and send it to the Visualization Manager. The trailing characters of the subroutine name define the variable's type as described above. For example, av0r is used to insert a real (single-precision floating point) scalar variable name.

av1xx(name, addr, lower, upper)
    Identify a one-dimensional array to Vista and send it to the Visualization Manager. The integer parameters lower and upper specify the dimensioned bounds of the array. The trailing characters of the subroutine name define the variable's type as described above. For example, av1i is used to insert an integer one-dimensional array name.

av2xx(name, addr, lower1, upper1, lower2, upper2)
    Identify a two-dimensional array to Vista and send it to the Visualization Manager. The integer parameters lower1, upper1, lower2, and upper2 specify the exact bounds of the array as dimensioned in the program for the first and second dimensions, respectively. The trailing characters of the subroutine name define the variable's type as described above. For example, av2dp is used to insert a double precision two-dimensional array name.

avfxx(name, addr)
    N.B. This is primarily for future use; it is not useful now. Identify a function name to Vista and send it to the Visualization Manager. The trailing characters of the subroutine name define the function's type as described above. For example, avfi is used to insert an integer function name.

6.2.2 C Program Calls

This section describes the C-callable "av*" routines: those routines that may be called by the application to interact with the Application Executive. There are routines that help control the Application Executive, and those that add names into the symbol table. In using Vista, the only routine that must be used is avbreak. The others are optional and used for convenience. It is useful to use the routine avprog if the symbol table is to be scanned. The control routines are:

avbreak()
    This is the only required av routine. It should be called to indicate that the application is at a visualization breakpoint, and any graphics displays will be updated. The user is provided the option to interact with the application while it is halted. If this is the first av routine called, the Application Executive will be initialized.

avinit()
    Optional. Causes immediate initialization, making the connection to the remote Visualization Manager. This routine is normally not used, but called implicitly when other av routines are called.

avprog(exec-file-name)
    Optional. A C application should use this routine to pass the file name of the executing application to the Application Executive. Most of the time, the best choice of parameter is argv[0]. Use avprog instead of avinit to cause the symbol table to be scanned.

The remainder of the routines allow the application to force symbols to be added to the symbol table and immediately transmitted to the Visualization Manager. It is not necessary to enter symbols in this way, as much of the time Vista can read similar information from the symbol table of the program file. These then become convenience routines. You may want to use these routines to have certain variables always available in the Visualization Manager without having to enter them manually (described in a later section).

The first two parameters in each of the following routines are the same and are described here.

    name:  A character string giving the name of the variable you are defining.
    addr:  The address of the variable.

The name definition routines listed below have incomplete names, designated by an xx at the end for missing characters. Valid combinations of the following one character abbreviations supply the data type of the variable name being defined and are used to complete each procedure name:

    c    char
    d    double
    f    float
    g    signed
    i    int
    l    long
    s    short
    u    unsigned
    v    void (functions only)

When more than one type name is used, as in short unsigned int, the short or long specification appears first, then the signed or unsigned specification, followed by char, double, float, int, or void. There is an insert routine for all legal C type combinations:

    c    d    f    g    gc   gi   gl   gli  gs   gsi
    i    l    ld   li   s    si   u    uc   ui   ul
    uli  us   usi

av0xx(name, addr)
    Identify a scalar variable to Vista and send it to the Visualization Manager. The trailing characters of the subroutine name define the variable's type as described above. For example, av0f is used to insert a float (floating point) variable name.

av1xx(name, addr, lower, upper)
    Identify a one-dimensional array to Vista and send it to the Visualization Manager. The integer parameter size provides the length of the array. The trailing characters of the subroutine name define the variable's type as described above. For example, av1i is used to insert an int (integer) one-dimensional array name.

av2xx(name, addr, lower1, upper1, lower2, upper2)
    Identify a two-dimensional array to Vista and send it to the Visualization Manager. The integer parameters size1 and size2 specify the exact size of the array as declared in the program for the first and second dimensions, respectively. The trailing characters of the subroutine name define the variable's type as described above. For example, av2d is used to insert a double two-dimensional array name.

avfxx(name, addr)
    N.B. This is primarily for future use; it is not useful now. Identify a function name to Vista and send it to the Visualization Manager. The trailing characters of the subroutine name define the function's type as described above. For example, avfi is used to insert an int function name.
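To show how these calls fit together, here is a minimal sketch of an instrumented C routine. The three av routines are replaced by stand-in stubs so the sketch compiles without the AE library; their prototypes are assumptions inferred from the descriptions above, and the function relax and its variable residual are invented for the illustration.

```c
#include <assert.h>

/* Stand-in stubs: in a real application these come from the AE
 * library (libae.a). The prototypes are assumed, not documented. */
static int breakpoints_hit = 0;
void avprog(const char *exec_file_name) { (void)exec_file_name; }
void av0f(const char *name, float *addr) { (void)name; (void)addr; }
void avbreak(void) { breakpoints_hit++; }

/* A toy computation with one visualization breakpoint per step.
 * prog_name would normally be argv[0]. */
float relax(const char *prog_name, int steps)
{
    float residual = 1.0f;
    int step;

    assert(steps >= 0);

    avprog(prog_name);              /* lets the AE scan the symbol table */
    av0f("residual", &residual);    /* optional: always offer this name  */

    for (step = 0; step < steps; step++) {
        residual *= 0.5f;           /* stand-in for real work            */
        avbreak();                  /* displays may update here          */
    }
    return residual;
}
```

Only the avbreak call is required; the other two calls are the optional conveniences described above.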


7 Project Status

At the start of this year we had made a version of Vista that could be installed on any Sun Sparc platform. In fact, this version has been available on CSRD systems to any user since January. We have a reasonably robust implementation with respect to system issues. One of the hardest parts in packaging such a distributed system is hiding from the user any notion of network errors and interprocess communication. Cooperating processes must carefully recognize non-standard conditions in their partners, then try to remedy the problem or gracefully terminate with a meaningful message. We feel that we have accomplished this.

This version is packaged for distribution, again only for Sun Sparc systems. We have a user's manual that can be sent out accompanying the tape. Professor Michael Berry has used Vista at the University of Alabama, Birmingham in his linear algebra research in a manner similar to his work in [TB90].

The weakest part of our system, and one that prevents its use by more users, is the graphics methods themselves. More work was put into the system aspects of Vista than into the graphics. For example, we do not have a good X-Y plot display, 2-dimensional contours, or mesh surface display. Less critical, but still important, is that we can only get display hardcopy by using Unix utilities to dump windows of the X Window System and then converting the dumps to PostScript.

Use of Vista is also hindered because it has not yet been fully ported to another computer system. Our in-house systems include Alliant FX/2800 and FX/80 mini-supercomputers. We have access to a Cray almost next door. Both of these systems were targeted, but the ports were not completed.

Vista cannot handle variable scope yet, or even dump different variables at different visualization breakpoints. All variables are dumped at all visualization breakpoints.

These are just some of the areas we need to address to make Vista a successful and, more importantly, used program. We have a good, stable base and one which is on the verge of being useful. A list of all of our outstanding work to be done on the system is included in Section 8 describing future work.

8 Future Work

Vista development should be a continuing effort. In order to develop a system which is truly practical, we have released Vista to our own department. We have a list of immediate requirements that will affect additional acceptance and porting of the system to other sites.


• The application executive is our most machine-dependent module. It could take quite a bit of effort to port this program to a host which does not support the Unix system. The symbol table scanner is quite sensitive to the host environment. We are trying to isolate as many machine dependencies as we can.

• Many of our graphics tools are widgets based on the X toolkit intrinsics or Motif widget set. These are only sufficient to demonstrate the utility of Vista. In the context of our current VM we need more graphical techniques and more robust ones.

• We have a specification for a vis-defaults file which provides the default names to access in the application and the dimension of arrays, including variable dimension and non-unity stride. This development is underway.
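To make the notion of array dimension and non-unity stride concrete, the following is a minimal sketch of reading a strided slice out of a flat array, such as one column of a row-major matrix. The array_view structure and copy_view function are hypothetical names chosen for illustration; they do not describe the vis-defaults file format or any Vista interface.

```c
#include <stddef.h>

/* Hypothetical descriptor of a strided slice; not Vista's actual format. */
struct array_view {
    size_t offset;  /* index of the first element in the flat array */
    size_t stride;  /* distance between consecutive selected elements */
    size_t count;   /* number of elements selected */
};

/*
 * Copies the elements selected by v from base into out.
 * At most outlen elements are written; the number copied is returned.
 * The caller must ensure the view stays within the bounds of base.
 */
size_t copy_view(const double *base, struct array_view v,
                 double *out, size_t outlen)
{
    size_t n = v.count < outlen ? v.count : outlen;
    for (size_t i = 0; i < n; i++)
        out[i] = base[v.offset + i * v.stride];
    return n;
}
```

With a 3 by 4 row-major matrix, a view with offset 1, stride 4, and count 3 selects the second column. A unit stride with offset 4 and count 4 selects the second row.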

Next we will extend Vista to its full design specification. Multiple AE/ADMs can connect to a single VM/VDM. The AEs may be from different machines or from many clusters of the same machine (Cedar). The playback file will be implemented, to see how well it will work as a session recorder. Also, we will document how one could write such a file directly when the application must execute with no AE at all.

Planned short-term improvements to the Visualization Manager include a method for the application user to add new visualization functions to Vista without changing and recompiling the VM program. A user should be able to create a new program that represents a new Display Method; VM will define an inter-process communication interface for such processes. Also, data access needs to be improved. The user should be able to easily view partial data structures. Arrays are the most obvious case. It may be most beneficial to view just some cross section of a specified array. This will conclude the initial Vista development.

Our next work will look at the system with a view toward both application and research. For the Visualization Manager, we will interface to existing visualization systems and toolkits, several of which were mentioned in Section 2, and leverage the VM off of this body of software. On the application side, we must consider identifier scope, dynamically allocated storage, and parallel applications. Finally, we will be able to study load balancing in visualization software, splitting work between the host and workstation.

We include here our own work list of projects yet to be implemented in the current and future versions of Vista. This will provide the reader with some idea of our direction. This list is divided into two sections. Neither list is in any particular order. First are the short term plans. These include both short (several day) and longer (several week) projects, but we know how to do them and they will only affect locally well-defined parts of Vista. Some are truly necessary for system usability; others provide better functionality. We then list long term plans. These include continuing or new research directions based on Vista technology.


Short term plans:

1. Handle different binary data types for heterogeneous machines, for example, floating point representation, precision, etc. We have a general solution to the problem for Vista. This is discussed in more detail in Section 5.9.3.
2. In several places Vista may be passing byte order-dependent data between VM/AE as short or long integers. We must confirm that we are using network byte order everywhere.
3. Get Alliant FX/2800 network bug fixed. Sharma was supposed to do this.
4. Port to Alliant FX/2800 host (depends on previous items).
5. Port to Cray after FX/2800. If we did our job right, this should not be a major effort. Several people here and in other places would love a Cray implementation, including Cray Research and the Air Force (Wright-Patterson).
6. Add a new function to get a list of all routine names from AE to VM. A simple scrolling list will display these to the user.
7. With the previous function, click a name and have all its variables presented in another scrolled list.
8. Cosmetic things, such as highlighting the PROCEED button when it is appropriate to press it.
9. For maintenance and distribution, reorganize the include files.
10. Add adjustable length arrays. This is needed in general to handle even partially-filled arrays.
11. Build composed types for multiple variables to one method. Allow the user to specify, and save for future use, a combination of variables that together determine a "piece of data". For example, a "super-type" describing a FE mesh would be composed of 9 simple variables (in our example).
12. Configuration (or startup) file for composed types, default associations, displays, and adjustable length arrays. This is for restarting sessions, also to establish a known state, or to quickly get to a desired state.
13. Data Managers: ADM, add priority queue, "batch of requests". The mechanism is already implemented, just unused as yet.
14. Control of external display methods. This is an important component of Vista. It allows user-provided modules without recompilation of Vista, so new methods can be easily added. Partially implemented; we still need to fully define protocol and IPC conventions.
15. Hook up 3-d viewer (depends on external methods). We have this viewer already. It allows specification of a 3-d viewing transformation graphically. It can be tied into Vista integrally, but we prefer to hook it up using the external method protocol when available.
16. Add new display methods (depends on external methods).
17. Add proper X-Y graph method. This would be an internal method.
18. Add hardcopy mechanism to all displays. The default would be a window dump for all new and existing methods, but more intelligent PostScript can still be generated on a per-method basis.
19. Loop in VM is CPU intensive. Fix this with interrupts instead of polling, if possible.
20. Have distribution use GNU make.
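Item 2 above concerns byte order-dependent integers crossing the network. As a minimal sketch of one way to make the wire format independent of host byte order, the following writes and reads a 32-bit unsigned integer in network (big-endian) byte order using explicit shifts. The function names put_u32_be and get_u32_be are hypothetical, and this is not the Vista network layer; POSIX htonl and ntohl would be a common alternative.

```c
#include <stdint.h>

/* Stores v into buf[0..3] in network (big-endian) byte order. */
void put_u32_be(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Reads a 32-bit value stored in network byte order from buf[0..3]. */
uint32_t get_u32_be(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}
```

Because the shifts operate on values rather than on memory layout, the same code produces the same bytes on big-endian and little-endian hosts, so no per-machine conditional code is needed.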

Long term plans:

1. Address variable scope issues. Probably use SIGMA as source preprocessor. SIGMA is from Prof. Dennis Gannon at Indiana University, Bloomington, Indiana.
2. Distributed or parallel application; allow VDM to accept multiple connections from multiple applications or, more likely, from multiple instances of an application running in a distributed manner.
3. Implement playback file. The potential uses are:
   • explore data files with the same interface as interactive applications;
   • a session recorder to replay a session;
   • allow a batch program to have "interactive" graphics, although this is really postprocessing;
   • script user interface commands: trap during an interactive session, then replay.
4. Disconnect/reconnect of VM from application. This allows a long running job to free up its workstation or X terminal for a while, then reconnect to VM later. This may not be too difficult.
5. We need robust widgets, and more of them.
6. One way to accomplish the previous item is to interface to AVS or a similar system, as we have forecast in our papers.
7. CSRD application personnel have asked for sparse matrix display methods.
8. Rewrite a minimal AE with just manual insertion and data access. AE as it stands is very large and complex.
9. Temporal data snapshots. Save data at one time step for display or comparison at a later one.
10. Data modification. Changing data values through current display methods or a new one. This is a move in one or more of three directions: debugging, interactive control, and application steering.
11. Program instrumentation environment to facilitate application source modifications.
12. Integrate David Jablonowski's graph tool with Vista for program visualization and execution tracking. Depends on some source instrumentation also.
13. Display of arbitrary data structures. The purpose of Vista is to show data, not data structures, but sometimes the structure is the data.
14. Add named visualization breakpoints. More than just names, these allow the user to choose the data to show at each vispoint.
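Item 9 above, temporal data snapshots, can be sketched as copying an array at one time step and comparing it against the live data later. This is only an illustration under our own assumptions: the snapshot structure and the snapshot_take, snapshot_max_abs_diff, and snapshot_free functions are hypothetical and do not exist in Vista, and the comparison shown (largest absolute difference) is just one possible choice.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of an array's contents at one time step. */
struct snapshot {
    int     step;    /* time step at which the copy was made */
    size_t  count;   /* number of saved elements */
    double *values;  /* private copy of the data */
};

/* Copies count elements of data; returns NULL if allocation fails. */
struct snapshot *snapshot_take(int step, const double *data, size_t count)
{
    struct snapshot *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->values = malloc(count * sizeof *s->values);
    if (s->values == NULL && count > 0) {
        free(s);
        return NULL;
    }
    if (count > 0)
        memcpy(s->values, data, count * sizeof *s->values);
    s->step = step;
    s->count = count;
    return s;
}

/* Largest absolute element-wise difference over the common length. */
double snapshot_max_abs_diff(const struct snapshot *s,
                             const double *data, size_t count)
{
    size_t n = s->count < count ? s->count : count;
    double max = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = data[i] - s->values[i];
        if (d < 0.0)
            d = -d;
        if (d > max)
            max = d;
    }
    return max;
}

/* Releases a snapshot; passing NULL is allowed. */
void snapshot_free(struct snapshot *s)
{
    if (s != NULL) {
        free(s->values);
        free(s);
    }
}
```

Because the snapshot owns a private copy, the application can keep overwriting its own array between time steps without disturbing the saved values.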

9 Conclusion

We have presented Vista, an architecture for remote data visualization, along with its initial implementation. We have met our goals by providing a mostly point-and-click (object-action) graphical user interface, along with minimal program instrumentation, to access any program data items on demand and display them using dynamically-chosen graphics methods during execution of the application. Finally, the system is modular to facilitate development and provides some tailored modules for different environments, machines, and applications.

We have a few components of our initial design left to implement and several challenging problems left to solve. The weakest areas of our current implementation are the graphics display methods themselves. Since we assume that good rendering software is available, commercially or otherwise, we put little effort into this large development effort. Our small selection of graphics display methods has been assembled in an ad hoc fashion from many sources. As such, they vary greatly in level of robustness.

This implementation of Vista is relatively complete in itself. From this point we will work in several divergent directions including parallel remote graphical debugging, access to and selection of more complex data structures, and visualization of distributed applications.

Our intent was to create a portable remote visualization environment without ties to any particular operating system, language, or graphics support library. The current implementation of Vista is built on existing standards, requiring sockets as implemented in Berkeley Unix version 4.2 or later. With an appropriate IPC mechanism, much of Vista could be ported to any operating system and network environment. The visualization component of our implementation uses the X Window System and OSF's Motif for the graphical user interface and the graphics displays.

Acknowledgements

The Vista implementation included the work of several people besides the authors. Sanjay Sharma implemented the network layer using sockets as described and supported by Berkeley Unix 4.2. Brian Bliss developed the ambitious C language interpreter we use with the Application Executive. His interpreter also scans the object file symbol tables. Allen Malony participated in many of the earlier discussions on the system design, and the authors thank him for his comments. This work was partially supported by the Air Force Office of Scientific Research Grant AFOSR-90-0044.


Appendix: Vista man page entry


References

[Bro88] Marc H. Brown. Exploring algorithms using Balsa-II. Computer, 21(5):14-36, May 1988.

[Dye90] D. Scott Dyer. A dataflow toolkit for visualization. IEEE Computer Graphics and Applications, 10(4):60-69, July 1990.

[HHR89] Esa Helttula, Aulikki Hyrskykari, and Kari-Jouko Räihä. Graphical specification of algorithm animations with Aladdin. In Proceedings of the Twenty-Second Annual Hawaii International Conference on System Science, pages 892-901, Kailua-Kona, Hawaii, January 1989. IEEE.

[JHH+88] W. E. Johnston, D. E. Hall, J. Huang, M. Rible, and D. Robertson. Distributed scientific video movie making. In Proceedings of Supercomputing '88, pages 156-161, Orlando, Florida, November 14-18 1988.

[JT91] David Jablonowski and Allan Tuchman. Vista users manual. Technical Report 1068, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, May 1991.

[NT89] Henry Neeman and Allan Tuchman. Simulation time animation system. Technical Report 859, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, February 1989.

[TB90] Allan M. Tuchman and Michael W. Berry. Matrix visualization in the design of numerical algorithms. ORSA Journal on Computing, 2(1):84-92, 1990.

[TJC91a] Allan M. Tuchman, David Jablonowski, and George Cybenko. Run-time visualization of program data. In Proceedings of Visualization '91, San Diego, CA, October 1991. To appear.

[TJC+91b] Allan M. Tuchman, David Jablonowski, George Cybenko, Brian Bliss, and Sanjay Sharma. Vista: A system for remote data visualization. In Proceedings of the SIAM 5th Conference on Parallel Computing, Houston, TX, March 1991.

[UTK+89] Craig Upson, T. Faulhaber, Jr., David Kamins, David Laidlaw, David Schlegel, Jeffrey Vroom, Robert Gurwitz, and Andries van Dam. The application visualization system: A computational environment for scientific visualization. IEEE Computer Graphics and Applications, 9(4):30-42, July 1989.
