Identifying and Reducing Critical Lag in Finite Element Simulation

Valerie E. Taylor, Jian Chen
EECS Department, Northwestern University

Milana Huang
Electronic Visualization Laboratory, University of Illinois at Chicago

Thomas Canfield, Rick Stevens
Mathematics and Computer Science Division, Argonne National Laboratory

IEEE Computer Graphics and Applications, July 1996.


Abstract

Interactive, immersive virtual environments allow observers to move freely about computer-generated 3D objects and to explore new environments. The effectiveness of these environments depends on the graphics used to model reality and on the end-to-end lag time (i.e., the delay between a user's action and the display of the result of that action). In this paper we focus on the latter issue, which has been found to be as important as frame rate for interactive displays. In particular, we analyze the components of lag time resulting from executing a finite element simulation on a multiprocessor system located in Argonne, Illinois, connected via ATM to an interactive visualization display located in San Diego, California. The primary application involves the analysis of a disk brake system that was demonstrated at the Supercomputing 1995 conference as part of the Information Wide Area Year (I-WAY) project, which entailed the interconnection of various supercomputing centers via a high-bandwidth, limited-access ATM network. The results of the study indicate that the critical component of lag is the simulation time. We discuss the use of a network of supercomputers for reducing this time.


1 Introduction

Interactive, immersive visualization allows observers to move freely about computer-generated, three-dimensional objects and to explore new environments. A critical issue facing interactive virtual environments is the end-to-end lag time (i.e., the delay between a user action and the display of the result of that action). As in any closed-loop adaptive system, if the lag is too great, users find it difficult to maintain fine control over system behavior and complain that the system is unresponsive. Indeed, Liu et al. [1] found lag time to be as important as the frame rate for effective use of immersive displays. Researchers have studied lag in the context of teleoperated machines, head-mounted displays, and telepresence systems [1, 2]. Liu et al. [1] conducted experiments on a telemanipulation system and found the allowable lag time to be 100 ms (0.1 s) and 1000 ms (1 s) for inexperienced and experienced users, respectively. In [3], the work on lag models was extended to include scientific simulations with interactive, immersive environments. That work, however, focused on simulations executed on one processor; multiple processors for the simulations were not used or analyzed in the lag model. Further, all of the computer components were located at one site, interconnected via a local area network.

In this paper we identify the critical components of lag resulting from simulations executed on multiprocessors connected to the virtual environment via wide area networks. In addition, we discuss methods to reduce this lag. With the emergence of high-speed networks and distributed computing resources, the frequency of remote access and distributed collaboration is rising rapidly. We have conducted an extensive case study of a visualization system used to display the results of a finite element simulation of a contact-impact problem, in particular, a disk brake system. This application was demonstrated at the Supercomputing 1995 conference as part of the Global Information Infrastructure Testbed for the Information Wide Area Year (I-WAY) project. The display system consisted of an ImmersaDesk(TM), essentially one wall of a CAVE(TM) (Cave Automatic Virtual Environment) [4]. The simulation was executed on the IBM SP located at Argonne National Laboratory; the display system was located at the San Diego Convention Center. The IBM SP was connected to the display system via an ATM OC-3c network as part of the I-WAY project. Although our analysis is specific to this context, the concepts presented in this paper can be extended easily to other scientific applications.

To identify the contributors to lag, we instrumented all major processes in the system and constructed a performance model of the contributors to end-to-end lag: rendering, tracking, local network connections to the parallel system, parallel simulation, and various types of synchronization lag. Our lag model decouples the viewpoint lag (not involving the simulation) from the interaction lag (using the simulation results). Our analysis indicates that the viewpoint lag is within the allowable range, but the interaction lag is two orders of magnitude beyond this range, with the critical component being simulation time. We discuss some of our current work on using a network of supercomputers to reduce the simulation time.

2 Visualization and Simulation Environment

Interactive, immersive visualization of scientific simulations involves four major subsystems: display, graphics, simulation, and communications. The configuration of each of these subsystems in our system is described below.

2.1 Display Component

The ImmersaDesk(TM), the display component, creates a wide field of view by rear-projecting stereo images onto a 4x5 foot translucent panel tilted at a 45 degree angle. This display system is a lower-cost, more portable, and smaller alternative to the CAVE(TM). The ImmersaDesk provides the illusion of data immersion via visual cues, including a wide field of view, stereo display, and viewer-centered perspective. A Silicon Graphics (SGI) Power Onyx computes the stereo display, with a resolution of 1024x768 for each image. The Onyx alternately draws left and right eye images at 96 Hz, resulting in a rate of 48 frames per second per eye. These images are sent to an Electrohome video projector for display. Infrared emitters are coupled with the projectors to provide a stereo synchronization signal for the CrystalEyes LCD glasses worn by each user. These glasses have a sampling rate of 96 Hz that is matched to the projection system; the eyes and brain fuse the alternating left and right eye images to provide stereo views.

Tracking is provided by an Ascension SpacePad unit with two inputs. One sensor tracks head movements; the other tracks a hand-held wand. These sensors capture position and orientation information on head and wand movements at a rate of 10 to 144 measurements per second [5]. The existing system is configured to operate at 100 measurements per second. The buttons on the wand are sampled at a rate of 100 Hz. The location and orientation of the head sensor are used by the SGI Onyx to render images based on the viewer's location in the virtual world. Hence, subtle head movements result in slightly different views of the virtual objects, consistent with what occurs in reality. The wand has three buttons and a joystick that are connected to the Onyx via an IBM PC, which provides A/D conversion, debounce, and calibration. Other observers can passively share the virtual reality experience by wearing LCD glasses that are not tracked.

2.2 Graphics Component

The SGI Onyx is a shared-memory multiprocessor with a high-performance graphics subsystem. The system used in our experiments had 128 MB of memory, 10 GB of disk, four R4400 processors, and three RealityEngine2 graphics pipelines. Each RealityEngine2 has a geometry engine consisting of Intel i860 microprocessors, a display generator, and 4 MB of raster memory [6]. The Onyx is used to drive the virtual environment interface (as discussed above). The ImmersaDesk, however, uses only one RealityEngine2 graphics pipeline, connected to an Electrohome Marque 8000 high-resolution projector that projects images onto the single translucent screen.

2.3 Simulation Component

The simulation component consists of a 128-processor IBM SP with a high-performance input/output system. Each SP node has 128 MB of memory and a 1 GB local disk and is connected to the other SP nodes via a high-speed network. Some of the nodes are equipped with ATM and Ethernet interfaces. Collectively, the SP system is also connected to 220 GB of high-speed disk arrays and an Ampex DST-800 automated tape library. (This input/output system eventually will be used for recording and playback of visualizations; however, that work is beyond the scope of this paper.)


2.4 Interconnections

The interconnections used in the system consist of the devices used by a scientist to interact with the display system and the interconnection of the simulation and graphics components. The scientist controls the field of view by changing his or her head position or by manipulating the wand buttons and joystick; the simulation is also modified via the wand. The simulation and graphics components are interconnected via the ATM OC-3c network (peak bandwidth of 155 Mbps) used for the I-WAY project.
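As a rough sense of what this peak bandwidth implies, the sketch below estimates the one-way wire time for one temperature value per mesh node (the 5636-node mesh of Section 4) at the OC-3c rate. The 60% effective-throughput factor is an assumption made only for illustration; it is not a measurement from the I-WAY system.

```c
#include <stdio.h>

/* Illustrative estimate of one-way wire time over ATM OC-3c.
 * The payload (one float per mesh node) and the assumed effective
 * throughput are for illustration only; they are not measured values. */
int main(void)
{
    const double peak_bps   = 155.0e6;  /* OC-3c peak bandwidth            */
    const double efficiency = 0.60;     /* assumed effective fraction      */
    const long   nodes      = 5636;     /* mesh nodes (see Section 4)      */
    const double bytes      = nodes * sizeof(float);

    double seconds = (bytes * 8.0) / (peak_bps * efficiency);
    printf("~%.2f ms wire time for %.0f KB\n", seconds * 1e3, bytes / 1024.0);
    return 0;
}
```

Even with a pessimistic efficiency assumption, the wire time is only a few milliseconds, which suggests that the network lag reported later is dominated by endpoint processing and protocol overhead rather than by raw bandwidth.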

3 End-to-End Lag

Given two input devices, the head tracker and the wand, there are two classifications of interactions:

- Movement of the head tracker. This type of interaction causes a change to the field of view; the data sent to the simulation process is not modified. This lag is defined as Q_view.

- Movement and clicking of the wand buttons. This type of interaction can change the field of view or the simulation process, as dictated by the code-defined wand buttons and joystick. This lag is defined as Q_interact. In this paper we focus on wand modification of the simulation process to distinguish the two lags.

The operations that are executed in response to a user interaction are the following (a minimal timing sketch follows the list):

1. The sensors generate the position and rotation of the head tracker and wand; the personal computer records the state of the wand buttons (T_track) [input device lag].

2. The wand information is read by the simulation side (T_read) [network lag].

3. The scientific application is simulated on the multiprocessor system (T_sim) [simulation lag].

4. The update from the simulation is sent to the graphics process (T_write) [network lag].

5. The graphics process uses the data from the simulation process and the tracker process to render a new image (T_render) [rendering lag].
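The per-stage times reported in Section 4 were obtained by instrumenting these processes. The following is a minimal sketch of the measurement idea, assuming a simple wall-clock timer; the actual system used the Pablo instrumentation library [10] rather than gettimeofday().

```c
#include <stdio.h>
#include <sys/time.h>

/* Wall-clock timestamps bracketing one pipeline stage.  Simplified
 * sketch of the measurement idea only; stage names mirror the list
 * above, and the real code used Pablo instrumentation calls.        */
static double now_ms(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1e3 + tv.tv_usec / 1e3;
}

void simulation_step(void)            /* placeholder for one FIFEA step */
{
    double t0 = now_ms();
    /* ... read wand input, advance the finite element solution,
     *     write the update for the graphics process ...          */
    double t_sim = now_ms() - t0;
    printf("T_sim = %.2f ms\n", t_sim);
}
```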

In addition to the above lags there is also synchronization lag. We consider four sources of synchronization lag: (1) T_sync(TR), the time from when the tracker measurement is available until the data is read by the rendering process; (2) T_sync(RS), the time from when the rendering process has read the updated wand values until the values are used by the simulation process; (3) T_sync(SR), the time from when the data is available from the simulation process until it is written to the rendering process; and (4) T_sync(F), the time from when the data is available in the frame buffer until the image is available on the screen.

Given the above sequence of operations, the following equations represent the lag time for the head tracker (Q_view) and the wand (Q_interact):

    Q_view     = T_track + T_sync(TR) + T_render + T_sync(F)                      (1)

    Q_interact = T_track + T_sync(TR) + T_sync(RS) + T_read + T_sim + T_write
                 + T_sync(SR) + T_render + T_sync(F)                              (2)
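Because Equations (1) and (2) are simple sums of the component lags, the end-to-end figures can be reproduced directly from the per-component means. The sketch below does exactly that, using the I-WAY means later reported in Table 1, purely to make the composition of the two lags explicit.

```c
#include <stdio.h>

/* Component means in milliseconds, taken from Table 1 (I-WAY case). */
int main(void)
{
    double t_track   = 5.85,     t_sync_tr = 2.92,   t_render = 105.81;
    double t_sync_rs = 18788.16, t_read    = 820.83, t_sim    = 37576.31;
    double t_write   = 2.26,     t_sync_sr = 52.91,  t_sync_f = 15.0;

    /* Equation (1): viewpoint lag does not involve the simulation. */
    double q_view = t_track + t_sync_tr + t_render + t_sync_f;

    /* Equation (2): interaction lag traverses the full pipeline. */
    double q_interact = t_track + t_sync_tr + t_sync_rs + t_read + t_sim
                      + t_write + t_sync_sr + t_render + t_sync_f;

    printf("Q_view     = %8.2f ms\n", q_view);      /* 129.58 ms   */
    printf("Q_interact = %8.2f ms\n", q_interact);  /* 57370.05 ms */
    return 0;
}
```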

4 Case Study: Analysis of a Disk Brake System

To identify the critical components of the end-to-end lag for a combined supercomputer-visualization system, we analyzed a finite element simulation of an automotive disk brake system. Braking times generally are on the order of seconds or tens of seconds. Each time step, which generally covers a few milliseconds of physical time, can require tens of minutes of execution time on a single processor. Hence, parallel systems are necessary for this application. Some critical issues related to automotive disk brakes are the heating and stressing of the material used for the disk and pads, the wear on this material, and the pitch of the sound that occurs when the pads are applied to a rotating disk.

A number of aspects of the application are visualized in the virtual reality environment. The user is allowed to view and manipulate in three dimensions the geometry of the automotive disk brake system (Figure 1). Results of the supercomputer simulation, such as the temperature distribution over the material or the domain decomposition for multiprocessing, are indicated by coloration of the geometry. To give a physical context for the disk brake, for example to show its relative size and placement in an automobile, the frame of a Porsche, with and without tires, can be visualized with the disk brake system as well (Figures 2 and 3). With the wand and a menu system within the virtual reality environment, the user is able to change interaction modes and to remotely modify the scientific simulation executing on the multiprocessor system.

The complete disk brake model consists of 3790 elements, 5636 nodes, and 22,544 degrees of freedom (4 degrees of freedom per node). Given the problem size, the finite element simulation was executed on eight processors of the IBM SP using the FIFEA code. FIFEA makes extensive use of PETSc (Portable, Extensible Toolkit for Scientific computation) [7] for linear algebra and for manipulating sparse matrices and vectors, and of the MPI library [8] for communication between the IBM SP processors. One IBM SP processor sends results to and receives input from the SGI Onyx using the CAVEcomm library [9]. The graphics and simulation codes were instrumented with function calls to the Pablo software tool [10].
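The coupling between the simulation and the display follows a per-time-step pattern: the SP processors advance the solution, and one designated processor exchanges data with the Onyx. The fragment below sketches that pattern using plain MPI calls; send_to_onyx() and read_wand() are hypothetical stand-ins for the CAVEcomm routines actually used, whose interfaces are not reproduced here, and the buffer sizes are illustrative.

```c
#include <stdlib.h>
#include <mpi.h>

/* Hypothetical stand-ins for the CAVEcomm exchange with the SGI Onyx. */
void send_to_onyx(const double *temps, int n);
void read_wand(double *wand_state);

void time_step_loop(double *local_temps, int n_local, int n_steps)
{
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double wand[8];
    double *all_temps = NULL;
    if (rank == 0)
        all_temps = malloc((size_t)n_local * nprocs * sizeof(double));

    for (int step = 0; step < n_steps; ++step) {
        /* Processor 0 reads the latest wand state from the Onyx and
         * broadcasts it so every processor sees the same user input.  */
        if (rank == 0)
            read_wand(wand);
        MPI_Bcast(wand, 8, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* advance_solution(wand): one FIFEA/PETSc time step (omitted). */

        /* Collect the nodal temperatures on processor 0 and ship them
         * to the graphics process once per time step.                  */
        MPI_Gather(local_temps, n_local, MPI_DOUBLE,
                   all_temps, n_local, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        if (rank == 0)
            send_to_onyx(all_temps, n_local * nprocs);
    }
    free(all_temps);   /* no-op on non-root ranks, where it is NULL */
}
```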

Lag Time

Table 1 gives the average time for each lag source for the simulation executed on eight IBM SP processors with the I-WAY interconnection between Argonne and the San Diego Convention Center. The network values in parentheses correspond to the values collected by using the Internet connection between the IBM SP at Argonne and the SGI Onyx at the University of Illinois at Chicago. The mean lags and their standard deviations are based on 30 minutes of elapsed time. The values with no standard deviations correspond to the sources of synchronization lag, which are derived from other values. The T_sync(F) value is based on a frame rate of 48 frames per second per eye and an average scan rate of 120 Hz for the Electrohome projectors; the average of this synchronization lag is equal to half of the frame-induced time.

The total network time, the sum of T_read and T_write, corresponds to the time to send the temperature data from the IBM SP to the SGI Onyx, modify the data structures on the SGI Onyx, and send the wand data back to the IBM SP. In particular, the time to send the simulation data from the IBM SP to the buffers is T_write, and the time to receive the simulation data on the SGI Onyx, process it, send the wand data, and receive the wand data on the IBM SP side is T_read. For the I-WAY, the total time is 822.26 ms;

for the Internet, the total time is 1360.57 ms. This reduction is significant given that the I-WAY data traveled from Illinois to California and through the myriad connections in the San Diego Convention Center, whereas the Internet data traveled between two sites in the same state.

Recall from Section 1 that Liu et al. [1] found the allowable lag time to be 100 ms and 1000 ms for inexperienced and experienced users, respectively. The total lag times for the eight-processor case are Q_view = 129.58 ms and Q_interact = 57370.05 ms. The view lag is within the allowable tolerance, but the interaction lag is over one order of magnitude beyond the tolerance for experienced users and two orders of magnitude beyond it for inexperienced users. For the interaction lag, the simulation and synchronization times compose over 95% of the total lag. Current work on the use of distributed parallel systems for reducing the simulation time is discussed below. The method used to reduce the simulation time will also reduce the synchronization time, specifically T_sync(RS), which is approximated as half of T_sim. Hence this method will have a significant impact on reducing the interaction lag.

5 Using Parallelism to Reduce Simulation Lag

Recall that the finite element simulation was executed on eight IBM SP processors. A reduction in execution time by two orders of magnitude would require a minimum of 800 processors, assuming optimal scaling conditions. This number exceeds the number of processors available at any one site (with the exception of Sandia National Laboratories), but is a small fraction of the resources available at a combination of the sites. With the I-WAY project, various supercomputer centers are interconnected by ATM (as described previously). Currently, we are investigating decomposition methods for efficient use of a network of supercomputers.

For the case of one supercomputer, conventional data decomposition methods partition the domain so as to distribute the computational load evenly among the processors and decrease the interprocessor communication requirements. Such an approach is applicable when all the processors have the same performance and are interconnected by one local network. With a network of supercomputers, however, different processor speeds as well as different networks (local and wide area networks) must be considered. Efficient execution on any multiprocessor machine occurs when the execution time is approximately the same across all processors and the interprocessor communication overhead is minimized. In [11] we proposed a two-level decomposition method that exploits the various features of the distributed supercomputers to balance the execution time, not the computational load, across all the processors in the system. To illustrate the advantages of our method, we provided preliminary results from executing a finite element application on a network of two supercomputers, one located at Argonne National Laboratory and the other located at the Cornell Theory Center. The proposed decomposition scheme resulted in a 20% reduction in execution time as compared with naively applying conventional decomposition schemes that are applicable to single supercomputers. Work is in progress to automate the decomposition method and to apply it to the supercomputer-visualization system.
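The central idea of the two-level method can be illustrated with a small calculation: rather than giving every processor the same number of elements, each processor's share is weighted by its relative speed so that estimated execution time, not element count, is equalized. The sketch below shows only that weighting step and assumes per-element cost scales inversely with a measured relative speed; the actual method in [11] additionally accounts for local versus wide area communication, and the speed values here are hypothetical.

```c
#include <stdio.h>

/* Assign elements to processors in proportion to relative speed so that
 * estimated compute time per processor is roughly equal.  Illustrative
 * sketch only; the two-level method in [11] also balances communication. */
void speed_weighted_shares(const double *speed, int nprocs,
                           long n_elements, long *share)
{
    double total_speed = 0.0;
    for (int p = 0; p < nprocs; ++p)
        total_speed += speed[p];

    long assigned = 0;
    for (int p = 0; p < nprocs; ++p) {
        share[p] = (long)(n_elements * speed[p] / total_speed);
        assigned += share[p];
    }
    share[nprocs - 1] += n_elements - assigned;   /* absorb rounding */
}

int main(void)
{
    /* Hypothetical relative speeds for processors on two machines. */
    double speed[4] = { 1.0, 1.0, 1.6, 1.6 };
    long   share[4];
    speed_weighted_shares(speed, 4, 3790, share); /* 3790 brake elements */
    for (int p = 0; p < 4; ++p)
        printf("proc %d: %ld elements\n", p, share[p]);
    return 0;
}
```

Applied to the 3790-element brake mesh, the faster processors simply receive proportionally more elements; the harder part of the full method, omitted here, is choosing the partition boundaries so that the wide area traffic stays small.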

6 Summary

In this paper we analyzed the lag time of an integrated supercomputer application with an interactive, immersive virtual interface. We studied a system used to display the results of a finite element analysis of an automotive disk brake system. The simulation was executed on the IBM SP at Argonne National Laboratory, and the results were displayed in a virtual environment at the San Diego Convention Center. The simulation and virtual environment were interconnected via ATM in conjunction with the I-WAY project. The analysis included a comparison of the system using ATM versus the Internet.

The results of the study indicated that the view lag was within the allowable tolerance, but the interaction lag was over one order of magnitude beyond the tolerance for experienced users and two orders of magnitude beyond it for inexperienced users. For the interaction lag, the simulation and synchronization times composed over 95% of the time. Further, the transmission time using an ATM network between Illinois and California was approximately half of that using the Internet between two sites in Illinois. The results of the study highlighted the importance of reducing the lag time to a tolerable range (0.1 s to 1 s). We described some current work on the efficient use of a network of supercomputers to provide the compute power needed to reduce the simulation time. Future work entails incorporating the decomposition methods into the supercomputer-visualization system.

Acknowledgments

We thank Michael Papka, Terry Disz, Warren Smith, Jonathan Geisler, Lois McInnes, and Steve Tuecke for their long hours of help in making the various libraries work in harmony. We also acknowledge Tom DeFanti for making the I-WAY project possible, Jason Leigh for assisting with the development of the ImmersaDesk images, and Alan Cruz for assistance with the car modeling. The author at Northwestern University was supported by a National Science Foundation Young Investigator Award, under grant CCR-9357781. The authors at Argonne National Laboratory were supported by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Computational and Technology Research, U.S. Department of Energy, under Contract W-31-109-Eng-38. The author at the University of Illinois at Chicago was supported by a Laboratory Graduate Grant from Argonne National Laboratory.

References

[1] A. Liu, G. Tharp, L. French, S. Lai, and L. Stark, "Some of What One Needs to Know about Using Head-Mounted Displays to Improve Teleoperator Performance," IEEE Transactions on Robotics and Automation, Vol. 9, 1993, pp. 638-648.

[2] M. Wloka, "Lag in Multiprocessor Virtual Reality," Presence, Vol. 4, 1995, pp. 50-63.

[3] V. Taylor, R. Stevens, and T. Canfield, "Performance Models of Interactive, Immersive Visualization for Scientific Applications," Proceedings of the International Workshop on High Performance Computing for Computer Graphics and Visualization, Swansea, United Kingdom, 1995.

[4] C. Cruz-Neira, D. J. Sandin, and T. DeFanti, "Surround-Screen Projection-Based Virtual Reality: The Design and Implementation of the CAVE," Proceedings of SIGGRAPH, 1993.

[5] SpacePad Position and Orientation Measurement System Installation and Operation Guide, Ascension Technology Corporation, 1995.

[6] Silicon Graphics Onyx Installation Guide, Document Number 108-7042-010.

[7] S. Balay, W. Gropp, L. Curfman McInnes, and B. Smith, "PETSc 2.0 Users Manual," Technical Report ANL-95/11, Argonne National Laboratory, 1995.

[8] W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, Cambridge, Massachusetts, 1994.

[9] T. Disz, M. Papka, M. Pellegrino, and R. Stevens, "Sharing Visualization Experiences among Remote Virtual Environments," Proceedings of the International Workshop on High Performance Computing for Computer Graphics and Visualization, Swansea, United Kingdom, 1995.

[10] D. A. Reed, K. A. Shields, W. H. Scullin, L. F. Tavera, and C. L. Elford, "Virtual Reality and Parallel Systems Performance Analysis," IEEE Computer, November 1995.

[11] V. E. Taylor, J. Chen, T. Canfield, and R. Stevens, "A Decomposition Method for Efficient Use of Distributed Supercomputers," Technical Report CSE-96-001, EECS Department, Northwestern University.


Table 1: Lag values (8-processor IBM SP) using I-WAY, with Internet values in parentheses

Lag Component    Mean (ms)          Std. Dev. (ms)   Q_view (%)   Q_interact (%)
T_track          5.85               2.63             4.51         0.01
T_sync(TR)       2.92               --               2.25         0.005
T_render         105.81             11.01            81.66        0.18
T_sync(RS)       18788.16           --               NA           32.75
T_read           820.83 (1358.57)   49.38            NA           1.43
T_sim            37576.31           581.5            NA           65.50
T_write          2.26 (2.00)        0.17             NA           0.004
T_sync(SR)       52.91              --               NA           0.09
T_sync(F)        15.0               --               11.58        0.03
Total            --                 --               129.58 ms    57370.05 ms


Figure 1: Virtual Automotive Disk Brake


Figure 2: Virtual Disk Brake in a Porsche


Figure 3: Virtual Disk Brake in a Porsche without Tires
