Interactive Volume Visualization of Fluid Flow Simulation Data

Paul R. Woodward, David H. Porter, James Greensky, A. J. Larson, Michael Knox, James Hanson, Niranjay Ravindran, and Tyler Fuchs

Laboratory for Computational Science & Engineering, University of Minnesota, 499 Walter Library, 117 Pleasant St. S. E., Minneapolis, Minnesota 55455, U.S.A.
{paul, dhp, mikeknox}@lcse.umn.edu, {jjgreensky, jphansen}@gmail.com, {lars1671, ravi0022, fuch0057}@umn.edu

To appear in Proceedings of the PARA'06 Workshop on the State-of-the-Art in Scientific and Parallel Computing, Umeå, Sweden, June 2006.
Abstract. Recent development work at the Laboratory for Computational Science & Engineering (LCSE) at the University of Minnesota aimed at increasing the performance of parallel volume rendering of large fluid dynamics simulation data is reported. The goal of the work is interactive visual exploration of data sets that are up to two terabytes in size. A key system design feature in accelerating rendering from such large data sets is replication of the data set on directly attached parallel disk systems at each rendering node. Adapting this system for interactive steering and visualization of fluid flow simulations as they run on remote supercomputer systems introduces additional challenges, which are briefly described.

Keywords: scientific visualization, interactive steering, fluid flow simulation
1 Introduction
The visualization of fluid flow simulations using perspective volume rendering provides a very natural means of understanding flow dynamics. This is especially true for flow simulations, like those in astrophysics, that do not involve complex confining walls and surfaces. Volume rendering is data intensive, however, so even a modest flow simulation can easily produce a large amount of data to be visualized. Our team at the University of Minnesota's Laboratory for Computational Science & Engineering (LCSE) has been exploiting volume rendering for fluid flow visualization for many years. In this paper we report our most recent efforts to accelerate this process so that even very large data sets can be explored interactively at full PowerWall resolution. Our earlier efforts at full-resolution visualization were based principally on off-line generation of movie animations.
Fig. 1. Jim Greensky and David Porter control 8 nodes of our system at IEEE Vis2005.
The challenge of speeding up perspective volume visualization has been addressed by several groups. Like other teams (see, for example, [1-2] and references therein), our hierarchical volume rendering software HVR (www.lcse.umn.edu/hvr) [3] uses a multiresolution, tree-based data format to render only coarsened data when there is little variation within a region or when the region occupies very few pixels. Like many other groups, we use PC graphics engines to accelerate the parallel ray tracing calculations that we long ago performed on vector CPUs such as the Cray-2 and later implemented on SGI Reality Engines. This is in contrast to ray tracing on large parallel machines, as described in [4], which can be interactive when the data fits into memory. Unlike more comprehensive graphics software packages such as VisIt and SCIRun [5], we concentrate exclusively on volume rendering at the LCSE and focus on the special problems that come with very large, multi-terabyte data sets, which do not fit into the memory of essentially any available rendering resource. This focus has led us to our present system design, which exploits full replication of the data set at multiple rendering nodes as a key performance-enabling feature.
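To make the multiresolution idea concrete, the following is a minimal sketch in Python rather than the actual HVR implementation; the octree node layout, the variation measure, and the pixel_footprint callback are assumptions made only for this illustration.

# Illustrative only: descend an octree of voxel bricks and stop early wherever
# the data varies little inside a region or the region covers only a few pixels,
# so that coarsened data is rendered in place of its children.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class OctreeNode:
    coarse_brick: object                    # low-resolution voxel brick covering this region
    variation: float                        # e.g. max - min of the field inside the region
    children: Optional[List["OctreeNode"]]  # None at the finest level of the tree

def select_bricks(node, pixel_footprint, var_tol=0.01, pixel_tol=4.0):
    """Return the bricks to render, coarsening wherever full detail is not visible."""
    small_on_screen = pixel_footprint(node) < pixel_tol   # region occupies very few pixels
    nearly_uniform = node.variation < var_tol              # little variation within the region
    if node.children is None or small_on_screen or nearly_uniform:
        return [node.coarse_brick]
    bricks = []
    for child in node.children:
        bricks.extend(select_bricks(child, pixel_footprint, var_tol, pixel_tol))
    return bricks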
2 Interactive Volume Rendering of Multi-TB Data
To set up for a session of data exploration, we generally decide upon a limited set of variables we expect to be useful and then process the sequence of compressed dumps from the simulation to create "HV-file" sequences for these variables; these files incorporate the multiresolution data structure that our renderer, HVR, requires for performance. A medium-scale run, which we used to demonstrate our system at the IEEE Visualization conference in October 2005 in an exhibit booth sponsored by our industrial partner, Dell (see Figure 1), serves as a good example. From this run, which took about six weeks on a 32-processor Unisys ES7000 in our lab, we saved about 500 compressed dumps of about 10 GB each.
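This preprocessing pass can be pictured roughly as follows; it is an illustrative Python sketch, not our production tools, and the read_dump and build_hv_file callables, as well as the file naming, are hypothetical stand-ins.

# Illustrative only: turn each compressed dump into one HV-file per chosen variable.
from pathlib import Path

VARIABLES = ["vorticity", "divergence_of_velocity", "entropy"]

def build_hv_sequences(dump_dir, out_dir, read_dump, build_hv_file):
    """For every compressed dump and every chosen variable, write one HV-file
    holding the multiresolution (octree) structure that HVR requires."""
    for dump_path in sorted(Path(dump_dir).glob("*.dump")):
        fields = read_dump(dump_path)                 # decompress one ~10 GB dump
        for var in VARIABLES:
            out = Path(out_dir) / f"{dump_path.stem}_{var}.hv"
            build_hv_file(fields[var], out)           # one ~1.27 GB multiresolution HV-file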
Fig. 2. A rendering of our decaying Mach 1 turbulence vorticity data on the PowerWall.
It took a few days to process these dumps into sets of 1.27 GB HV-files for the vorticity, the divergence of velocity, and the entropy. These HV-files constituted a data set of about 1.9 TB, small enough that we could place a full copy of it at each node of our 14-node rendering cluster.

Replication of the data to be explored is a key element of our system design. Each node has a directly attached set of twelve 400 GB SATA disks, striped using a 3ware 12-channel disk controller and delivering up to 400 MB/s in sustained data transfers. The capacity at each node therefore allows 2 TB to be declared as scratch space and used for replication of whatever data is under study at the time. Fourteen different 2 TB data sets can be held on the system while a user explores one of them, replicated on all nodes. Our experience building inexpensive fast disk systems over many years indicates that data replication is the only sure way to guarantee, on every node, the fast, sustained data delivery from a shared data set that interactive volume rendering requires. This design allows each local disk subsystem to stream data unimpeded by requests from other nodes for different portions of the shared data. Locally attached disks are also considerably less expensive than storage area networks of dedicated storage servers. Although every node has an entire copy of the data under study, a node needs to read only a small portion of it at any one time. Each HV-file has an octree structure of voxel bricks, which overlap in order to enable seamless image rendering.
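As a quick consistency check on these figures:

    500 dumps × 3 variables × 1.27 GB/file ≈ 1,905 GB ≈ 1.9 TB,

which fits comfortably within the 2 TB of scratch space declared on each node out of the 4.8 TB of raw capacity provided by its twelve 400 GB drives.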
Although our software allows an initial low-resolution rendering to be continuously refined, this strategy is not well suited to our large PowerWall display, shown in Figure 2, which measures 25 feet across. The resolution of an image is very obvious on a display of this size, so on the PowerWall we generally prefer to keep looking at the previous high-quality rendering while waiting for the next high-quality image to appear. Our solution is therefore to optimize the throughput of the pipeline from data on disk through to the image on the screen, with multiple renderers each assigned its own segment of the final image. To speed up the rate at which voxel bricks can be loaded as 3-D textures into our Nvidia graphics cards, we have decreased the size of these bricks from our previous standard of 128³ to 64³ voxels. Although this caused the data streaming rate from disk to drop by about 20%, the overall pipeline speed increased. The smaller voxel bricks also increase the efficiency of the parallel rendering by allowing each HVR volume rendering server to read and render a section of the flow that more closely approximates the contents of its viewing frustum.

For truly interactive visualization of our example 1.9 TB data set, the time to produce a new, full-resolution PowerWall image from data that is on disk rather than in memory is the most essential measure of success. Ten nodes of our cluster can produce a 13 Mpixel PowerWall image of a billion-voxel data snapshot in just over 1 second. If we then choose to look at this same data from a different viewing position and/or with a different color and opacity mapping, we can generate the next image much faster, since our software takes advantage of the voxel bricks that are already loaded into each graphics card. We are now experimenting with increasing interactivity by concentrating the rendering power of all 14 nodes on fewer than the full 10 image panels of our PowerWall, which allows us to trade interactivity against image size and resolution.
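The per-node brick handling just described can be sketched as follows; this is illustrative Python, not the HVR server code, and the frustum test and texture-upload routine are caller-supplied stand-ins assumed for the example.

# Illustrative only: each rendering node touches just the 64^3 bricks whose
# bounding boxes intersect its own viewing frustum, and it reuses any brick
# that is already resident as a 3-D texture on its graphics card.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Brick:
    brick_id: int    # position of this voxel brick in the HV-file's octree
    bbox: tuple      # ((xmin, ymin, zmin), (xmax, ymax, zmax)) in data coordinates

@dataclass
class BrickCache:
    resident: dict = field(default_factory=dict)   # brick_id -> GPU texture handle

    def get(self, brick, load_texture):
        # Reuse the 3-D texture if this brick was loaded for an earlier image;
        # otherwise stream it from the local striped disks and upload it.
        if brick.brick_id not in self.resident:
            self.resident[brick.brick_id] = load_texture(brick)
        return self.resident[brick.brick_id]

def bricks_for_frustum(all_bricks, intersects_frustum):
    # Smaller bricks let this list hug the frustum more tightly, so less data
    # is read from disk and uploaded for each node's segment of the image.
    return [b for b in all_bricks if intersects_frustum(b.bbox)]

When the viewpoint or the color and opacity mapping changes but the data snapshot does not, most bricks are found in the cache, which is why the next image appears so much sooner than the first.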
3 Remote Flow Visualization and Simulation Steering
Our software (see www.lcse.umn.edu/hvr) allows the members of the visualization cluster to render portions of an image and send them to a remote node, where they are stitched together and displayed on a single screen. This rendering is generally faster, and therefore more interactive, since such a screen has many fewer pixels than the PowerWall. This mode was used in the exhibit booth at IEEE Vis2005, as shown in Figure 1: just 8 of our 14 nodes were able to render images from billion-voxel HV-files on disk at 1920×1200 resolution in about 0.7 sec.

Later in the year, we combined this capability with a high-performance implementation of our PPM gas dynamics code running on 512 or 1024 CPUs of the Cray XT3 at the Pittsburgh Supercomputing Center (PSC) [7] to enable interactive steering and visualization of an ongoing gas dynamics simulation. We worked with the PSC team, and in particular with Nathan Stone, who provided a utility, PDIO [6], that allowed each of the 512 CPUs on the machine to write tiny quarter-MB files, which PDIO managed and transported to a node of our visualization cluster at the LCSE in Minnesota. Our software then constructed HV-files from the stream of little files, which landed on a RAM disk, and broadcast the HV-files to 10 rendering nodes connected to the PowerWall. The rendering nodes could also transmit portions of images to another machine located anywhere on the Internet, and that machine could display the result on its monitor, as shown in Figure 3. Figure 3 shows the image viewer pane, the opacity and color map control pane, the field (i.e., fluid state variable) and time level control pane, and the viewpoint and clipping plane control pane. A separate GUI controls the simulation on the Cray XT3, which the user can pause, abort, restart, or instruct to alter the frequency of data file output. Running on 1024 CPUs at a little over 1 Tflop/s, this application computes a flow simulation of shear instability, as shown, to completion in about 20 minutes. A reenactment of the TeraGrid Conference demonstration shown in Figure 3, using the actual data generated by the XT3 at the event and arriving according to its actual time stamps (every 10 time steps, or about every 3 to 5 seconds, new voxel data arrived at the LCSE), and showing the interactive control, can be downloaded as a Windows .exe file from www.lcse.umn.edu/TeraGridDemo.
Fig. 3. LCSE-PSC demonstration of interactive supercomputing, TeraGrid Conf.
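The receiving side of this pipeline can be pictured with the following minimal sketch; it is illustrative Python rather than our production code, the spool-directory layout, piece count, and the build_hv_file and broadcast callables are assumptions made for the example, and PDIO itself is not shown.

# Illustrative only: PDIO deposits tiny quarter-MB pieces in a spool directory
# on a RAM disk; we gather the pieces for one snapshot, assemble an HV-file,
# and hand it to the rendering nodes driving the PowerWall.
import time
from pathlib import Path

def assemble_and_broadcast(spool_dir, pieces_per_snapshot, build_hv_file, broadcast):
    spool = Path(spool_dir)
    while True:
        pieces = sorted(spool.glob("*.part"))
        if len(pieces) >= pieces_per_snapshot:
            batch = pieces[:pieces_per_snapshot]
            hv_file = build_hv_file(batch)   # assemble the multiresolution HV-file
            broadcast(hv_file)               # send it to the 10 rendering nodes
            for p in batch:
                p.unlink()                   # clear the RAM-disk spool
        else:
            time.sleep(0.1)                  # wait for the next burst from PDIO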
Grid resolutions of 512³ cells enable 30- to 40-minute simulations when computing with PPM at 1 Tflop/s, and this is a good length of time for an interactive session. The PDIO software enabled us to obtain 55 MB/s of throughput on the University of Minnesota's Internet connection during the day and almost twice this much at night. This allows the transmission of a 128 MB voxel snapshot of the flow, for a single fluid state variable, in about 3 seconds. While a new snapshot is being transmitted, the interactive capabilities of our visualization cluster allow the user to view the previous snapshot from different angles and with different color and opacity settings using the control panes shown in Figure 3. The user may also explore the data that has accumulated to this point. Responsiveness is good, because the data size is small. However, if one wishes to view a fluid state variable different from the one being sent, one can only request that it be transmitted in subsequent snapshots; it can be made available at the same time level only if multiple variables are transmitted, which would be possible on a 10 Gbit/s connection.
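For reference, the transfer figures quoted above are consistent with simple arithmetic if, as the 128 MB size of a 512³ snapshot suggests, one byte is sent per voxel:

    512³ voxels ≈ 134 × 10⁶ bytes ≈ 128 MB,   128 MB ÷ 55 MB/s ≈ 2.3 s,

in line with the roughly 3 seconds observed once per-file overheads are included; a 10 Gbit/s link (about 1.25 GB/s) would move the same snapshot in roughly 0.1 s, which is what would make several variables per time level practical.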
As computer power and networking bandwidth grow, we anticipate that interactive flow visualization will be used more and more for steering simulations as they run. The data sets associated with runs that are short enough to be steered today are still under 1 TB, but as petascale computing systems are put into place this will change.

Acknowledgements

This work has been supported by NSF CISE RR (CNS-0224424) and MRI (CNS-0421423) grants, by grant DE-FG02-03ER25569 from the MICS program of the DoE Office of Science, through the donation of an ES7000 computer from Unisys Corp., and by local support to the Laboratory for Computational Science & Engineering (LCSE) from the University of Minnesota's Digital Technology Center and the Minnesota Supercomputing Institute. Our fluid dynamics simulations have also been supported by grants of time on the NSF TeraGrid clusters at NCSA and at the Pittsburgh Supercomputing Center. We gratefully acknowledge the work of Nathan Stone and Raghu Reddy of the Pittsburgh Supercomputing Center, who worked closely with us to make the data transfers from our fluid dynamics application on the Cray XT3 to a node of our local visualization cluster go smoothly and very fast using Stone's PDIO utility.

References

1. Ahern, S., Daniel, J. R., Gao, J., Ostrouchov, G., Toedte, R. J., Wang, C., "Multiscale data visualization for computational astrophysics and climate dynamics at Oak Ridge National Laboratory," J. of Physics: Conf. Series 46, 550-555 (2006).
2. Shen, H.-W., "Visualization of large scale time-varying scientific data," J. of Physics: Conf. Series 46, 535-544 (2006).
3. LCSE volume rendering and movie-making software, available at www.lcse.umn.edu/hvr.
4. Parker, S., Parker, M., Livnat, Y., Sloan, P.-P., Hansen, C., Shirley, P., "Interactive ray tracing for volume visualization," IEEE Transactions on Visualization and Computer Graphics 5, 238-250 (1999).
5. Bethel, W., Johnson, C., Hansen, C., Parker, S., Sanderson, A., Silva, C., Tricoche, X., Pascucci, V., Childs, H., Cohen, J., Duchaineau, M., Laney, D., Lindstrom, P., Ahern, S., Meredith, J., Ostrouchov, G., Joy, K., Hamann, B., "VACET: Proposed SciDAC2 Visualization and Analytics Center for Enabling Technologies," J. of Physics: Conf. Series 46, 561-569 (2006).
6. Stone, N. T. B., Balog, D., Gill, B., Johanson, B., Marsteller, J., Nowoczynski, P., Porter, D., Reddy, R., Scott, J. R., Simmel, D., Sommerfield, J., Vargo, K., Vizino, C., "PDIO: High-Performance Remote File I/O for Portals-Enabled Compute Nodes," Proc. 2006 Conf. on Parallel and Distributed Processing Techniques and Applications, Las Vegas, NV, June 2006; also available at www.psc.edu/publications/tech_reports/PDIO/PDIO-PDPTA06.pdf.
7. Nystrom, N., Weisser, D., Lim, J., Wang, Y., Brown, S. T., Reddy, R., Stone, N. T., Woodward, P. R., Porter, D. H., Di Matteo, T., Kale, L. V., Zheng, G., "Enabling Computational Science on the Cray XT3," Proc. CUG (Cray User Group) Conference, Zurich, May 2006.