Scalable Interactive Ray-Casting of Volumes Using Off-the-Shelf Components

Santiago Lombeyda¹, Mark Shand², Laurent Moll², David Breen¹, Alan Heirich²

¹ California Institute of Technology
² Compaq Computer Corporation
Abstract

This paper describes an application of the second-generation Sepia architecture implementation (Sepia-2) to support interactive visualization of scalar fields represented as very large 3D rectilinear grids. By employing pipelined sort-last associative blending operators, a demonstration system yields scalable interactivity at 30 frames per second. We believe these results can be extended to support other types of structured and unstructured grids and a variety of GL rendering techniques. We show how to extend our single-stage demonstration system to larger multi-stage networks. This requires solving a dynamic mapping problem for blending operators that are similar to Porter-Duff compositing operators. We conclude by discussing technical and fundamental issues to address in future work.
1. Introduction

In Heirich and Moll [hei99] a commodity-based architecture was presented for scalable visualization in distributed systems. This architecture allows large numbers of rendering servers to post images on rectilinear tiles of a large display, and to apply compositing, blending, and other pixel-level operators to those displayed images. This paper reports on the application of a second-generation implementation (Sepia-2) of this architecture to structured volume visualization.

The goal of the current work is to demonstrate that large volume datasets (3D rectilinear grids of scalars) can be interactively rendered with a networked configuration of off-the-shelf components. Volume rendering is performed by a proprietary ray-casting engine, the VolumePro 500 [pfi99]. This engine implements object-order ray casting using a shear-warp factorization of the viewing matrix [lac94]. We present an algorithm for concurrent subvolume rendering followed by pipelined view-dependent tiling and blending (figure 1). This algorithm requires dynamic view-dependent remapping of the blending pipeline on each frame, similar to requirements posed by Porter-Duff compositing in XFree86 [por84]. We demonstrate how to map the algorithm onto a large multi-stage network configuration, consisting of PCs with blending/compositing (Sepia-2) boards and ray-casting engines, connected by a high-speed network, and utilizing a GL-based graphics board for display (figure 2).

Our test configuration demonstrates interactive rendering at 30 frames per second, which is the maximum frame rate of the ray-casting engine. Extrapolations from a small demonstration cluster suggest that interactivity will be sustainable in clusters of 1024 servers and larger.

The Sepia architecture is platform neutral beyond the unavoidable obligation to support device drivers and programming infrastructure. It is being prototyped for commercial use in various kinds of computing platforms. These range from clusters of workstations or PCs containing Intel CPUs, to clusters of Alpha servers, and large NUMA multiprocessor systems with hundreds of AGP graphics accelerators and Alpha CPUs. All of these environments support Linux and/or Unix. The configuration presented in this paper uses Pentium-III graphics workstations running Microsoft Windows NT. By clustering next-generation ray-casting engines we expect to visualize 3D gigavoxel data sets interactively using as few as 8 of these nodes.
2. Related work

A number of projects are exploring hardware, software, and algorithms for scalable rendering using commodity components [sto01, bla00, eld00, sam00, hum00, hei99]. The Sepia architecture is probably most similar to the Lightning-2 project [sto01]. Both architectures distribute image pixels across a network with switching between source and display. Sepia switches at the level of tiles, Lightning-2 at the level of scanline fragments. Unlike Lightning-2, Sepia's approach does not produce visible artifacts and can support non-commutative operators, of which the Porter-Duff operators are an example. The Metabuffer [bla00] is a more feature-rich approach to building a scalable display subsystem, with support for independent viewports that span tile boundaries and have independent multi-resolution scaling. Like Lightning-2, the Metabuffer is built on a half-duplex mesh topology. In this paper we demonstrate how to implement pipelined blending operations in the Sepia architecture using a full-duplex multi-stage network.

All three of these projects (Lightning-2, Metabuffer, Sepia) transfer pixels over a network and are complementary to coder-decoder approaches like WireGL [hum00] that use the network to distribute graphical primitives. Both architectural approaches are bandwidth-limited in current-generation components but will become practical with high-bandwidth commodity technologies like HyperTransport and InfiniBand. According to the taxonomy introduced by Molnar [mol94], the Sepia architecture is compatible with sort-first or sort-last approaches to building graphics systems. Our main interest has been in sort-last approaches due to their scaling advantages. The state of the art in algorithm development for both approaches is represented by companion papers in these symposium proceedings.
Figure 1. Shear-warp volume rendering (top), parallel clustering (bottom, left), Sepia-2 block diagram (bottom, right). Object-order ray casting provides optimal data throughput but results in a distorted image when the view vector is not axis-aligned with the data volume (top). In standard shear-warp rendering the distorted Base Plane (BP) is corrected by a 2D warp on an OpenGL accelerator. This approach can be applied in parallel with concurrent subvolume rendering, followed by blending and tiling (bottom, left) and then a final 2D warp. The result of tiling a set of BPs is a Superimage Base Plane (SBP). Blending is performed by the Sepia-2 card (bottom right) which merges a 30 Hz series of locally computed SBPs with SBPs arriving and departing through the network interface.
3. TeraVoxel rendering with shear-warp viewing factorization

Our scientific interest in this problem is driven by two projects in large data visualization funded by the National Science Foundation. The TeraVoxel project is constructing an electronic flow visualization system to capture snapshots of physical fluid experiments in real time, followed by interactive visualization and exploration. The Kilo-Frame per Second (KFS) camera collects regularly gridded 1024² cross-sectional samples of a fluid pressure field, corresponding to a data rate of one gigavoxel per second. Our current goal is to visualize one second (1024 samples) of this data interactively. Another scientific interest, and a focus of the Large Data Visualization initiative, is interactive rendering of full-resolution MRI data sets. Current MRI data sets of one quarter to one half gigavoxel are typically rendered as 256³ reduced data sets. Both MRI systems and the KFS camera produce rectilinear data sets and can be supported by this demonstration.
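Taking the nominal kilo-frame rate and the 1024² slice size at face value, these figures work out as follows:

    1024² samples/slice × 10³ slices/s ≈ 1.07 × 10⁹ voxels/s ≈ 1 gigavoxel per second
    1024 slices × 1024² samples/slice = 1024³ ≈ 1.07 × 10⁹ voxels ≈ 1 gigavoxel per second of acquisition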
Object-order ray casting makes efficient use of general-purpose CPUs by organizing the ray casting computation for optimal data access. The same principle applies to a dedicated hardware ray casting engine like the VolumePro. In both cases a shear-warp factorization of the viewing matrix allows efficient image generation regardless of viewpoint [lac94]. Leaving aside issues of scaling and axis reordering, the shear matrix S defines a hexagonal projection of the volumetric data onto a volume-axis-aligned Base Plane (BP). The warp matrix W corrects distortion in the Base Plane image that results from misalignment with the image plane, and is defined in terms of the standard modelview matrix M as W = M · S⁻¹. This warp is a 2D operation and is usually computed by mapping the BP image onto a polygon as an OpenGL texture.

Parallelizing this algorithm is straightforward. Data is mapped statically onto rendering servers (computers equipped with VolumePro cards), which all generate images simultaneously. When the images are ready they are merged in a view-dependent order using associative blending operators. The final tiled and blended result is corrected by the 2D warp just as in the serial algorithm.
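The view-dependent merge order can be illustrated with a short sketch. The following C++ fragment (hypothetical types and names, not code from our system) orders statically assigned subvolume bricks front to back along an object-space view direction, assuming the parallel projection used by the shear-warp factorization and an axis-aligned brick decomposition; this ordering is what must be remapped onto the blending pipeline on each frame.

    // Sketch: front-to-back ordering of subvolume bricks for one frame.
    // Because the blending operator is associative but not commutative,
    // this order dictates how the blending pipeline is configured.
    #include <algorithm>
    #include <array>
    #include <numeric>
    #include <vector>

    struct Brick {
        int node;                     // rendering server that owns this subvolume
        std::array<float, 3> center;  // brick center in object space
    };

    // 'viewDir' points from the eye into the scene, in object-space coordinates.
    // Returns brick indices sorted front to back (valid under parallel projection
    // for a regular, non-overlapping, axis-aligned brick decomposition).
    std::vector<int> frontToBackOrder(const std::vector<Brick>& bricks,
                                      const std::array<float, 3>& viewDir)
    {
        std::vector<int> order(bricks.size());
        std::iota(order.begin(), order.end(), 0);

        auto depth = [&](int i) {
            const auto& c = bricks[i].center;   // signed distance along viewDir
            return c[0] * viewDir[0] + c[1] * viewDir[1] + c[2] * viewDir[2];
        };
        // Smaller projection onto viewDir means closer to the eye: blended first.
        std::sort(order.begin(), order.end(),
                  [&](int a, int b) { return depth(a) < depth(b); });
        return order;
    }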
3.1 Blending arithmetic

The ray casting computation blends the successive contributions of an ordered set of voxels that intersect a line of sight. If V(acc, s) denotes a blending operator that combines a new voxel sample s with a total accumulated result acc_T, then the result of a front-to-back ray casting computation is

    acc_T = V(V(… V(V(0, s_1), s_2) …), s_n)        (1)

where s_1 … s_n are successive voxel samples and 0 represents the contribution of a transparent voxel. VolumePro [pfi99] implements the following front-to-back blending operator V for all color channels C (where α is the opacity channel, C_acc the accumulated color, α_s the opacity of sample s, etc.):

    C_acc ← C_acc + (1 − α_acc) α_s C_s
    α_acc ← α_acc + (1 − α_acc) α_s