PARVIS: Visualizing Distributed Dynamic Partitioning Algorithms
Karlis Kaugars, Rodger Zanny and Elise de Doncker
Department of Computer Science, Western Michigan University, Kalamazoo MI 49008

Abstract
PARVIS is a visualization system for distributed, adaptive partitioning algorithms. It allows data-driven examination of the behavior of the adaptive algorithm, even for large and complex problems. For the algorithm developers it supports the analysis of load balancing techniques, subregion error patterns, rate of algorithm convergence for specific function classes, and parallel work anomaly. It allows end users to visualize the subdivision trees representing the adaptive task partitioning, thereby supplying graphical support for an intuitive understanding of the distributed computation.
Keywords: Visualization; Interactive Visual Analysis; Distributed Task Pool; Adaptive Task Partitioning
1 Introduction
Software implementing scientific algorithms is often complex and difficult to understand, especially for end users from disciplines other than computer science. It is therefore helpful for the end user to view the behavior of the algorithm on the problem at hand, both to verify that the problem is posed appropriately and the inputs are entered properly, and to assess the algorithm's efficiency (or lack of it). This is particularly useful when a software package supplies more than one seemingly viable method for accomplishing the same task. Scientific algorithms implemented for parallel architectures compound the difficulty by introducing concurrent, independent evaluation of subproblems, with corresponding synchronization difficulties. While good parallel solutions can dramatically decrease the time taken to solve a particular problem, end users often have less confidence in a
parallel solution than the corresponding uniprocessor solution.

There are many well-known systems for visualizing and debugging distributed parallel program performance; among the best known are ParaGraph [4, 5], the TAU environment [8] and Ariane [7, 10]. Programmer-oriented visualizations have been incorporated into several parallel programming systems such as XPVM [6] and AIMS [11]. Visualizations geared to programmers do not necessarily help end users understand computations or gain confidence in the results of such computations. The display of communication patterns among the processors or the idle/busy times of individual processors helps programmers improve overall efficiency, but does little to help nonprogrammers understand the results of the computation. What end users require is a high-level overview which yields an intuitive understanding of the computation.

The utility of an overview visual display is not limited to end users. Developers can use these types of displays to confirm their own understanding of the algorithm, test new approaches to solving the problem, or extend their understanding of the datasets used to test the program. Following the approach of interactive visual exploration [3], we are developing a visualization system which allows users of a large class of parallel programs to gain an intuitive understanding of program behavior and result validity without examining the code. We do so by focusing the user on the data rather than the process of computation, allowing the user to intuitively understand the results and gain confidence in those results.

Figure 1: The forest display. The far left of the display surface is devoted to the presentation of run details and a zoom indicator window. The center panel shows the forest of region evaluations and the right panel contains the legend area. In this image the regions are colored by processor.
1.1 Problem Domain
We target algorithms that are automatic, adaptive and distributed. Automatic computations terminate when the user-specified accuracy or error tolerance is attained. The term distributed simply refers to the parallel execution of the algorithm. In an adaptive algorithm, the course of the computations depends on the problem instance at hand. During execution, the initial problem domain is continually subdivided in half to refine the answer. This process can be modeled by a region subdivision tree [9]. Each node in the tree corresponds to a subregion that was evaluated during the execution of the algorithm. The root is the initial region; every other node has a parent node corresponding to the region from which it was formed and either zero or two children.
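As a rough illustration of how such a region subdivision tree arises, the following Python sketch runs a sequential, one-dimensional adaptive bisection. It is not the code of the integration package discussed in this paper: the names Region, evaluate, integrate and walk are hypothetical, the Simpson/trapezoid rule pair is chosen only for brevity, and the real computation is distributed and multi-dimensional.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple


@dataclass
class Region:
    """One node of the subdivision tree (a 1-D interval in this sketch)."""
    lo: float
    hi: float
    estimate: float = 0.0
    error: float = 0.0
    children: List["Region"] = field(default_factory=list)


def evaluate(f: Callable[[float], float], region: Region) -> None:
    """Assumed rule pair: Simpson estimate, error = |Simpson - trapezoid|."""
    a, b = region.lo, region.hi
    mid = 0.5 * (a + b)
    trap = 0.5 * (b - a) * (f(a) + f(b))
    simp = (b - a) / 6.0 * (f(a) + 4.0 * f(mid) + f(b))
    region.estimate = simp
    region.error = abs(simp - trap)


def integrate(f: Callable[[float], float], lo: float, hi: float,
              tol: float, max_regions: int = 10000
              ) -> Tuple[Region, float, float]:
    """Bisect the region with the largest estimated error until the
    total estimated error drops to tol (the 'automatic' stopping rule)."""
    root = Region(lo, hi)
    evaluate(f, root)
    # Max-heap on error via negated keys; the counter breaks ties.
    heap = [(-root.error, 0, root)]
    total, total_err, count = root.estimate, root.error, 1
    while total_err > tol and count < max_regions:
        _, _, worst = heapq.heappop(heap)
        mid = 0.5 * (worst.lo + worst.hi)
        left, right = Region(worst.lo, mid), Region(mid, worst.hi)
        for child in (left, right):
            evaluate(f, child)
            heapq.heappush(heap, (-child.error, count, child))
            count += 1
        worst.children = [left, right]  # every node has 0 or 2 children
        total += left.estimate + right.estimate - worst.estimate
        total_err += left.error + right.error - worst.error
    return root, total, total_err


def walk(node: Region) -> Iterator[Region]:
    """Yield every node of the tree in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)
```

Calling integrate(lambda x: x ** 0.5, 0.0, 1.0, 1e-4) returns the root of a tree that is deepest near x = 0, where the integrand is least smooth; this is the kind of shape the forest display is meant to expose.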
Automatic, adaptive algorithms are used in a variety of problem domains, such as ray tracing, function approximation, optimization, numerical integration, and finite element analysis. We have experimented with the visualization of these region subdivision trees in the domain of distributed numerical integration. Our integration package is PARINT 1.0 [1]. Further information about the PARINT software project may be found at http://www.cs.wmich.edu/parint.
2 The PARVIS Tool
The PARVIS tool is a post-mortem event viewing prototype. The PARINT system can be instructed to track significant events during the computation of integrals, logging these events to local disk on a per-process basis. After a complete computation, a second parallel program collects the local log files
and assembles them into a coherent log of the execution. Logging events locally and performing all visualization after execution minimizes the perturbation applied to the computation to be visualized.
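The assembly step can be pictured as a merge of time-ordered event streams. The sketch below is an assumption-laden illustration, not the collector described above: the Event fields and the merge_logs function are hypothetical, it runs sequentially rather than in parallel, and it assumes that each local log is already sorted by time and that timestamps from different processes are comparable.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    """Hypothetical log record; the real log format is not given here."""
    time: float      # timestamp of the event
    process: int     # rank of the process that logged it
    kind: str        # e.g. "evaluate" or "send"
    region_id: int   # region the event refers to


def merge_logs(per_process_logs: Iterable[Iterable[Event]]) -> List[Event]:
    """Merge per-process logs (each sorted by time) into one ordered log."""
    return list(heapq.merge(*per_process_logs, key=lambda e: e.time))
```

Because heapq.merge reads its inputs lazily, the same idea extends to logs streamed from disk without loading every file into memory at once.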
2.1 Region Trees
Regions of the computation are commonly generated by the subdivision of larger regions into smaller ones for the purpose of reducing the error associated with a computation. Regions with the highest estimated error are split and a new estimated error value is computed for both of the derived regions. Each process of the computation is initially seeded with a region, and therefore the set of generated regions forms a forest of binary trees. The PARVIS tool allows the user to view this forest: a simple tree layout algorithm positions the full set of regions, and the resulting diagram is scaled to fit the available display space. Figures 1 and 2 show a region forest resulting from computing the value of the four-dimensional integral
\[ \int_D \frac{1}{(x_0 + x_1 + x_2 + x_3)^2} \, d\vec{x}, \]
where D is the four-dimensional unit hypercube. This function has a singularity at the origin of the hypercube. Figure 1 shows the complete interface, while Figure 2 only includes the region display area and legend areas. The forest display commonly contains between 200 and 10,000 regions. In order to display the entire forest, the region displays are almost inevitably scaled to the point where each region is represented by at most a few pixels (subject to a minimum size constraint, usually 3x5 pixels). These few pixels are still a scaled version of the region displays. While size constraints at these scales preclude the display of individual data values, PARVIS still attempts to display useful information by encoding one of several region attributes in the color of the node.

The forest display gives an overview of the computation, which in and of itself has proved to be a valuable tool for understanding the activities of the algorithm. In order to be truly useful, the interface must also provide the user with the details of computation, allowing access to the contents of a region. The interface therefore allows the user to interactively pan and zoom the forest: pressing the left mouse button zooms in on the forest, pressing the right button zooms out, and moving the mouse with the middle button pressed pans the image in the direction of mouse movement.

Figure 2: Two views of estimated error values. The upper image shows the initial view of estimated errors, the lower shows the display after the user has zoomed in on both the forest and the legend.

The sequence of images in Figure 3 shows three steps in the zooming process. All three images display the computation of the same integral as before. The upper left image is taken shortly after the user begins zooming, the upper right image is taken during the middle of the zoom and the lower
image is taken at full resolution.

The user may color the nodes of the forest using one of three region attributes: the estimated error value, the time of region evaluation or the process computing the region. The selection of coloring necessarily implies the need for a legend; a control area to the right of the main display allows the user to select the coloring type and shows the scale of colors currently in use. The selected coloring is applied as a background color to the region display. At full size, the region image retains the background color along the edges of the display and between visible data values. As the size of a region display is reduced, the attribute coloring occupies an increasing share of the space, until at the smallest scales the only visible attribute of the region is the selected coloring.

Categorical values are static, but continuous color scales in PARVIS are implemented as linear mappings under user control. By selecting a portion of the legend, the user zooms in on the color scale, applying the full ROYGBIV scale to a smaller range of values. The color legend changes interactively to reflect the new values, and all nodes that no longer fall within the range of the color scale are colored gray. The user may zoom out by clicking the right mouse button. Figure 2 shows two views of the forest: the upper is the default forest view, while in the lower image the user has partially zoomed both the legend and the forest, accessing the details of the computation.

The PARVIS software is written in C++ for the Linux operating system. On a 400 MHz Pentium III processor, the system can maintain smooth zoom response for forests of up to 10,000 nodes. When viewing trees containing around 30,000 regions, the redraw pause between zoom steps becomes quite noticeable.
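The linear color mapping and the legend zoom can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the tool's C++ implementation: the function name scale_color, the particular gray value, and the choice to run the hue from red at the low end to violet at the high end are all hypothetical.

```python
import colorsys
from typing import Tuple

RGB = Tuple[int, int, int]
GRAY: RGB = (128, 128, 128)   # assumed color for out-of-range nodes
VIOLET_HUE = 270.0 / 360.0    # hue 0.0 is red, 0.75 is violet


def scale_color(value: float, lo: float, hi: float) -> RGB:
    """Map value linearly from [lo, hi] onto a red-to-violet hue ramp.

    Values outside the current legend range are drawn in gray, which is
    what happens to nodes left outside the range after a legend zoom.
    """
    if hi <= lo:
        raise ValueError("legend range must satisfy lo < hi")
    if value < lo or value > hi:
        return GRAY
    t = (value - lo) / (hi - lo)            # linear position in the range
    r, g, b = colorsys.hsv_to_rgb(t * VIOLET_HUE, 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))
```

Zooming the legend then amounts to calling the same function with a narrower (lo, hi) pair: values inside the new range spread across the full ramp, and everything else turns gray.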
3 Using PARVIS
PARVIS highlights several algorithmic behaviors without any examination of the PARINT code. The overview forest in Figure 1 immediately emphasizes the point singularity occurring at the origin of the integrated function. It is clear from the shape of the forest where the greatest amount of subdividing occurs. This type of behavior is characteristic of a point singularity. If the accuracy obtained for the problem is not deemed satisfactory, the user may at this point decide to resort to another numerical method, tailored to deal with this type of problem (such as extrapolation [2]), or apply the adaptive method after a transformation of the integral to alleviate the singularity.

Figure 3: Zooming in on the region forest.

The two images in Figure 2 show the same forest as Figure 1, except that these images are colored according to the estimated error values of each region instead of the process evaluating the region. The zoomed view of Figure 2 gives additional information about the nature of the subdivisions taking place within the computation. Although difficult to discern in the gray scale format, comparing the light gray region near the top of the figure with the two children shows that the region division was accomplished by splitting the third variable of the function. Furthermore, the direction of subdivision alternates between the four directions as expected. Comparing the shades of the two children shows that the larger estimated error was in the left child, even though the error value labels are too small to read.
3.1 Visualizing Load Balancing
The image in Figure 1 encodes the process evaluating the region as the background color (gray scale in this publication). Another important feature of this computation in the presence of singularities emerges: the “thrashing” behavior of load balancing. The majority of work in this case occurs in the leftmost tree of the forest. This tree is initially “owned” by the “Controller” process. The other processes quickly complete their assigned tasks and offer to help. In the top part of the tree, this procedure appears to work fairly well, but eventually “Worker 1” is given the region containing the singularity. The region containing the singularity is then passed to “Worker 2” and eventually to “Worker 3”. At each transfer, work on the important part of the initial region is delayed by the inherent messaging latency.

Although the presence of a point singularity at the origin causes “thrashing”, there are situations in which the load balancing works well. The annotated images in Figure 4 show region subdivision trees generated while solving a radial point singularity function using four workers in the upper image and five workers in the lower image. The function is
\[ f(x, y) = \frac{1}{\left( (x - 0.5)^2 + (y - 0.5)^2 \right)^{0.95}}. \]
Note that the function has a singularity at (x, y) = (0.5, 0.5). When 4 workers solve this problem, the initial region (the 2D unit square) is divided into 4 subsquares with a common point at the singularity. This allows each worker to immediately begin working on a region that includes the singularity. Each worker is kept busy doing important work, and little load balancing needs to be done. When 5 workers are used, PARINT does the initial division differently, handing each worker a vertical slice of the initial region (this is how initial region divisions are done in PARINT 1.0; future releases will use better techniques). This “gives” the singularity to just a single worker. In Figure 4, the very small trees (there are actually 2 in the upper left corner and 2 in the upper right corner of the figure) represent the initial work done by the workers without the singularity. They quickly go idle on these unimportant regions, but, as shown by the figure, the load balancing mechanism is able to reassign important regions to all of the workers. The coloring of regions by worker will be crucial in qualitatively evaluating new load balancing techniques with which we plan to experiment.

Figure 4: Load balancing. The top image shows a run with four processors, the bottom image shows the same function run with five processors. Both images have been annotated to show the active processor.

Figure 5 shows another feature of PARVIS: the ability to view the actual region subdivisions over the initial hyperrectangular region. The regions are shown for the integration represented in Figure 4. The two axes are the x and y axes; the cluster of small regions around the center clearly shows the concentration of work near the singularity.

Figure 5: Region subdivision view.
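The difference between the four-worker and five-worker initial divisions described in this section can be checked with a small Python sketch. The helper names quadrants, vertical_slices and contains are hypothetical and only mimic the two divisions as described in the text; they are not taken from the integration package.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
Region = Tuple[Interval, Interval]   # (x-range, y-range)

SINGULARITY = (0.5, 0.5)


def quadrants() -> List[Region]:
    """Split the unit square into four subsquares meeting at the center."""
    halves = [(0.0, 0.5), (0.5, 1.0)]
    return [(xr, yr) for xr in halves for yr in halves]


def vertical_slices(n: int) -> List[Region]:
    """Split the unit square into n vertical slices of equal width."""
    return [((i / n, (i + 1) / n), (0.0, 1.0)) for i in range(n)]


def contains(region: Region, point: Tuple[float, float]) -> bool:
    """True if the closed region contains the point."""
    return all(lo <= p <= hi for (lo, hi), p in zip(region, point))


def regions_with_singularity(regions: List[Region]) -> int:
    """Count the initial regions that include the singular point."""
    return sum(contains(r, SINGULARITY) for r in regions)
```

With four subsquares every initial region touches the singular point, so every worker starts on important work; with five vertical slices only the middle slice does, which is why load balancing has to redistribute work in that run.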
3.2 Visualizing Braking Loss
The two images in Figure 6 show region subdivision trees for the solving of an oscillatory function using 1 and 4 workers, respectively. The function solved is
\[ f(x_0, x_1, x_2, x_3, x_4) = \cos(x_0 + x_1 + x_2 + x_3 + x_4). \]
It is interesting to note that the parallel subregion tree is much bushier than the sequential tree. The extra nodes represent additional, unnecessary work, known as braking loss (a “loss” of efficiency). Braking loss is defined as the unnecessary work performed by workers after the moment when some worker performs the last needed piece of work and before the integral controller tells all the workers to stop working. With a function that is quick to evaluate (as in this example), the number of unneeded region evaluations due to braking loss can be high.
Figure 6: Braking loss. The top image shows a run with a single processor while the lower image shows a run with four processors.
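Given an event log, the braking loss defined above could be estimated by counting evaluations that fall in the window between the last needed piece of work and each worker's stop time. The following Python sketch assumes a simplified, hypothetical log of (time, worker) evaluation records and known stop times per worker; neither the function name nor this data layout comes from the software described here.

```python
from typing import Dict, Iterable, Tuple


def braking_loss(evaluations: Iterable[Tuple[float, int]],
                 t_last_needed: float,
                 stop_times: Dict[int, float]) -> int:
    """Count region evaluations done after the last needed evaluation
    and no later than the moment the evaluating worker was stopped.

    evaluations   : (time, worker) pairs, one per region evaluation
    t_last_needed : time at which the last needed evaluation finished
    stop_times    : worker id -> time the stop message took effect
    """
    return sum(
        1
        for time, worker in evaluations
        if t_last_needed < time <= stop_times[worker]
    )
```

For a function that is cheap to evaluate, many evaluations fit into this window, which is consistent with the bushier parallel tree in Figure 6.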
4 Future Work
The present version of PARVIS is a stand-alone application and can be executed in conjunction with PARINT. We plan to interface PARVIS with the current GUI of PARINT, which allows the user to interactively specify integration functions and parameters.

The PARVIS prototype can be adapted for other applications. In a progressive radiosity application, the forest will represent the geometric subdivision of the scene into patches annotated with the information on their location and brightness. For branch-and-bound methods, the part of the state space tree generated by the strategy will be displayed, where each node is annotated with its problem information including the calculated lower bound.

In terms of usability, the pan and zoom interface implemented here is not the optimal method for browsing the forest displays. As the user zooms in, the tree layout remains fixed, which can lead to situations where only the selected node is visible. We are currently investigating alternate display methodologies which could potentially avoid this problem.
References
[1] de Doncker, E., Gupta, A., Ball, J., Ealy, P., and Genz, A. PARINT: A software package for parallel integration. In 10th ACM International Conference on Supercomputing (1996), Kluwer Academic Publishers, pp. 149–156.
[2] de Doncker, E., Gupta, A., Zanny, R., and Maile, J. Extrapolation in distributed adaptive integration. In Proceedings of the International Conference on High Performance Computing (HiPC '98) (1998).
[3] Hart, D., Kraemer, E., and Roman, G.-C. Interactive visual exploration of distributed computations. In Proceedings of 11th International Parallel Processing Symposium (Apr 1997), pp. 121–127.
[4] Heath, M. T. Recent developments and case studies in performance visualization using PARAGRAPH. In Performance Measurement and Visualization of Parallel Systems, G. Haring and G. Kotsis, Eds. Elsevier Science Publishers, 1990, pp. 175–200.
[5] Heath, M. T., and Etheridge, J. A. Visualizing the performance of parallel programs. IEEE Software 8, 5 (Sept. 1991), 29–39.
[6] Kohl, J. A., and Geist, G. A. The PVM 3.4 Tracing Facility and XPVM 1.1. Oak Ridge National Laboratory, 1995.
[7] Kundu, J., and Cuny, J. E. A scalable, visual interface for debugging with event-based behavioral abstraction. In Frontiers of Massively Parallel Computing '95 (1995), pp. 472–479.
[8] Mohr, B., Malony, A., and Cuny, J. TAU. In Parallel Programming using C++, G. Wilson, Ed. MIT Press, 1996.
[9] Rice, J. R. A metalgorithm for adaptive quadrature. Journal of the Association for Computing Machinery 22 (1975), 61–82.
[10] Shende, S., Cuny, J., Hansen, L., Kundu, J., McLaughry, S., and Wolf, O. Event and state-based debugging in TAU: A prototype. In Proceedings of SPDT'96: SIGMETRICS Symposium on Parallel and Distributed Tools (1996), pp. 21–30.
[11] Yan, J. C., Sarukkai, S. R., and Mehra, P. Performance measurement, visualization and modeling of parallel and distributed programs using the AIMS toolkit. Software Practice & Experience 25, 4 (Apr 1995), 429–461.