A Pattern Language for Parallel Application Programming - CiteSeerX

6 downloads 77701 Views 83KB Size Report

[PS]A Pattern Language for Parallel Application Programming - CiteSeerX

https://www.researchgate.net/...Application_Programming/.../A-Pattern-Language-for-...
by BL Massingill - ‎Cited by 59 - ‎Related articles
solve design problems are collected into structured hierarchical catalogs called pattern languages. A. pattern language helps guide programmers through the whole process of application design and. development. We believe this approach can be usefully applied to parallel computing; that is, that we can make.

A Pattern Language for Parallel Application Programming Berna L. Massingill, Timothy G. Mattson, Beverly A. Sanders UF CISE Technical Report 99-009

Abstract Parallel computing has failed to attract significant numbers of programmers outside the specialized world of supercomputing. This is due in part to the fact that despite the best efforts of many researchers, parallel programming environments are so hard to use that most programmers won’t risk using them. We believe the major cause for this state of affairs is that most parallel programming environments focus on how concurrency is implemented rather than on how concurrency is exploited in a program design. In other words, bringing parallel computing to a broader audience requires solving the problem at the algorithm design level. For software development in general, the use of design patterns has emerged as an effective way to help programmers design high-quality software. To be most useful, patterns that work together to solve design problems are collected into structured hierarchical catalogs called pattern languages. A pattern language helps guide programmers through the whole process of application design and development. We believe this approach can be usefully applied to parallel computing; that is, that we can make parallel programming attractive to general-purpose professional programmers by providing them with a pattern language for parallel application programming. In this paper we provide an early look at our ongoing research into producing such a pattern language. We describe its overall structure, present a complete example of one of our patterns, and show how it can be used to develop a parallel global optimization application.

Introduction Parallel hardware has been available for some decades now and is becoming increasingly mainstream. Software that takes advantage of these machines, however, is much rarer, largely because of the widespread and justified belief that such software is too difficult to write. Most research to date on making parallel machines easier to use has focused on the creation of parallel programming environments that hide the details of a particular computer system from the programmer, making it possible to write portable software; a good example of this sort of programming environment is MPI [MPI]. These environments have been successful overall in meeting the needs of the high-performance computing (HPC) community. Outside the HPC community, however, only a small fraction of programmers would even consider writing parallel software, so research into parallel programming environments cannot be considered a general success, and indeed this research seems to focus on the HPC community at the expense of the rest of the programming world.

-1-

The reasons for this state of affairs are complex. We believe the fault lies with the parallel programming environments, but we do not believe that they can be fixed with yet another high-level abstraction of a parallel computer. To get professional programmers to write parallel software, we must give them a programming environment that helps them understand and express the concurrency in an algorithm. In other words, we need to solve this problem at the level of algorithm design. At the same time, theoretical computer scientists have developed a substantial body of work that can be used to formally demonstrate correctness of parallel programs. Given the difficulty of debugging parallel programs, it would clearly be of benefit to programmers to help them avoid bugs in the first place, and what has been learned by the theoreticians could be of help in that regard --- if it can be presented in a way that is accessible to professional programmers, which so far has not generally been the case. Helping programmers design high-quality software is a general problem in software engineering. Currently, one of the most popular solutions is based on design patterns [Gamma95]. A design pattern is a carefully-written solution to a recurring software design problem. These patterns are usually organized into a hierarchical catalog of patterns. Software designers can use such a pattern catalog to guide themselves from problem specifications to complete designs. This collection of patterns is called a pattern language. An effective pattern language goes beyond the patterns themselves and provides guidance to the designer about how to use the patterns. Design patterns have had a major impact on modern software design. We believe the same approach can be used to help programmers write parallel software. That is, we believe that we can attract the general-purpose professional programmer to parallel computing if we give them a pattern language for parallel application programming. We also believe that such a pattern language can incorporate some of what has been learned by the theoretical community in a way that benefits our target audience. In this paper, we provide an early view of our ongoing research ([Massingill99]) into producing a pattern language for parallel application programming. We begin by considering related work on skeletons, frameworks, and archetypes. We then give an overview of our pattern language and its overall structure. Finally, we show how our pattern language can be used to write a parallel global optimization application. In the appendices, we describe the notation we use for expressing patterns and present an example design pattern.

Previous work Considerable work has been done in identifying and exploiting patterns to facilitate software development, on levels ranging from overall program structure to detailed design. The common theme of all of this work is that of identifying a pattern that captures some aspect of effective program design and/or implementation and then reusing this design or implementation in many applications.

Program skeletons Algorithmic skeletons, first described in [Cole89], capture very high-level patterns; they are frequently envisaged as higher-order functions that provide overall structure for applications, with each application supplying lower-level code specific to the application. The emphasis is on design reuse, although work has been done on implementing program skeletons.

-2-

Program frameworks Program frameworks ([Coplien95]) similarly address overall program organization, but they tend to be more detailed and domain-specific, and they provide a range of low-level functions and emphasize reuse of code as well as design.

Design patterns and pattern languages Design patterns, in contrast to skeletons and frameworks, can address design problems at many levels, though most existing patterns address fairly low-level design issues. They also emphasize reuse of design rather than reuse of code; indeed, they typically provide only a description of the pattern together with advice on its application. One of the most immediately useful features of the design patterns approach is simply that it gives program designers and developers a common vocabulary, since each pattern has a name. Early work on patterns (e.g., [Gamma95]) dealt mostly with object-oriented sequential programming, but more recent work ([Schmidt93]) addresses parallel programming as well, though mostly at a fairly low level. Work explicitly aimed at organizing patterns into pattern languages seems to be less common though not unknown, as some of the examples referenced at [Patterns98] indicate.

Programming archetypes Programming archetypes ([Chandy94], [Massingill96]) combine elements of all of the above categories: They capture common computational and structural elements at a high level, but they also provide a basis for implementations that include both high-level frameworks and low-level code libraries. A parallel programming archetype combines a computational pattern with a parallelization strategy; this combined pattern can serve as a basis both for designing and reasoning about programs (as a design pattern does) and for code skeletons and libraries (as a framework does). Archetypes do not, however, directly address the question of how to choose an appropriate archetype for a particular problem.

Our pattern language A design pattern is a solution to a problem in a context. We find design patterns by looking at high-quality solutions to related problems; the design pattern is written down in a systematic way and captures the common elements that distinguish good solutions from poor ones. We can design complex systems in terms of patterns. At each decision point, the designer selects the appropriate pattern from a pattern catalog. Each pattern leads to other patterns, resulting in a final design in terms of a web of patterns. A structured catalog of patterns that supports this design style is called a pattern language. A pattern language is more than just a catalog of patterns; it embodies a design methodology and provides domain-specific advice to the application designer. (Despite the overlapping terminology, however, note that our pattern language is not a programming language but rather a structured collection of language-independent patterns.) The pattern language described in this document is intended for application programmers who want to design new programs for execution on parallel computers. The top-level patterns help the designer decide how to decompose a problem into components that can run concurrently. The lower-level patterns help the designer express this concurrency in terms of parallel algorithms and finally at the lowest level in terms of

-3-

the parallel environment’s primitives. These lower levels of our pattern language correspond to entities not usually considered to be patterns. We include them in our pattern language, however, so that the language is better able to provide a complete path from problem description to code, and we describe them using our pattern notation in order to provide a consistent interface for the programmer. Our pattern language addresses only concurrency; it is meant to complement rather than replace other pattern-based design systems. Before programmers can work with this pattern language, they must understand the core objects, mathematics, and abstract algorithms involved with their problems. In this pattern language, we refer to design activities that take place in the problem domain and outside of this pattern language as reasoning in the problem space. The pattern language will use the results from problem space reasoning, but it will not help the programmer carry it out. Before plunging into the structure of our pattern language, we offer one additional observation. In addition to facilitating reuse of code and design, patterns allow for a certain amount of "proof reuse": A pattern can include a careful specification for its application-specific parts, such that a use of the pattern that meets that specification is known to be correct by virtue of a proof done once for the pattern. Patterns that include implementation can also have their implemented parts proved correct. Thus, patterns can serve as a vehicle for making some results from the theoretical community accessible and useful for programmers; our pattern language takes advantage of this.

Structure of our pattern language Our pattern language is organized around four core design spaces arranged in a linear hierarchy. The higher-level spaces correspond to an abstract view of parallel programming, independent of particular target environments; lower-level spaces are more implementation-specific, with the lowest-level space a pattern-based description of the low-level building blocks used to construct parallel programs in particular target environments. To use our pattern language, application designers begin at the top level and work their way down, transforming problem descriptions into parallel algorithms and finally into parallel code. The contents of the two top-level design spaces are relatively mature. The lower two levels, however, are still under development and will almost certainly change as we work through more examples using our pattern language.

The FindingConcurrency design space This space is concerned with exposing exploitable concurrency. That is, the patterns in this space help programmers decompose their problems into concurrently-executable units of work. The issues addressed at this level are at a high level and are well removed from specific implementation issues. Figure 1 shows the patterns in this design space and how they would be used by an algorithms designer or programmer.

-4-

Figure 1: This figure lists the patterns in the FindingConcurrency design space. Starting at the top, the arrows show how a designer would work through the patterns as they define the exploitable concurrency in a software design. The path through this space begins with the GettingStarted pattern, which tells the designer how to organize the information he or she has about the problem under consideration. The next step is to decompose the problem into relatively independent units. There are two parts to this analysis, an analysis by task (TaskDecomposition) and an analysis by data (DataDecomposition). As the double arrows suggest, the designer may need to cycle between these two decompositions before a final decomposition is found. Once a problem has been decomposed, the designer must decide how the tasks depend on each other. This is done by cycling between three patterns. The GroupTasks pattern helps the designer collect tasks into groups whose elements must execute concurrently. The OrderTasks pattern then helps order tasks (or task groups) based on data or temporal dependencies. Finally, the DataSharing pattern helps identify how data must be shared among tasks. The last stop in this design space is the CoordinationFramework pattern, which is really just a consolidation point to make sure the designer has correctly used the patterns in this design space.

The AlgorithmStructure design space This design space is concerned with the high-level structure of the parallel algorithm and overall strategies for exploiting concurrency. The designer working in this space reasons about how to use the concurrency exposed in the FindingConcurrency design space. The patterns in this space are organized by means of a decision tree, as shown in Figure 2, that helps lead the designer to the patterns that apply to his or her problem.

-5-

Figure 2: The terminal patterns in this figure are the ones used in an algorithm design. The decision tree in this figure shows the reasoning process a designer would use to arrive at a particular terminal pattern. The tree in Figure 2 has three types of boxes: intermediate patterns, decision points, and terminal patterns. The intermediate patterns help the designer navigate through the design space. The decision point boxes represent decisions made about the concurrency within the intermediate patterns. Finally, terminal patterns correspond to patterns that will be used in the parallel algorithm. Starting at the top of the tree, the designer first chooses a top-level organization. There are three choices at this point, resulting in three major branches to the tree: Concurrency can be organized in terms of the temporal ordering of the task groups, in terms of the problem’s task decomposition, or in terms of the problem’s data decomposition. The first major branch of the tree, the one below the "OrganizeByOrdering" pattern, has two main subbranches that represent two choices about how the ordering is structured. One choice represents "regular" orderings that do not change during the algorithm; the other represents "irregular" orderings that are more dynamic and unpredictable. These two choices correspond to two terminal patterns:

-6-

PipelineProcessing: The problem is decomposed into an ordered set of tasks connected by data dependencies. AsynchronousComposition: The problem is decomposed into groups of tasks that interact through asynchronous events. The second branch ("OrderByTasks"), containing patterns based on the task decomposition, is the most complex branch of the tree. Two choices are available to the designer, based on whether the tasks are structured as linear collections of tasks or recursively. The first subbranch (which we call Partitioning patterns) in turn splits into two branches, one in which the tasks execute independently (the EmbarrassinglyParallel pattern) and one in which the there are dependencies among tasks. We further split the latter branch based on whether the dependencies can be pulled out of the concurrent execution (the separable case) or whether they must be explicitly managed within the concurrent execution (the inseparable case). This leads to the following terminal patterns: EmbarrassinglyParallel: The problem is decomposed into a set of independent tasks. Most algorithms based on task queues and random sampling are instances of this pattern. Reduction: The parallelism is expressed by splitting up tasks among units of execution (threads or processes). Any dependencies between tasks can be pulled outside the concurrent execution by replicating the data prior to the concurrent execution and then reducing the replicated data after the concurrent execution. This pattern works when variables involved in data dependencies are written but not subsequently read during the concurrent execution. SharedMemory: The parallelism is expressed by splitting up tasks among units of execution. In this case, however, variables involved in data dependencies are both read and written during the concurrent execution and thus cannot be pulled outside the concurrent execution but must be managed during the concurrent execution of the tasks. Moving back up the tree of Figure 2 to the branch corresponding to recursive task structures, we have two cases to consider: BalancedTree: The problem (of size on the order of n to the k power) is mapped onto a balanced n-ary tree, and processing occurs to and from the root in k stages. DivideAndConquer: The problem is solved by recursively dividing it into subproblems, solving each subproblem independently, and then recombining the subsolutions into a solution to the original problem. Again moving back up the tree, we look at the third major branch ("OrderByData"), which corresponds to problems in which the concurrency is best understood in terms of data decomposition. Here there are two cases, corresponding to linear and recursive data structures: GeometricDecomposition: The problem space is decomposed into discrete subspaces; the problem is then solved by computing solutions for the subspaces, with interaction largely taking place at subspace boundaries. Many instances of this pattern can be found in scientific computing, where it is useful in parallelizing grid-based computations, for example. ReferenceFollowing: The problem is defined in terms of following links through a recursive data structure.

-7-

By working through this tree until one or more terminal patterns are reached, the designer decides how to structure a parallel algorithm to exploit the concurrency in his or her problem.

The SupportingStructures design space This design space is concerned with lower-level constructs used to implement patterns in the AlgorithmStructure space. There are two groups of patterns in this space: those that represent program-structuring constructs (such as SPMD) and those that represent commonly-used shared data structures (such as SharedQueue). We observe again that many of the elements of this design space are not usually thought of as patterns, and indeed some of them (SharedQueue, for example) would ideally be provided to the programmer as part of a framework of reusable software components. We nevertheless document them as patterns, for two reasons: First, as noted earlier, documenting all the elements of our pattern language as patterns provides a consistent notation for the programmer. Second, the existence of such pattern descriptions of components provides guidance for programmers who might need to create their own implementations. As mentioned earlier, we expect the contents of this design space to change as we gain experience with the pattern language. In particular, the list of shared data structures is incomplete; eventually, it should expand to include all major shared data structures. The following patterns represent common program structures: SPMD (single program, multiple data): The computation consists of n processes or threads executing in parallel. All n processes or threads execute the same program code, but each operates on its own set of data. ForkJoin: A main process or thread forks off some number of other processes or threads that then continue in parallel to accomplish some portion of the overall work before rejoining the main process or thread. Programs that make use of this pattern often compose multiple instances of it in sequence. MasterWorker: The program is structured in terms of a master thread of execution and some number of identical workers. The following patterns represent common shared data structures: SharedQueue: This pattern represents a "thread-safe" implementation of the familiar queue abstract data type (ADT), that is, an implementation of the queue ADT that maintains the correct semantics even when used by concurrently-executing processes or threads. SharedCounter: This pattern, like the previous one, represents a "thread-safe" implementation of a familiar abstract data type, in this case a counter with an integer value and increment and decrement operations. DistributedArray: This pattern represents a class of data structures often found in parallel scientific computing, namely arrays of one or more dimensions that have been decomposed into subarrays and distributed among processes or threads.

-8-

The ImplementationMechanisms design space This design space is concerned with the low-level details of how tasks coordinate their behavior. We use it to provide pattern-based descriptions of common mechanisms for process/thread management (e.g., creating or destroying processes/threads) and process/thread interaction (e.g., monitors or message-passing). The members of this design space are largely used to implement the patterns from the higher-level design spaces. While the other design spaces are high-level and do not constrain the programmer to a particular programming environment, the patterns in this design space are much closer to the underlying mechanisms used to implement the design. Patterns in this design space, like those in the SupportingStructures space, describe entities that strictly speaking are not patterns at all. As noted previously, however, we include them in our pattern language to provide a complete path from problem description to code, and we document them using our pattern notation for the sake of consistency. The following are examples of patterns in this space. Process management: These constructs manage the creation and destruction of distinct threads of execution. Forall indicates a parallel loop construct that maps loop iterations on threads of execution. Spawn launches an independent thread of execution. Remote procedure call calls a procedure at a remote location. Synchronization: These constructs protect shared resources or bring one or more threads to a known point. Barriers cause a group of threads to pause until all threads can proceed together. Locks provide a mechanism to provide mutual exclusion to protect a resource. Semaphores (binary and counting) are a low-level mechanism used to implement locks. Serialization. These constructs cause multiple threads to serialize their access to sections of code. Critical sections defines sections of code that are accessed by only one thread of execution at one time. Monitors are a high-level construct that define data or code that can be accessed by only one thread at a time. Asynchronous control. These constructs service asynchronous interactions between threads of execution. Future variables are single-assignment variables used to synchronize and pass values to later executions. Active messages are one-sided communication events that launch handlers on the remote location. Asynchronous events (signals, callbacks, etc.) include miscellaneous asynchronous interactions. Communication. These constructs are used to pass information between multiple threads of execution. Send and receive pass explicit messages among threads. Collective communication events involve multiple threads. Reductions are global operations in which replicated data is reduced into a single item.

-9-

As with the SupportingStructures design space, the patterns in the ImplementationMechanisms design space will evolve as we gain experience with our pattern language. The goal is to have a sufficiently complete set of constructs with only minimal overlap of functionality.

Example application: global optimization Problem description As an example of using our pattern language to design a parallel algorithm, we will look at a particular type of global optimization algorithm. This algorithm uses the properties of interval arithmetic [Moore79] to construct a reliable global optimization algorithm. Intervals provide an alternative representation of floating point numbers. Instead of a single floating point number, a real number is represented by a pair of numbers that bound the real number. The arithmetic operations produce interval results that are guaranteed to bound the mathematically "correct" result. This arithmetic system is robust and safe from numerical errors associated with the inevitable rounding that occurs with floating point arithmetic. One can express most functions in terms of intervals to produce interval extensions of the functions. Values of the interval extension are guaranteed to bound the mathematically rigorous values of the function. This fact can be used to define a class of global optimization algorithms [Moore92] that find rigorous global optima. The details go well beyond the scope of this paper. The structure of the algorithm however, can be fully appreciated without understanding the details of interval arithmetic. To make the presentation easier, consider the minimization of an objective function. This function contains a number of parameters that we want to investigate to find the values that yield a minimum value for the function. This problem is complicated by the fact that there may be zero or many sets of such parameter values. A value that is a minimum over some neighborhood may in fact be larger than the values in a nearby neighborhood. We can visualize the problem by associating an axis in a multidimensional space with each variable parameter. A candidate set of parameters defines a box in this multidimensional space. We start with a single box covering the domain of the function. The box is tested to see if it can contain one or more minima. If the box cannot contain a minimum value, we reject the box. If it can contain a minimum value, we split the box into smaller sub-boxes and put them on a list of candidate boxes. (We will refer to this part of the computation as a "test-and-split-box calculation".) We then continue for each box on the list until either there are no remaining boxes or the remaining boxes are sufficiently small. (We will call testing for this condition the "convergence test".) Pseudocode for this algorithm is given in Figure 3. For efficiency reasons we perform the convergence test only after testing and splitting all boxes, and we therefore keep two lists, one for boxes that have not yet been examined and another for the results of any box-splitting operations. int done = FALSE; Interval_box B; List_of_boxes InList, ResultList; InList = Initialize(); While (!done){

- 10 -

While (!empty(InList)) { B = get_next_box(InList); if (has_minima(B)) split_and_put (B, ResultList); } done = convergence_test(ResultList); copy(ResultList, InList); } output(InList);

Figure 3: Pseudocode for the sequential global optimization algorithm.

Parallelization using our pattern language An experienced parallel programmer might immediately see this algorithm as an instance of one of our Partitioning patterns and could thus enter our pattern language at one of those patterns. Our pattern language, however, is targeted at professional programmers with little or no experience with parallel programming, and such programmers will need guidance from the pattern language to arrive at the right pattern. (Space constraints make it impossible to give the full text of all patterns referenced. We refer interested readers to our pattern language in progress at http://www.cise.ufl.edu/research/ParallelPatterns/.)

Using the FindingConcurrency design space The first step is to find the concurrency in the algorithm, using the FindingConcurrency design space. We first use the DecompositionStrategy pattern to help identify relatively independent actions in the problem (the tasks) and to define how data is shared or distributed between the tasks. This pattern and related patterns (TaskDecomposition and DataDecomposition) lead us to recognize that the major source of concurrency in our global optimization problem is the test-and-split-box calculations (i.e., computation of the interval extensions of the objective function for each of the boxes on the list), each of which can constitute a separate task. A second type of potentially concurrent task is the convergence test (the test of the list to see whether it meets the condition for ending the computation). The corresponding data decomposition is relatively straightforward: Each task operates almost exclusively on local data; the only shared data objects are the two lists of boxes (the input list and the results list). The next step is to understand the dependencies between concurrent tasks, using the DependencyAnalysis pattern and its related patterns (GroupTasks, OrderTasks, and DataSharing). For our global optimization problem, there are two well-defined groups of tasks: the tasks to perform the test-and-split-box computations (one per box), and the task to perform the convergence test. Also, the order constraints between these groups are clear: The convergence test must wait until all the test-and-split-box computations have finished. These two task groups must therefore be executed in sequence (composed sequentially), giving us a computation consisting of two phases. Note also that the computation as a whole will likely consist of many iterations of first doing test-and-split-box calculations (computing interval extensions of the objective function for each box) and then checking for convergence, and these iterations also must be executed in sequence. At this point we do not need to decide exactly how

- 11 -

we will meet these constraints; we just need to make a note of them and ensure that they are met as we proceed with the design. Finally, we look at how data is shared between the two groups and within each group. The only data objects shared among tasks are the two lists of boxes. Because of the ordering constraint between the test-and-split-box tasks and the convergence-test task, we do not need to consider data sharing between these two groups of tasks but only among the tasks within each group. Each test-and-split-box task consists of removing one box from the input list, analyzing it, and possibly splitting it and putting the result(s) on the results list. Thus, we potentially have multiple tasks updating these shared data structures simultaneously, and we must be sure the algorithm we design ensures that concurrent updates do not produce wrong results. Again we do not have to decide at this point exactly how we will manage access to these shared data structures; we just need to note the data sharing and ensure that we handle it properly as we proceed. The last stop in the FindingConcurrency design space is the CoordinationFramework pattern, which simply helps us review the design decisions made so far and be sure we are ready to move on to the next design space. Most of the literature concerned with the use of patterns in software engineering associates code, or constructs that will directly map onto code, with each pattern. It is important to appreciate, however, that patterns solve problems, and that these problems are not always directly associated with code. Here, for example, the first set of patterns have not led to any code; instead they have helped the programmer reach an understanding of the best alternatives for structuring a solution to the problem. It is this guidance for the algorithm designer that is missing in most parallel programming environments.

Using the AlgorithmStructure design space At this point, we have determined what the concurrent tasks are, how they depend on each other, and what if any data must be shared between them; that is, we have "defined the exploitable concurrency". We are now ready to create a parallel algorithm that can use this concurrency to produce an effective parallel program, so the next step in the design process is to choose a high-level structure for the algorithm that organizes the concurrency, using the AlgorithmStructure design space. In looking at what we know so far about our optimization problem, we see that the very top level of the algorithm can be structured as a sequential program -- a while loop with each iteration consisting of two phases, a test-and-split-boxes phase and a convergence-test phase (which includes copying the results list back to the input list). All the concurrency in our algorithm will occur during the first of these phases, so we use the patterns of the AlgorithmStructure space to design a structure for this phase. Starting at the top of the decision tree in Figure 2, we choose the branch labeled OrderByTasks because overall a task-based decomposition seems to make the most sense. Note that an inexperienced parallel programmer might not make the best choice on the first pass through the decision tree; a designer may need several passes through the decision tree before an optimal design is produced. A systematic decision tree, however, provides a framework for designers to use in choosing a suitable algorithm structure.

- 12 -

Proceeding down the OrderByTasks branch, we must next decide whether the tasks in the first phase of our algorithm are organized linearly or recursively. When this phase of our algorithm begins, the size of the list of boxes is fixed, so we choose the Linear branch, which brings us to the Partitioning patterns. Next we consider dependencies among the tasks. Aside from their use of the input and results lists (and the input list is used only to be sure that each box is worked on exactly once), the tasks are completely independent. Therefore, we follow the Independent branch in the decision tree and arrive at our terminal pattern, the EmbarrassinglyParallel pattern. We then read through the EmbarrassinglyParallel pattern (a copy of which appears as Appendix B), which guides us through the process of designing our parallel algorithm. Mapping our problem onto the pattern, we see that the independent tasks of the pattern are our test-and-split-box tasks, the task queue is the input list of boxes, and the shared data structure to be used for storing results is the results list of boxes. Reviewing the Restrictions section of the EmbarrassinglyParallel pattern, we see that therefore we should implement both lists as "thread-safe" shared data structures. Once again, an experienced parallel programmer would know this right away, but someone with less experience with parallel programming would need to be guided to this conclusion. At this point we have used our pattern language to decide the following: The program will be structured in terms of two phases, computing interval extensions of the objective function (testing and possibly splitting boxes) and testing for convergence. The major source of productive concurrency is in the first phase (test-and-split-box calculations). To ensure correctness, some data structures should be implemented using thread-safe shared-data-structure patterns. We are now ready to start designing code.

Using the SupportingStructures design space As a first step in designing code, we consult the SharedQueue pattern. Ideally there would be a shared-queue library component that would provide exactly what is needed, a component that can be instantiated to provide a queue with elements of a user-provided abstract data type (in this case a box). If so, the pattern would explain how to use it; if not, the pattern would indicate how the needed shared data structure could be implemented. We next consider program structure. Since the overall program is a sequential structure containing an instance of the EmbarrassinglyParallel pattern, it makes sense to use a fork-join structure (the ForkJoin pattern) for the test-and-split-boxes phase of the computation, with a master process setting up the problem, initializing the task queue, and then forking a number of processes or threads to perform the individual test-and-split-box calculations. Following the join, the master carries out the second phase of the computation: testing for convergence, copying the results list to the input list, and then (if necessary) repeating the whole two-phase cycle until the convergence condition is satisfied. The following figure shows pseudocode for the resulting design. Note that although this pseudocode omits major portions of the program (for example, details of the interval global optimization algorithms), it includes everything relevant to the parallel structure of the program.

- 13 -

#define Nworkers N SharedQueue InList; SharedQueue ResultList; void main() { int done = FALSE; InList = Initialize(); While (!done) { // Create Workers to test boxes on InList and write // boxes that may have global minima to ResultList Fork(Nworkers, worker); // Wait for the join (i.e. until all workers are done) Join(); // Test for completion and copy ResultList to InList done = convergence_test(ResultList); copy(ResultList, InList); } output(InList); } void worker () { Interval_box B; While (!empty(InList)) { B = dequeue(InList); // Use tests from Interval arithmetic to see // if the box can contain minima. If so, split // into sub-boxes and put them on the result list. if (has_minima (B)) split_and_put (B, ResultList); } }

Figure 4: Pseudocode for the parallel global optimization program.

Using the ImplementationMechanisms design space Having arrived at a design, we now consider how we would implement it in a particular programming environment, using the ImplementationMechanisms design space. We envision the programmer choosing a target programming environment and then using the Implementation sections of appropriate patterns to turn the pseudocode of the design into syntax for the chosen programming environment. For example, consider how we would turn the pseudocode of Figure 4 into an OpenMP program, specifically how we would express the fork and join constructs in OpenMP. Recall that we chose these constructs to implement (in pseudocode) the ForkJoin pattern (in the SupportingStructures design space), so we would review the ForkJoin pattern to see how to produce the same result using OpenMP. The ForkJoin pattern would refer us to the Forall pattern in the ImplementationMechanisms design space, which in turn would tell us that it can be implemented using the OpenMP parallel for construct. The result would be the more OpenMP-specific pseudocode shown in Figure 5.

- 14 -

#define Nworkers N SharedQueue InList; SharedQueue ResultList; void main() { int done = FALSE; InList = Initialize(); While (!done) { // Create Workers to test boxes on InList and write // boxes that may have global minima to ResultList #pragma omp parallel for for(int i=0; i