Beyond Arrays: A Container-centric Approach For Parallelization Of Real-world Symbolic Applications

Peng Wu and David Padua
Department of Computer Science, University of Illinois at Urbana-Champaign
pengwu, [email protected]

August 3, 1998

Abstract. Parallelization of symbolic applications is difficult, and a systematic approach has yet to be developed. In this paper, we introduce the concept of a container, which refers to any general-purpose aggregate data type, such as matrices, lists, tables, graphs and I/O streams. We propose the container-centric approach, in which containers are treated by the compiler as built-in data types: they are the target of data-parallelism and the focus of program analysis and transformations. We apply the container-centric approach to the parallelization of symbolic applications. By hand-parallelizing a few real-world symbolic applications with the proposed techniques, we demonstrate not only that there is enough parallelism in symbolic applications, but also that such applications exhibit as much regularity as we have observed in array-based applications and are highly amenable to automatic parallelization.

1 Introduction

The last decade has witnessed extensive research on automatic parallelization of numerical array-based applications. With aggressive analysis and transformation techniques [1][2], state-of-the-art parallelizing compilers can now parallelize some large, real scientific benchmarks [3][5][4]. However, there is a large set of non-numerical applications, some of which are fairly simple, that still cannot be parallelized by current techniques. The reasons, we believe, are our limited understanding of the characteristics of such applications in terms of their common aggregate data structures, and the consequent lack of effective parallelization techniques designed accordingly. In this paper, we introduce the concept of a container, which refers to any general-purpose aggregate data type, such as matrices, lists, tables, graphs and I/O streams. Containers, as the major storage media of data, are at the center of program data-flow, and manipulations of container elements are the primary source of data-parallelism in real applications. We believe, therefore, that parallelizing compilers have to be designed with full awareness of the characteristics of the underlying containers. We propose the container-centric approach, in which containers are treated as intrinsic data types during parallelization. Not only should the compiler recognize operations on containers, but containers are also the target of data-parallelism and the focus of program analysis and transformations. We apply the container-centric approach to address the parallelization of symbolic applications. Our

work begins with a detailed study of a benchmark suite that contains a few large real-world symbolic applications. We observe a prevalent use of containers, such as lists, stacks and hash tables, in the benchmarks. Operations on containers, even with different implementations and across programming languages, are surprisingly similar and, therefore, amenable to abstraction. In the discussion of the container-centric approach, we start by providing an abstraction of containers, the abstract container operations, which are used to describe the data-flow properties of container operations found in real programs. Then, we propose several key transformation and analysis techniques for the containers we are targeting. We apply these techniques by hand to parallelize some of the benchmarks; among them are javac, jar, javap and javadoc, which are standard utility applications from the Java Developer's Toolkit (JDK) package. Parallelism is exploited at a very coarse granularity: almost all of the major loops of the four applications are parallelizable. The experimental results are very encouraging. To the best of our knowledge, this is the first work that characterizes real-world symbolic applications in terms of their underlying aggregate data structures and addresses their parallelization in a pragmatic way. The paper is structured as follows. Section 2 gives an overview of containers and the container-centric approach. Section 3 proposes the abstract container operations and the concrete container description technique. Sections 4 and 5 introduce several compiler transformation and analysis techniques. Experimental results are given in Section 6. Section 7 compares our work with others. Section 8 summarizes and presents a conclusion.

2 The Container-centric Approach

2.1 Concept of container

We define any general-purpose aggregate data type as a container [6]. Examples of containers are matrices, lists, stacks, trees, graphs, I/O streams and hash tables. In this paper we focus on two types of containers: linear containers and content-addressable containers. Containers of other types will be studied in the future. We call data types seen in real programs concrete, such as concrete containers and concrete iterators, as opposed to the abstract ones we define in Section 3.1. A linear container is a container whose elements are accessed by position within the container in an ordered manner. Commonly seen linear containers are lists, stacks and queues. Linear containers can be addressed through iterators [7], which allow container elements to be accessed in a way similar to how arrays are accessed through indexes. In real applications, iterators together with language constructs, such as for-loops and while-loops, compose common access patterns of linear containers. In a content-addressable container, elements are accessed by keys. Keys of a content-addressable container can be numerical (index-key), alphanumerical (name-key) or pointers (pointer-key). Examples of content-addressable containers are hash tables, sets and maps. Note that the above classification is based on the behavior of the container rather than on its underlying implementation; it is possible for one type of container to be implemented by different concrete data types. A content-addressable container, for instance, can be implemented as a linked list, an array, or a binary tree.
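The distinction is behavioral rather than structural. As a rough illustration of how the same content-addressable behavior can sit on top of different concrete data types, consider the following Java sketch (the interface and class names are ours, not from the paper):

```java
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// Behavioral view: elements are accessed by key, regardless of backing store.
interface ContentAddressable {
    void put(String key, Object value);
    Object get(String key);
}

// Backed by a hash table.
class HashBacked implements ContentAddressable {
    private final Map<String, Object> map = new HashMap<>();
    public void put(String key, Object value) { map.put(key, value); }
    public Object get(String key) { return map.get(key); }
}

// Backed by a linked list of pairs: same observable behavior, different implementation.
class ListBacked implements ContentAddressable {
    private static class Pair {
        final String k; final Object v;
        Pair(String k, Object v) { this.k = k; this.v = v; }
    }
    private final LinkedList<Pair> pairs = new LinkedList<>();
    public void put(String key, Object value) { pairs.addFirst(new Pair(key, value)); }
    public Object get(String key) {
        for (Pair p : pairs) if (p.k.equals(key)) return p.v;
        return null;
    }
}
```

A container-aware compiler would reason about both through the same abstract put/get operations, ignoring the backing store.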


2.2 Motivation

We view programming as a chain of composing simple concepts into larger and more complicated ones. Traditionally, programmers work directly with intrinsic data types and language constructs, which sit at the very bottom of the programming chain. With the thriving of object-oriented programming and the wide availability of general-purpose libraries, modern applications are composed at a much higher abstraction level. Containers such as lists, stacks, hash tables, and I/O streams are common components of today's real-world symbolic applications and are treated by most programmers as if they were intrinsic data types. Common containers are provided by almost every object-oriented language as part of the core class libraries, such as the Standard Template Library (STL) for C++ and the Java Foundation Classes (JFC) for Java. Conventional parallelizing compilers, however, still work at the bottom of the programming chain by focusing only on language-predefined data types. Such an approach has proved its success on the parallelization of array-based applications, but has shown very limited effect on others.

    for(element = list.getHead(); list.hasMoreElement(); element = list.nextElement()) {
        do something on element ...;
    }

Figure 1: A common loop that iterates over a list

Containers, sitting at the center of program data-flow and serving as a primary source of data parallelism, play the same role in symbolic applications as arrays do in numerical ones. However, conventional compilers treat such containers, which are mostly provided by libraries or composed as objects with their own internal states and method interfaces, the same as any other common object. Such unawareness of containers in program analysis and parallelization leads either to too conservative a data-flow analysis, or to the elimination of further possibilities to exploit parallelism on such containers. Linear containers such as lists, for instance, are one of the major sources of data parallelism for symbolic applications. Figure 1 shows a loop that manipulates a list, which we found very common in real applications. The compiler, without any knowledge of such a container, will not even choose the above loop as a candidate for further analysis and parallelization. Moreover, even with a certain awareness of such containers, parallelization of the above loop still requires container-specific techniques to handle, for instance, the dependences between the operations list.hasMoreElement() and list.nextElement(). We believe that containers should be made distinguishable to parallelizing compilers, their characteristics studied, their manipulation patterns identified, and their basic operations reflected in the parallelization algorithms.

2.3 About the container-centric approach

In a container-centric approach, parallelizing compilers are designed with full knowledge of containers. To meet this goal, we extend the compiler with new data types, abstract containers, which are specified by their operations, the abstract container operations. Abstract containers are "meaningful" entities to the compiler and are treated with no less importance and attention than any other intrinsic data type. Similar to the way we handle arrays, common manipulation patterns of abstract containers are studied, target loop patterns are identified, and fundamental parallelization techniques such as dependence testing, false data dependence elimination and loop parallelization are re-designed to be container-specific. For example, in

our work on linear containers and content-addressable containers, we have provided a dependence test for linear containers, commutativity analysis for content-addressable containers, container privatization, etc. Figure 2 illustrates the compilation steps with a container-aware compiler. Concrete containers are first "described" using abstract container operations, and then fed into the compiler, together with the rest of the source program, as the inputs. The compiler then takes the abstract-container-based program, performs the container-specific analysis and transformation techniques, identifies parallelizable sections and finally generates the parallelized program.


Figure 2: Program parallelization using a container-aware compiler

We realize that the feasibility of such an approach lies largely in the generality of abstract containers, such that compilers based on them can benefit a reasonably large set of applications. Upon such consideration, in the definition of abstract containers and their operations, only the very core behaviors of such containers should be specified. Containers with more complicated semantics can be described through composition. Moreover, in applying the container-centric approach, we first choose those containers that are the most standard and most commonly used, such as linear containers and content-addressable containers. Such containers, whether library-supplied or user-defined, preserve relatively simple data-flow semantics, exhibit surprisingly similar behaviors, and most of their basic manipulations are in the form of methods, which is quite suitable for description. We also note that, in general, human effort is necessary to describe concrete containers in terms of abstract ones. However, the process is fairly intuitive and localized. The description of a container can be fully decoupled from the source programs using it and is highly reusable. For example, by providing descriptions for STL containers or JFC containers once, all applications that use these standard container libraries, which fortunately is the trend, can benefit from that. We now illustrate the container-centric approach using a simple example: StringProcessor(), shown in Figure 3(i). StringProcessor() takes a set of strings and reverses them. Due to limited space, only the implementation of StringProcessor() is shown in Figure 3(i). In StringProcessor(), there are the concrete container Vector and the concrete iterator Enumeration. Figure 4 shows the concrete container descriptions of both Vector and Enumeration. Methods with the prefix "abs_" are abstract container operations.
Although such descriptions look very similar to method declarations, they aim at providing a data-flow description of the methods and are equivalent to the latter only in terms of data-flow. Figure 3(ii) shows the compiler's view of StringProcessor() after the methods of Vector and Enumeration have been replaced by their abstract counterparts. During parallelization, the compiler recognizes and chooses the for-loop composed of e.abs_get_begin(), e.abs_has_more(), and e.abs_next() as the candidate loop for further analysis and transformations.

    void StringProcessor(Vector strings) {
        ...
        for(Enumeration e=strings.elements(); e.hasMoreElements();) {
            String s = (String)e.nextElement();
            String rs = reverse(s);
        }
    }
    (i) original code

    void StringProcessor(Vector strings) {
        ...
        for(e=strings.abs_get_begin(); e.abs_has_more(); e = e.abs_next()) {
            String s = (String)e.abs_access();
            String rs = reverse(s);
        }
    }
    (ii) from the compiler's view
Figure 3: A simple example, StringProcessor()

    Enumeration Vector::elements() {
        return abs_get_begin();
    }

    Boolean Enumeration::hasMoreElements() {
        return abs_has_more();
    }

    Object Enumeration::nextElement() {
        Object o = abs_access();
        this = this.abs_next();
        return o;
    }
Figure 4: Concrete container descriptions for Enumeration and Vector

The compiler then applies the data dependence test for linear containers. In the example, assuming that the elements of strings are not duplicated and that this is provable by the compiler, then, given the special semantics of e.abs_next() and e.abs_access(), the DD test will prove that there are no overlapping accesses to the container during the iterations. The loop is then marked as parallelizable, leaving the dependences introduced by manipulations of the concrete iterator e to be handled later. During loop parallelization, we handle these dependences by distributing container elements into local containers beforehand, then letting each thread take its share of data and execute the rest of the code concurrently.
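That distribution step can be sketched in Java; the class and method names below are ours, and the round-robin scheme is one of several possible distribution policies:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

class Distributor {
    // Distribute the elements of a Vector into per-thread local lists
    // (round-robin), so each thread can iterate over its own container
    // without sharing an iterator with the others.
    static List<List<Object>> distribute(Vector<Object> src, int nThreads) {
        List<List<Object>> locals = new ArrayList<>();
        for (int t = 0; t < nThreads; t++) locals.add(new ArrayList<>());
        for (int i = 0; i < src.size(); i++) {
            // Only references are copied, consistent with the paper's
            // pass-by-reference element semantics.
            locals.get(i % nThreads).add(src.get(i));
        }
        return locals;
    }
}
```

After distribution, each thread runs the loop body over its local list, so the iterator dependences disappear at the cost of one pass over the container.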

2.4 The benchmark suite

We begin our work by studying a few real-world symbolic applications. Since currently few benchmarks are chosen according to containers, we gathered the benchmarks by searching for suitable applications on the Internet. As a consequence, the benchmark suite is composed of mostly real-world applications, some of which are fairly large or even in commercial use. For example, javac, jar, javadoc, and javap are standard utility applications from the JavaSoft JDK 1.1.5 package; guavac is one of the most popular third-party Java compilers; and polaris is a parallelizing compiler for conventional Fortran programs. We chose applications that prevalently use linear containers and content-addressable containers. The implementations of the containers are diverse. Some are from standard container libraries; others are self-provided. The applications are coded in one of three programming languages: C++, C, or Java. The benchmark

suite is summarized in Table 1. The column container indicates the source of the containers in the application; self-provided containers are indicated by "-" as opposed to standard container libraries.

        bench       description                             from     lines    lang  container
    1   javac       a java compiler                         JDK      29,000   java  JFC
    2   jar         a java zip utility program              JDK      1,886+   java  JFC
    3   javadoc     html-document generator for java code   JDK      31,000   java  JFC
    4   javap       java bytecode disassembler              JDK      4,600+   java  JFC
    5   calculator  calculator                                       135      java  JFC
    6   guavac      third-party java compiler               GNU      33,000   c++   STL
    7   feedback    search engine with relevance feedback   MARS     1,480    c++   STL
    8   polaris     Fortran parallelizing compiler          Polaris           c++   -
    9   lamtex      latex compiler                                   4,283    c++   -
    10  kaduch      finite-state machine                             1,155    c++   -
    11  deltablue   incremental dataflow constraint solver  UCSB     1,509    c     -

Table 1: The Benchmark Suite

3 Container Specification

3.1 Abstract containers

Abstract containers are used by the compiler to represent concrete containers internally. There are several properties that we assume for abstract containers of any type. (1) The structure of the container is fully encapsulated in method interfaces. The structure of the container is defined as any internal state of the container other than the elements inside it. Since the structure of the container is mostly implementation-dependent, under this assumption the generality of abstract containers will not be compromised by the diversity of concrete container implementations. (2) Abstract container operations change only the structure of the container. This property reflects the nature of containers as storage media, and keeps the semantics of abstract containers simple. (3) Except for primitive data types, elements going into or out of the container are passed by reference. Parallelizing compilers require exact information about the data-flow of the container. For example, given the following code, the alias between a and b depends on the data-flow semantics of abs_push_back() and abs_pop_back(); namely, it depends on the exact alias relation between the element to be put into the container and the element that comes out of the container after the operation.

    abs_push_back(a);
    b = abs_pop_back();


Concrete containers vary in the definition of such an alias relation. Some use pass-by-value semantics, while others use pass-by-reference. Still others leave the choice to the programmer; for example, STL uses the operator "=" to copy elements into the container and allows the user to overload the "=" operator to define the exact copy semantics. We choose pass-by-reference semantics since it most closely reflects the nature of the store operation of containers and can easily describe the other semantics as well.
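In Java, where objects are always handled by reference, the alias relation in the push/pop example above is directly observable; a small sketch (the Deque here merely stands in for a linear container, and the class name is ours):

```java
import java.util.ArrayDeque;
import java.util.Deque;

class AliasDemo {
    // Push an object and pop it back: under pass-by-reference semantics
    // the two variables name the same object, so a == b holds as an
    // identity comparison, not merely a.equals(b).
    static boolean pushPopAliases(Object a) {
        Deque<Object> stack = new ArrayDeque<>();
        stack.addLast(a);              // plays the role of abs_push_back(a)
        Object b = stack.removeLast(); // plays the role of b = abs_pop_back()
        return a == b;                 // identity holds under pass-by-reference
    }
}
```

Under pass-by-value semantics the popped element would be a distinct copy, and the compiler's alias analysis would have to treat a and b as unrelated.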

3.2 Abstract container operations

Abstract container operations specify the syntax and semantics of abstract containers; the specification is shown in Figure 5. Each operation is composed of a name, a return value, and a set of parameters, which is similar to a method declaration. Both the return value and the parameters are declared by abstract types, which enforce the semantic and type constraints of the operation. There are four abstract types that are general for all kinds of abstract containers. ABS_void, ABS_bool, and ABS_int stand for void-type, boolean-type and integer-type respectively. ABS_element represents an element of the abstract container and, due to the third assumption we make about abstract containers, ABS_element has to be of reference type.

    (i) abstract linear container operations:
        ABS_void     abs_push_back(ABS_element)
        ABS_void     abs_push_front(ABS_element)
        ABS_element  abs_pop_back(ABS_void)
        ABS_element  abs_pop_front(ABS_void)
        ABS_element  abs_get_back(ABS_void)
        ABS_element  abs_get_front(ABS_void)
        ABS_iterator abs_begin(ABS_void)
        ABS_iterator abs_end(ABS_void)
        ABS_int      abs_size(ABS_void)
        ABS_bool     abs_empty(ABS_void)

    (ii) abstract iterator operations:
        ABS_element  abs_access(ABS_void)
        ABS_iterator abs_next(ABS_void)
        ABS_iterator abs_prev(ABS_void)
        ABS_bool     abs_hasMore(ABS_void)

    (iii) abstract content-addressable container operations:
        ABS_bool     abs_put(ABS_key, ABS_element)
        ABS_element  abs_get(ABS_key)
        ABS_bool     abs_remove(ABS_key)
        ABS_bool     abs_contains(ABS_key)

Figure 5: Abstract container operations

3.2.1 Abstract linear container operations

An abstract linear container can be viewed as a linear discrete space with two special positions: begin and end. We define front and back to be the elements at positions begin and end, respectively. Positions of an abstract linear container are represented by abstract iterators. There are two special values of abstract iterators, abs_begin_iterator and abs_end_iterator, which point to the begin and end of the container, respectively. For abstract linear containers, the first six abstract operations declare the update and access behavior of the container at the begin and end positions. abs_begin() returns the abs_begin_iterator of

the current container, while abs_end() returns the abs_end_iterator. The meanings of abs_empty() and abs_size() are self-evident from their names.

3.2.2 Abstract iterator operations

Abstract iterator operations provide abstractions of concrete iterators. Each abstract iterator is associated with an abstract linear container. abs_access() gets the container element at the position pointed to by the iterator. abs_hasMore() checks whether the iterator has traversed to the end of the container, while abs_next() (abs_prev()) evolves the current iterator to point to the next (previous) position of the container.
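A minimal Java rendering of these iterator operations over a singly linked list may make the semantics concrete; the node and class names below are ours, not part of the abstract specification:

```java
// A node of a singly linked list (names are illustrative).
class Node {
    Object value;
    Node next;
    Node(Object value, Node next) { this.value = value; this.next = next; }
}

// A sketch of the abstract iterator operations from Figure 5(ii).
class AbsIterator {
    private Node pos;
    AbsIterator(Node head) { pos = head; }
    Object absAccess() { return pos.value; }       // element at the current position
    boolean absHasMore() { return pos != null; }   // has the iterator reached the end?
    AbsIterator absNext() { pos = pos.next; return this; } // evolve to the next position
}
```

The iterator-based loops discussed later are composed of exactly these three operations: a begin, a has-more test, and a next step.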

3.2.3 Abstract content-addressable container operations

Abstract content-addressable container operations specify the basic update and access behaviors of a content-addressable container. abs_put(), abs_remove() and abs_get() update and access a container through keys; abs_contains() tests whether an element with a certain key is in the container. We assume that the keys of a content-addressable container are unique. Keys are represented by the abstract type ABS_key, which can be of three kinds, ABS_index_key, ABS_name_key and ABS_pointer_key, according to the three types of concrete keys we defined in the previous section.
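As one plausible pairing (the mapping below is our own reading, not taken from the paper), the methods of java.util.Hashtable line up naturally with these abstract operations:

```java
import java.util.Hashtable;

class HashtableDescription {
    // A sketch of how java.util.Hashtable methods could be described by
    // the abstract content-addressable operations (pairing is ours):
    //   put(k, v)      -> abs_put(ABS_key, ABS_element)
    //   get(k)         -> abs_get(ABS_key)
    //   remove(k)      -> abs_remove(ABS_key)
    //   containsKey(k) -> abs_contains(ABS_key)
    static boolean demo() {
        Hashtable<String, Object> t = new Hashtable<>();
        t.put("name", "value");             // abs_put: keys are unique
        boolean present = t.containsKey("name"); // abs_contains
        t.remove("name");                   // abs_remove
        return present && !t.containsKey("name");
    }
}
```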

3.3 Concrete container description

Abstract container operations provide the compiler with only the core container operations. Concrete containers, which usually have much more complicated semantics, are described in terms of abstract container operations. In this sense, abstract container operations form a description language, each statement of which specifies a data-flow relation between the associated concrete container, the returned instance and the actual parameters bound. Figure 6 shows the concrete container description of append() of class List, which appends the elements (by reference) of list2 to the receiver. A concrete container description is very similar to "re-implementing" the method append() with a set of abstract container operations. For example, in a concrete container description we can use any concrete variable visible in the scope, declare instances either as concrete or as abstract, or even invoke abstract operations on concrete instances. In the example, the new abstract instance iterator is declared and the abstract operation abs_get_begin() is invoked on the concrete instance list2.

    void List::append(List* list2) {
        for(ABS_iterator iterator=list2.abs_get_begin();
            iterator.abs_hasMore(); iterator=iterator.abs_next()) {
            abs_push_back(iterator.abs_access());
        }
    }

Figure 6: list.append() described by abstract container operations

There are two important techniques involved in concrete container descriptions: abstract operation instantiation and reverse specification.

3.4 Abstract operation instantiation

An abstract container operation specifies a data-flow relation by "invoking" the abstract operation on a receiver object, "binding" actual parameters to formal ones, and "assigning" the return value to another instance. This process is called abstract operation instantiation. In an instantiation, the receiver object, actual parameters, and return value can be concrete if declared by concrete data types, or abstract if declared by abstract types. In the above example, the abstract operation abs_get_begin() is invoked on the concrete container list2, and its return value is assigned to an abstract instance iterator. Abstract operation instantiations are similar to method invocations in that both need to satisfy the type constraints specified in the interface. For example, in the abstract operation instantiation abs_push_back(iterator.abs_access()) in the above example, both the formal and the actual parameter are of abstract type ABS_element. However, in some cases, concrete data types can be bound to abstract ones. For example, the types int, void, and boolean of the target language can be bound to the abstract types ABS_int, ABS_void, and ABS_bool, respectively.

3.5 Reverse specification

Reverse specification specifies the code patterns that match the semantics of the abstract container operations. Such a specification is necessary if some basic container manipulations are not encapsulated in method interfaces. For example, it is very common that concrete iterators do not provide a concrete counterpart for the abstract operation abs_hasMore(). Most of them use boolean expressions instead, such as "iterator != null", to test the abs_hasMore() condition. If abs_hasMore() is left unspecified, the iterative loops of linear containers will not be recognized, which significantly affects our exploitation of loop-level parallelism on this type of container; so abs_hasMore() has to be specified. In this case, instead of using the abstract operations to describe the concrete ones, we use the code pattern to "describe" the abstract method abs_hasMore(), as shown in Figure 7. As with abstract operation instantiation, a reverse specification is valid only if it satisfies the type constraints declared by the interface.

    ABS_bool ABS_iterator::abs_hasMore() {
        return this != null;
    }

Figure 7: Reverse specification for abs_hasMore()

4 Container-based Transformation Techniques

In the following two sections, we propose several compiler transformation and analysis techniques specifically for linear containers and content-addressable containers. Unless stated otherwise, we use

abstract container operations and abstract types directly in the examples, but only for the purpose of illustrating exact semantics and demonstrating the patterns we are targeting.

4.1 Data dependences and loop-level parallelism

Parallelizing compilers are enabled primarily by efficient data-dependence analysis to discover independent pieces of computation, as well as by transformation techniques to eliminate false data dependences. Containers of different types exhibit different manipulation patterns, which may greatly affect the design of parallelization techniques. For example, in Scheme, enumeration of lists, which are built-in data types, is done through recursive functions rather than through loops. Such manipulations lead to new dependences that need very specific parallelization techniques. In the work on parallelizing Scheme programs by Harrison [9][10], the recursive manipulation pattern is identified and dependences are eliminated through a technique called recursive splitting, which transforms recursive functions into loops. So before we go into the details of any of the techniques, let us first define several fundamental dependence-related concepts based on our container types.

4.1.1 Target loops for exploiting parallelism

Linear containers are amenable to exploiting loop-level parallelism. Figure 8 shows two common iterative loops over linear containers: the iterator-based loop, which iterates over the linear container through an iterator, and the pop-based loop, which iterates over the container through the operation abs_pop_back(). Both are our target loop patterns for exploiting loop-level parallelism.

    for(ABS_iterator i=list.abs_begin();
        i.abs_hasMore(); i=i.abs_next()) {
        Object o = i.abs_access();
        ...
    }
    (i) iterator-based iterative loop

    for(; !list.abs_empty(); list.abs_pop_back()) {
        ...
        Object o = list.abs_get_back();
        ...
    }
    (ii) pop-based iterative loop

Figure 8: Common loop patterns of container-based applications

4.1.2 Container structural dependences

Manipulating containers introduces new dependences. For example, in Figure 8(i), dependences are introduced by the operations i.abs_next(), i.abs_hasMore(), and i.abs_access(). We define container structural dependences as dependences due to changes in the structure of the container. There are two basic container structural dependences: (1) of any two computations, if at least one of them adds or removes elements from the container, there is an update structural dependence; (2) if two computations access different elements of the container through the same iterator, there is an access structural dependence. In Figure 8, the first loop has an access structural dependence and the second loop has an update structural dependence.

Most container manipulations cause structural dependences, such as the sequential accessing of a list or an I/O stream and the recursive accessing of a tree. This explains, to some extent, why parallelization of container-based applications is much harder than that of array-based ones: accessing arrays does not lead to structural dependences, and the conventional dependence test, which is based mostly on memory dependences, is too restrictive to tolerate the structural dependences common in other containers. In most cases, detection of structural dependences is trivial; the difficulties, however, lie in the program transformations needed to eliminate such dependences. Since structural dependences are inherent in container manipulations, failure to handle them eliminates any further possibility of exploiting parallelism on the container. The transformation techniques proposed in this section therefore focus on the elimination of structural dependences.

4.1.3 Parallelizable loops

A dependence is inherent to a loop if it is introduced by operations that control the iteration space of the loop. Both loops in Figure 8, for instance, have inherent dependences. Inherent dependences have to be handled properly to enable the exploitation of parallelism on the target loops. We consider a loop parallelizable if there are no loop-carried dependences other than inherent dependences and the inherent dependences can be handled later during loop parallelization.

4.2 Loop parallelization for inherent dependences

The loop parallelization techniques we discuss here are pattern-aware, aiming at handling the dependences inherent to the two loops in Figure 8. For simplicity of presentation, we assume in the examples that there are only two threads for parallel execution and that elements can be distributed evenly. The most general method of loop parallelization is to distribute container elements into several local containers and then have each parallel thread work on its own data. Figure 9(i) shows the container distribution for the loop in Figure 8(ii), in which the elements of stack are distributed into two stacks, s1 and s2.

    Stack s1, s2;
    for(; !stack.empty();) {
        s1.push(stack.pop());
        if(stack.empty()) break;
        s2.push(stack.pop());
    }
    (i) through container distribution

    Enumeration e=v.elements();
    if(threadId==1) e.nextElement();
    for(; e.hasMoreElements(); e.nextElement()) {
        ...
        Object o=e.nextElement();
        ...
    }
    (ii) with a concrete iterator

Figure 9: Loop parallelization

Container distribution introduces the overhead of data copying even if only references are copied. Loop parallelization can be made more efficient, depending on the level of support from the specific concrete container. For example, if the concrete container in the loop of Figure 8(i) allows concurrent access through multiple iterators, the loop can be parallelized through concrete iterators. Figure 9(ii)

shows the code that each parallel thread executes; for two threads, threadId ranges from 0 to 1. Moreover, some containers provide random access interfaces. For instance, the Java class Vector has the method elementAt(), which allows elements to be accessed through indexes. Iterative loops over such containers can be parallelized in a way similar to that for arrays. Concrete containers can also be implemented in a way that supports efficient loop parallelization. For example, implementing linear containers with arrays can provide very flexible and efficient accessing; or, for linked lists, access can be sped up by providing auxiliary access pointers that directly reach elements a long distance apart [9].
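The random-access case can be sketched as follows, with each thread taking a contiguous block of indexes exactly as one would for an array (the class name and the blocked partition are our own illustrative choices):

```java
import java.util.Vector;

class IndexPartition {
    // Parallelize an iterative loop over a random-access container by
    // giving each thread a contiguous block of indexes; each thread
    // accumulates a partial sum over its block, with no shared iterator.
    static int[] process(Vector<Integer> v, int nThreads) {
        int[] partial = new int[nThreads];
        Thread[] workers = new Thread[nThreads];
        int chunk = (v.size() + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                int lo = id * chunk, hi = Math.min(lo + chunk, v.size());
                for (int i = lo; i < hi; i++)
                    partial[id] += v.elementAt(i); // random access by index
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return partial;
    }
}
```

Each thread writes only its own slot of partial, so no synchronization is needed inside the loop.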

4.3 Container privatization

Container privatization plays the same role in the parallelization of container-based applications as array privatization does for array-based applications. Both efficiently eliminate dependences not due to true data-flow. However, containers exhibit patterns different from those of arrays in terms of privatization. For arrays and scalars, a variable is privatizable if every use of the variable is dominated by a definition of the same variable in the same loop iteration [1]. By contrast, manipulations of temporary linear containers follow the define-use-reset cycle, in which the state of the container is reset to the original one after the use. Define-use-reset is common not only in manipulations of linear containers but also in those of common objects. This is because most objects are aggregate, and re-definition depends on the previous state of the object. Reset is a clean way of killing all the previous definitions of the object. We define privatizable containers as follows: a container is privatizable if, at the end of any iteration, the state of the container is reset to the state it had before entering the iteration. In real applications, we see two common scenarios, shown in Figure 10, that can lead to a privatizable container.

    list.abs_push_back(element);
    ...
    list.abs_pop_back();
    (i) "paired" operations

    list.abs_push_back(element1);
    list.abs_push_back(element2);
    ...
    while(!list.abs_empty()) {
        list.abs_pop_back();
    }
    (ii) clean-up condition

Figure 10: Two privatizable scenarios

In Figure 10(i), every abs_push_back() is "paired" with an abs_pop_back(), with the latter recovering the effect of the former. The concept of "paired" is recursively defined: for instance, the above two operations are "paired" if all the other occurrences of abs_push_back() and abs_pop_back() between them are paired as well. In this example, define-use-reset is not just feasible, it is necessary. This is because, here, only the computations that "define" (abs_push_back()) the list know how to "reset" (abs_pop_back()) it. In Figure 10(ii), the container is reset to empty after the use. One thing worth mentioning is that the reset is done through a loop whose termination is controlled by a test operation. abs_empty() implies the state of the container after the loop, which can be illustrated more clearly by presenting the reset-loop in a program control-flow graph, as shown below.

Figure 11: Control-flow graph of the clean-up-condition and clean-up-branch (the test abs_empty() has a yes-branch, the clean-up-branch, and a no-branch that continues to do the work)

We define abs_empty() as the clean-up-condition and the yes-branch of the clean-up-condition as the clean-up-branch. It is clear that the control flow that follows the clean-up-branch will no longer see previous definitions of the container. The privatization algorithm can take advantage of such test operations. abs_empty() is not the only clean-up-condition we have seen in programs; for example, the boolean expression abs_size() == 0 is a clean-up-condition as well.
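A small Java sketch of the define-use-reset cycle from Figure 10(ii) may help; the computation (counting digits via a temporary stack) is our own illustrative choice:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PrivatizationDemo {
    // The temporary stack "digits" follows the define-use-reset cycle in
    // every iteration: it is empty again before the next iteration begins,
    // so it is privatizable and each thread could use its own instance.
    static int sumOfDigitCounts(int[] values) {
        Deque<Integer> digits = new ArrayDeque<>(); // privatizable temporary
        int total = 0;
        for (int v : values) {
            // define: push the digits of v
            for (int x = v; x > 0; x /= 10) digits.push(x % 10);
            // use: count them
            total += digits.size();
            // reset: clean-up loop, guarded by the clean-up-condition
            while (!digits.isEmpty()) digits.pop();
        }
        return total;
    }
}
```

Because no stack state flows across iterations, a parallel version could give each thread a private stack without changing the result.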

4.4 Exploiting associativity

We exploit associativity of operations such as abs_push_back() and abs_push_front() to eliminate update structural dependences. For example, in the loop shown in Figure 12(i), since the container is updated only by the operation abs_push_back(), the structural dependences can be eliminated by letting each thread update its local container and joining the local containers together afterwards. This is very similar to our handling of simple recurrences of scalars. Figure 12(ii) shows a more general case in which we can exploit associativity; in that example the container is both accessed and updated in the loop.
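The push-back-only case can be sketched as follows: each thread appends to a local list, and the locals are joined in thread order, reproducing the sequential result (the blocked index split and class name are our own assumptions):

```java
import java.util.ArrayList;
import java.util.List;

class AssocDemo {
    // Eliminate the update structural dependence of a push_back-only loop:
    // each thread pushes into its own local list, and the locals are
    // concatenated in thread order afterwards, matching the sequential order.
    static List<Integer> squares(int n, int nThreads) {
        List<List<Integer>> locals = new ArrayList<>();
        Thread[] workers = new Thread[nThreads];
        int chunk = (n + nThreads - 1) / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final List<Integer> local = new ArrayList<>();
            locals.add(local);
            final int lo = t * chunk, hi = Math.min(lo + chunk, n);
            workers[t] = new Thread(() -> {
                for (int i = lo; i < hi; i++) local.add(i * i); // abs_push_back
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try { w.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        List<Integer> result = new ArrayList<>(); // join in thread order
        for (List<Integer> l : locals) result.addAll(l);
        return result;
    }
}
```

The join step is the container analogue of combining the partial sums of a scalar recurrence.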
