
Balanced, Locality-Based Parallel Irregular Reductions*

1 Introduction

DPO class. In this paper we propose two techniques to introduce load balancing into a DPO method. The first technique is generic, as it can deal with any kind of ...
               


   "   # $ $   Æ  %   $  & " %  '   #   $ $       # ($  )  * +,- .       $/ $ ,- .       $/& 0  %$ #         * $     $  $ ) 1 $ #& + $ #        Æ %$       $    $      ,- & 2   )    ) '    $   $ #   ,-  $& 3 ( '       $ )  1$  $ #     # $ & 3  $ '  $     $ #   )      # )        $   & Æ      $        $ #   4 ,-  $  $& 4   $ $ 1  $ )  $ $ 1    )   $    $&   

                        !     "     #    $               %     &#  % '                  $    (        "   )*+,    '  $     )*-, &#  % (  '           .   $    /0'            Æ $           1   $     $   %  " $   "             )* **, 3 ) 1 )   $ #  $   $   . 2 53/ !    32 678

&       $      $ % $      $       2  "   $        '   $  $     ''   $       ! $    % '  '   $          "       $   Æ        !    0 $           "   %  0  3      4 5  " %       46 5  %     ( $            0  6  "   ( $           4                  0      0 (5 7 "            $   $  $  3   4 "   "5   '     0 (  2   '    8    '   $    $       9 0  0       %        $   '"       45 2               $   $          Æ                     6  0    '  '     !    0   0  %        6   2     %     ( $             0  ( $           2    %     $     0           $ 0       $      2     $          '  $   ( (     !    0  Æ  $         $    6     0      .  (      0       % 2     $              '    $  $  6   (  $    0    $              '       2   $     0     $        $       :              $   $ '  $ "  $  ; 0                '   0  Æ      6   7      '      



  







 

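[Figure: an irregular reduction loop. Loop iterations 1, 2, 3, ..., fDim each perform n write operations (fk(i), k = 1, 2, ..., n) on the reduction array A, whose elements are indexed 1, 2, 3, ..., ADim.]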



    )   $   $           
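The figure labels (fDim loop iterations, n write operations per iteration through fk(i), and a reduction array A of size ADim) describe the general shape of an irregular reduction loop. A minimal sequential sketch of that shape follows. The function name `irregular_reduction`, the `contrib` callback, and the sample subscript arrays are hypothetical and chosen only for illustration; indices are 0-based here, whereas the figure counts from 1.

```python
def irregular_reduction(a_dim, subscripts, contrib):
    """Sequential reference: A[f_k(i)] += contrib(i, k) for every iteration i.

    a_dim      -- size of the reduction array (ADim in the figure)
    subscripts -- list of n subscript arrays f_1..f_n, each of length fDim
    contrib    -- hypothetical callback giving the value added by write k of iteration i
    """
    f_dim = len(subscripts[0])
    A = [0.0] * a_dim
    for i in range(f_dim):                   # loop iterations
        for k, f in enumerate(subscripts):   # n writes per iteration
            A[f[i]] += contrib(i, k)         # target known only at run time
    return A


# Illustrative data: fDim = 4 iterations, n = 2 writes each, ADim = 4.
f1 = [0, 2, 2, 1]
f2 = [3, 3, 0, 1]
result = irregular_reduction(4, [f1, f2], lambda i, k: float(i + 1))
```

Because the write targets come from the subscript arrays, two iterations may update the same element of A (iterations 0 and 2 both touch `A[0]` above), which is what makes a naive parallel split of the loop unsafe.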

         

6.              "     '          9  $    0    3       % 4 5        % 46 5 2               $      $    $      (  .      < 6  %            4 (5   $              $          0         ( 4           0   (5 2 $    $   '  0      $         $   0     0  7 * 4   $      ' $  5         4    5 0               6    "'    $       "             # 2(          7 =  0       $    %      0    6  2               0         $    >                2       '    "   '   %  '  0 Æ 2           4 ' 5   " '                       % 2          '  $   $              

[Figure: two panels, labeled LPO and DPO. In both panels, loop iterations 1, 2, 3, ..., fDim perform n write operations per iteration (fk(i), k = 1, 2, ..., n) on the reduction array A (elements 1, 2, 3, ..., ADim), and work is shared among thread 1, thread 2, ..., thread p. Annotations in the panels: "reordering of iterations (inspector)", "synchronized writes?", "reduction array privatization?", "computation replication?".]
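The two panel labels can be contrasted with a small sketch, under stated assumptions. It assumes that LPO splits the loop iterations among threads and that DPO splits the reduction array among threads, and it picks one option from the figure's annotations for each: array privatization for LPO, and an inspector plus computation replication for DPO. Threads are simulated sequentially, all function names are hypothetical, and the load-balancing techniques proposed in the paper are not shown.

```python
def lpo_privatized(a_dim, subscripts, contrib, p):
    """LPO sketch: block-partition the iterations; each thread owns a private copy of A."""
    f_dim = len(subscripts[0])
    private = [[0.0] * a_dim for _ in range(p)]
    for t in range(p):                                    # one pass per simulated thread
        lo, hi = t * f_dim // p, (t + 1) * f_dim // p
        for i in range(lo, hi):
            for k, f in enumerate(subscripts):
                private[t][f[i]] += contrib(i, k)         # no conflicts: private copy
    return [sum(col) for col in zip(*private)]            # final combine step


def dpo_inspector(a_dim, subscripts, p):
    """Inspector sketch: list, per thread, the iterations that write into its block of A."""
    def owner(j):
        return j * p // a_dim                             # block ownership of A[j]
    work = [[] for _ in range(p)]
    for i in range(len(subscripts[0])):
        for t in sorted({owner(f[i]) for f in subscripts}):
            work[t].append(i)                             # replicated if blocks differ
    return work, owner


def dpo_execute(a_dim, subscripts, contrib, p):
    """DPO sketch: each thread replays its iteration list, writing only elements it owns."""
    work, owner = dpo_inspector(a_dim, subscripts, p)
    A = [0.0] * a_dim
    for t in range(p):
        for i in work[t]:
            for k, f in enumerate(subscripts):
                if owner(f[i]) == t:                      # owner-only writes, no locking
                    A[f[i]] += contrib(i, k)
    return A


# Illustrative data: fDim = 4 iterations, n = 2 writes each, ADim = 4, p = 2.
f1 = [0, 2, 2, 1]
f2 = [3, 3, 0, 1]
value = lambda i, k: float(i + 1)
lpo_result = lpo_privatized(4, [f1, f2], value, 2)
dpo_result = dpo_execute(4, [f1, f2], value, 2)
work_lists, _ = dpo_inspector(4, [f1, f2], 2)
```

In this sketch the privatized variant pays for p copies of A plus a combine step, while the array-partitioned variant needs no extra copies but may give threads iteration lists of unequal length, which is where a load-balancing concern would arise.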

  9     +,- $ ,-  $   %  '     '                            4            5 2      4   5           $     ;0          0  $      2    0          0          0   $       ;     0    $        $  "       "       $    '  $      /0'   '    $         '  .   0          4    '            $    5      6  '    '  $                    !      0           4     0     (5        0   .       $       4       $      5 2           0  8       4 $ 5  20                6     0      )? @ *A,               0    $      4(  5 2                0            $              $   0 ( ; 0'       $                 ( $      

        

    

 

  

 # # :$ : )     #  ) :$ : ) #   ,     +,- $ ,-    

$    $& 3 #            4 $ #   $ #   $$    $       0   2        $   4  5