A Parallel Watershed Algorithm
Internal Report 6/96
Technische Universität Hamburg-Harburg, Technische Informatik I
September 1996

Andreas Bieniek, Hans Burkhardt, Heiko Marschner, Michael Nölle and Gerald Schreiber
Technische Universität Hamburg-Harburg, Technische Informatik I
Harburger Schloßstraße 20, D-21071 Hamburg, Germany

Submitted to: 10th Scandinavian Conference on Image Analysis (SCIA97), Lappeenranta, Finland, June 2-11, 1997

A Parallel Watershed Algorithm

Andreas Bieniek, Hans Burkhardt, Heiko Marschner, Michael Nölle and Gerald Schreiber
Technical University at Hamburg-Harburg, TI-I
Harburger Schloßstr. 20, D-21079 Hamburg, Germany

Abstract

The watershed transformation is a popular image segmentation algorithm for grey-scale images. Sequential watershed algorithms perform a highly data-dependent flooding process over the global image. Because of global data dependencies across the sub-domains, parallel algorithms which distribute the image over the available processors and simulate the flooding process achieve only a limited speedup, and the achievable speedup is highly data dependent. In this paper we show that it is possible to achieve a data-independent speedup for images without plateaus. This is done by inserting temporary labels where the solution depends on neighboring results. We show that the local solutions can be merged into a global solution in a data-independent way.

1 Introduction

The computation of watershed lines is an important method for the segmentation of grey-scale images. Every pixel in the resulting image gets the label of the catchment basin it belongs to. A catchment basin corresponds to exactly one local minimum of the image. The construction of the catchment basins is illustrated in Figure 1. The grey-scale image is interpreted as a topographical relief that is flooded by water, starting at the lowest altitude. The flooding proceeds step by step, increasing the altitude globally. After flooding all elements at altitude $h$ the process proceeds with altitude $h + 1$.

Figure 1: Flooding process in the one-dimensional cut. (Labels in the figure: sources Q1, Q2, Q3; segments S1, S2, S3; dam; altitude levels h and h+1.)

The sources for the flooding process are placed on the local minima. Dams, or watershed lines, are built wherever water from different sources meets. Catchment basins can also be explained with water falling on the image surface. If a drop of water falls on a pixel, it runs down the path of steepest descent into a local minimum. The pixel then belongs to the catchment basin of that minimum and gets its label. The path of steepest descent is the path between two pixels that has the shortest topographical distance between them. On a plateau the path of steepest descent cannot be decided using the topographical distance. In this case we have to use the geodesic distance to assign the pixel to the catchment basin with the shortest distance from the border of the plateau. The geodesic distance between two pixels of the same plateau is the length of the minimal path connecting the pixels within the plateau (Definition 9).

In this paper we present a parallel watershed algorithm which achieves a data-independent speedup for images without plateaus. In Section 2 we introduce two common sequential algorithms. In Sections 3 and 4 the parallel watershed algorithm is discussed in detail. The runtime behavior is examined in Section 5. In Section 6 some conclusions are drawn.

2 Sequential Algorithms

In this section we introduce two sequential watershed algorithms: the algorithm of Meyer [6] and that of Vincent and Soille [12]. Some watershed algorithms, like Meyer's algorithm [6], do not compute the watershed lines themselves. The segmentation is finished when every pixel in the result has the label of the catchment basin it belongs to. The watershed lines can be computed easily in a post-processing step using an edge detector that works on the segmented image. In this section Meyer's formalism, which is based on a cost function for walking on the image surface [6], is compared with our approach based on a local condition, and we show the relation between these two approaches.

Algorithm of Vincent and Soille

For details of this algorithm refer to [12]. The image is flooded step by step, treating all pixels of the same height in each step. This means that all pixels at altitude $h + 1$ are processed after all pixels at altitude $h$ have been treated. This divides the problem into $h_{max}$ subproblems, solving the problem for all pixels with grey value $\le h$ at level $h$, $h = 0, \ldots, h_{max}$. Because all pixels processed in one step have the same altitude, the problem is reduced to calculating the geodesic skeleton by influence zones (SKIZ) on each level. The SKIZ is based on the geodesic distance, which is defined for connected plateaus. In the first step of the algorithm the pixels are sorted to guarantee quick access to all pixels of a given grey value. In the second step the SKIZ for each altitude is calculated by flooding the plateaus of the current height. For efficient flooding a queue mechanism is used.

Meyer algorithm

The Meyer algorithm is the basis for our parallel implementation, but it is possible to use any sequential algorithm which satisfies the local condition developed later in this section. In contrast to the algorithm of Vincent and Soille, all neighbor pixels are processed independently of their height. Hierarchical queues are used to select the next pixel to operate on. The queue is initialized with the border pixels of all local minima. The algorithm works on the hierarchical queue: all pixels $p$ in the queue with the lowest altitude are investigated. A neighbor pixel of $p$ which has not yet been labeled is inserted into the queue of its grey value and gets the label of $p$. The algorithm simulates the global flooding of the image.

A formal definition of a catchment basin is given by Meyer in the following way [6]: Let $f(p)$ be a function of grey values representing a digital image with the domain $\Omega \subset \mathbb{Z}^2$. Each pixel $p \in \Omega$ has a grey value $f(p)$ and a set of neighboring pixels $p' \in N(p)$ with a distance function $dist(p, p')$ to each neighbor. In most cases a 4- or 8-neighborhood is used with a constant distance of 1 to all 4 or 8 neighbor pixels. From functions on continuous space Meyer derived a topographical distance for the digital space. He defines a cost function based on the lower slope of $f$ at a given pixel $p$:

Definition 1 (Lower slope) The lower slope is the maximum of the gradients to all neighboring pixels:

$$LS(p) = \max_{p' \in N(p)} \frac{f(p) - f(p')}{dist(p, p')} \quad \text{with } f(p') < f(p)$$

and is not defined for the case $f(p') \ge f(p)$.

The cost function for walking on the topographical surface from position $p_{i-1}$ to a neighbor position $p_i$ is:

Definition 2 (Cost function)

$$cost(p_{i-1}, p_i) = \begin{cases} LS(p_{i-1}) \cdot dist(p_{i-1}, p_i) & : f(p_{i-1}) > f(p_i) \\ LS(p_i) \cdot dist(p_{i-1}, p_i) & : f(p_{i-1}) < f(p_i) \\ \frac{1}{2} \left( LS(p_{i-1}) + LS(p_i) \right) \cdot dist(p_{i-1}, p_i) & : f(p_{i-1}) = f(p_i) \end{cases}$$

Meyer defines the topographical distance of a path on the image surface using this cost function:

Definition 3 (Topographical distance of a path) The topographical distance $TD_f^{\pi}(p, q)$ of a path $\pi = (p_1, p_2, \ldots, p_n)$ with $p_i \in \Omega$, $p_1 = p$ and $p_n = q$ is:

$$TD_f^{\pi}(p, q) = \sum_{i=2}^{n} cost(p_{i-1}, p_i).$$

The topographical distance of a path is minimal if the path follows the steepest slope. Otherwise extra costs are added.

Definition 4 (Topographical distance) The topographical distance between the pixels $p$ and $q$ of the image is the minimal topographical distance among all paths $\pi$ between $p$ and $q$ inside $\Omega$:

$$TD_f(p, q) = \inf_{\pi} TD_f^{\pi}(p, q).$$
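Definitions 1 to 3 can be tried out on a small image. The following is a minimal sketch, not the authors' code: it assumes a 4-neighborhood with unit distances, uses the usual three-case form of Meyer's cost function, and (as an assumption of the sketch) treats an undefined lower slope as 0. The names `lower_slope`, `cost` and `path_distance` are hypothetical.

```python
# Illustrative 4-neighborhood with dist(p, p') = 1 for every neighbor.
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def neighbors(img, p):
    """Return the neighbors of pixel p = (row, col) that lie inside the image."""
    r, c = p
    return [(r + dr, c + dc) for dr, dc in OFFSETS
            if 0 <= r + dr < len(img) and 0 <= c + dc < len(img[0])]

def lower_slope(img, p):
    """LS(p) of Definition 1, or None if p has no lower neighbor."""
    fp = img[p[0]][p[1]]
    drops = [fp - img[r][c] for r, c in neighbors(img, p) if img[r][c] < fp]
    return max(drops) if drops else None

def cost(img, a, b):
    """Cost of one step from a to its neighbor b (Definition 2 with dist = 1)."""
    fa, fb = img[a[0]][a[1]], img[b[0]][b[1]]
    # Assumption of this sketch: an undefined lower slope counts as 0.
    ls_a = lower_slope(img, a) or 0
    ls_b = lower_slope(img, b) or 0
    if fa > fb:
        return ls_a
    if fa < fb:
        return ls_b
    return (ls_a + ls_b) / 2

def path_distance(img, path):
    """Topographical distance of a path (Definition 3)."""
    return sum(cost(img, a, b) for a, b in zip(path, path[1:]))

img = [[4, 3, 0],
       [2, 9, 9]]
steepest = [(0, 0), (1, 0)]          # follows the lower slope of (0, 0)
detour = [(0, 0), (0, 1), (0, 2)]    # the first step is not the steepest one
print(path_distance(img, steepest))  # equals the height difference 4 - 2
print(path_distance(img, detour))    # larger than the height difference 4 - 0
```

The detour pays the lower slope of its starting pixel even though it descends by less, which is the "extra cost" mentioned after Definition 3.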

If there exists an optimal path having the steepest descent between all pairs of pixels, the topographical distance is equal to the difference in height. In the following, $m_i, m_j$ denote arbitrary local minima of the image. A catchment basin is defined in the following way [6]:

Definition 5 (Catchment basin based on topographical distance) A catchment basin $CB_{TD}(m_i)$ of a regional minimum $m_i$ is the set of pixels $p \in \Omega$ which are closer to $m_i$ in terms of the topographical distance than to any other regional minimum $m_j$, taking the heights of the different minima into account:

$$CB_{TD}(m_i) = \{ p \mid f(m_i) + TD_f(p, m_i) < f(m_j) + TD_f(p, m_j) \ \forall j \ne i \}.$$

With these definitions Meyer proposed the following theorem (Proposition 5 in [6]):

Theorem 1 The topographical distance between a pixel $p$ and the regional minimum $m_i$ in the depth of its catchment basin is minimal and equal to $f(p) - f(m_i)$, and the geodesic line between them is a line of steepest descent.

The converse of Theorem 1 states that a path of steepest descent causes minimal costs. The construction of the catchment basins is thus reduced to the problem of finding a shortest path between each pixel and a local minimum. This can be solved with the classical algorithms for finding shortest paths. The algorithm can be simplified further by incorporating the characteristics of the problem. A hill-climbing algorithm is developed in [6]. It uses hierarchical queues to select the next pixel to operate on and to overcome the blindness of the algorithm on plateaus, which is caused by the fact that the topographic distance is the same for all pixels on the plateau. It is necessary to find all local minima in the image to initialize the hierarchical queues before the flooding process can start [6, 8].
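The queue-driven flooding described above can be sketched in a few lines. This is a simplified illustration, not the implementation from [6]: it assumes an image without plateaus and a 4-neighborhood, and it replaces the static hierarchical queues by a single binary heap ordered by grey value with first-in-first-out tie breaking. The function name `flood` is hypothetical.

```python
import heapq
from itertools import count

OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 4-neighborhood (illustrative)

def neighbors(img, p):
    """Return the neighbors of pixel p = (row, col) that lie inside the image."""
    r, c = p
    return [(r + dr, c + dc) for dr, dc in OFFSETS
            if 0 <= r + dr < len(img) and 0 <= c + dc < len(img[0])]

def flood(img):
    """Label every pixel with the number of the local minimum that floods it."""
    labels, heap, order = {}, [], count()
    # Seeds: in an image without plateaus a local minimum is a pixel whose
    # neighbors are all strictly higher.
    for r in range(len(img)):
        for c in range(len(img[0])):
            if all(img[rr][cc] > img[r][c] for rr, cc in neighbors(img, (r, c))):
                labels[(r, c)] = len(labels) + 1
                heapq.heappush(heap, (img[r][c], next(order), (r, c)))
    # Always continue from the lowest pixel currently in the queue.
    while heap:
        _, _, p = heapq.heappop(heap)
        for q in neighbors(img, p):
            if q not in labels:
                labels[q] = labels[p]        # q inherits the label of p
                heapq.heappush(heap, (img[q[0]][q[1]], next(order), q))
    return [[labels[(r, c)] for c in range(len(img[0]))] for r in range(len(img))]

img = [[3, 2, 5, 1],
       [4, 1, 6, 0]]
for row in flood(img):
    print(row)
```

A heap costs a logarithmic factor per operation; static queues allocated per grey value, as used in [6, 8], avoid this but need the grey-value histogram in advance.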


Segmentation based on local condition

In our approach we propose a local condition for a correct image segmentation. If the local condition is true for every pixel of the image, the segmentation is called a watershed segmentation. In this section we also show the relation between the local condition and the formalism based on the cost function. First we introduce the set of neighbors of a pixel $p$ that can be part of the path of steepest descent through $p$:

Definition 6 (Neighbors on a path of steepest descent) $NLS(p)$ is the set that contains the pixels $p' \in N(p)$, $p, p' \in \Omega$, having a maximal gradient from $p$ to $p'$:

$$NLS(p) = \left\{ p' \;\middle|\; \frac{f(p) - f(p')}{dist(p, p')} = LS(p) \ : \ f(p') < f(p) \right\}.$$

For the special case of a 4- or 8-neighborhood with $dist(p, p') = 1$ for all $p' \in N(p)$ the set simplifies to:

$$NLS(p) = \left\{ p' \;\middle|\; f(p') = \min_{p'' \in N(p)} f(p'') \ : \ f(p') < f(p) \right\}.$$

The path of steepest descent from a pixel $p$ down to the local minimum $m_i$ passes only through pixels of the set $\bigcup_p NLS(p)$. The set $NLS(p)$ was also introduced by Meyer as $\Gamma(p)$ [6].

Definition 7 (Watershed segmentation for images without plateaus) For any image without plateaus a segmentation is called a watershed segmentation if every local minimum $m_i$ has a unique label $L(m_i)$ and for every pixel $p, p' \in \Omega$ with $NLS(p) \ne \emptyset$ the condition

$$\exists\, p' \in NLS(p) \ \text{with} \ L(p) = L(p')$$

holds.

The image is segmented into catchment basins by giving each minimum $m_i$ a unique label $L(m_i)$ and starting a flooding process that respects the local condition of Definition 7.

Definition 8 (Catchment basin) For a watershed segmentation (Definition 7) a catchment basin $CB_{LC}(m_i)$ of the local minimum $m_i$ is the set of pixels with the label $L(m_i)$:

$$CB_{LC}(m_i) = \{ p \mid L(p) = L(m_i) \}.$$

$CB_{LC}(p \to m_i)$ denotes the catchment basin of $m_i$ containing pixel $p$.

The following theorem shows the relationship between the two different definitions of a catchment basin:

Theorem 2 The catchment basin based on the topographical distance (Definition 5) is a subset of the catchment basin of Definition 8 based on the local condition (Definition 7).

Proof The construction of the catchment basin according to Definition 7 can be described by a recursion. The recursion starts with the set of pixels belonging to the local minimum $m_i$. All these pixels are labeled with $L(m_i)$. In each step pixels are added to the previous set. The recursion ends when no more pixels can be added.

$$CB_{LC}^{0}(m_i) = m_i$$
$$CB_{LC}^{k+1}(m_i) = CB_{LC}^{k}(m_i) \cup \Delta CB_{LC}^{k}(m_i)$$
$$\Delta CB_{LC}^{k}(m_i) = \{ p \mid \exists\, p' \in NLS(p) \ \text{and} \ p \notin CB_{LC}^{k}(m_i) \ \text{and} \ p' \in CB_{LC}^{k}(m_i) \}$$

Each added pixel $p$ has a neighbor pixel $p'$ that is part of both the catchment basin $CB_{LC}^{k}(m_i)$ and $NLS(p)$ and satisfies the following conditions:

$$p' \in NLS(p)$$
$$LS(p) \overset{\text{Def. 6}}{=} \frac{f(p) - f(p')}{dist(p, p')}$$
$$LS(p) \cdot dist(p, p') = f(p) - f(p')$$
$$cost(p, p') = f(p) - f(p').$$

According to Theorem 1, one recursion step adds only those pixels $p$ that build a path of steepest descent down to $CB_{LC}^{k}(m_i)$ with the minimal cost $f(p) - f(p')$, $p' \in CB_{LC}^{k}(m_i)$. After the recursion is finished, all paths between the pixels of the catchment basin and its minimum are paths of steepest descent. Therefore it is not possible to construct a steeper path to a different minimum $m_j$. There might, however, exist another steepest path to a different local minimum $m_j$. Note that in this case the pixel is a watershed pixel according to Definition 5. This proves that $CB_{TD}(m_i)$ is a subset of $CB_{LC}(m_i)$.

The difference between Definitions 5 and 8 is the treatment of the pixels that have steepest paths to more than one minimum. According to Definition 5 these pixels are watershed pixels. Following Definition 8, which is based on the local condition, these pixels are assigned arbitrarily to one minimum $m_i$ with a steepest path to $m_i$ and with $p' \in NLS(p)$, $L(p') = L(m_i)$.
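The local condition is easy to test for a given labeling. The sketch below is an illustration under stated assumptions (4-neighborhood, unit distances, hypothetical function names); it checks only the condition of Definition 7, not the uniqueness of the labels of the minima. The example image shows the ambiguity discussed above: the middle pixel has equally steep paths to both minima, so either label is accepted.

```python
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 4-neighborhood (illustrative)

def neighbors(img, p):
    """Return the neighbors of pixel p = (row, col) that lie inside the image."""
    r, c = p
    return [(r + dr, c + dc) for dr, dc in OFFSETS
            if 0 <= r + dr < len(img) and 0 <= c + dc < len(img[0])]

def nls(img, p):
    """NLS(p) for unit distances: the lowest neighbors of p, if lower than p."""
    fp = img[p[0]][p[1]]
    lower = [q for q in neighbors(img, p) if img[q[0]][q[1]] < fp]
    if not lower:
        return []
    lowest = min(img[r][c] for r, c in lower)
    return [q for q in lower if img[q[0]][q[1]] == lowest]

def satisfies_local_condition(img, labels):
    """True if every pixel with a non-empty NLS shares its label with a member of it."""
    for r in range(len(img)):
        for c in range(len(img[0])):
            steepest = nls(img, (r, c))
            if steepest and all(labels[rr][cc] != labels[r][c] for rr, cc in steepest):
                return False
    return True

img = [[0, 2, 0]]   # the middle pixel has steepest paths to both minima
print(satisfies_local_condition(img, [[1, 1, 2]]))  # assigned to the left minimum
print(satisfies_local_condition(img, [[1, 2, 2]]))  # assigned to the right minimum
print(satisfies_local_condition(img, [[1, 3, 2]]))  # label of neither neighbor
```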

3 The Parallel Watershed Transformation

Parallel algorithms for the watershed transformation have been discussed in [7, 8, 5, 4]. Two approaches are mainly under consideration. One approach is to distribute the image over the processors and to perform the global flooding process in parallel [7, 8]. Synchronization points have to be inserted where the result depends on a neighboring sub-domain to ensure a correct result. The achievable speedup is highly dependent on the content of the image. A different approach is taken in [5, 4]: the watershed calculation is translated into the problem of finding shortest paths in graphs in parallel. For this approach the speedup is limited by the number of local minima in the image. In addition, load imbalance due to the different sizes of the catchment basins arises in practice [4].

In our approach we partition the image into sub-domains which are distributed among $P$ processors. For images without plateaus we prove that the algorithm does not depend on the image data. Load imbalance occurs only due to the data-dependent run times of the sequential watershed algorithm, but the sequential algorithm guarantees a limited number of accesses to any pixel of the image. In practice the plateau correction step, which has to be added for images with plateaus, is the most limiting factor for the speedup of the parallel algorithm.

The main idea of the suggested algorithm is to assign temporary labels to pixels that will be flooded from a different sub-domain and therefore belong to an outside catchment basin. Instead of synchronization points a hypothetical solution is assumed. The watershed segmentation can then be solved independently on every sub-domain without any synchronization. After the local flooding is finished, the global segmentation problem is solved by identifying connected catchment basins and relabeling them with a globally unique label. The problem of identifying the globally unique labels is strongly related to the connected component labeling problem [2, 11, 1].

Each processor works on a non-overlapping sub-image $f_i$ with the sub-domain $\Omega_i$ and produces a sub-result $L_i$. Let $P$ be the number of processors in the network and $\Omega_i$ be a sub-domain of $\Omega$ with:

$$\Omega = \bigcup_i \Omega_i \quad \text{and} \quad \Omega_i \cap \Omega_j = \emptyset, \ i \ne j, \ \forall i, j = 1, \ldots, P.$$

$\Omega_i^{+} = \left( \bigcup_{p \in \Omega_i} N(p) \right) \cap \Omega$ denotes the set of pixels of the sub-domain $\Omega_i$ including all global neighboring pixels. $\Omega_i^{-} = \Omega_i \cap \left( \bigcup_{j \ne i} \Omega_j^{+} \right)$ describes the set of pixels which have outside neighbors.

First we discuss the case of images without any plateau. The paths of steepest descent may be cut when the image is partitioned into sub-domains. If there exists a pixel $p' \in NLS(p) \cap (\Omega_i^{+} \setminus \Omega_i)$ which is a neighbor of a border pixel $p \in \Omega_i^{-}$, the label of $p$ depends on an outside catchment basin and a globally unique temporary label is assigned to $p$. To find out which pixels require temporary labels, a communication step is used to get border information from all the neighboring sub-images. The communication step can be overlapped with the search for local minima of the sequential watershed algorithm. Now the watershed problem can be solved independently on each sub-domain. The temporary labels can be viewed as hypothetical solutions which have to be replaced a posteriori by the final solution. The algorithm is summarized in Theorem 3. The proof also shows that it is possible to calculate the global solution by successively merging local solutions until the whole image is covered. For the merging process only border information has to be communicated [2, 11, 1]. After a global merging step all labels of the sub-domain are replaced with globally consistent labels.
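The sets introduced above can be computed directly for a stripe-wise partition. The following sketch is illustrative only: it assumes a 4-neighborhood, a row count divisible by $P$, and access to the whole image as a stand-in for the border exchange between processors. For a symmetric neighborhood the border set is computed as the pixels of the sub-domain that have a neighbor outside of it. All function names are hypothetical.

```python
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 4-neighborhood (illustrative)

def neighbors(shape, p):
    """Return the neighbors of pixel p inside an image of the given (rows, cols)."""
    r, c = p
    return [(r + dr, c + dc) for dr, dc in OFFSETS
            if 0 <= r + dr < shape[0] and 0 <= c + dc < shape[1]]

def nls(img, p):
    """Lowest neighbors of p, provided they are lower than p (unit distances)."""
    shape = (len(img), len(img[0]))
    lower = [q for q in neighbors(shape, p) if img[q[0]][q[1]] < img[p[0]][p[1]]]
    lowest = min((img[r][c] for r, c in lower), default=None)
    return [q for q in lower if img[q[0]][q[1]] == lowest]

def stripe(shape, i, P):
    """Pixels of the i-th of P horizontal stripes (rows assumed divisible by P)."""
    h = shape[0] // P
    return {(r, c) for r in range(i * h, (i + 1) * h) for c in range(shape[1])}

def omega_plus(shape, omega):
    """The sub-domain together with all neighbors of its pixels."""
    return omega | {q for p in omega for q in neighbors(shape, p)}

def omega_minus(shape, omega):
    """Pixels of the sub-domain that have a neighbor outside of it."""
    return {p for p in omega if any(q not in omega for q in neighbors(shape, p))}

def needs_temp_label(img, omega):
    """Border pixels whose steepest lower neighbor lies in another sub-domain."""
    return {p for p in omega if any(q not in omega for q in nls(img, p))}

img = [[5, 4],
       [3, 6],
       [1, 7],
       [2, 0]]
shape = (len(img), len(img[0]))
upper, lower_half = stripe(shape, 0, 2), stripe(shape, 1, 2)
print(sorted(omega_minus(shape, upper)))     # the bottom row of the upper stripe
print(sorted(needs_temp_label(img, upper)))  # pixel (1, 0) drains into (2, 0)
print(sorted(needs_temp_label(img, lower_half)))
```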

Theorem 3 (Parallel Watershed Algorithm) Solving the watershed problem on the domain $\Omega$ is identical to doing the following on each sub-domain $\Omega_i$ with $i = 1, \ldots, P$:

1. Find all local minima on $\Omega_i$ using $\Omega_i^{+}$ as information base and assign a globally unique label to each minimum.
2. Label each border pixel $p \in \Omega_i^{Temp} \subseteq \Omega_i^{-}$,

$$\Omega_i^{Temp} = \{ p \mid NLS(p) \cap (\Omega_i^{+} \setminus \Omega_i) \ne \emptyset \},$$

with a globally unique temporary label $L_i(p) = L_{Temp}(p)$, because $L_j(p')$ is not yet known.
3. Solve the watershed problem on $\Omega_i$ with any sequential watershed algorithm that produces a segmentation according to Definition 7, treating the pixels $p \in \Omega_i^{Temp}$ as additional local minima.
4. Obtain the correct segmentation for $\Omega$ by performing a recursion in parallel. Each step links the partitioned catchment basins of neighboring sub-domains $\Omega_i, \Omega_j$ to obtain a consistent resulting sub-domain $\Omega_{ij} = \Omega_i \cup \Omega_j$. For all $p \in \Omega_i^{Temp}$, the catchment basin $CB_{LC}(p = m_i)$ has to be connected with the catchment basin $CB_{LC}(p' \to m_j)$, $p' \in NLS(p) \cap \Omega_j$. The same operation is done for all $p \in \Omega_j^{Temp}$. The connection of catchment basins is done by replacing the identifiers of the set of connected catchment basins with one selected identifier from the set. The recursion ends when $\Omega_{ij}$ is equal to $\Omega$.

Proof Steps 1 and 2 ensure that initially each sub-domain $\Omega_i$ has a disjoint set of labels. After running the local sequential watershed algorithm (step 3) on every sub-domain $\Omega_i$, the local condition (Definition 7), $\exists\, p' \in NLS(p)$ with $L(p) = L(p')$, holds for every $p, p' \in \Omega_i$.

In the recursion of step 4 two neighboring sub-domains $\Omega_i, \Omega_j$ are merged into a larger sub-domain $\Omega_{ij} = \Omega_i \cup \Omega_j$. For all pixels $p \in \Omega_{ij} \setminus (\Omega_i^{Temp} \cup \Omega_j^{Temp})$ the local condition is already true in $\Omega_{ij}$. This is because each pixel $p$ and its neighbor set $p' \in NLS(p)$ lie completely inside the previous sub-domain and the label sets are disjoint. All pixels $p \in \Omega_i^{Temp} \cup \Omega_j^{Temp}$ have been labeled with a new globally unique temporary label in step 2 and therefore $\nexists\, p' \in NLS(p)$ with $L(p) = L(p')$.

In step 4 $CB_{LC}(p = m_i)$, $p \in \Omega_i^{Temp}$, is connected with the catchment basin $CB_{LC}(p' \to m_j)$, $p' \in NLS(p) \cap \Omega_j$, and vice versa. The connection is done by replacing the identifiers of $CB_{LC}(p = m_i)$ and $CB_{LC}(p' \to m_j)$ with one selected identifier for each group of connected catchment basins. Therefore, after the relabeling, the local condition holds for all $p \in \Omega_i^{Temp} \cup \Omega_j^{Temp}$. Because new labels are assigned to complete catchment basins inside $\Omega_{ij}$, the local condition remains true for pixels that are already labeled correctly. Therefore the local condition holds for all $p, p' \in \Omega_{ij}$. The disjoint sets of labels ensure that the relabeling has no effect on $\Omega \setminus \Omega_{ij}$. By repeating the recursion until the whole image is covered, it follows that the local condition (Definition 7) holds for every pixel $p, p' \in \Omega$ and therefore the segmentation is a watershed segmentation.
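The four steps of Theorem 3 can be simulated sequentially to see how the temporary labels work. The sketch below makes several simplifying assumptions that are not taken from the paper: an image without plateaus, a 4-neighborhood, horizontal stripes with a row count divisible by $P$, a loop over the stripes in place of real processors, a binary heap in place of hierarchical queues, and a single linking pass in place of the pairwise recursion of step 4. All names are hypothetical.

```python
import heapq
from itertools import count

OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 4-neighborhood (illustrative)

def neighbors(img, p):
    r, c = p
    return [(r + dr, c + dc) for dr, dc in OFFSETS
            if 0 <= r + dr < len(img) and 0 <= c + dc < len(img[0])]

def nls(img, p):
    """Lowest neighbors of p, provided they are lower than p (unit distances)."""
    lower = [q for q in neighbors(img, p) if img[q[0]][q[1]] < img[p[0]][p[1]]]
    lowest = min((img[r][c] for r, c in lower), default=None)
    return [q for q in lower if img[q[0]][q[1]] == lowest]

def local_flood(img, omega, first_label):
    """Steps 1 to 3 on one sub-domain; returns (labels, links of Temp pixels)."""
    links = {}                                 # Temp pixel -> steepest outside neighbor
    for p in omega:
        outside = [q for q in nls(img, p) if q not in omega]
        if outside:
            links[p] = outside[0]
    labels, heap, order = {}, [], count()
    for p in sorted(omega):
        if p in links or not nls(img, p):      # temporary label or local minimum
            labels[p] = first_label + len(labels)
            heapq.heappush(heap, (img[p[0]][p[1]], next(order), p))
    while heap:                                # flooding restricted to omega
        _, _, p = heapq.heappop(heap)
        for q in neighbors(img, p):
            if q in omega and q not in labels:
                labels[q] = labels[p]
                heapq.heappush(heap, (img[q[0]][q[1]], next(order), q))
    return labels, links

def parallel_watershed(img, P):
    rows, cols = len(img), len(img[0])
    h = rows // P                              # assumes rows divisible by P
    labels, links = {}, {}
    for i in range(P):                         # one iteration per "processor"
        omega = {(r, c) for r in range(i * h, (i + 1) * h) for c in range(cols)}
        part, part_links = local_flood(img, omega, 1 + i * rows * cols)
        labels.update(part)
        links.update(part_links)
    parent = {}                                # step 4, collapsed into one pass
    def find(x):
        while parent.get(x, x) != x:
            x = parent[x]
        return x
    for p, q in links.items():                 # connect CB(p) with CB(q)
        parent[find(labels[p])] = find(labels[q])
    return [[find(labels[(r, c)]) for c in range(cols)] for r in range(rows)]

def local_condition_holds(img, labels):
    """Definition 7 checked on the whole image."""
    return all(not nls(img, (r, c)) or
               any(labels[rr][cc] == labels[r][c] for rr, cc in nls(img, (r, c)))
               for r in range(len(img)) for c in range(len(img[0])))

img = [[9, 8, 7, 6],
       [5, 3, 4, 8],
       [6, 2, 9, 3],
       [7, 5, 4, 1]]
result = parallel_watershed(img, 2)
for row in result:
    print(row)
print(local_condition_holds(img, result))
```

In this example the pixels (1, 1) and (1, 3) receive temporary labels because their steepest lower neighbors lie in the second stripe; the linking pass then replaces those labels by the labels of the basins below them, leaving one label per global minimum.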

Treatment of plateaus

The topographic distance has the same value for all pixels on a plateau. Therefore we additionally have to use the geodesic distance to ensure that a pixel on a plateau gets the label of the nearest border pixel with a lower neighbor. In this section we show how the geodesic distance is included in the formalism. A plateau is a set of pixels that have the same height and are all connected using the neighborhood $N(p)$. Our approach is to treat every pixel as part of a plateau of its height. For minimum plateaus which extend over more than one sub-domain no re-flooding is necessary. They are linked by selecting a globally unique identifier in the final recursion step. Let $\partial P = \{ p' \mid NLS(p') \ne \emptyset, \ p' \in P \}$ be the set of pixels on the border of plateau $P$ which have a lower neighbor. The geodesic distance between two pixels $p$ and $p'$ on a plateau is equal to the length of the shortest path between $p$ and $p'$ within the considered area [12]. Therefore the minimal geodesic distance between a pixel $p$ on the plateau $P$ and all border pixels $p' \in \partial P$ is:

Definition 9 (Minimal geodesic distance to the edge of a plateau)

$$dist_P(p, p') = \inf \{ length(\pi) \mid \pi \ \text{is a path between} \ p \ \text{and} \ p' \ \text{which is totally included in} \ P \}$$
$$dist_{min}(p, \partial P) = \min_{p' \in \partial P} dist_P(p, p')$$

With the geodesic distance to the edge of a plateau we can extend the set of lower neighbors on a path of steepest descent, $NLS$ (Definition 6), to the case of images with plateaus:

Definition 10 (Extended neighbors on a path of steepest descent) The set $NLS'(p)$ contains the pixels of the sets $NLS(p')$ of all border pixels $p' \in \partial P$ that have the minimal geodesic distance to $p$:

$$NLS'(p) = \bigcup_{p' \in \partial P} \{ NLS(p') \ : \ dist_P(p, p') = dist_{min}(p, \partial P) \}.$$

Definition 11 (Watershed segmentation for images with plateaus) For any image with plateaus a segmentation is called a watershed segmentation if every local minimum $m_i$ has a unique label $L(m_i)$ and for every pixel $p, p' \in \Omega$ with $NLS'(p) \ne \emptyset$ the condition

$$\exists\, p' \in NLS'(p) \ \text{with} \ L(p) = L(p')$$

holds.

With the transition from $NLS(p)$ to $NLS'(p)$ in Definition 7, the algorithm in Theorem 3 produces a watershed segmentation for any image with plateaus. The case of images without plateaus is included, because Definition 10 possesses the following property:

$$dist_P(p', p) = 0 \Rightarrow p' = p \Rightarrow NLS'(p) = NLS(p).$$

Note that the definitions leave open the metric which is used for the geodesic distance. The metric used in most implementations of the watershed algorithm is based on the 8-neighborhood, which approximates the geodesic distances between the center pixel and all 8 neighbors with 1. In our parallel approach the minimal distance to the edge of the plateau is corrected iteratively until no change is detected. Section 4 introduces this step in more detail.
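For a single, undistributed image the distance of Definition 9 can be obtained with a breadth-first search. The sketch below is an illustration, not the iterative correction used in the parallel algorithm: it assumes the 8-neighborhood metric with unit step length mentioned above and starts the search simultaneously from all pixels that have a lower neighbor. Pixels of minimum plateaus have no such border and are left out of the result. The function name is hypothetical.

```python
from collections import deque

# 8-neighborhood in which every step counts as length 1.
OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def neighbors(img, p):
    r, c = p
    return [(r + dr, c + dc) for dr, dc in OFFSETS
            if 0 <= r + dr < len(img) and 0 <= c + dc < len(img[0])]

def distance_to_plateau_edge(img):
    """Minimal geodesic distance to the plateau border for every non-minimum pixel."""
    dist, queue = {}, deque()
    # Border pixels (those with a lower neighbor) start at distance 0.
    for r in range(len(img)):
        for c in range(len(img[0])):
            if any(img[rr][cc] < img[r][c] for rr, cc in neighbors(img, (r, c))):
                dist[(r, c)] = 0
                queue.append((r, c))
    # Breadth-first propagation that never leaves the plateau of the current pixel.
    while queue:
        p = queue.popleft()
        for q in neighbors(img, p):
            if q not in dist and img[q[0]][q[1]] == img[p[0]][p[1]]:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist

img = [[0, 2, 2, 2, 2, 2, 1]]   # a plateau of height 2 between two minima
dist = distance_to_plateau_edge(img)
print([dist.get((0, c)) for c in range(7)])
```

The middle of the plateau ends up farthest from both borders; a pixel without an equal-valued neighbor simply gets distance 0, which matches the property stated after Definition 11.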

4 Implementation Details

Our implementation is based on the algorithm proposed in Theorem 3 using the sequential watershed algorithm introduced by Meyer [6], but it is possible to use any sequential watershed algorithm which segments the image according to the local condition of Definition 11. We implemented the algorithm using the 8-neighborhood. This section describes in more detail the algorithms for efficiently finding the local minima, correcting the distances on plateaus and finding a global solution for the labels. For an introduction to the flooding process refer to [6].

The result of the parallel watershed algorithm is a sub-image $L(p_i)$ which has globally correct labels according to the condition of Definition 11 for each pixel $p_i$ of the input sub-image $\Omega_i$. First a sequential watershed algorithm is performed. Afterwards the plateaus which lie completely inside a sub-domain are already segmented correctly. This is guaranteed by the Meyer algorithm using the principle of the hierarchical queues [6]. The next step is the plateau correction step. It iteratively corrects the labels and distances of plateaus that stretch over more than one sub-domain. In the following the term 'distance' is used for the currently known minimum distance from a pixel $p \in P$ to $\partial P$, the nearest border pixel of the plateau $P$ with a lower neighbor. A separate temporary image is used to store the distance of each pixel. This additional step is needed because the set $NLS'(p)$ (Definition 10) has a global character, in contrast to $NLS(p)$. The iteration corrects the distances until no change is detected on any processor. The last step is a global recursion to select a globally unique label for partitioned catchment basins as described in step 4 of Theorem 3.

Detection of the local minima

We use the following special label values as proposed in [8]: INIT to initialize all untested pixels of $L(p_i)$ and NARM to mark pixels which are not part of a regional minimum. The communication to get the border information from all neighbors is overlapped with the local detection of minima. A sequential scan of the pixels is performed. If a pixel $p$ is found that has no neighbor with a smaller grey value, the plateau which includes $p$ is flooded and labeled with a new label. If a neighbor pixel with a smaller grey value is detected during the flooding process, the whole plateau is marked with NARM. With the received neighbor information, minima areas which have outside neighbors with smaller grey values are set to NARM as well. Afterwards the border pixels of all local minima are stored in the hierarchical queue with the grey value of the minima. For efficient processing, static hierarchical queues which are allocated according to the local histogram of the grey values are used [6, 8].
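The scan described above can be sketched as follows. This is an illustration for a single sub-domain without outside neighbors, so the correction with received border information is left out. The numeric values chosen for INIT and NARM, the direct marking of pixels that have a lower neighbor, and the function name are assumptions of the sketch.

```python
from collections import deque

INIT, NARM = -1, 0   # sentinel values chosen for this sketch
OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def neighbors(img, p):
    """8-neighborhood, as used in the implementation described in the text."""
    r, c = p
    return [(r + dr, c + dc) for dr, dc in OFFSETS
            if 0 <= r + dr < len(img) and 0 <= c + dc < len(img[0])]

def detect_minima(img):
    """Return a label image: minima get labels 1, 2, ...; all other pixels NARM."""
    rows, cols = len(img), len(img[0])
    labels = [[INIT] * cols for _ in range(rows)]
    next_label = 1
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != INIT:
                continue
            h = img[r][c]
            if any(img[rr][cc] < h for rr, cc in neighbors(img, (r, c))):
                labels[r][c] = NARM          # has a lower neighbor: not a minimum
                continue
            # Flood the plateau that contains (r, c).
            plateau, queue, is_minimum = {(r, c)}, deque([(r, c)]), True
            while queue:
                p = queue.popleft()
                for q in neighbors(img, p):
                    if img[q[0]][q[1]] < h:
                        is_minimum = False   # the plateau has a lower neighbor
                    elif img[q[0]][q[1]] == h and q not in plateau:
                        plateau.add(q)
                        queue.append(q)
            for pr, pc in plateau:
                labels[pr][pc] = next_label if is_minimum else NARM
            if is_minimum:
                next_label += 1
    return labels

img = [[1, 1, 3],
       [1, 2, 3],
       [3, 3, 0]]
for row in detect_minima(img):
    print(row)
```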

Plateau Correction

Theorem 3 shows that a watershed segmentation for any image without plateaus can be achieved in a data-independent way. The reason is that the set $NLS(p)$ has a local character. The additional plateau correction step is needed because the set $NLS'(p)$ (Definition 10) has a global character. A separate temporary image is used to store the known minimum distance to $\partial P$ for each pixel. The iteration corrects the distances by communicating neighbor information and partial re-flooding. The re-flooding updates the distances and labels according to the updated neighbor information. After each exchange of neighbor information and re-flooding, a global synchronization is performed to detect whether the plateau correction is finished. The correction is finished when no re-flooding has been performed on any processor. In the last synchronization step the local numbers of generated labels are collected. With the prefix sum over these numbers each processor knows the label offset needed to achieve a minimal set of globally disjoint labels.

The re-flooding is done in the following way: If the distance of a plateau pixel $p \in \Omega_i^{-}$ at the border of the sub-image can be reduced, it is marked as a candidate for re-flooding. A new unique temporary label is assigned to $p$ and it is marked for later linkage with the outside neighbor. The generation of new temporary labels at this point is essential to ensure that the sub-images still have disjoint sets of labels after the plateau correction. For an efficient re-flooding process all candidates are sorted first by their grey level and secondly by their known distance. All candidates are inserted into a hierarchical queue. A greedy re-flooding strategy is used to ensure that the re-flooding process does not leave single pixels of previous segments untouched. The greedy strategy floods a pixel even if its distance stays the same.
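The label offsets mentioned above follow from an exclusive prefix sum over the per-processor label counts. A small sketch, assuming local labels are numbered from 1 on every processor; the counts and function names are hypothetical:

```python
from itertools import accumulate

def label_offsets(counts):
    """Exclusive prefix sum: the offset of processor i is the number of labels on 0..i-1."""
    return [0] + list(accumulate(counts))[:-1]

def to_global(local_label, processor, offsets):
    """Map a local label (numbered from 1) to its globally unique number."""
    return offsets[processor] + local_label

counts = [3, 0, 5, 2]               # hypothetical label counts on 4 processors
offsets = label_offsets(counts)
print(offsets)                      # [0, 3, 3, 8]
print(to_global(1, 2, offsets))     # 4
```

With these offsets the global labels run from 1 to the total number of generated labels without gaps, which is what makes the set minimal.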
Find a global solution for the labels

In the last step connected catchment basins are identified and relabeled with a globally unique label. The problem is strongly related to the connected component labeling problem [2, 11, 1]. The equality of two catchment basins $CB_{LC}(p = m_i) = CB_{LC}(p' \to m_j)$, $p' \in NLS(p)$, is represented by the equation $L(p) = L(p')$. Each processor maintains an equation table, which is an array with the size of the global number of labels. The equation $L_1 = L_2$ is represented in the table by storing a pointer to position $L_2$ at position $L_1$. By definition all labels of a partitioned catchment basin are replaced with the largest label number from all partitions [2, 11, 1].

In this section we assume an $N \times N$ image distributed stripe-wise on $P$ processors. The extension to block-wise distributed images is straightforward. Two approaches to select global representatives for each catchment basin are presented. The first one is based on a total data exchange. It has the disadvantage that the processors receive more information than needed. The method of merging local sub-images avoids this disadvantage, which results in a reduced communication overhead. For all timing results the second method is used.

Total data exchange

The first approach to this problem is to perform a total data exchange between all processors. The total data exchange can be done in $\log_2 P$ communication steps, constructing $P$ binary trees to every processor in parallel. For example, the de Bruijn communication topology is well suited for this kind of operation [3, 9]. The amount of data to be sent over each communication link in each step is proportional to the size of the table and therefore proportional to the total number of labels including all temporary labels. After each communication step, two tables are received on each processor. The two input tables are merged into a consistent output table which is sent to two processors. One can show that all processors finally receive the same globally consistent table.

The actual size of the table is data dependent. In the worst case the number of labels is $N^2$ plus some additional labels for the plateau correction. Therefore the number of data items sent on one edge of a de Bruijn communication network is $O(N^2)$ in each step. By sending only the equations of the image borders instead of the whole table, the amount of data can be reduced. All equations are accumulated on each processor in each step. Therefore the maximum number of received equations over the levels is $2N, 4N, \ldots, PN$. The average number of data items sent over a communication link in each step is reduced to $O(2NP / \log_2 P)$. The information received by the total data exchange is more than needed. Every processor uses only the part of the global solution which affects the labels on its local sub-domain. Merging of sub-images avoids this disadvantage.
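The average quoted for the border-equation variant can be checked by summing the per-level maxima. The short derivation below assumes that $P$ is a power of two, so that there are exactly $\log_2 P$ levels; the numeric example with $N = 512$ and $P = 64$ is only an illustration.

```latex
\sum_{k=1}^{\log_2 P} 2^{k} N \;=\; 2N\,(P - 1),
\qquad
\frac{2N\,(P - 1)}{\log_2 P} \;=\; O\!\left(\frac{2NP}{\log_2 P}\right).

% Example: N = 512, P = 64 gives 2 * 512 * 63 / 6 = 10752 equations per step
% on average, compared with up to N^2 = 262144 entries for the full table.
```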

Merging of sub-images

The image-oriented method corresponds directly to step 4 of Theorem 3. Every sub-domain is represented by its edges. Label and linkage information is assigned to every border pixel. One recursion step produces a new sub-domain that joins the area of two neighboring sub-domains. Consistent labels are calculated for the new sub-domain from the label and linkage information of the merged neighboring edges. With the help of a local equation table, unique representatives for connected catchment basins are selected. The rule that the largest of all involved labels is selected ensures identical results on all processors. The inner edges of the resulting sub-domain are stored and not communicated further. Then the labels of the new outer edges are updated and the new, larger sub-domain can be merged with another larger sub-domain. The communication pattern of this process results in a binary tree. The global information is gathered at one processor. To distribute this information the split step is needed. The former inner edges are updated according to the global information and distributed further in the opposite direction of the merge process. The split process can be avoided if the merge process is done redundantly with a butterfly communication pattern. Figure 2 shows an example of the merge and split process for an image distributed stripe-wise on four processors.

Figure 2: Communication of the image-oriented method for an image distributed stripe-wise on 4 processors, non-redundant and redundant version. (Panel labels: Merge, Merge, Split.)

The merge and split operations need $\log_2 P$ communication steps each. In every communication step the data corresponding to one sub-domain are sent, containing $2N$ data elements. The maximum number of data items sent on one edge of a de Bruijn or butterfly graph in each of the $2 \log_2 P$ communication steps is equal to $2N$. For the redundant method the number of communication steps is reduced to $\log_2 P$. It is essential that one of the merged data packets is kept local to ensure that the information gathered in the local equation table corresponds to the sub-domain of the processor. Therefore the redundant communication pattern cannot be embedded one-to-one in a de Bruijn graph. The maximum number of data items sent on one edge of the butterfly graph in each step is equal to $2N$.
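The equation table with the largest-label rule behaves like a union-find structure in which pointers always lead to a larger label. The following sketch is one possible realization, not the authors' data structure; labels are assumed to be the integers 0 to n-1 and the function names are hypothetical.

```python
def new_table(n_labels):
    """Equation table: entry l points to itself until an equation is stored."""
    return list(range(n_labels))

def representative(table, label):
    """Follow the pointers from a label to the representative of its class."""
    while table[label] != label:
        table[label] = table[table[label]]   # shorten the chain while walking
        label = table[label]
    return label

def add_equation(table, a, b):
    """Record L_a = L_b; the larger representative becomes the common one."""
    ra, rb = representative(table, a), representative(table, b)
    if ra != rb:
        table[min(ra, rb)] = max(ra, rb)

def resolve(table):
    """Final label for every entry of the table."""
    return [representative(table, label) for label in range(len(table))]

equations = [(1, 2), (2, 3), (5, 4)]
forward, backward = new_table(6), new_table(6)
for a, b in equations:
    add_equation(forward, a, b)
for a, b in reversed(equations):
    add_equation(backward, a, b)
print(resolve(forward))     # [0, 3, 3, 3, 5, 5]
print(resolve(backward))    # the same table, independent of the order
```

Because the representative of a class is always its largest member, two processors that see the same equations in a different order still arrive at the same labels, which is the property the text relies on.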

5 Runtime Behavior and Results

The implementation of the algorithm follows a single program multiple data (SPMD) approach. The algorithm is implemented in the context of the parallel image processing system Pips [10]. The services for distributing and collecting the images are provided by Pips and are not discussed further in this paper. The hardware platform is the massively parallel system Parsytec(R) SC-128, which consists of 128 T-805 transputers connected in a de Bruijn topology.

This section shows results for six example images of dimension $512 \times 512$ that are distributed stripe-wise or block-wise on the processors. For every image the parallel watershed algorithm was run on $P = 2^i$, $i = 2, \ldots, 7$ processors. The examined images and the results of the parallel watershed algorithm on 64 processors with stripe-wise data distribution are shown in Figures 5 and 6. The sequential time is estimated from the time for the sequential parts of the algorithm on the minimum number of processors. The sequential parts of the algorithm are the detection of local minima on the basis of local data and the local flooding step. The speedups for the stripe-wise and block-wise distributed example images are shown in Figures 3 and 4. The total times for the example images are given in Tables 1 and 2. Tables 3 and 4 show the times for the different phases of the algorithm for the example images Peppers and Crystal.

The results show that for images with large plateaus (Chess and Simple) the speedup is limited by the iteration cycles needed for the plateau correction step. For the images with small plateaus (Peppers, Gold), which are natural scenes that were not pre-processed and have a large number of local minima, the achievable speedup is much larger. This is also true for Sincos, an image without any plateau that was generated from sine and cosine functions. It has the highest speedup of all examples in the case of 128 processors.
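As a worked example of how the speedup follows from the measured times, the sketch below divides the estimated sequential time by the parallel time, using the Peppers row of Table 1 (stripe-wise distribution). The function names and the additional efficiency measure (speedup divided by the number of processors) are not part of the original text and are included only for illustration.

```python
# Execution times in seconds for Peppers, stripe-wise (Table 1).
SEQUENTIAL_TIME = 72.12
PARALLEL_TIMES = {4: 21.11, 8: 11.10, 16: 6.41, 32: 4.01, 64: 3.14, 128: 2.62}


def speedup(t_sequential, t_parallel):
    """Speedup as the ratio of sequential to parallel execution time."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, num_procs):
    """Speedup per processor (illustrative measure, not used in the text)."""
    return speedup(t_sequential, t_parallel) / num_procs


if __name__ == "__main__":
    for procs, t_par in sorted(PARALLEL_TIMES.items()):
        s = speedup(SEQUENTIAL_TIME, t_par)
        e = efficiency(SEQUENTIAL_TIME, t_par, procs)
        print(f"P={procs:3d}  speedup={s:5.2f}  efficiency={e:4.2f}")
```

For 128 processors this gives a speedup of roughly 27.5 for Peppers.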
Tables 3 and 4 show that for a large number of processors the plateau correction step is the most limiting factor for the speedup. This is true even for images with small plateaus. For all images, one correction step with an exchange of neighbor data and global synchronization is needed to detect termination. In most cases at least one further step is necessary to correct small plateaus.
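The growing weight of the plateau correction can be made concrete by computing its share of the total time. The sketch below uses the Plateaus and Total rows of Table 3 for Peppers with stripe-wise distribution. The function name `plateau_share` is hypothetical and the calculation is only an illustration of the trend described above.

```python
# Peppers, stripe-wise (Table 3): processors -> (plateau time, total time) in seconds.
PEPPERS_STRIPES = {
    4: (1.27, 21.11),
    8: (0.84, 11.10),
    16: (0.93, 6.41),
    32: (0.78, 4.01),
    64: (1.06, 3.14),
    128: (1.00, 2.62),
}


def plateau_share(plateau_time, total_time):
    """Fraction of the total execution time spent on plateau correction."""
    return plateau_time / total_time


if __name__ == "__main__":
    for procs, (t_plateau, t_total) in sorted(PEPPERS_STRIPES.items()):
        share = plateau_share(t_plateau, t_total)
        print(f"P={procs:3d}  plateau share={share:6.1%}")
```

The plateau time stays at about one second while the other phases shrink with the number of processors, so its share rises from about 6% on 4 processors to about 38% on 128 processors.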

6 Conclusion

For images without any plateau we have proven a transition from a data-driven sequential watershed algorithm to a structure-driven parallel algorithm. This provides an algorithm that is almost independent of the image data. For practical images, which include plateaus, the algorithm falls back to a data-driven one because of the additional plateau correction steps. The time for the plateau correction depends on the size of the plateaus and on how the image is distributed among the processors.

Nevertheless, the timing results show that for images with small plateaus a speedup of more than 25 is possible. Future work has to be done to reduce the dependency of the algorithm on the size and shape of the plateaus. This can be done by separating the flooding process on plateaus from the flooding process on hills. Temporary labels can be used to merge both solutions. Using different algorithms for the plateau problem, such as parallel Voronoi-type algorithms, might further reduce or bound the data dependencies of the parallel algorithm.

References

[1] H. Embrechts. MIMD Divide-and-Conquer algorithms for geometric operations on binary images. PhD thesis, Department of Computer Science, Katholieke Universiteit Leuven, Belgium, March 1994.
[2] T. Johansson and E. Bengtsson. A new parallel MIMD connected component labeling algorithm. In PARLE: Parallel Architectures and Languages Europe, pages 801-804. LNCS, Springer-Verlag, 1994.
[3] F.T. Leighton. Introduction to Parallel Algorithms and Architectures. Morgan Kaufmann Publishers, San Mateo, California, 1992.
[4] A. Meijster and J.B.T.M. Roerdink. Computation of watersheds based on parallel graph algorithms. In P. Maragos, R.W. Schafer, and M.A. Butt, editors, Mathematical Morphology and its Applications to Image and Signal Processing, pages 305-312. Kluwer, 1996.
[5] A. Meijster and J.B.T.M. Roerdink. A Proposal for the Implementation of a Parallel Watershed Algorithm. In CAIP Computer Analysis of Images and Patterns, Prague, September 1995.
[6] F. Meyer. Topographic distance and watershed lines. Signal Processing, 38(1):113-125, July 1994.
[7] A. Moga and M. Gabbouj. A Parallel Watershed Algorithm Based on the Shortest Paths Computation. In P. Fritzson and L. Finmo, editors, Parallel Programming and Applications, 1995.
[8] A.N. Moga, T. Viero, M. Gabbouj, M. Nölle, G. Schreiber, and H. Burkhardt. Parallel Watershed Algorithm Based on Sequential Scanning. In Proc. of the IEEE Workshop on Nonlinear Signal and Image Processing, Halkidiki, Greece, June 1995.
[9] M. Nölle. Konzepte zur Entwicklung paralleler Algorithmen der digitalen Bildverarbeitung. PhD thesis, Technische Universität Hamburg-Harburg, December 1994.
[10] M. Nölle, G. Schreiber, and H. Schulz-Mirbach. PIPS - a general purpose Parallel Image Processing System. In G. Kropatsch, editor, 16. DAGM - Symposium Mustererkennung, Wien, September 1994. Reihe Informatik XPress, TU-Wien.
[11] H. Samet. Connected component labeling using quadtrees. Journal of the ACM, 28(3):487-501, July 1981.
[12] L. Vincent and P. Soille. Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations. IEEE Trans. on Pattern Analysis and Machine Intelligence, 13(6):583-598, 1991.

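Table 1: Execution time in seconds for different images, distributed stripe-wise on the processors. Column headings give the number of processors.

| Image | Sequential | 4 | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|---|
| CHESS | 75.21 | 21.48 | 11.28 | 6.88 | 5.72 | 6.51 | 9.67 |
| GOLD | 70.86 | 20.94 | 11.35 | 6.51 | 4.37 | 3.46 | 4.20 |
| PEPPERS | 72.12 | 21.11 | 11.10 | 6.41 | 4.01 | 3.14 | 2.62 |
| CRYSTAL | 71.77 | 21.39 | 11.90 | 6.76 | 4.09 | 3.34 | 3.65 |
| SIMPLE | 56.59 | 18.36 | 10.36 | 6.95 | 5.32 | 5.11 | 6.52 |
| SINCOS | 81.74 | 26.70 | 18.23 | 9.87 | 5.84 | 3.83 | 2.90 |

Table 2: Execution time in seconds for different images, distributed block-wise on the processors. Column headings give the number of processors.

| Image | Sequential | 4 | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|---|
| CHESS | 74.41 | 21.38 | 11.10 | 6.11 | 3.68 | 2.51 | 2.72 |
| GOLD | 70.20 | 21.54 | 12.17 | 7.35 | 5.22 | 4.03 | 3.96 |
| PEPPERS | 71.63 | 21.18 | 11.07 | 6.98 | 4.62 | 3.66 | 3.31 |
| CRYSTAL | 74.00 | 22.09 | 11.59 | 8.59 | 5.93 | 4.79 | 4.14 |
| SIMPLE | 55.67 | 19.53 | 11.82 | 8.22 | 6.05 | 5.18 | 5.01 |
| SINCOS | 81.20 | 23.46 | 13.86 | 7.76 | 6.31 | 5.28 | 5.01 |

Table 3: Time in seconds for the phases of the processor with maximum total execution time (Peppers). Column headings give the data distribution and the number of processors.

| Phase | Block 4 | Block 8 | Block 16 | Block 32 | Block 64 | Block 128 | Stripes 4 | Stripes 8 | Stripes 16 | Stripes 32 | Stripes 64 | Stripes 128 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Local Minima | 9.65 | 4.86 | 2.62 | 1.36 | 0.90 | 0.60 | 9.85 | 5.07 | 2.63 | 1.47 | 0.85 | 0.57 |
| Local Flooding | 8.06 | 4.03 | 2.01 | 1.01 | 0.50 | 0.25 | 8.09 | 4.04 | 2.04 | 1.03 | 0.52 | 0.28 |
| Plateaus | 1.58 | 1.08 | 1.62 | 1.62 | 1.64 | 1.77 | 1.27 | 0.84 | 0.93 | 0.78 | 1.06 | 1.00 |
| Merge | 0.04 | 0.06 | 0.07 | 0.09 | 0.13 | 0.21 | 0.05 | 0.07 | 0.08 | 0.12 | 0.12 | 0.12 |
| Split | 0.07 | 0.15 | 0.19 | 0.29 | 0.34 | 0.40 | 0.07 | 0.14 | 0.23 | 0.30 | 0.40 | 0.48 |
| Relabel | 0.83 | 0.40 | 0.21 | 0.11 | 0.06 | 0.03 | 0.82 | 0.41 | 0.21 | 0.11 | 0.06 | 0.04 |
| Initializations | 0.94 | 0.49 | 0.27 | 0.15 | 0.10 | 0.05 | 0.96 | 0.52 | 0.30 | 0.20 | 0.14 | 0.13 |
| Total | 21.18 | 11.07 | 6.98 | 4.62 | 3.66 | 3.31 | 21.11 | 11.10 | 6.41 | 4.01 | 3.14 | 2.62 |

Table 4: Time in seconds for the phases of the processor with maximum total execution time (Crystal). Column headings give the data distribution and the number of processors.

| Phase | Block 4 | Block 8 | Block 16 | Block 32 | Block 64 | Block 128 | Stripes 4 | Stripes 8 | Stripes 16 | Stripes 32 | Stripes 64 | Stripes 128 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Local Minima | 10.10 | 4.82 | 2.55 | 1.58 | 0.87 | 0.60 | 9.73 | 4.80 | 2.56 | 1.44 | 0.85 | 0.57 |
| Local Flooding | 8.14 | 4.06 | 2.03 | 1.02 | 0.51 | 0.25 | 8.07 | 4.05 | 2.05 | 1.04 | 0.53 | 0.28 |
| Plateaus | 2.23 | 1.69 | 3.27 | 2.66 | 2.77 | 2.51 | 1.90 | 1.95 | 1.29 | 0.83 | 1.19 | 1.98 |
| Merge | 0.08 | 0.05 | 0.08 | 0.13 | 0.17 | 0.18 | 0.06 | 0.05 | 0.08 | 0.07 | 0.08 | 0.15 |
| Split | 0.04 | 0.17 | 0.22 | 0.28 | 0.32 | 0.50 | 0.08 | 0.19 | 0.25 | 0.36 | 0.46 | 0.50 |
| Relabel | 0.59 | 0.32 | 0.20 | 0.12 | 0.07 | 0.05 | 0.63 | 0.36 | 0.25 | 0.16 | 0.09 | 0.05 |
| Initializations | 0.92 | 0.48 | 0.25 | 0.14 | 0.08 | 0.05 | 0.92 | 0.49 | 0.29 | 0.19 | 0.13 | 0.11 |
| Total | 22.09 | 11.59 | 8.59 | 5.93 | 4.79 | 4.14 | 21.39 | 11.90 | 6.76 | 4.09 | 3.34 | 3.65 |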

Figure 3: Speedup for different images, distributed stripe-wise on the processors. (Plot with curves for CHESS, GOLD, PEPPERS, CRYSTAL, SIMPLE and SINCOS; vertical axis from 0 to 30, horizontal axis from 0 to 128.)

Figure 4: Speedup for different images, distributed block-wise on the processors. (Plot with curves for CHESS, GOLD, PEPPERS, CRYSTAL, SIMPLE and SINCOS; vertical axis from 0 to 30, horizontal axis from 0 to 128.)


Example images and their results

Figure 5: input images

Figure 6: resulting images
