
2015 International Conference on Computing, Networking and Communications, Cloud Computing and Big Data

Performance Comparison of Parallel Graph Coloring Algorithms on BSP Model using Hadoop

Rajiv Misra Department of Computer Science & Engineering Indian Institute of Technology, Patna Email: [email protected]

Nishant M Gandhi Department of Computer Science & Engineering Indian Institute of Technology, Patna Email: [email protected]

Abstract: Hadoop is now widely used to store large volumes of data generated by various sources. These data are often represented as large-scale graphs in order to solve real-world problems. To process them, several graph processing systems based on the Bulk Synchronous Parallel (BSP) model are available on top of Hadoop, such as Pregel and the Stanford Graph Processing System (GPS). The graph coloring problem is to assign a color to every vertex such that no two neighboring vertices have the same color. It has many practical applications in real-world data analytics. In this paper, we compare heuristic graph coloring algorithms on a Hadoop-based BSP model: Local Maxima First, Local Minima-Maxima First, Local Largest Degree First, and Local Smallest-Largest Degree First. We ran these algorithms on real-world graph datasets on our Hadoop cluster. The results show that the Local Smallest-Largest Degree First algorithm performs better than the other heuristic algorithms in terms of runtime and number of colors used.

Keywords: Parallel Algorithm, Hadoop, Graph Coloring Problem, Bulk Synchronous Parallel model

I. INTRODUCTION
Graph coloring is a well-known problem with many practical applications, such as frequency assignment [1] and scheduling [2]. The problem is defined as follows. Let G = (V, E) be a large undirected graph, where V is the set of vertices and E is the set of edges. Each edge has the form (i, j), where i, j ∈ V. The graph coloring problem is to assign a color to each vertex i ∈ V such that, for every edge (i, j) ∈ E, i and j do not receive the same color. Finding an optimal solution to the graph coloring problem is well known to be NP-hard [3]. However, when it comes to Big Data analytics, the speed of coloring is more important than minimizing the number of colors.

Today, Hadoop has been adopted by many companies as a Big Data solution for storage and computation [4]. The core computation model of Hadoop is MapReduce [5]. However, MapReduce is not an efficient solution for problems that require multiple MapReduce iterations, such as graph processing. Iterative MapReduce jobs require a lot of I/O operations, which become the performance bottleneck for MapReduce-based graph algorithms. Google introduced a graph processing system called Pregel [6]. Pregel is an in-memory graph processing platform, which makes it more efficient than iterative MapReduce jobs on top of Hadoop. The basic Pregel programming model comes from the Bulk Synchronous Parallel (BSP) model [7]. As Pregel is a proprietary system of Google, the open source community has developed similar systems such as Stanford GPS and Apache Giraph. These graph processing systems provide libraries and APIs that are intuitive for graph processing and algorithm design. In-memory processing further improves the computation time. Internally, the whole job is converted into a single MapReduce job with only a Map function, which reduces I/O operations.

In this paper, we discuss the programming model of Pregel-like graph processing systems. We also discuss heuristic distributed graph coloring algorithms: Local Smallest-Largest Degree First, Local Maxima First, Local Minima-Maxima First, and Local Largest Degree First. Real-world social network graph data are used for the experiments. We set up a small Hadoop cluster on our premises to test the results and performance of these algorithms.

The paper is organized as follows. Section 2 presents related work and background. Section 3 discusses the algorithms. Section 4 describes our experimental setup, datasets, and results. Section 5 presents conclusions and future work.

978-1-4799-6959-3/15/$31.00 ©2015 IEEE

II. RELATED WORK AND BACKGROUND
Significant work has been done on developing in-memory Pregel-like graph processing systems. At Stanford University, Dr. Jennifer Widom and her students developed the graph processing system called GPS (Graph Processing System) [8]. At the University of Southern California, Dr. Yogesh Simmhan et al. [9] built a similar graph processing system called GoFFish. Giraph is another open source project for in-memory graph processing, and it is now under the Apache Software Foundation [10].

The graph coloring problem has been extensively studied with parallel algorithms. Luby's MIS algorithm finds an independent set of vertices in parallel [11]. It is a randomized algorithm that assigns a random number to each uncolored vertex in every iteration. Jones and Plassmann improved this approach: instead of calculating a random number for each vertex in each iteration, they proposed calculating the random numbers only at the beginning and using them throughout the program [12]. Welsh and Powell proposed a similar approach for the graph coloring problem [13], using vertex degree instead of random numbers. Their algorithm identifies the independent set of vertices that have the highest degree among their neighbors and colors them. Matula proposed an improvement over Welsh and Powell's algorithm [14]. E. G. Boman et al. proposed a parallel graph coloring algorithm for distributed memory computers [15]. In their work, the graph is partitioned and distributed among processors, and each processor takes responsibility for coloring the partition it receives. The coloring is done in parallel over multiple iterations; in each iteration, the processors color the graph, communicate to detect conflicts, and resolve them. Their approach colors quickly but does not give an optimal solution. R. Abbasian and M. Mouhoub proposed a solution based on genetic algorithms [16].

Figure 1: Vertex State Transition Diagram

A. Programming Model of Pregel-like Systems
In Pregel-like systems, a program consists of a sequence of Supersteps < S0, S1, S2, ..., Sn >. The input to the program is a graph G0 in the form of a set of tuples. These tuples are represented as . The execution of the program on input G0 proceeds as shown below.
For each Superstep Sr = S0, S1, S2, ..., SR do:
1) Execute Compute:
• For input graph Gr, each vertex represented as tuple and the corresponding incoming multiset of messages represented as tuple are pushed for computation.
• Process each active input vertex. For S0, all vertices are active. Figure 1 represents the vertex state diagram.
• Each vertex can generate a multiset of messages for any destination vertex; the two do not have to be connected in the graph.
• The input graph Gr is also mutable.
2) Vote To Halt:
• For i = 0 to n, each vertex VertexId_i votes to halt.
3) Termination:
• For Superstep Sr, where r > 0, if all the vertices are in the inactive state and no message has been generated for the next Superstep, then the platform terminates the algorithm.

III. DISTRIBUTED GRAPH COLORING ALGORITHMS
We present the Local Smallest-Largest Degree First algorithm for large-scale graph coloring on Pregel-like systems. We also discuss three other approaches to this problem. All the algorithms discussed below are greedy in nature and do not guarantee an optimal solution. Certain assumptions are made in these algorithms. They are for undirected graphs. Each vertex is assumed to be associated with a unique number, which is its VertexId. Each vertex also has one variable to store any kind of data, called VertexValue. At the end of the computation, the color assigned to each vertex is stored in its VertexValue. Instead of a color, we assign a number to each vertex.

A. Local Maxima First
The Local Maxima First algorithm is based on Luby's algorithm. It follows a greedy approach, with greed over the maximum value of VertexId among neighbors. In the first Superstep, every vertex sets its VertexValue to -1 as part of initialization and sends its VertexId to all its neighbors for the next Superstep. A VertexValue of -1 indicates that the vertex is not yet colored. In every later Superstep, a vertex takes part in the computation only if it is not colored; otherwise it votes to halt. Each uncolored vertex examines its uncolored neighbors through the incoming messages and determines whether it has the largest VertexId among them. If so, it colors itself by setting its VertexValue to the current Superstep number. Otherwise, it sends its VertexId to all its neighbors for the next Superstep and votes to halt.

Figure 2: Example:Local Maxima First
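As a rough illustration of the superstep loop and the Local Maxima First rule, the following Python sketch simulates the computation on a single machine. It is not written against the Pregel, GPS, or Giraph API. The function name local_maxima_first, the dictionary-based inbox, and the five-vertex graph are assumptions made for illustration, and the sketch assumes that every uncolored vertex is evaluated in each superstep until it is colored.

```python
from typing import Dict, List, Set

UNCOLORED = -1  # same sentinel as VertexValue = -1 in the text


def local_maxima_first(adj: Dict[int, Set[int]]) -> Dict[int, int]:
    """Single-machine simulation of Local Maxima First (illustrative only)."""
    # Superstep 0: every vertex is uncolored and sends its id to its neighbors.
    value = {v: UNCOLORED for v in adj}
    inbox: Dict[int, List[int]] = {v: list(adj[v]) for v in adj}
    superstep = 0

    # Assumption: uncolored vertices stay active until they are colored.
    while any(c == UNCOLORED for c in value.values()):
        superstep += 1
        outbox: Dict[int, List[int]] = {v: [] for v in adj}
        for v in adj:
            if value[v] != UNCOLORED:
                continue  # colored vertices do no further work
            # Messages carry the ids of the neighbors that are still uncolored.
            if all(v > msg for msg in inbox[v]):
                value[v] = superstep  # local maximum: color = superstep number
            else:
                for u in adj[v]:
                    outbox[u].append(v)  # retry in the next superstep
        inbox = outbox  # synchronization barrier between supersteps
    return value


# Hypothetical undirected graph with edges 1-2, 2-3, 3-4, 3-5.
graph = {1: {2}, 2: {1, 3}, 3: {2, 4, 5}, 4: {3}, 5: {3}}
print(local_maxima_first(graph))  # {1: 4, 2: 3, 3: 2, 4: 1, 5: 1}
```

On this small graph the vertex with the largest remaining VertexId in each neighborhood is colored first, so the chain 3, 2, 1 is colored one vertex per superstep.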



Algorithm 1: Local Maxima First
Input: Graph in the adjacency list representation
Output: Graph with each node labeled with a color
class LMF {
  void Compute(Vertex, Message) {
    Set Maxima = true
    if Superstep = 0 then
      Set Vertex.Value = -1
      Send Vertex.Id to all the Neighbors
    else if Vertex.Value = -1 then
      for each Message: msg do
        if Vertex.Id < msg then
          Maxima = false
      if Maxima = true then
        Set Vertex.Value = CurrentSuperstep
      else
        Send Vertex.Id to all the Neighbors
    VoteToHalt()
  }
}
End

Example 1: Consider the graph in Figure 2. Initially, each vertex sends its VertexId to its neighbors. In the next Superstep, the vertices {4,5} come out as winners with the maximum VertexId among their neighbors and are colored. Vertices that are already colored do not participate in any computation in further Supersteps, and only uncolored vertices send their VertexId to their neighbors. In the 2nd Superstep, vertex {3} is colored, and in the last two Supersteps, vertices {2} and {1} are colored, respectively.

Analysis: The soundness of the algorithm can be justified as follows. For any (i, j) ∈ E, if i and j are not colored in Superstep Sr−1, then at most one of i and j can be colored in Superstep Sr, because VertexId(i) ≠ VertexId(j). Both cannot be colored at the same time. In this algorithm, vertices are maintained in two sets, ColoredSet and NotColoredSet. Initially, ColoredSet is empty and NotColoredSet contains all the vertices. In each Superstep, every vertex v in NotColoredSet sends O(degree(v)) messages, where v ∈ V. Each message takes O(1) space. The vertices are selected on the basis of their VertexId, and VertexId has no relation to the graph structure, so the performance of the algorithm depends on how the VertexIds happen to be assigned and is effectively random.

B. Local Minima-Maxima First
Local Minima-Maxima First is the second algorithm, an improvement over the Local Maxima First algorithm. It follows a greedy approach, with greed over both the maximum and minimum values of VertexId among neighbors. The idea is to select the vertices with the minimum VertexId and the vertices with the maximum VertexId in their neighborhoods at the same time and color them. The two groups are assigned different colors, because a vertex with a locally minimum VertexId and a vertex with a locally maximum VertexId can be directly connected to each other. Intuitively, since two sets of vertices are colored at a time, this algorithm should finish in close to half the time of the first algorithm and should take nearly half as many Supersteps. The experiments later support this intuition.

Algorithm 2: Local Minima-Maxima First
Input: Graph in the adjacency list representation
Output: Graph with each node labeled with a color
class LMMF {
  void Compute(Vertex, Message) {
    Set Maxima = true
    Set Minima = true
    if Superstep = 0 then
      Set Vertex.Value = -1
      Send Vertex.Id to all the Neighbors
    else if Vertex.Value = -1 then
      for each Message: msg do
        if Vertex.Id < msg then
          Maxima = false
        if Vertex.Id > msg then
          Minima = false
      if Maxima = true then
        Set Vertex.Value = 2 ∗ CurrentSuperstep − 1
      else if Minima = true then
        Set Vertex.Value = 2 ∗ CurrentSuperstep
      else
        Send Vertex.Id to all the Neighbors
    VoteToHalt()
  }
}
End

Example 2: Consider the graph in Figure 3. Initialization is the same as in Example 1. In the next Superstep, the vertex sets {4,5} and {1} become winners with the maximum and minimum VertexId among their neighbors, respectively, and are colored with two different colors. In the last Superstep, vertices {2} and {3} color themselves.

Analysis: The soundness of the algorithm follows in the same way as for the previous algorithm. For any (i, j) ∈ E, it is possible that i is selected for the Local Minima independent set and j is selected for the Local Maxima independent set in the same Superstep, but each independent set is given a different color, so i and j cannot receive the same color.
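The Local Minima-Maxima First rule can be sketched in the same single-machine style. This is an illustrative Python simulation, not code for the Pregel, GPS, or Giraph API. The function name local_minima_maxima_first, the inbox dictionaries, and the example graph are assumptions, and uncolored vertices are assumed to be evaluated in every superstep until they are colored.

```python
from typing import Dict, List, Set

UNCOLORED = -1  # same sentinel as VertexValue = -1 in the text


def local_minima_maxima_first(adj: Dict[int, Set[int]]) -> Dict[int, int]:
    """Single-machine simulation of Local Minima-Maxima First (illustrative)."""
    # Superstep 0: every vertex is uncolored and sends its id to its neighbors.
    value = {v: UNCOLORED for v in adj}
    inbox: Dict[int, List[int]] = {v: list(adj[v]) for v in adj}
    superstep = 0

    # Assumption: uncolored vertices stay active until they are colored.
    while any(c == UNCOLORED for c in value.values()):
        superstep += 1
        outbox: Dict[int, List[int]] = {v: [] for v in adj}
        for v in adj:
            if value[v] != UNCOLORED:
                continue  # colored vertices do no further work
            is_max = all(v > msg for msg in inbox[v])
            is_min = all(v < msg for msg in inbox[v])
            if is_max:
                value[v] = 2 * superstep - 1  # odd color for local maxima
            elif is_min:
                value[v] = 2 * superstep      # even color for local minima
            else:
                for u in adj[v]:
                    outbox[u].append(v)       # retry in the next superstep
        inbox = outbox  # synchronization barrier between supersteps
    return value


# Hypothetical undirected graph with edges 1-2, 2-3, 3-4, 3-5.
graph = {1: {2}, 2: {1, 3}, 3: {2, 4, 5}, 4: {3}, 5: {3}}
print(local_minima_maxima_first(graph))  # {1: 2, 2: 4, 3: 3, 4: 1, 5: 1}
```

The largest color in the output is 4 = 2 ∗ 2, so this run needs two coloring supersteps. Using separate odd and even colors is what keeps a local minimum and an adjacent local maximum from sharing a color within one superstep.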



The message complexity and message space complexity of Local Minima-Maxima First are the same as for Local Maxima First. The performance of this algorithm is also random, for the same reason as for Local Maxima First, but it takes roughly half the time of the Local Maxima First algorithm to complete on the same input graph.

Figure 3: Example: Local Minima-Maxima First

C. Local Largest Degree First
The Local Largest Degree First algorithm uses vertex degree instead of VertexId. It follows a greedy approach, with greed over the maximum vertex degree among neighbors. The basic intuition is that coloring the vertices with higher degree first may lead to a better solution in terms of the number of colors used. In this algorithm, initialization is done in the first Superstep. VertexValue is set to -1, which indicates that the vertex is not colored. Each vertex sends its VertexId and degree to all its neighbors. The degree of a vertex can be determined using the getNumEdge() method. In further Supersteps, each uncolored vertex determines whether it has the largest degree among its neighbors. If so, it colors itself; otherwise, it sends its VertexId and degree to its neighbors. When degrees are equal, the tie is broken in favor of the highest VertexId.

Algorithm 3: Local Largest Degree First
Input: Graph in the adjacency list representation
Output: Graph with each node labeled with a color
class LLDF {
  void Compute(Vertex, Message) {
    Set MaxVal = true
    if Superstep = 0 then
      Set Vertex.Value = -1
      Message = Vertex.Id + Vertex.getNumEdge()
      Send Message to all the Neighbors
    else if Vertex.Value = -1 then
      for each Message: msg do
        VertexIdMsg = msg.contain(1)
        NumEdgeMsg = msg.contain(2)
        if Vertex.getNumEdge() < NumEdgeMsg then
          MaxVal = false
        if Vertex.getNumEdge() = NumEdgeMsg then
          if VertexIdMsg > Vertex.Id then
            MaxVal = false
      if MaxVal = true then
        Set Vertex.Value = CurrentSuperstep
      else
        Message = Vertex.Id + Vertex.getNumEdge()
        Send Message to all the Neighbors
    VoteToHalt()
  }
}
End

Example 3: Consider the graph in Figure 4. The graph is initialized, and each vertex sends its VertexId and VertexDegree to its neighbors. In the next Superstep, vertex {3} comes out as the clear winner. Vertices that are colored do not participate in any computation in further Supersteps, and only uncolored vertices send their VertexId and VertexDegree to their neighbors. Likewise, in the following Supersteps, the vertex sets {2,5} and {1,4} are colored.

Analysis: The soundness of the algorithm can be justified as follows. For any (i, j) ∈ E, suppose i and j are not colored in Superstep Sr−1. There are three cases for i and j: (a) degree(i) > degree(j), (b) degree(i) < degree(j), and (c) degree(i) = degree(j). In cases (a) and (b), i and j cannot be colored in the same Superstep. In case (c), we know that VertexId(i) ≠ VertexId(j), so only one of i and j can be colored, based on VertexId. This algorithm also maintains vertices in two sets, ColoredSet and NotColoredSet. In each Superstep, an independent set is identified from NotColoredSet and its vertices are moved into ColoredSet. The vertices with the highest degree in their neighborhood are selected for the independent set, which leaves NotColoredSet with vertices of low degree. A NotColoredSet subgraph with low vertex degree leads to better color optimization [17][18]. So this approach performs better than the two algorithms discussed above.

D. Local Smallest-Largest Degree First
The Local Smallest-Largest Degree First algorithm is a modified and improved version of the previous algorithm. The idea is similar to that of the Local Minima-Maxima First algorithm: create two independent sets at a time using vertex degree and color them with two different colors. Among the uncolored vertices, we select the set of vertices with the smallest degree and the set of vertices with the largest degree at the same time. When neighbors have the same degree, the VertexId is used to break the tie. In case of a tie, the vertex with the larger VertexId is further considered for the set of largest degree vertices and the vertex with the smallest VertexId is considered



Algorithm 4: Local Smallest-Largest Degree First
Input: Graph in the adjacency list representation
Output: Graph with each node labeled with a color
class LSLDF {
  void Compute(Vertex, Message) {
    Set MaxVal = true
    Set MinVal = true
    if Superstep = 0 then
      Set Vertex.Value = -1
      Message = Vertex.id + Vertex.getNumEdge()
      Send Message to all the Neighbors
    else if Vertex.Value = -1 then
      for each Message: msg do
        VertexIdMsg = msg.contain(1)
        NumEdgeMsg = msg.contain(2)
        if Vertex.getNumEdge() < NumEdgeMsg then
          MaxVal = false
        else if Vertex.getNumEdge() > NumEdgeMsg then
          MinVal = false
        else if Vertex.getNumEdge() = NumEdgeMsg then
          if VertexIdMsg > Vertex.Id then
            MaxVal = false
          if VertexIdMsg