A parallel maximum clique algorithm for large and massive sparse graphs

Pablo San Segundo¹, Alvaro Lopez¹, Jorge Artieda¹, Panos M. Pardalos²

¹ Centre for Automation and Robotics (UPM-CSIC), Jose Gutiérrez Abascal 2, Madrid 28006, Spain. Email: [email protected]
² Center for Applied Optimization, University of Florida, 401 Weil Hall, P.O. Box 116595, Gainesville, FL 32611-6595, USA
ABSTRACT
This paper describes BBMCPara, a new parallel exact maximum clique algorithm tailored for large and massive sparse graphs. The paper first presents a sequential algorithm, BBMCSP, which builds on ideas from a leading published bit-parallel algorithm for middle-size graphs. It employs heavy pre-processing and a new sparse bitset encoding to outperform other state-of-the-art algorithms by up to several orders of magnitude over a set of real networks. BBMCPara parallelizes BBMCSP by splitting the work according to a preprocessing step of the latter. On a 20-core computer, it averages speedups close to an order of magnitude over real graphs of up to 3 million vertices. According to the reported results and to the best of our knowledge, BBMCPara is currently the fastest algorithm for large and massive real networks.

Keywords: massive, bitstring, branch-and-bound, maximum clique, parallel, combinatorial optimization
1 Introduction
The maximum clique problem is a challenging and deeply studied NP-hard problem in computer science [1]. Given a simple undirected graph G = (V, E), a clique is a subset of vertices which are pairwise adjacent, i.e. which induces a complete subgraph. The goal of the maximum clique problem is to find a clique of the largest cardinality ω(G). Known applications appear in a wide variety of fields, such as bioinformatics [2-4], computer vision [5] and robotics [6-7]. The recent profusion of data sets in the form of large or massive networks (social, collaboration, infrastructure, etc.) has brought renewed interest in the problem's scalability, since clique kernels can help to identify clusters or simply provide a deeper insight into structure. This paper describes a parallel exact maximum clique algorithm for such networks.
In the last decade, the state of the art has been dominated by branch-and-bound algorithms which compute approximate color bounds for each subproblem traversed during search. Notable examples are the Tomita et al. family of algorithms MCS [8] and MCR [9], the bit-parallel family BBMC [10], BBMCR [11], BBMCL [12] and BBMCX [13] of San Segundo et al., and the MaxSAT-based bound algorithm of Li et al., first implemented in MaxCLQ [14]. For an overview of the overarching ideas and earlier relevant algorithms [15-18], we recommend two recent surveys [19-20]. None of the above-mentioned algorithms is, however, tailored for massive graphs; one major drawback is that they encode the adjacency matrix in full, which normally results in unacceptable memory requirements.

Algorithms which exploit multi-core architectures are much less common than sequential ones. One example is the master-slave type algorithm of Pardalos et al. [21]. More recently, McCreesh and Prosser [22] and Depolli et al. [23] have reported good performance for multi-core parallelism of two sequential exact maximum clique algorithms, despite earlier claims in the literature indicating the opposite. The two algorithms show consistent speedups over a number of DIMACS¹ graphs. Moreover, superlinear speedups occur in a few graphs whose structure allows optimality to be proved very quickly once a feasible solution is found. Both [22] and [23] use BBMC [10] as the sequential algorithm in the worker threads, and are thus expected to suffer from the already mentioned memory problems when confronted with massive graphs.

PMC [24] and FMC [25] are two of the few algorithms in the literature tailored for large and massive networks. Both employ an adjacency list representation and unroll the first level of the search tree to enforce early pruning. Of the two, PMC is the only one which exploits multi-core parallelism, so it is taken as the reference in this work.
This paper first describes the sequential algorithm BBMCSP, tailored for massive graphs. The algorithm takes bit-parallel BBMC as its starting point and uses a novel bit-compressed representation of the adjacency matrix to reduce memory requirements, while preserving the efficient bitmask operations of BBMC. Moreover, it improves the preprocessing techniques described by PMC and FMC, and adapts them to the new encoding. The paper then describes an exact parallel maximum clique algorithm, BBMCPara, which uses BBMCSP as its sequential algorithm, and evaluates it over a set of public real networks. In the reported results, BBMCPara clearly outperforms previously published algorithms (sometimes by up to several orders of magnitude).

The rest of the paper is structured as follows: section 2 covers useful definitions and other preliminaries; section 3 describes a related state-of-the-art algorithm. The contributions of the paper appear in sections 4 and 5, which describe the new sequential and multi-core algorithms. Section 6 presents validation, and section 7 gives conclusions and a summary of contributions.
¹ http://dimacs.rutgers.edu/Challenges/
2 Preliminaries
A simple undirected graph G = (V, E) consists of a finite set of vertices V = {v1, v2, …, vn} and a finite set of edges E made up of pairs (u, v) of distinct vertices (E ⊆ V × V). Two vertices u and v are said to be adjacent (or neighbors) if (u, v) ∈ E. The neighborhood of a vertex, N(v), or NG(v) when the graph needs to be mentioned explicitly, is defined as N(v) = {u ∈ V | (u, v) ∈ E}. Any subset of vertices U ⊆ V induces a subgraph G[U] = (U, E[U]) such that all its edges have both endpoints in U.

A vertex k-coloring of G, C(G), is an assignment of a label or color from {1, 2, …, k} to every vertex of G such that the endpoints of any edge receive different colors. Any feasible k-coloring partitions the vertex set V into k disjoint subsets C1, C2, …, Ck such that $\bigcup_{i=1}^{k} C_i = V$ and each Ci is an independent set (i.e. a set of pairwise non-adjacent vertices) for all i ∈ {1, …, k}. We refer to the number of colors |C(G)| as the size of the coloring. The smallest number of colors required to color a graph G is called the chromatic number and is denoted by χ(G). An important known property, which relates vertex coloring and maximum clique, is that ω(G) ≤ χ(G) [26]. It follows, trivially, that the size of any feasible vertex coloring is also an upper bound for ω(G).

Most leading exact branch-and-bound maximum clique algorithms are constructive, and combine exhaustive enumeration of maximal cliques (possibly with repetitions) with a pruning strategy based on approximate vertex coloring. At each step of the search, typically implemented as a new recursive call to the algorithm, a vertex is selected to enlarge a clique. Thus, each branch of the search tree contains a maximal clique, and all leaf nodes are checked to find the clique of largest cardinality.

Before going into specific details, we list some useful definitions and notations which appear in the algorithms described in this work:
− U: the list of vertices of the current subproblem.
− Uv: the list of vertices of the child subproblem which results from branching on vertex v.
− S: the clique under construction in the current step of the search.
− Smax: the incumbent (best) clique found at any point during search.
− L ⊆ U: the list of candidate vertices for branching in the current subproblem.
− Lv ⊆ Uv: the list of candidate vertices of the child subproblem which results from branching on vertex v.
− c(v): the color number assigned to vertex v.
− kmin: a pruning threshold; vertices assigned a lower color number in any child subproblem Uv will be pruned.
− greedy sequential coloring procedure (SEQ): an approximate coloring heuristic which assigns to each vertex, in a predefined order, the lowest possible color number consistent with the current partial coloring.
− greedy independent set coloring (or class coloring): an implementation of SEQ which produces color sets sequentially in increasing label order.
− Cv: a vertex coloring of the subgraph induced by Uv, i.e. C(G[Uv]).
Finally, we also consider the following notations and definitions in connection with the initial sorting of vertices of a graph:
− degree of a vertex v (deg(v)): the number of vertices adjacent to v.
− maximum graph degree (ΔG): the maximum degree of any of its vertices.
− width of a vertex (w(vi)): the number of preceding vertices adjacent to vi.
− width of a vertex ordering: the maximum width of any of its vertices.

A minimum width ordering may be obtained by iteratively removing vertices with minimum degree from the original set of vertices, and placing them in reverse order in the new set. The resulting natural order is also called a degeneracy ordering [27].
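The construction of a minimum width ordering described above can be illustrated with a minimal C++ sketch. It is a direct, quadratic-time transcription of the definition, not the implementation used in the paper; the names Graph, min_width_order and ordering_width are hypothetical, and vertices are assumed to be numbered from 0.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical representation: g[v] lists the neighbors of vertex v (0-based).
using Graph = std::vector<std::vector<int>>;

// Repeatedly removes a vertex of minimum remaining degree and fills the new
// ordering from the back, so the first vertex removed ends up last.
std::vector<int> min_width_order(const Graph& g) {
    const int n = static_cast<int>(g.size());
    std::vector<int> deg(n), order(n);
    std::vector<bool> removed(n, false);
    for (int v = 0; v < n; ++v) deg[v] = static_cast<int>(g[v].size());
    for (int pos = n - 1; pos >= 0; --pos) {
        int best = -1;
        for (int v = 0; v < n; ++v)  // linear scan: O(|V|^2) overall
            if (!removed[v] && (best < 0 || deg[v] < deg[best])) best = v;
        removed[best] = true;
        order[pos] = best;
        for (int u : g[best])
            if (!removed[u]) --deg[u];
    }
    return order;
}

// Width of an ordering: the maximum, over all vertices, of the number of
// adjacent vertices that precede it in the ordering.
int ordering_width(const Graph& g, const std::vector<int>& order) {
    const int n = static_cast<int>(g.size());
    std::vector<int> pos(n);
    for (int i = 0; i < n; ++i) pos[order[i]] = i;
    int width = 0;
    for (int v = 0; v < n; ++v) {
        int w = 0;
        for (int u : g[v])
            if (pos[u] < pos[v]) ++w;
        width = std::max(width, w);
    }
    return width;
}
```

For a triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 0, the pendant vertex is removed first and therefore placed last, and the resulting ordering has width 2.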
3 The reference sequential algorithm
Algorithm 1 describes SEARCH, the branch-and-bound bit-parallel maximum clique algorithm employed as a starting point in the paper, which roughly corresponds with BBMCI [11]. The enumeration process, performed in the main loop of the algorithm (steps 1-2, 4-6, 12, 14-15), constructs one maximal clique for each branch of the search tree, and stores it in set S. At each step (a recursive call to the algorithm), S is enlarged by the vertex v selected for branching in step 2. Once a leaf node is reached, the algorithm checks if the new maximal clique improves the incumbent solution (step 8). For a given subproblem U, the vertices of the new child subproblem are those adjacent to the branched vertex which are also in U. In SEARCH this is implemented by an efficient set intersection operation in step 6.

SEARCH employs the sequential greedy coloring procedure SEQ described in COLOR (Algorithm 2) to compute an upper bound for the size of the maximum clique in each subproblem. Branching on a vertex v becomes unnecessary if it can be inferred from its color label c(v) that any clique containing vertices S ∪ {v} cannot improve the incumbent solution (step 3 of SEARCH). The actual implementation of COLOR employs class coloring, i.e. it computes color sets sequentially with the help of sets COL, the color set under construction, and UNCOL, the set of remaining uncolored vertices. The control flow contains two nested loops. The outer loop copies UNCOL to COL (step 3), and keeps track of the current color assignment. Each iteration of the inner loop branches greedily on the next vertex in COL and removes, by means of a set difference operation, its neighbor set (step 6).

Coloring is not only used to prune the search space. Since [8], it has also been consistently employed as a branching strategy, i.e. vertices with maximum color are explored first. In practice, COLOR conveniently outputs the list L of candidate vertices of the child subproblem sorted by increasing color label, so that they are later selected by non-increasing color when taken in reverse order from L in step 2 of SEARCH.

A bitstring (also bitset or bitmap) offers fast set operations, such as set intersection or set union, and is more cache-friendly than a traditional list. On the other hand, it is slower than a list in operations which concern single elements, such as enumeration. Moreover, the order of its elements is predefined upon construction and cannot be changed.
Maximum clique algorithms are well known to benefit from a bitstring representation [10-12]. In SEARCH, bitstrings encode the adjacency matrix, the list of vertices which make up the current subproblem U and auxiliary sets COL and UNCOL. As a result, child subproblem construction (step 6 of SEARCH) and COL update (step 6 of COLOR) are set operations that benefit from bitmasks. We note that the list of candidate vertices L should not be bit-encoded, since it contains the branching order.

We end this description with a brief note on initialization of data structures. SEARCH uses simple upper bounds C(V) for maximum clique at the root node. Specifically, for all vi ∈ V, c(vi) is the minimum of the index i and the maximum graph degree incremented by one. Note that this does not constitute a legal coloring in the general case, but ω(G[{v1, ..., vi}]) ≤ c(vi) holds for all vi ∈ V, which ensures completeness if vertices are selected at the root node in reverse order. Much more effort goes into sorting vertices at the root node, which is well known to have a significant impact on the size of the search tree in exact maximum clique algorithms. A common strategy in the literature is to branch on vertices with small degree first. To achieve this, and consistent with the initialization of C(V), vertices are initially sorted according to minimum width and taken in reverse order.

Algorithm 1. Reference sequential maximum clique algorithm.
Input: A simple graph G = (V, E) sorted according to degeneracy
Output: A maximum clique in Smax
Initial step: U ← V, S ← ∅, Smax ← ∅, c(vi) ← min{i, ΔG + 1} ∀ vi ∈ V, C ← C(V), L ← V

SEARCH(U, S, Smax, C, L)
1.  repeat until L = ∅
2.    select a vertex v from L in reverse order and update L    //max. color branching
3.    if (|S| + c(v) ≤ |Smax|) then return
4.    U ← U \ {v}
5.    S ← S ∪ {v}
6.    Uv ← U ∩ NG(v)
7.    if Uv = ∅ then
8.      if |S| > |Smax| then Smax ← S
9.    else
10.     kmin ← max(|Smax| − |S|, 1)
11.     {Cv, Lv} ← COLOR(Uv, kmin)
12.     SEARCH(Uv, S, Smax, Cv, Lv)
13.   endif
14.   S ← S \ {v}
15. endrepeat
Algorithm 2. COLOR procedure which performs an independent set sequential coloring.
COLOR(Uv, kmin)
Initial step: UNCOL ← Uv, COL ← ∅, L ← ∅, k ← 0
1.  repeat until UNCOL = ∅
2.    k ← k + 1    //index of a new color subset
3.    COL ← UNCOL
4.    repeat until COL = ∅
5.      select the next vertex v from COL in order
6.      COL ← COL \ N(v)
7.      UNCOL ← UNCOL \ {v}
8.      if (k ≥ kmin) then
9.        Ck ← Ck ∪ {v}
10.       L ← L ∪ {v}    //L is sorted by increasing color number
11.     endif
12.   endrepeat
13. endrepeat
14. return {Ckmin, Ckmin+1, …, Ck, L}
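To make the interplay between SEARCH and COLOR concrete, the following is a minimal C++ sketch of both procedures. It makes several simplifying assumptions that are not part of the paper: every vertex set is a single 64-bit word (so |V| ≤ 64), the initial degeneracy sorting is omitted, and the names CliqueSolver, add_edge, solve, lowest_bit and count_bits are hypothetical. The comments refer to the step numbers of Algorithms 1 and 2.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bits = std::uint64_t;  // one 64-bit block, so |V| <= 64 in this sketch

// Index of the lowest set bit; precondition: b != 0.
int lowest_bit(Bits b) {
    int i = 0;
    while (((b >> i) & 1u) == 0) ++i;
    return i;
}

// Number of set bits (used to compute vertex degrees).
int count_bits(Bits b) {
    int c = 0;
    for (; b != 0; b &= b - 1) ++c;
    return c;
}

struct CliqueSolver {
    std::vector<Bits> adj;     // adj[v] = bitmask of N(v)
    std::vector<int> S, Smax;  // clique under construction and incumbent

    explicit CliqueSolver(int n) : adj(static_cast<std::size_t>(n), Bits{0}) {}

    void add_edge(int u, int v) {
        adj[u] |= Bits{1} << v;
        adj[v] |= Bits{1} << u;
    }

    // COLOR: class coloring of U. Vertices with color >= kmin are appended
    // to L by increasing color; c[i] holds the color of L[i].
    void color(Bits U, int kmin, std::vector<int>& L, std::vector<int>& c) const {
        Bits uncol = U;
        for (int k = 1; uncol != 0; ++k) {     // steps 1-2
            Bits col = uncol;                  // step 3
            while (col != 0) {                 // step 4
                const int v = lowest_bit(col); // step 5
                col &= ~adj[v];                // step 6
                col &= ~(Bits{1} << v);        // v itself also leaves COL
                uncol &= ~(Bits{1} << v);      // step 7
                if (k >= kmin) {               // steps 8-10
                    L.push_back(v);
                    c.push_back(k);
                }
            }
        }
    }

    // SEARCH: branches on the vertices of L in reverse order.
    void search(Bits U, const std::vector<int>& L, const std::vector<int>& c) {
        for (int i = static_cast<int>(L.size()) - 1; i >= 0; --i) {
            const int v = L[i];                           // step 2
            const int s = static_cast<int>(S.size());
            const int best = static_cast<int>(Smax.size());
            if (s + c[i] <= best) return;                 // step 3
            U &= ~(Bits{1} << v);                         // step 4
            S.push_back(v);                               // step 5
            const Bits Uv = U & adj[v];                   // step 6
            if (Uv == 0) {
                if (S.size() > Smax.size()) Smax = S;     // step 8
            } else {
                const int kmin = std::max(best - (s + 1), 1);  // step 10
                std::vector<int> Lv, cv;
                color(Uv, kmin, Lv, cv);                  // step 11
                search(Uv, Lv, cv);                       // step 12
            }
            S.pop_back();                                 // step 14
        }
    }

    // Root call with the simple bounds c(vi) = min{i, maxdeg + 1} (1-based i).
    std::vector<int> solve() {
        const int n = static_cast<int>(adj.size());
        int maxdeg = 0;
        for (Bits row : adj) maxdeg = std::max(maxdeg, count_bits(row));
        std::vector<int> L(n), c(n);
        for (int i = 0; i < n; ++i) {
            L[i] = i;
            c[i] = std::min(i + 1, maxdeg + 1);
        }
        const Bits all = (n >= 64) ? ~Bits{0} : ((Bits{1} << n) - 1);
        S.clear();
        Smax.clear();
        search(all, L, c);
        return Smax;
    }
};
```

On a graph made of the 4-clique {1, 2, 3, 4} plus a vertex 0 adjacent only to vertex 1, solve() returns the four clique vertices. Skipping the degeneracy sort does not affect exactness, since the root bounds hold for any vertex order, but it would be expected to enlarge the search tree on harder instances.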
4 A sequential algorithm for large and massive graphs
SEARCH is an efficient algorithm for small and middle size graphs. It does not scale well, however, for large or massive sparse graphs because the adjacency matrix is stored in full, a common occurrence in the vast majority of efficient exact maximum clique algorithms in literature. In particular, all vertex bitstrings in SEARCH allocate memory for |V|/w blocks of bits, where w is the size of the register word, typically 32 or 64 in commercial CPUs. Note that this is independent of the number of vertices they contain at any point in time.
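As a rough illustration of the scale involved, consider a graph with |V| = 3 × 10^6 vertices (the largest order mentioned in the abstract) and a register word of w = 64 bits; both values are chosen here only as an example. Each full bitstring then needs 46 875 blocks, and a full adjacency matrix would need on the order of a terabyte:

```latex
\[
\frac{|V|}{w} = \frac{3 \times 10^{6}}{64} = 46\,875 \ \text{blocks per bitstring},
\qquad
\frac{|V|^{2}}{8} = \frac{9 \times 10^{12}}{8} = 1.125 \times 10^{12} \ \text{bytes} \approx 1.1 \ \text{TB}.
\]
```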
4.1 A new bit-set encoding

Dealing with large and massive graphs requires some form of compression to reduce memory requirements as well as spurious operations. Noteworthy existing algorithms are PMC and FMC. Both use row compression in the form of an edge list which maps each row j of the adjacency matrix to the neighbor set of vertex j.

    #include <vector>

    typedef unsigned long long BITBLOCK;

    struct element {
        unsigned int block_index;
        BITBLOCK b64;               //a 64-bit block
    };

    std::vector<element> mySparseBitstring;

Figure 1. A possible sparse bitstring declaration in C++
The first (and critical) improvement to tailor SEARCH for massive networks is a new bitstring encoding for vertex sets which only allocates memory for non-empty blocks of bits, i.e. those which contain at least one vertex. From now on, they will be referred to as sparse bitstrings. We first attempted to implement them as pairs of an associative container, but this did not yield the desired results. The current implementation encodes sparse bitstrings as a sequence container of blocks of bits ordered by block index. We further extend the encoding to a graph by mapping the rows of its adjacency matrix to the new sparse bitstrings. Figure 1 shows a possible declaration of a sparse bitstring in C++.

The main advantages of the new sparse bitstrings are reduced memory allocation and the elimination of spurious bitmask operations over empty blocks. An obvious disadvantage is the additional overhead both to access individual vertices, now a logarithmic operation, and to perform set computations, since only blocks with corresponding indexes may be masked. Nevertheless, it is possible to reduce this overhead significantly for exact maximum clique considering the following:
− The structure of large and massive networks: while it is true that set operations require index block alignment, few blocks are to be expected in most bitstrings due to the sparse nature of the problem.
− No upkeep of empty blocks: blocks which become empty after a set deletion operation are left in the sparse bitstring. As before, this relaxation should not introduce a significant overhead because the block density of bitstrings is expected to be low according to the problem's structure.
− Vertex enumeration does not incur further overhead, since blocks are always kept sorted by index.
− Insertion is not required: insertion only occurs in sets S (step 5 of SEARCH) and L (step 10 of COLOR). Both are implemented as conventional arrays.
The source code for sparse bitstrings has been released as part of the BITSCAN library [28] and hosted in the dependency manager Biicode [29]. BITSCAN also implements the classical bitstring representation.
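The operations discussed in this section can be sketched on top of the declaration of Figure 1. The code below is only an illustration under stated assumptions and is not the BITSCAN API: the function names is_member, intersect, erase_set and from_sorted are hypothetical, and blocks are assumed to be sorted by block_index. It shows the logarithmic membership test, the block alignment needed by set operations, and the policy of leaving emptied blocks in place.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using BITBLOCK = std::uint64_t;

struct element {
    unsigned int block_index;
    BITBLOCK b64;  // a 64-bit block
};
using SparseBitstring = std::vector<element>;  // sorted by block_index

// Builds a sparse bitstring from vertices given in increasing order.
SparseBitstring from_sorted(const std::vector<unsigned int>& vs) {
    SparseBitstring s;
    for (unsigned int v : vs) {
        if (s.empty() || s.back().block_index != v / 64) s.push_back({v / 64, 0});
        s.back().b64 |= BITBLOCK{1} << (v % 64);
    }
    return s;
}

// Membership test: binary search for the block, then test the bit.
bool is_member(const SparseBitstring& s, unsigned int v) {
    const unsigned int idx = v / 64;
    auto it = std::lower_bound(
        s.begin(), s.end(), idx,
        [](const element& e, unsigned int i) { return e.block_index < i; });
    return it != s.end() && it->block_index == idx &&
           ((it->b64 >> (v % 64)) & 1u) != 0;
}

// lhs <- lhs ∩ rhs. Blocks are aligned by index in a single merge-like pass;
// blocks of lhs that become empty are left in place (no upkeep).
void intersect(SparseBitstring& lhs, const SparseBitstring& rhs) {
    std::size_t j = 0;
    for (element& e : lhs) {
        while (j < rhs.size() && rhs[j].block_index < e.block_index) ++j;
        if (j < rhs.size() && rhs[j].block_index == e.block_index)
            e.b64 &= rhs[j].b64;
        else
            e.b64 = 0;
    }
}

// lhs <- lhs \ rhs. Only blocks present in both operands are masked.
void erase_set(SparseBitstring& lhs, const SparseBitstring& rhs) {
    std::size_t j = 0;
    for (element& e : lhs) {
        while (j < rhs.size() && rhs[j].block_index < e.block_index) ++j;
        if (j < rhs.size() && rhs[j].block_index == e.block_index)
            e.b64 &= ~rhs[j].b64;
    }
}
```

Intersecting {1, 70, 200} with {70, 130, 200, 201} leaves vertices 70 and 200, while the block that held vertex 1 stays allocated with all bits cleared. Neither set operation ever inserts a block into the left operand, which is what allows the sorted sequence to be kept without any upkeep.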
4.2 k-core decomposition

The notion of core in graphs was introduced by Seidman in [30]. A k-core (core of order k) is a subset of vertices H that induces a subgraph such that all its vertices have degree at least k and H is maximal with this property (i.e. there is no other subset of vertices with this property that strictly contains H). The individual cores of a graph constitute a hierarchical partitioning of its vertex set, so that every core of order k contains the remaining cores of higher order. The core number of a vertex K(v) is the highest order of a core to which that vertex belongs. The core number of a graph K(G) is the highest core number of any of its vertices.

The importance of k-cores, concerning finding a maximum clique in massive graphs, is based on the following facts:
− The core number of a graph (plus one) is a tighter bound for maximum clique than maximum graph degree (plus one). It is a valid bound because every vertex of a clique of size ω(G) has degree ω(G) − 1 inside the clique, so the clique is contained in a core of order ω(G) − 1. In many real networks, a very small number of individuals are highly connected, e.g. networks with power-law degree distributions; thus maximum graph degree does not capture the structure adequately. We note, however, that core-based bounds are expected to be less tight than those obtained through sequential coloring.
− Core decomposition may be computed in O(|E|) [31]: this is very important in the case of the graphs considered in this work. Note that a direct implementation based on minimum width will run in O(|V|^2), which is inadequate.
− A core decomposition algorithm also gives a degeneracy ordering, the natural order given by the cores of the vertices. It may therefore be used both for bounding and initial sorting.
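A linear-time core decomposition can be sketched with a bucket sort of vertices by degree. The code below is a minimal illustration written for this section, not the code of [31] or of the paper's implementation; the names Graph, CoreResult and core_decomposition are hypothetical, and vertices are assumed to be numbered from 0.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical representation: g[v] lists the neighbors of vertex v (0-based).
using Graph = std::vector<std::vector<int>>;

struct CoreResult {
    std::vector<int> core;   // core[v] = K(v)
    std::vector<int> order;  // vertices by non-decreasing core number
};

CoreResult core_decomposition(const Graph& g) {
    const int n = static_cast<int>(g.size());
    std::vector<int> deg(n), pos(n), vert(n);
    int maxdeg = 0;
    for (int v = 0; v < n; ++v) {
        deg[v] = static_cast<int>(g[v].size());
        maxdeg = std::max(maxdeg, deg[v]);
    }
    // bin[d] = first position in vert of the vertices with current degree d
    std::vector<int> bin(maxdeg + 1, 0);
    for (int v = 0; v < n; ++v) ++bin[deg[v]];
    int start = 0;
    for (int d = 0; d <= maxdeg; ++d) {
        const int count = bin[d];
        bin[d] = start;
        start += count;
    }
    for (int v = 0; v < n; ++v) {
        pos[v] = bin[deg[v]];
        vert[pos[v]] = v;
        ++bin[deg[v]];
    }
    for (int d = maxdeg; d > 0; --d) bin[d] = bin[d - 1];
    bin[0] = 0;
    // Process vertices by increasing current degree. When v is processed,
    // deg[v] is final and equals K(v); each higher-degree neighbor loses one
    // degree and is moved to the front of its bucket in constant time.
    for (int i = 0; i < n; ++i) {
        const int v = vert[i];
        for (int u : g[v]) {
            if (deg[u] > deg[v]) {
                const int du = deg[u], pu = pos[u];
                const int pw = bin[du], w = vert[pw];
                if (u != w) {
                    pos[u] = pw; vert[pu] = w;
                    pos[w] = pu; vert[pw] = u;
                }
                ++bin[du];
                --deg[u];
            }
        }
    }
    return {deg, vert};
}
```

For the 4-clique {1, 2, 3, 4} with an extra vertex 0 adjacent only to vertex 1, the sketch gives K(0) = 1 and K(v) = 3 for the clique vertices, so the core bound K(G) + 1 = 4 matches ω(G), whereas the degree bound is 4 + 1 = 5. Reading the returned order backwards gives the vertices in the reverse-of-removal order used for minimum width in section 2.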
4.3 The sequential algorithm

Algorithm 3 outlines the new algorithm BBMCSP, which attempts to reduce the scale of the problem as much as possible before the actual search begins. It starts by computing core numbers for all vertices (step 1) and finds an initial solution (step 2) in the same way as the PMC algorithm [24]. Vertices with a core number lower than |Smax| are then explicitly removed from the graph in step 3 and the new graph G′ is sorted according to degeneracy (as derived from core analysis). The main search loop (steps 5-9) branches on vertices in G′ in reverse order, unrolls the first level of the search tree and calls INITIAL_SEARCH to analyze each child root subproblem. This is an important idea, because subproblems are expected to be much smaller than the order of the input graph. In the best case, they may be simplified again; at worst, they will require less memory in the new sparse bit-encoding. Moreover, first level unrolling allows for an easy job distribution in the proposed parallel algorithm (see section 5).

INITIAL_SEARCH is described in Algorithm 4. First, a simple degree-based pruning is attempted on the subproblem in step 2. If this fails, a sequential color-based pruning is attempted (steps 3-4) followed by a core-based one (steps 6-7). Finally, vertices are removed from the candidate list if their core number is inadequate (step 8) and the remaining subproblem is ordered according to degeneracy (step 9). The actual NP-hard step occurs when SPARSE_SEARCH is called (step 11), which refers to the SEARCH procedure described in section 3, implemented now with the new sparse bit-encoding. It is noteworthy that the vertex core numbers computed in step 6 of INITIAL_SEARCH are used as the initial color labels of SPARSE_SEARCH. This utilizes most of the information obtained during pre-processing.

Algorithm 3. BBMCSP, the new maximum clique algorithm for massive graphs.
Input: A simple graph G = (V, E)
Output: A maximum clique in Smax

BBMCSP(G)
1. Perform core analysis and compute K(G)
2. Smax ← INITIAL_CLIQUE(G)    //an initial solution
3. V′ ← V \ {v ∈ V : K(v)