Garage of Code: Finding all maximal cliques Part II

Finding all maximal cliques Part I

In the previous post, I presented an algorithm for finding all maximal cliques in a graph. A subgraph Q is a clique if all nodes in Q are connected to each other. We call Q a maximal clique if there is no node outside of Q that is connected to all nodes in Q. The maximal cliques are not to be confused with the "maximum clique", the maximal clique that is largest in some sense (e.g. number of nodes or sum of edge weights).

The key idea of the algorithm was that we only need to consider sorted cliques, where the sorting key can be chosen arbitrarily. Sorting the nodes based on their degree, so that we first append lower degree nodes to cliques, should help us reduce the remaining pool of candidate nodes more quickly. The algorithm was indeed faster than networkx's implementation for large, random graphs. The speedup was about 30%. However, for graphs with more structure, particularly the complete graph, the algorithm is several times slower than the networkx implementation. To see why, we have to look at the method that networkx uses.

It is objectively trivial to find all maximal cliques in the complete graph (there is just one, the set of all nodes), so it should be a trivial case for a maximal clique-finding algorithm. The networkx implementation is based on research, such as [1, 2, 3]. The common key idea is to use a pivot element. During the recursion, we always keep track of the current clique, and the remaining candidates, the nodes that are connected to all the nodes in the current clique. The pivot element can be chosen arbitrarily among the candidates, but we shall see that there is an optimal way to choose it. The pivot element P partitions the candidates into two groups: those that are neighbours of P, call those P-minus, and those that are not, call those P-plus. Note that P is not a neighbour of itself, so P is in P-plus.

Now, adding any subset of P-minus to the current clique will not form a maximal clique, because we can always add P to make it larger (since P is a neighbour of all all nodes in P-minus). So, we don't need to branch on any element in P-minus. The optimal way to choose P is of course the one that produces the largest P-minus, since that prunes the most branches.

Let's examine what the pivot trick does to the complete graph, say K(10). In the beginning, all nodes are candidates. All candidates have 9 neighbours, so we pick a P at random. Now we loop through all non-neighbours of P, which happens to be just P! In the recursion, we feed the algorithm with the neighbours of P, which is K(9), the complete graph with 9 nodes. The same thing happens every time we recurse. In practice, the branching factor for the complete graph is just 1, and the running time is linear.

Can the pivot trick be used together with the node sorting? If there is, I haven't found it.

References

[1] Cazals, Frédéric, and Chinmay Karande. "A note on the problem of reporting maximal cliques." Theoretical Computer Science 407.1-3 (2008): 564-568.

[2] Tomita, Etsuji, Akira Tanaka, and Haruhisa Takahashi. "The worst-case time complexity for generating all maximal cliques and computational experiments." Theoretical Computer Science 363.1 (2006): 28-42.

[3] Bron, C. and Kerbosch, "Algorithm 457: finding all cliques of an undirected graph". *Communications of the ACM* 16, 9 (Sep. 1973), 575--577.

Garage of Code

codecogs equations

onsdag 9 oktober 2019

Finding all maximal cliques Part II

Inga kommentarer:

Skicka en kommentar