Garage of Code: Finding all maximal cliques Part I

I want to find all maximal cliques in a given graph G. A subgraph Q is a clique if all nodes in Q are connected to each other. In a clique, everyone knows everyone. We call Q a maximal clique if there is no node outside of Q that is connected to all nodes in Q. Example:

The two maximal cliques are circled. Note that e.g. {1, 2} is a clique but not a maximal clique, because both 1 and 2 are connected to 3.

I read a paper [1] which is about finding triangles in a graph. A triangle is a clique with exactly 3 members. An interesting observation is made that we only need to consider sorted triplets (u, v, w), u <= v <= w, where the sorting key can be anything we like. In [1], the degree of the node in ascending order is used, which is smart because it leads to a smaller branching factor. It should be possible to apply the same idea to cliques of any size.

I came up with an algorithm like:

Term explanation:

adj: adj[u] is a set of nodes adjacent to u, excluding u itself.

higher_degree_adj: higher_degree_adj[u] is a set of nodes adjacent to u, wit ha higher degree than u. If two adjacent nodes have the same degree, symmetry is broken randomly.

Q: is the current clique.

cand: is all nodes that are connected to all nodes in Q. If cand is empty, we have found a maximal clique, and we return Q.

When we add a node u to the current clique, we recurse but iterate only over the higher degree neighbours of u. Suppose u has a neighbour v with a lower degree, that is also in cand. If there is a maximal clique that contains both u and v, we will find it when we recurse from v.

Results

Let's try this out on some random graphs, and compare it to networkx's builtin clique finder [2]:

time / ms       networkx        v001
n=1e1, m=1e1         0.1         0.1
n=1e2, m=1e2         0.3         0.6
n=1e2, m=1e3         1.5         1.4
n=1e3, m=1e3         2.8        30.7
n=1e3, m=1e4        13.3        12.7
n=1e3, m=1e5       634.4       373.9
n=1e3, m=2e5      9507.6      6909.8

Wow! For the largest graphs, it is significantly faster! For the smaller graphs, probably the overhead of building the higher_degree_adj is not worth it.

However, most graphs in reality are not just "random" but they have some structure. Let's see how we do on these:

time / ms                     networkx               v001
n=1e1, m=1e1                       0.1                0.1
n=1e2, m=1e2                       0.4                0.6
n=1e2, m=1e3                       1.4                1.3
n=1e3, m=1e3                       2.7                4.8
n=1e3, m=1e4                      11.8               15.1
n=1e3, m=1e5                     613.3              350.7
n=1e3, m=2e5                    9242.6             6617.7
caveman(100,10)                   12.4               81.3
windmill(10, 10)                   0.9                7.1
nary_tree(3, 1000)                 3.3                5.5
complete(20)                       0.6              703.4

Whops! Dismal score on all the "special structure" graphs! Seems like I cheered too soon, here. Particularly the complete graph seems to be trivial for networkx, but is very cumbersome for my algorithm. So that is probably a clue to what the weakness is.

See Part II for the continuation of this!

References

[1] Roughgarden, Tim. "CS167: Reading in Algorithms Counting Triangles." (2014).

[2] networkx.algorithms.clique.py (find_clique)

Garage of Code

codecogs equations

tisdag 8 oktober 2019

Finding all maximal cliques Part I

Inga kommentarer:

Skicka en kommentar