![]() |
The two maximal cliques are circled. Note that e.g. {1, 2} is a clique but not a maximal clique, because both 1 and 2 are connected to 3. |
I read a paper [1] which is about finding triangles in a graph. A triangle is a clique with exactly 3 members. An interesting observation is made that we only need to consider sorted triplets (u, v, w), u <= v <= w, where the sorting key can be anything we like. In [1], the degree of the node in ascending order is used, which is smart because it leads to a smaller branching factor. It should be possible to apply the same idea to cliques of any size.
I came up with an algorithm like:
Term explanation:
adj: adj[u] is a set of nodes adjacent to u, excluding u itself.
higher_degree_adj: higher_degree_adj[u] is a set of nodes adjacent to u, wit ha higher degree than u. If two adjacent nodes have the same degree, symmetry is broken randomly.
Q: is the current clique.
cand: is all nodes that are connected to all nodes in Q. If cand is empty, we have found a maximal clique, and we return Q.
When we add a node u to the current clique, we recurse but iterate only over the higher degree neighbours of u. Suppose u has a neighbour v with a lower degree, that is also in cand. If there is a maximal clique that contains both u and v, we will find it when we recurse from v.
Results
Let's try this out on some random graphs, and compare it to networkx's builtin clique finder [2]:
time / ms networkx v001
n=1e1, m=1e1 0.1 0.1
n=1e2, m=1e2 0.3 0.6
n=1e2, m=1e3 1.5 1.4
n=1e3, m=1e3 2.8 30.7
n=1e3, m=1e4 13.3 12.7
n=1e3, m=1e5 634.4 373.9
n=1e3, m=2e5 9507.6 6909.8
Wow! For the largest graphs, it is significantly faster! For the smaller graphs, probably the overhead of building the higher_degree_adj is not worth it.
However, most graphs in reality are not just "random" but they have some structure. Let's see how we do on these:
time / ms networkx v001
n=1e1, m=1e1 0.1 0.1
n=1e2, m=1e2 0.4 0.6
n=1e2, m=1e3 1.4 1.3
n=1e3, m=1e3 2.7 4.8
n=1e3, m=1e4 11.8 15.1
n=1e3, m=1e5 613.3 350.7
n=1e3, m=2e5 9242.6 6617.7
caveman(100,10) 12.4 81.3
windmill(10, 10) 0.9 7.1
nary_tree(3, 1000) 3.3 5.5
complete(20) 0.6 703.4
Whops! Dismal score on all the "special structure" graphs! Seems like I cheered too soon, here. Particularly the complete graph seems to be trivial for networkx, but is very cumbersome for my algorithm. So that is probably a clue to what the weakness is.
See Part II for the continuation of this!
References
[1] Roughgarden, Tim. "CS167: Reading in Algorithms Counting Triangles." (2014).
[2] networkx.algorithms.clique.py (find_clique)
Inga kommentarer:
Skicka en kommentar