![]() |
The two maximal cliques are circled. Note that e.g. {1, 2} is a clique but not a maximal clique, because both 1 and 2 are connected to 3. |
I read a paper [1] which is about finding triangles in a graph. A triangle is a clique with exactly 3 members. An interesting observation is made that we only need to consider sorted triplets (u, v, w), u <= v <= w, where the sorting key can be anything we like. In [1], the degree of the node in ascending order is used, which is smart because it leads to a smaller branching factor. It should be possible to apply the same idea to cliques of any size.
I came up with an algorithm like:
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
def find_cliques_v001(G): | |
adj = {u: {v for v in G[u] if v != u} for u in G} | |
higher_degree_adj = get_higher_degree_adj(G) | |
Q = [] | |
def get_cliques(cand, hdeg_adj_v): | |
for u in cand & hdeg_adj_v: | |
Q.append(u) | |
cand_u = cand & adj[u] | |
if not cand_u: | |
yield Q[:] | |
else: | |
for clique in get_cliques(cand_u, higher_degree_adj[u]): | |
yield clique | |
Q.pop() | |
return get_cliques(set(G), set(G)) |
Term explanation:
adj: adj[u] is a set of nodes adjacent to u, excluding u itself.
higher_degree_adj: higher_degree_adj[u] is a set of nodes adjacent to u, wit ha higher degree than u. If two adjacent nodes have the same degree, symmetry is broken randomly.
Q: is the current clique.
cand: is all nodes that are connected to all nodes in Q. If cand is empty, we have found a maximal clique, and we return Q.
When we add a node u to the current clique, we recurse but iterate only over the higher degree neighbours of u. Suppose u has a neighbour v with a lower degree, that is also in cand. If there is a maximal clique that contains both u and v, we will find it when we recurse from v.
Results
Let's try this out on some random graphs, and compare it to networkx's builtin clique finder [2]:
time / ms networkx v001
n=1e1, m=1e1 0.1 0.1
n=1e2, m=1e2 0.3 0.6
n=1e2, m=1e3 1.5 1.4
n=1e3, m=1e3 2.8 30.7
n=1e3, m=1e4 13.3 12.7
n=1e3, m=1e5 634.4 373.9
n=1e3, m=2e5 9507.6 6909.8
Wow! For the largest graphs, it is significantly faster! For the smaller graphs, probably the overhead of building the higher_degree_adj is not worth it.
However, most graphs in reality are not just "random" but they have some structure. Let's see how we do on these:
time / ms networkx v001
n=1e1, m=1e1 0.1 0.1
n=1e2, m=1e2 0.4 0.6
n=1e2, m=1e3 1.4 1.3
n=1e3, m=1e3 2.7 4.8
n=1e3, m=1e4 11.8 15.1
n=1e3, m=1e5 613.3 350.7
n=1e3, m=2e5 9242.6 6617.7
caveman(100,10) 12.4 81.3
windmill(10, 10) 0.9 7.1
nary_tree(3, 1000) 3.3 5.5
complete(20) 0.6 703.4
Whops! Dismal score on all the "special structure" graphs! Seems like I cheered too soon, here. Particularly the complete graph seems to be trivial for networkx, but is very cumbersome for my algorithm. So that is probably a clue to what the weakness is.
See Part II for the continuation of this!
References
[1] Roughgarden, Tim. "CS167: Reading in Algorithms Counting Triangles." (2014).
[2] networkx.algorithms.clique.py (find_clique)
Inga kommentarer:
Skicka en kommentar