Best algorithm for detecting cycles in a directed graph
Best algorithm for detecting cycles in a directed graph
Question
What is the most efficient algorithm for detecting all cycles within a directed graph?
I have a directed graph representing a schedule of jobs that need to be executed, a job being a node and a dependency being an edge. I need to detect the error case of a cycle within this graph leading to cyclic dependencies.
Accepted Answer
Tarjan's strongly connected components algorithm has O(|E| + |V|)
time complexity.
For other algorithms, see Strongly connected components on Wikipedia.
Read more... Read less...
Given that this is a schedule of jobs, I suspect that at some point you are going to sort them into a proposed order of execution.
If that's the case, then a topological sort implementation may in any case detect cycles. UNIX tsort
certainly does. I think it is likely that it is therefore more efficient to detect cycles at the same time as tsorting, rather than in a separate step.
So the question might become, "how do I most efficiently tsort", rather than "how do I most efficiently detect loops". To which the answer is probably "use a library", but failing that the following Wikipedia article:
has the pseudo-code for one algorithm, and a brief description of another from Tarjan. Both have O(|V| + |E|)
time complexity.
According to Lemma 22.11 of Cormen et al., Introduction to Algorithms (CLRS):
A directed graph G is acyclic if and only if a depth-first search of G yields no back edges.
This has been mentioned in several answers; here I'll also provide a code example based on chapter 22 of CLRS. The example graph is illustrated below.
CLRS' pseudo-code for depth-first search reads:
In the example in CLRS Figure 22.4, the graph consists of two DFS trees: one consisting of nodes u, v, x, and y, and the other of nodes w and z. Each tree contains one back edge: one from x to v and another from z to z (a self-loop).
The key realization is that a back edge is encountered when, in the DFS-VISIT
function, while iterating over the neighbors v
of u
, a node is encountered with the GRAY
color.
The following Python code is an adaptation of CLRS' pseudocode with an if
clause added which detects cycles:
import collections
class Graph(object):
def __init__(self, edges):
self.edges = edges
self.adj = Graph._build_adjacency_list(edges)
@staticmethod
def _build_adjacency_list(edges):
adj = collections.defaultdict(list)
for edge in edges:
adj[edge[0]].append(edge[1])
return adj
def dfs(G):
discovered = set()
finished = set()
for u in G.adj:
if u not in discovered and u not in finished:
discovered, finished = dfs_visit(G, u, discovered, finished)
def dfs_visit(G, u, discovered, finished):
discovered.add(u)
for v in G.adj[u]:
# Detect cycles
if v in discovered:
print(f"Cycle detected: found a back edge from {u} to {v}.")
# Recurse into DFS tree
if v not in finished:
dfs_visit(G, v, discovered, finished)
discovered.remove(u)
finished.add(u)
return discovered, finished
if __name__ == "__main__":
G = Graph([
('u', 'v'),
('u', 'x'),
('v', 'y'),
('w', 'y'),
('w', 'z'),
('x', 'v'),
('y', 'x'),
('z', 'z')])
dfs(G)
Note that in this example, the time
in CLRS' pseudocode is not captured because we're only interested in detecting cycles. There is also some boilerplate code for building the adjacency list representation of a graph from a list of edges.
When this script is executed, it prints the following output:
Cycle detected: found a back edge from x to v.
Cycle detected: found a back edge from z to z.
These are exactly the back edges in the example in CLRS Figure 22.4.
The simplest way to do it is to do a depth first traversal (DFT) of the graph.
If the graph has n
vertices, this is a O(n)
time complexity algorithm. Since you will possibly have to do a DFT starting from each vertex, the total complexity becomes O(n^2)
.
You have to maintain a stack containing all vertices in the current depth first traversal, with its first element being the root node. If you come across an element which is already in the stack during the DFT, then you have a cycle.
Start with a DFS: a cycle exists if and only if a back-edge is discovered during DFS. This is proved as a result of white-path theorum.
In my opinion, the most understandable algorithm for detecting cycle in a directed graph is the graph-coloring-algorithm.
Basically, the graph coloring algorithm walks the graph in a DFS manner (Depth First Search, which means that it explores a path completely before exploring another path). When it finds a back edge, it marks the graph as containing a loop.
For an in depth explanation of the graph coloring algorithm, please read this article: http://www.geeksforgeeks.org/detect-cycle-direct-graph-using-colors/
Also, I provide an implementation of graph coloring in JavaScript https://github.com/dexcodeinc/graph_algorithm.js/blob/master/graph_algorithm.js