Skip to content
Blog

Strongly Connected Components

A strongly connected component (SCC) in a directed graph is a maximal subgraph where every pair of vertices is mutually reachable - meaning for any two nodes u and v in the component, there exists a directed path from u to v and a directed path from v to u. This mutual reachability property means that nodes within an SCC form a cycle where they can all reach each other through directed paths, and no additional nodes can be added to the component while maintaining this property.

Kuzu implements two types of SCC algorithms:

You can choose between either of the two algorithms, depending on your use case. See below for more details.

Syntax

// BFS-based algorithm
CALL strongly_connected_components(
graph_name,
maxIterations := 100
)
RETURN node, group_id

Alias: scc

// DFS-based algorithm (Kosaraju's algorithm)
CALL strongly_connected_components_kosaraju(
graph_name
)
RETURN node, group_id

Alias: scc_ko

Parameters

ParameterTypeOptionalDefaultDescription
graph_nameSTRINGNo-Name of the projected graph to run the algorithm on
maxIterationsINT64Yes100Maximum number of iterations to run (This parameter is only for BFS-based algorithm)

Return

ColumnTypeDescription
nodeNODENode object
group_idINT64The ID of the SCC that the node belongs to

Example

Define schema

CREATE NODE TABLE Node(id STRING PRIMARY KEY);
CREATE REL TABLE Edge(FROM Node to Node);

Insert nodes and edges

CREATE (u0:Node {id: 'A'}),
(u1:Node {id: 'B'}),
(u2:Node {id: 'C'}),
(u3:Node {id: 'D'}),
(u4:Node {id: 'E'}),
(u5:Node {id: 'F'}),
(u6:Node {id: 'G'}),
(u7:Node {id: 'H'}),
(u8:Node {id: 'I'}),
(u0)-[:Edge]->(u1),
(u1)-[:Edge]->(u2),
(u1)-[:Edge]->(u4),
(u2)-[:Edge]->(u2),
(u2)-[:Edge]->(u5),
(u3)-[:Edge]->(u1),
(u3)-[:Edge]->(u6),
(u4)-[:Edge]->(u0),
(u4)-[:Edge]->(u5),
(u4)-[:Edge]->(u3),
(u5)-[:Edge]->(u7),
(u6)-[:Edge]->(u4),
(u6)-[:Edge]->(u5),
(u7)-[:Edge]->(u8),
(u8)-[:Edge]->(u5);

We first create a projected graph from the node and edge tables.

CALL project_graph('Graph', ['Node'], ['Edge']);

Then, you can run SCC using either of the two algorithms.

// BFS-based algorithm
CALL strongly_connected_components('Graph')
RETURN group_id, collect(node.id);
// DFS-based algorithm (Kosaraju's algorithm)
CALL strongly_connected_components_kosaraju('Graph')
RETURN group_id, collect(node.id);
┌──────────┬──────────────────┐
│ group_id │ COLLECT(node.id) │
│ INT64 │ STRING[] │
├──────────┼──────────────────┤
│ 3 │ [G,E,D,A,B] │
│ 0 │ [I,F,H] │
│ 8 │ [C] │
└──────────┴──────────────────┘

Frequently asked questions

What’s the difference between scc and scc_ko?

scc is a parallel BFS-based algorithm. While scc_ko is a DFS-based algorithm. Due to the nature of DFS, scc_ko can only run in single-threaded mode. Kosaraju’s algorithm is recommended in the following cases:

  • If your graph is very sparse, i.e. with a large number of nodes but very small number of edges; or
  • If your graph has a very high diameter

How is group_id assigned?

group_id is assigned based on Kuzu’s internal node offsets. Currently there is no way to assign group_id based on node properties.