Strongly Connected Components
A strongly connected component (SCC) is a subgraph such that there is a path from any vertex to every other vertex. A graph can contain more than one separate SCC. An SCC algorithm finds the maximal SCCs within a graph. Our implementation is based on the Divide-and-Conquer Strong Components (DCSC) algorithm[1]. In each iteration, pick a pivot vertex v
randomly, and find its descendant and predecessor sets, where descendant set D_v
is the vertex reachable from v
, and predecessor set P_v
is the vertices which can reach v
(stated another way, reachable from v
through reverse edges). The intersection of these two sets is a strongly connected component SCC_v
. The graph can be partitioned into 4 sets: SCC_v
, descendants D_v
excluding SCC_v
, predecessors P_v
excluding SCC
, and the remainders R_v
. It is proved that any SCC is a subset of one of the 4 sets [1]. Thus, we can divide the graph into different subsets and detect the SCCs independently and iteratively.
The problem of this algorithm is unbalanced load and slow convergence when there are a lot of small SCCs, which is often the case in real-world use cases [3]. We added two trimming stages to improve the performance: size-1 SCC trimming[2] and weakly connected components[3].
The implementation of this algorithm requires reverse edges for all directed edges considered in the graph.
[1] Fleischer, Lisa K., Bruce Hendrickson, and Ali Pınar. "On identifying strongly connected components in parallel." International Parallel and Distributed Processing Symposium. Springer, Berlin, Heidelberg, 2000.
[2] Mclendon Iii, William, et al. "Finding strongly connected components in distributed graphs." Journal of Parallel and Distributed Computing 65.8 (2005): 901-910.
[3] Hong, Sungpack, Nicole C. Rodia, and Kunle Olukotun. "On fast parallel detection of strongly connected components (SCC) in small-world graphs." Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. ACM, 2013.
Specifications
tg_scc (SET<STRING> v_type, SET<STRING> e_type, SET<STRING> rev_e_type,
INT top_k_dist, INT output_limit, INT max_iter = 500, INT iter_wcc = 5,
BOOL print_accum = TRUE, STRING attr= "", STRING file_path="")
Characteristic | Value |
---|---|
Result |
Assigns a component id (INT) to each vertex, such that members of the same component have the same id value. |
Input Parameters |
|
Result Size |
V = number of vertices |
Time Complexity |
O(iter*d), d = max(diameter of components) |
Graph Types |
Directed edges with reverse direction edges as well |
Example
We ran scc
on the social26 graph. A portion of the JSON result is shown below.
[
{
"i": 1
},
{
"trim_set.size()": 8
},
{
"trim_set.size()": 5
},
{
"trim_set.size()": 2
},
{
"trim_set.size()": 2
},
{
"trim_set.size()": 0
},
{
"@@cluster_dist_heap": [
{
"csize": 9,
"num": 1
},
{
"csize": 1,
"num": 17
}
]
},
The first element "i"=1
means the whole graph is processed in just one iteration. The 5 "trim_set.size()"
elements mean there were 5 rounds of size-1 SCC trimming. The final "@@.cluster_dist_heap" object"
reports on the size distribution of SCCs.There is one SCC with 9 vertices, and 17 SCCs with only 1 vertex in the graph.