# Strongly Connected Components

A strongly connected component (SCC) is a subgraph such that there is a path from any vertex to every other vertex. A graph can contain more than one separate SCC. An SCC algorithm finds the maximal SCCs within a graph. Our implementation is based on the Divide-and-Conquer Strong Components (DCSC) algorithm[1]. In each iteration, pick a pivot vertex `v` randomly, and find its descendant and predecessor sets, where descendant set `D_v` is the vertex reachable from `v`, and predecessor set `P_v` is the vertices which can reach `v` (stated another way, reachable from `v` through reverse edges). The intersection of these two sets is a strongly connected component `SCC_v`. The graph can be partitioned into 4 sets: `SCC_v`, descendants `D_v` excluding `SCC_v`, predecessors `P_v` excluding `SCC`, and the remainders `R_v`. It is proved that any SCC is a subset of one of the 4 sets [1]. Thus, we can divide the graph into different subsets and detect the SCCs independently and iteratively.

The problem of this algorithm is unbalanced load and slow convergence when there are a lot of small SCCs, which is often the case in real-world use cases [3]. We added two trimming stages to improve the performance: size-1 SCC trimming[2] and weakly connected components[3].

The implementation of this algorithm requires reverse edges for all directed edges considered in the graph.

[1] Fleischer, Lisa K., Bruce Hendrickson, and Ali Pınar. "On identifying strongly connected components in parallel." International Parallel and Distributed Processing Symposium. Springer, Berlin, Heidelberg, 2000.

[2] Mclendon Iii, William, et al. "Finding strongly connected components in distributed graphs." Journal of Parallel and Distributed Computing 65.8 (2005): 901-910.

[3] Hong, Sungpack, Nicole C. Rodia, and Kunle Olukotun. "On fast parallel detection of strongly connected components (SCC) in small-world graphs." Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. ACM, 2013.

## Specifications

``````tg_scc (SET<STRING> v_type, SET<STRING> e_type, SET<STRING> rev_e_type,
INT top_k_dist, INT output_limit, INT max_iter = 500, INT iter_wcc = 5,
BOOL print_accum = TRUE, STRING attr= "", STRING file_path="")``````
Characteristic Value

Result

Assigns a component id (INT) to each vertex, such that members of the same component have the same id value.

Input Parameters

• `SET<STRING> v_type`: Names of vertex types to use

• `SET<STRING> e_type`: Names of edge types to use

• `SET<STRING> rev_e_type`: Names of reverse direction edge types to use

• `INT top_k_dist`: top k result in SCC distribution

• `INT output_limit`: If >=0, max number of vertices to output to JSON.

• `INT max_iter`: number of maximum iteration of the algorithm

• `INT iter_wcc`: find weakly connected components for the active vertices in this iteration, since the largest SCCs are already found after several iterations; usually a small number(3 to 10)

• `BOOL print_accum`: If True, output JSON to standard output

• `STRING result_attr`: If not empty, store community ID values (INT) to this attribute

• `STRING file_path`: If not empty, write output to this file.

Result Size

V = number of vertices

Time Complexity

O(iter*d), d = max(diameter of components)

Graph Types

Directed edges with reverse direction edges as well

## Example

We ran `scc` on the social26 graph. A portion of the JSON result is shown below.

``````[
{
"i": 1
},
{
"trim_set.size()": 8
},
{
"trim_set.size()": 5
},
{
"trim_set.size()": 2
},
{
"trim_set.size()": 2
},
{
"trim_set.size()": 0
},
{
"@@cluster_dist_heap": [
{
"csize": 9,
"num": 1
},
{
"csize": 1,
"num": 17
}
]
},``````

The first element `"i"=1` means the whole graph is processed in just one iteration. The 5 `"trim_set.size()"` elements mean there were 5 rounds of size-1 SCC trimming. The final `"@@.cluster_dist_heap" object"` reports on the size distribution of SCCs.There is one SCC with 9 vertices, and 17 SCCs with only 1 vertex in the graph.