# Strongly Connected Components

A strongly connected component (SCC) is a subgraph such that there is a path from any vertex to every other vertex. A graph can contain more than one separate SCC. An SCC algorithm finds the maximal SCCs within a graph. Our implementation is based on the Divide-and-Conquer Strong Components (DCSC) algorithm. In each iteration, pick a pivot vertex `v` randomly, and find its descendant and predecessor sets, where descendant set `D_v` is the vertex reachable from `v`, and predecessor set `P_v` is the vertices which can reach `v` (stated another way, reachable from `v` through reverse edges). The intersection of these two sets is a strongly connected component `SCC_v`. The graph can be partitioned into 4 sets: `SCC_v`, descendants `D_v` excluding `SCC_v`, predecessors `P_v` excluding `SCC`, and the remainders `R_v`. It is proved that any SCC is a subset of one of the 4 sets . Thus, we can divide the graph into different subsets and detect the SCCs independently and iteratively.

The problem of this algorithm is unbalanced load and slow convergence when there are a lot of small SCCs, which is often the case in real-world use cases . We added two trimming stages to improve the performance: size-1 SCC trimming and weakly connected components.

The implementation of this algorithm requires reverse edges for all directed edges considered in the graph.

 Fleischer, Lisa K., Bruce Hendrickson, and Ali Pınar. "On identifying strongly connected components in parallel." International Parallel and Distributed Processing Symposium. Springer, Berlin, Heidelberg, 2000.

 Mclendon Iii, William, et al. "Finding strongly connected components in distributed graphs." Journal of Parallel and Distributed Computing 65.8 (2005): 901-910.

 Hong, Sungpack, Nicole C. Rodia, and Kunle Olukotun. "On fast parallel detection of strongly connected components (SCC) in small-world graphs." Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. ACM, 2013.

## Specifications

``````tg_scc (SET<STRING> v_type, SET<STRING> e_type, SET<STRING> rev_e_type,
INT top_k_dist, INT output_limit, INT max_iter = 500, INT iter_wcc = 5,
BOOL print_accum = TRUE, STRING attr= "", STRING file_path="")``````
Characteristic Value

Result

Assigns a component id (INT) to each vertex, such that members of the same component have the same id value.

Input Parameters

• `SET<STRING> v_type`: Names of vertex types to use

• `SET<STRING> e_type`: Names of edge types to use

• `SET<STRING> rev_e_type`: Names of reverse direction edge types to use

• `INT top_k_dist`: top k result in SCC distribution

• `INT output_limit`: If >=0, max number of vertices to output to JSON.

• `INT max_iter`: number of maximum iteration of the algorithm

• `INT iter_wcc`: find weakly connected components for the active vertices in this iteration, since the largest SCCs are already found after several iterations; usually a small number(3 to 10)

• `BOOL print_accum`: If True, output JSON to standard output

• `STRING result_attr`: If not empty, store community ID values (INT) to this attribute

• `STRING file_path`: If not empty, write output to this file.

Result Size

V = number of vertices

Time Complexity

O(iter*d), d = max(diameter of components)

Graph Types

Directed edges with reverse direction edges as well

## Example

We ran `scc` on the social26 graph. A portion of the JSON result is shown below.

``````[
{
"i": 1
},
{
"trim_set.size()": 8
},
{
"trim_set.size()": 5
},
{
"trim_set.size()": 2
},
{
"trim_set.size()": 2
},
{
"trim_set.size()": 0
},
{
"@@cluster_dist_heap": [
{
"csize": 9,
"num": 1
},
{
"csize": 1,
"num": 17
}
]
},``````

The first element `"i"=1` means the whole graph is processed in just one iteration. The 5 `"trim_set.size()"` elements mean there were 5 rounds of size-1 SCC trimming. The final `"@@.cluster_dist_heap" object"` reports on the size distribution of SCCs.There is one SCC with 9 vertices, and 17 SCCs with only 1 vertex in the graph.