Label Propogation

Label Propagation is a heuristic method for determining communities. The idea is simple: If the plurality of your neighbors all bear the label X, then you should label yourself as also a member of X. The algorithm begins with each vertex having its own unique label. Then we iteratively update labels based on the neighbor influence described above. It is important that the order for updating the vertices be random. The algorithm is favored for its efficiency and simplicity, but it is not guaranteed to produce the same results every time.

In a variant version, some vertices could initially be known to belong to the same community. If they are well-connected to one another, they are likely to preserve their common membership and influence their neighbors,

Specifications

tg_label_prop (SET<STRING> v_type, SET<STRING> e_type, INT max_iter, INT output_limit,
BOOL print_accum = TRUE, STRING file_path = "", STRING attr = "")
Characteristic Value

Result

Assigns a component id (INT) to each vertex, such that members of the same component have the same id value.

Input Parameters

  • SET<STRING> v_type: Names of vertex types to use

  • SET<STRING> e_type: Names of edge types to use

  • INT max_iter: Number of maximum iteration of the algorithm

  • INT output_limit: If >=0, max number of vertices to output to JSON.

  • BOOL print_accum: If True, output JSON to standard output

  • STRING attr: If not empty, store community id values (INT) to this attribute

  • STRING file_path: If not empty, write output to this file.

Result Size

V = number of vertices

Time Complexity

O(E*k), E = number of edges, k = number of iterations.

Graph Types

Undirected edges

Example

This is the same graph that was used in the Connected Component example. The results are different, though. The quartet of Fiona, George, Howard, and Ivy have been split into 2 groups:

  • (George & Ivy) each connect to (Fiona & Howard) and to one another.

  • (Fiona & Howard) each connect to (George & Ivy) but not to one another.

Label Propagation tries to find natural clusters and separations within connected components. That is, it looks at the quality and pattern of connections. The Component Component algorithm simply asks the Yes or No question: Are these two vertices connected?

We set max_iter to 10, but the algorithm reaches a steady state after 3 iterations:

# Use _ for default/empty values
RUN QUERY tg_label_prop(["Person"], ["Coworker"], 10, -1, _, _, _)
Visualized results of example query on social10 graph with Coworker edges