Jaccard Similarity of Neighborhoods (Single Source)
The Jaccard index measures the relative overlap between two sets. To compare two vertices by Jaccard similarity, first select a set of values for each vertex. For example, a set of values for a Person could be the cities the Person has lived in. Then the Jaccard index is computed for the two vectors.
The Jaccard index of two sets A and B is defined as follows:
The value ranges from 0 to 1. If A and B are identical, then Jaccard(A, B) = 1. If both A and B are empty, we define the value to be 0.
Specifications
In the current
tg_jaccard_nbor_ss (VERTEX source, STRING e_type, STRING rev_e_type, INT top_k = 100,
BOOL print_accum = TRUE, STRING similarity_edge_type = "",STRING file_path = "")
Characteristic | Value |
---|---|
Result |
The top k vertices in the graph that have the highest similarity scores, along with their scores. The result is available in three forms:
|
Input Parameters |
|
Result Size |
|
Time Complexity |
O(D^2), D = outdegree of vertex v |
Graph Types |
Undirected or directed edges, unweighted edges |
The algorithm will not output more than K vertices, so the algorithm may arbitrarily choose to output one vertex over another if there are tied similarity scores.
Example
Using the movie graph, we run jaccard_nbor_ss("Neil", 5)
:
[
{
"@@result_topK": [
{
"vertex1": "Neil",
"vertex2": "Kat",
"score": 0.5
},
{
"vertex1": "Neil",
"vertex2": "Kevin",
"score": 0.4
},
{
"vertex1": "Neil",
"vertex2": "Jing",
"score": 0.2
}
]
}
]
If the source vertex (person) doesn’t have any common neighbors (movies) with any other vertex (person), such as Elena in our example, the result will be an empty list:
[
{
"@@result_topK": []
}
]