Node2Vec
Node2Vec is a node embedding algorithm that uses random walks in the graph to create a vector representation of a node.
A random walk starts at a node and iteratively selects a neighboring node to visit next, with each neighbor assigned a probability of being chosen. This transforms the graph structure into a collection of linear sequences of nodes. For each node, we are left with a list of other nodes from its local or extended neighborhood.
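The walk generation described above can be sketched in a few lines of Python. This is only an illustration, not the code behind tg_random_walk: the adjacency dictionary, the function names, and the parameters `length`, `walks_per_node`, `p`, and `q` are assumptions made for the example. The weights follow the standard node2vec scheme, in which a return parameter `p` and an in-out parameter `q` bias which neighbor is chosen next.

```python
import random


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Return one node2vec-style walk of up to `length` nodes from `start`.

    `adj` maps each node to the set of its neighbors (hypothetical input format).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])  # sorted only to make runs reproducible
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move farther away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(adj, length, walks_per_node, p=1.0, q=1.0, seed=0):
    """Start `walks_per_node` walks from every node in the graph."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in adj:
            walks.append(biased_walk(adj, node, length, p, q, rng))
    return walks


# Small undirected example graph (made up for illustration).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
walks = generate_walks(adj, length=4, walks_per_node=2)
```

With `p = q = 1` every neighbor is equally likely, which gives a plain uniform random walk. Each resulting list is one linear sequence of nodes of the kind the next step consumes.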
Once the above step is complete, the algorithm uses a variation of the word2vec model from the language modeling community to turn each node into a vector. The vectors are trained to predict the likelihood of visiting a given node in a random walk from each starting node.
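To make the word2vec step more concrete, the sketch below shows how walks could be turned into training examples, assuming the skip-gram variant of word2vec: every node in a walk is paired with the nodes that appear within a fixed window around it. The function name `skipgram_pairs` and the `window` parameter are hypothetical and do not correspond to arguments of tg_node2vec_query. The model training itself is omitted here.

```python
def skipgram_pairs(walks, window=2):
    """Collect (center, context) node pairs from a list of walks.

    Two nodes form a pair when they are at most `window` positions
    apart in the same walk.
    """
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs


# One short walk, made up for illustration.
example_walks = [["a", "b", "c", "d"]]
pairs = skipgram_pairs(example_walks, window=1)
print(pairs)
# [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b'), ('c', 'd'), ('d', 'c')]
```

A skip-gram model then learns one vector per node such that nodes appearing together in many pairs end up with similar vectors.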
Specification
tg_random_walk(INT step = 8, INT path_size = 4,
    STRING filepath = "/home/tigergraph/path.csv", SET<STRING> edge_types,
    INT sample_num)

tg_node2vec_query(STRING filepath = "/home/tigergraph/path.csv",
    STRING output_file = "/home/tigergraph/embedding.csv",
    INT dimension)
Installing this query requires installing a UDF, which can be found in the GitHub repository of the query. If you are running the query on a cluster, you need to manually install the UDF on every node of the cluster.