PageRank
The PageRank algorithm measures the influence of each vertex on every other vertex. PageRank influence is defined recursively: a vertex’s influence is based on the influence of the vertices which refer to it. A vertex’s influence tends to increase if (1) it has more referring vertices or if (2) its referring vertices have higher influence. The analogy to social influence is clear.
A common way of interpreting PageRank value is through the Random Network Surfer model. A vertex’s PageRank score is proportional to the probability that a random network surfer will be at that vertex at any given time. A vertex with a high PageRank score is a vertex that is frequently visited, assuming that vertices are visited according to the following Random Surfer scheme:
-
Assume a person travels or surfs across a network’s structure, moving from vertex to vertex in a long series of rounds.
-
The surfer can start anywhere. This start-anywhere property is part of the magic of PageRank, meaning the score is a truly fundamental property of the graph structure itself.
-
Each round, the surfer randomly picks one of the outward connections from the surfer’s current location. The surfer repeats this random walk for a long time.
-
But wait. The surfer doesn’t always follow the network’s connection structure. There is a probability (1-damping, to be precise), that the surfer will ignore the structure and will magically teleport to a random vertex.
For more information, see the Google paper on PageRank.
Specifications
tg_pageRank (STRING v_type, STRING e_type, FLOAT max_change=0.001, INT max_iter=25, FLOAT damping=0.85, INT top_k = 100, BOOL print_accum = TRUE, STRING result_attr = "", STRING file_path = "", BOOL display_edges = FALSE)
Characteristic | Value |
---|---|
Result |
Computes a PageRank value (FLOAT type) for each vertex. |
Input Parameters |
|
Result Size |
V = number of vertices |
Time Complexity |
O(E*k), E = number of edges, k = number of iterations. The number of iterations is data-dependent, but the user can set a maximum. Parallel processing reduces the time needed for computation. |
Graph Types |
Directed edges |
Example
# Use _ for default values
RUN QUERY tg_pageRank("Person", "Friend", 0.001, 25, 0.85, 100 _, _, _, _)
We ran pageRank on our test10 graph (using Friend edges) with the following parameter values: damping=0.85, max_change=0.001, and max_iter=25. We see that Ivy (center bottom) has the highest pageRank score (1.12). This makes sense since there are 3 neighboring persons who point to Ivy, more than for any other person. Eddie and Justin have scores of exactly 1 because they do not have any out-edges. This is an artifact of our particular version pageRank. Likewise, Alex has a score of 0.15, which is (1-damping), because Alex has no in-edges.