SELECT Statement
This page discusses the SELECT statement in GSQL Syntax V2, which is the default syntax for TigerGraph version 3.5 and above.
To learn about the SELECT statement in the legacy V1 syntax, see SELECT Statement (Syntax V1)
|
Overview
The SELECT
statement is an assignment statement with a SELECT
block on the right-hand side.
The SELECT
block uses a path pattern to select some of the graph’s vertices and edges.
There are a number of optional clauses that define and/or refine the selection by constraining the vertex or edge set or the result set.
The final output of a query is either a vertex set known as the result set or a table.
There is a maximum size limit of 2 GB for the result set of a SELECT
block.
If the result of the SELECT
block is larger than 2 GB, the system will return no data.
No error message is produced.
gsqlSelectBlock := gsqlSelectClause
fromClause
[sampleClause]
[whereClause]
[accumClause]
[postAccumClause]*
[havingClause]
[orderClause]
[limitClause]
gsqlSelectClause := vertexSetName "=" SELECT vertexAlias
The initial clause is the SELECT
clause: SELECT vertexAlias
.
Its purpose is to specify which set of vertices from the FROM
clause is to become the output.
The SELECT
clause may contain only one item: a vertex alias defined in the FROM
clause.
The vertex alias may be from anywhere in a multi-hop pattern, not only an endpoint.
The FROM
clause defines a path pattern to traverse in the graph, and each vertex in the path pattern can be given a vertex alias.
Thus, the SELECT
clause picks the set of vertices at one of these points in the pattern — the source vertices, the target vertices, or those from an interior point in a multi-hop path — to be the output vertices.
The SELECT
block has many optional clauses, which fit together in a logical flow.
Overall, the SELECT
block starts from a source set of vertices and returns a result set that is either a subset of the source vertices or a subset of their neighboring vertices.
Along the way, computations can be performed on the selected vertices and edges.
FROM
The FROM
clause describes either a single hop or a multi-hop path pattern.
Path patterns also have many other options for finer control and greater flexibility.
FROM
clausefromClause := FROM ( pathPattern ["," pathPattern]*)
A hop or step consists of going from a starting set of vertices, crossing over a set of their edges, to an ending set of vertices.
Path pattern
A path pattern specifies sets of vertex types and how they are connected by edge types.
A path pattern starts with a source vertex set, traverses through specified path edge patterns to another step vertex set. This is called a hop. From the other step vertex set, it can perform multiple hops and traverse to other step vertex sets.
Notice that a path pattern can be just a single source vertex set; the subsequent path edge pattern and step vertex sets are optional.
pathPattern := sourceVertexSet ["-" "(" pathEdgePattern ")" "-" stepVertexSet]*
Source vertex set
The source vertex set is the vertex set from which a path pattern starts. A source vertex set can be denoted by one of the following:
-
_
orANY
, or omitted. If the source vertex type is omitted, you must give the source vertex set an alias. -
Vertex type
-
A vertex set variable
Optionally, you can give a source vertex set an alias by appending the alias after a colon:
.
Although declaring an alias is optional, it is strongly recommended that you declare them.
In the later clauses of the SELECT
block , you can only refer to vertex sets in the FROM
clause by their aliases.
sourceVertexSet := [sourceVertexTypes] [":" vertexAlias]
sourceVertexTypes := "_" | ANY | "(" sourceVertexSetType ["|" sourceVertexSetType]* ")"
sourceVertexSetType := vertexType | vertexSetVariableName
Belows are a few examples of valid source vertex sets in SELECT
statements:
Result = SELECT src
FROM Person:src -(<Likes_REVERSE)- (Comment|Post):tgt (1)
WHERE src.first_name == "Viktor" AND src.last_name == "Akhiezer"
ACCUM CASE
WHEN tgt.type == "Comment" THEN
src.@comment_cnt += 1
WHEN tgt.type == "Post" THEN
src.@post_cnt += 1
END;
1 | Person is a vertex type.
(Comment | POST) combines two vertex types. |
You can use _
or ANY
to represent any vertex types.
You can also choose to omit the step vertex type altogether to represent any vertex type.
If you choose to omit the type, you must give the step vertex set an alias.
Result = SELECT tgt
FROM :s -(<Likes)- Person:tgt
WHERE tgt.first_name == "Viktor" AND tgt.last_name == "Akhiezer"
A source vertex set can also be represented by a vertex set variable.
CREATE QUERY count_friends_of_2 (VERTEX<Person> seed) FOR GRAPH Friend_Net {
SumAccum<INT> @@num_friends = 0;
seed_set = { seed };
friends = SELECT v FROM seed_set:s -((Friend | Coworker):e)- :v
ACCUM @@num_friends +=1;
PRINT @@num_friends;
}
Step vertex set
A vertex set that represents a step in a path pattern. Compared with source vertex set, step vertex sets have more flexibility in how they are denoted. A step vertex set can be denoted by one of the following:
-
_
orANY
, or omitted. If the step vertex type is omitted, you must give the step vertex set an alias. -
Vertex type
-
A vertex set variable
-
A global accumulator
Optionally, you can give a source vertex set an alias by appending the alias after a colon:
.
Although declaring an alias is optional, TigerGraph strongly suggests that you declare them.
In the later clauses of the SELECT
block , you can only refer to vertex sets in the FROM
clause by their aliases.
stepVertexSet := [stepVertexTypes] [":" vertexAlias]
stepVertexTypes := atomicVertexType | "(" vertexSetType ["|" vertexSetType]* ")"
atomicVertexType := "_" | ANY | vertexSetType
vertexSetType := vertexType | vertexSetVariableName | globalAccumName
Belows are a few examples of valid step vertex sets in SELECT
statements:
Result = SELECT tgt
FROM Person:tgt -(<Likes_REVERSE)- (Comment|Post):src (1)
WHERE tgt.firstName == "Viktor" AND tgt.lastName == "Akhiezer"
ACCUM CASE
WHEN src.type == "Comment" THEN
tgt.@commentCnt += 1
WHEN src.type == "Post" THEN
tgt.@postCnt += 1
END;
1 | Person is a vertex type.
(Comment | POST) combines two vertex types. |
You can use _
or ANY
to represent any vertex types.
You can also choose to omit the step vertex type altogether to represent any vertex type.
If you choose to omit the type, you must give the step vertex set an alias.
Result = SELECT s
FROM Person:s -(Likes>)- :tgt
WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer"
A step vertex set can also be represented by a vertex set variable.
CREATE QUERY count_friends_of_2 (VERTEX<Person> seed) FOR GRAPH Friend_Net
{
SumAccum<INT> @@num_friends = 0;
seed_set = { seed };
friends = SELECT v FROM :s -((Friend | Coworker):e)- seed_set:v
ACCUM @@num_friends +=1;
PRINT @@num_friends;
}
A step vertex set can be represented by a global accumulator of strings (the strings are vertex types).
The accumulator must be of type SetAccum
, BagAccum
or ListAccum
.
CREATE QUERY count_friends_of_2(STRING target_type) FOR GRAPH Friend_Net {
SumAccum<INT> @@num_friends = 0;
SetAccum<STRING> @@target_set;
@@target_set += target_type;
friends = SELECT s FROM :s -((Friend | Coworker):e)- @@target_set:v
ACCUM @@num_friends +=1;
PRINT @@num_friends;
}
Path edge pattern
The path edge pattern represents the relationship between a source vertex set to a step vertex set or from a step vertex set to the next step vertex set.
pathEdgePattern := atomicEdgePattern | "(" pathEdgePattern ")" | pathEdgePattern "." pathEdgePattern | disjPattern | starPattern atomicEdgePattern := atomicEdgeType | atomicEdgeType ">" | "<" atomicEdgeType atomicEdgeType := "_" | ANY | edgeSetType disjPattern := atomicEdgePattern ("|" atomicEdgePattern)* starPattern := ([atomicEdgePattern] | "(" disjPattern ")") "*" [starBounds] starBounds := CONST_INT ".." CONST_INT | CONST_INT ".." | ".." CONST_INT | CONST_INT
A path edge pattern can represent one hop or repeated hops.
A path edge pattern is denoted by -()-
, where the relationship between vertex sets is specified between the parentheses.
Atomic edge pattern
The most basic form for a path edge pattern is an atomic edge pattern. An atomic edge pattern can be one of the following:
-
_
orANY
. -
An edge type.
-
A string parameter. The value of the parameter must be an edge type and can be provided at runtime. You do not need to specify a direction when using a string parameter to specify the edge type.
-
A global
SetAccum
accumulator of strings. Each string is the name of an edge type.
If the edge is directed, an atomic edge pattern has either a left pointer <
on the left or a right pointer >
on the right to indicate edge direction.
If the edge is undirected, the atomic edge pattern does not have a pointer.
Suppose we have 3 edge types or parameters called A, B, C.
-
A>
is a rightward facing A edge -
<B
is a leftward facing B edge -
C is an undirected C edge. If C is actually a directed edge type, then there is no match.
For example:
-
-(STUDY_AT>)-
refers to forward traversal of the directed edge typeSTUDY_AT
. -
-(<STUDY_AT)-
refers to backward traversal of the directed edge typeSTUDY_AT
.-
This means that the right side of
-(<STUDY_AT)-
is expected to have the same type as the left side of-(STUDY_AT>)-
.
-
-
-(KNOWS)-
refers to forward traversal of the undirectedKNOWS
. -
-(_>)-
refers to forward traversal of any directed edge types. -
-(_)-
refers to forward traversal of any undirected edge types. -
-(<_)-
refers to backward traversal of any directed edge types.
Disjunction pattern
Pattern disjunction allows a path edge pattern to indicate an OR
relationship between two or more atomic patterns.
If an edge matches any of the atomic patterns, the edge matches the path edge pattern.
disjPattern := atomicEdgePattern ("|" atomicEdgePattern)*
For example:
-
-(KNOWS|STUDY_AT>)-
refers to traversing an undirectedKNOWS
edge or a directedSTUDY_AT
edge. -
-(KNOWS|_>)-
refers to traversing an undirectedKNOWS
edge or any directed edge from left to right.
Pattern repetition
The Kleene star*
and min..max
range specifiers repeat an edge pattern for a specified number of times.
The range specifiers must be integers and must be constants.
See 1ホップパターンの反復 for a tutorial on how to use pattern repetition in a path edge pattern.
starPattern := ([atomicEdgePattern] | "(" disjPattern ")") "*" [starBounds]
starBounds := CONST_INT ".." CONST_INT
| CONST_INT ".."
| ".." CONST_INT
| CONST_INT
-
Add
*
to the end of a pattern to have the star pattern match all paths where the edge pattern occurs one or more times.-
For example,
Person:s - (Friendship*) - Person
matches all paths between twoPerson
vertices connected by any number ofFriendship
edges. -
The vertices in the middle do not need to be
Person
vertices. For example, a path likeperson1 -(Friendship)- dog1 - (Friendship) - person2
matches the star pattern.
-
-
Add
*
to the end of a pattern, and then a number after the star to have the star pattern match paths where the edge pattern occurs for the specified number of times.-
For example,
Person:s - (Friendship*2) - Person
matches all paths between twoPerson
vertices connected by exactly twoFriendship
edges. The vertices in the middle do not need to bePerson
vertices.
-
-
Add
*
to the end of a pattern, and then a range after the star (*x..y
) to have the star pattern match all paths where the edge pattern occurs as many times as within the specified range.-
For example,
Employee:s - (Works_For>*2..4) - Employee
matches all paths between twoEmployee
vertices with 2 - 4 right-directedWorks_For
edges. The vertices in the middle do not need to bePerson
vertices.
-
Pattern concatenation
The dot operator.
means concatenate the two edge patterns into one.
The vertex joining the two edges is omitted from the syntax.
The dot operator is a shorthand, when you don’t care about the type of that intermediate vertex.
(A>.<B.C)
means a series of 3 edges, having the specified types and directions.
For example, the following FROM
clauses produce the same source and target vertex sets.
While the second FROM
clause is more concise, it does not give you access to the intermediate vertex and edge sets.
SELECT x
FROM X:x -(E2>:e2)- Y:y -(<E3:e3)- Z:z -(E4:e4)- U:u; (1)
SELECT u
FROM X:x -(E2>.<E3.E4)- U:u; (2)
1 | This FROM clauses uses a longer pattern, but gives you access to y , e2 , z and e4 . |
2 | This FROM clauses is more concise than the first FROM clause, but does not give you access to the intermediate vertex and edge sets. |
Conjunctive Pattern Matching
The optional repeating phrase ["," pathPattern]*
allows you to have multiple path patterns.
They form a conjunction, meaning all of them must be satisfied in order to have a valid match result.
See 連結型パターンマッチング(ベータバージョン) for more details.
fromClause := FROM (step | stepV2 | pathPattern ["," pathPattern]*)
Each step pattern or path pattern forms a match table, one row per matching path in the graph. Each vertex alias or edge alias is one column in the table. When we have a conjunctive path, each path must share at least one vertex alias with another path. This enables the two path sets (and match tables) to be joined. Formally, we make the natural join of the two tables.
Vertex and Edge Aliases
Vertex and edge aliases are declared within the FROM
clause of a SELECT
block, by using the colon :
, followed by the alias name.
Aliases can be accessed anywhere within the same SELECT
block.
They are used to reference a single selected vertex or edge of a set.
It is through the vertex or edge aliases that the attributes of these vertices or edges can be accessed.
For example, the following code snippets show two different SELECT
statements.
The first SELECT
statement starts from a vertex set called allVertices
, and the vertex alias name v
can access each individual vertex from allVertices
.
The second SELECT
statement selects a set of edges.It can use the vertex alias s
to reference the source vertices, or the alias t
to reference the target vertices.
results = SELECT v FROM all_vertices:v;
results = SELECT t FROM all_vertices:s -()- :t;
The following example shows an edge-based SELECT
statement, declaring aliases for all three parts of the edge.
In the ACCUM
clause, the e
and t
aliases are assigned to local vertex and edge variables.
results = SELECT v
FROM all_vertices:s -(:e)- :t
ACCUM VERTEX v = t, EDGE eg = e;
We strongly suggest that an alias should be declared with every vertex and edge in the FROM clause, as there are several functions and features only available to vertex and edge aliases. |
SAMPLE
The SAMPLE
clause is an optional clause that selects a uniform random sample from the population of edges or target vertices specified in the FROM
argument.
Known issue: The SAMPLE clause is not supported in syntax V2 in 3.7.
To use SAMPLE clauses, write your query in syntax V1.
|
If you want to sample from a set of vertices directly, not from edges or from neighboring (target) vertices, then the following technique is simpler and faster: Select k random vertices from a vertex set S
|
The SAMPLE
clause draws from the edge population consisting of those edges which satisfy all three parts — source set, edge type, and target type — of the FROM
clause.
The SAMPLE
clause is intended to provide a representative sample of the distribution of edges (or vertices) connected to hub vertices, instead of dealing with all edges. A hub vertex is a vertex with a relatively high degree.
sampleClause := SAMPLE ( expr | expr "%" ) EDGE WHEN condition (1)
| SAMPLE expr TARGET WHEN condition (2)
| SAMPLE expr "%" TARGET PINNED WHEN condition (3)
1 | Sample an absolute number (or a percentage) of edges for each source vertex. |
2 | Sample an absolute number of edges incident to each target vertex. |
3 | Sample a percentage of edges incident to each target vertex. |
The expression following SAMPLE
specifies the sample size, either an absolute number or a percentage of the population.
The expression in a SAMPLE
clause must evaluate to a positive integer.
There are two sampling methods:
-
Sampling based on edge ID
-
Sampling based on target vertex ID: if a target vertex ID is sampled, all edges from this source vertex to the sampled target vertex are sampled.
Currently, the |
Given that the sampling is random, some details of each of the example queries may change each time they are run.
The following query displays two modes of sampling: an absolute number of edges from a source vertex and a percentage of edges from a source vertex. We use the Computer_Net graph (see Appendix D). In Computer_Net, there are 31 vertices and 43 edges, but only 7 vertices are source vertices. Moreover, c1, c12, and c23 are hub nodes, with at least 10 outgoing edges each. For the absolute count case, we set the size to 1 edge per source vertex, which is equivalent to a random walk. We expect exactly 7 edges to be selected. For the percentage sampling case, we sample 33% of the edges for vertices which have 3 or more outgoing edges. We expect about 15 edges, but the number may vary.
CREATE QUERY sample_ex_3() FOR GRAPH Computer_Net {
// record each selected edge as (src->tgt)
MapAccum<STRING,ListAccum<STRING>> @@abs_edges;
SumAccum<INT> @@total_abs;
// record each selected edge as (src->tgt)
MapAccum<STRING,ListAccum<STRING>> @@pct_edges;
SumAccum<INT> @@total_pct;
start = {Computer.*};
// Sample one outgoing edge per source vertex = Random Walk
abs_sample = SELECT v FROM start:s -(:e)- :v
SAMPLE 1 EDGE WHEN s.outdegree() >= 1 // sample 1 target vertex from each source vertex
ACCUM
@@abs_edges += (s.id -> v.id),
@@total_abs += 1;
PRINT @@total_abs, @@abs_edges;
pct_sample = SELECT v FROM start:s -(:e)- :v
SAMPLE 33% EDGE WHEN s.outdegree() >= 3 # select ~1/3 of edges when outdegree >= 3
ACCUM
@@pct_edges += (s.id -> v.id),
@@total_pct += 1;
PRINT @@total_pct, @@pct_edges;
}
RUN QUERY sample_ex_3()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [
{
"@@total_abs": 64,
"@@abs_edges": {
"c31": ["c23"],
"c30": ["c23"],
"c11": [
"c12",
"c10"
],
"c10": [
"c11",
"c1"
],
"c13": ["c12"],
"c12": [
"c13",
"c14",
"c21",
"c16",
"c19",
"c17",
"c18",
"c22",
"c20",
"c15",
"c11"
],
"c15": ["c12"],
"c14": [
"c23",
"c24",
"c12"
],
"c17": ["c12"],
"c16": ["c12"],
"c19": ["c12"],
"c18": ["c12"],
"c20": ["c12"],
"c22": ["c12"],
"c21": ["c12"],
"c24": [
"c14",
"c23"
],
"c23": [
"c29",
"c31",
"c27",
"c26",
"c25",
"c24",
"c28",
"c30",
"c14",
"c4"
],
"c26": ["c23"],
"c25": ["c23"],
"c28": ["c23"],
"c27": ["c23"],
"c29": ["c23"],
"c1": [
"c7",
"c4",
"c3",
"c2",
"c10",
"c5",
"c9",
"c8",
"c6"
],
"c2": ["c1"],
"c3": ["c1"],
"c4": [
"c23",
"c1"
],
"c5": ["c1"],
"c6": ["c1"],
"c7": ["c1"],
"c8": ["c1"],
"c9": ["c1"]
}
},
{
"@@total_pct": 64,
"@@pct_edges": {
"c31": ["c23"],
"c30": ["c23"],
"c11": [
"c12",
"c10"
],
"c10": [
"c11",
"c1"
],
"c13": ["c12"],
"c12": [
"c13",
"c14",
"c21",
"c16",
"c19",
"c17",
"c18",
"c22",
"c20",
"c15",
"c11"
],
"c15": ["c12"],
"c14": [
"c23",
"c24",
"c12"
],
"c17": ["c12"],
"c16": ["c12"],
"c19": ["c12"],
"c18": ["c12"],
"c20": ["c12"],
"c22": ["c12"],
"c21": ["c12"],
"c24": [
"c14",
"c23"
],
"c23": [
"c29",
"c31",
"c27",
"c26",
"c25",
"c24",
"c28",
"c30",
"c14",
"c4"
],
"c26": ["c23"],
"c25": ["c23"],
"c28": ["c23"],
"c27": ["c23"],
"c29": ["c23"],
"c1": [
"c7",
"c4",
"c3",
"c2",
"c10",
"c5",
"c9",
"c8",
"c6"
],
"c2": ["c1"],
"c3": ["c1"],
"c4": [
"c23",
"c1"
],
"c5": ["c1"],
"c6": ["c1"],
"c7": ["c1"],
"c8": ["c1"],
"c9": ["c1"]
}
}
]
}
Below is an example of using SELECT
to only traverse one edge for each source vertex.
The vertex-attached accumulators @times_traversed_no_sample
and @times_traversed_with_sample
are used to keep track of the number of times an edge is traversed to reach the target vertex.
Without using sampling, this occurs once for each edge; thus @times_traversed_no_sample
has the same number as the in-degree of the vertex.
With sampling edges, the number of edges is restricted.
This is reflected in the @times_traversed_with_sample
accumulator.
Notice the difference in the result set.
Because only one edge per source vertex is traversed when the SAMPLE
clause is used, not all target vertices are reached.
The vertex company3
has 3 incident edges, but in one instance of the query execution, it is never reached.
Additionally, company2
has 6 incident edges, but only 4 source vertices sampled an edge incident to company2
.
SAMPLE
using an absolute number of edgesCREATE QUERY sample_ex_1() FOR GRAPH Work_Net {
SumAccum<INT> @times_traversed_no_sample;
SumAccum<INT> @times_traversed_with_sample;
workers = {Person.*};
// the 'before_sample' result set encapsulates the normal functionality of
// a SELECT statement, where 'times_traversed_no_sample' vertex accumulator is increased for
// each edge incident to the vertex.
before_sample = SELECT v FROM workers:t -(:e)- :v
ACCUM v.@times_traversed_no_sample += 1;
// The 'after_sample' result set is formed by those vertices which can be
// reached when for each source vertex, only one edge is used for traversal.
// This is demonstrated by the values of 'times_traversed_with_sample' vertex accumulator, which
// is increased for each edge incident to the vertex which is used in the
// sample.
after_sample = SELECT v FROM workers:t -(:e)- :v
SAMPLE 1 EDGE WHEN t.outdegree() >= 1 // only use 1 edge from the source vertex
ACCUM v.@times_traversed_with_sample += 1;
PRINT before_sample;
PRINT after_sample;
}
GSQL > RUN QUERY sample_ex_1()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [
{"before_sample": [
{
"v_id": "company2",
"attributes": {
"country": "chn",
"@times_traversed_with_sample": 2,
"@times_traversed_no_sample": 6,
"id": "company2"
},
"v_type": "Company"
},
{
"v_id": "company4",
"attributes": {
"country": "us",
"@times_traversed_with_sample": 1,
"@times_traversed_no_sample": 1,
"id": "company4"
},
"v_type": "Company"
},
{
"v_id": "company3",
"attributes": {
"country": "jp",
"@times_traversed_with_sample": 2,
"@times_traversed_no_sample": 3,
"id": "company3"
},
"v_type": "Company"
},
{
"v_id": "company1",
"attributes": {
"country": "us",
"@times_traversed_with_sample": 6,
"@times_traversed_no_sample": 6,
"id": "company1"
},
"v_type": "Company"
},
{
"v_id": "company5",
"attributes": {
"country": "can",
"@times_traversed_with_sample": 1,
"@times_traversed_no_sample": 1,
"id": "company5"
},
"v_type": "Company"
}
]},
{"after_sample": [
{
"v_id": "company2",
"attributes": {
"country": "chn",
"@times_traversed_with_sample": 2,
"@times_traversed_no_sample": 6,
"id": "company2"
},
"v_type": "Company"
},
{
"v_id": "company4",
"attributes": {
"country": "us",
"@times_traversed_with_sample": 1,
"@times_traversed_no_sample": 1,
"id": "company4"
},
"v_type": "Company"
},
{
"v_id": "company3",
"attributes": {
"country": "jp",
"@times_traversed_with_sample": 2,
"@times_traversed_no_sample": 3,
"id": "company3"
},
"v_type": "Company"
},
{
"v_id": "company5",
"attributes": {
"country": "can",
"@times_traversed_with_sample": 1,
"@times_traversed_no_sample": 1,
"id": "company5"
},
"v_type": "Company"
},
{
"v_id": "company1",
"attributes": {
"country": "us",
"@times_traversed_with_sample": 6,
"@times_traversed_no_sample": 6,
"id": "company1"
},
"v_type": "Company"
}
]}
]
}
Since the |
WHERE
The WHERE
clause is an optional clause that constrains edges and vertices specified in the FROM
and SAMPLE
clauses.
WHERE
ClausewhereClause := WHERE condition
The WHERE
clause uses a boolean condition to test each vertex or edge in the FROM
set (or the sampled vertex and edge sets, if the SAMPLE
clause was used).
If the expression evaluates to false for vertex/edge X, then X is excluded from further consideration in the result set.
The expression may use constants or any variables or parameters within the scope of the SELECT
block.
The expression may use arithmetic operators, comparison operators, boolean operators, set operators and parentheses to enforce precedence.
The WHERE
conditional expression may use any of the variables within its scope (global accumulators, vertex set variables, query input parameters, the FROM
clause’s vertex and edge sets (or their vertex and edge aliases), or any of the attributes or accumulators of the vertex/edge sets.)
For a more formal explanation of condition, see the EBNF definitions of condition
and expr
.
Using built-in vertex and edge attributes and functions, such as .type
and .neighbors()
, the WHERE
clause can be used to implement sophisticated selection rules for the edge traversal.
In the following example, the selection conditions are completely specified in the WHERE
clause, with no edge types or vertex types mentioned in the FROM
clause.
WHERE
used as a filterresult_set1 = SELECT v FROM S:v-((E1|E2|E3):e)-(V1|V2):t;
result_set2 = SELECT v FROM S:v-(:e)-:t
WHERE t.type IN ("V1", "V2") AND t IN v.neighbors("E1|E2|E3")
The following examples demonstrate using the WHERE
clause to limit the resulting vertex set based on a vertex attribute.
SELECT WHERE
CREATE QUERY print_cat_posts() FOR GRAPH Social_Net {
cat_posts = SELECT v FROM Post:v // select only those post vertices
WHERE v.subject == "cats"; // which have a subset of 'cats'
PRINT cat_posts;
}
GSQL > RUN QUERY print_cat_posts()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"cat_posts": [
{
"v_id": "8",
"attributes": {
"subject": "cats",
"post_time": "2011-02-03 17:05:52"
},
"v_type": "Post"
},
{
"v_id": "3",
"attributes": {
"subject": "cats",
"post_time": "2011-02-05 01:02:44"
},
"v_type": "Post"
},
{
"v_id": "9",
"attributes": {
"subject": "cats",
"post_time": "2011-02-05 23:12:42"
},
"v_type": "Post"
},
{
"v_id": "10",
"attributes": {
"subject": "cats",
"post_time": "2011-02-04 03:02:31"
},
"v_type": "Post"
},
{
"v_id": "11",
"attributes": {
"subject": "cats",
"post_time": "2011-02-03 01:02:21"
},
"v_type": "Post"
}
]}]
}
CREATE QUERY find_graph_focused_posts() FOR GRAPH Social_Net {
results = SELECT v FROM Post:v // select only post vertices
WHERE v.subject IN ("Graph", "tigergraph"); // which have a subject of either 'Graph' or 'tigergraph'
PRINT results;
}
GSQL > RUN QUERY find_graph_focused_posts()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"results": [
{
"v_id": "1",
"attributes": {
"subject": "tigergraph",
"post_time": "2011-03-03 23:02:00"
},
"v_type": "Post"
},
{
"v_id": "6",
"attributes": {
"subject": "tigergraph",
"post_time": "2011-02-05 02:02:05"
},
"v_type": "Post"
},
{
"v_id": "5",
"attributes": {
"subject": "tigergraph",
"post_time": "2011-02-06 01:02:02"
},
"v_type": "Post"
}
]}]
}
The |
The following example shows the equivalence of using WHERE
as a type filter as well as its limitations.
SELECT
statement with WHERE
clause using AND
/OR
CREATE QUERY find_female_members() FOR GRAPH Social_Net
{
// Finds female person in the social network. all of the following statements
// are equivalent (i.e., produce the same results)
all_vertices = {ANY}; # includes all posts and person
females = SELECT v FROM all_vertices:v
WHERE v.type == "Person" AND v.gender != "Male";
females = SELECT v FROM all_vertices:v
WHERE v.type == "Person" AND v.gender == "Female";
females = SELECT v FROM all_vertices:v
WHERE v.type == "Person" AND
NOT v.gender == "Male";
females = SELECT v FROM all_vertices:v
WHERE v.type != "Post" AND
NOT v.gender == "Male";
/* does not compile. cannot use NOT operator in combination with type attribute
females = SELECT v FROM all_vertices:v
WHERE NOT v.type != "Person" AND
NOT v.gender == "Male";
does not compile. cannot use NOT operator in combination with type attribute
females = SELECT v FROM all_vertices:v
WHERE NOT v.type == "Post" AND
NOT v.gender == "Male"; */
person_vertices = {Person.*};
females = SELECT v FROM person_vertices:v
WHERE NOT v.gender == "Male";
females = SELECT v FROM person_vertices:v
WHERE v.gender != "Male";
females = SELECT v FROM person_vertices:v
WHERE v.gender != "Male" AND true;
females = SELECT v FROM person_vertices:v
WHERE v.gender != "Male" OR false;
PRINT females;
}
GSQL > RUN QUERY find_female_members()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"females": [
{
"v_id": "person5",
"attributes": {
"gender": "Female",
"id": "person5"
},
"v_type": "Person"
},
{
"v_id": "person4",
"attributes": {
"gender": "Female",
"id": "person4"
},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {
"gender": "Female",
"id": "person2"
},
"v_type": "Person"
}
]}]
}
The following example uses edge attributes to determine which workers are registered as full time for any company.
WHERE
using edge attributesCREATE QUERY full_time_workers() FOR GRAPH Work_Net {
// find all workers who are full time at some company
start = {Person.*};
full_time_workers = SELECT v FROM start:v -(Works_For:e)- Company:t
WHERE e.full_time; // full_time is a boolean attribute on the edge
PRINT full_time_workers;
}
full_time_workers
ResultsGSQL > RUN QUERY full_time_workers()
{
"error": false,
"message": "",
"version": {
"edition": "developer",
"schema": 0,
"api": "v2"
},
"results": [{"full_time_workers": [
{
"v_id": "person4",
"attributes": {
"interestList": ["football"],
"skillSet": [ 10, 1, 4 ],
"skillList": [ 4, 1, 10 ],
"locationId": "us",
"interestSet": ["football"],
"id": "person4"
},
"v_type": "Person"
},
{
"v_id": "person11",
"attributes": {
"interestList": [ "sport", "football" ],
"skillSet": [10],
"skillList": [10],
"locationId": "can",
"interestSet": [ "football", "sport" ],
"id": "person11"
},
"v_type": "Person"
},
{
"v_id": "person10",
"attributes": {
"interestList": [ "football", "sport" ],
"skillSet": [3],
"skillList": [3],
"locationId": "us",
"interestSet": [ "sport", "football" ],
"id": "person10"
},
"v_type": "Person"
},
{
"v_id": "person1",
"attributes": {
"interestList": [ "management", "financial" ],
"skillSet": [ 3, 2, 1 ],
"skillList": [ 1, 2, 3 ],
"locationId": "us",
"interestSet": [ "financial", "management" ],
"id": "person1"
},
"v_type": "Person"
},
{
"v_id": "person6",
"attributes": {
"interestList": [ "music", "art" ],
"skillSet": [ 10, 7 ],
"skillList": [ 7, 10 ],
"locationId": "jp",
"interestSet": [ "art", "music" ],
"id": "person6"
},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {
"interestList": ["engineering"],
"skillSet": [ 6, 5, 3, 2 ],
"skillList": [ 2, 3, 5, 6 ],
"locationId": "chn",
"interestSet": ["engineering"],
"id": "person2"
},
"v_type": "Person"
},
{
"v_id": "person8",
"attributes": {
"interestList": ["management"],
"skillSet": [ 2, 5, 1 ],
"skillList": [ 1, 5, 2 ],
"locationId": "chn",
"interestSet": ["management"],
"id": "person8"
},
"v_type": "Person"
},
{
"v_id": "person12",
"attributes": {
"interestList": [
"music",
"engineering",
"teaching",
"teaching",
"teaching"
],
"skillSet": [ 2, 5, 1 ],
"skillList": [ 1, 5, 2, 2, 2 ],
"locationId": "jp",
"interestSet": [ "teaching", "engineering", "music" ],
"id": "person12"
},
"v_type": "Person"
},
{
"v_id": "person3",
"attributes": {
"interestList": ["teaching"],
"skillSet": [ 6, 1, 4 ],
"skillList": [ 4, 1, 6 ],
"locationId": "jp",
"interestSet": ["teaching"],
"id": "person3"
},
"v_type": "Person"
},
{
"v_id": "person9",
"attributes": {
"interestList": [ "financial", "teaching" ],
"skillSet": [ 2, 7, 4 ],
"skillList": [ 4, 7, 2 ],
"locationId": "us",
"interestSet": [ "teaching", "financial" ],
"id": "person9"
},
"v_type": "Person"
}
]}]
}
If multiple edge types are specified in edge-induced selection, the Multiple Edge Type
WHERE clause
The above query is compilable.
However, if we use line 5 as the |
ACCUM
The ACCUM
clause enables sophisticated aggregation and other computations across the set of vertices or edges selected by the preceding FROM
, SAMPLE
, and WHERE
clauses.
Syntax
The primary purpose of the ACCUM
clause is to collect information about the graph by updating accumulators (via +=
or =
).
However, other kinds of statements (e.g., branching, iteration, local assignments) are permitted to support more complex computations or to log activity.
The EBNF syntax below defines the allowable kinds of statements that can occur within an ACCUM
clause.
ACCUM
clauseaccumClause := [perClauseV2] ACCUM dmlSubStmtList (1)
dmlSubStmtList := dmlSubStmt ["," dmlSubStmt]*
dmlSubStmt := assignStmt // Assignment (2) (3)
| funcCallStmt // Function Call
| gAccumAccumStmt // Assignment
| lAccumAccumStmt // Assignment
| attrAccumStmt // Assignment
| vAccumFuncCall // Function Call
| localVarDeclStmt // Declaration
| dmlSubCaseStmt // Control Flow
| dmlSubIfStmt // Control Flow
| dmlSubWhileStmt // Control Flow
| dmlSubForEachStmt // Control Flow
| BREAK // Control Flow
| CONTINUE // Control Flow
| insertStmt // Data Modification
| dmlSubDeleteStmt // Data Modification
| printlnStmt // Output
| logStmt // Output
1 | DML-sub-statements do not include global accumulator assignment statement (gAccumAssignStmt) but global accumulator accumulation statement (gAccumAccumStmt).
Global accumulators may perform accumulation += but not assignment = within an ACCUM clause. |
2 | Global variable assignment is permitted in an ACCUM clause, but the change in value will not take place until the query completes.
Therefore, if there are multiple assignment statements for the same variable, only the final one will take effect. |
3 | Vertex attribute assignment = is not permitted in an ACCUM clause. However, edge attribute assignment is permitted.
This is because the ACCUM clause iterates over an edge set.
Vertex attribute assignment is permitted in the POST-ACCUM clause.
Like all updates, the change in value does not take place until the query completes. |
Iteration model
The ACCUM
clause is executed once for each set of vertices and edges in the graph which match the pattern and constraints given in the FROM
and WHERE
clauses.
You can think of FROM-WHERE
as producing a virtual table.
The columns of this matching table are the alias variables from the FROM
clause pattern, and the rows are each possible set of vertex and edge aliases (e.g. a path) which fit the pattern.
For a simple 1-hop pattern below:
FROM Person:A -(IS_LOCATED_IN:B)- City:C
The above FROM
clause produces a match table with 3 columns: A, B, and C.
Each row is a tuple (A,B,C) where there is a has_lived_in
edge B from a Person
vertex A to a City
vertex C.
We say that the match table provides a binding between the pattern aliases and graph’s vertices and edges.
A multi-hop pattern simply has more columns than a 1-hop pattern.
Since the ACCUM
clause iterates over edges, and often two edges will connect to the same source vertex or to the same target vertex, the ACCUM
clause can be repeated multiple times for one vertex.
Operations that are to be performed exactly once per vertex should be performed in the POST-ACCUM
clause.
Parallelism in ACCUM
clause
TigerGraph uses parallelism to improve performance.
The statements within the ACCUM
clause are executed sequentially for a given vertex or edge.
However, there is no fixed order in which a vertex set or edge set is processed.
Any inspection of an intermediate result within the ACCUM
clause is incomplete and may not be that meaningful.
Due to the parallel nature of the statements within an ACCUM
clause, you cannot reference updated variables within an ACCUM
clause since there is no guarantee that the variable was indeed updated when it’s referenced.
For example, in the following query, the second statement in the ACCUM
clause does not necessarily run after the first statement.
CREATE QUERY accum_wrong_example() SYNTAX v2 {
SumAccum<INT> @@count_total;
SumAccum<INT> @active_flag = 0;
result = SELECT p
FROM Person: p - (KNOWS) - Person: w
WHERE w.lastName == "Wang" AND p.firstName == "Peter"
ACCUM p.@active_flag += 1,
@@count_total += p.@active_flag; (1)
PRINT @@count_total, result[result.@active_flag];
}
1 | This statement does not necessarily occur ofter the previous line. |
The correct way to execute the logic is to put the second statement in the POST-ACCUM
clause.
This guarantees that the second statement executes after the first statement:
CREATE QUERY accum_example() SYNTAX v2 {
SumAccum<INT> @@count_total;
SumAccum<INT> @active_flag = 0;
result = SELECT p
FROM Person: p - (KNOWS) - Person: w
WHERE w.lastName == "Wang" AND p.firstName == "Peter"
ACCUM p.@active_flag += 1
POST-ACCUM @@count_total += p.@active_flag;
PRINT @@count_total, result[result.@active_flag];
}
The For example, consider the following clauses:
This finds the friends of the friends of |
Edge/Vertex type inference and conflict
If multiple edge types are specified in an ACCUM
clause, each ACCUM
statement in the ACCUM
clause checks whether edge types are conflicted.
If only a subset of edge types are effective in an ACCUM
statement, this statement is not executed on other edge types.
For example:
ACCUM
statement checkCREATE QUERY multiple_edge_type_check_ex (VERTEX<Person> m1) FOR GRAPH Social_Net {
ListAccum<STRING> @@test_list_1, @@test_list_2, @@test_list_3;
all_user = {m1};
all_user = SELECT s
FROM all_user:s - ((Posted|Liked|Friend):e) - (Post|Person):t
ACCUM @@test_list_1 += to_string(datetime_to_epoch(e.action_time)),
@@test_list_2 += t.gender,
@@test_list_3 += to_string(datetime_to_epoch(e.action_time)) + t.gender (1)
;
PRINT @@test_list_1, @@test_list_2, @@test_list_3;
}
1 | This statement causes a compilation error. |
In the above example, line 6 is only executed on Liked
edges, because action_time
is the attribute of liked
edge only.
Similarly, line 7 is only executed on Friend
edges, because gender
is the attribute of person
only, and only Friend
edge uses person
as target vertex.
However, line 8 causes a compilation error, because it uses multiple edges where some edges cannot be supported in a part of the statement, i.e., liked
edges doesn’t have t.gender
, Friend
edges doesn’t have e.action_time
.
We strongly suggest that if multiple edge types are specified in edge-induced selection, Multiple Edge Type
ACCUM statement check 2
The above query is compilable.
However, if we switch line 8 and line 10, the edge-type conflict checking generates errors because |
Examples
This example uses ACCUM
to find all the subjects a user posted about.
CREATE QUERY user_posts() FOR GRAPH Social_Net {
// For each person, make a list of all their post subjects
ListAccum<STRING> @person_posts;
start = {Person.*};
// Find all user post topics and append them to the vertex list accum
user_postings = SELECT s FROM start:s -(Posted)- :g
ACCUM s.@person_posts += g.subject;
PRINT user_postings;
}
GSQL > RUN QUERY user_posts()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"user_postings": [
{
"v_id": "person7",
"attributes": {
"gender": "Male",
"@person_posts": [
"tigergraph",
"cats"
],
"id": "person7"
},
"v_type": "Person"
},
{
"v_id": "person5",
"attributes": {
"gender": "Female",
"@person_posts": [
"coffee",
"cats"
],
"id": "person5"
},
"v_type": "Person"
},
{
"v_id": "person1",
"attributes": {
"gender": "Male",
"@person_posts": ["Graphs"],
"id": "person1"
},
"v_type": "Person"
},
{
"v_id": "person4",
"attributes": {
"gender": "Female",
"@person_posts": ["cats"],
"id": "person4"
},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {
"gender": "Female",
"@person_posts": ["tigergraph"],
"id": "person2"
},
"v_type": "Person"
},
{
"v_id": "person8",
"attributes": {
"gender": "Male",
"@person_posts": [
"cats",
"Graphs"
],
"id": "person8"
},
"v_type": "Person"
},
{
"v_id": "person6",
"attributes": {
"gender": "Male",
"@person_posts": [
"tigergraph",
"cats"
],
"id": "person6"
},
"v_type": "Person"
},
{
"v_id": "person3",
"attributes": {
"gender": "Male",
"@person_posts": ["query languages"],
"id": "person3"
},
"v_type": "Person"
}
]}]
}
This example shows each person’s posted vertices and each person’s like behaviors (liked edges).
ACCUM<VERTEX>
and ACCUM<EDGE>
ExampleCREATE QUERY user_posts_2() FOR GRAPH Social_Net {
// Show each user's post and liked post time
ListAccum<VERTEX> @person_posts;
ListAccum<EDGE> @person_liked_info;
start = {Person.*};
// Find all user post topics and append them to the vertex list accum
user_postings = SELECT s FROM start:s -(Posted)- :g
ACCUM s.@person_posts += g;
user_postings = SELECT s from start:s -(Liked:e)- :g
ACCUM s.@person_liked_info += e;
PRINT start;
}
GSQL > RUN QUERY user_posts2()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"start": [
{
"v_id": "person7",
"attributes": {
"gender": "Male",
"@person_posts": [
"6",
"9"
],
"id": "person7",
"@person_liked_info": [{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person7",
"to_id": "10",
"attributes": {"action_time": "2010-01-12 11:22:05"},
"e_type": "Liked"
}]
},
"v_type": "Person"
},
{
"v_id": "person5",
"attributes": {
"gender": "Female",
"@person_posts": [
"4",
"11"
],
"id": "person5",
"@person_liked_info": [{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person5",
"to_id": "6",
"attributes": {"action_time": "2010-01-12 21:12:05"},
"e_type": "Liked"
}]
},
"v_type": "Person"
},
{
"v_id": "person1",
"attributes": {
"gender": "Male",
"@person_posts": ["0"],
"id": "person1",
"@person_liked_info": [{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person1",
"to_id": "0",
"attributes": {"action_time": "2010-01-11 11:32:00"},
"e_type": "Liked"
}]
},
"v_type": "Person"
},
{
"v_id": "person4",
"attributes": {
"gender": "Female",
"@person_posts": ["3"],
"id": "person4",
"@person_liked_info": [{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person4",
"to_id": "4",
"attributes": {"action_time": "2010-01-13 03:16:05"},
"e_type": "Liked"
}]
},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {
"gender": "Female",
"@person_posts": ["1"],
"id": "person2",
"@person_liked_info": [
{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person2",
"to_id": "0",
"attributes": {"action_time": "2010-01-12 10:52:15"},
"e_type": "Liked"
},
{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person2",
"to_id": "3",
"attributes": {"action_time": "2010-01-11 16:02:26"},
"e_type": "Liked"
}
]
},
"v_type": "Person"
},
{
"v_id": "person6",
"attributes": {
"gender": "Male",
"@person_posts": [
"5",
"10"
],
"id": "person6",
"@person_liked_info": [{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person6",
"to_id": "8",
"attributes": {"action_time": "2010-01-14 11:23:05"},
"e_type": "Liked"
}]
},
"v_type": "Person"
},
{
"v_id": "person8",
"attributes": {
"gender": "Male",
"@person_posts": [
"8",
"7"
],
"id": "person8",
"@person_liked_info": [{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person8",
"to_id": "4",
"attributes": {"action_time": "2010-01-11 03:26:05"},
"e_type": "Liked"
}]
},
"v_type": "Person"
},
{
"v_id": "person3",
"attributes": {
"gender": "Male",
"@person_posts": ["2"],
"id": "person3",
"@person_liked_info": [{
"from_type": "Person",
"to_type": "Post",
"directed": true,
"from_id": "person3",
"to_id": "0",
"attributes": {"action_time": "2010-01-16 05:15:53"},
"e_type": "Liked"
}]
},
"v_type": "Person"
}
]}]
}
This example counts the total number of times each topic is used.
CREATE QUERY user_posts_by_topic() FOR GRAPH Social_Net {
//Show number of total posts by topic
MapAccum<STRING, INT> @@post_topic_counts;
start = {Person.*};
// Append subject and update the appearance count in the global map accum
posts = SELECT g FROM start -(Posted)- :g
ACCUM @@post_topic_counts += (g.subject -> 1);
PRINT @@post_topic_counts;
}
GSQL > RUN QUERY user_posts_by_topic()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"@@post_topic_counts": {
"cats": 5,
"coffee": 1,
"Graphs": 2,
"query languages": 1,
"tigergraph": 3
}}]
}
POST-ACCUM
The optional POST-ACCUM
clause enables aggregation and other computations across the set of vertices (but not edges) selected by the preceding clauses.
POST-ACCUM
can be used without ACCUM
.
If it is preceded by an ACCUM
clause, then it can be used for 2-stage accumulative computation: a first stage in ACCUM
followed by a second stage in POST-ACCUM
.
When you reference a vertex alias in a DML-sub statement in a POST-ACCUM
statement, you bind that vertex alias to the POST-ACCUM
clause implicitly.
You can also explicitly bind a vertex alias with a POST-ACCUM
clause by putting the vertex alias in parentheses immediately after the keyword POST-ACCUM
.
Each POST-ACCUM
clause must be bound with one and only one vertex alias.
A SELECT
statement can have multiple POST-ACCUM
clauses.
If you need to run aggregation and other computations by referencing more than one vertex alias, you can use more than one POST-ACCUM
clause.
Each POST-ACCUM
clauses are processed in parallel; it doesn’t matter in what order you write them.
Syntax
POST-ACCUM
clausepostAccumClause := POST-ACCUM ["(" vertexAlias ")"] dmlSubStmtList
dmlSubStmtList := dmlSubStmt ["," dmlSubStmt]*
dmlSubStmt := assignStmt // Assignment
| funcCallStmt // Function Call
| gAccumAccumStmt // Assignment
| lAccumAccumStmt // Assignment
| attrAccumStmt // Assignment
| vAccumFuncCall // Function Call
| localVarDeclStmt // Declaration
| dmlSubCaseStmt // Control Flow
| dmlSubIfStmt // Control Flow
| dmlSubWhileStmt // Control Flow
| dmlSubForEachStmt // Control Flow
| BREAK // Control Flow
| CONTINUE // Control Flow
| insertStmt // Data Modification
| dmlSubDeleteStmt // Data Modification
| printlnStmt // Output
| logStmt // Output
Iteration model
The ACCUM
clause executes for each full path that matches the pattern in the FROM
clause.
In contrast, the POST-ACCUM
clause executes for each vertex in one vertex set; its statements can access the aggregated accumulator result computed in the ACCUM
clause.
You can think of the matching result of the FROM
clause and the WHERE
clause as a virtual table.
The columns of this matching table are the alias variables from the FROM
clause pattern, and the rows are each possible set of vertex and edge aliases (e.g. a path) which fit the pattern.
A POST-ACCUM
clause acts like a FOREACH
loop on the one of the columns of vertex result set specified in the SELECT
clause and only occurs once for each vertex.
If you want to perform per-vertex updates for more than one vertex alias, you should use a separate POST-ACCUM
clause for each vertex alias.
For example, below we have three POST-ACCUM
clauses.
INTERPRET QUERY () {
SumAccum<INT> @cnt1;
SumAccum<INT> @cnt2;
SumAccum<INT> @@global_t_count;
R = SELECT s
FROM Person:s-(Likes>) -:msg - (Has_Creator>)-Person:t
WHERE s.first_name == "Viktor" AND s.last_name == "Akhiezer"
AND t.last_name LIKE "S%" AND year(msg.creation_date) == 2012
ACCUM s.@cnt1 +=1 //execute this per match of the FROM pattern.
POST-ACCUM s.@cnt2 += s.@cnt1 (1)
POST-ACCUM t.@cnt2 +=1 (2)
POST-ACCUM(t) @@global_t_count += 1; (3)
PRINT R [R.first_name, R.last_name, R.@cnt1, R.@cnt2];
}
1 | The first one iterates through s , and for each s , we do s.@cnt2 += s.@cnt1 . |
2 | The second POST-ACCUM iterates through t . |
3 | The third POST-ACCUM also iterates through t , but its DML-sub statements do not reference t .
Rather, the POST-ACCUM is bound to t explicitly. |
However, the following is not allowed, since it involves two aliases (t and s) in one POST-ACCUM
clause.
POST-ACCUM t.@cnt1 += 1,
s.@cnt1 += 1
Also, you may not use more than one alias in a single assignment. The following is not allowed:
POST-ACCUM t.@cnt1 += s.@cnt + 1
Multiple POST-ACCUM
clauses
A SELECT
statement can have multiple POST-ACCUM
clauses. Each POST-ACCUM
may refer to only one vertex alias.
See the POST-ACCUM
section in the Pattern Matching tutorial for more details.
Examples
This is an example of using ACCUM
and POST-ACCUM
in conjunction.
The ACCUM
traverses the graph and finds all people who live and work in the same country.
After this is determined, POST-ACCUM
examines each vertex (person) to see if they work where they live.
POST-ACCUM
ExampleCREATE QUERY resident_employees() FOR GRAPH Work_Net {
// Show all persons who both work and live in the same country
ListAccum<STRING> @company;
OrAccum @works_and_lives;
start = {Person.*};
employees = SELECT s FROM start:s -(Works_For)- :c
// If a person works for a company in the same country where they live, add the company to the list
ACCUM CASE
WHEN (s.location_id == c.country) THEN
s.@company += c.id
END
// Check each vertex and see if a person works where they live
POST-ACCUM CASE
WHEN (s.@company.size() > 0) THEN
s.@works_and_lives += TRUE
ELSE
s.@works_and_lives += FALSE
END;
PRINT employees WHERE (employees.@works_and_lives == TRUE);
}
GSQL > RUN QUERY resident_employees()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"employees": [
{
"v_id": "person1",
"attributes": {
"skill_set": [
1,
2,
3
],
"skill_list": [
1,
2,
3
],
"@works_and_lives": true,
"interest_list": [
"management",
"financial"
],
"id": "person1",
"interest_set": [
"financial",
"management"
],
"location_id": "us",
"@company": ["company1"]
},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {
"skill_set": [
2,
3,
5,
6
],
"skill_list": [
2,
3,
5,
6
],
"@works_and_lives": true,
"interest_list": ["engineering"],
"id": "person2",
"interest_set": ["engineering"],
"location_id": "chn",
"@company": ["company2"]
},
"v_type": "Person"
},
{
"v_id": "person11",
"attributes": {
"skill_set": [10],
"skill_list": [10],
"@works_and_lives": true,
"interest_list": [
"sport",
"football"
],
"id": "person11",
"interest_set": [
"football",
"sport"
],
"location_id": "can",
"@company": ["company5"]
},
"v_type": "Person"
},
{
"v_id": "person10",
"attributes": {
"skill_set": [3],
"skill_list": [3],
"@works_and_lives": true,
"interest_list": [
"football",
"sport"
],
"id": "person10",
"interest_set": [
"sport",
"football"
],
"location_id": "us",
"@company": ["company1"]
},
"v_type": "Person"
}
]}]
}
This is an example of a POST-ACCUM
only that counts the number of people with a particular gender.
CREATE QUERY person_gender(STRING gender) FOR GRAPH Social_Net {
// Count the number of persons of a given gender
SumAccum<INT> @@gender_count;
start = {ANY};
// Select all person vertices and check the gender attribute
friends = SELECT v FROM start:v
WHERE v.type == "Person"
POST-ACCUM (v) CASE
WHEN (v.gender == gender) THEN
@@gender_count += 1
END;
PRINT @@gender_count;
}
GSQL > RUN QUERY person_gender("Female")
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"@@gender_count": 3}]
}
Updating vertex-attached accumulators
Vertices referenced via a vertex-attached accumulator of a selected vertex may have their vertex-attached accumulators updated in the ACCUM
clause (but not in the POST-ACCUM
clause).
That is, a vertex referenced by a selected vertex can be updated, with some limitations explained below.
Some examples will help to illustrate this more complex condition.
-
Suppose a query declares a vertex-attached accumulator which holds vertex information. We call this a vertex-holding accumulator. This could take several forms:
-
A scalar accumulator, e.g.,
MaxAccum< VERTEX > @maxV
; -
A collection accumulator: e.g.,
ListAccum< VERTEX > @listV
; -
An accumulator containing tuple(s), where the tuple type contains a
VERTEX
field.
-
-
If a vertex
V
is selected, then not only canV’s accumulators be updated, but the vertices stored in its vertex-holding accumulators can also be updated, in the `ACCUM
clause. -
Before these indirectly referenced vertices can be used, they need to be activated. There are two ways to activate an indirect vertex:
-
A vertex from a vertex-holding accumulator is first assigned to a local vertex variable. The vertex can now be updated through the local vertex variable.
-
ACCUM
vertex<Person> mx = tgt.@max_v, # assign to local variable
mx.@cur_id += src.id # access via local variable
-
A FOREACH loop can iterate on a vertex-holding collection accumulator. The vertices can now be updated through the loop variable.
ACCUM
FOREACH vtx IN src.@set_ids DO # iterate on collection accumulator
vtx.@cur_id += tgt.id # access via loop variable
END
The following uses are NOT supported:
|
The following query demonstrates updates to indirectly activated vertices.
CREATE QUERY v_update_indirect_accum() FOR GRAPH Social_Net {
SetAccum<vertex<Person>> @posters;
SetAccum<vertex<Person>> @fellows;
persons = {Person.*};
// To each post, attach a list of persons who liked the post
liked_posts = SELECT p
FROM persons:src -(Liked:e)- Post:p
ACCUM
p.@posters += src;
// To each person who liked a post, attach a list of everyone
// who also liked one of this person's liked posts.
liked_posts = SELECT src
FROM liked_posts:src
ACCUM
FOREACH v IN src.@posters DO
v.@fellows += src.@posters
END
ORDER BY src.subject;
PRINT persons[persons.@fellows];
}
GSQL > RUN QUERY v_update_indirect_accums()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"persons": [
{
"v_id": "person7",
"attributes": {"persons.@fellows": ["person7"]},
"v_type": "Person"
},
{
"v_id": "person5",
"attributes": {"persons.@fellows": ["person5"]},
"v_type": "Person"
},
{
"v_id": "person1",
"attributes": {"persons.@fellows": [
"person3",
"person2",
"person1"
]},
"v_type": "Person"
},
{
"v_id": "person4",
"attributes": {"persons.@fellows": [
"person4",
"person8"
]},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {"persons.@fellows": [
"person3",
"person2",
"person1"
]},
"v_type": "Person"
},
{
"v_id": "person3",
"attributes": {"persons.@fellows": [
"person3",
"person2",
"person1"
]},
"v_type": "Person"
},
{
"v_id": "person8",
"attributes": {"persons.@fellows": [
"person4",
"person8"
]},
"v_type": "Person"
},
{
"v_id": "person6",
"attributes": {"persons.@fellows": ["person6"]},
"v_type": "Person"
}
]}]
}
PER
The PER
clause is an optional prefix to an ACCUM
clause, affecting only that clause.
The FROM
clause of a SELECT
statement produces a match table.
The PER
clause allows the user to specify that they wish to aggregate the match table, so that there is one row per alias.
For more information see the PER
Clause section in the Pattern Matching tutorial.
HAVING
The optional HAVING
clause provides constraints on the result set of the SELECT
statement.
The constraints are applied after ACCUM
and POST-ACCUM
actions.
This differs from the WHERE
clause, which is applied before the ACCUM
and POST-ACCUM
actions.
HAVING
ClausehavingClause := HAVING condition
The condition in a HAVING
clause is applied to each vertex in the SELECT
set (either source or target vertices) which also fulfilled the FROM
and WHERE
conditions.
The HAVING
clause is intended to test one or more of the accumulator variables that were updated in the ACCUM
or POST-ACCUM
clause, though the condition may be anything that equates to a boolean value.
If the condition is false for a particular vertex, then that vertex is excluded from the result set.
The following example demonstrates using the HAVING
clause to constrain a result set based on the vertex accumulator variable which was updated during the ACCUM
clause.
The following query finds all persons meeting a given activity threshold, based on how many posts or likes a person has made.
CREATE QUERY active_members (INT activity_threshold) FOR GRAPH Social_Net {
SumAccum<INT> @activity_amount;
start = {Person.*};
result = SELECT v FROM start:v -(:e)- Post:tgt
ACCUM v.@activity_amount +=1
HAVING v.@activity_amount >= activity_threshold;
PRINT result;
}
If the activity_threshold
parameter is set to 3, the query returns 5 vertices:
GSQL > RUN QUERY active_members(3)
CREATE QUERY active_members (INT activity_threshold) FOR GRAPH Social_Net {
SumAccum<INT> @activity_amount;
start = {Person.*};
result = SELECT v FROM start:v -(:e)- Post:tgt
ACCUM v.@activity_amount +=1
HAVING v.@activity_amount >= activity_threshold;
PRINT result;
}
If the threshold is set to 2, the query would return 8 vertices. If the threshold is set to 4, the query would return no vertices.
The following example demonstrates the equivalence of a SELECT
statement in which the condition for the HAVING
clause is always true.
The following query finds all person meeting a given activity threshold, based on how many posts or likes a person has made
CREATE QUERY print_member_activity() FOR GRAPH Social_Net
{
SumAccum<INT> @activity_amount;
start = {Person.*};
/* --- equivalent statements -----
result = SELECT v FROM start:v -(:e)- Post:tgt
ACCUM v.@activity_amount +=1
HAVING true; */
result = SELECT v FROM start:v -(:e)- Post:tgt
ACCUM v.@activity_amount +=1;
PRINT result;
}
GSQL > RUN QUERY print_member_activity()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"result": [
{
"v_id": "person7",
"attributes": {
"gender": "Male",
"@activity_amount": 3,
"id": "person7"
},
"v_type": "Person"
},
{
"v_id": "person5",
"attributes": {
"gender": "Female",
"@activity_amount": 3,
"id": "person5"
},
"v_type": "Person"
},
{
"v_id": "person1",
"attributes": {
"gender": "Male",
"@activity_amount": 2,
"id": "person1"
},
"v_type": "Person"
},
{
"v_id": "person4",
"attributes": {
"gender": "Female",
"@activity_amount": 2,
"id": "person4"
},
"v_type": "Person"
},
{
"v_id": "person3",
"attributes": {
"gender": "Male",
"@activity_amount": 2,
"id": "person3"
},
"v_type": "Person"
},
{
"v_id": "person6",
"attributes": {
"gender": "Male",
"@activity_amount": 3,
"id": "person6"
},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {
"gender": "Female",
"@activity_amount": 3,
"id": "person2"
},
"v_type": "Person"
},
{
"v_id": "person8",
"attributes": {
"gender": "Male",
"@activity_amount": 3,
"id": "person8"
},
"v_type": "Person"
}
]}]
}
The following shows an example of equivalent result sets from using WHERE
vs. HAVING
.
Recall that the WHERE
clause is evaluated before the ACCUM
and that the HAVING
clause is evaluated after the ACCUM
.
Both constrain the result set based on a condition that vertices must meet.
The query below computes the total post activity for each male person.
Because the gender of the vertex does not change, evaluating whether the person vertex is male before the ACCUM
clause with WHERE
or after the ACCUM
clause with HAVING
does not change the result.
However, if the condition in the HAVING
clause could change within the ACCUM
clause, these statements would produce different results.
HAVING
vs. WHERE
CREATE QUERY active_male_members() FOR GRAPH Social_Net
{
SumAccum<INT> @activity_amount;
start = {Person.*};
// The two statements produce equivalent results
result1 = SELECT v FROM start:v -(:e)- Post:tgt
WHERE v.gender == "Male"
ACCUM v.@activity_amount +=1;
result2 = SELECT v FROM start:v -(:e)- Post:tgt
ACCUM v.@activity_amount +=1
HAVING v.gender == "Male";
PRINT result1[result1.@activity_amount];
PRINT result2[result2.@activity_amount];
}
--qio0
- Results
The following example has a compilation error because the result set is taken from the source vertices, but the HAVING
condition is checking the target vertices.
This query does not compile because the having condition is testing the wrong vertex set
HAVING
the wrong vertex setCREATE QUERY print_member_about_cats() FOR GRAPH Social_Net
{
start = {Person.*};
result = SELECT v FROM start:v -(:e)- Post:tgt
HAVING tgt.subject == "cats";
PRINT result;
}
$ gsql printMemberAboutCats.gsql
Semantic Check Error in query printMemberAboutCats (SEM-50): line 8, col 33
The SELECT block selects src, but the HAVING clause uses tgt
ORDER BY
The optional ORDER BY
clause sorts the result set.
orderClause := ORDER BY expr [ASC | DESC] ["," expr [ASC | DESC]]*
ASC
specifies ascending order, and DESC
specifies descending order.
If neither is specified, then ascending order is used.
Each expression must refer to the attributes or accumulators of a member of the result set, and the expression must evaluate to a sortable value (e.g., a number or a string).
ORDER BY
offers hierarchical sorting by allowing a comma-separated list of expressions, sorting first by the leftmost expr.
It uses the next expression only to sort items where the current sort expr results in identical values.
Any items in the result set which cannot be sorted (because the sort expressions do not pertain to them) will appear at the end of the set, after the sorted items.
The following example demonstrates the use of ORDER BY
with multiple expressions.
The returned vertex set is first ordered by the number of friends of the vertex, and then ordered by the number of coworkers of that vertex.
This query finds the most popular people, sorting first based on the number of friends and then in case of a tie by the number of coworkers
CREATE QUERY top_popular() FOR GRAPH Friend_Net {
SumAccum<INT> @num_friends;
SumAccum<INT> @num_coworkers;
start = {Person.*};
result = SELECT v FROM start -((Friend|Coworker):e)- Person:v
ACCUM CASE
WHEN e.type == "Friend" THEN v.@num_friends += 1
WHEN e.type == "Coworker" THEN v.@num_coworkers += 1
END
ORDER BY v.@num_friends DESC, v.@num_coworkers DESC;
PRINT result;
}
RUN QUERY top_popular()
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"result": [
{
"v_id": "person9",
"attributes": {
"@num_friends": 5,
"@num_coworkers": 3,
"id": "person9"
},
"v_type": "Person"
},
{
"v_id": "person12",
"attributes": {
"@num_friends": 4,
"@num_coworkers": 1,
"id": "person12"
},
"v_type": "Person"
},
{
"v_id": "person8",
"attributes": {
"@num_friends": 4,
"@num_coworkers": 1,
"id": "person8"
},
"v_type": "Person"
},
{
"v_id": "person6",
"attributes": {
"@num_friends": 3,
"@num_coworkers": 4,
"id": "person6"
},
"v_type": "Person"
},
{
"v_id": "person1",
"attributes": {
"@num_friends": 3,
"@num_coworkers": 3,
"id": "person1"
},
"v_type": "Person"
},
{
"v_id": "person4",
"attributes": {
"@num_friends": 2,
"@num_coworkers": 5,
"id": "person4"
},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {
"@num_friends": 2,
"@num_coworkers": 3,
"id": "person2"
},
"v_type": "Person"
},
{
"v_id": "person3",
"attributes": {
"@num_friends": 2,
"@num_coworkers": 3,
"id": "person3"
},
"v_type": "Person"
},
{
"v_id": "person10",
"attributes": {
"@num_friends": 2,
"@num_coworkers": 1,
"id": "person10"
},
"v_type": "Person"
},
{
"v_id": "person7",
"attributes": {
"@num_friends": 1,
"@num_coworkers": 6,
"id": "person7"
},
"v_type": "Person"
},
{
"v_id": "person5",
"attributes": {
"@num_friends": 1,
"@num_coworkers": 5,
"id": "person5"
},
"v_type": "Person"
},
{
"v_id": "person11",
"attributes": {
"@num_friends": 1,
"@num_coworkers": 1,
"id": "person11"
},
"v_type": "Person"
}
]}]
}
LIMIT
The optional LIMIT
clause sets constraints on the number and ranking of items included in the final result set.
LIMIT
ClauselimitClause := LIMIT ( expr | expr "," expr | expr OFFSET expr )
Each of the expression must evaluate to a non-negative integer.
To understand LIMIT
, note that the tentative result set is held in the computer as a list of vertices.
If the query has an ORDER BY
clause, the order is specified; otherwise the list order is unknown.
Assume we number the vertices as v_1
, v_2
, …, v_n
.
The LIMIT
clause specifies a range of vertices, starting from a lower position in the list to an upper position.
There are three forms:
LIMIT
scenariosresult = SELECT v FROM S -(:e)- :v LIMIT k; (1)
result = SELECT v FROM S -(:e)- :v LIMIT j, k; (2)
result = SELECT v FROM S -(:e)- :v LIMIT k OFFSET j; (3)
1 | Case 1: k = Count |
2 | Case 2: j = Offset from the start of the list, k = Count |
3 | Case 3: k = Count, j = Offset from the start of the list |
Case 1: LIMIT k
-
When a single expr is provided,
LIMIT
returns the first k elements from the tentative result set. If there are fewer than k elements available, then all elements will be returned in the result set. If k=5 and the tentative result set has at least 5 items, then the final result list will be [ v_1 , v_2 , v_3 , v_4 , v_5 ].
Case 2: LIMIT j, k
-
When a comma separates two expressions,
LIMIT
treats the first expression j as an offset. That is, it skips the first j items in the list. The second expr k tells the maximum number of items to include. If the list has at least 7 items, thenLIMIT 2, 5
would return[ v_3 , v_4 , v_5, v_6 , v_7 ]
.
Case 3: LIMIT k OFFSET j
-
The behavior of Case 3 is the same as that of Case 2, except that the syntax is different. The keyword
OFFSET
separates the two expressions, and the count comes before the offset, rather than vice versa. If the list has at least 7 items, thenLIMIT 5 OFFSET 2
would return[ v_3 , v_4 , v_5, v_6 , v_7 ]
.
If any of the expressions evaluate to a negative integer, the results are undefined.
|
The following examples demonstrate the various forms of the LIMIT clause.
The first example shows the LIMIT
clause when used as an upper limit. It returns a result set with a maximum size of 4 elements in the set.
CREATE QUERY limit_ex_1 (INT k) FOR GRAPH Friend_Net {
start = {Person.*};
result1 = SELECT v FROM start:v
ORDER BY v.id
LIMIT k;
PRINT result1[result1.id]; // api v2
}
RUN QUERY limit_ex_1(4)
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"result1": [
{
"v_id": "person1",
"attributes": {"result1.id": "person1"},
"v_type": "Person"
},
{
"v_id": "person10",
"attributes": {"result1.id": "person10"},
"v_type": "Person"
},
{
"v_id": "person11",
"attributes": {"result1.id": "person11"},
"v_type": "Person"
},
{
"v_id": "person12",
"attributes": {"result1.id": "person12"},
"v_type": "Person"
}
]}]
}
The following example shows how to use the LIMIT
clause with an offset.
LIMIT
with lower-bound and sizeCREATE QUERY limit_ex_2 (INT j, INT k) FOR GRAPH Friend_Net
{
start = {Person.*};
result2 = SELECT v FROM start:v
ORDER BY v.id
LIMIT j, k;
PRINT result2[result2.id]; // api v2
}
RUN QUERY limit_ex_2(2,3)
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"result2": [
{
"v_id": "person11",
"attributes": {"result2.id": "person11"},
"v_type": "Person"
},
{
"v_id": "person12",
"attributes": {"result2.id": "person12"},
"v_type": "Person"
},
{
"v_id": "person2",
"attributes": {"result2.id": "person2"},
"v_type": "Person"
}
]}]
}
The following example shows the alternative syntax for a result size limit with an offset.
This time we try larger values for offset and size.
In a large data set, limitTest(5,20)
might return 20 vertices, but since we don’t have 25 vertices in the original data, the output is fewer than 20 vertices.
LIMIT
with OFFSET
CREATE QUERY limit_ex_3 (INT j, INT k) FOR GRAPH Friend_Net {
start = {Person.*};
result3 = SELECT v FROM start:v
ORDER BY v.id
LIMIT k OFFSET j;
PRINT result3[result3.id]; // api v2
}
RUN QUERY limit_ex_3(5,20)
{
"error": false,
"message": "",
"version": {
"schema": 0,
"edition": "enterprise",
"api": "v2"
},
"results": [{"result3": [
{
"v_id": "person3",
"attributes": {"result3.id": "person3"},
"v_type": "Person"
},
{
"v_id": "person4",
"attributes": {"result3.id": "person4"},
"v_type": "Person"
},
{
"v_id": "person5",
"attributes": {"result3.id": "person5"},
"v_type": "Person"
},
{
"v_id": "person6",
"attributes": {"result3.id": "person6"},
"v_type": "Person"
},
{
"v_id": "person7",
"attributes": {"result3.id": "person7"},
"v_type": "Person"
},
{
"v_id": "person8",
"attributes": {"result3.id": "person8"},
"v_type": "Person"
},
{
"v_id": "person9",
"attributes": {"result3.id": "person9"},
"v_type": "Person"
}
]}]
}