This page explains the key factors that affect memory usage in TigerGraph, as well as how to:
Monitor memory usage
Configure memory thresholds
Manage query workload distribution
Below are the major factors that impact memory usage in TigerGraph:
The biggest factor that affects memory usage in TigerGraph is the amount of data you have. Most database operations require more memory the more data you have.
Queries that read or write a high volume of data use more memory.
In a distributed cluster, a non-distributed query can be memory-intensive for the node where the query is run. See Distributed Query Mode.
You can monitor memory usage by query and by machine.
TigerGraph’s Graph Processing Engine (GPE) records memory usage by query at different stages of the query and saves it to
$(gadmin config get System.LogRoot)/gpe/log.INFO.
You can monitor how much memory a query is using by searching the log file for the request ID and filter for lines that contain
$ grep -i <request_id> $(gadmin config get System.LogRoot)/gpe/log.INFO | grep -i "querymem"
You can also run a query first, and then run the following command immediately after to retrieve the most recent query logs and filter for
$ tail -n 50 $(gadmin config get System.LogRoot)/gpe/log.INFO | grep -i "querymem"
The logs show memory usage by the query at different stages of its execution. The number at the end of each line indicates the number of bytes of memory utilized by the query:
0415 01:33:40.885433 6553 gpr.cpp:195] Engine_MemoryStats|ldbc_snb::,196612.RESTPP_1_1.1618450420870.N,NNN,15,0,0|MONITORING Step(1) BeforeRun[GPR][QueryMem]: 116656 I0415 01:33:42.716199 6553 gpr.cpp:241] Engine_MemoryStats|ldbc_snb::,196612.RESTPP_1_1.1618450420870.N,NNN,15,0,0|MONITORING Step(1) AfterRun[GPR][QueryMem]: 117000
If you have access to Admin Portal, you can monitor memory usage by node through the cluster monitoring tool in Dashboard.
The following is a list of Linux commands to measure system memory and check for out-of-memory errors:
To check available memory on a node, run
grunto check available memory on every node.
To view CPU and memory details interactively, run
When a query is aborted, run
dmesg -T| grep -i “oom” to check for out of memory errors
[Thu Feb 4 00:41:08 2021] google_osconfig invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 (1) [Thu Feb 4 00:41:08 2021] [<ffffffffafdc208d>] oom_kill_process+0x2cd/0x490 [Thu Feb 4 00:41:08 2021] [ pid ] uid tigergraph total_vm rss nr_ptes swapents oom_score_adj name [Thu Feb 4 00:41:09 2021]  1004 20183 20200397 377046 5701 0 0 tg_dbs_restd [Thu Feb 4 00:41:09 2021] Out of memory: Kill process 20183 (tg_dbs_restd) score 239 or sacrifice child [Thu Feb 4 00:41:09 2021] Killed process 20183 (tg_dbs_restd), UID 1004, total-vm:80801588kB, anon-rss:1508400kB, file-rss:0kB, shmem-rss:0kB
1 If you see a line that says something invoked oom-killer, it means the node ran out of memory.
GPE has memory protection to prevent out-of-memory issues. There are three memory states:
There is a healthy amount of free memory. All system operations run normally for best performance.
There is a shortage of free memory. TigerGraph’s GPE starts to slow down processing new requests to allow long-running queries to finish and release memory.
There is a critical shortage of free memory. TigerGraph’s GPE starts to abort queries to ensure system stability.
TigerGraph implements memory protection thresholds through the following environment variables. By default, the thresholds are only effective when a machine has more than 30 GB of total memory:
The free memory threshold that causes TigerGraph to enter the Alert state. Default value is 30%.
The free memory threshold that causes TigerGraph to enter Critical state. Default value is 10%.
To configure these environment variables, run
gadmin config entry GPE.BasicConfig.Env.
This shows the current values of the environment variables and allows you to add new entries:
$ gadmin config entry GPE.BasicConfig.Env ✔ New: LD_PRELOAD=$LD_PRELOAD; LD_LIBRARY_PATH=$LD_LIBRARY_PATH; CPUPROFILE=/tmp/tg_cpu_profiler; CPUPROFILESIGNAL=12; MALLOC_CONF=prof:true,prof_active:false▐
Add your desired memory threshold configuration after the existing environment values.
Use a semicolon
; to separate the different environment variables:
✔ New: LD_PRELOAD=$LD_PRELOAD; LD_LIBRARY_PATH=$LD_LIBRARY_PATH; CPUPROFILE=/tmp/tg_cpu_profiler; CPUPROFILESIGNAL=12; MALLOC_CONF=prof:true,prof_active:false;SysMinFreePct=5;SysAlertFreePct=25; (1)
|1||This sets the critical threshold to 5 percent and the alert threshold to 25 percent.|
Spaces have been added to the following full example for readability.
> gadmin config entry GPE.BasicConfig.Env GPE.BasicConfig.Env [ LD_PRELOAD=$LD_PRELOAD; LD_LIBRARY_PATH=$LD_LIBRARY_PATH; CPUPROFILE=/tmp/tg_cpu...(too long to show the full content, please use 'gadmin config get GPE.BasicConfig.Env' to get it) ]: The runtime environment variables, separated by ';' ✔ New: LD_PRELOAD=$LD_PRELOAD; LD_LIBRARY_PATH=$LD_LIBRARY_PATH; CPUPROFILE=/tmp/tg_cpu_profiler; CPUPROFILESIGNAL=12; MALLOC_CONF=prof:true,prof_active:false ; SysMinFreePct=20;SysAlertFreePct=30 (1)
|1||In this example, the user has set
After making a change, run
gadmin config apply to apply the changes and
gadmin restart gpe to restart the GPE service.
Changes will take effect after the restart.
On a distributed cluster, you can specify on which nodes you want a query to be run through the Run Query REST endpoint.