Kafka Loader Overview

Kafka is a popular pub-sub system in enterprise IT, offering a distributed and fault-tolerant real-time data pipeline. The Kafka Loader lets you integrate TigerGraph with a Kafka cluster and speed up your real-time data ingestion. The Kafka Loader is easily extensible using the many plugins available in the Kafka ecosystem.

The Kafka Loader consumes data from a Kafka cluster and loads it into the TigerGraph system. Multiple loading jobs created with the Kafka Loader can run at the same time, streaming data from various sources into the TigerGraph system concurrently.

Architecture

At a high level, a user provides instructions to the TigerGraph system through GSQL, and the external Kafka cluster loads data into TigerGraph’s RESTPP server. The following diagram shows the Kafka Loader data architecture.

Figure 1. User input feeds into the RESTPP server via a GSQL server. A Kafka cluster feeds directly into the RESTPP server. The GSQL server and the RESTPP server together make up the TigerGraph system.

The Kafka Loader doesn’t use Kafka consumer groups and therefore doesn’t support high availability (HA). If a loading job is interrupted, you must resume it manually.

A resumed job picks up where it stopped and does not re-consume messages it has already consumed.
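To illustrate the resume behavior, here is a minimal Python sketch of a loader that tracks its own per-partition offsets instead of relying on a consumer group. It is not TigerGraph's implementation: the `LoadingJob` class, its `run` method, and the in-memory `Topic` stand-in for a Kafka topic are all hypothetical. A real loader would also need to persist its offsets durably so they survive a restart; this sketch keeps them in memory.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for a Kafka topic: partition id -> messages.
# The list index plays the role of the Kafka offset within a partition.
Topic = dict[int, list[str]]


@dataclass
class LoadingJob:
    """Illustrative loader that tracks offsets itself instead of using a consumer group."""

    topic: Topic
    # next_offset[p] is the offset of the next unread message in partition p.
    next_offset: dict[int, int] = field(default_factory=dict)
    loaded: list[str] = field(default_factory=list)

    def run(self, max_messages: int) -> None:
        """Consume at most max_messages, then stop (simulating an interruption)."""
        budget = max_messages
        for partition, messages in sorted(self.topic.items()):
            # Resume from the stored offset; start at 0 for unseen partitions.
            for offset in range(self.next_offset.get(partition, 0), len(messages)):
                if budget == 0:
                    return
                self.loaded.append(messages[offset])
                # Advance the offset only after the message has been handled,
                # so a resumed run never re-reads it.
                self.next_offset[partition] = offset + 1
                budget -= 1


topic = {0: ["a", "b", "c"], 1: ["d", "e"]}
job = LoadingJob(topic)
job.run(max_messages=2)   # interrupted after two messages
job.run(max_messages=10)  # resumed: continues from the stored offsets
print(job.loaded)  # ['a', 'b', 'c', 'd', 'e']
```

Storing the offset of the next unread message (rather than the last consumed one) lets a fresh partition and a resumed partition go through the same code path, and the output shows no message is loaded twice.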

Supported file formats

  • Avro

  • CSV

  • JSON

If the Kafka Loader cannot parse a message because of formatting issues, it skips that message.
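The skip-on-failure behavior can be sketched in Python as follows. This is a hypothetical illustration, not the Kafka Loader's code: the function names are made up, Avro is omitted because decoding it needs a third-party library and a schema, and what counts as a formatting issue here (invalid JSON, a JSON value that is not an object, or a CSV message with the wrong number of fields) is an assumption.

```python
import csv
import json
from typing import Callable, Iterable


# Hypothetical parser for JSON messages: each message must be a JSON object.
def parse_json(raw: str) -> dict:
    value = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(value, dict):
        raise ValueError("expected a JSON object")
    return value


# Hypothetical parser factory for CSV messages with a fixed column list.
def make_csv_parser(columns: list[str]) -> Callable[[str], dict]:
    def parse_csv(raw: str) -> dict:
        fields = next(csv.reader([raw]))
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
        return dict(zip(columns, fields))
    return parse_csv


def load_messages(messages: Iterable[str], parse: Callable[[str], dict]):
    """Parse each message, skipping (not aborting on) any that fail to parse."""
    loaded, skipped = [], []
    for raw in messages:
        try:
            loaded.append(parse(raw))
        except (ValueError, csv.Error):
            skipped.append(raw)  # a malformed message does not stop the job
    return loaded, skipped


# Usage: the middle CSV message has too few fields and is skipped.
loaded, skipped = load_messages(["1,Ann", "2", "3,Bob"], make_csv_parser(["id", "name"]))
print(loaded)   # [{'id': '1', 'name': 'Ann'}, {'id': '3', 'name': 'Bob'}]
print(skipped)  # ['2']
```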