Set up Workbench with Kafka Streaming (Enterprise Only)

ML Workbench offers two data transfer protocols for exporting data from your TigerGraph Server instance: HTTP and Kafka streaming. HTTP export is limited to 2 GB of data at a time, while Kafka streaming has no such limit. Kafka streaming is only available in the Enterprise edition.

ML Workbench supports the PLAINTEXT, SASL_PLAINTEXT, and SASL_SSL authentication methods for connecting to your Kafka cluster.

The default configuration for this procedure installs Kafka, ML Workbench, and TigerGraph Server. If you already have an instance of any of these services running, you can remove the corresponding service from the configuration.

1. Prerequisites

  • Docker is installed on your machine.

  • Your machine needs to have the following ports open:

    • 2181.

    • 19092.

    • 9092. If you configure Kafka to listen on another port, that port must be open instead of 9092.

  • You have purchased the ML Workbench Enterprise Edition.

This procedure uses docker-compose and therefore requires Docker to be installed on your machine. You can also set up Kafka directly on your machine without Docker, or set up a Kafka cluster on a Kubernetes container management system on any cloud provider. If you need help setting up Kafka without Docker, please open a support ticket.
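
Before starting the containers, it can help to confirm that nothing else on the machine is already bound to the ports listed above. The following is a minimal sketch under one reading of "open": it only tests whether a TCP socket can be bound locally, and it does not inspect firewall or cloud security-group rules. The names REQUIRED_PORTS and port_is_free are illustrative and are not part of ML Workbench.

```python
import socket

# Ports listed in the prerequisites; change 9092 if Kafka listens elsewhere.
REQUIRED_PORTS = (2181, 19092, 9092)


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission problem.
            return False
    return True


if __name__ == "__main__":
    for port in REQUIRED_PORTS:
        state = "free" if port_is_free(port) else "not available"
        print(f"port {port}: {state}")
```

A port reported as "not available" usually means another service, possibly an existing Kafka or ZooKeeper instance, is already using it.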

2. Procedure

2.1. Install Kafka and Workbench

The first step is to install Kafka on your machine. If you already have a running Kafka cluster, you can skip this step.

  1. Download docker-compose.yaml.

    • The default configuration assumes that you have neither a running TigerGraph instance nor an installed ML Workbench on your machine.

  2. If you already have a TigerGraph Server instance running, or you have already installed ML Workbench, remove the tigergraph or mlwb service, respectively, from the YAML file.

  3. Replace <public-ip-address> on line 24 with the IP address of your machine. This IP address needs to be accessible both by your TigerGraph Server instance and ML Workbench.

  4. In the same directory as the docker-compose.yaml file, run docker-compose up -d.
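
If you prefer to script step 3 rather than edit the file by hand, the sketch below substitutes the <public-ip-address> placeholder in the file's text. It assumes the placeholder appears literally in docker-compose.yaml, as described above. The function name fill_public_ip and the address 203.0.113.10 (a reserved documentation address) are hypothetical.

```python
import ipaddress
from pathlib import Path

PLACEHOLDER = "<public-ip-address>"


def fill_public_ip(compose_text: str, ip: str) -> str:
    """Return compose_text with the placeholder replaced by a validated IP."""
    ipaddress.ip_address(ip)  # raises ValueError if ip is not a valid address
    if PLACEHOLDER not in compose_text:
        raise ValueError(f"{PLACEHOLDER} not found; was the file already edited?")
    return compose_text.replace(PLACEHOLDER, ip)


# Example usage (run in the directory that contains docker-compose.yaml):
# path = Path("docker-compose.yaml")
# path.write_text(fill_public_ip(path.read_text(), "203.0.113.10"))
```

Raising an error when the placeholder is missing avoids silently leaving a file that was already edited, or that uses a different placeholder, unchanged.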

2.2. Configure Workbench to use Kafka Streaming

After configuring the connection between the Workbench and your TigerGraph instance, provide the address of your Kafka instance to the ML Workbench:

conn.gds.configureKafka(kafka_address="<kafka-ip-address>:9092")

Replace <kafka-ip-address> with the address you provided in the docker-compose.yaml file.
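
For context, here is a minimal sketch of how that call might sit in a script. It assumes conn is a pyTigerGraph connection that is already configured for your TigerGraph instance. The helpers kafka_address and configure_streaming are hypothetical wrappers, not part of the library, and the addresses and graph name in the commented example are placeholders. Only the kafka_address argument shown in this guide is used.

```python
def kafka_address(host: str, port: int = 9092) -> str:
    """Build the "<host>:<port>" string passed as kafka_address."""
    if not host or "<" in host or ">" in host:
        raise ValueError("replace the <kafka-ip-address> placeholder with a real address")
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return f"{host}:{port}"


def configure_streaming(conn, host: str, port: int = 9092) -> None:
    """Point an existing connection at the Kafka broker.

    conn is assumed to be an already configured pyTigerGraph connection;
    configureKafka(kafka_address=...) is the call shown in this guide.
    """
    conn.gds.configureKafka(kafka_address=kafka_address(host, port))


# Example (needs pyTigerGraph and a reachable server; all values are placeholders):
# from pyTigerGraph import TigerGraphConnection
# conn = TigerGraphConnection(host="http://203.0.113.20", graphname="MyGraph")
# configure_streaming(conn, "203.0.113.10")
```

Validating the address first catches the common mistake of leaving the <kafka-ip-address> placeholder in place.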

2.3. Activate ML Workbench Enterprise edition

You can skip this step if you have already activated the Enterprise Edition.

Only the Enterprise edition of ML Workbench can stream data via Kafka from a TigerGraph Server instance. If you have purchased the Enterprise edition, use the credentials you received from the TigerGraph Sales team to obtain the activator, along with instructions for using it to activate the Enterprise edition.

3. Examples

Refer to the Data Loading tutorial for more examples of how to configure the dataloader to work with your Kafka cluster.