The Kafka Data Source is a streaming data source that ingests data directly from the provided Kafka topic into Upsolver. This guide assumes you already have a running Kafka cluster and a topic you would like to ingest.
Creating A New Kafka Data Source
- On the Data Sources page, click Add New Data Source.
- Select Kafka from the list.
- Fill in a name for your Data Source.
- In the Version drop-down menu, select the version that matches your Kafka cluster.
- In the Kafka Hosts field, provide a comma-separated list of the public IPs or DNS names of your Kafka servers, each with its port specified.
- In the Kafka Topic field enter the name of the topic you would like to ingest.
- To ingest the topic from the beginning, check the Read From Start box. If this box is left unchecked, ingestion starts from the head (latest data) of the topic.
- In the Content Format drop-down menu, select the format of the messages in the Kafka topic.
- Select the Compute Cluster you would like this Data Source to run on. If you have not created a Compute Cluster yet, you can create one from the drop-down menu.
- To change where the data ingested from this source is stored, change the Storage field in the Advanced section.
- If your topic carries a large volume of data, increase the parallelism value in the Advanced section. As a rule of thumb, use one unit of parallelism for every 50 MB/s sent to your topic, rounding up. For example, if your topic ingests 360 MB/s (roughly 30 TB per day), set the parallelism to 8 (360 / 50 = 7.2, rounded up). If your process is sensitive to processing delays, size this value for the peak data rate; otherwise, it can be sized for the average expected daily data rate. Because the parallelism cannot be changed later, base it on the maximum data rate you expect the topic to reach, not only on its current rate.
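The parallelism rule of thumb in the last step is a ceiling division. Below is a minimal Python sketch of that arithmetic, assuming the 50 MB/s-per-unit figure from this guide. The function names and the 120 MB/s average rate in the usage lines are hypothetical, and this is not an Upsolver API; treat the result as a starting point for the value you enter in the Advanced section.

```python
import math

# Rule of thumb from this guide: one unit of parallelism per 50 MB/s sent to the topic.
MB_PER_SEC_PER_UNIT = 50


def recommended_parallelism(rate_mb_per_sec: float) -> int:
    """Suggested parallelism for a data rate in MB/s, rounded up (minimum 1)."""
    return max(1, math.ceil(rate_mb_per_sec / MB_PER_SEC_PER_UNIT))


def choose_parallelism(peak_mb_per_sec: float, average_mb_per_sec: float,
                       latency_sensitive: bool) -> int:
    """Size for the peak rate if processing delays matter, otherwise for the average rate."""
    rate = peak_mb_per_sec if latency_sensitive else average_mb_per_sec
    return recommended_parallelism(rate)


if __name__ == "__main__":
    print(recommended_parallelism(360))  # 8, matching the 360 MB/s example above
    # Hypothetical workload: 360 MB/s at peak, 120 MB/s on average.
    print(choose_parallelism(360, 120, latency_sensitive=True))   # 8
    print(choose_parallelism(360, 120, latency_sensitive=False))  # 3
```

Rounding up rather than to the nearest integer keeps a topic running at 51 MB/s from being sized as if it ran at 50 MB/s.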
For Upsolver to read from your Kafka cluster, you need to open the cluster's security settings to allow polling from our servers. To get the list of IPs to allow, please contact our support team. For more information, see Kafka Permissions.
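Before contacting support about access, it can help to confirm that the value you entered in the Kafka Hosts field is well-formed and that each broker port is open. The following is a minimal Python sketch using only the standard library: it parses a comma-separated host:port list and attempts a TCP connection to each entry. The function names and broker addresses are hypothetical. A successful check from your own machine only shows that the hosts and ports are correct; it does not prove that your cluster's security settings allow Upsolver's servers through.

```python
import socket
from typing import List, Tuple


def parse_kafka_hosts(value: str) -> List[Tuple[str, int]]:
    """Split a comma-separated 'host:port' list into (host, port) pairs.

    Raises ValueError if an entry is empty, lacks a port, or has an invalid port.
    """
    hosts = []
    for entry in value.split(","):
        entry = entry.strip()
        host, sep, port = entry.rpartition(":")
        if not entry or not sep or not host:
            raise ValueError(f"expected host:port, got {entry!r}")
        if not port.isdigit() or not 0 < int(port) < 65536:
            raise ValueError(f"invalid port in {entry!r}")
        hosts.append((host, int(port)))
    return hosts


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


if __name__ == "__main__":
    # Hypothetical broker addresses; replace them with your own.
    for host, port in parse_kafka_hosts("kafka1.example.com:9092, kafka2.example.com:9092"):
        status = "reachable" if is_reachable(host, port) else "unreachable"
        print(f"{host}:{port} {status}")
```

Splitting on the last colon keeps the check simple for IPv4 addresses and DNS names, which are the forms this guide asks for.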