Kafka options

Syntax

CREATE [SYNC] JOB <job_name>
[{ job_options }]
AS COPY FROM KAFKA
<connection_identifier>
[{ source_options }]
INTO <table_identifier>;

Job options

[ CONSUMER_PROPERTIES = '<properties>' ]
[ READER_SHARDS = <integer> ]
[ STORE_RAW_DATA = { TRUE | FALSE } ]
[ START_FROM = { NOW | BEGINNING } ]
[ END_AT = { NOW | <timestamp> } ]
[ COMPUTE_CLUSTER = <cluster_identifier> ]
[ RUN_PARALLELISM = <integer> ]
[ CONTENT_TYPE = { AUTO
| CSV
| JSON
| PARQUET
| TSV
| AVRO
| AVRO_SCHEMA_REGISTRY
| FIXED_WIDTH
| REGEX
| SPLIT_LINES
| ORC
| XML } ]
[ COMPRESSION = { AUTO
| GZIP
| SNAPPY
| LZO
| NONE
| SNAPPY_UNFRAMED
| KCL } ]
[ COMMENT = '<comment>' ]

CONSUMER_PROPERTIES — editable

Type: text
(Optional) Additional properties to use when configuring the consumer. This overrides any settings in the Kafka connection.
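As a sketch, consumer properties can be supplied as a string of standard Kafka consumer settings (the job, connection, schema, and table names below are placeholders, and the exact property-string delimiter may depend on your connection):

```sql
CREATE SYNC JOB ingest_orders
    CONSUMER_PROPERTIES = 'max.poll.records = 500
        session.timeout.ms = 45000'
AS COPY FROM KAFKA my_kafka_connection
    TOPIC_NAME = 'orders'
INTO my_catalog.my_schema.orders_raw;
```

Properties set here take precedence over the same settings defined on the Kafka connection.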

READER_SHARDS — editable

Type: integer
Default: 1
(Optional) Determines how many readers are used in parallel to read the stream.
This number does not need to match the number of partitions in your Kafka topic.
As a rule of thumb, add one reader for every 70 MB/s sent to your topic.
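For example, under that rule of thumb a topic receiving roughly 210 MB/s would warrant three readers (names below are placeholders):

```sql
CREATE SYNC JOB high_volume_ingest
    READER_SHARDS = 3  -- ~210 MB/s inbound / 70 MB/s per reader
AS COPY FROM KAFKA my_kafka_connection
    TOPIC_NAME = 'clickstream'
INTO my_catalog.my_schema.clickstream_raw;
```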

STORE_RAW_DATA

Type: boolean
Default: false
(Optional) When true, stores an additional copy of the data in its original format.

START_FROM

Values: { NOW | BEGINNING }
Default: BEGINNING
(Optional) Configures the time to start ingesting data from. Messages before the specified time are ignored.

END_AT — editable

Values: { NOW | <timestamp> }
Default: Never
(Optional) Configures the time to stop ingesting data. Messages after the specified time are ignored. Timestamps must be given in UTC, in the format TIMESTAMP 'YYYY-MM-DD HH:MM:SS'.
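A sketch of a bounded ingestion window, assuming placeholder names: the job reads only messages arriving between job creation and the UTC cutoff.

```sql
CREATE SYNC JOB bounded_ingest
    START_FROM = NOW
    END_AT = TIMESTAMP '2024-12-31 23:59:59'
AS COPY FROM KAFKA my_kafka_connection
    TOPIC_NAME = 'events'
INTO my_catalog.my_schema.events_raw;
```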

COMPUTE_CLUSTER — editable

Type: identifier
Default: The sole cluster in your environment
(Optional) The compute cluster that runs this job.
This option can only be omitted when your environment contains a single cluster.
Once you have more than one compute cluster, you must specify which one to use.

RUN_PARALLELISM — editable

Type: integer
Default: 1
(Optional) The number of parser jobs to run in parallel per minute.

CONTENT_TYPE — editable

Values: { AUTO | CSV | JSON | PARQUET | TSV | AVRO | AVRO_SCHEMA_REGISTRY | FIXED_WIDTH | REGEX | SPLIT_LINES | ORC | XML }
Default: AUTO
(Optional) The file format of the content being read.
Note that AUTO only works when reading Avro, JSON, or Parquet.
To configure additional options for certain content types, see: Content type options
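Because AUTO detection covers only Avro, JSON, and Parquet, other formats must be declared explicitly. A sketch for a CSV topic, with placeholder names:

```sql
CREATE SYNC JOB ingest_csv_topic
    CONTENT_TYPE = CSV
AS COPY FROM KAFKA my_kafka_connection
    TOPIC_NAME = 'csv_events'
INTO my_catalog.my_schema.csv_events_raw;
```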

COMPRESSION

Values: { AUTO | GZIP | SNAPPY | LZO | NONE | SNAPPY_UNFRAMED | KCL }
Default: AUTO
(Optional) The compression of the source.

COMMENT — editable

Type: text
(Optional) A description or comment regarding this job.

Source options

TOPIC_NAME = '<topic_name>'

TOPIC_NAME

Type: text
The topic to read from.

Example