Job options

This section describes the job options you can use to transform and write data to your target location.

The following job options can be applied to all targets. See the links below for target-specific job options.

[ AGGREGATION_PARALLELISM = <integer> ]
[ COMMENT = '<comment>' ]
[ COMPUTE_CLUSTER = <cluster_identifier> ]
[ END_AT = { NOW | timestamp } ]
[ RUN_INTERVAL = <integer> { MINUTE[S] | HOUR[S] | DAY[S] } ]
[ RUN_PARALLELISM = <integer> ]
[ START_FROM = { NOW | BEGINNING | timestamp } ]


AGGREGATION_PARALLELISM — editable

Type: integer

Default: 1

(Optional) Only supported when the query contains aggregations. Formerly known as "output sharding."

COMMENT — editable

Type: text

(Optional) A description or comment regarding this job.

COMPUTE_CLUSTER — editable

Type: identifier

Default: The sole cluster in your environment

(Optional) The compute cluster on which to run this job.

This option can only be omitted when there is just one cluster in your environment.

If your environment has more than one compute cluster, you must use this option to specify which one runs the job.

END_AT — editable

Values: { NOW | timestamp }

Default: Never

(Optional) Configures the time to stop inserting data. Data after the specified time is ignored.

If set to a timestamp, it should be aligned to the RUN_INTERVAL.

For example, if RUN_INTERVAL is set to 5 minutes, you can set an end time of 12:05 PM but not 12:03 PM. The timestamp should also be in UTC and use the following format: TIMESTAMP 'YYYY-MM-DD HH:MM:SS'.
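The alignment rule can be checked with a short Python sketch. It assumes that "aligned" means the timestamp is a whole multiple of RUN_INTERVAL counted from midnight UTC; the helper names parse_utc and is_aligned are hypothetical and this is not the product's own validation logic.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch is midnight UTC, so interval boundaries measured from it
# line up with the start of each UTC day (assuming RUN_INTERVAL divides a day).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_utc(text: str) -> datetime:
    """Parse the 'YYYY-MM-DD HH:MM:SS' part of a TIMESTAMP literal as UTC."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


def is_aligned(ts: datetime, run_interval: timedelta) -> bool:
    """Return True if ts falls exactly on a RUN_INTERVAL boundary."""
    return (ts - EPOCH) % run_interval == timedelta(0)


five_minutes = timedelta(minutes=5)
print(is_aligned(parse_utc("2024-01-15 12:05:00"), five_minutes))  # True
print(is_aligned(parse_utc("2024-01-15 12:03:00"), five_minutes))  # False
```

Measuring from the epoch rather than from the start of each day gives the same boundaries whenever the interval divides 24 hours evenly, and avoids special-casing midnight.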

If set to NOW, the job runs up to the end of the previous full period. For example, if the current time is 12:03 PM and the job is created with a RUN_INTERVAL of 5 minutes and END_AT set to NOW, the last task executed by the job ends at 12:00 PM.
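The NOW example amounts to rounding the current time down to the most recent RUN_INTERVAL boundary. The following Python sketch reproduces the 12:03 PM example; it assumes each task covers exactly one RUN_INTERVAL period, and the function name previous_full_period_end is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # midnight UTC


def previous_full_period_end(now: datetime, run_interval: timedelta) -> datetime:
    """Round now down to the most recent RUN_INTERVAL boundary."""
    return now - (now - EPOCH) % run_interval


now = datetime(2024, 1, 15, 12, 3, tzinfo=timezone.utc)
interval = timedelta(minutes=5)
last_end = previous_full_period_end(now, interval)
last_start = last_end - interval  # assumes one task per RUN_INTERVAL period
print(last_start.time(), "->", last_end.time())  # 11:55:00 -> 12:00:00
```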

RUN_INTERVAL

Value: <integer> { MINUTE[S] | HOUR[S] | DAY[S] }

Default: 1 MINUTE

(Optional) How often the job runs.

Each run processes the period of time defined by this interval, and the interval must divide evenly into a 24-hour day.

For example, you can set RUN_INTERVAL to 2 hours (the job runs 12 times per day), but setting it to 5 hours fails because 24 is not evenly divisible by 5.
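The divisibility rule can be sketched in Python by parsing a RUN_INTERVAL value and checking that it splits a day into whole runs. The names parse_run_interval and divides_day are hypothetical. The sketch assumes the same rule applies to MINUTE values (so, for instance, 15 MINUTES gives 96 runs per day), and it deliberately does not model intervals longer than one day, since the rule for multi-day values is not described here.

```python
import re
from datetime import timedelta

DAY = timedelta(days=1)
# Maps the unit keyword (singular form) to the matching timedelta argument.
UNITS = {"MINUTE": "minutes", "HOUR": "hours", "DAY": "days"}


def parse_run_interval(text: str) -> timedelta:
    """Parse a value such as '5 MINUTES' or '2 HOURS' into a timedelta."""
    match = re.fullmatch(r"\s*(\d+)\s+(MINUTE|HOUR|DAY)S?\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized RUN_INTERVAL: {text!r}")
    amount, unit = int(match.group(1)), match.group(2).upper()
    return timedelta(**{UNITS[unit]: amount})


def divides_day(run_interval: timedelta) -> bool:
    """Return True if the interval splits a 24-hour day into whole runs.

    Only intervals greater than zero and at most one day are modeled here.
    """
    if not timedelta(0) < run_interval <= DAY:
        raise ValueError("this sketch only models intervals of up to one day")
    return DAY % run_interval == timedelta(0)


for value in ("2 HOURS", "5 HOURS", "15 MINUTES"):
    interval = parse_run_interval(value)
    if divides_day(interval):
        print(value, "->", DAY // interval, "runs per day")
    else:
        print(value, "-> rejected: does not divide 24 hours")
```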

RUN_PARALLELISM — editable

Type: integer

Default: 1

(Optional) Controls how many jobs run in parallel to process a single minute of data from the source table.

Increasing this value can lower end-to-end latency when you have a large volume of data per minute.

START_FROM

Values: { NOW | BEGINNING | timestamp }

Default: BEGINNING

(Optional) Configures the time from which to start inserting data. Data before the specified time is ignored.

If set to a timestamp, it should be aligned to the RUN_INTERVAL.

For example, if RUN_INTERVAL is set to 5 minutes, you can set a start time of 12:05 PM but not 12:03 PM. The timestamp should also be in UTC and use the following format: TIMESTAMP 'YYYY-MM-DD HH:MM:SS'.

If set to NOW or BEGINNING, the job runs from the start of the previous full period. For example, if the current time is 12:03 PM and the job is created with a RUN_INTERVAL of 5 minutes and START_FROM set to NOW, the first task executed by the job starts at 12:00 PM.
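To see how START_FROM, END_AT, and RUN_INTERVAL fit together, the following Python sketch lists the periods a job would process under the simplifying assumption that each task covers exactly one RUN_INTERVAL period between the start and end times. The function names and the 12:15 PM end time are hypothetical, chosen only to extend the 12:03 PM example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # midnight UTC


def floor_to_interval(ts: datetime, run_interval: timedelta) -> datetime:
    """Round ts down to the most recent RUN_INTERVAL boundary."""
    return ts - (ts - EPOCH) % run_interval


def task_windows(start: datetime, end: datetime, run_interval: timedelta):
    """Return one (window_start, window_end) pair per full period in [start, end)."""
    windows = []
    current = start
    while current + run_interval <= end:
        windows.append((current, current + run_interval))
        current += run_interval
    return windows


interval = timedelta(minutes=5)
now = datetime(2024, 1, 15, 12, 3, tzinfo=timezone.utc)
start = floor_to_interval(now, interval)  # START_FROM = NOW resolves to 12:00
end = datetime(2024, 1, 15, 12, 15, tzinfo=timezone.utc)  # hypothetical END_AT
for window_start, window_end in task_windows(start, end, interval):
    print(window_start.time(), "->", window_end.time())
```

With these values the sketch prints three periods: 12:00 to 12:05, 12:05 to 12:10, and 12:10 to 12:15.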

Target-specific job options

Visit the pages below for target-specific job options:
