Snowflake

Syntax

[ RUN_INTERVAL = <integer> { MINUTE[S] | HOUR[S] | DAY[S] } ]
[ START_FROM = { NOW | BEGINNING | timestamp } ]
[ END_AT = { NOW | timestamp } ]
[ COMPUTE_CLUSTER = <cluster_identifier> ]
[ ALLOW_CARTESIAN_PRODUCTS = { TRUE | FALSE } ]
[ AGGREGATION_PARALLELISM = <integer> ]
[ RUN_PARALLELISM = <integer> ]
[ CUSTOM_INSERT_EXPRESSIONS = (<column_name> = '<expression>', ...) ]
[ CUSTOM_UPDATE_EXPRESSIONS = (<column_name> = '<expression>', ...) ]
[ KEEP_EXISTING_VALUES_WHEN_NULL = { TRUE | FALSE } ]
[ COMMENT = '<comment>' ]
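
To show where these options sit in a full statement, here is a minimal sketch modeled on the merge examples later on this page; the job name, cluster identifier, comment text, and option values are illustrative placeholders, and the table identifiers are reused from those examples.

CREATE SYNC JOB my_snowflake_job
    START_FROM = BEGINNING
    RUN_INTERVAL = 1 HOUR
    COMPUTE_CLUSTER = my_cluster
    COMMENT = 'Hourly merge of staging data into Snowflake'
AS MERGE INTO SNOWFLAKE snow.PUBLIC.MY_TABLE AS target
    USING (SELECT id AS ID, some_value AS MY_VALUE
           FROM glue.staging.my_staging_table
           WHERE $event_time BETWEEN run_start_time() AND run_end_time()) source
    ON (target.ID = source.ID)
    WHEN MATCHED THEN REPLACE
    WHEN NOT MATCHED THEN INSERT MAP_COLUMNS_BY_NAME;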

Job options

RUN_INTERVAL

Value: <integer> { MINUTE[S] | HOUR[S] | DAY[S] }
Default: 1 MINUTE
(Optional) How often the job runs.
Runs are scheduled at fixed intervals of this length, and the interval must divide evenly into 24 hours.
For example, you can set RUN_INTERVAL to 2 hours (the job runs 12 times per day), but setting RUN_INTERVAL to 5 hours fails because 24 is not evenly divisible by 5.
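
As a quick sketch of this divisibility rule, the lines below show only the option itself; the rest of the job definition is omitted.

RUN_INTERVAL = 2 HOURS   -- 24 / 2 = 12 runs per day: accepted
RUN_INTERVAL = 6 HOURS   -- 24 / 6 = 4 runs per day: accepted
RUN_INTERVAL = 5 HOURS   -- 24 is not evenly divisible by 5: rejected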

START_FROM

Values: { NOW | BEGINNING | timestamp }
Default: BEGINNING
(Optional) Configures the time to start inserting data from. Data before the specified time is ignored.
If set as a timestamp, it should be aligned to the RUN_INTERVAL.
For example, if RUN_INTERVAL is set to 5 minutes, you can set a start time of 12:05 PM but not 12:03 PM. Additionally, the timestamp should be expressed in UTC and use the format TIMESTAMP 'YYYY-MM-DD HH:MM:SS'.
If set to NOW or BEGINNING, the job runs from the previous full period. For example, if the current time is 12:03 PM, creating the job with a RUN_INTERVAL of 5 minutes starting from NOW means that the first task executed by the job starts from 12:00 PM.
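
As a sketch of the alignment rule (the date is arbitrary and the rest of the job definition is omitted):

RUN_INTERVAL = 5 MINUTES
START_FROM = TIMESTAMP '2024-01-01 12:05:00'   -- on a 5-minute boundary: accepted
-- START_FROM = TIMESTAMP '2024-01-01 12:03:00' would be rejected: not aligned to the interval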

END_AT — editable

Values: { NOW | timestamp }
Default: Never
(Optional) Configures the time to stop inserting data. Data after the specified time is ignored.
If set as a timestamp, it should be aligned to the RUN_INTERVAL.
For example, if RUN_INTERVAL is set to 5 minutes, you can set an end time of 12:05 PM but not 12:03 PM. Additionally, the timestamp should be expressed in UTC and use the format TIMESTAMP 'YYYY-MM-DD HH:MM:SS'.
If set to NOW, the job runs up until the previous full period. For example, if the current time is 12:03 PM, creating the job with a RUN_INTERVAL of 5 minutes ending at NOW means that the last task executed by the job ends at 12:00 PM.
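
START_FROM and END_AT can be combined to process a bounded window, for example a one-month backfill. The sketch below shows the option header only; the names and dates are illustrative, and the AS body would follow as in the merge examples further down this page.

CREATE SYNC JOB my_backfill_job
    RUN_INTERVAL = 5 MINUTES
    START_FROM = TIMESTAMP '2024-01-01 00:00:00'
    END_AT = TIMESTAMP '2024-02-01 00:00:00'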

COMPUTE_CLUSTER — editable

Type: identifier
Default: The sole cluster in your environment
(Optional) The compute cluster to run this job.
This option can be omitted only when there is a single cluster in your environment.
Once you have more than one compute cluster, you must specify which one the job should use.

ALLOW_CARTESIAN_PRODUCTS

Type: boolean
Default: false
(Optional) When true, the job is allowed to flatten unrelated arrays, which may produce a Cartesian product in the final result.
See: UNNEST

AGGREGATION_PARALLELISM — editable

Type: integer
Default: 1
(Optional) Only supported when the query contains aggregations. Formerly known as "output sharding."

RUN_PARALLELISM — editable

Type: integer
Default: 1
(Optional) Controls how many jobs run in parallel to process a single minute of data from the source table.
Increasing this can lower the end-to-end latency if you have lots of data per minute.

CUSTOM_INSERT_EXPRESSIONS — editable

Type: array[(column, expression)]
Default: ()
(Optional) Configures a list of custom expression transformations to apply to the value of each column when inserting unmatched (new) rows. Note that this option is only used in merge jobs.
Note: You can use {} as a placeholder for the mapped value from the select statement.
Example:
The following example sets the INSERT_TIME column of the target table to CURRENT_TIMESTAMP() and adds 1 to the value of some_value when inserting new rows.
CREATE SYNC JOB my_merge_job
    START_FROM = BEGINNING
    RUN_INTERVAL = 1 MINUTE
    CUSTOM_INSERT_EXPRESSIONS = (INSERT_TIME = 'CURRENT_TIMESTAMP()', MY_VALUE = '{} + 1')
AS MERGE INTO SNOWFLAKE snow.PUBLIC.MY_TABLE AS target
    USING (SELECT id AS ID, some_value AS MY_VALUE
           FROM glue.staging.my_staging_table
           WHERE $event_time BETWEEN run_start_time() AND run_end_time()) source
    ON (target.ID = source.ID)
    WHEN MATCHED THEN REPLACE
    WHEN NOT MATCHED THEN INSERT MAP_COLUMNS_BY_NAME;

CUSTOM_UPDATE_EXPRESSIONS — editable

Type: array[(column, expression)]
Default: ()
(Optional) Configures a list of custom expression transformations to apply to the value of each column when updating matched rows. Note that this option is only used in merge jobs.
Note: You can use {} as a placeholder for the mapped value from the select statement.
Example:
The following example sets the UPDATE_TIME column of the target table to CURRENT_TIMESTAMP() and adds 1 to the value of some_value when updating existing rows.
CREATE SYNC JOB my_merge_job
    START_FROM = BEGINNING
    RUN_INTERVAL = 1 MINUTE
    CUSTOM_UPDATE_EXPRESSIONS = (UPDATE_TIME = 'CURRENT_TIMESTAMP()', MY_VALUE = '{} + 1')
AS MERGE INTO SNOWFLAKE snow.PUBLIC.MY_TABLE AS target
    USING (SELECT id AS ID, some_value AS MY_VALUE
           FROM glue.staging.my_staging_table
           WHERE $event_time BETWEEN run_start_time() AND run_end_time()) source
    ON (target.ID = source.ID)
    WHEN MATCHED THEN REPLACE
    WHEN NOT MATCHED THEN INSERT MAP_COLUMNS_BY_NAME;

KEEP_EXISTING_VALUES_WHEN_NULL — editable

Type: boolean
Default: false
(Optional) If enabled, updates to the table preserve the existing non-null values. This is useful when your update events only contain values for the columns that changed. It works by coalescing the new value with the existing value: if the new value is null, the previous value is kept. As a result, updating a value to null is not supported.
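
As a sketch, the option can be added to the merge example shown above. With it enabled, a source row whose MY_VALUE is null leaves the existing target.MY_VALUE in place instead of overwriting it with null.

CREATE SYNC JOB my_merge_job
    START_FROM = BEGINNING
    RUN_INTERVAL = 1 MINUTE
    KEEP_EXISTING_VALUES_WHEN_NULL = TRUE
AS MERGE INTO SNOWFLAKE snow.PUBLIC.MY_TABLE AS target
    USING (SELECT id AS ID, some_value AS MY_VALUE
           FROM glue.staging.my_staging_table
           WHERE $event_time BETWEEN run_start_time() AND run_end_time()) source
    ON (target.ID = source.ID)
    WHEN MATCHED THEN REPLACE
    WHEN NOT MATCHED THEN INSERT MAP_COLUMNS_BY_NAME;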

COMMENT — editable

Type: text
(Optional) A description or comment regarding this job.