Snowflake
[ RUN_INTERVAL = <integer> { MINUTE[S] | HOUR[S] | DAY[S] } ]
[ START_FROM = { NOW | BEGINNING | timestamp } ]
[ END_AT = { NOW | timestamp } ]
[ COMPUTE_CLUSTER = <cluster_identifier> ]
[ ALLOW_CARTESIAN_PRODUCTS = { TRUE | FALSE } ]
[ AGGREGATION_PARALLELISM = <integer> ]
[ RUN_PARALLELISM = <integer> ]
[ CUSTOM_INSERT_EXPRESSIONS = (<column_name> = '<expression>', ...) ]
[ CUSTOM_UPDATE_EXPRESSIONS = (<column_name> = '<expression>', ...) ]
[ KEEP_EXISTING_VALUES_WHEN_NULL = { TRUE | FALSE } ]
[ COMMENT = '<comment>' ]
Value:
<integer> { MINUTE[S] | HOUR[S] | DAY[S] }
Default:
1 MINUTE
(Optional) How often the job runs.
The runs take place over a set period of time defined by this interval and they must be divisible by the number of hours in a day.
For example, you can set
RUN_INTERVAL
to 2 hours (the job runs 12 times per day), but trying to set RUN_INTERVAL
to 5 hours would fail since 24 hours is not evenly divisible by 5.Values:
{ NOW | BEGINNING | timestamp }
Default:
BEGINNING
(Optional) Configures the time to start inserting data from. Data before the specified time is ignored.
For example, if
RUN_INTERVAL
is set to 5 minutes, then you can set a start time of 12:05 PM but not 12:03 PM. Additionally, the timestamp should be based in UTC and in the following format: TIMESTAMP 'YYYY-MM-DD HH:MM:SS'
.If set to
NOW
or BEGINNING
, the job runs from the previous full period. For example, if the current time is 12:03 PM, creating the job with a RUN_INTERVAL
of 5 minutes starting from NOW
means that the first task executed by the job starts from 12:00 PM.Values:
{ NOW | timestamp }
Default: Never
(Optional) Configures the time to stop inserting data. Data after the specified time is ignored.
For example, if
RUN_INTERVAL
is set to 5 minutes, then you can set an end time of 12:05 PM but not 12:03 PM. Additionally, the timestamp should be based in UTC and in the following format: TIMESTAMP 'YYYY-MM-DD HH:MM:SS'
.If set to
NOW
, the job runs up until the previous full period. For example, if the current time is 12:03 PM, creating the job with a RUN_INTERVAL
of 5 minutes ending at NOW
means that the last task executed by the job ends at 12:00 PM.Type:
identifier
Default: The sole cluster in your environment
(Optional) The compute cluster to run this job.
This option can only be omitted when there is just one cluster in your environment.
Once you have more than one compute cluster, you are required to provide which one to use through this option.
Type:
boolean
Default:
false
(Optional) When
true
, flattening unrelated arrays may lead to Cartesian products in your final result.Type:
integer
Default:
1
(Optional) Only supported when the query contains aggregations. Formally known as "output sharding."
Type:
integer
Default:
1
(Optional) Controls how many jobs run in parallel to process a single minute of data from the source table.
Increasing this can lower the end-to-end latency if you have lots of data per minute.
Type:
array[(column, expression)]
Default:
()
(Optional) Configure a list of custom expression transformations to apply to the value of each column when inserting unmatched (new) rows. Note this is only used in Merge Jobs.
Note: You can use
{}
as a placeholder for the mapped value from the select statement.Example:
The following example will set the INSERT_TIME column of the target table to the value of
CURRENT_TIMESTAMP()
and will add 1
to the value of some_value
when inserting new rows.CREATE SYNC JOB my_merge_job
START_FROM = BEGINNING
RUN_INTERVAL = 1 MINUTE
CUSTOM_INSERT_EXPRESSIONS = (INSERT_TIME = 'CURRENT_TIMESTAMP()', MY_VALUE = '{} + 1')
AS MERGE INTO SNOWFLAKE snow.PUBLIC.MY_TABLE AS target
USING (SELECT id AS ID, some_value as MY_VALUE
FROM glue.staging.my_staging_table
WHERE $event_time BETWEEN run_start_time() AND run_end_time()) source
ON (target.ID = source.ID)
WHEN MATCHED THEN REPLACE
WHEN NOT MATCHED THEN INSERT MAP_COLUMNS_BY_NAME;
Type:
array[(column, expression)]
Default:
()
(Optional) Configure a list of custom expression transformations to apply to the value of each column when updating matched rows. Note this is only used in Merge Jobs.
Note: You can use
{}
as a placeholder for the mapped value from the select statement.Example:
The following example will set the UPDATE_TIME column of the taget table to the value of
CURRENT_TIMESTAMP()
and will add 1
to the value of some_value
when updating existing rows.CREATE SYNC JOB my_merge_job
START_FROM = BEGINNING
RUN_INTERVAL = 1 MINUTE
CUSTOM_UPDATED_EXPRESSIONS = (UPDATE_TIME = 'CURRENT_TIMESTAMP()', MY_VALUE = '{} + 1')
AS MERGE INTO SNOWFLAKE snow.PUBLIC.MY_TABLE AS target
USING (SELECT id AS ID, some_value as MY_VALUE
FROM glue.staging.my_staging_table
WHERE $event_time BETWEEN run_start_time() AND run_end_time()) source
ON (target.ID = source.ID)
WHEN MATCHED THEN REPLACE
WHEN NOT MATCHED THEN INSERT MAP_COLUMNS_BY_NAME;
Type:
boolean
Default:
false
(Optional) If enabled, updates to the table preserve the previous non-null value. This option is useful if your update events only contain values for modified columns. This works by coalescing the new value with the existing value. If the new value is null the previous value will be preserved. This means that updating values to
null
is not supported. Type:
text
(Optional) A description or comment regarding this job.
Last modified 29d ago