MERGE
A MERGE
job defines a query that pulls in a set of data based on the given SELECT
statement and inserts into, replaces, or deletes the data from the designated target based on the job definition. This query is then run periodically based on the RUN_INTERVAL
defined within the job.
Using the condition provided in the ON
clause, the query can insert records that do not meet the condition and replace those that do. Additionally, records that match the ON
condition and a delete condition can be deleted.
Additionally, if two jobs writing to the same table rewrite the same record, it is nondeterministic as to which job's data ends up in the table.
Note that MERGE
statements are only supported for target tables with primary key constraints.
Use the MERGE command if you need to:
Delete records
Update based on a unique constraint
Update data in a non-data lake table, like Snowflake or Elasticsearch
Syntax
Jump to
Job identifier
Valid identifiers match the following format:
Job options
Target location
Target definition
ON
clause
ON
clauseIf the ON
statement is omitted, there must be a natural join for all of the primary key and partition columns. Otherwise, there should be a column_condition
for each primary key of the target table.
For tables without globally unique keys, there must also be a column_condition
for each partition column.
column_name
must be a primary key of the target table.
expression
must only reference columns in the SELECT
statement.
MAP_COLUMNS_BY_NAME
MAP_COLUMNS_BY_NAME
The MAP_COLUMNS_BY_NAME
keyword maps columns from the SELECT
statement to the table by the names of the returned columns in the query. Columns listed after EXCEPT
are excluded from the final table.
When using MAP_COLUMNS_BY_NAME
, the columns matched from the target and source tables in the ON
condition (if not omitted) must share the same name.
If a column list is provided instead of MAP_COLUMNS_BY_NAME
, it should contain all primary keys and partition columns, and their mapping should be identical to the mapping in the ON
clause.
If nothing is specified, fields are mapped by ordinal position in the query, and fields mapped to each special column must match exactly the ON
clause.
More information on MAP_COLUMNS_BY_NAME
.
EXCEPT
EXCEPT
Columns listed after the EXCEPT
keyword are not written to your target table.
For example, you may have a column to_delete
; in order to use this column as your delete_condition
, you need to include it within the SELECT
statement. However, there is likely no meaning in having this column itself within your final target table, so using EXCEPT to_delete
allows you to have it excluded from the final output.
Example
Write into a data lake table
Note that since there is a selected column matching the name of the primary key column in the table we are merging into, the ON
clause here is optional.
Last updated