Delete Data from a Target Table

This quickstart guide describes how to use MERGE to delete data from your target table.

If your target table is a data lake table, you can define a primary key as follows:

CREATE TABLE my_catalog.my_schema.target_table (pk string)
PRIMARY KEY pk;

You can then delete data from your target table by running a job with the following format:

CREATE JOB my_merge_job
AS MERGE INTO my_catalog.my_schema.target_table AS target
USING (
    SELECT f1 AS field1, f2 AS field2, f3 AS field3, pk, to_delete
    FROM my_catalog.my_schema.source_table
    WHERE $time BETWEEN execution_start_time() AND execution_end_time()
) source
ON target.pk = source.pk
WHEN MATCHED AND to_delete THEN DELETE
WHEN MATCHED THEN REPLACE
WHEN NOT MATCHED THEN INSERT MAP_COLUMNS_BY_NAME EXCEPT to_delete;

This job first matches your data based on the columns provided in the ON condition. If the values match and the delete condition to_delete is met, Upsolver deletes the data from your target table.

If you omit the ON statement, there should be a natural join for all primary key and partition columns. In this example, the name of the primary key column pk in the target table matches the name of a column in the source table, so this query works even if the ON statement is omitted.

If the column names are different, you can rename the source table column in the SELECT statement:

CREATE JOB my_merge_job
AS MERGE INTO my_catalog.my_schema.target_table AS target
USING (
    SELECT f1 AS field1, f2 AS field2, f3 AS field3, primary_key AS pk, to_delete
    FROM my_catalog.my_schema.source_table
    WHERE $time BETWEEN execution_start_time() AND execution_end_time()
) source
ON target.pk = source.pk
WHEN MATCHED AND to_delete THEN DELETE
WHEN MATCHED THEN REPLACE
WHEN NOT MATCHED THEN INSERT MAP_COLUMNS_BY_NAME

Another option is to merge using (column, ...) values (expression, ...) instead:

CREATE JOB my_merge_job
AS MERGE INTO my_catalog.my_schema.target_table AS target
USING (
    SELECT f1 AS field1, f2 AS field2, f3 AS field3, primary_key, to_delete
    FROM my_catalog.my_schema.source_table
    WHERE $time BETWEEN execution_start_time() AND execution_end_time()
) source
ON target.pk = source.primary_key
WHEN MATCHED AND to_delete THEN DELETE
WHEN MATCHED THEN REPLACE
WHEN NOT MATCHED THEN 
    INSERT (col1, col2, col3, pk) 
    VALUES (field1, field2, field3, primary_key)

By explicitly mapping the values in the primary_key column of the source table to the pk column of the target table, you can match your data using these two columns even though they do not share the same name.

Last updated