Delete data from your target table

This quickstart guide describes how to use MERGE to delete data from your target table.

To delete data from your target table, create a transformation job that includes a MERGE statement. The job requires that your target table has at least one primary key to match rows between source and target.

If your target table is a data lake table, you can define a primary key as follows:

CREATE TABLE my_catalog.my_schema.target_table(
    pk string)
    PRIMARY KEY pk;

You can then delete data from your target table by running a job with the following format:

CREATE JOB my_merge_job
AS MERGE INTO my_catalog.my_schema.target_table AS target
USING (
    SELECT f1 AS field1, f2 AS field2, f3 AS field3, pk, to_delete
    FROM my_catalog.my_schema.source_table
    WHERE $time BETWEEN execution_start_time() AND execution_end_time()) source
ON target.pk = source.pk
WHEN MATCHED AND to_delete THEN DELETE
WHEN MATCHED THEN REPLACE
WHEN NOT MATCHED THEN INSERT MAP_COLUMNS_BY_NAME EXCEPT to_delete;

This job first matches your data based on the columns provided in the ON condition. If the values match and the delete condition to_delete is met, Upsolver deletes the data from your target table.

If you omit the ON statement, there should be a natural join for all primary key and partition columns. In this example, the name of the primary key column pk in the target table matches the name of a column in the source table, so this query works even if the ON statement is omitted.

Note that we've included the EXCEPT clause here, indicating that the specified column to_delete should be excluded and not inserted into the target table.

For the rest of the columns, the MAP_COLUMNS_BY_NAME keyword merges data from the source table to the target table based on matching column names.

If the column names are different, you can rename the source table column in the SELECT statement:

CREATE JOB my_merge_job
    AS MERGE INTO my_catalog.my_schema.target_table AS target
    USING (SELECT f1 AS field1, f2 AS field2, f3 AS field3, primary_key as pk, to_delete
               FROM my_catalog.my_schema.source_table
               WHERE $time BETWEEN execution_start_time() AND execution_end_time()) source
    ON target.pk = source.pk
    WHEN MATCHED AND to_delete THEN DELETE
    WHEN MATCHED THEN REPLACE
    WHEN NOT MATCHED THEN INSERT MAP_COLUMNS_BY_NAME

Another option is to merge using (column, ...) values (expression, ...) instead:

CREATE JOB my_merge_job
    AS MERGE INTO my_catalog.my_schema.target_table AS target
    USING (SELECT f1 AS field1, f2 AS field2, f3 AS field3, primary_key, to_delete
               FROM my_catalog.my_schema.source_table
               WHERE $time BETWEEN execution_start_time() AND execution_end_time()) source
    ON target.pk = source.primary_key
    WHEN MATCHED AND to_delete THEN DELETE
    WHEN MATCHED THEN REPLACE
    WHEN NOT MATCHED THEN INSERT (col1, col2, col3, pk) VALUES (field1, field2, field3, primary_key)

By explicitly mapping the values in the primary_key column of the source table to the pk column of the target table, you can match your data using these two columns even though they do not share the same name.

Last updated