Upsert Data to a Target Table
This quickstart article describes how to use INSERT and MERGE to update a row or insert a new row in your target table.
Upserting is a term we use to describe row transformation. When you create a job to upsert data, the job performs one of two functions:
Inserts a row in your target table with new data
Updates an existing row with the new data
You can us the INSERT
or MERGE
functions to upsert data. Examples of both are below.
Upsert using INSERT
To upsert data, your target table should have at least one primary key to match the rows between your source and target tables.
If your target table is an Upsolver-managed table, you can define a primary key as follows:
CREATE TABLE default_glue_catalog.upsolver_samples.orders_upsert_with_insert
(customer_email string)
PRIMARY KEY customer_email;
You can then insert your data into your target table. To do this, run the following job:
CREATE JOB insert_orders_upsert
START_FROM = BEGINNING
ADD_MISSING_COLUMNS = TRUE
RUN_INTERVAL = 1 MINUTE
AS INSERT INTO default_glue_catalog.upsolver_samples.orders_upsert_with_insert
MAP_COLUMNS_BY_NAME
Having a primary key in the target table means that if a new row arrives that shares a primary key value with an existing row, the old row is updated; otherwise, Upsolver inserts the new row into the table.
Finally, use a SELECT
statement to apply transformations to your data:
SELECT customer.email AS customer_email,
COUNT(DISTINCT orderid) AS number_of_orders,
SUM(nettotal) AS total_sales,
MIN(orderdate) AS first_purchase,
MAX(orderdate) AS last_purchase
FROM default_glue_catalog.upsolver_samples.orders_raw_data
WHERE $commit_time BETWEEN run_start_time() AND run_end_time()
GROUP BY 1
HAVING COUNT(DISTINCT orderid::string) > 1;
Query your data with the following statement.
SELECT *
FROM default_glue_catalog.upsolver_samples.orders_upsert_with_insert
LIMIT 10;
Upsert using MERGE
As with the INSERT command, to upsert your data your target table should have at least one primary key to match the rows between your source and target tables.
Also as before, if your target table is an Upsolver-managed table, you can define a primary key as follows:
CREATE TABLE default_glue_catalog.upsolver_samples.orders_upsert_with_merge
(customer_email string)
PRIMARY KEY customer_email;
You can then merge your data into your target table by running the following job:
CREATE JOB merge_orders_upsert
START_FROM = BEGINNING
ADD_MISSING_COLUMNS = TRUE
RUN_INTERVAL = 1 MINUTE
AS MERGE INTO
default_glue_catalog.upsolver_samples.orders_upsert_with_merge AS target
Finally, apply transformations to your data using a SELECT
statement.
USING (
SELECT customer.email AS customer_email,
COUNT(DISTINCT orderid) AS number_of_orders,
SUM(nettotal) AS total_sales,
MIN(orderdate) AS first_purchase,
MAX(orderdate) AS last_purchase
FROM default_glue_catalog.upsolver_samples.orders_raw_data
WHERE $commit_time BETWEEN run_start_time() AND run_end_time()
GROUP BY 1
HAVING COUNT(DISTINCT orderid::string) > 1
) source
ON (target.customer_email = source.customer_email)
-- Update if primary keys match
WHEN MATCHED THEN REPLACE
-- Insert if primary key is unique (new record)
WHEN NOT MATCHED THEN INSERT MAP_COLUMNS_BY_NAME;
Query your data with the following statement.
SELECT *
FROM default_glue_catalog.upsolver_samples.orders_upsert_with_merge
LIMIT 10;
Last updated