UPSERT streaming data to Amazon Athena
An
UPSERT
(UPDATE
/ INSERT
) is a common occurrence in Relational databases, but not as common in Data Warehouses and Data Lakes, which are typically append only. Upsolver provides UPSERT
functionality on systems that do not typically allow this functionality.In addition to the shopping table, In this section we will create an S3 data source for shipping data. Follow the below steps to to build an output into Amazon Athena.
The output should be an UPSERTing table that contains the most recent shipping status for every order. Sample data:
{
"time": "2020/07/07 01:10:00",
"orderId": "abc",
"trackingNumber": "Z1234ABC",
"shipmentStatus": "SHIPPED"
}
Click on DATA OUTPUT > NEW and SELECT Amazon Athena output. Give the output a NAME > shipping and choose DATA SOURCES > Orders Click on NEXT to continue.

Map orderId, trackingNumber, and shipmentStatus to Amazon Athena table by clicking on the + sign next to each field.

Click on MORE > Manage Upserts find the orderid field and check under USE AS KEY. Click on CLOSE. Switch over to the SQL UI and click on PREVIEW to make sure everything looks as expected. Click on RUN.

// Copy and paste the UPSERT SQL to your UI if it looks different
// from your environment.
SELECT data.orderId AS orderid:STRING,
time AS processing_time:TIMESTAMP,
data.shipmentStatus AS shipmentstatus:STRING,
data.trackingNumber AS trackingnumber:STRING
FROM "Orders"
REPLACE ON DUPLICATE data.orderId
Choose the DATABASE NAME > upsolver and TABLE NAME > shipping Click on NEXT.

Leave everything on the Run page as default and click on DEPLOY.
The output is continuous stream by leaving ENDING AT as Never,

Wait for the data stream start writing to Amazon Athena tables.

When Upsolver creates an upserting output to Athena, it creates two objects: a table with the suffice
_underlying_data
and a view with the output name. It's recommended to query the VIEW, and not the underlying table.Query the shipping data set created by Upsolver (it may take a few minutes before the data is available).
Sample queries:
// What is the current shipping status of order 57459731866?
SELECT shipmentstatus
FROM upsolver.shipping
WHERE orderid = '57459731866'

What is the current shipping status of order 57459731866?
// What are the tracking numbers of all items that have been shipped?
SELECT trackingnumber
FROM shipping
WHERE shipmentstatus = 'SHIPPED'

What are the tracking numbers of all items that have been shipped?