UPSERT streaming data to Amazon Athena
Last updated
Last updated
An UPSERT
(UPDATE
/ INSERT
) is a common occurrence in Relational databases, but not as common in Data Warehouses and Data Lakes, which are typically append only. Upsolver provides UPSERT
functionality on systems that do not typically allow this functionality.
You can read more about how Upsolver accomplishes this in this article.
In addition to the shopping table, In this section we will create an S3 data source for shipping data. Follow the below steps to to build an output into Amazon Athena.
The output should be an UPSERTing table that contains the most recent shipping status for every order. Sample data:
Click on DATA OUTPUT > NEW and SELECT Amazon Athena output. Give the output a NAME > shipping and choose DATA SOURCES > Orders Click on NEXT to continue.
Map orderId, trackingNumber, and shipmentStatus to Amazon Athena table by clicking on the + sign next to each field.
Click on MORE > Manage Upserts find the orderid field and check under USE AS KEY. Click on CLOSE. Switch over to the SQL UI and click on PREVIEW to make sure everything looks as expected. Click on RUN.
Choose the DATABASE NAME > upsolver and TABLE NAME > shipping Click on NEXT.
Leave everything on the Run page as default and click on DEPLOY.
The output is continuous stream by leaving ENDING AT as Never,
Wait for the data stream start writing to Amazon Athena tables.
When Upsolver creates an upserting output to Athena, it creates two objects: a table with the suffice _underlying_data
and a view with the output name. It's recommended to query the VIEW, and not the underlying table.
Query the shipping data set created by Upsolver (it may take a few minutes before the data is available).
Sample queries: