Upsert and delete use case
This article walks through a use case on how to do upserts and deletions in Upsolver.
Upsolver ingests raw data. You can upsert or delete events in the data lake when compaction takes place by adding an upsert.
When compaction takes place, only the last events per upsert key are kept.
The upsert is the identifier of each row that is updated (e.g. if you wanted to keep only the latest event per host, add the host field as the Upsert Key).
When compaction takes place, events marked for deletion based on the delete key are deleted.
To delete data from the data lake, you must have a boolean field which indicates when an event must be deleted. When this Is Delete Key field is
true, the event is deleted.
If you do not have a boolean field in the source data, you can create one by using an enrichment function that returns a boolean value.
For example, to delete all records which arrived before a specific date (in Epoch seconds), use the
LESS_THANenrichment function to create a boolean field, and then use it as the Is Delete Field.
1. In your output, click More > Manage Upserts and select your upsert key.
2. Close the upserts window.
Now if you toggle your view from UI to SQL, you will find that the upsert key appears as
REPLACE ON DUPLICATE <UPSERT KEY>after the
3. Now to add a delete key, click on More > Manage Upserts again and then select the Deletions tab.
4. Open the Delete Indicator Field dropdown and select one of the boolean fields for deletion.
Once the output is Running, to view the data in Athena, select the Properties tab and click on Go to AWS Athena Console in the sidebar.