Upsert and Delete Use Case

This use case builds on the AWS S3 to Athena Simple Use Case. Upsolver ingests raw data. You can upsert or delete events in the data lake when compaction takes place. This is done by adding an upsert key (and a delete key).

Upserts: When compaction takes place, only the last events per upsert key are kept. The upsert is the identifier of each row that is updated, for example, if you wanted to keep only the latest event per host, add the host field as the UPSERT KEY.
Deletions: When compaction takes place, events marked for deletion based on the delete key are deleted. To delete data from the data lake, you must have a Boolean field which indicates when an event must be deleted. When this IS DELETE KEY field is true, the event is deleted.

If you do not have a field in the source data, create one using an enrichment function which returns a Boolean value. For example, to delete all records which arrived before a specific date (in Epoch seconds), use the LESS_THAN enrichment function to create the Boolean field, and then use it as the IS DELETE FIELD.

Required Level of Expertise

Intermediate

To add upsert and delete keys:

1. Open the simple use case output.
2. Click UPSERT KEY and select the required field for upserts.
3. Click SQL. In Transform with SQL the upsert key appears as

REPLACE ON DUPLICATE <UPSERT KEY> after the SELECT statement.

4. Click UI.
5. Click IS DELETE FIELD and select one of the Boolean fields for deletion.
6. Run the output. See Running an Output.The table is created in Athena.
7. When the output is RUNNING, select the PROPERTIES tab.
8. To view the data in Athena, click Go to AWS Athena Console. Data arrives within a couple of minutes.