Handle PostgreSQL TOAST Values
This page describes how to handle PostgreSQL TOAST values with Upsolver.
Overview
PostgreSQL employs TOAST (The Oversized-Attribute Storage Technique) to efficiently store large objects such as strings or byte arrays that exceed a specific size. While TOAST is often transparent to the user, it gains importance when implementing change data capture (CDC) and logical replication functionalities. This is particularly relevant when using CDC engines like Debezium in conjunction with Upsolver.
Key Behavior: TOAST and Debezium
When Debezium captures a changed row from a PostgreSQL database, it omits unchanged TOASTed values unless they are explicitly part of the table's replica identity. If a TOAST column's value remains unchanged but appears in the logical replication message, the Debezium Postgres connector substitutes the value with a placeholder: __debezium_unavailable_value
.
Example Scenario: Replicating PostgreSQL to Snowflake
When replicating data from PostgreSQL to Snowflake, you may encounter that the __debezium_unavailable_value
placeholder replaces the original TOAST value in the Snowflake target table.
Step-by-Step Guide
1. PostgreSQL Table Setup
Create a table in PostgreSQL to store large data:
2. Insert Row with TOASTed Value
Insert a row into the table with a large text that will be TOASTed.
3. Configuration an Ingestion Job
Configure an ingestion job in Upsolver to replicate the data.
4. Modify Row Without Changing TOASTed Column
5. Query Upsolver SQLake Table
You will see that updates to the big_text
column display __debezium_unavailable_value
.
Potential Workarounds
Option 1: Set Replica Identity to FULL
To keep track of unchanged TOASTed columns, set the replica identity of the PostgreSQL table to FULL.
Option 2: Leverage CUSTOM_UPDATE_EXPRESSIONS
CUSTOM_UPDATE_EXPRESSIONS
In a Snowflake Merge job, use the CUSTOM_UPDATE_EXPRESSIONS
parameter to conditionally replace __debezium_unavailable_value
with the original value.
Caveat
If both an insert and an update operation for the same row are ingested into Snowflake within the same COMMIT_INTERVAL
, the __debezium_unavailable_value
placeholder will persist, as the original value has not yet been established in the table.
Option 3: Use a Materialized View for Latest TOAST Value
Create a materialized view in Upsolver that stores the latest TOAST values and JOIN it with the target table.
By implementing these strategies, you can ensure that TOASTed values are properly handled and replicated in your Upsolver jobs.
Last updated