PostgreSQL Partitioned Tables
This article describes how to work with partitioned tables in PostgreSQL.
Last updated
This article describes how to work with partitioned tables in PostgreSQL.
Last updated
In PostgreSQL, some tables are partitioned into multiple tables. This is implemented as underlying tables that each hold some of the table's data and a top-level table that acts as the Partition Root. When querying the Partition Root, PostgreSQL will query all underlying tables automatically and return a union of the results. When replicating a partitioned table using CDC, you may want to output the results into a logical partition root instead of multiple tables. A single table is usually a preferred structure for warehouse and data lake targets. Since PostgreSQL 14 this is now possible. This page will describe how to configure this behavior.
publish_via_root_partition
As described in the pre-requisites for CDC you need to configure a publication
for your CDC stream. By default, PostgreSQL creates the publication with publish_via_root_partition = false
. This means that the events in the replication slot will not contain the root table name, but instead will contain events for each underlying table that is updated during normal operation.
In order to read via the root partition you need to CREATE PUBLICATION
and set this property to TRUE
. Execute the following command in your source PostgreSQL database:
READ_VIA_PARTITION_ROOT
for your ingestion jobNext, you need to configure your ingestion job to read via the root partition. This is because the snapshotting stage needs to be aware of which table it should snapshot, the partitioned table or the underlying tables.
To do this you add READ_VIA_PARTITION_ROOT = TRUE
to the CREATE JOB
statement as follows:
You may encounter issues if the settings are not in sync. The ingestion job and PostgreSQL publication configuration of READ_VIA_PARTITION_ROOT
and PUBLISH_VIA_PARTITION_ROOT
respectively, must both be set to the same value, TRUE
or FALSE
.
If the publication publishes to the root partition but the job is not configured to read via the root partition, you will get a snapshot of the underlying table and streaming data will be sent to the root partition table, so the underlying tables won't update.
If the job reads via the root partition but the publication doesn't publish to the root partition, Upsolver will snapshot the root partition table but the replication slot will not contain update events for that table, and it will not be updated with changes.