PostgreSQL Partitioned Tables

This article describes how to work with partitioned tables in PostgreSQL.

In PostgreSQL, some tables are partitioned into multiple tables. This is implemented as underlying tables that each hold some of the table's data and a top-level table that acts as the Partition Root. When querying the Partition Root, PostgreSQL will query all underlying tables automatically and return a union of the results. When replicating a partitioned table using CDC, you may want to output the results into a logical partition root instead of multiple tables. A single table is usually a preferred structure for warehouse and data lake targets. Since PostgreSQL 14 this is now possible. This page will describe how to configure this behavior.

Reading via Partition Root in Upsolver CDC

Creating a publication with `publish_via_root_partition`

As described in the pre-requisites for PostgreSQL CDC you need to configure a publication for your CDC stream. By default, PostgreSQL creates the publication with publish_via_root_partition = false. This means that the events in the replication slot will not contain the root table name, but instead will contain events for each underlying table that is updated during normal operation.

In order to read via the root partition you need to CREATE PUBLICATION and set this property to TRUE. Execute the following command in your source PostgreSQL database:

CREATE PUBLICATION upsolver FOR ALL TABLES WITH (publish_via_partition_root = true);

Enable `READ_VIA_PARTITION_ROOT` for your ingestion job

Next, you need to configure your ingestion job to read via the root partition. This is because the snapshotting stage needs to be aware of which table it should snapshot, the partitioned table or the underlying tables.

To do this you add READ_VIA_PARTITION_ROOT = TRUE to the CREATE JOB statement as follows:

CREATE SYNC JOB load_raw_events_from_postgres
      PUBLICATION_NAME = 'upsolver'
      READ_VIA_PARTITION_ROOT = TRUE
AS COPY FROM POSTGRES upsolver_postgres_samples
      TABLE_INCLUDE_LIST = ('users_tbl', 'orders_tbl')
INTO default_glue_catalog.staging.postgres_raw_evets;

Troubleshooting

You may encounter issues if the settings are not in sync. The ingestion job and PostgreSQL publication configuration of READ_VIA_PARTITION_ROOT and PUBLISH_VIA_PARTITION_ROOT respectively, must both be set to the same value, TRUE or FALSE.

I see the underlying tables and they don't receive updates

If the publication publishes to the root partition but the job is not configured to read via the root partition, you will get a snapshot of the underlying table and streaming data will be sent to the root partition table, so the underlying tables won't update.

I see the root partition table but it doesn't receive updates

If the job reads via the root partition but the publication doesn't publish to the root partition, Upsolver will snapshot the root partition table but the replication slot will not contain update events for that table, and it will not be updated with changes.

Last updated 1 year ago