PostgreSQL
Syntax
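The syntax block itself is not shown in this extract. As a hedged reconstruction, a PostgreSQL ingestion job in Upsolver's SQL generally takes the following shape; the connection, catalog, and table names are placeholders, and the bracketed options are described in the reference below.

```sql
-- Illustrative sketch only: all identifiers are placeholders.
CREATE SYNC JOB <job_name>
    [ PUBLICATION_NAME = '<publication>' ]
    [ HEARTBEAT_TABLE = '<schema.table>' ]
    [ TABLE_INCLUDE_LIST = ('<regex>', ...) ]
    [ COLUMN_EXCLUDE_LIST = ('<regex>', ...) ]
    [ SKIP_SNAPSHOTS = { TRUE | FALSE } ]
    [ SNAPSHOT_PARALLELISM = <int> ]
AS COPY FROM POSTGRES <connection_name>
    INTO <catalog>.<schema>.<staging_table>;
```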
Job options
PostgreSQL job options:
HEARTBEAT_TABLE
Type: string
The name of the heartbeat table to use as described in Setting up a Heartbeat Table.
(Optional) If it is not set, no heartbeat table is used. Using a heartbeat table is recommended to avoid the replication slot growing indefinitely when no CDC events are captured for the subscribed tables.
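As an illustration, a job that points at a heartbeat table might look like the sketch below; the connection, catalog, and table names are placeholders, not names taken from this guide.

```sql
-- Sketch: the heartbeat table keeps the replication slot advancing
-- even when the subscribed tables are idle. Identifiers are placeholders.
CREATE SYNC JOB ingest_postgres_with_heartbeat
    HEARTBEAT_TABLE = 'public.upsolver_heartbeat'
AS COPY FROM POSTGRES my_postgres_connection
    INTO default_glue_catalog.staging.postgres_raw_data;
```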
PARSE_JSON_COLUMNS
Type: Boolean
Default: false
If enabled, Upsolver will parse JSON columns into a struct matching the JSON value.
PUBLICATION_NAME
Type: text
Creates a new publication in the current database. The publication name must be distinct from the name of any existing publication in that database. DDL statements are filtered out of the publication.
SKIP_SNAPSHOTS
Type: Boolean
Default: false
(Optional) By default, snapshots are enabled for new tables. This means that Upsolver takes a full snapshot of the table(s) and ingests it into the staging table before it continues to listen for change events. When true, Upsolver does not take an initial snapshot and only processes change events starting from the time the ingestion job is created.
In the majority of cases, when you connect to your source tables, you want to take a full snapshot and ingest it as the baseline of your table. This creates a full copy of the source table in your data lake before you begin to stream the most recent change events. If you skip taking a snapshot, you will not have the historical data in the target table, only the newly added or changed rows.
Skipping a snapshot is useful when your primary database instance has crashed or become unreachable and you have failed over to the secondary. You then need to re-establish the CDC connection, but you do not want to take a full snapshot because your table already contains all of the history; instead, you want to resume processing from the point where the connection to the primary database went down.
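For example, a job re-established after such a failover might skip the snapshot as in this sketch (connection, catalog, and table names are placeholders):

```sql
-- Sketch: resume CDC without re-copying history after a failover.
-- Identifiers are placeholders.
CREATE SYNC JOB resume_postgres_cdc
    SKIP_SNAPSHOTS = TRUE
AS COPY FROM POSTGRES my_postgres_connection
    INTO default_glue_catalog.staging.postgres_raw_data;
```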
SNAPSHOT_PARALLELISM
Type: int
Default: 1
(Optional) Configures how many snapshots are performed concurrently. The more snapshots that run in parallel, the sooner all tables finish their snapshots and begin streaming changes; however, running more snapshots concurrently also increases the load on the source database.
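A sketch raising the concurrency is shown below; the identifiers are placeholders, and the value should be one your source database can tolerate.

```sql
-- Sketch: snapshot up to four tables at a time. Identifiers are placeholders.
CREATE SYNC JOB ingest_postgres_fast_snapshot
    SNAPSHOT_PARALLELISM = 4
AS COPY FROM POSTGRES my_postgres_connection
    INTO default_glue_catalog.staging.postgres_raw_data;
```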
Source Options
TABLE_INCLUDE_LIST
Type: text
Default: ''
(Optional) Comma-separated list of regular expressions that match fully-qualified table identifiers of tables whose changes you want to capture. Tables not included in this list are not loaded. If the list is left empty, all tables are loaded. This maps to the table.include.list property in Debezium.
By default, the connector captures changes in every non-system table in all databases. To match the name of a table, Upsolver applies the regular expression that you specify as an anchored regular expression. That is, the specified expression is matched against the entire name string of the table. It does not match substrings that might be present in a table name.
Each RegEx pattern matches against the full string databaseName.tableName, for example:
| RegEx Pattern | Results |
| --- | --- |
| db_name.* | Select all tables from the db_name database |
| db_name.users, db_name.items | Select the users and items tables from db_name |
| db1.items_.* | Select all tables from db1 whose names start with items_ |
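Putting one of the patterns above into a job might look like the following sketch (connection, catalog, and table names are placeholders):

```sql
-- Sketch: capture only the users and items tables from db_name.
-- Identifiers are placeholders.
CREATE SYNC JOB ingest_selected_tables
    TABLE_INCLUDE_LIST = ('db_name.users', 'db_name.items')
AS COPY FROM POSTGRES my_postgres_connection
    INTO default_glue_catalog.staging.postgres_raw_data;
```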
COLUMN_EXCLUDE_LIST
Type: array[string]
Default: ''
(Optional) Comma-separated list of regular expressions that match the fully-qualified names of columns to exclude from change event record values. This maps to the Debezium column.exclude.list property.
By default, the connector matches all columns of the tables listed in TABLE_INCLUDE_LIST. To match the name of a column, Upsolver applies the regular expression that you specify as an anchored regular expression. That is, the specified expression is matched against the entire name string of the column; it does not match substrings that might be present in a column name.
Each RegEx pattern matches against the full string databaseName.tableName.columnName, for example:
| RegEx Pattern | Results |
| --- | --- |
| db.users.address_.* | Select all columns of the users table whose names start with address_ |
| db.*.(.*_pii) | Select all columns ending in _pii, in any table |
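A sketch excluding PII-suffixed columns from every captured table (identifiers are placeholders):

```sql
-- Sketch: drop columns ending in _pii from all change event records.
-- Identifiers are placeholders.
CREATE SYNC JOB ingest_without_pii
    COLUMN_EXCLUDE_LIST = ('db.*.(.*_pii)')
AS COPY FROM POSTGRES my_postgres_connection
    INTO default_glue_catalog.staging.postgres_raw_data;
```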
Examples
Ingest data into the data lake
The following example creates a job to ingest data from PostgreSQL into a table in the data lake. The PUBLICATION_NAME option specifies that a new publication named sample is added.
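The example job itself does not appear in this extract; a hedged reconstruction, using placeholder connection and catalog names, might look like this:

```sql
-- Sketch: ingest CDC events from PostgreSQL into a data lake table,
-- creating a publication named sample. Identifiers are placeholders.
CREATE SYNC JOB load_raw_data_from_postgres
    PUBLICATION_NAME = 'sample'
AS COPY FROM POSTGRES my_postgres_connection
    INTO default_glue_catalog.upsolver_samples.postgres_raw_data;
```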