The page describes the options for creating a replication job.

Replication jobs copy change data capture (CDC) data from enabled databases into one or more target schemas in the following supported targets:

  • Amazon Redshift

  • AWS Glue Data Catalog

  • Snowflake

  • Tabular

The source data is ingested once into a staging table that each replication group then reads from to update the target. While sharing the source data, replication groups can be individually configured using different intervals and options to write to a separate schema. Therefore you create one entity and Upsolver automatically generates a job for each replication group and manages the freshness of data in each target based on the commit interval that you specify.

Please note:

  • Target tables in Snowflake are created in upper case.

  • Target tables in Amazon Redshift are created in lower case.


    [{ job_options }]  
FROM <source_connection_name> 
    [{ source_options }]
    [{ replication_group_options }]

Job identifier

This is the name of your job, which should match the following format:

identifier = "([^"]|"")*"|[A-Za-z_][A-Za-z0-9_]*;

Job options

The following job options can be applied to all sources.:

[ COMMENT = '<comment>' ]
[ COMPUTE_CLUSTER = <cluster_identifier> ]

Jump to

COMMENT — editable

Type: text

(Optional) A description or comment regarding this job.


Type: identifier

Default: The sole cluster in your environment

(Optional) The compute cluster to run this job.

This option can only be omitted when there is only one cluster in your environment.

If you have more than one compute cluster, you need to determine which one to use through this option.


Type: text

The name of the connection you are using to stage the data ingested from your CDC source.


Type: text

The name of the location you are using to stage the data ingested from your CDC source.


Type: int

Default: 1

(Optional) Configures how many snapshots are performed concurrently. The more snapshots performed concurrently, the quicker the tables are streaming. However, running more snapshots in parallel increases the load on the source database.

Source options

Visit the pages below for source-specific options:

Replication group options


The following replication group options apply to all data sources.

    [ INCLUDED_TABLES_REGEX = ('<included_tables_regex>', ...) 
    EXCLUDED_COLUMNS_REGEX = ('<excluded_columns_regex>'[, ...]) ] 
    [ INCLUDED_SCHEMA_DEFINITION = (<included_table_pattern>)
    EXCLUDED_SCHEMA_DEFINITION = (<excluded_table_pattern>) ] 
[ COLUMN_TRANSFORMATIONS = (<column_name> = <expresion> [, ...]) ]
[ COMMIT_INTERVAL = <integer> { MINUTE[S] | HOUR[S] | DAY[S] } ]
[ LOGICAL_DELETE_COLUMN = <column_identifier> ]                     
[ REPLICATION_TARGET = <target_connection_identifier> ]
[ TARGET_TABLE_NAME_EXPRESSION = '<table_prefix_name>' || $table_name ]
[ UPSOLVER_EVENT_TIME_COLUMN = <column_identifier> ] 

Jump to


Type: <integer> { MINUTE[S] | HOUR[S] | DAY[S] }

Default: 1 MINUTE

(Optional) Defines how often the job will load and commit data to the target in a direct ingestion job. This interval must be divisible by the number of hours in a day.


Values: ('regexFilter1', 'regexFilter2')

One or more regular expressions that define which tables should be included in the replication.

For example, ('dbo.prod_.*') includes all tables in the dbo schema prefixed with prod_.


Values: ('regexFilter1', 'regexFilter2')

(Optional) Can only be included when INCLUDED_TABLES_REGEX is supplied. One or more regular expressions to define which columns should be excluded from the replication.

For example, ('dbo.*.address_') excludes all columns in all tables prefixed with address_.


Type: text

One or more expressions that define which tables should be included in the replication.

For example, *, schema.*, schema.table.*, or schema.table.column.


Type: text

(Optional) Can only be included when INCLUDED_SCHEMA_DEFINITION is supplied. One or more expressions to determine which tables should be excluded from the replication.

For example, schema.*, schema.table.


Values: ( <column> = <expression>, ...)

(Optional) If transformations must be applied prior to data landing in your target, you can use this option to perform data transformations during ingestion. When ingesting into the data lake, it is recommended that you only apply essential transformations, such as protecting PII, as it is easier to make amendments or corrections at a later date if the data remains in its raw state and instead use a transformation job to apply modifications. Therefore, as a general rule, you should only transform data that must be modified before it reaches the target.

However, transformations provide the flexibility to shape your data before it lands in the target. You can use all the functions and operators supported by Upsolver to create calculated fields within your ingestion job. New columns can be added to your target, and existing column data can be transformed. You can perform actions such as converting data types, formatting string values, and concatenating columns to create a new column.

If you need to mask sensitive or personally identifiable information (PII) prior to loading into your staging tables or when performing direct ingestion into your target destination, you can use hashing functions to prevent data from being exposed downstream.

See the Functions and Operators references for a full list of options that you can apply.


Type: text

(Optional) By default, Upsolver permanently deletes rows from the target. To logically delete the row instead, specify the column name in which Upsolver will mark them as deleted.

Only valid when WRITE_MODE = MERGE.


Type: text

Values: <connection_name>

Specify the name of the target connection to write the data. The connection must be created prior to creating your job.


Type: expression

Specify the name of the target schema in Snowflake that the replication group is writing to. The following inputs are supported: $database_name, $schema_name, $full_table_name and $table_name. If you are ingesting from MongoDB, you can also use the $collection_name input.

As well as using the exact name of the schema, for example, TARGET_SCHEMA_NAME_EXPRESSION = 'SALES', you could also write a case statement to define the target schema, for example:

    CASE $schema_name 
        WHEN 'sales' THEN 'SALES' 
        ELSE 'ARCHIVE' 

Similarly, you can use the name of the database to define the target schema as follows:

    CASE $database_name 
        WHEN 'Suppliers' THEN 'SUPPLIERS' 


Type: expression

Define how you want Upsolver to name the target tables in Snowflake. The following inputs are supported: $database_name, $schema_name, $full_table_name and $table_name. If you are ingesting from MongoDB, you can also use the $collection_name input.

Optionally, you can include a prefix for the table name. Use in conjunction with $full_table_name, $table_name, or $collection_name to create the target table using the name of the source table or collection. For example, to create the table in Snowflake with the prefix sales_, followed by the name of the source table, use the expression:

TARGET_TABLE_NAME_EXPRESSION = 'sales_' || $table_name::STRING

Furthermore, you could include a CASE statement to determine the name of the target table based on the source database:

    CASE $database_name 
        WHEN 'Suppliers' THEN 'supp_' || $table_name::STRING 
        ELSE 'dist_' || $table_name::STRING 


Type: text

Values: <column_name>

(Optional) Upsolver uses distributed locking technology to keep events strongly ordered. The ingestion timestamp is stored in a dedicated column reflecting the order, which is UPSOLVER_EVENT_TIME by default unless specified here.


Type: <integer> { MINUTE[S] | HOUR[S] | DAY[S] }

Determines how often the job writes the CDC data to the REPLICATION_TARGET specified within the group. Frequent writes provide up-to-date information but may be costly, especially for Snowflake.

The writes take place over a set period of time defined by this interval and they must be divisible by the number of hours in a day.

For example, you can set WRITE_INTERVAL to 2 hours (the job runs 12 times per day), but trying to set WRITE_INTERVAL to 5 hours would fail since 24 hours is not evenly divisible by 5.


Values: { MERGE | APPEND }

Defines whether the job should merge the data into the target, or append it.

If the WRITE_MODE is set to MERGE, you must include a LOGICAL_DELETE_COLUMN.

Note: For the 'Amazon Redshift' target, only 'MERGE' write mode is supported.

Last updated