Microsoft SQL Server

This page describes how to create a replication job to ingest data from Microsoft SQL Server.

Syntax

CREATE REPLICATION JOB <job_identifier>
    [{ job_options }]  
FROM <source_connection_name> 
WITH REPLICATION GROUP <group_name>
    [{ replication_group_options }]
[ WITH REPLICATION GROUP ... ];

See Replication jobs for a full list of job options.

Example

This example shows you how to create a replication job that ingests to Snowflake. To use a different target, replace the value in the REPLICATION_TARGET option with the name of your your target connection.

Ingest CDC data to Snowflake example

The following example creates a job named mssql_replication_to_snowflake that writes to two replication groups in Snowflake: replicate_to_snowflake_prod for production use and replicate_to_snowflake_dev for development:

CREATE REPLICATION JOB mssql_replication_to_snowflake
  COMMENT = 'Replicate SQL Server CDC data to Snowflake groups'
  COMPUTE_CLUSTER = "Default Compute (Free)"
  INTERMEDIATE_STORAGE_CONNECTION = s3_connection
  INTERMEDIATE_STORAGE_LOCATION = 's3://upsolver-integration-tests/test/' 
FROM my_mssql_connection 
WITH REPLICATION GROUP replicate_to_snowflake_prod 
  INCLUDED_TABLES_REGEX = ('orders\..*')
  EXCLUDED_COLUMNS_REGEX = ('.*\.creditcard') -- exclude creditcard columns 
  COMMIT_INTERVAL = 5 MINUTES
  LOGICAL_DELETE_COLUMN = "is_deleted"
  REPLICATION_TARGET = my_snowflake_connection
  TARGET_SCHEMA_NAME_EXPRESSION = 'ORDERS'
  TARGET_TABLE_NAME_EXPRESSION = $table_name
  WRITE_MODE = MERGE 
WITH REPLICATION GROUP replicate_to_snowflake_dev 
  INCLUDED_TABLES_REGEX = ('orders\..*')
  COMMIT_INTERVAL = 1 HOUR
  REPLICATION_TARGET = my_snowflake_connection
  TARGET_TABLE_NAME_EXPRESSION = 'history_' || $table_name
  TARGET_SCHEMA_NAME_EXPRESSION = 'ORDERS_DEV'
  WRITE_MODE = APPEND;

Each group has its own set of options that can be configured differently while streaming from the same data source.

Both groups replicate all tables in the orders schema, as specified in the INCLUDED_TABLES_REGEX option. However, all creditcard columns are excluded from the production target in the EXCLUDED_COLUMNS_REGEX option to remove PII.

The production replication group has a COMMIT_INTERVAL of 5 MINUTES to keep the target frequently updated, whereas the development group has an interval of 1 HOUR to update less often. Furthermore, the production group will MERGE the data, using the column is_deleted to denote if a row is deleted in the source, and the development group will APPEND the data.

The REPLICATION_TARGET is set to my_snowflake_connection for both groups, but this can be any Upsolver-supported target.

In the development group, the TARGET_SCHEMA_NAME_EXPRESSION option includes the prefix value of history_. This means that Upsolver will create the target table using the name of the source table with the prefix so it is clear that the target tables in this group are used for development.

Last updated 1 year ago