Microsoft SQL Server
This article describes how to ingest CDC data from your SQL Server database.
Before ingesting your CDC data into Upsolver, please ensure you have configured your database to enable change data capture. This article guides you through the process.
Prerequisites for SQL Server
Upsolver currently supports the following SQL Server set-up:
Amazon RDS for SQL Server
SQL Server 2019 (15x)
SQL Server Standard or Enterprise Edition
SQL Server Agent must be running
Please refer to the Amazon RDS article Using change data capture for more information.
Enable CDC on your database
Ensure you are logged in to SQL Server using an account with sysadmin privileges. From your query window, run the following to enable change data capture on the database:
For Amazon RDS for SQL Server, enable change data capture as follows:
Enable CDC on your tables
After CDC is enabled for a database, any user with db_owner privileges of the database can enable or disable CDC at the table level.
To check if a table has already been enabled for change data capture, run the following:
If is_tracked_by_cdc returns 0, enable change data capture for each table you want to include in the capture:
The input parameters should be specified as follows:
Parameter | Description |
---|---|
@source_schema | This is the name of the schema to which the table belongs, for example, dbo. |
@source_name | The name of the source table you want to enable for change data capture, e.g. Customer. |
@role_name | To control access to the changed data, you can optionally specify an existing fixed server or database role. If the database role does not exist, SQL Server creates it.
Users must have |
@filegroup_name | It is best practice to store your change data table separately from the source table. You can optionally specify the name of a pre-existing filegroup to store the CDC data, e.g. fg_sales_cdc_data. If you don't specify an alternative filegroup, CDC data is stored in the default filegroup of the database. |
@supports_net_changes | Set this value to 1 if you want a net changes function to be generated for the capture instance. The net changes function will return one change for each distinct row that was changed in the specified interval in the call. For more information, please refer to cdc.fn_cdc_get_net_changes_<capture_instance>.
The net changes function requires the source table to include a primary key or unique index. If using the latter, you must include the |
Specifying capture columns
By default, when you enable CDC on a table, all columns are identified as captured columns. If you don't need to track all columns or want to exclude specific columns for privacy reasons for example, you can use the @captured_column_list
parameter:
If you have set @support_net_changes
to 1 to switch on net change tracking, you must include either the primary key column or the columns defined in the unique index in the @captured_column_list
parameter.
For more information on CDC, please refer to the Enable and disable change data capture guide from Microsoft.
Create a CDC job that reads from your database
Connect to SQL Server
Now that change data capture is enabled on your tables, the next step is to create a connection to SQL Server from Upsolver. The example below shows you the syntax to create a persistent connection to your database that is available to other users in your organization.
Next, you can create a job to ingest your data from SQL Server into a staging table in the data lake.
Jump to
Ingest to the data lake
After completing the prerequisites, you can create your staging tables. The example below creates a table without defining columns or data types, as these will be inferred automatically by Upsolver, though you can define columns if required:
Create a staging table to store the data ingested from SQL Server.
Upsolver recommends partitioning by the system column $event_date
or another date column within the data in order to optimize your query performance.
Next, create an ingestion job as follows:
Job options
The example above only uses a small subset of all job options available when reading from SQL Server. Depending on your use case, you may want to configure a different set of options.
Transformations can be applied to your ingestion job, for example, to exclude columns, correct issues, or mask data, before it lands in the target. Furthermore, you can use expectations to define data quality rules on your data stream and take appropriate action.
For more information, see the Ingestion jobs page, which describes the available job options and includes examples.
After your data has been ingested into your staging table, you are ready to move on to the next step of building your data pipeline: transforming your data and writing it to your intended target locations.
Alter a job
You can alter some of the options of an existing job. For example, if you want to keep the job as is, but only change the cluster that is running the job, execute the following command:
Note that some options such as COMPRESSION
cannot be altered once the job has been created.
To check which job options are mutable, see Microsoft SQL Server.
Drop a job that reads from SQL Server
If you no longer need a job, you can easily drop it using the following SQL command:
Last updated