Upsolver SQLake
Search…
⌃K

MySQL

Job and data source properties for MySQL connector

Syntax

CREATE JOB <job_name>
[{ job_options }]
AS COPY FROM MYSQL
<connection_identifier>
[{ source_options }]
INTO <table_identifier>;

Jump to

SQLake supports MySQL 5.6+ and 8.0.x. You can connect to:
  • Generic MySQL (self-hosted)
  • Amazon RDS MySQL
  • Amazon Aurora MySQL
You must setup your MySQL database before you configure SQLake. Please follow the Setting Up MySQL instruction. Once complete, return to this page to configure the SQLake connector.

Job options

The following job properties configure the behavior of the ingestion job.
[ SKIP_SNAPSHOTS = { TRUE | FALSE } ]
[ END_AT = { NOW | <timestamp> } ]
[ COMPUTE_CLUSTER = <cluster_identifier> ]
[ COMMENT = '<comment>' ]

Jump to

SKIP_SNAPSHOTS editable

Type: boolean
Default: false
(Optional) By default, snapshots are enabled for new tables. This means that SQLake will take a full snapshot of the table/s and ingest it into the staging table before it continues to listen to change event. When set to True, SQLake will not take an initial snapshot and only process change events starting from the time the ingestion job is created.
In majority of cases, when you connect to source tables, you want to take a full snapshot and ingest it as the baseline of your table. This creates a full copy of the source table in your data lake before you begin to stream most recent change events. If you skip taking a snapshot, you will not have the historical data in the target table, only the newly added or changed rows.
Skipping a snapshot is useful in scenarios where your primary database instance crashed or became unreachable failing over to the secondary. In this case, you will need to re-estabilish the CDC connection but would not want to take a full snapshot because you already have all of the history in your table. In this case, you would want to restart processing from the moment you left off when the connection to the primary database went down.

END_AT — editable

Values: { NOW | <timestamp> }
Default: Never
(Optional) Configures the time to stop ingesting data. Events after the specified time are ignored. Timestamps provided should be based on UTC.

COMPUTE_CLUSTER — editable

Type: identifier
Default: The sole cluster in your environment
(Optional) The compute cluster to run this job.
This option can only be omitted when there is just one cluster in your environment.
Once you have more than one compute cluster, you are required to provide which one to use through this option.

COMMENT — editable

Type: text
(Optional) A description or comment regarding this job.

Source options

The following data source properties configure how to replicate data from MySQL.
[ TABLE_INCLUDE_LIST = ('regexFilter1', 'regexFilter2') ]
[ COLUMN_EXCLUDE_LIST = ('regexFilter1', 'regexFilter2') ]

Jump to

TABLE_INCLUDE_LIST editable

Type: text
Default: ''
(Optional) Comma-separated list of regular expressions that match fully-qualified table identifiers of tables whose changes you want to capture. This maps to Debezium table.include.list property.
By default, the connector captures changes in every non-system table in all databases. To match the name of a table, SQLake applies the regular expression that you specify as an anchored regular expression. That is, the specified expression is matched against the entire name string of the table. It does not match substrings that might be present in a table name.
Each RegEx pattern matches against the full string databaseName.tableName. For example:
RegEx Pattern
Results
db_name.*
Select all tables under db_name database
db_name.users, db_name.items
Selects tables users and items under db_name database
db1.items_.*
Selects all tables from db1, that start with items_

COLUMN_EXCLUDE_LIST — editable

Type: text
Default: ''
(Optional) Comma-separated list of regular expressions that match the fully-qualified names of columns to exclude from change event record values. This maps to Debezium column.exclude.list property.
By default, the connector matches all columns of the tables listed in TABLE_INCLUDE_LIST. To match the name of a column, SQLake applies the regular expression that you specify as an anchored regular expression. That is, the specified expression is matched against the entire name string of the column; it does not match substrings that might be present in a column name.
Each RegEx pattern matches against the full string databaseName.tableName.columnName. For example:
RegEx Pattern
Results
db.users.address_.*
Selects all of the columns that start with address_ in the users table of database db.
db.*.(.*_pii)
Selects all of the columns ending with _pii across all tables within db database.

Example

Create a job to ingest two tables
Disable initial snapshot of the source tables
Exclude specific columns from selected tables