CDC data sources (Debezium)
This article provides an introduction to how Upsolver works with CDC (Change Data Capture) data sources.

What is Debezium?

Debezium is an open-source distributed platform for change data capture. Once it is started and pointed at your databases, your apps can respond to every insert, update, and delete that other apps commit to those databases. Upsolver currently uses Debezium v1.4.

CDC Data Sources

Upsolver supports ingesting CDC data from relational databases such as MySQL, MariaDB, and PostgreSQL. It provides CDC capabilities by running a Debezium engine under the hood that connects to the database journals. The connectors automatically detect and ingest every change.

Event Format

Upsolver reads Debezium change events with the following fields:
  • before - The state of the row before the change that was applied in the current event. This can be null if the row is new.
  • after - The state of the row after the change that was applied in the current event.
  • source - Information about the change event, such as the binlog file it came from and its sequence number or position within that file. The source table and database also appear here.
  • op - The change type. The options are:
    • r - Read events (when loading the initial data)
    • c - Create
    • u - Update
    • d - Delete
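
As a rough illustration of the fields above, the following Python sketch parses one change event and summarizes it. The function name `describe_change` and the human-readable labels are hypothetical and not part of Upsolver or Debezium; only the field names (`op`, `before`, `after`, `source`) and the op codes come from the list above.

```python
import json

# Op codes as listed above; the labels on the right are illustrative only.
OP_LABELS = {"r": "read", "c": "create", "u": "update", "d": "delete"}


def describe_change(raw_event: str) -> dict:
    """Summarize one change event given as a JSON string (hypothetical helper)."""
    event = json.loads(raw_event)
    op = event["op"]
    if op not in OP_LABELS:
        raise ValueError(f"unknown op code: {op!r}")
    source = event.get("source", {})
    return {
        "operation": OP_LABELS[op],
        "database": source.get("db"),
        "table": source.get("table"),
        "before": event.get("before"),  # None when the row is new
        "after": event.get("after"),
    }


sample = json.dumps({
    "op": "c",
    "after": {"id": "188283-21202", "cost": 4.2, "item_id": 10},
    "source": {"db": "prod", "table": "sales"},
})
summary = describe_change(sample)
print(summary["operation"], summary["database"], summary["table"])
```

Missing fields are read with `get` so that an event without `before` (a new row) is handled the same way as one where it is explicitly null.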

Example 1:

{
  "ts_ms": 1617525879250,
  "op": "c",
  "after": {
    "id": "188283-21202",
    "cost": 4.2,
    "item_id": 10
  },
  "source": {
    "name": "debezium",
    "db": "prod",
    "row": 0,
    "server_id": 0,
    "snapshot": "true",
    "table": "sales",
    "version": "1.4.2.Final",
    "ts_ms": 0,
    "file": "mysql-bin-changelog.008019",
    "pos": 156,
    "connector": "mysql"
  }
}

This example event represents a new row being added to a table, as indicated by the op value c. The source field shows that the row, a new sale, was added to the sales table in the prod database.

Example 2:

{
  "ts_ms": 1617525879252,
  "op": "u",
  "before": {
    "id": "188283-21202",
    "cost": 4.2,
    "item_id": 10
  },
  "after": {
    "id": "188283-21202",
    "cost": 5,
    "item_id": 10
  },
  "source": {
    "name": "debezium",
    "db": "prod",
    "row": 0,
    "server_id": 0,
    "snapshot": "true",
    "table": "sales",
    "version": "1.4.2.Final",
    "ts_ms": 0,
    "file": "mysql-bin-changelog.008019",
    "pos": 157,
    "connector": "mysql"
  }
}

This is an update event for the row created in Example 1. The previous values appear in before and the updated values in after; here the cost changed from 4.2 to 5.
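
To make the relationship between the two examples concrete, here is a minimal Python sketch that replays change events into an in-memory table. It is an illustration only, not how Upsolver applies changes: the helper `apply_change` is hypothetical, the `id` column is assumed to be the primary key, and delete events are assumed to carry the removed row in `before`.

```python
def apply_change(table: dict, event: dict) -> None:
    """Apply one change event to `table`, a dict keyed by the row's id column."""
    op = event["op"]
    if op in ("r", "c", "u"):
        # Reads, creates, and updates all leave the row in its "after" state.
        row = event["after"]
        table[row["id"]] = row
    elif op == "d":
        # Assumption: a delete event carries the removed row in "before".
        table.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown op code: {op!r}")


# Trimmed versions of Example 1 and Example 2 (source metadata omitted).
events = [
    {"op": "c", "after": {"id": "188283-21202", "cost": 4.2, "item_id": 10}},
    {
        "op": "u",
        "before": {"id": "188283-21202", "cost": 4.2, "item_id": 10},
        "after": {"id": "188283-21202", "cost": 5, "item_id": 10},
    },
]

sales = {}
for event in events:
    apply_change(sales, event)

print(sales["188283-21202"]["cost"])  # 5
```

Replaying the events in order leaves a single row whose cost is the updated value, which is the state a replicated copy of the sales table would be expected to hold.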

Supported Databases

Currently the following databases and versions are supported:
| Database   | Version        | AWS RDS Supported? |
|------------|----------------|--------------------|
| MySQL      | 5.6+           | Yes                |
| PostgreSQL | 10, 11, and 12 | Yes                |

Some databases may require specific journal configurations. See the documentation page on creating a CDC data source for your database for details.

MySQL insert example

{
  "time": "2021-11-18 10:33:45",
  "data": {
    "operation": "insert",
    "database_name": "prod",
    "table_name": "sales",
    "full_table_name": "prod.sales",
    "primary_key": "188283-21202",
    "row": {
      "id": "188283-21202",
      "cost": 5,
      "item_id": 10
    },
    "metadata": {
      "binlog_file_name": "mysql-bin-changelog.030565",
      "binlog_file_position": 749,
      "binlog_row": 0,
      "from_snapshot": false,
      "binlog_timestamp": 1637224425000,
      "is_delete": false
    }
  }
}

MySQL update example

{
  "time": "2021-11-18 10:33:45",
  "data": {
    "operation": "update",
    "database_name": "prod",
    "table_name": "sales",
    "full_table_name": "prod.sales",
    "primary_key": "188283-21202",
    "row": {
      "id": "188283-21202",
      "cost": 5,
      "item_id": 10
    },
    "old_row": {
      "id": "188283-21202",
      "cost": 3,
      "item_id": 10
    },
    "metadata": {
      "binlog_file_name": "mysql-bin-changelog.030565",
      "binlog_file_position": 749,
      "binlog_row": 0,
      "from_snapshot": false,
      "binlog_timestamp": 1637224425000,
      "is_delete": false
    }
  }
}

Postgres insert example

{
  "time": "2021-11-18 09:38:00",
  "data": {
    "operation": "insert",
    "database_name": "postgres",
    "schema_name": "prod",
    "table_name": "sales",
    "full_table_name": "postgres.prod.sales",
    "primary_key": "188283-21202",
    "row": {
      "id": "188283-21202",
      "cost": 5,
      "item_id": 10
    },
    "metadata": {
      "lsn": 2032660385960,
      "from_snapshot": false,
      "binlog_timestamp": 1637221080616,
      "is_delete": false,
      "is_heartbeat": false
    }
  }
}

Postgres update example

{
  "time": "2021-11-18 09:38:00",
  "data": {
    "operation": "update",
    "database_name": "postgres",
    "schema_name": "prod",
    "table_name": "sales",
    "full_table_name": "postgres.prod.sales",
    "primary_key": "188283-21202",
    "row": {
      "id": "188283-21202",
      "cost": 5,
      "item_id": 10
    },
    "old_row": {
      "id": "188283-21202",
      "cost": 3,
      "item_id": 10
    },
    "metadata": {
      "lsn": 2032660385960,
      "from_snapshot": false,
      "binlog_timestamp": 1637221080616,
      "is_delete": false,
      "is_heartbeat": false
    }
  }
}
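
The MySQL and Postgres records above share a data envelope in which row holds the current values and, for updates, old_row holds the previous ones. As a hedged sketch of how a consumer might use that, the Python below lists the columns an update changed. The helper `changed_columns` is hypothetical, and the field semantics are inferred from the examples rather than from a published schema.

```python
def changed_columns(record: dict) -> dict:
    """Return {column: (old_value, new_value)} for an update record.

    Non-update records yield an empty dict. Field names follow the
    examples above; their meaning is inferred, not documented here.
    """
    data = record["data"]
    if data["operation"] != "update":
        return {}
    new_row = data.get("row") or {}
    old_row = data.get("old_row") or {}
    return {
        column: (old_row.get(column), new_value)
        for column, new_value in new_row.items()
        if old_row.get(column) != new_value
    }


# Trimmed version of the update examples above (metadata omitted).
update_record = {
    "time": "2021-11-18 09:38:00",
    "data": {
        "operation": "update",
        "table_name": "sales",
        "primary_key": "188283-21202",
        "row": {"id": "188283-21202", "cost": 5, "item_id": 10},
        "old_row": {"id": "188283-21202", "cost": 3, "item_id": 10},
    },
}

diff = changed_columns(update_record)
print(diff)  # {'cost': (3, 5)}
```

Because the sketch reads only fields common to both formats, the same function works for the MySQL and the Postgres update records.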

Supported Data Outputs

Upsolver currently supports the following data outputs for database replication:
  • Ahana
  • Amazon Athena
  • Dremio
  • Hive Metastore
  • PrestoDB
  • Qubole
  • Redshift Spectrum
  • Starburst
  • Upsolver Query