
CDC Known Limitations

This article provides an in-depth look at the specific limitations associated with using Upsolver for Change Data Capture (CDC) to replicate data into data lakes or Snowflake. Understanding these limitations is crucial for effectively managing data workflows and ensuring data integrity.

Limitations

1. Empty Tables

Empty tables (tables with no rows) are not replicated. If a table must exist in the target, make sure it contains at least one row.

2. Null-Value Columns

Columns populated entirely by null values will not be replicated. This may cause schema discrepancies between the source and target environments.
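As a pre-flight check, a sample of source rows can be scanned for columns whose values are all null, since those columns would be absent from the target. The following is a minimal sketch; the function name and the list-of-dicts row shape are assumptions for illustration and are not part of Upsolver.

```python
def all_null_columns(rows):
    """Return the sorted names of columns that are null (None) in every row."""
    seen = set()
    non_null = set()
    for row in rows:
        for name, value in row.items():
            seen.add(name)
            if value is not None:
                non_null.add(name)
    return sorted(seen - non_null)


# Hypothetical sample of source rows.
sample = [
    {"id": 1, "email": "a@example.com", "middle_name": None},
    {"id": 2, "email": None, "middle_name": None},
]
print(all_null_columns(sample))  # ['middle_name']
```

An empty sample yields an empty list, which corresponds to the empty-table case above: there is nothing to replicate at all.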

3. Conflicting Data Types Across Tables

In scenarios where multiple tables have columns with the same name but different data types, conflicts can occur during replication:

  • If both tables are updated at the same time and the column's type is date in one table and int in the other, the column is replicated as date in the target environment. The same conflict applies to columns with types long and timestamp.

4. Nested Column Type Limitations

Nested data types (such as JSON) come with several limitations:

  • Missing Fields: JSON fields that are null are omitted during replication.

  • Nulls in Arrays: Null values within JSON arrays are skipped.

  • Empty Arrays: Empty arrays or arrays with only nulls will be treated as null.

  • Type Casting in Arrays: Arrays containing elements of different types will be cast to varchar (e.g., [1, 'str'] becomes ['1', 'str']).
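The four rules above can be modeled with a small recursive function. This is an illustrative sketch of the documented behavior, not Upsolver's implementation. The name `normalize` is hypothetical, and details the article does not specify (how the rules compose, or how values other than numbers and strings are rendered when a mixed array is cast) are assumptions.

```python
def normalize(value):
    """Model the documented handling of nested (JSON-like) values."""
    if isinstance(value, dict):
        out = {}
        for key, item in value.items():
            item = normalize(item)
            if item is not None:  # null fields are omitted
                out[key] = item
        return out
    if isinstance(value, list):
        items = [normalize(v) for v in value]
        items = [v for v in items if v is not None]  # nulls in arrays are skipped
        if not items:
            # Empty or all-null arrays are treated as null. Assumption: the
            # resulting null field is then omitted like any other null field.
            return None
        if len({type(v) for v in items}) > 1:
            return [str(v) for v in items]  # mixed-type arrays become strings
        return items
    return value


# Hypothetical source record.
record = {
    "id": 7,
    "note": None,
    "tags": [None, "a", "b"],
    "scores": [],
    "mixed": [1, "str"],
}
print(normalize(record))
# {'id': 7, 'tags': ['a', 'b'], 'mixed': ['1', 'str']}
```

Running source samples through a model like this before replication can help show which fields a downstream consumer should expect to be missing or retyped.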

5. TOAST Values

For PostgreSQL sources, values stored using TOAST (PostgreSQL's storage mechanism for large field values) are replicated only if the table's replica identity is set to FULL.
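In PostgreSQL, replica identity is configured per table with `ALTER TABLE ... REPLICA IDENTITY FULL`. The sketch below only builds those statements for a list of tables so they can be reviewed before being run against the source database. The helper name and the example table names are hypothetical.

```python
def replica_identity_full_statements(tables):
    """Build one ALTER TABLE statement per (schema, table) pair."""

    def quote(identifier):
        # Double any embedded double quotes, per SQL identifier quoting rules.
        return '"' + identifier.replace('"', '""') + '"'

    return [
        f"ALTER TABLE {quote(schema)}.{quote(table)} REPLICA IDENTITY FULL;"
        for schema, table in tables
    ]


for statement in replica_identity_full_statements([("public", "orders")]):
    print(statement)
# ALTER TABLE "public"."orders" REPLICA IDENTITY FULL;
```

With a full replica identity, PostgreSQL writes the complete old row to the write-ahead log for updates and deletes, so expect higher WAL volume on tables that change often.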

6. Non-Replication of Default Values

Default values defined in database schemas are not replicated. This can affect how data appears in the target system if defaults are relied upon.

7. Data Type Upcasting

Upsolver converts original data types to a set of supported primitive types:

  • Integer Types: All are mapped to bigint.

  • Floating Point and Decimal Types: All are mapped to double.

The primitive types supported by Upsolver are:

  • String

  • Bigint

  • Double

  • Boolean

  • Date

  • Timestamp (millisecond precision)
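The mapping rules in this section can be sketched as a lookup. The source type names below are hypothetical examples; only the two documented rules (integer types to bigint, floating point and decimal types to double) are encoded, and any other type returns `None` rather than a guess.

```python
from decimal import Decimal

# Hypothetical source type names, chosen for illustration only.
INTEGER_TYPES = {"tinyint", "smallint", "int", "integer", "bigint"}
FRACTIONAL_TYPES = {"float", "real", "double", "decimal", "numeric"}


def upcast(source_type):
    """Return the target primitive type, or None if the type is not covered here."""
    name = source_type.lower()
    if name in INTEGER_TYPES:
        return "bigint"
    if name in FRACTIONAL_TYPES:
        return "double"
    return None


print(upcast("SMALLINT"))  # bigint
print(upcast("numeric"))   # double

# A double holds roughly 15 to 17 significant decimal digits, so a wide
# decimal value may not survive the conversion exactly.
exact = Decimal("12345678901234567.89")
print(Decimal(float(exact)) == exact)  # False
```

The last lines illustrate why mapping decimal to double matters in practice: high-precision values such as monetary amounts with many digits can lose precision in the target.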

8. Unsupported Truncate Events

Truncate operations, which delete all rows in a table, are not supported in CDC replication.

9. Commit Synchronization

There is no mechanism to synchronize commits between target tables, so at any given moment different target tables may reflect different points in time of the source.

10. Handling Changes in Column Types

A change in a source column's data type results in the creation of a new column in the target. For example, if a column changes from bigint to varchar, a new column with a suffix (e.g., col_string) will be created for the varchar values.

See Schema Evolution for more details.
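The effect on the target schema can be sketched as follows. This is a simplified model rather than Upsolver's logic: the function name is hypothetical, and the `<column>_<type>` naming pattern is an assumption extrapolated from the `col_string` example above.

```python
def apply_type_change(target_schema, column, new_type):
    """Return a copy of the target schema after a source column changes type.

    The existing column is kept as-is; values of the new type are assumed to
    land in an additional, suffixed column.
    """
    evolved = dict(target_schema)
    if evolved.get(column) != new_type:
        evolved[f"{column}_{new_type}"] = new_type
    return evolved


# Hypothetical target schema: column name -> primitive type.
schema = {"id": "bigint", "amount": "bigint"}
print(apply_type_change(schema, "amount", "string"))
# {'id': 'bigint', 'amount': 'bigint', 'amount_string': 'string'}
```

Downstream queries that read `amount` would need to account for the additional column once the type change has occurred.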

11. Dropped and Renamed Columns

  • Dropped Columns: Columns dropped in the source are not dropped in the target, potentially leading to outdated schema representations.

  • Renamed Columns: Renamed columns will not be renamed in the target; instead, an additional column with the new name will be created.
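Taken together, these two rules mean the target schema only grows: it ends up holding every column name the source has ever had. A minimal sketch of that accumulation, using hypothetical column names and a hypothetical helper:

```python
def target_columns(source_schema_history):
    """Accumulate every column name seen across successive source schemas."""
    columns = []
    for schema in source_schema_history:
        for name in schema:
            if name not in columns:
                columns.append(name)
    return columns


history = [
    ["id", "fname", "legacy_flag"],  # initial source schema
    ["id", "first_name"],            # fname renamed, legacy_flag dropped
]
# Sorted for display, because column order in the target is not guaranteed.
print(sorted(target_columns(history)))
# ['first_name', 'fname', 'id', 'legacy_flag']
```

After a rename, older rows carry values under the old name and newer rows under the new name, so consumers typically need to read both.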

12. Column Order

The order of the created columns in the target tables is not guaranteed and may not match the order in the source tables.
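Because of this, consumers are safer referencing target columns by name than by position. A small sketch of a name-based projection, with a hypothetical helper and sample row:

```python
def project(rows, columns):
    """Yield tuples with values in the caller's column order, looked up by name."""
    for row in rows:
        yield tuple(row.get(name) for name in columns)


# The keys arrive in an arbitrary order, as they might from the target table.
rows = [{"email": "a@example.com", "id": 1}]
print(list(project(rows, ["id", "email"])))  # [(1, 'a@example.com')]
```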

Conclusion

These limitations highlight the challenges and considerations when using Upsolver for CDC with data lakes or Snowflake. Planning and understanding these constraints is essential for effective data management and integration strategies, ensuring that the replicated data is accurate and consistent with business needs.

Last updated 7 months ago