April 2024

Upsolver new features, enhancements, and bug fixes for April 2024.

Release Notes Blog

For more detailed information on these updates, check out the Upsolver May 2024 Feature Summary blog.

2024.04.25-12.36

⬆️ Enhancements

  • Iceberg:

    • Added support for writing to hidden partitions

    • Enabled changing the partition specification of existing tables, even while a job is actively writing to them

    • Added support for writing to external Iceberg tables

    • Added support for altering Iceberg table properties via SQL
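
Altering a table property could look like the following sketch. It assumes Iceberg's standard SET TBLPROPERTIES DDL and an invented table name; the exact syntax accepted by Upsolver may differ.

```sql
-- Hypothetical example: change an Iceberg table property via SQL.
-- 'write.target-file-size-bytes' is a standard Iceberg table property;
-- the catalog/table names are made up for illustration.
ALTER TABLE my_catalog.my_db.events
  SET TBLPROPERTIES ('write.target-file-size-bytes' = '134217728');
```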

🔧 Bug Fixes

  • Worksheet tree: show replication jobs under dynamically created tables

  • MongoDB CDC:

    • Corrected the parsing of Decimal types to Double

    • Resolved errors encountered when replicating collections containing fields with types Regex, Min Key, and Max Key

2024.04.16-12.06

⬆️ Enhancements

  • Introduced the PARSE_DEBEZIUM_JSON_TYPE property to the Avro Schema Registry content format, which controls whether JSON columns from Debezium sources are dynamically parsed into Upsolver records or kept as JSON strings. For Snowflake outputs with schema evolution, these fields are written to columns of type Variant.

  • Added support for Iceberg table retention using the TABLE_DATA_RETENTION property

  • Upgraded the Snowflake driver to 3.15.0

  • UI: ClickHouse wizard cosmetic changes
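
As an illustration of the new retention option, a table might declare TABLE_DATA_RETENTION at creation time along these lines. The table name, columns, and retention window are invented, and the surrounding DDL is a sketch of Upsolver's property-style syntax rather than a verified reference.

```sql
-- Hypothetical sketch: retain only the last 30 days of data
-- in an Iceberg table via the TABLE_DATA_RETENTION property.
CREATE ICEBERG TABLE my_db.my_schema.events (
  event_time timestamp,
  user_id    string
)
TABLE_DATA_RETENTION = 30 DAYS;
```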

🔧 Bug Fixes

  • Fixed a bug preventing the pausing of ingestion jobs to Snowflake

  • Iceberg schema evolution:

    • Nested fields were added without the field docs that are later used to determine which field evolved from which. Affected tables may need to be recreated if jobs writing to them are producing errors

    • Fixed handling of fields that can have multiple types (e.g., a field that can be both a record and an array of strings)

2024.04.04-09.33

New Features

  • The data lineage diagram is now accessible from the Job Status, Datasets, and materialized view pages, allowing users to easily view real-time job status and dependencies

  • Ingestion wizard:

    • ClickHouse is now supported as a target (CDC sources are not supported at this point)

⬆️ Enhancements

  • For new entities, you can now use the updated Parquet list structure (parquet.avro.write-old-list-structure = false) when writing Parquet files to S3 and Upsolver tables

  • Support casting strings to JSON in jobs writing to Iceberg tables

  • Previewing Classic Data Sources is now supported (SELECT * FROM "classic data source name")

  • COLUMN_TRANSFORMATIONS are now supported by replication jobs

  • Cost reduction:

    • Reduced S3 API costs of replication jobs and single entity jobs

    • Reduced S3 API costs of Iceberg tables

    • Reduced S3 API costs of Hive tables

  • The OPTIMIZE option for external Iceberg tables now supports optimizing tables that are not partitioned

  • The cluster system table (system.monitoring.clusters) now shows data that is aligned with the Cluster Monitoring page
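
To illustrate column transformations in a replication job, a masking rule might be declared roughly as follows. This is a loose sketch: the job, connection, and column names are invented, and a real replication job definition includes additional clauses not shown here.

```sql
-- Hypothetical sketch: hash a sensitive column while replicating.
-- All identifiers below are invented for illustration.
CREATE REPLICATION JOB replicate_orders
  COLUMN_TRANSFORMATIONS = (email = MD5(email))
AS COPY FROM POSTGRES my_postgres_connection
   TO SNOWFLAKE my_snowflake_connection;
```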

🔧 Bug Fixes

  • Fixed a bug that could skip data when reading from CDC sources

  • Fixed a bug where the Events Written graph wouldn't display for single entity jobs that contain many sub-jobs, or when the job list page contains many jobs

  • The CDC event log is now deleted immediately after the log events are parsed

  • Fixed a bug where replication and single entity jobs would fail when creating a table with a name that had existed before

  • Improved the performance of the VPC integration experience

  • Fixed a rare bug where the "Lifetime" statistics on the Datasets page would not display

  • Fixed a bug where jobs reading from system.information_schema.columns would time out when there were tables with a large number of columns

  • Fixed a bug where it was possible to drop a table that a replication or single entity job was writing into; the job must now be dropped first

  • Fixed a bug where a single entity job that reads data from a table that is partitioned by time wouldn't read from the start of the table

  • Fixed a bug where the first point in the Datasets graph would have a timestamp earlier than the start time of the first job writing to the table