September 2024

Upsolver new features, enhancements, and bug fixes for September 2024.

2024.09.26-09.41

⬆️ Enhancements

  • Replication Jobs: Internal files are now deleted once they are no longer in use

  • Sync Jobs Reading from Iceberg Tables: Reduced the number of scanned blocks and rows by leveraging statistics in the Iceberg metadata for more efficient reads

  • Multi Job now supports using partitioned Iceberg tables as a source
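
The statistics-based read optimization for sync jobs listed above can be illustrated with a minimal sketch. Iceberg metadata records lower and upper bounds per column for each data file, so a reader can skip files whose value range cannot match a filter. The `DataFileStats` class, the `prune_files` function, and the file names below are hypothetical and only illustrate the idea; they are not Upsolver's implementation.

```python
from dataclasses import dataclass


@dataclass
class DataFileStats:
    """Hypothetical per-file column bounds, similar in spirit to Iceberg manifest entries."""
    path: str
    lower: int      # minimum value of the filtered column in this file
    upper: int      # maximum value of the filtered column in this file
    row_count: int  # number of rows stored in this file


def prune_files(files, lo, hi):
    """Keep only files whose [lower, upper] range overlaps the query range [lo, hi]."""
    return [f for f in files if f.upper >= lo and f.lower <= hi]


files = [
    DataFileStats("a.parquet", lower=1, upper=100, row_count=1000),
    DataFileStats("b.parquet", lower=101, upper=200, row_count=1000),
    DataFileStats("c.parquet", lower=201, upper=300, row_count=1000),
]

# A filter on the range 150..220 can only match rows in files b and c,
# so file a is never opened.
selected = prune_files(files, lo=150, hi=220)
rows_to_scan = sum(f.row_count for f in selected)
```

In this sketch, one of the three files is skipped entirely, which reduces both the files opened and the rows scanned.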

🔧 Bug Fixes

  • API: Fixed incorrect validation when creating or editing an Output that writes Parquet files

  • Minor bug fixes

2024.09.18-13.03

⬆️ Enhancements

  • Pause Job Support for Iceberg Table Targets

    • Pause Job is now fully supported for all jobs writing to Iceberg table targets. Previously, these jobs could not be paused. You can now pause and resume them as needed, which gives you more control over long-running data operations.

  • Introduced a retry mechanism when committing data to the Iceberg table from a job, specifically handling cases where the table is modified by another process
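
A commit retry of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not Upsolver's implementation: `FakeTable`, `CommitConflict`, and `commit_with_retry` are hypothetical names, the table is an in-memory stand-in, and the backoff values are arbitrary. The writer re-reads the latest snapshot before every attempt and retries when another process has modified the table in the meantime.

```python
import random
import time


class CommitConflict(Exception):
    """Raised when the table changed after the snapshot the commit was based on."""


class FakeTable:
    """Hypothetical in-memory table with optimistic concurrency control."""

    def __init__(self, simulated_conflicts=0):
        self.snapshot_id = 0
        self.files = []
        # Number of times another writer "wins the race" before our commit lands.
        self.simulated_conflicts = simulated_conflicts

    def commit(self, base_snapshot_id, new_files):
        if self.simulated_conflicts > 0:
            self.simulated_conflicts -= 1
            self.snapshot_id += 1  # another process modified the table
        if base_snapshot_id != self.snapshot_id:
            raise CommitConflict("table was modified by another process")
        self.files.extend(new_files)
        self.snapshot_id += 1


def commit_with_retry(table, new_files, max_attempts=5, base_delay=0.01):
    """Refresh the base snapshot and retry the commit when a conflict is detected."""
    for attempt in range(1, max_attempts + 1):
        base = table.snapshot_id  # re-read the table state before each attempt
        try:
            table.commit(base, new_files)
            return attempt
        except CommitConflict:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter so competing writers spread out.
            time.sleep(base_delay * 2 ** (attempt - 1) * (1 + random.random()))


table = FakeTable(simulated_conflicts=2)
attempts = commit_with_retry(table, ["part-0001.parquet"])
```

Here the first two attempts conflict with the simulated concurrent writer and the third succeeds. If every attempt conflicts, the last `CommitConflict` is re-raised instead of being silently dropped.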

🔧 Bug Fixes

  • Fixed an issue where auto-sharding tasks could fail with a NullPointerException

  • Fixed a bug in CDC jobs where empty unrelated system columns were added to the target tables

2024.09.18-09.11

⬆️ Enhancements

  • Pause Job Support for Iceberg Table Targets

    • Pause Job is now fully supported for all jobs writing to Iceberg table targets. Previously, these jobs could not be paused. You can now pause and resume them as needed, which gives you more control over long-running data operations.

  • Introduced a retry mechanism when committing data to the Iceberg table from a job, specifically handling cases where the table is modified by another process

🔧 Bug Fixes

  • Fixed an issue where auto-sharding tasks could fail with a NullPointerException

  • Fixed a bug in CDC jobs where empty unrelated system columns were added to the target tables

2024.09.10-07.45

⬆️ Enhancements

  • Iceberg Partition Clustering

    • New Partition Clustering Feature: You can now efficiently manage large datasets partitioned on high-cardinality columns using partition clustering. This feature optimizes storage by merging small files, improving performance, reducing query times, and minimizing S3 API costs.

    • Improved Query Performance: By clustering partitions and reducing the number of small files, full table scans and data refresh processes are significantly faster

    • When to Use: Partition clustering is ideal for datasets with high cardinality, frequent data arrival, and skewed data distribution.

    • How to Use: When creating a table with partition clustering, use the CLUSTERED BY clause instead of PARTITIONED BY.

    • Please see the complete documentation for more details, including usage scenarios, limitations, and syntax options.

  • Data Lineage Enhancements

    • Improved Visual Distinction: Previously, job source tables and lookup tables (materialized views) looked similar in the lineage view, which caused confusion. Arrows connecting jobs to materialized views are now drawn differently from arrows connecting jobs to source tables.

    • Additional UX Improvements: Various user experience enhancements have been made to further improve the overall workflow and usability.

  • Support adding tables to the STOPPED_TABLES list in replication jobs

🔧 Bug Fixes

  • SQL Server CDC: Columns of type DateTime2 are now parsed as Timestamp

  • Fixed an issue where Expire Snapshots left dangling files

  • Fixed an issue where using LIMIT in a sync job reading from Iceberg caused the job to never process data

2024.09.01-09.29

⬆️ Enhancements

  • Improved performance of Iceberg compactions

🔧 Bug Fixes

  • Fixed a bug where JSON data files in jobs writing to Snowflake were not being deleted

  • Fixed an issue where using a classic data source in a job caused errors when a field had multiple types during job creation.
