September 2024
Upsolver new features, enhancements, and bug fixes for September 2024.
2024.09.26-09.41
⬆️ Enhancements
Multi Job now supports reading from partitioned Iceberg tables
Replication Jobs: Implemented deletion of internal files once they are no longer used
Sync Jobs Reading from Iceberg Tables: Reduced the number of scanned blocks and rows by leveraging statistics in the Iceberg metadata for more efficient reads
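Iceberg manifests record per-file statistics such as record counts and column lower and upper bounds. A reader can use them to skip files that cannot match a query. The sketch below illustrates generic min/max file pruning only; it is not Upsolver's implementation, and the DataFile class and prune_files function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical per-file entry, modeled on the stats an Iceberg manifest keeps."""
    path: str
    record_count: int
    lower: int  # minimum value of the filtered column in this file
    upper: int  # maximum value of the filtered column in this file


def prune_files(files, low, high):
    """Keep only files whose [lower, upper] range can overlap the predicate [low, high]."""
    return [f for f in files if f.upper >= low and f.lower <= high]


files = [
    DataFile("data/a.parquet", 1000, 0, 99),
    DataFile("data/b.parquet", 1000, 100, 199),
    DataFile("data/c.parquet", 1000, 200, 299),
]

# Predicate equivalent to: WHERE id BETWEEN 150 AND 180
selected = prune_files(files, 150, 180)
print([f.path for f in selected])             # ['data/b.parquet']
print(sum(f.record_count for f in selected))  # 1000 rows scanned instead of 3000
```

Because the bounds are read from metadata, the two non-matching files are never opened.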
🔧 Bug Fixes
API: Fixed incorrect validation when creating or editing an output that writes Parquet files
Minor bug fixes
2024.09.18-13.03
⬆️ Enhancements
Pause Job Support for Iceberg Table Targets
Pause Job functionality is now fully supported for all jobs writing to Iceberg table targets. Previously, pausing was not available for jobs targeting Iceberg tables, but with this update, users can now pause and resume these jobs as needed, providing greater flexibility and control over long-running data operations.
Introduced a retry mechanism for commits from a job to an Iceberg table, specifically to handle cases where another process modifies the table concurrently
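The retry mechanism described above follows the general pattern of optimistic concurrency: a commit succeeds only if the table has not changed since it was read, and on conflict the writer re-reads the table state and tries again. The sketch below is a toy model of that pattern, not Upsolver's code. The Table, CommitConflict, and commit_with_retry names are hypothetical, and it assumes append-only commits, which can be safely re-applied on top of a newer snapshot.

```python
import time


class CommitConflict(Exception):
    """Raised when the table's snapshot changed since the writer read it."""


class Table:
    """Toy in-memory table: a current snapshot id plus the committed files."""

    def __init__(self):
        self.snapshot_id = 0
        self.files = []

    def commit(self, expected_snapshot_id, new_files):
        # Compare-and-swap: succeed only if nobody committed in between.
        if self.snapshot_id != expected_snapshot_id:
            raise CommitConflict()
        self.files.extend(new_files)
        self.snapshot_id += 1
        return self.snapshot_id


def commit_with_retry(table, new_files, max_attempts=4, backoff_s=0.01, before_commit=None):
    """Retry an optimistic commit, re-reading the table state after each conflict."""
    for attempt in range(1, max_attempts + 1):
        base = table.snapshot_id  # read the latest table state
        if before_commit:
            before_commit(attempt)  # test hook: simulate a concurrent writer
        try:
            return table.commit(base, new_files)
        except CommitConflict:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff


table = Table()


def other_writer(attempt):
    # On the first attempt only, another process commits first.
    if attempt == 1:
        table.commit(table.snapshot_id, ["other.parquet"])


new_id = commit_with_retry(table, ["job.parquet"], before_commit=other_writer)
print(new_id, table.files)  # 2 ['other.parquet', 'job.parquet']
```

The first attempt fails because the other writer advanced the snapshot; the second attempt commits on top of the new snapshot, so neither writer's data is lost.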
🔧 Bug Fixes
Fixed an issue where auto-sharding tasks could fail with a NullPointerException
Fixed a bug in CDC jobs where empty, unrelated system columns were added to the target tables
2024.09.18-09.11
⬆️ Enhancements
Pause Job Support for Iceberg Table Targets
Pause Job functionality is now fully supported for all jobs writing to Iceberg table targets. Previously, pausing was not available for jobs targeting Iceberg tables, but with this update, users can now pause and resume these jobs as needed, providing greater flexibility and control over long-running data operations.
Introduced a retry mechanism for commits from a job to an Iceberg table, specifically to handle cases where another process modifies the table concurrently
🔧 Bug Fixes
Fixed an issue where auto-sharding tasks could fail with a NullPointerException
Fixed a bug in CDC jobs where empty, unrelated system columns were added to the target tables
2024.09.10-07.45
⬆️ Enhancements
Iceberg Partition Clustering
New Partition Clustering Feature: You can now efficiently manage large datasets partitioned on high-cardinality columns using partition clustering. This feature optimizes storage by merging small files, improving performance, reducing query times, and minimizing S3 API costs.
Improved Query Performance: Clustering partitions and reducing the number of small files makes full table scans and data refreshes significantly faster.
When to Use: Partition clustering is ideal for datasets with high cardinality, frequent data arrival, and skewed data distribution.
How to Use: When creating a table with partition clustering, use the CLUSTERED BY clause instead of PARTITIONED BY.
Please see the complete documentation for more details, including usage scenarios, limitations, and syntax options.
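To see why clustering reduces small files, compare one file per value of a high-cardinality column with packing rows, sorted by that column, into files of a target size. The sketch below illustrates that general idea only; it is not how Upsolver implements CLUSTERED BY, and the cluster_into_files function, the user_id column, and the row-count target are hypothetical.

```python
from itertools import groupby


def cluster_into_files(rows, key, target_rows):
    """Pack rows, sorted by the clustering key, into files of roughly target_rows.

    Rows sharing a key value are never split across files, so each file covers
    a contiguous, non-overlapping range of key values. A single key with more
    than target_rows rows ends up alone in its own (oversized) file.
    """
    files, current = [], []
    for _, group in groupby(sorted(rows, key=key), key=key):
        group = list(group)
        # Start a new file if adding this key's rows would exceed the target.
        if current and len(current) + len(group) > target_rows:
            files.append(current)
            current = []
        current.extend(group)
    if current:
        files.append(current)
    return files


# 1,000 distinct user_id values with 3 rows each: one file per value would mean
# 1,000 tiny files, while clustering into ~300-row files yields 10 files.
rows = [{"user_id": u, "v": i} for u in range(1000) for i in range(3)]
files = cluster_into_files(rows, key=lambda r: r["user_id"], target_rows=300)
print(len(files))  # 10
print(files[0][0]["user_id"], files[0][-1]["user_id"])  # 0 99
```

Because each file holds a sorted key range, a query filtering on user_id can still skip most files using per-file min/max statistics.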
Data Lineage Enhancements
Improved Visual Distinction: Previously, job source tables and lookup tables (materialized views) looked similar, which caused confusion. Arrows between jobs and materialized views are now drawn differently from arrows coming from source tables.
Additional UX Improvements: Various user experience enhancements have been made to further improve the overall workflow and usability.
Support for adding tables to the STOPPED_TABLES list in replication jobs
🔧 Bug Fixes
SQL Server CDC: Columns of type DateTime2 are now parsed as Timestamp
Fixed an issue where Expire Snapshots left dangling files
Fixed an issue where using LIMIT in a sync job reading from Iceberg prevented the job from ever processing data
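The Expire Snapshots fix concerns which files may be removed when old snapshots are expired. In Iceberg, a data file can be deleted only once no retained snapshot still references it. The sketch below shows that set logic in simplified form: it ignores manifest and metadata files, and the files_to_delete function and file paths are hypothetical.

```python
def files_to_delete(snapshots, retained_ids):
    """Return data files referenced only by expired snapshots.

    snapshots maps snapshot id -> set of data file paths that snapshot references.
    A file is safe to delete only if no retained snapshot still references it.
    """
    expired = set().union(*(f for s, f in snapshots.items() if s not in retained_ids))
    live = set().union(*(f for s, f in snapshots.items() if s in retained_ids))
    return expired - live


snapshots = {
    1: {"data/a.parquet", "data/b.parquet"},
    2: {"data/b.parquet", "data/c.parquet"},
    3: {"data/c.parquet", "data/d.parquet"},
}

# Expire snapshots 1 and 2, keep snapshot 3.
print(sorted(files_to_delete(snapshots, retained_ids={3})))
# ['data/a.parquet', 'data/b.parquet']
```

A file left out of this result even though nothing retained references it is exactly what becomes a dangling file.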
2024.09.01-09.29
⬆️ Enhancements
Improved performance of Iceberg compactions
🔧 Bug Fixes
Fixed a bug where JSON data files in jobs writing to Snowflake were not being deleted
Fixed an issue where, during job creation, using a classic data source caused errors when a field had multiple types.