December 2024
Upsolver new features, enhancements, and bug fixes for December 2024.
⬆️ Enhancements
Datasets
Enhanced Iceberg Statistics Page
We enhanced the Iceberg table page, located in the module, with new metrics and insights to improve your monitoring experience. Track key metrics including storage, scan time, row count, partition stats, and details on time travel, snapshots, compactions, data lifecycle, and orphan file clean-up. The page provides a high-level summary, while the tab offers deeper insights into specific processes such as compaction, snapshot expiration, and orphan file clean-up.
Snapshot Expiration Tracking
Monitor Iceberg table snapshot expiration using the new tab, located in > . Snapshots are created with each table change, but outdated snapshots can accumulate and consume storage. The expiration process removes these snapshots based on predefined policies, freeing up storage while maintaining table integrity.
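As a rough illustration of how a policy-based expiration pass might select snapshots, the sketch below assumes a policy made of a maximum snapshot age and a minimum number of recent snapshots to keep. Both parameters, the function name `snapshots_to_expire`, and the sample data are hypothetical and are not taken from Upsolver's implementation.

```python
from datetime import datetime, timedelta


def snapshots_to_expire(snapshots, now, max_age, min_to_keep):
    """Pick snapshot ids to expire under an assumed age + minimum-count policy.

    snapshots: list of (snapshot_id, created_at) tuples.
    Returns the ids to expire, oldest first.
    """
    ordered = sorted(snapshots, key=lambda s: s[1])  # oldest first
    # The newest `min_to_keep` snapshots are always protected (assumed policy).
    candidates = ordered[:-min_to_keep] if min_to_keep > 0 else ordered
    return [sid for sid, created in candidates if now - created > max_age]


# Hypothetical sample data for illustration only.
now = datetime(2024, 12, 31)
snapshots = [
    ("s1", datetime(2024, 12, 1)),
    ("s2", datetime(2024, 12, 20)),
    ("s3", datetime(2024, 12, 30)),
]
expired = snapshots_to_expire(snapshots, now, timedelta(days=7), min_to_keep=2)
print(expired)
```

With these sample values only `s1` is selected, because `s2` and `s3` are protected by the minimum-count rule even though `s2` is older than seven days.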
Orphan Files Clean-up Tracking
Monitor Iceberg table orphan files clean-up with the new tab, located in > . In distributed processing environments, tasks or jobs sometimes fail, leaving behind files that are not referenced in the table metadata. These files, referred to as orphan files, can accumulate over time and consume significant storage space. A file is considered an orphan if it is not associated with any valid snapshot in the table metadata. Regular clean-up of these files is essential for optimizing storage and maintaining efficient table operations. The tab enables you to monitor clean-up jobs that remove these files.
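The orphan-file definition above can be sketched as a set difference: files present in storage minus files referenced by any valid snapshot. This is a minimal illustration only; the function name `find_orphan_files`, the in-memory data structures, and the file paths are hypothetical, and a real clean-up job would read this information from the table metadata and the object store.

```python
def find_orphan_files(files_in_storage, snapshots):
    """Return files present in storage that no valid snapshot references."""
    referenced = set()
    for snapshot_files in snapshots.values():
        referenced.update(snapshot_files)
    return sorted(set(files_in_storage) - referenced)


# Hypothetical sample data: one file was left behind by a failed task.
storage = ["data/a.parquet", "data/b.parquet", "data/tmp-failed-task.parquet"]
snapshots = {
    "snapshot-1": ["data/a.parquet"],
    "snapshot-2": ["data/a.parquet", "data/b.parquet"],
}
orphans = find_orphan_files(storage, snapshots)
print(orphans)  # ['data/tmp-failed-task.parquet']
```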
New Partitions Tab
Partition information has been moved to a dedicated Partitions tab (previously in the tab).
New Columns Tab
Find the columns in your table as defined in the Iceberg specification in the new Columns tab.
New Metadata retention option for jobs writing to Iceberg tables:
We introduced a new configuration option, , for jobs writing to Iceberg tables. This feature enables you to define the retention period for the statistical data collected during job execution. By managing the retention of this metadata, you can ensure that relevant statistics are available for data analysis while managing storage costs associated with keeping this information. For more details, please refer to the Apache Iceberg job options page.
We’ve introduced retention policies for Iceberg tables, enabling you to define and enforce data retention directly within Upsolver:
Two new options are now available when defining your Iceberg tables:
: Specify the column to determine data retention. Compatible column types include DATE, TIMESTAMP, TIMESTAMPTZ, LONG, or INT. Optimal performance is achieved when the retention column is part of the table's partition columns.
: Set the number of days data should be retained. Values range from 1 to 9999 days, and data older than this duration will be scheduled for deletion, ensuring efficient storage cost management. For detailed information, please see .
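To make the retention rule concrete, the sketch below computes the cutoff date implied by a retention period and checks whether a row's retention-column value falls before it. The function names, the constants, and the exact boundary behavior (a row exactly at the cutoff is kept) are assumptions for illustration; only the 1 to 9999 day range comes from the description above.

```python
from datetime import date, timedelta

MIN_RETENTION_DAYS = 1
MAX_RETENTION_DAYS = 9999


def retention_cutoff(today, retention_days):
    """Return the earliest date still retained for the given retention period."""
    if not MIN_RETENTION_DAYS <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be between 1 and 9999")
    return today - timedelta(days=retention_days)


def is_expired(row_date, today, retention_days):
    """True if the row is older than the retention period (assumed boundary rule)."""
    return row_date < retention_cutoff(today, retention_days)


today = date(2024, 12, 31)
print(retention_cutoff(today, 30))              # 2024-12-01
print(is_expired(date(2024, 11, 30), today, 30))  # True
print(is_expired(date(2024, 12, 1), today, 30))   # False
```

When the retention column is also a partition column, a check like this can be applied to whole partitions rather than to individual rows, which is consistent with the performance note above.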
Iceberg:
Reduced the number of snapshots created on compactions.
Upgraded the Iceberg library to version 1.6.1.
Improved the performance of the commit operation when processing large backlogs.
Support renaming connections via ALTER CONNECTION <connection_name> RENAME TO <new_name>.
Ingestion Wizard:
We enhanced the UI Wizard experience on the Jobs page for creating new jobs. When Iceberg is selected as the target, the wizard now includes an additional step that guides you through defining key table properties. Now you can easily configure the target table’s name, columns, partitions, retention policies, sorting, and snapshot expiration in a streamlined and intuitive process.
🔧 Bug Fixes
Iceberg:
Fixed an incorrect update to the partition spec of compacted files after partition evolution.
Fixed a bug that caused some internal files not to be deleted when dropping an Iceberg table.