December 2024
Upsolver new features, enhancements, and bug fixes for November 2024.
2024.11.28-08.49
⬆️ Enhancements
Datasets - Iceberg tables tracking
Enhanced Iceberg Table Statistics Page
We’ve added new metrics to the Iceberg Table Statistics page (under the Datasets module). Track storage, scan time, row count, and partition stats, along with details on time travel, snapshots, compactions, data lifecycle, and orphan file cleanup. The page provides a high-level summary, while the Maintenance tab offers deeper insight into specific processes such as compaction, snapshot expiration, and orphan file cleanup. Read more here
Snapshot Expiration Tracking
Monitor Iceberg table snapshot expiration using the new Expire Snapshots tab under Datasets > Maintenance. Snapshots are created with each table change, but outdated snapshots can accumulate and consume storage. The expiration process removes these snapshots based on predefined policies, freeing up storage while maintaining table integrity. Read more here
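As a rough illustration of how an expiration policy of this kind can work, the Python sketch below picks the snapshots to expire. The `Snapshot` type and the `max_age` and `min_to_keep` parameters are assumptions made for this example. They are not Upsolver settings, and the actual policies may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical, simplified stand-in for a table snapshot entry.
    snapshot_id: int
    committed_at: datetime


def snapshots_to_expire(snapshots, now, max_age, min_to_keep=1):
    """Return snapshots older than max_age, always keeping the newest min_to_keep."""
    newest_first = sorted(snapshots, key=lambda s: s.committed_at, reverse=True)
    # The most recent snapshots are protected regardless of their age.
    protected = {s.snapshot_id for s in newest_first[:min_to_keep]}
    cutoff = now - max_age
    return [
        s for s in newest_first
        if s.snapshot_id not in protected and s.committed_at < cutoff
    ]
```

Keeping a minimum number of recent snapshots, as assumed here, means the current table state is never a candidate for expiration even when every snapshot is older than the age limit.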
Orphan Files Cleanup Tracking
Monitor orphan file cleanup for Iceberg tables using the new Orphan Files tab under Datasets > Maintenance. In distributed processing environments, tasks or jobs sometimes fail and leave behind files that are not referenced in the table metadata. These orphan files can accumulate over time and consume significant storage space. A file is considered an orphan if it is not associated with any valid snapshot in the table metadata. Cleaning up these files regularly keeps storage use down and table operations efficient. The Orphan Files tab lets you monitor the cleanup jobs that remove them. Read more here
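The orphan definition above comes down to a set difference between the files present in storage and the files referenced by valid snapshots. The Python sketch below shows that idea only. The function name and the shape of the `snapshots` mapping are hypothetical, and it does not reflect how Upsolver's cleanup jobs are implemented.

```python
def find_orphan_files(files_in_storage, snapshots):
    """Return files that no valid snapshot references.

    `snapshots` maps a snapshot ID to the set of file paths it references
    (an assumed, simplified view of the table metadata).
    """
    referenced = set()
    for files in snapshots.values():
        referenced.update(files)
    # Anything in storage that no snapshot points at is an orphan.
    return sorted(set(files_in_storage) - referenced)


if __name__ == "__main__":
    snapshots = {
        1: {"data/a.parquet"},
        2: {"data/a.parquet", "data/b.parquet"},
    }
    storage = ["data/a.parquet", "data/b.parquet", "data/tmp-1.parquet"]
    print(find_orphan_files(storage, snapshots))
```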
New Partitions Tab
Partition information has been moved to a separate Partitions tab (previously in the Statistics page).
New Columns Tab
Find the columns in your table as defined in the Iceberg specification in the new Columns tab.
METADATA_RETENTION options for jobs writing to Iceberg tables
We’ve introduced a new configuration option, METADATA_RETENTION, for jobs writing to Iceberg tables. It defines how long the statistical data collected during job execution is retained, so that relevant statistics stay available for analysis while the storage cost of keeping them stays under control. For more details, read the complete documentation.
We’ve introduced retention policies for Iceberg tables, enabling you to define and enforce data retention directly within Upsolver.
Two new options are now available when defining your Iceberg tables:
RETENTION_COLUMN: Specify the column to determine data retention. Compatible column types include DATE, TIMESTAMP, TIMESTAMPTZ, LONG, or INT. Optimal performance is achieved when the retention column is part of the table's partition columns.
RETENTION_DURATION: Set the number of days data should be retained. Values range from 1 to 9999 days, and data older than this duration will be scheduled for deletion, ensuring efficient storage cost management.
For detailed information, refer to our documentation.
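To make the constraints on these two options concrete, here is a minimal Python sketch that checks a retention column type and duration against the values listed above and computes the resulting cutoff date. The function names are hypothetical, and the exact boundary behavior (whether a row dated exactly on the cutoff is deleted) is an assumption, not something these notes specify.

```python
from datetime import date, timedelta

# Column types listed as compatible with RETENTION_COLUMN.
ALLOWED_RETENTION_TYPES = {"DATE", "TIMESTAMP", "TIMESTAMPTZ", "LONG", "INT"}


def validate_retention(column_type, duration_days):
    """Raise ValueError if the settings fall outside the constraints above."""
    if column_type.upper() not in ALLOWED_RETENTION_TYPES:
        raise ValueError(f"unsupported retention column type: {column_type}")
    if not 1 <= duration_days <= 9999:
        raise ValueError("RETENTION_DURATION must be between 1 and 9999 days")


def retention_cutoff(today, duration_days):
    """Return the date before which data counts as older than the duration."""
    return today - timedelta(days=duration_days)
```

For example, with a 30-day duration evaluated on 2024-11-28, `retention_cutoff` returns 2024-10-29, so under this sketch's assumption rows dated before then would be scheduled for deletion.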
Iceberg:
Reduced the number of snapshots created on compactions
Upgraded library version to 1.6.1
Improved the performance of the commit operation when processing large backlogs
Added support for renaming connections via:
ALTER CONNECTION connection_name RENAME TO new_name
Ingestion Wizard
We’ve enhanced the UI Wizard experience on the jobs page for creating new jobs. When Iceberg is selected as the target, the wizard now includes an additional step that guides you through defining key table properties. You’ll be able to easily configure the target table’s name, columns, partitions, retention policies, sorting, and snapshot expiration in a streamlined and intuitive process.
Bug Fixes
Iceberg: Fixed a bug where the partition spec of compacted files was updated incorrectly after partition evolution
Fixed a bug that caused some internal files not to be deleted when dropping an Iceberg table