August 2024
Upsolver new features, enhancements, and bug fixes for August 2024.
2024.08.26-09.46
⬆️ Enhancements
Improved performance of Iceberg compactions
🔧 Bug Fixes
Fixed duplicate data in MERGE jobs to Iceberg. In rare cases, Iceberg would drop delete files prematurely (before compaction), causing old rows to remain in the table.
Fixed incorrect information in the recent_compactions system table and monitoring page
Fixed a bug where JSON data files in jobs writing to Snowflake were not being deleted
Fixed an issue where using a classic data source in a job caused errors when a field had multiple types during job creation.
2024.08.19-12.07
🔧 Bug Fixes
Minor bug fixes
2024.08.15-10.03
⬆️ Enhancements
CDC job monitoring enhancements - The monitoring page for Replication (CDC) jobs has been enhanced to improve tracking of table statuses. The page is now divided into two tabs, allowing for more accurate monitoring of each status:
Tables in the 'Pending Snapshot' or 'Snapshotting' status can be tracked in the Snapshots tab.
Tables running incrementally can be found in the Syncing Tables tab.
View the full changes documentation here.
Utilize JSON as the intermediate format when writing to Redshift.
Support for adding new primitive columns when writing to Snowflake Iceberg tables.
🔧 Bug Fixes
Iceberg:
Schema Evolution: Skip empty field names as query engines do not support them
Reading from Iceberg: Fixed errors that could occur when reading data from a table after columns had been dropped from it, which caused delays
Fixed a rare case where some task executions would stop running until the server was restarted
Fixed an issue where files could be committed twice to an Iceberg table if the server crashed while committing
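As a generic illustration of the double-commit scenario above (not Upsolver's actual fix), the sketch below shows how an idempotence check lets a commit be retried safely after a crash. The function name commit_files and the use of a plain list as a stand-in for table metadata are assumptions made for this example.

```python
def commit_files(committed, pending):
    """Append only files that are not already committed; return the ones added.

    `committed` is a list of data file paths already in the table
    (a hypothetical stand-in for real table metadata).
    `pending` is the batch being committed, possibly for the second time
    after a crash interrupted the first attempt.
    """
    seen = set(committed)
    added = []
    for path in pending:
        if path not in seen:
            seen.add(path)
            committed.append(path)
            added.append(path)
    return added


if __name__ == "__main__":
    table = ["a.parquet"]
    print(commit_files(table, ["b.parquet", "c.parquet"]))  # first attempt
    print(commit_files(table, ["b.parquet", "c.parquet"]))  # retry adds nothing
```

Because the retry compares the batch against what the table already contains, running the same commit twice leaves each file listed once.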
2024.08.07-11.23
⬆️ Enhancements
Polaris Catalog Support
You can now configure Polaris Catalog as your default Iceberg Lakehouse catalog and begin ingesting data from databases, streams, and files into a high-performance Iceberg lake.
What is Polaris Catalog?
Polaris Catalog is open source under the Apache 2.0 license and is available on GitHub. It is also offered as Snowflake’s managed service for Polaris Catalog, in public preview.
Polaris Catalog enables open, secure lakehouse architectures with broad read-and-write interoperability and cross-engine access controls.
Support credential vending from Iceberg REST catalogs
Previously, connecting to Iceberg REST catalogs required setting your Amazon S3 connection through Upsolver. Now, Upsolver will use the credentials already configured in your catalog for access.
Independently create clusters for accounts with multiple VPC connections
Organizations with multiple VPC connections can independently create a cluster. A new parameter, VPC_CONNECTION, has been added to the CREATE CLUSTER command, allowing you to select the relevant VPC connection.
Configure Snapshot Parallelism for CDC jobs via UI
When creating a CDC job, you can now set the snapshot parallelism via the UI.
When the CDC job begins, it initially takes a snapshot (full historical load) of each table before loading changes incrementally. Snapshot parallelism allows you to configure the number of snapshots performed concurrently. Increasing the number of concurrent snapshots can speed up the table streaming process. However, higher parallelism also increases the load on the source database.
After starting the job, you can adjust the parallelism setting while the job runs. The default parallelism is set to 1.
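The trade-off described above can be sketched with a bounded worker pool. This is a minimal illustration of capped concurrency, not Upsolver's implementation: the names run_snapshots, snapshot_table, and ConcurrencyTracker are hypothetical, and the sleep stands in for a full table read from the source database.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Records the highest number of snapshots that ran at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self._active -= 1
        return False


def snapshot_table(table, tracker, seconds=0.02):
    """Hypothetical stand-in for the full historical load of one table."""
    with tracker:
        time.sleep(seconds)  # simulated read from the source database
    return table


def run_snapshots(tables, parallelism=1):
    """Snapshot every table with at most `parallelism` running concurrently."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    tracker = ConcurrencyTracker()
    # The pool size caps how many snapshots hit the source database at once.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        done = list(pool.map(lambda t: snapshot_table(t, tracker), tables))
    return done, tracker.peak


if __name__ == "__main__":
    tables = ["orders", "customers", "items", "payments"]
    for p in (1, 3):
        start = time.perf_counter()
        _, peak = run_snapshots(tables, parallelism=p)
        elapsed = time.perf_counter() - start
        print(f"parallelism={p} peak={peak} elapsed={elapsed:.2f}s")
```

With a parallelism of 1 the tables are snapshotted one after another. Raising the value shortens the total time but increases the number of simultaneous reads the source database has to serve.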
🔧 Bug Fixes
Fixed a bug in jobs writing to a table where using a JOIN expression with an uppercase alias caused the joined row to return nulls in all fields.
Fixed a bug when creating ingestion jobs from Amazon S3 with large CSV files to Iceberg tables.
Fixed job monitoring for jobs writing to Iceberg tables with JOIN expressions.