Optimize Your Iceberg Tables

This quickstart shows you how to select an Iceberg table for optimization.

Create a connection to your catalog

Log in to Upsolver and, from the home screen, select Optimize My Iceberg Tables. You can also click the Upsolver logo at the top of the menu to open this screen.

This displays the Connect to Catalog screen, where you can connect to AWS Glue Data Catalog or Tabular. If you already have a connection in Upsolver, select Use an existing connection. Otherwise, select Create a new connection and enter your credentials.

When you have connected to your catalog, click Select Tables to continue to the next screen.

Analyze your tables

This takes you to the Datasets screen. From the navigation tree, click one or more tables to add to the analyzer.

The analyzer scans the partitions and files of each table you add, and estimates both the potential storage savings from running a compaction operation and how much compaction would speed up scans. Every table in the list is included in the optimization process. To exclude a table, click the bin icon at the far right of its row.
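To give a feel for why compaction reduces scan overhead, here is a minimal Python sketch that estimates how many data files a table would have after its small files are merged. It is an illustration only and not Upsolver's analyzer: the function name `estimate_compaction`, the 256 MB target file size, and the sample file sizes are all assumptions made for this example. It models file count only, not storage savings.

```python
import math


def estimate_compaction(file_sizes_mb, target_file_mb=256):
    """Estimate the file count before and after merging files to a target size.

    Hypothetical helper for illustration; the target size is an assumption.
    """
    total_mb = sum(file_sizes_mb)
    files_before = len(file_sizes_mb)

    # Assume compaction packs all data into files of roughly the target size.
    files_after = math.ceil(total_mb / target_file_mb) if total_mb > 0 else 0
    # Compaction never leaves more files than it started with.
    files_after = min(files_after, files_before)

    reduction = 1 - files_after / files_before if files_before else 0.0
    return {
        "files_before": files_before,
        "files_after": files_after,
        "file_count_reduction": reduction,
    }


# Example: a partition holding 1,000 files of 2 MB each.
result = estimate_compaction([2] * 1000)
print(result)
```

Fewer, larger files mean a query engine opens fewer objects and reads less metadata per scan, which is the kind of improvement the analyzer reports.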

You can view more detailed insights on a table by clicking the information icon at the far right of the row, or by clicking the Table Name link. This displays a pop-up window with more statistics on the potential storage savings and data scan improvement.

Click Remove Table from Optimization to exclude the table, or Cancel to close the window. When you have selected your tables, click Review Optimization to continue to the next screen, where you can confirm your selection.

Review your table selection

Review the SQL code for the tables you want to optimize. Optionally, click Edit in Worksheet to alter the code and execute it manually, or click Copy to run the code from another query tool.

When you are ready, click Start Optimization. This returns you to the Datasets screen, where you can monitor the space savings and data scan improvements that result from the optimization process.

Monitor table optimization

In Datasets, click a table you selected for optimization to view the status of the optimization process and the space savings. The Table Statistics tab displays running values for the count of files and partitions, the size of the table, and the potential savings.

Click the Compactions tab to view the status of each partition, including the Start Time, Status, number of Data Files, and Data File Size. Scroll to the right to view information on equality and position deletes.

Learn More

See the Table Statistics and Compactions reference for more information about the details provided in these tabs.

Learn the Optimization Processes for Iceberg Tables in Upsolver to understand the operations that Upsolver performs.
