Optimize Your Iceberg Tables

This quickstart shows you how to select an Iceberg table for optimization.

Last updated 11 months ago

Create a connection to your catalog

Log in to Upsolver and, from the home screen, select Optimize My Iceberg Tables. You can also click the Upsolver logo at the top of the menu to return to the home screen.

This displays the Connect to Catalog screen, where you can connect to AWS Glue Data Catalog or Tabular. If you already have a connection in Upsolver, select Use an existing connection. Otherwise, select Create a new connection and enter your credentials.

When you have connected to your catalog, click Select Tables to continue to the next screen.

Analyze your tables

This takes you to the Datasets screen. From the navigation tree, click one or more tables to add to the analyzer.

The analyzer scans the partitions and files of each table you add, and estimates the potential storage savings from running a compaction operation and how much compaction would speed up scans. Every table in the list is included in the optimization process. To remove a table from the list, click the bin icon at the far right of the row for the table you want to exclude.
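
To make the idea of a compaction estimate concrete, the following is a minimal sketch of how the reduction in file count for one partition could be approximated. It is not Upsolver's algorithm: the function `estimate_compaction`, the `PartitionEstimate` class, and the 512 MiB target file size are all assumptions chosen for illustration. It only models file count (which affects scan speed through fewer file opens), not the storage savings the analyzer also reports.

```python
import math
from dataclasses import dataclass

# Assumed target size for compacted files; NOT an Upsolver setting.
TARGET_FILE_BYTES = 512 * 1024 * 1024


@dataclass
class PartitionEstimate:
    """Hypothetical result type: file counts before and after compaction."""
    partition: str
    files_before: int
    files_after: int

    @property
    def files_removed(self) -> int:
        return self.files_before - self.files_after


def estimate_compaction(partition, file_sizes, target=TARGET_FILE_BYTES):
    """Estimate how many files remain if small files are merged up to `target` bytes."""
    files_before = len(file_sizes)
    if files_before == 0:
        return PartitionEstimate(partition, 0, 0)
    total_bytes = sum(file_sizes)
    # Merged data needs at least ceil(total / target) files, and compaction
    # in this simple model never produces more files than it started with.
    files_after = min(files_before, max(1, math.ceil(total_bytes / target)))
    return PartitionEstimate(partition, files_before, files_after)
```

For example, a partition holding one hundred 10 MiB files would be estimated at two files after compaction under these assumptions, while a partition that already holds a single large file is left unchanged.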

You can view more detailed insights on a table by clicking the information icon at the far right of the row, or by clicking the Table Name link. This displays a pop-up window with more statistics on the potential storage savings and data scan improvement.

Click Remove Table from Optimization to exclude the table, or Cancel to close the window. When you have selected your tables, click Review Optimization to navigate to the next screen, where you can confirm your selection.

Review your table selection

Review the SQL code for the tables you want to optimize. Optionally, click Edit in Worksheet to alter the code and execute it manually, or click Copy to run the code from another query tool.

When you are ready, click Start Optimization. This returns you to the Datasets screen, where you can monitor the space savings and data scan improvements that result from the optimization process.

Monitor table optimization

In Datasets, click the table you selected for optimization to view the status of the optimization process and the space savings. The Table Statistics tab displays running values for the count of files and partitions, the size of the table, and the potential savings.

Click the Compactions tab to view the status of each partition, including the Start Time, Status, number of Data Files, and Data File Size. Scroll to the right to view information on equality and position deletes.
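
If you copy the per-partition values from the Compactions tab into your own tooling, a roll-up like the following can help track overall progress. This is only a sketch: the `CompactionRow` class, the `summarize` function, and the status strings in the usage note are hypothetical and do not reflect an Upsolver API or its actual status values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CompactionRow:
    """Hypothetical record mirroring columns shown in the Compactions tab."""
    partition: str
    status: str                # status text as displayed; values are assumed
    data_files: int            # number of data files in the partition
    data_file_size_bytes: int  # total size of those data files


def summarize(rows):
    """Count partitions per status and total the file counts and sizes."""
    return {
        "partitions": len(rows),
        "by_status": dict(Counter(r.status for r in rows)),
        "data_files": sum(r.data_files for r in rows),
        "data_file_size_bytes": sum(r.data_file_size_bytes for r in rows),
    }
```

Calling `summarize` on a list of rows returns a dictionary with the number of partitions, a count per status string, and the combined file count and size, which you can compare between checks to see whether compaction is progressing.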

Learn More

See the Table Statistics and Compactions reference for more information about the details provided in these tabs.

Read Optimization Processes for Iceberg Tables in Upsolver to understand the operations that Upsolver performs.
Figure: The Upsolver home screen provides the gateway to optimizing your Iceberg tables.
Figure: Connect to AWS Glue Data Catalog or Tabular using an existing or new connection.
Figure: Click on a table in the tree and Upsolver will analyze the files and partitions that comprise the table.
Figure: Look at the details for each table to determine if it requires optimizing.
Figure: The Table Statistics tab shows you the current size of your partitions and files and projected savings.
Figure: The Compactions tab displays in-depth statistics on each partition that comprises your table.