
Analyze Your Iceberg Tables Using the Upsolver CLI

This how-to guide shows you how to install the Iceberg Diagnostic Tool to discover how Upsolver can optimize your Iceberg tables for improved performance.

Overview

The Iceberg Table Analysis CLI Tool evaluates your Apache Iceberg tables to identify how Upsolver optimizations can enhance efficiency. It presents a side-by-side comparison of current metrics against potential improvements in scan duration, file counts, and file sizes, providing a straightforward assessment of optimization opportunities.

Running the CLI against an Iceberg table produces a report of its current performance metrics alongside the improvements Upsolver can deliver.

Installation

iceberg-diag can be installed using either Brew or PIP, as detailed below:

Using Brew

Execute the following commands to install iceberg-diag via Brew:

brew tap upsolver/iceberg-diag
brew install iceberg-diag

Using PIP

Prerequisites

  • Python 3.8 or higher: Verify your Python installation:

    python3 --version
  • Rust: Check whether Rust is installed:

    cargo --version

    If Rust is not installed, install it using:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
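If you prefer, the two prerequisite checks above can be combined into one small shell snippet. This is a convenience sketch only; the commands it runs are exactly those shown above:

```shell
# Convenience sketch: check both pip prerequisites in one pass.
if python3 --version >/dev/null 2>&1; then
  echo "Python found: $(python3 --version 2>&1)"
else
  echo "Python 3.8 or higher is required." >&2
fi

if cargo --version >/dev/null 2>&1; then
  echo "Rust found: $(cargo --version)"
else
  echo "Rust not found; install it with the rustup command above." >&2
fi
```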

To install iceberg-diag using PIP, ensure you have the latest version of pip:

pip install --upgrade pip

Then, install the package with pip:

pip install iceberg-diag
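If you prefer to keep the tool and its dependencies isolated, you can install it into a virtual environment first. This uses standard Python tooling and is not specific to iceberg-diag; the environment name below is just an example:

```shell
# Create and activate an isolated environment (the name is arbitrary),
# then install as above.
python3 -m venv .venv-iceberg-diag
. .venv-iceberg-diag/bin/activate
pip install --upgrade pip
pip install iceberg-diag
```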

Usage Instructions

iceberg-diag [options]

Command-Line Options

  • -h, --help: Displays the help message and exits.

  • --profile PROFILE: Sets the AWS credentials profile for the session; defaults to the environment's current settings.

  • --region REGION: Sets the AWS region for operations; defaults to the specified profile's default region.

  • --database DATABASE: Sets the database name; lists all available Iceberg tables if no --table-name is provided.

  • --table-name TABLE_NAME: Enter the table name or a glob pattern (e.g., '*', 'tbl_*').

  • --remote: Enables remote diagnostics by sending data to the Upsolver API for processing. The remote option yields more detailed analytics than running the process locally.

Usage

  1. Displaying help information:

     iceberg-diag --help
  2. Listing all available databases in the profile:

     iceberg-diag --profile <profile>
  3. Listing all available Iceberg tables in a given database:

     iceberg-diag --profile <profile> --database <database>
  4. Running diagnostics on tables in a specific AWS profile and region (entirely locally):

     iceberg-diag --profile <profile> --region <region> --database <database> --table-name '*'
  5. Running diagnostics using the remote option:

     iceberg-diag --profile <profile> --database <database> --table-name 'prod_*' --remote
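The commands above can also be scripted, for example to run local diagnostics over several databases in turn. This is a sketch with placeholder profile and database names; only the flags documented above are used:

```shell
#!/bin/sh
# Placeholder values: replace with your own AWS profile and databases.
PROFILE="my-profile"
DATABASES="analytics_db raw_db"

if command -v iceberg-diag >/dev/null 2>&1; then
  for db in $DATABASES; do
    echo "== Diagnosing tables in $db =="
    iceberg-diag --profile "$PROFILE" --database "$db" --table-name '*'
  done
else
  echo "iceberg-diag is not installed; see the Installation section." >&2
fi
```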

Source Code

The source code of the Iceberg diagnostic tool can be found here:

https://github.com/Upsolver/iceberg-diag