Apache Iceberg

Discover how Upsolver can ingest your data into Iceberg, and analyze and optimize your lakehouse to reduce storage costs and speed up data scans.

Welcome to the Lakehouse

Data lakes are a cost-effective way to store unlimited volumes of data, which makes them ideal for streaming big data and scaling at pace. However, a data lake has no inherent organization, no understanding of what it holds, and no systematic way to develop relational mapping, so it is no wonder that 85% of self-managed lakes encounter issues leading to failure within the first year.

Apache Iceberg introduces a new open table format standard to overcome the limitations of the lake. By tracking a canonical list of files rather than a directory, it brings database-like features to data lakes, including transactional concurrency, support for schema evolution, and time-travel and rollbacks through the use of snapshots.
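To make the idea of tracking a file list concrete, here is a toy Python sketch of snapshot-based tracking. It is not Iceberg's actual metadata format or API: the `Snapshot` and `ToyTable` classes, their methods, and the file names are hypothetical and exist only for illustration. Each commit records the complete list of data files, so time travel and rollback amount to selecting a different list.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: tuple[str, ...]  # the complete file list at this point in time


@dataclass
class ToyTable:
    """Hypothetical, simplified model; real Iceberg metadata is far richer."""

    snapshots: list[Snapshot] = field(default_factory=list)
    current_id: int | None = None

    def files_as_of(self, snapshot_id: int) -> tuple[str, ...]:
        # Time travel: return the file list recorded by an earlier snapshot.
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap.files
        raise KeyError(f"unknown snapshot {snapshot_id}")

    def current_files(self) -> tuple[str, ...]:
        if self.current_id is None:
            return ()
        return self.files_as_of(self.current_id)

    def commit(self, added=(), removed=()) -> int:
        # A commit never edits an old snapshot; it appends a new one.
        removed_set = set(removed)
        kept = tuple(f for f in self.current_files() if f not in removed_set)
        new_id = len(self.snapshots) + 1
        self.snapshots.append(Snapshot(new_id, kept + tuple(added)))
        self.current_id = new_id
        return new_id

    def rollback(self, snapshot_id: int) -> None:
        self.files_as_of(snapshot_id)  # raises KeyError if it does not exist
        self.current_id = snapshot_id


table = ToyTable()
s1 = table.commit(added=["a.parquet", "b.parquet"])
s2 = table.commit(added=["c.parquet"], removed=["a.parquet"])
print(table.current_files())  # ('b.parquet', 'c.parquet')
print(table.files_as_of(s1))  # ('a.parquet', 'b.parquet')
table.rollback(s1)
print(table.current_files())  # ('a.parquet', 'b.parquet')
```

Because readers resolve a snapshot to a fixed file list, a writer that appends a new snapshot does not disturb queries still reading an older one.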

Leveraging many of the features of a data warehouse on top of a data lake, Apache Iceberg builds a lakehouse that combines cheap storage with concurrent and fast transactions.

With a standard that is increasingly being adopted, Iceberg’s open table format allows any engine to read from, and write to, Iceberg tables, without adversely impacting other concurrent operations.

Upsolver & Apache Iceberg

While Apache Iceberg delivers an evolution to the data lake, it still requires intervention to compact and tune the files and partitions that comprise your tables.

It is important to manage your lakehouse to ensure you are not overpaying on storage and that data scans are efficiently returning results to the query engine as quickly as possible.

Upsolver not only supports ingesting your data into Iceberg tables, but also offers tools for auditing your Iceberg tables and an optimizer to compact and tune them.


Ingest Data to Apache Iceberg

We support ingesting your data from the major data platforms into Iceberg. After creating your pipelines, Upsolver takes care of managing your tables and maintaining performance so your users can query the data without experiencing the delays caused by long-running scans.

Upsolver automatically manages your tables by running a compaction process based on industry best practices. Compaction operations run at the optimal time to deliver the best results. By reducing the size of your tables and the number of files they contain, you save money on storage and benefit from faster data scans.
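As a rough illustration of how compaction reduces the number of files, the sketch below groups small files into rewrite batches of roughly a target size. It is a simplified first-fit bin-packing example, not Upsolver's actual algorithm; the function name `plan_compaction`, the 128 MB target, and the file sizes are all assumptions made for the example.

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Group small files into batches whose total size is at most target_mb.

    Hypothetical helper for illustration only. Files already at or above the
    target are left alone; the rest are packed first-fit, largest first.
    """
    small = sorted((s for s in file_sizes_mb if s < target_mb), reverse=True)
    batches = []  # each batch is a list of file sizes to rewrite as one file
    for size in small:
        for batch in batches:
            if sum(batch) + size <= target_mb:
                batch.append(size)
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([size])
    # Rewriting a single file on its own gains nothing, so drop such batches.
    return [b for b in batches if len(b) > 1]


sizes = [4, 8, 16, 32, 64, 100, 200, 2, 2]  # made-up file sizes in MB
plan = plan_compaction(sizes)
files_before = len(sizes)
files_after = files_before - sum(len(b) for b in plan) + len(plan)
print(plan)  # [[100, 16, 8, 4], [64, 32, 2, 2]]
print(f"{files_before} files -> {files_after} files")  # 9 files -> 3 files
```

Fewer, larger files mean less per-file overhead for a query engine to open and plan against, which is where the faster scans come from.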

Ingest your data to Apache Iceberg from streaming, database, and file sources.

Ingest Your Data to Iceberg

To ingest your data into Apache Iceberg, begin with the Learning Paths and follow the step-by-step guide to setting up your environment and building a pipeline.


Optimize Your Iceberg Tables

Reduce costs and accelerate your queries for any Iceberg table. Our Iceberg Analyzer continuously monitors and optimizes Iceberg tables, whether created by Upsolver or another tool. We automatically apply data engineering best practices to reduce storage costs and accelerate query performance - no managing optimization jobs or custom code needed!

Run the Iceberg Analyzer to discover the tables that can be tuned and compacted.

Upsolver analyzes your Iceberg tables to uncover where storage and performance improvements can be made.

Upsolver compacts your files to reduce the size of your tables, which lowers your storage costs and speeds up data scans for better query performance.

Our standalone optimization tool can help you tune your existing lakehouse. All you need is a connection to AWS Glue Data Catalog or Tabular, and you can be analyzing and optimizing your tables in minutes.

Optimize Your Iceberg Tables

Read the Optimize Your Iceberg Tables how-to guide, which walks you end to end through the process of connecting to your catalog, analyzing your tables, and running the optimizer.


Find Iceberg Tables for Optimization

Quickly find tables in your lakehouse that need compaction to reduce storage and speed up data scans. If you have already built your Iceberg lakehouse, you can install the Upsolver Iceberg Table Analyzer CLI tool to analyze it and identify problematic Iceberg tables.

View the percentage of improvement that you can gain for your tables.

The Iceberg Table Analyzer uncovers the tables that can benefit from compaction.
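To give a feel for what a percentage improvement might represent, the sketch below estimates how many files ideal compaction could remove from each table and ranks the tables by that figure. This is an illustrative assumption, not the metric the Iceberg Table Analyzer actually reports: the function `estimated_file_reduction_pct`, the 128 MB target, and the table names and file sizes are all made up.

```python
import math


def estimated_file_reduction_pct(file_sizes_mb, target_mb=128):
    """Estimate the percentage of files that ideal compaction could remove.

    Illustrative assumption: the ideal file count is the total size divided
    by a target file size, rounded up.
    """
    if not file_sizes_mb:
        return 0.0
    ideal = max(1, math.ceil(sum(file_sizes_mb) / target_mb))
    current = len(file_sizes_mb)
    return max(0.0, 100.0 * (current - ideal) / current)


# Made-up file sizes (MB) for three hypothetical tables.
tables = {
    "events": [1] * 400,           # many tiny files from streaming writes
    "orders": [64, 64, 128, 128],  # a couple of undersized files
    "customers": [128, 128],       # already at the target size
}

report = sorted(
    ((name, estimated_file_reduction_pct(sizes)) for name, sizes in tables.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, pct in report:
    print(f"{name}: {pct:.0f}% fewer files possible")
# events: 99% fewer files possible
# orders: 25% fewer files possible
# customers: 0% fewer files possible
```

Tables at the top of such a ranking are the ones where compaction is likely to pay off first.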

Download and run our open-source CLI tool to uncover tables that can benefit from optimization.

Analyze Your Iceberg Tables

Use this quickstart guide to install the Iceberg Table Analyzer CLI and begin analyzing your Iceberg tables.


Further Learning

Blogs

Videos
