🎓Academy

Check out the following free e-learning modules to help you get started with Upsolver and learn about current data topics.

Modules

Check out the following modules to begin your journey into learning Apache Iceberg:


Hive to Iceberg Table Migration

In this e-learning module, Roy Hasson, VP of Product at Upsolver, walks you through the key strategies and considerations for migrating Hive tables to Iceberg.

What you’ll learn:

  • Migration Strategies: Learn the differences between in-place migration, which adds Iceberg metadata to existing files, and full migration, which involves a complete data transfer.

  • Simplifying In-Place Migrations: Discover how to use the Iceberg migrate procedure to add existing data files to Iceberg tables without rewriting your data.

  • Performing DML and Optimizations: Understand how DML operations and optimizations like compaction are handled with migrated tables and what you can do to get the most benefit from your new Iceberg tables.
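To make the difference between the two migration strategies concrete, here is a minimal sketch (not taken from the module) that builds the Spark SQL statements involved. It assumes Apache Iceberg's Spark procedures `migrate` and `add_files`; the catalog name `spark_catalog` and the table names are hypothetical placeholders, and running the statements requires a Spark session with the Iceberg runtime and catalog already configured.

```python
# Sketch only: catalog and table names passed in are hypothetical placeholders.

def in_place_migration_sql(catalog: str, table: str) -> str:
    """Build a call to Iceberg's Spark `migrate` procedure.

    `migrate` replaces the existing table with an Iceberg table that
    references the original data files, so no data is rewritten.
    """
    return f"CALL {catalog}.system.migrate('{table}')"


def add_files_sql(catalog: str, target_table: str, source_table: str) -> str:
    """Build a call to Iceberg's Spark `add_files` procedure, which registers
    the source table's existing data files in an existing Iceberg table."""
    return (
        f"CALL {catalog}.system.add_files("
        f"table => '{target_table}', source_table => '{source_table}')"
    )


def full_migration_sql(catalog: str, source_table: str, target_table: str) -> str:
    """Build a CTAS statement that copies every row into a new Iceberg table
    (a full migration: the data files are rewritten)."""
    return (
        f"CREATE TABLE {catalog}.{target_table} USING iceberg "
        f"AS SELECT * FROM {source_table}"
    )


if __name__ == "__main__":
    print(in_place_migration_sql("spark_catalog", "db.events"))
    print(add_files_sql("spark_catalog", "db.events_iceberg", "db.events"))
    print(full_migration_sql("spark_catalog", "db.events", "db.events_iceberg"))
```

With a configured session, each string could be passed to `spark.sql(...)`. The in-place variants only write Iceberg metadata, while the full migration reads and rewrites all of the data.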


Building an Iceberg Lakehouse with Spark and Upsolver

Watch this e-learning module for a technical deep dive where we show you how to build and operate an Iceberg-based lakehouse.

You’ll start by learning how to create and query Iceberg tables using Apache Spark. Then, you’ll explore how data is organized in S3 and what properties you should tune for best performance.

What will be covered?

  • Spark in Lakehouse Architecture: Learn how Apache Iceberg integrates with Apache Spark, emphasizing its role in ETL and reducing the cost of data transformation and storage compared to a traditional data warehouse.

  • Simple and Reliable Ingestion with Upsolver: Examine how Upsolver simplifies the ingestion of operational data into Iceberg tables, highlighting its no-code and ZeroETL approaches for efficient data movement.

  • Impacts of Data Management on Query Performance: Explore the impacts of small files, fast and continuous updates/deletes, and manifest file churn on query performance. Compare how data is managed and optimized between Spark and Upsolver, including how each handles schema evolution and transactional concurrency.

  • Best Practices for Implementing Lakehouse Architectures: Discuss best practices for deploying and managing a lakehouse architecture using Spark and Upsolver, with insights into optimizing storage, improving query speeds, and ensuring high-quality, reliable data.

This module is presented by Upsolver's VP of Product, Roy Hasson, who brings a wealth of knowledge from his previous position as a product manager for AWS Glue and AWS Lake Formation.


Iceberg Table Optimization Techniques

This e-learning module is designed for data engineers, architects, and anyone working with Iceberg tables. We will cover the essential optimization techniques for maintaining a healthy Iceberg architecture according to best practices.

What will be covered?

  • Effective Data Ordering: Learn how to optimize data retrieval through strategic ordering techniques to enhance query performance.

  • Efficient Partitioning Strategies: Discover methods to partition Iceberg tables efficiently to ensure data is organized for optimal access and processing speed.

  • Managing Small Files: Address the common challenge of small file management in Iceberg tables, which can degrade performance and increase costs.

  • Sharing Iceberg Tables Externally: Explore best practices for sharing Iceberg tables across different environments and platforms, ensuring compatibility and maintaining data integrity.

This module is presented by Jason Hall, Senior Solutions Architect at Upsolver.


Lakehouse vs. Data Lake

In these videos, we provide an in-depth look at data lakes and lakehouses, highlighting their goals, architectural differences, and use cases.

The learning covers everything from the basics and common applications to detailed discussions on architecture, file versus table formats, and transactions.

The module also addresses:

  • Schema and partition management

  • Scalability

  • Performance challenges

  • Guidance on selecting the appropriate technology for your data needs

In his role as VP of Product, Roy Hasson contributes extensive knowledge from his previous position as a product manager for AWS Glue and AWS Lake Formation.
