Query Upsolver Iceberg Tables from Snowflake

Learn how to query Upsolver-managed Iceberg tables from your Snowflake database.

Did you know you can query externally managed Apache Iceberg tables from Snowflake? It sounds great, but if you've tried implementing this, you know it comes with manual, ongoing maintenance. When inserts, updates, or deletes are applied, the metadata for the table changes and a new snapshot is committed. These changes require a manual refresh with Snowflake, so users won't see up-to-date information until the table is synchronized.

Traditionally, to keep Snowflake tables in sync with Iceberg tables, you need to:

Write and orchestrate a task to refresh the Snowflake table.
Track the snapshot filename and location to update the table.
Guess the best time to run the operation since data changes randomly.
Pay for unnecessary compute costs if the operation runs when there are no changes.

Upsolver offers a solution that allows you to mirror Upsolver-managed Iceberg tables to Snowflake, providing:

Data that is always up-to-date, ensuring users work with the freshest data.
No manual intervention, eliminating the need to write or orchestrate tasks.
Minimal compute costs since refreshes occur only when data changes.

By using an integration between Snowflake and your object store where Upsolver ingests your data, you can query the data directly in Amazon S3 without having to move and duplicate it. Furthermore, no additional ETL is required, saving development time and ongoing maintenance. Changes are automatically mirrored from your Upsolver-managed Iceberg table to Snowflake.

What you will learn

In this guide, you will learn how to query your Upsolver-managed Iceberg tables from Snowflake. You will create a connection to your Snowflake database within Upsolver and then, in Snowflake, create a catalog integration and external volume to enable Snowflake to view the data. You will create an Upsolver-managed Iceberg table that you can query from Snowflake.

Lastly, you will create a job to populate your Iceberg table from an Amazon Kinesis stream, and learn how to monitor the status of your mirrored tables, and pause, resume, and drop a replication.

Prerequisites

Before you begin, please ensure you have followed the guide to Deploy Upsolver on AWS and created the necessary roles and permissions for Upsolver to access your Amazon S3 storage.

The steps for querying your Upsolver-managed Iceberg tables from Snowflake, are as follows:

Create a connection to your Snowflake database
Create an integration between Snowflake and Iceberg
Configure an external volume in Snowflake
Create an Upsolver-managed Iceberg table
Mirror the table in Snowflake
Create a job to populate your Iceberg table
Monitor and maintain your mirrored tables

Step 1

Create a connection to Snowflake

The first step is to create a connection to Snowflake from Upsolver. This connection requires the necessary credentials to access your database.

When you create a connection in Upsolver, other users in your organization will be able to use this connection, and it is always on, meaning you don't need to reconnect each time you want to use it.

Here's the code:

// SYNTAX
CREATE SNOWFLAKE CONNECTION <connection_identifier> 
    CONNECTION_STRING = '<connection_string>' 
    USER_NAME = '<user_name>'
    PASSWORD = '<password>'; 

// EXAMPLE
CREATE SNOWFLAKE CONNECTION my_snowflake_connection
    CONNECTION_STRING = 'jdbc:snowflake://snowflakedemo.us-east-1.snowflakecomputing.com?db=DEMO_DB'
    USER_NAME = 'my_username'
    PASSWORD = 'my_password'
    COMMENT = 'Connection to mirror Iceberg tables to Snowflake';

In the above example, a connection named my_snowflake_connection was created, providing permission for Upsolver to connect to the DEMO_DB database. An optional COMMENT was included to denote that this connection will be used for mirroring Iceberg tables.

Snowflake Connection

For more information, connection options, and examples, please see the SQL reference guide to creating a Snowflake connection.

Step 2

Create a catalog integration in Snowflake

Next, you will need to create a catalog integration within your Snowflake database to enable it to connect with data held in Iceberg tables within the object store.

Here's the code:

// SYNTAX
CREATE CATALOG INTEGRATION <catalog_integration_name>
  CATALOG_SOURCE = OBJECT_STORE
  TABLE_FORMAT = ICEBERG
  ENABLED = TRUE;
  
// EXAMPLE
CREATE CATALOG INTEGRATION myCatalogInt
  CATALOG_SOURCE = OBJECT_STORE
  TABLE_FORMAT = ICEBERG
  ENABLED = TRUE;

The example above creates a catalog integration named myCatalogInt. The CATALOG_SOURCE is set to OBJECT_STORE to define the location as external cloud storage.

We set the TABLE_FORMAT as ICEBERG, and ENABLED should be TRUE.

More Information

Please see the page on how to Configure a catalog integration for files in object storage in the Snowflake documentation for further details.

Step 3

Configure an S3 external volume in Snowflake

Now that we have created the catalog integration, the next step is to configure an S3 external volume within snowflake. It is imperative that this external volume points to the same S3 storage location utilized by your Upsolver-managed Iceberg table, which you will create in Step 4.

Follow the Snowflake documentation to Configure an external volume for Amazon S3. Configure all required AWS role and permission as explained in Snowflake's doc.

Ensure that both Snowflake and Upsolver have the necessary permissions for this location.

Create External Volume Syntax:

// SYNTAX
CREATE EXTERNAL VOLUME <volume_name>
  STORAGE_LOCATIONS =
    (
      (
        NAME = <volume_name>
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://<bucket>[/<path>/]'
        STORAGE_AWS_ROLE_ARN = '<iam_role>'
        STORAGE_AWS_EXTERNAL_ID = '<external_id>'
      )  
    );

// EXAMPLE 
CREATE EXTERNAL VOLUME iceberg_volume
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'iceberg_location'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://testbucket/testpath/'
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::111111111111:role/<upsolver-role-*>'
        STORAGE_AWS_EXTERNAL_ID = 'my_external_id'
      )  
    );

Step 4

Create an Upsolver-managed Iceberg table

Create an Upsolver-managed Iceberg table that you'd like to populate and replicate to Snowflake.

Example:

CREATE ICEBERG TABLE default_glue_catalog.my_database.my_iceberg_table();

Step 5

Create a mirror table in Snowflake

Next, you can establish a mirror table in your Snowflake database.

When you create a mirror table in Upsolver, an external table is generated in Snowflake. Changes are automatically mirrored with every change (every Iceberg commit) from your Upsolver-managed Iceberg table to Snowflake.

Here's the code:

// SYNTAX
CREATE MIRROR FOR <glue_connection.schema.tbl_name>
    IN <sf_connection.schema.tbl_name>
    CATALOG_INTEGRATION = <CATALOG_INTEGRATION>
    EXTERNAL_VOLUME = <EXTERNAL_VOLUME>
    MIRROR_INTERVAL = <integer> { MINUTE[S] | HOUR[S] | DAY[S] };
    
// EXAMPLE
CREATE MIRROR FOR default_glue_catalog.demo.my_iceberg_table
    IN my_snowflake_connection.mirror_demo_schema.mirror_iceberg_tbl
    CATALOG_INTEGRATION = myCatalogInt
    EXTERNAL_VOLUME = iceberg_volume
    MIRROR_INTERVAL = 1 HOUR;

In the above example, we will mirror the my_iceberg_table Upsolver-managed Iceberg table to a mirrored table named mirror_iceberg_tbl within Snowflake. The MIRROR_INTERVAL option enables us to control how often the Snowflake table should be mirrored in order to control Snowflake charges, in this example, we set the frequency to 1 HOUR.

Step 6

Create a job to populate the Iceberg table

Now that we have defined and mirrored our table, the next step is to create a job to populate it with the data we want to query from Snowflake.

Upsolver supports streaming, database (CDC), and file sources, so please see the SQL reference to create a job for your exact requirements.

In our example, we will create a job to ingest data from an Amazon Kinesis stream.

Here's the code:

// CREATE A JOB TO INGEST DATA FROM KINESIS TO OUR ICEBERG TABLE
CREATE SYNC JOB ingest_kinesis_data
    START_FROM = BEGINNING
    CONTENT_TYPE = JSON
AS COPY FROM KINESIS my_kinesis_connection
    STREAM = 'sample-stream'
INTO default_glue_catalog.demo.my_iceberg_table;

In the above code, we create a job named ingest_kinesis_data to load data into our Upsolver-managed Iceberg table, my_iceberg_table, and the job will run until paused or dropped.

Congratulations, you can now query the data in your Upsolver-managed Iceberg table from your Snowflake database.

Step 7

Monitor and maintain your mirror tables

Upsolver includes system tables that you can query for monitoring your entities. The mirrors system table provides information about all tables you have shared with Snowflake.

In Upsolver, open a new Worksheet, and run the following code:

// QUERY THE MIRRORS SYSTEM TABLE
SELECT *
FROM system.information_schema.mirrors;

This table details each mirrored table within your organization, including the status to show if the mirroring process is Running or Paused.

System Tables

Please see the Mirrors system table reference to discover the data provided by this table.

It is easy to pause and resume mirroring between your Upsolver-managed Iceberg table and your Snowflake table by running the following code from a Worksheet:

// PAUSE THE MIRROR
PAUSE MIRROR my_snowflake_connection.mirror_demo_schema.mirror_iceberg_tbl;

// RESUME THE MIRROR
RESUME MIRROR my_snowflake_connection.mirror_demo_schema.mirror_iceberg_tbl;

When you resume a mirror, all data that was not mirrored during the time when it was paused, is then replicated to the mirrored table.

To drop mirroring between two tables, execute the following statement from your Worksheet, using the DROP_FROM_SNOWFLAKE option to determine if the table should be dropped from your Snowflake database, as shown below:

// DROP THE MIRROR TABLE BUT LEAVE IN SNOWFLAKE
DROP MIRROR my_snowflake_connection.mirror_demo_schema.mirror_iceberg_tbl 
    DROP_FROM_SNOWFLAKE = FALSE;
    
// DROP THE MIRROR TABLE AND REMOVE FROM SNOWFLAKE
DROP MIRROR my_snowflake_connection.mirror_demo_schema.mirror_iceberg_tbl 
    DROP_FROM_SNOWFLAKE = TRUE;

Conclusion

In this guide, you learned how to create a connection to your Snowflake database from Upsolver. You saw how to create a catalog integration to enable Snowflake to connect to an object store, and then learned how to configure an S3 external volume from Snowflake to point to the storage location where your Upsolver-managed Iceberg tables are located.

You then saw how to create an Upsolver-managed Iceberg table and mirrored your table so you can query the data from your Snowflake database. Then you discovered the options for creating a job to populate your new table from Amazon Kinesis. Lastly, you learned how to monitor and maintain your mirrored tables.

Try it yourself

To query your Upsolver-managed Iceberg tables from Snowflake, follow these steps:

Create a connection to your Snowflake database.
Configure a catalog integration from Snowflake.
Configure an external volume for Amazon S3 in Snowflake.
Create an Upsolver-managed Iceberg table.
Mirror the iceberg table in Snowflake.
Create a job to populate your Iceberg table.
Monitor your mirror tables using the Upsolver system tables.

Last updated 1 year ago