Query Upsolver Iceberg Tables from Snowflake
Learn how to query Upsolver-managed Iceberg tables from your Snowflake database.
Did you know you can query externally managed Apache Iceberg tables from Snowflake? It sounds great, but if you've tried implementing this, you know it comes with manual, ongoing maintenance. When inserts, updates, or deletes are applied, the metadata for the table changes and a new snapshot is committed. These changes require a manual refresh with Snowflake, so users won't see up-to-date information until the table is synchronized.
Traditionally, to keep Snowflake tables in sync with Iceberg tables, you need to:
Write and orchestrate a task to refresh the Snowflake table.
Track the snapshot filename and location to update the table.
Guess the best time to run the operation, since data changes at unpredictable times.
Pay for unnecessary compute costs if the operation runs when there are no changes.
Upsolver offers a solution that allows you to mirror Upsolver-managed Iceberg tables to Snowflake, providing:
Data that is always up-to-date, ensuring users work with the freshest data.
No manual intervention, eliminating the need to write or orchestrate tasks.
Minimal compute costs since refreshes occur only when data changes.
By using an integration between Snowflake and your object store where Upsolver ingests your data, you can query the data directly in Amazon S3 without having to move and duplicate it. Furthermore, no additional ETL is required, saving development time and ongoing maintenance. Changes are automatically mirrored from your Upsolver-managed Iceberg table to Snowflake.
What you will learn
In this guide, you will learn how to query your Upsolver-managed Iceberg tables from Snowflake. You will create a connection to your Snowflake database within Upsolver and then, in Snowflake, create a catalog integration and external volume to enable Snowflake to view the data. You will create an Upsolver-managed Iceberg table that you can query from Snowflake.
Lastly, you will create a job to populate your Iceberg table from an Amazon Kinesis stream, and learn how to monitor the status of your mirrored tables, and pause, resume, and drop a replication.
Prerequisites
Before you begin, please ensure you have followed the guide to Deploy Upsolver on AWS and created the necessary roles and permissions for Upsolver to access your Amazon S3 storage.
The steps for querying your Upsolver-managed Iceberg tables from Snowflake are as follows:
Create a connection to your Snowflake database
Create an integration between Snowflake and Iceberg
Configure an external volume in Snowflake
Create an Upsolver-managed Iceberg table
Mirror the table in Snowflake
Create a job to populate your Iceberg table
Monitor and maintain your mirrored tables
Step 1
Create a connection to Snowflake
The first step is to create a connection to Snowflake from Upsolver. This connection requires the necessary credentials to access your database.
When you create a connection in Upsolver, other users in your organization will be able to use this connection, and it is always on, meaning you don't need to reconnect each time you want to use it.
Here's the code:
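The connection is created with Upsolver's CREATE SNOWFLAKE CONNECTION statement. The sketch below is illustrative: the account URL, user name, and password are placeholder assumptions you should replace with your own values.

```sql
CREATE SNOWFLAKE CONNECTION my_snowflake_connection
    CONNECTION_STRING = 'jdbc:snowflake://<account>.snowflakecomputing.com/?db=DEMO_DB'
    USER_NAME = '<username>'
    PASSWORD = '<password>'
    COMMENT = 'Connection for mirroring Iceberg tables';
```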
In the above example, a connection named my_snowflake_connection was created, providing permission for Upsolver to connect to the DEMO_DB database. An optional COMMENT was included to denote that this connection will be used for mirroring Iceberg tables.
Snowflake Connection
Step 2
Create a catalog integration in Snowflake
Next, you will need to create a catalog integration within your Snowflake database to enable it to connect with data held in Iceberg tables within the object store.
Here's the code:
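The statement below follows Snowflake's documented syntax for a catalog integration over files in object storage; the integration name matches the one described in this step.

```sql
CREATE CATALOG INTEGRATION myCatalogInt
  CATALOG_SOURCE = OBJECT_STORE
  TABLE_FORMAT = ICEBERG
  ENABLED = TRUE;
```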
The example above creates a catalog integration named myCatalogInt. The CATALOG_SOURCE is set to OBJECT_STORE to define the location as external cloud storage. We set the TABLE_FORMAT to ICEBERG, and ENABLED should be TRUE.
More Information
Please see the page on how to Configure a catalog integration for files in object storage in the Snowflake documentation for further details.
Step 3
Configure an S3 external volume in Snowflake
Now that we have created the catalog integration, the next step is to configure an S3 external volume within Snowflake. It is imperative that this external volume points to the same S3 storage location used by your Upsolver-managed Iceberg table, which you will create in Step 4.
Follow the Snowflake documentation to Configure an external volume for Amazon S3, and configure all required AWS roles and permissions as explained there.
Ensure that both Snowflake and Upsolver have the necessary permissions for this location.
Create External Volume Syntax:
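The following sketch follows Snowflake's documented CREATE EXTERNAL VOLUME syntax; the volume name, bucket path, and IAM role ARN are placeholder assumptions that must match the S3 location used by your Upsolver-managed table.

```sql
CREATE OR REPLACE EXTERNAL VOLUME iceberg_external_volume
  STORAGE_LOCATIONS = (
    (
      NAME = 'my-s3-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://<bucket>/<path>/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account-id>:role/<snowflake-access-role>'
    )
  );
```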
Step 4
Create an Upsolver-managed Iceberg table
Create an Upsolver-managed Iceberg table that you'd like to populate and replicate to Snowflake.
Example:
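A minimal sketch using Upsolver's CREATE ICEBERG TABLE syntax; the catalog and database names are assumptions. The column list can be left empty, since Upsolver can infer the schema from the ingested data.

```sql
CREATE ICEBERG TABLE default_glue_catalog.my_database.my_iceberg_table();
```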
Step 5
Create a mirror table in Snowflake
Next, you can establish a mirror table in your Snowflake database.
When you create a mirror table in Upsolver, an external table is generated in Snowflake. Every change (each Iceberg commit) to your Upsolver-managed Iceberg table is then automatically mirrored to Snowflake.
Here's the code:
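The sketch below is an assumption based on the options described in this step: it ties together the connection, external volume, and catalog integration created earlier, and sets the refresh frequency. The target schema name PUBLIC and the exact clause spellings are illustrative.

```sql
CREATE MIRROR FOR default_glue_catalog.my_database.my_iceberg_table
    IN my_snowflake_connection.DEMO_DB.PUBLIC.mirror_iceberg_tbl
    EXTERNAL_VOLUME = 'iceberg_external_volume'
    CATALOG_INTEGRATION = 'myCatalogInt'
    MIRROR_INTERVAL = 1 HOUR;
```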
In the above example, we mirror the my_iceberg_table Upsolver-managed Iceberg table to a table named mirror_iceberg_tbl within Snowflake. The MIRROR_INTERVAL option controls how often the Snowflake table is refreshed, which helps you manage Snowflake charges; in this example, we set the frequency to 1 HOUR.
Step 6
Create a job to populate the Iceberg table
Now that we have defined and mirrored our table, the next step is to create a job to populate it with the data we want to query from Snowflake.
In our example, we will create a job to ingest data from an Amazon Kinesis stream.
Here's the code:
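A sketch of an Upsolver ingestion job that copies from Kinesis into the Iceberg table; the connection name my_kinesis_connection and the stream name placeholder are assumptions.

```sql
CREATE SYNC JOB ingest_kinesis_data
AS COPY FROM KINESIS my_kinesis_connection
    STREAM = '<your-stream-name>'
INTO default_glue_catalog.my_database.my_iceberg_table;
```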
In the above code, we create a job named ingest_kinesis_data to load data into our Upsolver-managed Iceberg table, my_iceberg_table, and the job will run until paused or dropped.
Congratulations, you can now query the data in your Upsolver-managed Iceberg table from your Snowflake database.
Step 7
Monitor and maintain your mirror tables
Upsolver includes system tables that you can query for monitoring your entities. The mirrors system table provides information about all tables you have shared with Snowflake.
In Upsolver, open a new Worksheet, and run the following code:
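A sketch of the monitoring query; the exact path of the mirrors system table is an assumption based on Upsolver's system table naming.

```sql
SELECT * FROM system.monitoring.mirrors;
```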
This table details each mirrored table within your organization, including the status to show if the mirroring process is Running or Paused.
System Tables
It is easy to pause and resume mirroring between your Upsolver-managed Iceberg table and your Snowflake table by running the following code from a Worksheet:
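The statements below are illustrative assumptions of the pause and resume syntax described here; the clause spellings may differ in your Upsolver version.

```sql
-- Pause mirroring to the Snowflake table (illustrative syntax)
ALTER MIRROR FOR default_glue_catalog.my_database.my_iceberg_table
    IN my_snowflake_connection.DEMO_DB.PUBLIC.mirror_iceberg_tbl PAUSE;

-- Resume mirroring; changes accumulated while paused are replicated
ALTER MIRROR FOR default_glue_catalog.my_database.my_iceberg_table
    IN my_snowflake_connection.DEMO_DB.PUBLIC.mirror_iceberg_tbl RESUME;
```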
When you resume a mirror, all data that was not mirrored while it was paused is replicated to the mirrored table.
To drop mirroring between two tables, execute the following statement from your Worksheet, using the DROP_FROM_SNOWFLAKE option to determine whether the table should be dropped from your Snowflake database, as shown below:
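An illustrative sketch; the DROP MIRROR spelling and clause placement are assumptions based on the option described above.

```sql
-- Stop mirroring and also drop the table from Snowflake
DROP MIRROR FOR default_glue_catalog.my_database.my_iceberg_table
    IN my_snowflake_connection.DEMO_DB.PUBLIC.mirror_iceberg_tbl
    DROP_FROM_SNOWFLAKE = TRUE;
```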
Conclusion
In this guide, you learned how to create a connection to your Snowflake database from Upsolver. You saw how to create a catalog integration to enable Snowflake to connect to an object store, and then learned how to configure an S3 external volume from Snowflake to point to the storage location where your Upsolver-managed Iceberg tables are located.
You then saw how to create an Upsolver-managed Iceberg table and mirrored your table so you can query the data from your Snowflake database. Then you discovered the options for creating a job to populate your new table from Amazon Kinesis. Lastly, you learned how to monitor and maintain your mirrored tables.
Try it yourself
To query your Upsolver-managed Iceberg tables from Snowflake, follow these steps:
Create a connection to your Snowflake database in Upsolver.
Create a catalog integration in Snowflake.
Configure an external volume for Amazon S3 in Snowflake.
Create an Upsolver-managed Iceberg table.
Mirror the table in Snowflake.
Create a job to populate your Iceberg table.
Monitor and maintain your mirrored tables.