Query Upsolver Iceberg Tables from Snowflake

Learn how to query Upsolver-managed Iceberg tables from your Snowflake database.

Did you know you can query externally managed Apache Iceberg tables from Snowflake? It sounds great, but if you've tried implementing this, you know it comes with manual, ongoing maintenance. When inserts, updates, or deletes are applied, the table's metadata changes and a new snapshot is committed. These changes require a manual refresh in Snowflake, so users won't see up-to-date information until the table is synchronized.

Traditionally, to keep Snowflake tables in sync with Iceberg tables, you need to:

  • Write and orchestrate a task to refresh the Snowflake table.

  • Track the snapshot filename and location to update the table.

  • Guess the best time to run the refresh, since data changes arrive unpredictably.

  • Pay for unnecessary compute costs if the operation runs when there are no changes.

Upsolver offers a solution that allows you to mirror Upsolver-managed Iceberg tables to Snowflake, providing:

  • Data that is always up-to-date, ensuring users work with the freshest data.

  • No manual intervention, eliminating the need to write or orchestrate tasks.

  • Minimal compute costs since refreshes occur only when data changes.

By using an integration between Snowflake and your object store where Upsolver ingests your data, you can query the data directly in Amazon S3 without having to move and duplicate it. Furthermore, no additional ETL is required, saving development time and ongoing maintenance. Changes are automatically mirrored from your Upsolver-managed Iceberg table to Snowflake.

What you will learn

In this guide, you will learn how to query your Upsolver-managed Iceberg tables from Snowflake. You will create a connection to your Snowflake database within Upsolver and then, in Snowflake, create a catalog integration and external volume to enable Snowflake to view the data. You will create an Upsolver-managed Iceberg table that you can query from Snowflake.

Lastly, you will create a job to populate your Iceberg table from an Amazon Kinesis stream, and learn how to monitor the status of your mirrored tables, and pause, resume, and drop a replication.


Prerequisites

Before you begin, please ensure you have followed the Deploy Upsolver on AWS guide and created the necessary roles and permissions for Upsolver to access your Amazon S3 storage.

The steps for querying your Upsolver-managed Iceberg tables from Snowflake are as follows:

  1. Create a connection to your Snowflake database

  2. Create an integration between Snowflake and Iceberg

  3. Configure an external volume in Snowflake

  4. Create an Upsolver-managed Iceberg table

  5. Mirror the table in Snowflake

  6. Create a job to populate your Iceberg table

  7. Monitor and maintain your mirrored tables


Step 1

Create a connection to Snowflake

The first step is to create a connection to Snowflake from Upsolver. This connection requires the necessary credentials to access your database.

When you create a connection in Upsolver, other users in your organization will be able to use this connection, and it is always on, meaning you don't need to reconnect each time you want to use it.

Here's the code:

// SYNTAX
CREATE SNOWFLAKE CONNECTION <connection_identifier> 
    CONNECTION_STRING = '<connection_string>' 
    USER_NAME = '<user_name>'
    PASSWORD = '<password>'; 

// EXAMPLE
CREATE SNOWFLAKE CONNECTION my_snowflake_connection
    CONNECTION_STRING = 'jdbc:snowflake://snowflakedemo.us-east-1.snowflakecomputing.com?db=DEMO_DB'
    USER_NAME = 'my_username'
    PASSWORD = 'my_password'
    COMMENT = 'Connection to mirror Iceberg tables to Snowflake';

In the above example, a connection named my_snowflake_connection was created, giving Upsolver permission to connect to the DEMO_DB database. An optional COMMENT was included to note that this connection will be used for mirroring Iceberg tables.

For more information, connection options, and examples, please see the SQL reference guide to creating a Snowflake connection.

Step 2

Create a catalog integration in Snowflake

Next, you will need to create a catalog integration within your Snowflake database to enable it to connect with data held in Iceberg tables within the object store.

Here's the code:

// SYNTAX
CREATE CATALOG INTEGRATION <catalog_integration_name>
  CATALOG_SOURCE = OBJECT_STORE
  TABLE_FORMAT = ICEBERG
  ENABLED = TRUE;
  
// EXAMPLE
CREATE CATALOG INTEGRATION myCatalogInt
  CATALOG_SOURCE = OBJECT_STORE
  TABLE_FORMAT = ICEBERG
  ENABLED = TRUE;

The example above creates a catalog integration named myCatalogInt. The CATALOG_SOURCE is set to OBJECT_STORE to define the location as external cloud storage, the TABLE_FORMAT is set to ICEBERG, and ENABLED must be TRUE.
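
To double-check that the integration was created as expected, you can describe it from a Snowflake worksheet. This is an optional sanity check using Snowflake's standard DESC command against the integration created above:

// OPTIONAL: VERIFY THE CATALOG INTEGRATION (RUN IN SNOWFLAKE)
DESC CATALOG INTEGRATION myCatalogInt;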

For further details, please see the page on how to configure a catalog integration for files in object storage in the Snowflake documentation.

Step 3

Configure an S3 external volume in Snowflake

Now that we have created the catalog integration, the next step is to configure an S3 external volume within Snowflake. It is imperative that this external volume points to the same S3 storage location used by your Upsolver-managed Iceberg table, which you will create in Step 4. Ensure that both Snowflake and Upsolver have the necessary permissions for this location.

Follow the Snowflake documentation to configure an external volume for Amazon S3, and configure all required AWS roles and permissions as explained there.

Create External Volume Syntax:

// SYNTAX
CREATE EXTERNAL VOLUME <volume_name>
  STORAGE_LOCATIONS =
    (
      (
        NAME = <volume_name>
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://<bucket>[/<path>/]'
        STORAGE_AWS_ROLE_ARN = '<iam_role>'
        STORAGE_AWS_EXTERNAL_ID = '<external_id>'
      )  
    );

// EXAMPLE 
CREATE EXTERNAL VOLUME iceberg_volume
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'iceberg_location'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://testbucket/testpath/'
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::111111111111:role/<upsolver-role-*>'
        STORAGE_AWS_EXTERNAL_ID = 'my_external_id'
      )  
    );
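
If you need the values Snowflake presents to AWS when it assumes the role, for example to complete the IAM trust relationship, you can describe the volume from Snowflake. This is a standard Snowflake command, shown here against the example volume above; refer to the Snowflake documentation for the exact output properties:

// OPTIONAL: INSPECT THE EXTERNAL VOLUME (RUN IN SNOWFLAKE)
DESC EXTERNAL VOLUME iceberg_volume;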

Step 4

Create an Upsolver-managed Iceberg table

Next, create an Upsolver-managed Iceberg table that you'd like to populate and replicate to Snowflake.

Example:

CREATE ICEBERG TABLE default_glue_catalog.my_database.my_iceberg_table();
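
The empty column list above leaves the table's schema to be managed from the data your job writes. If you'd rather declare columns explicitly, the statement might look like the sketch below; the column names and types are placeholders, and the full set of table options is covered in the SQL reference.

// ILLUSTRATIVE ONLY: DECLARE COLUMNS UP FRONT (NAMES AND TYPES ARE PLACEHOLDERS)
CREATE ICEBERG TABLE default_glue_catalog.my_database.my_iceberg_table (
    order_id string,
    order_date date
  );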

Step 5

Create a mirror table in Snowflake

Next, you can establish a mirror table in your Snowflake database.

When you create a mirror table in Upsolver, an external table is generated in Snowflake. Changes are then mirrored automatically from your Upsolver-managed Iceberg table to Snowflake on every Iceberg commit.

Here's the code:

// SYNTAX
CREATE MIRROR FOR <glue_connection.schema.tbl_name>
    IN <sf_connection.schema.tbl_name>
    CATALOG_INTEGRATION = <CATALOG_INTEGRATION>
    EXTERNAL_VOLUME = <EXTERNAL_VOLUME>
    MIRROR_INTERVAL = <integer> { MINUTE[S] | HOUR[S] | DAY[S] };
    
// EXAMPLE
CREATE MIRROR FOR default_glue_catalog.demo.my_iceberg_table
    IN my_snowflake_connection.mirror_demo_schema.mirror_iceberg_tbl
    CATALOG_INTEGRATION = myCatalogInt
    EXTERNAL_VOLUME = iceberg_volume
    MIRROR_INTERVAL = 1 HOUR;  

In the above example, we mirror the my_iceberg_table Upsolver-managed Iceberg table to a mirrored table named mirror_iceberg_tbl within Snowflake. The MIRROR_INTERVAL option controls how often the Snowflake table is refreshed, which helps manage Snowflake charges; in this example, the frequency is set to 1 HOUR.

Step 6

Create a job to populate the Iceberg table

Now that we have defined and mirrored our table, the next step is to create a job to populate it with the data we want to query from Snowflake.

In our example, we will create a job to ingest data from an Amazon Kinesis stream.

Here's the code:

// CREATE A JOB TO INGEST DATA FROM KINESIS TO OUR ICEBERG TABLE
CREATE SYNC JOB ingest_kinesis_data
    START_FROM = BEGINNING
    CONTENT_TYPE = JSON
AS COPY FROM KINESIS my_kinesis_connection
    STREAM = 'sample-stream'
INTO default_glue_catalog.demo.my_iceberg_table; 

In the above code, we create a job named ingest_kinesis_data to load data into our Upsolver-managed Iceberg table, my_iceberg_table. The job will run until it is paused or dropped.

Upsolver supports streaming, database (CDC), and file sources, so please see the SQL reference to create a job for your exact requirements.
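
For instance, a similar job reading JSON files from Amazon S3 instead of Kinesis might look like the sketch below; here, my_s3_connection and the bucket path are placeholders for your own S3 connection and location.

// ILLUSTRATIVE: INGEST JSON FILES FROM AMAZON S3 INTO THE SAME ICEBERG TABLE
// my_s3_connection AND THE LOCATION ARE PLACEHOLDERS
CREATE SYNC JOB ingest_s3_data
    CONTENT_TYPE = JSON
AS COPY FROM S3 my_s3_connection
    LOCATION = 's3://<bucket>/<path>/'
INTO default_glue_catalog.demo.my_iceberg_table;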

Congratulations, you can now query the data in your Upsolver-managed Iceberg table from your Snowflake database.
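
Once the first changes have been mirrored, a quick check from a Snowflake worksheet might look like this, assuming the DEMO_DB database from the connection example and the mirror table created in Step 5:

// RUN IN SNOWFLAKE TO CONFIRM THE MIRRORED DATA IS QUERYABLE
SELECT *
FROM DEMO_DB.mirror_demo_schema.mirror_iceberg_tbl
LIMIT 10;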

Step 7

Monitor and maintain your mirror tables

Upsolver includes system tables that you can query for monitoring your entities. The mirrors system table provides information about all tables you have shared with Snowflake.

In Upsolver, open a new Worksheet, and run the following code:

// QUERY THE MIRRORS SYSTEM TABLE
SELECT *
FROM system.information_schema.mirrors;

This table details each mirrored table within your organization, including a status showing whether the mirroring process is Running or Paused.

Please see the Mirrors system table reference to discover the data provided by this table.
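
If you only want to see, say, mirrors that are currently paused, you can filter the query. Note that the status column name below is an assumption for illustration; check the Mirrors system table reference for the actual column names.

// ILLUSTRATIVE: FILTER FOR PAUSED MIRRORS (COLUMN NAME IS AN ASSUMPTION)
SELECT *
FROM system.information_schema.mirrors
WHERE status = 'Paused';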

It is easy to pause and resume mirroring between your Upsolver-managed Iceberg table and your Snowflake table by running the following code from a Worksheet:

// PAUSE THE MIRROR
PAUSE MIRROR my_snowflake_connection.mirror_demo_schema.mirror_iceberg_tbl;

// RESUME THE MIRROR
RESUME MIRROR my_snowflake_connection.mirror_demo_schema.mirror_iceberg_tbl;

When you resume a mirror, any data that was not mirrored while it was paused is then replicated to the mirrored table.

To drop the mirror between two tables, execute the following statement from your Worksheet, using the DROP_FROM_SNOWFLAKE option to determine whether the table should also be dropped from your Snowflake database, as shown below:

// DROP THE MIRROR TABLE BUT LEAVE IN SNOWFLAKE
DROP MIRROR my_snowflake_connection.mirror_demo_schema.mirror_iceberg_tbl 
    DROP_FROM_SNOWFLAKE = FALSE;
    
// DROP THE MIRROR TABLE AND REMOVE FROM SNOWFLAKE
DROP MIRROR my_snowflake_connection.mirror_demo_schema.mirror_iceberg_tbl 
    DROP_FROM_SNOWFLAKE = TRUE;    

Conclusion

In this guide, you learned how to create a connection to your Snowflake database from Upsolver. You saw how to create a catalog integration to enable Snowflake to connect to an object store, and then learned how to configure an S3 external volume from Snowflake to point to the storage location where your Upsolver-managed Iceberg tables are located.

You then saw how to create an Upsolver-managed Iceberg table and mirrored your table so you can query the data from your Snowflake database. Then you discovered the options for creating a job to populate your new table from Amazon Kinesis. Lastly, you learned how to monitor and maintain your mirrored tables.


Try it yourself

To query your Upsolver-managed Iceberg tables from Snowflake, follow these steps:

  1. Create a connection to your Snowflake database.

  2. Configure a catalog integration from Snowflake.

  3. Configure an external volume for Amazon S3 in Snowflake.

  4. Create an Upsolver-managed Iceberg table.

  5. Mirror the Iceberg table in Snowflake.

  6. Create a job to populate your Iceberg table.

  7. Monitor your tables using the Upsolver system tables.
