Output to Snowflake

Prerequisites

Ensure that you have the following before creating the job:

- A Snowflake connection with the correct permissions to write to your target table.
- The target table, already created in Snowflake before writing to it using Upsolver.
- A storage connection with access to the bucket the job will use to store the intermediate files created while it runs.
- A staging table, created previously, that contains your ingested data.
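If the connections above don't exist yet, they can be created in SQLake. The sketch below uses placeholder names and credentials (the connection names, account URL, user, password, and IAM role are all assumptions to replace with your own values):

```sql
-- Snowflake connection used as the job's write target (placeholder credentials)
CREATE SNOWFLAKE CONNECTION your_snowflake_connection
   CONNECTION_STRING = 'jdbc:snowflake://<account>.snowflakecomputing.com/?db=<db_name>'
   USER_NAME = '<user>'
   PASSWORD = '<password>';

-- Storage connection for the job's intermediate files (placeholder IAM role)
CREATE S3 CONNECTION your_storage_connection
   AWS_ROLE = 'arn:aws:iam::123456789012:role/<role_name>';
```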

Create a job writing to Snowflake

After you have fulfilled the prerequisites, you can create an INSERT job as follows:

CREATE SYNC JOB load_data_to_snowflake
   RUN_INTERVAL = 1 MINUTE
   START_FROM = BEGINNING
   COMMENT = 'insert data into snowflake target table'
AS INSERT INTO SNOWFLAKE your_snowflake_connection.your_schema.target_table 
   MAP_COLUMNS_BY_NAME
      SELECT customer_id,
             COUNT(DISTINCT order_id) AS num_orders,
             SUM(net_total) AS total_spent,
             MIN(order_date) AS first_purchase,
             MAX(order_date) AS last_purchase
      FROM default_glue_catalog.your_database.staging_table
      WHERE time_filter()
      GROUP BY customer_id;

This example uses only a subset of the job options available when writing to Snowflake.

Depending on your use case, you may want to configure a different set of options. For instance, because this example contains an aggregation, you may want to configure the AGGREGATION_PARALLELISM option.
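As a sketch, the same job could set that option alongside the others; the value of 4 here is illustrative, not a recommendation:

```sql
CREATE SYNC JOB load_data_to_snowflake
   RUN_INTERVAL = 1 MINUTE
   START_FROM = BEGINNING
   AGGREGATION_PARALLELISM = 4
   COMMENT = 'insert data into snowflake target table'
AS INSERT INTO SNOWFLAKE your_snowflake_connection.your_schema.target_table 
   MAP_COLUMNS_BY_NAME
      SELECT customer_id,
             COUNT(DISTINCT order_id) AS num_orders,
             SUM(net_total) AS total_spent,
             MIN(order_date) AS first_purchase,
             MAX(order_date) AS last_purchase
      FROM default_glue_catalog.your_database.staging_table
      WHERE time_filter()
      GROUP BY customer_id;
```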

Alter a job writing to Snowflake

Certain job options are mutable, meaning that in some cases you can run a SQL command to alter an existing transformation job rather than create a new one.

For example, take the job we created earlier:

CREATE SYNC JOB load_data_to_snowflake
   RUN_INTERVAL = 1 MINUTE
   START_FROM = BEGINNING
   COMMENT = 'insert data into snowflake target table'
AS INSERT INTO SNOWFLAKE your_snowflake_connection.your_schema.target_table 
   MAP_COLUMNS_BY_NAME
      SELECT customer_id,
             COUNT(DISTINCT order_id) AS num_orders,
             SUM(net_total) AS total_spent,
             MIN(order_date) AS first_purchase,
             MAX(order_date) AS last_purchase
      FROM default_glue_catalog.your_database.staging_table
      WHERE time_filter()
      GROUP BY customer_id;

If you want to keep the job as is but change the cluster that runs it, you can run the following command:

ALTER JOB load_data_to_snowflake
    SET COMPUTE_CLUSTER = high_memory_cluster;

Note that some options, such as RUN_INTERVAL, cannot be altered once the job has been created.

Drop a job writing to Snowflake

If you no longer need a certain job, you can easily drop it with the following SQL command:

DROP JOB load_data_to_snowflake;
