Output to Snowflake

Prerequisites

Create a job writing to Snowflake

After you have fulfilled the prerequisites, you can create an INSERT job as follows:

CREATE SYNC JOB load_data_to_snowflake
   RUN_INTERVAL = 1 MINUTE
   START_FROM = BEGINNING
   COMMENT = 'insert data into snowflake target table'
AS INSERT INTO SNOWFLAKE your_snowflake_connection.your_schema.target_table 
   MAP_COLUMNS_BY_NAME
      SELECT customer_id,
             COUNT(DISTINCT order_id) AS num_orders,
             SUM(net_total) AS total_spent,
             MIN(order_date) AS first_purchase,
             MAX(order_date) AS last_purchase
      FROM default_glue_catalog.your_database.staging_table
      WHERE time_filter()
      GROUP BY customer_id;

This example only uses a subset of all job options available when writing to Snowflake.

Depending on your use case, you may want to configure a different set of options. For instance, this example contains an aggregation, which means you may want to configure the AGGREGATION_PARALLELISM option.

Alter a job writing to Snowflake

Certain job options are considered mutable, meaning that in some cases, you can run a SQL command to alter an existing transformation job rather than having to create a new one.

For example, take the job we created as an example earlier:

CREATE SYNC JOB load_data_to_snowflake
   RUN_INTERVAL = 1 MINUTE
   START_FROM = BEGINNING
   COMMENT = 'insert data into snowflake target table'
AS INSERT INTO SNOWFLAKE your_snowflake_connection.your_schema.target_table 
   MAP_COLUMNS_BY_NAME
      SELECT customer_id,
             COUNT(DISTINCT order_id) AS num_orders,
             SUM(net_total) AS total_spent,
             MIN(order_date) AS first_purchase,
             MAX(order_date) AS last_purchase
      FROM default_glue_catalog.your_database.staging_table
      WHERE time_filter()
      GROUP BY customer_id;

If you wanted to keep the job as is but just change the cluster that is running the job, you can run the following command:

ALTER JOB load_data_to_snowflake
    SET COMPUTE_CLUSTER = high_memory_cluster;

Note that some options such as RUN_INTERVAL cannot be altered once the connection has been created.

Drop a job writing to Snowflake

If you no longer need a certain job, you can easily drop it with the following SQL command:

DROP JOB load_data_to_snowflake;

Learn More

Last updated