Output to Amazon S3

Prerequisites

Ensure that you have an Amazon S3 connection with the correct permissions to write to your target bucket.
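
If you don't have a connection yet, you can create one in SQL. The following is a minimal sketch assuming role-based authentication; the connection name, role ARN, and external ID are placeholders, and the full set of available options is documented in the Amazon S3 connection reference:

CREATE S3 CONNECTION s3_output_connection
    AWS_ROLE = 'arn:aws:iam::111111111111:role/your-s3-write-role' -- placeholder: role with write access to the target bucket
    EXTERNAL_ID = 'your-external-id'; -- placeholder: external ID configured on the role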

Create a job writing into Amazon S3

After you have fulfilled the prerequisites, you can create a job as follows:

CREATE SYNC JOB insert_into_s3
    START_FROM = BEGINNING
    FILE_FORMAT = (type = CSV)
    COMPRESSION = GZIP
    DATE_PATTERN = 'yyyy-MM-dd-HH-mm'
    RUN_INTERVAL = 1 MINUTE
    COMPUTE_CLUSTER = "Default Compute"
AS INSERT INTO S3 s3_output_connection 
    LOCATION = 's3://your-bucket-name/path/to/folder/'
    -- Use the SELECT statement to choose columns from the source and
    -- implement your business logic transformations.
    SELECT 
      column1 AS s3_column1, 
      MD5(column2) AS s3_column2 -- hash or mask columns using built-in functions
    FROM default_glue_catalog.your_schema.your_raw_data_table
    WHERE time_filter();

This example demonstrates only a subset of the job options available when writing to Amazon S3. Depending on your use case, you may want to configure a different set of options; for example, set the FILE_FORMAT to Parquet or the COMPRESSION to SNAPPY.
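
As a sketch of such a variation, here is the same job writing Snappy-compressed Parquet files instead of gzipped CSV. The connection, bucket path, and source table are the same placeholders as above; whether a given format and compression combination (and options such as DATE_PATTERN) applies may depend on your version, so check the job options reference:

CREATE SYNC JOB insert_into_s3_parquet
    START_FROM = BEGINNING
    FILE_FORMAT = (type = PARQUET) -- columnar output instead of CSV
    COMPRESSION = SNAPPY           -- instead of GZIP
    RUN_INTERVAL = 1 MINUTE
    COMPUTE_CLUSTER = "Default Compute"
AS INSERT INTO S3 s3_output_connection
    LOCATION = 's3://your-bucket-name/path/to/folder/'
    SELECT
      column1 AS s3_column1,
      MD5(column2) AS s3_column2
    FROM default_glue_catalog.your_schema.your_raw_data_table
    WHERE time_filter();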

Alter a job writing to Amazon S3

Certain job options are considered mutable, meaning that in some cases you can run a SQL command to alter an existing transformation job rather than create a new one.

For example, take the job we created earlier:

CREATE SYNC JOB insert_into_s3
    START_FROM = BEGINNING
    FILE_FORMAT = (type = CSV)
    COMPRESSION = GZIP
    DATE_PATTERN = 'yyyy-MM-dd-HH-mm'
    RUN_INTERVAL = 1 MINUTE
    COMPUTE_CLUSTER = "Default Compute"
AS INSERT INTO S3 s3_output_connection 
    LOCATION = 's3://your-bucket-name/path/to/folder/'
    -- Use the SELECT statement to choose columns from the source and 
    -- implement your business logic transformations.
    SELECT 
      column1 AS s3_column1, 
      MD5(column2) AS s3_column2 -- hash or mask columns using built-in functions
    FROM default_glue_catalog.your_schema.your_raw_data_table
    WHERE time_filter();

If you want to keep the job as is but change the cluster that runs it, you can use the following command:

ALTER JOB insert_into_s3
    SET COMPUTE_CLUSTER = high_memory_cluster;

Note that some options, such as RUN_INTERVAL or FILE_FORMAT, cannot be altered once the job has been created.
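
If you do need to change one of these immutable options, one approach is to drop the job and recreate it with the new value, as in the sketch below. Note that recreating the job with START_FROM = BEGINNING may reprocess the source data from the start:

DROP JOB insert_into_s3;

CREATE SYNC JOB insert_into_s3
    START_FROM = BEGINNING
    FILE_FORMAT = (type = CSV)
    COMPRESSION = GZIP
    DATE_PATTERN = 'yyyy-MM-dd-HH-mm'
    RUN_INTERVAL = 5 MINUTE -- the new value for the otherwise immutable option
    COMPUTE_CLUSTER = "Default Compute"
AS INSERT INTO S3 s3_output_connection
    LOCATION = 's3://your-bucket-name/path/to/folder/'
    SELECT
      column1 AS s3_column1,
      MD5(column2) AS s3_column2
    FROM default_glue_catalog.your_schema.your_raw_data_table
    WHERE time_filter();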

Drop a job writing to Amazon S3

If you no longer need a certain job, you can easily drop it with the following SQL command:

DROP JOB insert_into_s3;

Learn More

For the full list of job options with syntax and detailed descriptions, see the transformation job options for Amazon S3.

See the INSERT SQL command reference for more details and examples.
