Output to Amazon S3
Prerequisites
Ensure that you have an Amazon S3 connection with the correct permissions to write to your target bucket.
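If you do not yet have a connection, the following is a minimal sketch of creating one using role-based authentication. The connection name matches the job examples below; the role ARN and external ID are placeholders to replace with your own values, and the role itself must allow writes (for example, s3:PutObject) to the target bucket:
CREATE S3 CONNECTION s3_output_connection
    AWS_ROLE = 'arn:aws:iam::111111111111:role/your-upsolver-write-role'
    EXTERNAL_ID = 'your-external-id';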
Create a job writing into Amazon S3
After you have fulfilled the prerequisites, you can create an INSERT job as follows:
CREATE SYNC JOB insert_into_s3
    START_FROM = BEGINNING
    FILE_FORMAT = (type = CSV)
    COMPRESSION = GZIP
    DATE_PATTERN = 'yyyy-MM-dd-HH-mm'
    RUN_INTERVAL = 1 MINUTE
    COMPUTE_CLUSTER = "Default Compute"
AS INSERT INTO S3 s3_output_connection
    LOCATION = 's3://your-bucket-name/path/to/folder/'
-- Use the SELECT statement to choose columns from the source and
-- implement your business logic transformations.
SELECT
    column1 AS s3_column1,
    MD5(column2) AS s3_column2 -- hash or mask columns using built-in functions
FROM default_glue_catalog.your_schema.your_raw_data_table
WHERE time_filter();
This example demonstrates only some of the job options available when writing to Amazon S3. Depending on your use case, you may want to configure a different set of options; for example, set the FILE_FORMAT to Parquet or the COMPRESSION to SNAPPY.
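As an illustration, here is a sketch of the same job writing Snappy-compressed Parquet files instead of gzipped CSV. The job name insert_into_s3_parquet is hypothetical; all other identifiers match the example above:
CREATE SYNC JOB insert_into_s3_parquet
    START_FROM = BEGINNING
    FILE_FORMAT = (type = PARQUET)
    COMPRESSION = SNAPPY
    DATE_PATTERN = 'yyyy-MM-dd-HH-mm'
    RUN_INTERVAL = 1 MINUTE
    COMPUTE_CLUSTER = "Default Compute"
AS INSERT INTO S3 s3_output_connection
    LOCATION = 's3://your-bucket-name/path/to/folder/'
SELECT
    column1 AS s3_column1,
    MD5(column2) AS s3_column2
FROM default_glue_catalog.your_schema.your_raw_data_table
WHERE time_filter();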
Alter a job writing to Amazon S3
Certain job options are considered mutable, meaning that in some cases, you can run a SQL command to alter an existing transformation job rather than having to create a new one.
For example, take the job we created earlier:
CREATE SYNC JOB insert_into_s3
    START_FROM = BEGINNING
    FILE_FORMAT = (type = CSV)
    COMPRESSION = GZIP
    DATE_PATTERN = 'yyyy-MM-dd-HH-mm'
    RUN_INTERVAL = 1 MINUTE
    COMPUTE_CLUSTER = "Default Compute"
AS INSERT INTO S3 s3_output_connection
    LOCATION = 's3://your-bucket-name/path/to/folder/'
-- Use the SELECT statement to choose columns from the source and
-- implement your business logic transformations.
SELECT
    column1 AS s3_column1,
    MD5(column2) AS s3_column2 -- hash or mask columns using built-in functions
FROM default_glue_catalog.your_schema.your_raw_data_table
WHERE time_filter();
If you want to keep the job as is but change the cluster that runs it, you can run the following command:
ALTER JOB insert_into_s3
SET COMPUTE_CLUSTER = high_memory_cluster;
Note that some options, such as RUN_INTERVAL or FILE_FORMAT, cannot be altered once the job has been created.
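To change one of these immutable options, the usual pattern is to create a replacement job with the new value and then drop the original. Below is a sketch that recreates the job with a longer run interval; the job name insert_into_s3_5min is hypothetical, and START_FROM = NOW is assumed here so the new job continues from the present rather than reprocessing history:
-- 1. Create a replacement job with the new interval.
CREATE SYNC JOB insert_into_s3_5min
    START_FROM = NOW
    FILE_FORMAT = (type = CSV)
    COMPRESSION = GZIP
    DATE_PATTERN = 'yyyy-MM-dd-HH-mm'
    RUN_INTERVAL = 5 MINUTES
    COMPUTE_CLUSTER = "Default Compute"
AS INSERT INTO S3 s3_output_connection
    LOCATION = 's3://your-bucket-name/path/to/folder/'
SELECT
    column1 AS s3_column1,
    MD5(column2) AS s3_column2
FROM default_glue_catalog.your_schema.your_raw_data_table
WHERE time_filter();

-- 2. Drop the original job once the replacement is running.
DROP JOB insert_into_s3;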
Drop a job writing to Amazon S3
If you no longer need a certain job, you can easily drop it with the following SQL command:
DROP JOB insert_into_s3;