Ensure that you have an Amazon S3 connection with the correct permissions to write to your target bucket.
Create a job writing into Amazon S3

After you have fulfilled the prerequisites, you can create a job as follows:

```sql
CREATE SYNC JOB insert_into_s3
    START_FROM = BEGINNING
    FILE_FORMAT = (type = CSV)
    COMPRESSION = GZIP
    DATE_PATTERN = 'yyyy-MM-dd-HH-mm'
    RUN_INTERVAL = 1 MINUTE
    COMPUTE_CLUSTER = "Default Compute"
AS INSERT INTO S3 s3_output_connection
    LOCATION = 's3://your-bucket-name/path/to/folder/'
    -- Use the SELECT statement to choose columns from the source and
    -- implement your business logic transformations.
    SELECT column1 AS s3_column1,
           MD5(column2) AS s3_column2 -- hash or mask columns using built-in functions
    FROM default_glue_catalog.your_schema.your_raw_data_table
    WHERE time_filter();
```
This example demonstrates only a subset of the job options available when writing to Amazon S3. Depending on your use case, you may want to configure a different set of options. For example, set the FILE_FORMAT to Parquet or the COMPRESSION to SNAPPY.
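As a sketch of such a variant, the job below writes Snappy-compressed Parquet files instead of gzipped CSV. The job name, target path, and column are illustrative placeholders; check the job options reference for the exact option syntax supported for Parquet output:

```sql
-- Hypothetical variant: Snappy-compressed Parquet output instead of gzipped CSV
CREATE SYNC JOB insert_into_s3_parquet
    START_FROM = BEGINNING
    FILE_FORMAT = (type = PARQUET)
    COMPRESSION = SNAPPY
    RUN_INTERVAL = 1 MINUTE
    COMPUTE_CLUSTER = "Default Compute"
AS INSERT INTO S3 s3_output_connection
    LOCATION = 's3://your-bucket-name/path/to/parquet/'
    SELECT column1 AS s3_column1
    FROM default_glue_catalog.your_schema.your_raw_data_table
    WHERE time_filter();
```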
Alter a job writing to Amazon S3
Certain job options are considered mutable, meaning that in some cases you can alter an existing transformation job with a SQL command rather than having to create a new one.

For example, take the job we created earlier:
```sql
CREATE SYNC JOB insert_into_s3
    START_FROM = BEGINNING
    FILE_FORMAT = (type = CSV)
    COMPRESSION = GZIP
    DATE_PATTERN = 'yyyy-MM-dd-HH-mm'
    RUN_INTERVAL = 1 MINUTE
    COMPUTE_CLUSTER = "Default Compute"
AS INSERT INTO S3 s3_output_connection
    LOCATION = 's3://your-bucket-name/path/to/folder/'
    -- Use the SELECT statement to choose columns from the source and
    -- implement your business logic transformations.
    SELECT column1 AS s3_column1,
           MD5(column2) AS s3_column2 -- hash or mask columns using built-in functions
    FROM default_glue_catalog.your_schema.your_raw_data_table
    WHERE time_filter();
```
If you want to keep the job as is but change the cluster running it, you can run the following command:
```sql
ALTER JOB insert_into_s3
    SET COMPUTE_CLUSTER = high_memory_cluster;
```
Note that some options, such as RUN_INTERVAL or FILE_FORMAT, cannot be altered once the job has been created.
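One way to change an immutable option is to drop the job and recreate it with the new setting, sketched below under the assumption that the original source table and connection are unchanged. Note that recreating the job may reprocess data depending on the START_FROM value you choose:

```sql
-- Immutable options (e.g. RUN_INTERVAL) require recreating the job:
DROP JOB insert_into_s3;

CREATE SYNC JOB insert_into_s3
    START_FROM = BEGINNING
    FILE_FORMAT = (type = CSV)
    COMPRESSION = GZIP
    DATE_PATTERN = 'yyyy-MM-dd-HH-mm'
    RUN_INTERVAL = 5 MINUTES    -- new interval; was 1 MINUTE
    COMPUTE_CLUSTER = "Default Compute"
AS INSERT INTO S3 s3_output_connection
    LOCATION = 's3://your-bucket-name/path/to/folder/'
    SELECT column1 AS s3_column1
    FROM default_glue_catalog.your_schema.your_raw_data_table
    WHERE time_filter();
```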
Drop a job writing to Amazon S3
If you no longer need a certain job, you can easily drop it with the following SQL command:
```sql
DROP JOB insert_into_s3;
```
Learn More
For the full list of job options with syntax and detailed descriptions, see the transformation job options for Amazon S3.
See the INSERT SQL command reference for more details and examples.