Amazon S3
Job options
AGGREGATION_PARALLELISM (editable)
Type: integer
Default: 1
(Optional) Only supported when the query contains aggregations. Formerly known as "output sharding."
COMPRESSION
Values: { NONE | GZIP | SNAPPY | ZSTD }
Default: NONE
(Optional) The compression for the output files.
DATE_PATTERN
Type: text
Default: 'yyyy/MM/dd/HH/mm'
FILE_FORMAT
Values: { CSV | TSV | AVRO | PARQUET | JSON }
The file format for the output file. The following options can be configured for CSV and TSV formats:
CSV
DELIMITER
Type: text
Default: ,
(Optional) The delimiter that separates values in the output file. For binary targets, use DELIMITER = '\u0001'.
TSV
HEADERLESS
Type: Boolean
Default: false
(Optional) When true, the column names are used as the header row in the output file.
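To make the DELIMITER option above more concrete, the following Python sketch writes the same rows with the default comma and with the '\u0001' control character. It only illustrates what a delimiter change does to the file contents; the helper name `write_delimited` and the sample rows are hypothetical, and this is not how the job itself produces its output.

```python
import csv
import io

# Hypothetical sample rows, used only for illustration.
rows = [["order_id", "country"], ["1001", "US"], ["1002", "DE"]]

def write_delimited(rows, delimiter=","):
    """Serialize rows as delimited text using the given one-character delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

# Default delimiter, equivalent in spirit to DELIMITER = ','
print(write_delimited(rows))

# Control-character delimiter, equivalent in spirit to DELIMITER = '\u0001'.
# repr() is used so the non-printable separator is visible as \x01.
print(repr(write_delimited(rows, "\u0001")))
```

A control character such as '\u0001' is useful when the values themselves may contain commas or tabs, because it is very unlikely to appear in the data.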
OUTPUT_OFFSET
Value: <integer> { MINUTE[S] | HOUR[S] | DAY[S] }
Default: 0
(Optional) By default, the file 2023/01/01/00/01 contains data for 2023-01-01 00:00:00.000 through 2023-01-01 00:00:59.999. OUTPUT_OFFSET is added to the time used in the file name, so setting it to 1 MINUTE writes that same data to the file 2023/01/01/00/02. Use a negative value to shift the file name earlier instead.
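The arithmetic behind OUTPUT_OFFSET can be sketched in Python as follows. This is a minimal illustration of the behavior described above, not the product's implementation: the function name `output_file_name` is hypothetical, the one-minute window is an assumption taken from the example, and the strftime format is a hand-translated equivalent of the default DATE_PATTERN 'yyyy/MM/dd/HH/mm'.

```python
from datetime import datetime, timedelta

def output_file_name(window_start: datetime,
                     offset: timedelta = timedelta(0),
                     window: timedelta = timedelta(minutes=1)) -> str:
    """Return the date-pattern part of the file name for one window of data."""
    # With no offset, the file is named after the end of the window it contains
    # (data for 00:00:00.000 through 00:00:59.999 lands in .../00/01).
    # The offset is then added on top of that timestamp.
    stamp = window_start + window + offset
    # strftime equivalent of the default DATE_PATTERN 'yyyy/MM/dd/HH/mm'.
    return stamp.strftime("%Y/%m/%d/%H/%M")

start = datetime(2023, 1, 1, 0, 0)

print(output_file_name(start))                         # 2023/01/01/00/01
print(output_file_name(start, timedelta(minutes=1)))   # 2023/01/01/00/02
print(output_file_name(start, timedelta(minutes=-1)))  # 2023/01/01/00/00
```

The last call shows the negative case: an offset of -1 MINUTE moves the file name back by one minute.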
Location Options
LOCATION
Type: text
The target location to write files to, as a full S3 URI. The location URI pattern can include macros that refer to data columns, which allows custom partitioning of the data in the target location.
Column: {col:<column-name>}
This macro is replaced with the value of the specified column. The column must appear in the SELECT statement of the job.
Example location URI:
s3://my-bucket/some/prefix/{time:yyyy-MM-dd-HH-mm}/{col:country}/output.json
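As a rough illustration of how such a URI pattern resolves for a single row, the Python sketch below expands the macros in the example. It rests on several assumptions: the `{time:...}` macro appears in the example URI but is not described in this section, so the sketch assumes it takes a date pattern in the same style as DATE_PATTERN; the helper names `format_time` and `expand_location`, the timestamp, and the row values are all hypothetical; and only the pattern letters used on this page are handled.

```python
import re
from datetime import datetime

# Java-style pattern tokens (as seen in DATE_PATTERN) mapped to strftime codes.
# Only the tokens that appear in this page's examples are covered.
_JAVA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M"}
_TOKEN = re.compile("|".join(_JAVA_TO_STRFTIME))

def format_time(pattern: str, when: datetime) -> str:
    """Format a timestamp using the small subset of pattern tokens above."""
    return _TOKEN.sub(lambda m: when.strftime(_JAVA_TO_STRFTIME[m.group(0)]), pattern)

def expand_location(template: str, when: datetime, row: dict) -> str:
    """Replace {time:<pattern>} and {col:<column-name>} macros in a location URI."""
    def replace(match: re.Match) -> str:
        kind, arg = match.group(1), match.group(2)
        if kind == "time":
            return format_time(arg, when)
        # kind == "col": substitute the value of the named column for this row.
        return str(row[arg])
    return re.sub(r"\{(time|col):([^}]+)\}", replace, template)

uri = expand_location(
    "s3://my-bucket/some/prefix/{time:yyyy-MM-dd-HH-mm}/{col:country}/output.json",
    datetime(2023, 1, 1, 0, 1),   # hypothetical timestamp
    {"country": "US"},            # hypothetical row
)
print(uri)  # s3://my-bucket/some/prefix/2023-01-01-00-01/US/output.json
```

Rows with different `country` values would resolve to different prefixes, which is what produces the custom partitioning in the target location.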