Amazon S3
Job options
Amazon S3 job options:
AGGREGATION_PARALLELISM (editable)
Type: integer
Default: 1
(Optional) Only supported when the query contains aggregations. Formerly known as "output sharding."
COMPRESSION
Values: { NONE | GZIP | SNAPPY | ZSTD }
Default: NONE
(Optional) The compression for the output files.
DATE_PATTERN
Type: text
Default: 'yyyy/MM/dd/HH/mm'
(Optional) Upsolver uses the date pattern to partition the output in the S3 bucket. Partitioning is supported down to the minute, for example: 'yyyy/MM/dd/HH/mm'. For more options, see Java SimpleDateFormat.
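To make the effect of the default pattern concrete, the sketch below builds a partition path from a timestamp. It is only an illustration: it uses Python's strftime with a hand-translated equivalent of 'yyyy/MM/dd/HH/mm', and the function name partition_path and the bucket prefix are hypothetical, not part of Upsolver.

```python
from datetime import datetime

# Hand-translated strftime equivalent of the Java-style pattern 'yyyy/MM/dd/HH/mm'.
# (Assumption: used here for illustration only.)
DEFAULT_PATTERN = "%Y/%m/%d/%H/%M"

def partition_path(prefix: str, ts: datetime, pattern: str = DEFAULT_PATTERN) -> str:
    """Join a hypothetical S3 prefix with the date-based partition for ts."""
    return prefix.rstrip("/") + "/" + ts.strftime(pattern)

print(partition_path("s3://my-bucket/some/prefix/", datetime(2023, 1, 1, 0, 1)))
```

Each minute of output lands under its own year/month/day/hour/minute prefix, which is what "partitioning down to the minute" means in practice.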
FILE_FORMAT
Values: { CSV | TSV | AVRO | PARQUET | JSON }
The file format for the output file. The following options can be configured for CSV and TSV formats:
CSV
DELIMITER
Type: text
Default: ,
(Optional) Configures the delimiter that separates the values in the output file. For binary targets, use DELIMITER = '\u0001'.
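As a rough picture of what the delimiter changes in the written file, the sketch below serializes the same rows with the default comma and with '\u0001'. It uses Python's csv module purely for illustration; the helper name to_delimited and the sample rows are hypothetical.

```python
import csv
import io

def to_delimited(rows, delimiter=","):
    """Serialize rows to text using the given single-character delimiter."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

rows = [["id", "country"], [1, "US"]]
print(repr(to_delimited(rows)))                      # comma-separated
print(repr(to_delimited(rows, delimiter="\u0001")))  # separated by the U+0001 control character
```

A non-printable delimiter such as U+0001 is unlikely to appear inside field values, which is why it suits targets that parse the file mechanically.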
TSV
HEADERLESS
Type: Boolean
Default: false
(Optional) When true, the output file is written without a header row. When false, the column names are used as the header row in the output file.
OUTPUT_OFFSET
Value: <integer> { MINUTE[S] | HOUR[S] | DAY[S] }
Default: 0
(Optional) By default, the file 2023/01/01/00/01 contains data for 2023-01-01 00:00:00.000 through 2023-01-01 00:00:59.999. Setting OUTPUT_OFFSET to 1 MINUTE adds one minute to the file name, so the data for that first minute is written to the file ending in 02. To shift file names back instead, use a negative value.
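The arithmetic can be sketched as follows. This is a minimal illustration assuming, as described above, that a one-minute window is named after the minute following its start and that the offset is added on top of that; the function output_file_name is hypothetical.

```python
from datetime import datetime, timedelta

def output_file_name(window_start: datetime, offset_minutes: int = 0) -> str:
    """Return the file name for the one-minute window starting at window_start.

    Assumption: with no offset, the window starting at 00:00 is written to
    the file named .../00/01, and the offset shifts that name further.
    """
    file_time = window_start + timedelta(minutes=1 + offset_minutes)
    return file_time.strftime("%Y/%m/%d/%H/%M")

start = datetime(2023, 1, 1, 0, 0)
print(output_file_name(start))      # default offset of 0
print(output_file_name(start, 1))   # OUTPUT_OFFSET = 1 MINUTE
print(output_file_name(start, -1))  # OUTPUT_OFFSET = -1 MINUTE
```

With an offset of -1 MINUTE, the file name matches the start of the window it contains rather than its end.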
Location Options
LOCATION
Type: text
The target location to write files to, as a full S3 URI. The location URI pattern can include macros referring to data columns, which allows custom partitioning of the data in the target location.
Supported macros:
Time: {time:<date-pattern>}
This macro will be replaced with the job execution time at runtime. The date pattern provided must be in Java's date formatting syntax. Only a single time macro can be used in the location.
Column: {col:<column-name>}
This macro will be replaced with the value of the column provided. The column provided must appear in the select statement of the job.
Shard: {shard:<format>}
This macro will be replaced by the number of the output shard writing the current file. It is important to include this macro in your pattern if you are using RUN_PARALLELISM; otherwise, the shards will overwrite each other's files.
The supported format is a subset of Java's string format syntax. The supported options are either:
1. %0xd - Will result in a shard number zero-padded to a width of x digits (up to x-1 leading 0's). For example, %05d will result in 00001 for shard number 1.
2. %d - Will simply use the shard number with no padding.
Usually, it's recommended to include padding to ensure alphabetical sorting of the output files.
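The two format options, and the reason padding is recommended, can be seen in a short sketch. Python's % operator accepts the same %d and %05d patterns, so it is used here as a stand-in for Java's formatter; the helper name shard_suffix is hypothetical.

```python
def shard_suffix(fmt: str, shard: int) -> str:
    """Format a shard number with a printf-style pattern such as %d or %05d."""
    return fmt % shard

padded = [shard_suffix("%05d", n) for n in (1, 2, 10)]
unpadded = [shard_suffix("%d", n) for n in (1, 2, 10)]

print(sorted(padded))    # zero-padded names keep numeric order when sorted as text
print(sorted(unpadded))  # unpadded names sort "10" before "2"
```

Because object listings are ordered as text, the unpadded names place shard 10 ahead of shard 2, while the padded names keep shards in numeric order.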
If the location provided ends with a / and contains no date pattern, a default date pattern is added to the end of the path.
Example location URI:
s3://my-bucket/some/prefix/{time:yyyy-MM-dd-HH-mm}/{col:country}/output.json
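To show how such a URI might resolve for a single row, here is a minimal sketch of macro expansion. It is not Upsolver's implementation: the function expand_location, the sample row, and the small Java-to-strftime token mapping (covering only yyyy, MM, dd, HH and mm) are assumptions made for illustration.

```python
import re
from datetime import datetime

# Minimal mapping from the Java-style pattern letters used on this page to strftime codes.
_JAVA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M"}

def _format_time(pattern: str, ts: datetime) -> str:
    # Translate each supported token; other characters (such as - or /) pass through.
    strf = re.sub(r"yyyy|MM|dd|HH|mm", lambda m: _JAVA_TO_STRFTIME[m.group(0)], pattern)
    return ts.strftime(strf)

def expand_location(uri: str, ts: datetime, row: dict, shard: int = 0) -> str:
    """Replace {time:...}, {col:...} and {shard:...} macros in a location URI."""
    def replace(match):
        kind, arg = match.group(1), match.group(2)
        if kind == "time":
            return _format_time(arg, ts)   # job execution time
        if kind == "col":
            return str(row[arg])           # value of the named column
        return arg % shard                 # kind == "shard": %d or %0xd pattern

    return re.sub(r"\{(time|col|shard):([^}]*)\}", replace, uri)

uri = "s3://my-bucket/some/prefix/{time:yyyy-MM-dd-HH-mm}/{col:country}/output.json"
print(expand_location(uri, datetime(2023, 1, 1, 0, 1), {"country": "US"}))
```

For a row whose country column is US and an execution time of 2023-01-01 00:01, the example URI resolves to a path under 2023-01-01-00-01/US/.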