
Amazon S3

Job options

[ AGGREGATION_PARALLELISM = <integer> ]
[ COMMENT = '<comment>' ]
[ COMPRESSION = { NONE 
                | GZIP 
                | SNAPPY 
                | ZSTD } ]
[ COMPUTE_CLUSTER = <cluster_identifier> ]
[ DATE_PATTERN = '<date_pattern>' ]
[ END_AT = { NOW | timestamp } ]
[ FILE_FORMAT = { CSV 
                | JSON 
                | PARQUET
                | AVRO 
                | TSV } ]
[ OUTPUT_OFFSET = <integer> { MINUTE[S] | HOUR[S] | DAY[S] } ]
[ RUN_INTERVAL = <integer> { MINUTE[S] | HOUR[S] | DAY[S] } ]
[ RUN_PARALLELISM = <integer> ]
[ START_FROM = { NOW | BEGINNING | timestamp } ]
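
For orientation, here is a minimal sketch of a transformation job that combines several of these options. It assumes an existing S3 connection named s3_output_connection and a staging table orders_raw; all names and the bucket path are hypothetical placeholders.

-- Write order events to S3 as gzipped JSON, one file per minute
CREATE SYNC JOB load_orders_to_s3
    COMMENT = 'Write order events to S3 as gzipped JSON'
    START_FROM = BEGINNING
    RUN_INTERVAL = 1 MINUTE
    COMPRESSION = GZIP
    FILE_FORMAT = JSON
AS INSERT INTO s3_output_connection
    LOCATION = 's3://my-bucket/orders/'
SELECT order_id,
       customer_id,
       order_total
FROM default_glue_catalog.upsolver_samples.orders_raw;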

Jump to

Amazon S3 job options: COMPRESSION, DATE_PATTERN, FILE_FORMAT, OUTPUT_OFFSET

General job options: AGGREGATION_PARALLELISM, COMMENT, COMPUTE_CLUSTER, END_AT, RUN_INTERVAL, RUN_PARALLELISM, START_FROM

AGGREGATION_PARALLELISM — editable

Type: integer

Default: 1

(Optional) Only supported when the query contains aggregations. Formerly known as "output sharding."

COMPRESSION

Values: { NONE | GZIP | SNAPPY | ZSTD }

Default: NONE

(Optional) The compression for the output files.

DATE_PATTERN

Type: text

Default: 'yyyy/MM/dd/HH/mm'

(Optional) Upsolver uses the date pattern to partition the output on the S3 bucket. Upsolver supports partitioning up to the minute, for example: 'yyyy/MM/dd/HH/mm'. For more options, see Java SimpleDateFormat.

FILE_FORMAT

Values: { CSV | TSV | AVRO | PARQUET | JSON }

The file format for the output file. The following options can be configured for CSV and TSV formats:

CSV

FILE_FORMAT = (
    TYPE = CSV
    [ DELIMITER = '<delimiter>' ]
  )

DELIMITER

Type: text

Default: ,

(Optional) Configures the delimiter that separates the values in the output file. For binary targets, use DELIMITER = '\u0001'.

TSV

FILE_FORMAT = (
    TYPE = TSV
    [ HEADERLESS = { TRUE | FALSE } ]
  )

HEADERLESS

Type: Boolean

Default: false

(Optional) When true, the output file is written without a header row. When false (the default), the column names are used as the header row.
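
To make the parenthesized form concrete, here is a hedged sketch of a job that writes pipe-delimited CSV files; the job, connection, and table names are placeholders.

-- Export orders as pipe-delimited, Snappy-compressed CSV
CREATE SYNC JOB export_orders_csv
    RUN_INTERVAL = 1 MINUTE
    COMPRESSION = SNAPPY
    FILE_FORMAT = (
        TYPE = CSV
        DELIMITER = '|'
    )
AS INSERT INTO s3_output_connection
    LOCATION = 's3://my-bucket/exports/'
SELECT order_id, order_total
FROM default_glue_catalog.upsolver_samples.orders_raw;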

OUTPUT_OFFSET

Value: <integer> { MINUTE[S] | HOUR[S] | DAY[S] }

Default: 0

(Optional) By default, the file 2023/01/01/00/01 contains the data for 2023-01-01 00:00:00.000 through 2023-01-01 00:00:59.999. Setting OUTPUT_OFFSET to 1 MINUTE shifts the file name forward by one minute, so the data for that first minute is written to file 02 instead. To shift the file name backward, use a negative value.
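
To illustrate how DATE_PATTERN and OUTPUT_OFFSET interact, the sketch below (again with placeholder names) partitions output to the minute and shifts file names forward by one minute:

CREATE SYNC JOB offset_file_names
    DATE_PATTERN = 'yyyy/MM/dd/HH/mm'
    OUTPUT_OFFSET = 1 MINUTE  -- data for minute 00 now lands under .../00/02 instead of the default .../00/01
    FILE_FORMAT = JSON
AS INSERT INTO s3_output_connection
    LOCATION = 's3://my-bucket/events/'
SELECT event_id, event_time
FROM default_glue_catalog.upsolver_samples.events_raw;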

Location Options

LOCATION

Type: text

The target location to write files to, as a full S3 URI. The location URI pattern can include macros that refer to data columns, which allows custom partitioning of the data in the target location.

Supported macros:

Time: {time:<date-pattern>}. This macro will be replaced with the job execution time at runtime. The date pattern provided must follow Java's date formatting syntax. Only a single time macro can be used in the location.

Column: {col:<column-name>}. This macro will be replaced with the value of the column provided. The column provided must appear in the SELECT statement of the job.

Shard: {shard:<format>}. This macro will be replaced by the number of the output shard writing the current file. Include it in your pattern if you are using RUN_PARALLELISM; otherwise, each shard will overwrite the same file. The supported format is a subset of Java's string format syntax, with two options:

1. %0xd results in a shard number padded with x-1 leading zeros. For example, %05d results in 00001 for shard number 1.

2. %d uses the shard number with no padding.

Padding is usually recommended, as it ensures alphabetical sorting of the output files.

If the location provided ends with a / and contains no date pattern, a default date pattern is appended to the path.

Example location URI: s3://my-bucket/some/prefix/{time:yyyy-MM-dd-HH-mm}/{col:country}/output.json
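
Putting the macros together, this sketch (connection, bucket, and column names are placeholders) partitions output by execution time and by a data column, and tags each file with the shard that wrote it:

CREATE SYNC JOB partitioned_s3_output
    RUN_INTERVAL = 1 MINUTE
    RUN_PARALLELISM = 4
    FILE_FORMAT = JSON
AS INSERT INTO s3_output_connection
    LOCATION = 's3://my-bucket/some/prefix/{time:yyyy-MM-dd-HH-mm}/{col:country}/file_{shard:%05d}.json'
SELECT order_id,
       country,  -- must be selected so the {col:country} macro can resolve
       order_total
FROM default_glue_catalog.upsolver_samples.orders_raw;

Because RUN_PARALLELISM is 4, the {shard:%05d} macro gives each of the four writers a distinct, zero-padded file name, so no shard overwrites another's output.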
