CREATE EXTERNAL ICEBERG TABLE
Syntax
Jump to
Table identifier
Table identifiers are provided in the following format:
Note that only metastore connection types are accepted for the catalog name.
Valid table names match the following identifier format:
Table options
Jump to
COMMENT
— editable
COMMENT
— editableType: text
(Optional) A description or comment regarding this table.
OPTIMIZE
— editable
OPTIMIZE
— editableType: Boolean
Default: false
(Optional) When false
, the tables will be not optimized by Upsolver. When true
, files are compacted to reduce storage costs and increase query scan performance.
RETENTION_DATE_PARTITION
RETENTION_DATE_PARTITION
Type: identifier
Default: The only partition column of type date
(Optional) This configures the partition column to be used to determine whether the retention period has passed for a given record.
This option is required if you have more than one date partition column.
TABLE_DATA_RETENTION
— editable
TABLE_DATA_RETENTION
— editableValue: <integer> DAYS
(Optional) When set, data in partitions that have passed the retention period are deleted from the table. Number of days can range between 1 and 9999.
This option is not a deterministic mechanism that deletes data when it immediately surpasses the defined threshold. This mechanism is closer to the lifecycle policies on common blob storage services, such as Amazon S3, and is designed to save storage costs, not to delete data based on a specific time. Therefore when data passes the retention period, it will be deleted at some point in the future, and can no longer be relied on to exist, though Upsolver aims to delete it within a reasonable timeframe.
You should be aware that transformation job that reads from a table with a defined data retention may or may not read data that has surpassed the retention threshold.
For example, if the current time is 2023-02-23 12:30:00 UTC, and you have defined TABLE_DATA_RETENTION = 2 days
, you can expect data written during 2023-02-23, 2023-02-22, and 2023-02-21 to exist in the table. The retention threshold truncates data to the nearest day, so when the time changes to 2023-02-24 00:00:00 UTC, you can no longer expect data from 2023-02-21 to be present in the table, although it might be there for a while.
Note that you need at least one date partition column for this option to work.
Examples
Basic table
The example below creates an external Iceberg table that Upsolver manages for you:
Table with data retention
The following example creates an external Iceberg table with data retention. First, we create a table named orders in our Glue catalog. The PARTITIONED BY
option specifies the partition column as order_date.
Next, we create an external Iceberg column in Upsolver, with the data retention based on the order_date column. The data will be retained for 2 days:
Last updated