CREATE MATERIALIZED VIEW

This command creates a materialized view that contains the defined key column as well as any defined aggregated columns.

All materialized views must be based on an aggregated query.

Syntax

CREATE [SYNC] MATERIALIZED VIEW <mv_identifier> AS

    SELECT <column> [ [AS] <key_column_name> ] 
           [, <AGGREGATED_FUNCTION>(<column_name>) [AS] <aggregated_column_name>] 
           [,...]
        FROM <source> [ [AS] alias ]
        [ LET identifier = expression [, ...] ] 
        [ WHERE [ { $commit_time | $event_time } >= timestamp '...'] ]
            [ AND <where_filters> ]
        [ GROUP BY { <field_name> | <ordinal_position> } [, ...] ]
        [ WINDOW <integer> <time_unit> ]

[ STORAGE_CONNECTION = <connection_identifier>
  STORAGE_LOCATION = '<storage_location>' ] 
[ COMPUTE_CLUSTER = <cluster_identifier> ]
[ QUERY_CLUSTER = <query_cluster_identifier> ]
[ MAX_TIME_TRAVEL_DURATION = integer days ]

Jump to

Materialized view options

STORAGE_CONNECTION

Type: identifier

Default: The storage connection of the first table in the FROM statement

(Optional) The storage connection associated with the STORAGE_LOCATION for the table's underlying files.

Only a storage type connection can be used here (e.g. S3, Blob storage, GCS, Oracle object storage), and it should match the catalog's metastore. For example, if Glue is used as the metastore, only S3 is allowed as a storage connection.

When set, STORAGE_LOCATION must be configured as well to provide a path to store the data.

STORAGE_LOCATION

Type: text

Default: The storage location of the first table in the FROM statement

(Optional) The storage location for the materialized view's underlying files. It should be provided in the format s3://bucket_name/path_to_data. This option is required when STORAGE_CONNECTION is set.

When set, STORAGE_CONNECTION must be configured as well to provide a connection with access to write to the specified storage location.

COMPUTE_CLUSTER — editable

Type: identifier

Default: The compute cluster of the first source table within the SELECT statement

(Optional) The compute cluster that processes the materialized view.

QUERY_CLUSTER — editable

Type: identifier

(Optional) The query cluster that processes the materialized view.

MAX_TIME_TRAVEL_DURATION — editable

Type: integer

Default: infinite

(Optional) The number of days that the state information maintained by the materialized view should be retained. By default, the state is maintained indefinitely, enabling you to travel to any point in time from the creation of the materialized view.

WINDOW

Type: integer { MINUTE[S] | HOUR[S] | DAY[S] }

(Optional) The time span over which the aggregation should be applied, e.g. 1 HOUR.

Examples

CREATE SYNC MATERIALIZED VIEW default_glue_catalog.upsolver_samples.physical_store_orders_materialized_view AS 
    SELECT orderid, 
       LAST(saleinfo.source) as source,
       LAST(saleinfo.store.location.country) as country,
       LAST(saleinfo.store.location.name) as name,
       LAST(saleinfo.store.servicedby.employeeid) as employeeid,
       LAST(saleinfo.store.servicedby.firstname) as firstname,
       LAST(saleinfo.store.servicedby.lastname) as lastname
    FROM default_glue_catalog.upsolver_samples.sales_info_raw_data
    GROUP BY orderid
    WINDOW = 1 HOUR;

To learn how to use your materialized view, see: Join with MATERIALIZED VIEW

Last updated