Join with MATERIALIZED VIEW

Materialized views can be joined with other SQLake tables to perform a lookup operation that returns the values corresponding to the given key column values.

Syntax

SELECT ...
    FROM <table> [ [AS] <table_alias> ]
    LEFT [OUTER] JOIN [ARRAY] <mv_name> [ [AS] <alias> ]
    ON { <alias> | <mv_name> }.<column> = { <table> | <table_alias> }.<column> [, ...]
    [AS OF <time_expression>]

Jump to

ARRAY keyword

This keyword should be used when the materialized view has a key column that is an array.

Note that this uses the array itself as the key. To use the items within the arrays as the key column, you should first flatten your data by writing to a new table using UNNEST with an INSERT job.

ON statement

The ON statement defines the columns on which the materialized view is joined with the main table.

All key columns of your materialized view should be included in this section.

AS OF

AS OF is relevant when the business case requires time synchronization between the data in the main table and the materialized view. It allows users to define the point in time at which the materialized view is joined with the main table. It lets you create a delay in pipeline processing or join with a state at a specific point in time of the materialized view.

Value: AS OF time_expression

AS OF is optional. In case it isn't included, the main table will always join the latest available data in the materialized view. It is recommended to use AS OF with a SYNC JOB.

When to use AS OF:

You want to link a click with an ad impression. Business wise the pipeline needs to process the click after the impression is added to the materialized view. Although both streams are expected to arrive in the right order in most cases, you can ensure it by delaying the click processing by a few minutes. Use AS OF $event_time + INTERVAL '5' MINUTE. When selecting an interval value, keep in mind that it will cause a delay in your pipeline data freshness.

You want to join the main table with the data that was available at the end of 2022 in the materialized view. Use AS OF timestamp '2023-01-01 00:00:00'

Join examples

CREATE SYNC JOB join_two_tables_orders_with_last_employee
    START_FROM = BEGINNING
    ADD_MISSING_COLUMNS = TRUE
    RUN_INTERVAL = 1 MINUTE
    AS INSERT INTO default_glue_catalog.upsolver_samples.orders_transformed_data MAP_COLUMNS_BY_NAME
    SELECT 
       orderid, 
       mv.employeeid as employeeid, 
       mv.firstname as firstname, 
       mv.lastname as lastname
    FROM default_glue_catalog.upsolver_samples.orders_raw_data
    LEFT JOIN default_glue_catalog.upsolver_samples.physical_store_orders_materialized_view AS mv
      AS OF $event_time + INTERVAL '5' MINUTE
      ON mv.orderid = orderid
    WHERE mv.source = 'Store'
    AND time_filter();

Last updated