Comment on page
Join with MATERIALIZED VIEW
Materialized views can be joined with other SQLake tables to perform a lookup operation that returns the values corresponding to the given key column values.
SELECT ...
FROM <table> [ [AS] <table_alias> ]
LEFT [OUTER] JOIN [ARRAY] <mv_name> [ [AS] <alias> ]
ON { <alias> | <mv_name> }.<column> = { <table> | <table_alias> }.<column> [, ...]
[AS OF <time_expression>]
This keyword should be used when the materialized view has a key column that is an array.
The
ON
statement defines the columns on which the materialized view is joined with the main table.All key columns of your materialized view should be included in this section.
AS OF is relevant when the business case requires time synchronization between the data in the main table and the materialized view. It allows users to define the point in time at which the materialized view is joined with the main table. It lets you create a delay in pipeline processing or join with a state at a specific point in time of the materialized view.
Value:
AS OF time_expression
AS OF is optional. In case it isn't included, the main table will always join the latest available data in the materialized view. It is recommended to use AS OF with a SYNC JOB.
When to use AS OF:
You want to link a click with an ad impression. Business wise the pipeline needs to process the click after the impression is added to the materialized view. Although both streams are expected to arrive in the right order in most cases, you can ensure it by delaying the click processing by a few minutes. Use
AS OF $event_time + INTERVAL '5' MINUTE
. When selecting an interval value, keep in mind that it will cause a delay in your pipeline data freshness.You want to join the main table with the data that was available at the end of 2022 in the materialized view. Use
AS OF timestamp '2023-01-01 00:00:00'
CREATE SYNC JOB join_two_tables_orders_with_last_employee
START_FROM = BEGINNING
ADD_MISSING_COLUMNS = TRUE
RUN_INTERVAL = 1 MINUTE
AS INSERT INTO default_glue_catalog.upsolver_samples.orders_transformed_data MAP_COLUMNS_BY_NAME
SELECT
orderid,
mv.employeeid as employeeid,
mv.firstname as firstname,
mv.lastname as lastname
FROM default_glue_catalog.upsolver_samples.orders_raw_data
LEFT JOIN default_glue_catalog.upsolver_samples.physical_store_orders_materialized_view AS mv
AS OF $event_time + INTERVAL '5' MINUTE
ON mv.orderid = orderid
WHERE mv.source = 'Store'
AND time_filter();
Last modified 2mo ago