SELECT
The
LET
clause allows you to create a calculated field that can be used within your query. You can create multiple fields by separating each expression with a comma (e.g. LET a = b + c, d = e - f
).The first
LET
clause can only be used for calculations on raw columns while the second LET
clause can include calculations based on aggregated columns.These clauses can be particularly useful for certain use cases when working with arrays. For examples, see: Working with arrays.
Note that since these fields do not exist within the source table, the column types must be provided if you select them as columns within your query.
WHERE
clause filters can be added to filter out records based on the raw columns.Note that including
$commit_time between run_start_time() [ - INTERVAL '<integer>' <time_unit> ] AND run_end_time()
in the WHERE
clause is mandatory. If omitted, the query fails to run.The
run_start_time()
and run_end_time()
here are based on the RUN_INTERVAL
of your job set in the job options. By default the run interval is 1 minute, meaning that the job executes every minute. This is reflected in the
run_start_time()
and run_end_time()
values shifting by a minute for every execution of the job.Note that subtracting a time interval is only supported for aggregated outputs; you can, however, still adjust the
run_start_time()
and run_end_time()
by configuring the RUN_INTERVAL
.For example, if you would like the job to process data from the last hour each time that it executes, you can set
RUN_INTERVAL = 1 HOUR
. Just note that this also means that the job will only run once every hour.When aggregating data, you may want to include data that does not lie between the run time window. In such a case, you should subtract a time interval from
run_start_time()
to adjust the time window to include all the desired data within your aggregation.Subtracting a time interval does not change how frequently the job runs. Thus, if you create a 30 minute window by filtering on
$commit_time between run_start_time() - INTERVAL '29' MINUTE AND run_end_time()
, the job still executes every minute by default unless the RUN_INTERVAL
is adjusted.If you actually just want the last 30 minutes of data aggregated every 30 minutes, then instead of subtracting a time interval, you can simply just set
RUN_INTERVAL = 30 MINUTES
.Note that only
INTERVAL DAY TO SECOND
is supported when subtracting a time interval from the run time window.The
GROUP BY
clause specifies the column(s) used to group records together. It is typically used in conjunction with some aggregate function(s) within the selected columns.Note that all non-aggregate columns selected must be present in the
GROUP BY
clause.Additionally, all partition columns should appear within the
GROUP BY
clause as mapping an aggregated column to a partition column is not allowed.The
HAVING
clause can be used to filter out records based on aggregate columns.The
LIMIT
clause can be used to limit the amount of records written to the target location.This can be helpful when previewing the output of your job.
Last modified 7mo ago