Links

SELECT

Syntax

Jump to

LET

The LET clause allows you to create a calculated field that can be used within your query. You can create multiple fields by separating each expression with a comma (e.g. LET a = b + c, d = e - f).
The first LET clause can only be used for calculations on raw columns while the second LET clause can include calculations based on aggregated columns.
These clauses can be particularly useful for certain use cases when working with arrays. For examples, see: Working with arrays.
Note that since these fields do not exist within the source table, the column types must be provided if you select them as columns within your query.

WHERE

WHERE clause filters can be added to filter out records based on the raw columns.
Note that including $commit_time between run_start_time() [ - INTERVAL '<integer>' <time_unit> ] AND run_end_time() in the WHERE clause is mandatory. If omitted, the query fails to run.
The run_start_time() and run_end_time() here are based on the RUN_INTERVAL of your job set in the job options.
By default the run interval is 1 minute, meaning that the job executes every minute. This is reflected in the run_start_time() and run_end_time() values shifting by a minute for every execution of the job.
Note that subtracting a time interval is only supported for aggregated outputs; you can, however, still adjust the run_start_time() and run_end_time() by configuring the RUN_INTERVAL.
For example, if you would like the job to process data from the last hour each time that it executes, you can set RUN_INTERVAL = 1 HOUR. Just note that this also means that the job will only run once every hour.

Aggregated outputs

When aggregating data, you may want to include data that does not lie between the run time window. In such a case, you should subtract a time interval from run_start_time() to adjust the time window to include all the desired data within your aggregation.
Subtracting a time interval does not change how frequently the job runs. Thus, if you create a 30 minute window by filtering on $commit_time between run_start_time() - INTERVAL '29' MINUTE AND run_end_time(), the job still executes every minute by default unless the RUN_INTERVAL is adjusted.
If you actually just want the last 30 minutes of data aggregated every 30 minutes, then instead of subtracting a time interval, you can simply just set RUN_INTERVAL = 30 MINUTES.
Note that only INTERVAL DAY TO SECOND is supported when subtracting a time interval from the run time window.

GROUP BY

The GROUP BY clause specifies the column(s) used to group records together. It is typically used in conjunction with some aggregate function(s) within the selected columns.
Note that all non-aggregate columns selected must be present in the GROUP BY clause.
Additionally, all partition columns should appear within the GROUP BY clause as mapping an aggregated column to a partition column is not allowed.

HAVING

The HAVING clause can be used to filter out records based on aggregate columns.

LIMIT

The LIMIT clause can be used to limit the amount of records written to the target location.
This can be helpful when previewing the output of your job.
Last modified 7mo ago