SELECT
Syntax
WHERE TIME_FILTER()
- Default behavior depending on the context.WHERE TIME_FILTER($event_time, interval '5' minute)
- Specifies a 5-minute window based on$event_time
.WHERE TIME_FILTER($event_time, interval '1' minute, timestamp '2023-04-04 12:20:00')
- Queries rows ingested between 12:19:00 and 12:20:00 based on$event_time
.
In most situations, the default values should suffice, resulting in job definitions with a simple WHERE
condition like this:
Current Limitations
The
TIME_FILTER
function cannot be used outside the WHERE clause.It must be used as a standalone expression and cannot be used as input in other functions or operators.
Note: Prior to the introduction of the time_filter()
function, a job's WHERE
clause was required to use the following clause:
$commit_time between run_start_time() [ - INTERVAL '<integer>' <time_unit> ] AND run_end_time()
The time_filter()
function should be used in new jobs instead. However, you might see the old syntax in templates or older jobs in your account.
Aggregated outputs
When aggregating data using Upsolver, the windowSize
parameter in the TIME_FILTER
function is significant, determining the range of source data that will be included in each aggregation.
By default aggregated outputs use the RUN_INTERVAL
as the windowSize
. This means that each execution of the job will aggregate just the data since the last execution. You may want to aggregate larger ranges of data but output them more frequently.
Example
Let's say we want a job to aggregate data from the last 1 hour, but we want it to output that aggregation result every 5 minutes. To achieve this we can set a RUN_INTERVAL
of 5 minutes in our job options and a windowSize
of 60 minutes in our time_filter()
:
GROUP BY
GROUP BY
The GROUP BY
clause specifies the column(s) used to group records together. It is typically used in conjunction with some aggregate function(s) within the selected columns.
Note that all non-aggregate columns selected must be present in the GROUP BY
clause.
Additionally, all partition columns should appear within the GROUP BY
clause as mapping an aggregated column to a partition column is not allowed.
HAVING
HAVING
The HAVING
clause can be used to filter out records based on aggregate columns.
LIMIT
LIMIT
The LIMIT
clause can be used to limit the amount of records written to the target location.
This can be helpful when previewing the output of your job.