APPROX_COUNT_DISTINCT

Approximates the number of distinct non-null input values.

This function is used as an approximation of COUNT(DISTINCT ...) in order to improve performance. It should be used when there are relatively few (under one million) rows, and where the total number of distinct values is high.

Syntax

APPROX_COUNT_DISTINCT(X)

Arguments

`X`

Type: any type

The values to be counted.

Returns

Type: bigint

Returns the approximate number of distinct non-null input values.

If all input values are null, zero is returned.

Examples

Transformation job example

SQL

CREATE JOB function_operator_example
    ADD_MISSING_COLUMNS = TRUE
AS INSERT INTO default_glue_catalog.upsolver_samples.orders_transformed_data
    MAP_COLUMNS_BY_NAME
    SELECT 
        LOWER(ordertype) AS ordertype,
        ROUND(AVG(nettotal), 2) AS avgtotal
    FROM default_glue_catalog.upsolver_samples.orders_raw_data 
    WHERE $commit_time BETWEEN run_start_time() - PARSE_DURATION('1d') 
        AND run_end_time()
    GROUP BY 1
    LIMIT 2;

Query result

ordertype

avgtotal

pickup

1062.14

shipping

1061.91

Last updated 1 year ago