Upsolver
Search…
APPROX_COUNT_DISTINCT_EACH
The approximate count of distinct values per group in the time window.

Syntax

APPROX_COUNT_DISTINCT_EACH([MAX VALUES,] GROUP, VALUE) ‌

Arguments

MAX VALUES: The maximum number of groups. Groups beyond this amount will be discarded. In some cases, discarding groups via MAX VALUES may cause some data to be discarded from groups that would otherwise have been kept. MAX VALUES is optional. If it is omitted there is, in effect, no limit. GROUP: The field in which to group the results. VALUE: The field to count distinct values relative to each group.

Returns

A number

Notes

NULLs are not counted.

‌ Example

Data

1
[
2
{
3
"GROUP_ID":"G1",
4
"USER_ID":"U1",
5
"CONNECTION_TIME":"2020-06-26 02:31:29,573"
6
},
7
{
8
"GROUP_ID":"G1",
9
"USER_ID":"U2",
10
"CONNECTION_TIME":"2020-06-26 18:11:45,783"
11
},
12
{
13
"GROUP_ID":"G2",
14
"USER_ID":"Z1",
15
"CONNECTION_TIME":"2020-06-26 23:54:27,687"
16
},
17
{
18
"GROUP_ID":"G2",
19
"CONNECTION_TIME":"2020-07-26 23:54:27,687"
20
},
21
{
22
"GROUP_ID":"G1",
23
"USER_ID":"U2",
24
"CONNECTION_TIME":"2021-07-01 02:31:29,573"
25
},
26
{
27
"GROUP_ID":"G1",
28
"USER_ID":"U1",
29
"CONNECTION_TIME":"2021-07-01 18:11:45,783"
30
},
31
{
32
"GROUP_ID":"G2",
33
"USER_ID":"Z1",
34
"CONNECTION_TIME":"2021-07-01 23:54:27,687"
35
}
36
]
Copied!

Query

Count the number of distinct USER_IDs for each unique GROUP_ID:
1
SET
2
DATE_TIME = TO_DATE(data.CONNECTION_TIME);
3
SELECT
4
APPROX_COUNT_DISTINCT_EACH(data.GROUP_ID, data.USER_ID)
5
AS approx_count_distinct_each_data_group_id__data_user_id
6
FROM
7
"SAMPLE_DATA_G1U1 - json"
Copied!

Results:

1
{
2
"approx_count_distinct_each_data_group_id__data_user_id":[
3
{
4
"key":"G1",
5
"value":2
6
},
7
{
8
"key":"G2",
9
"value":1
10
}
11
]
12
}
Copied!

Dialog