Aggregation Functions

This page covers the aggregation functions available in Upsolver.

Aggregations are functions for grouping multiple events together to form a more significant result.

Unlike databases, Upsolver runs continuous queries rather than ad-hoc queries. With Upsolver aggregation results are updated incrementally with every incoming event.

Aggregation functions require windowing to split a stream into buckets of data that can be aggregated.

APPROX_COUNT_DISTINCT

The approximate number of distinct values in the time window.

Use this function instead of COUNT_DISTINCT to improve performance, but only when there are not many (under 1M) rows in the result.

APPROX_COUNT_DISTINCT_EACH

The approximate count of distinct values per group in the time window.

AVG

The average value in the time window.

AVG_EACH

The average value in the time window grouped by the given key.

AVG_TIME_SERIES

The average of the values per time interval.

COLLECT_SET

A set of all values encountered in the time interval.

COLLECT_SET_EACH

A set of all values encountered in the time interval grouped by the given key.

COUNT

The number of values in the time window.

Example:

For the following stream of events:

{"id": "1", "data": 2}
{"id": "1", "data": 3}
{"id": "2", "data": 5}
{"id": "3", "data": 8}

Using this aggregation with primary key id and COUNT(data) produces the following data:

Primary Key id

COUNT(data)

1

2

2

1

3

1

COUNT(*)

The number of items in the time window.

COUNT_DISTINCT

Counts the number of distinct values that appeared in the column per key value.

Example:

For the following stream of events:

{"id": "1", "data": "a"}
{"id": "1", "data": "b"}
{"id": "2", "data": "c"}
{"id": "2", "data": "c"}
{"id": "3", "data": "c"}

Using this aggregation with primary key id and COUNT_DISTINCT(data) produces the following data:

Primary Key id

COUNT_DISTINCT(data)

1

2

2

1

3

1

COUNT_EACH

The number of items grouped by the given key.

COUNT_IF

The number of true values in the time window.

Syntax

COUNT_IF (expr)

Arguments

  • expr: a BOOLEAN expression that can be either a calculated

    field or a column from the data streams

Returns

INT

Example

Contents of Data Stream:

{"type": "event","id": "1", data" : "sample data", "extendeddata": "apple"}
{"type": "event","id": "2", data" : "sample data", "extendeddata": "watermelon"}
{"type": "event","id": "3", data" : "sample data", "extendeddata": "cucumber"}
{"type": "event","id": "4", data" : "sample data", "extendeddata": "Strawberry"}

Query:

SELECT type,
COUNT_IF(data = 'sample data') AS data,
COUNT_IF(data = 'sample data' AND extendeddata='apple') AS apples
FROM stream
GROUP BY type

Results:

type

data

apples

event

4

1

DECAYED_SUM

Performs a sum on the value and decays that sum based on the decay factor and how old the original data is.

DELETE

Delete the record when this is set.

DYNAMIC_SESSIONS

Gets the current sessions, where a session is defined as events separated by no more than windowSize time. If two window sizes are passed, the larger one is used.

FIRST

The first value in the time window.

Example:

For the following stream of events:

{"id": "1", "data": 3}
{"id": "1", "data": 2}
{"id": "2", "data": 5}
{"id": "3", "data": 8}

Using this aggregation with primary key id and SUM(data) produces the following data:

Primary Key id

FIRST(data)

1

3

2

5

3

8

FIRST_ARRAY

The first array of values in the time window.

FIRST_EACH

The first value per group.

FIRST_TIME_SERIES

The first value per time interval.

LAST

The last value in the time window.

Example:

For the following stream of events:

{"id": "1", "data": 3}
{"id": "1", "data": 2}
{"id": "2", "data": 5}
{"id": "3", "data": 8}

Using this aggregation with primary key id and LAST(data) produces the following data:

Primary Key id

LAST(data)

1

2

2

5

3

8

LAST_ARRAY

The last array of values in the time window.

LAST_EACH

The last value per group.

LAST_K

The last k values.

LAST_K_EACH

The last k values per group.

LAST_TIME_SERIES

The last value per interval.

MAX

The maximum value in the time window.

Example:

For the following stream of events:

{"id": "1", "data": 2}
{"id": "1", "data": 3}
{"id": "2", "data": 5}
{"id": "3", "data": 8}

Using this aggregation with primary key id and MAX(data) produces the following data:

Primary Key id

MAX(data)

1

3

2

5

3

8

MAX_BY

The value correlating to the maximum sort in the time window.

MAX_EACH

The maximum value per group.

MAX_TIME_SERIES

The maximum value per time interval.

MIN

The minimum value in the time window.

Example:

For the following stream of events:

{"id": "1", "data": 2}
{"id": "1", "data": 3}
{"id": "2", "data": 5}
{"id": "3", "data": 8}

Using this aggregation with primary key id and MIN(data) produces the following data:

Primary Key id

MIN(data)

1

2

2

5

MIN_BY

The value correlating to the minimum sort in the time window.

MIN_EACH

The minimum value per group.

MIN_TIME_SERIES

The minimum value per time interval.

SESSION_COUNT

Stores the number of sessions.

A session includes all the events in the aggregation that are at most windowSize apart in their value.

Example:

For the following stream of events:

{"id": "John", "time": 1}
{"id": "John", "time": 4}
{"id": "John", "time": 2}
{"id": "John", "time": 7}

Using this aggregation with primary key id and SESSION_COUNT(time, 2) will produce the following data:

Primary Key id

SESSION _COUNT(time, 2)

"John"

2

With SESSION_COUNT(time, 2), a session is defined as a distance between two events of up to or including 2. Thus, 1, 2, and 4 are one session and 7 is the second session.

STD_DEV

The standard deviation of values in the time window.

STD_DEV_EACH

The standard deviation of the value per group.

STRING_MAX

The maximum string value in the time window sorted case-sensitive lexicographically.

STRING_MAX_EACH

Stores the maximum string value per group, sorted case-sensitive lexicographically.

STRING_MIN

The minimum string value in the time window sorted case-sensitive lexicographically.

STRING_MIN_EACH

Stores the minimum string value per group, sorted case-sensitive lexicographically.

SUM

The sum of the values in the time window.

Example:

For the following stream of events:

{"id": "1", "data": 2}
{"id": "1", "data": 3}
{"id": "2", "data": 5}
{"id": "3", "data": 8}

Using this aggregation with primary key id and SUM(data) produces the following data:

Primary Key id

SUM(data)

1

5

2

5

3

8

SUM_EACH

The sum of the values per group.

SUM_TIME_SERIES

The sum of the values per time interval.

WEIGHTED_AVERAGE

The weighted average of a value in the time window.