dataset_column_stats

Overview

The system.insights.dataset_column_stats table provides per-column statistics for the datasets in your catalog. Use it for data profiling, query performance optimization, and monitoring schema changes. The statistics are computed by Upsolver's real-time stream processing engine.

Columns

The following table describes the columns contained in the system.insights.dataset_column_stats system table:

| Column Name | Data Type | Description |
| --- | --- | --- |
| catalog | STRING | The name of the catalog where the dataset resides. |
| dataset | STRING | The name of the dataset. |
| column_name | STRING | The name of the column. |
| column_type | STRING | The data type of the column (e.g., STRING, BIGINT, etc.). |
| density | INT | The density of data in the column. |
| density_in_parent | INT | Density of this column relative to its parent, if applicable. |
| total_count | INT | Total number of records in the column. |
| min_distinct_values | INT | Minimum number of distinct values found in the column. |
| max_distinct_values | INT | Maximum number of distinct values found in the column. |
| values_appear_unique | BOOLEAN | A Boolean flag that indicates if values appear to be unique based on the current statistics. Note: The value may not be unique due to infrequent repetitions. |
| top_values | ARRAY | An array of the most frequent values along with their counts. |
| min_value | VARIANT | Smallest value in the column. |
| max_value | VARIANT | Largest value in the column. |
| length_distribution | ARRAY | An array representing the distribution of the length of values in the column. |
| value_distribution | ARRAY | An array representing the distribution of values in the column. |
| first_seen | TIMESTAMP | Timestamp when the data in the column was first seen. |
| last_seen | TIMESTAMP | Timestamp when the data in the column was last seen. |
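
As a rough illustration of how these columns might be used for profiling, the sketch below looks for columns whose values appear unique, which could make them candidate keys. The system table can only be queried from within Upsolver, so the sketch mocks a few of the documented columns in an in-memory SQLite table. The table name dataset_column_stats (without the system.insights prefix), the catalog name default, and all sample rows are assumptions made for this example. The SELECT statement is the part you would adapt, and the exact syntax accepted by Upsolver may differ from SQLite's.

```python
import sqlite3

# Stand-in for system.insights.dataset_column_stats: an in-memory SQLite table
# with a subset of the documented columns. All rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dataset_column_stats (
        catalog TEXT,
        dataset TEXT,
        column_name TEXT,
        column_type TEXT,
        total_count INTEGER,
        values_appear_unique INTEGER,  -- SQLite stores booleans as 0/1
        last_seen TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO dataset_column_stats VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("default", "orders", "order_id", "STRING", 1000, 1, "2024-01-02 00:00:00"),
        ("default", "orders", "status", "STRING", 1000, 0, "2024-01-02 00:00:00"),
        ("default", "users", "email", "STRING", 250, 1, "2024-01-01 00:00:00"),
    ],
)

# Columns flagged as appearing unique, largest first. The "?" placeholder is
# SQLite parameter binding; in a SQL worksheet you would inline the catalog name.
QUERY = """
    SELECT dataset, column_name, total_count
    FROM dataset_column_stats
    WHERE catalog = ? AND values_appear_unique
    ORDER BY total_count DESC
"""
candidate_keys = conn.execute(QUERY, ("default",)).fetchall()

for dataset, column_name, total_count in candidate_keys:
    print(f"{dataset}.{column_name}: appears unique across {total_count} records")
```

Because values_appear_unique is based on the current statistics, treat the result as a shortlist to verify, not as a guarantee of uniqueness.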
