dataset_column_stats
Overview
The system.insights.dataset_column_stats table provides per-column statistics for the datasets in your catalog. Use it for data profiling, query performance optimization, and monitoring schema changes. The statistics are produced by Upsolver's real-time stream processing engine.
Columns
The following table describes the columns in the system.insights.dataset_column_stats system table:
| Column | Type | Description |
| --- | --- | --- |
| catalog | STRING | The name of the catalog where the dataset resides. |
| dataset | STRING | The name of the dataset. |
| column_name | STRING | The name of the column. |
| column_type | STRING | The data type of the column (e.g., STRING, BIGINT). |
| density | INT | The density of data in the column. |
| density_in_parent | INT | The density of this column relative to its parent, if applicable. |
| total_count | INT | The total number of records in the column. |
| min_distinct_values | INT | The minimum number of distinct values found in the column. |
| max_distinct_values | INT | The maximum number of distinct values found in the column. |
| values_appear_unique | BOOLEAN | Indicates whether the values appear to be unique based on the current statistics. Note: the values may not actually be unique, because infrequent repetitions can go undetected. |
| top_values | ARRAY | An array of the most frequent values along with their counts. |
| min_value | VARIANT | The smallest value in the column. |
| max_value | VARIANT | The largest value in the column. |
| length_distribution | ARRAY | An array representing the distribution of the lengths of values in the column. |
| value_distribution | ARRAY | An array representing the distribution of values in the column. |
| first_seen | TIMESTAMP | The timestamp when data in the column was first seen. |
| last_seen | TIMESTAMP | The timestamp when data in the column was last seen. |
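As an illustration of the profiling and schema-monitoring uses mentioned in the overview, the following is a minimal Python sketch. It assumes that rows of this system table have already been fetched into Python objects; how they are fetched is not shown and depends on your client. The ColumnStats class, the helper functions candidate_keys and stale_columns, and the sample rows are all hypothetical and are not part of Upsolver. Only the field names come from the column list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ColumnStats:
    """Hypothetical holder for a subset of the documented columns."""
    catalog: str
    dataset: str
    column_name: str
    column_type: str
    values_appear_unique: bool
    first_seen: datetime
    last_seen: datetime


def candidate_keys(rows: List[ColumnStats]) -> List[str]:
    """Return columns whose values appear unique.

    This is a hint only: values_appear_unique can be true even when
    rare duplicates exist.
    """
    return [r.column_name for r in rows if r.values_appear_unique]


def stale_columns(rows: List[ColumnStats], now: datetime,
                  max_age: timedelta) -> List[str]:
    """Return columns whose data was last seen more than max_age before now."""
    return [r.column_name for r in rows if now - r.last_seen > max_age]


# Sample rows invented for illustration; they are not real statistics.
now = datetime(2024, 1, 10)
rows = [
    ColumnStats("my_catalog", "orders", "order_id", "STRING", True,
                datetime(2024, 1, 1), datetime(2024, 1, 10)),
    ColumnStats("my_catalog", "orders", "status", "STRING", False,
                datetime(2024, 1, 1), datetime(2024, 1, 10)),
    ColumnStats("my_catalog", "orders", "legacy_code", "BIGINT", False,
                datetime(2024, 1, 1), datetime(2024, 1, 2)),
]

print(candidate_keys(rows))                          # ['order_id']
print(stale_columns(rows, now, timedelta(days=3)))   # ['legacy_code']
```

A column that stops appearing in incoming data keeps its old last_seen value, so comparing last_seen against a threshold is one simple way to notice a field that may have been dropped upstream. The three-day threshold here is arbitrary.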