dataset_column_stats
The `system.insights.dataset_column_stats` table offers comprehensive statistics about columns within datasets in your catalog. This table serves as a tool for data profiling, query performance optimization, and monitoring schema changes, all powered by Upsolver's real-time stream processing engine.

The following table describes the columns contained in the `system.insights.dataset_column_stats` system table:

Column Name | Data Type | Description |
---|---|---|
catalog | STRING | The name of the catalog where the dataset resides. |
dataset | STRING | The name of the dataset. |
column_name | STRING | The name of the column. |
column_type | STRING | The data type of the column (e.g., STRING, BIGINT, etc.). |
density | INT | The density of data in the column. |
density_in_parent | INT | Density of this column relative to its parent, if applicable. |
total_count | INT | Total number of records in the column. |
min_distinct_values | INT | Minimum number of distinct values found in the column. |
max_distinct_values | INT | Maximum number of distinct values found in the column. |
values_appear_unique | BOOLEAN | A Boolean flag that indicates whether values appear to be unique based on the current statistics. Note: The values may still not be unique, because infrequent repetitions can occur. |
top_values | ARRAY | An array of the most frequent values along with their counts. |
min_value | VARIANT | Smallest value in the column. |
max_value | VARIANT | Largest value in the column. |
length_distribution | ARRAY | An array representing the distribution of the length of values in the column. |
value_distribution | ARRAY | An array representing the distribution of values in the column. |
first_seen | TIMESTAMP | Timestamp when the data in the column was first seen. |
last_seen | TIMESTAMP | Timestamp when the data in the column was last seen. |
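
As an illustration of the profiling and schema-monitoring uses mentioned above, the following is a minimal Python sketch that post-processes rows already fetched from this table. It assumes each row is available as a dictionary keyed by the column names listed above. How the rows are fetched is left out, and the helper names (`candidate_key_columns`, `stale_columns`), the sample rows, and the seven-day threshold are all hypothetical, not part of Upsolver.

```python
from datetime import datetime, timedelta, timezone


def candidate_key_columns(rows):
    """Return (dataset, column_name) pairs whose values appear unique.

    Hypothetical helper: relies only on the values_appear_unique flag,
    which may be true even when rare duplicates exist.
    """
    return [
        (row["dataset"], row["column_name"])
        for row in rows
        if row["values_appear_unique"]
    ]


def stale_columns(rows, now, max_age):
    """Return (dataset, column_name) pairs whose last_seen is older than max_age.

    Hypothetical helper: a column that stops receiving data may point to
    an upstream schema change.
    """
    return [
        (row["dataset"], row["column_name"])
        for row in rows
        if now - row["last_seen"] > max_age
    ]


# Hypothetical sample rows, using a subset of the columns described above.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
rows = [
    {"dataset": "orders", "column_name": "order_id",
     "values_appear_unique": True,
     "last_seen": datetime(2024, 1, 31, tzinfo=timezone.utc)},
    {"dataset": "orders", "column_name": "status",
     "values_appear_unique": False,
     "last_seen": datetime(2024, 1, 30, tzinfo=timezone.utc)},
    {"dataset": "orders", "column_name": "legacy_code",
     "values_appear_unique": False,
     "last_seen": datetime(2023, 12, 1, tzinfo=timezone.utc)},
]

print(candidate_key_columns(rows))                   # columns worth checking as keys
print(stale_columns(rows, now, timedelta(days=7)))   # columns with no recent data
```

Because `values_appear_unique` is based on the current statistics, any column flagged here is only a candidate and should be verified before it is relied on as a key.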