dataset_column_stats
Overview
The system.insights.dataset_column_stats table provides statistics about the columns of the datasets in your catalog. Use it for data profiling, query performance optimization, and monitoring schema changes. The statistics are powered by Upsolver's real-time stream processing engine.
Columns
The following table describes the columns contained in the system.insights.dataset_column_stats system table:
| Column Name | Data Type | Description |
|---|---|---|
|  | STRING | The name of the catalog where the dataset resides. |
|  | STRING | The name of the dataset. |
|  | STRING | The name of the column. |
|  | STRING | The data type of the column (e.g., STRING, BIGINT). |
|  | INT | The density of data in the column. |
|  | INT | The density of this column relative to its parent, if applicable. |
|  | INT | The total number of records in the column. |
|  | INT | The minimum number of distinct values found in the column. |
|  | INT | The maximum number of distinct values found in the column. |
|  | BOOLEAN | Indicates whether the column's values appear to be unique based on the current statistics. Note: the values may not actually be unique, because infrequent repetitions can occur. |
|  | ARRAY | An array of the most frequent values along with their counts. |
|  | VARIANT | The smallest value in the column. |
|  | VARIANT | The largest value in the column. |
|  | ARRAY | An array representing the distribution of the lengths of values in the column. |
|  | ARRAY | An array representing the distribution of values in the column. |
|  | TIMESTAMP | The timestamp when data in the column was first seen. |
|  | TIMESTAMP | The timestamp when data in the column was last seen. |
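As a minimal sketch of how this table might be inspected, the query below selects a handful of rows. It assumes the system table can be read with an ordinary SELECT statement and that LIMIT is available in your environment. It deliberately uses SELECT * rather than naming individual columns, because the column names are not listed in the table above.

```sql
-- Sketch only: preview a few rows of the column statistics table.
-- Assumes system tables accept a plain SELECT and that LIMIT is supported.
SELECT *
FROM system.insights.dataset_column_stats
LIMIT 10;
```

Once the actual column names are known, a WHERE clause on the catalog and dataset name columns would narrow the result to a single dataset.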