Insights

Overview

The system.insights schema provides a collection of system tables that offer deep insights into various aspects of your Upsolver environment. These tables are designed to assist data engineers, data scientists, and administrators in tasks ranging from query optimization, job monitoring, data profiling, to compliance checks. The tables under this schema are generated and updated in real-time by Upsolver's stream processing engine.

Key Tables

Here are two of the critical tables within the system.insights schema:

`job_output_column_stats`

Purpose: Provides statistics about columns that are part of the output of Upsolver jobs.
Common Use Cases: Job monitoring, query optimization on job outputs.

Learn more about job_output_column_stats.

`dataset_column_stats`

Purpose: Offers detailed statistics about columns within datasets.
Common Use Cases: Data profiling, query optimization, real-time analytics.

Learn more about dataset_column_stats.

Features and Use Cases

Query Optimization: By understanding data density, value distributions, and other statistics, users can optimize queries for performance.
Data Profiling: Before diving into analytics, users can profile their data to understand characteristics like value ranges, uniqueness, and more.
Job Monitoring: Gain insights into the behavior and output of Upsolver jobs, helping with debugging and quality assurance.
Real-Time Analytics: Utilize real-time statistics for generating real-time analytics dashboards.
Audit and Compliance: Tables such as job_output_column_stats offer first_seen and last_seen timestamps, useful for auditing and compliance checks.
Alerting on Schema Changes: Detect changes in data density, distinct values, or new columns to trigger alerts for schema modifications.

Accessing Tables in `system.insights`

To query a table within the system.insights schema, you can use the following SQL syntax, replacing <table_name> with the name of the table you want to query:

SELECT * FROM system.insights.<table_name>;

Notes

These tables are read-only and are updated in real-time.
Deleted data does not affect the statistics; the statistics only reflect data written during the stream processing.
The tables can be queried like any other Upsolver tables, but they cannot be modified or deleted.

Last updated 1 year ago