Insights
Overview
The system.insights
schema provides a collection of system tables that offer deep insights into various aspects of your Upsolver environment. These tables are designed to assist data engineers, data scientists, and administrators in tasks ranging from query optimization, job monitoring, data profiling, to compliance checks. The tables under this schema are generated and updated in real-time by Upsolver's stream processing engine.
Key Tables
Here are two of the critical tables within the system.insights
schema:
job_output_column_stats
job_output_column_stats
Purpose: Provides statistics about columns that are part of the output of Upsolver jobs.
Common Use Cases: Job monitoring, query optimization on job outputs.
Learn more about job_output_column_stats
.
dataset_column_stats
dataset_column_stats
Purpose: Offers detailed statistics about columns within datasets.
Common Use Cases: Data profiling, query optimization, real-time analytics.
Learn more about dataset_column_stats
.
Features and Use Cases
Query Optimization: By understanding data density, value distributions, and other statistics, users can optimize queries for performance.
Data Profiling: Before diving into analytics, users can profile their data to understand characteristics like value ranges, uniqueness, and more.
Job Monitoring: Gain insights into the behavior and output of Upsolver jobs, helping with debugging and quality assurance.
Real-Time Analytics: Utilize real-time statistics for generating real-time analytics dashboards.
Audit and Compliance: Tables such as
job_output_column_stats
offerfirst_seen
andlast_seen
timestamps, useful for auditing and compliance checks.Alerting on Schema Changes: Detect changes in data density, distinct values, or new columns to trigger alerts for schema modifications.
Accessing Tables in system.insights
system.insights
To query a table within the system.insights
schema, you can use the following SQL syntax, replacing <table_name>
with the name of the table you want to query:
Notes
These tables are read-only and are updated in real-time.
Deleted data does not affect the statistics; the statistics only reflect data written during the stream processing.
The tables can be queried like any other Upsolver tables, but they cannot be modified or deleted.
Last updated