Upsolver SQLake

Task executions table

Upsolver's system information is designed to help users monitor and troubleshoot their jobs by providing detailed internal insights. Upsolver jobs are broken down into tasks, each responsible for working with data, performing maintenance work, and more. This section describes the task executions table and what each column means, to help users understand their jobs.
The task executions table allows you to monitor the executions of tasks that run your jobs and maintain your tables. You may use this table to monitor and troubleshoot your jobs.
It can be queried by selecting from SystemTables.logs.task_executions.
Internal note: The table name will be updated once the new tree is implemented. As of 7/29/2022, SystemTables.logs.task_executions is in the tree structure.
TODO: Delete this note once implementation is complete. The database should be system_information.task_executions.
Task execution records: This section lists the fields in the task_executions table, with each field's name, data type, and a short description of how to interpret its value.
Stage names: SQLake operations comprise multiple stages that execute tasks to complete a job. This section describes each stage that can appear in the stage_name field, which can help you better understand the progress of your jobs and identify the status of each stage.
Task event types: Each stage is a logical grouping of one or more tasks. This section describes, in no particular order, the task event types found in the task_event_type field and what they indicate, which can help you better understand the status of each task.

Task execution records

Each record within the table describes a task being executed.
The following table lists all the available fields for each task:
| Field name | Data type | Description |
| --- | --- | --- |
| cluster_name | string | The name of the cluster that processed this task. |
| cluster_id | string | The unique ID of the cluster that processed this task. |
| cloud_server_name | string | The ID of the cloud instance this job is running on. |
| stage_name | string | Describes the type of task being executed. For descriptions of the different stage names, see: Stage names |
| job_name | string | The name of the job that the task belongs to. |
| job_id | string | The unique ID of the job that the task belongs to. |
| task_name | string | The name of the task formatted as the job_id with some prefix or suffix descriptor attached. |
| task_start_time | timestamp | The start time of the window of data being processed. This corresponds to the value of run_start_time() within transformation jobs. The difference between the task_start_time and task_end_time corresponds to the RUN_INTERVAL configured within the job options for transformation jobs. For data ingestion jobs, it defaults to 1 minute. |
| task_end_time | timestamp | The end time of the window of data being processed. This corresponds to the value of run_end_time() within transformation jobs. The difference between the task_start_time and task_end_time corresponds to the RUN_INTERVAL configured within the job options for transformation jobs. For data ingestion jobs, it defaults to 1 minute. |
| shard | bigint | The shard number corresponding to this task. |
| total_shards | bigint | The total number of shards used to process the job for this execution. This corresponds to the value configured by the EXECUTION_PARALLELISM job option. If the value of EXECUTION_PARALLELISM is altered at any point, the total_shards for future tasks belonging to that job are updated to match. |
| task_start_processing_time | timestamp | The time the task started being processed. |
| task_end_processing_time | timestamp | The time the task finished being processed. |
| task_items_read | bigint | The total number of records read. |
| bytes_read | bigint | The total bytes ingested from the source data in its original form, including header information. |
| bytes_read_as_json | bigint | The total bytes ingested from the source data if it were in a JSON format. This is the number used to determine the volume of data scanned for billing purposes. |
| duration | bigint | The time in milliseconds it took to process this task. This is equivalent to the difference between the task_start_processing_time and task_end_processing_time. |
| task_delay_from_start | bigint | The delay in milliseconds between the end of the data window and when the task began processing. This is equivalent to the difference between the task_end_time and task_start_processing_time. |
| task_classification | string | The classification of the task as user, system, input, or metadata based on the type of task being executed. |
| task_error_message | string | The error message, if an error is encountered. |
| task_event_type | string | Classifies the task into event types. For descriptions of the different event types, see: Task event types |
| organization_name | string | The name of your organization that the task belongs to. |
| log_processing_time | timestamp | The time the log record was processed. |
| organization_id | string | The unique ID of your organization. It’s the same as the organization name. |
| partition_date_str | string | The partition date as a string. |
| partition_date | date | The date column the table is partitioned by. Always qualify a partition_date filter in your queries to avoid full scans. |
| upsolver_schema_version | bigint | The system table's schema version. It changes when the user edits the output job that's written to this table. |
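The relationships between the timestamp fields and the derived duration and task_delay_from_start fields can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record is a plain dictionary with made-up values, the helper names (to_millis, derived_timings) and the window_ms key are hypothetical, and nothing here is an Upsolver API.

```python
from datetime import datetime, timedelta


def to_millis(delta: timedelta) -> int:
    """Convert a timedelta to whole milliseconds."""
    return delta // timedelta(milliseconds=1)


def derived_timings(record: dict) -> dict:
    """Recompute the derived timing fields from a record's timestamps.

    Field names follow the table above; "window_ms" is a hypothetical
    name for the length of the data window.
    """
    start_proc = record["task_start_processing_time"]
    end_proc = record["task_end_processing_time"]
    return {
        # Time spent processing the task.
        "duration": to_millis(end_proc - start_proc),
        # Gap between the end of the data window and the start of processing.
        "task_delay_from_start": to_millis(start_proc - record["task_end_time"]),
        # Length of the data window (RUN_INTERVAL for transformation jobs).
        "window_ms": to_millis(record["task_end_time"] - record["task_start_time"]),
    }


# Made-up sample record with a one-minute data window.
sample = {
    "task_start_time": datetime(2022, 7, 29, 12, 0, 0),
    "task_end_time": datetime(2022, 7, 29, 12, 1, 0),
    "task_start_processing_time": datetime(2022, 7, 29, 12, 1, 2, 500000),
    "task_end_processing_time": datetime(2022, 7, 29, 12, 1, 6),
}

timings = derived_timings(sample)
print(timings)
```

A consistently large task_delay_from_start across records may suggest that tasks are waiting before they are picked up, while a large duration points at the processing itself.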

Stage names

| Stage name | Description |
| --- | --- |
| file discovery | Discovers the files within a file-based data source such as S3, Azure Blob Storage, or Google Cloud Storage. |
| data ingestion | Pulls data from the data source. |
| parse data | Parses the data discovered during the "file discovery" or "data ingestion" stage. |
| ingestion state maintenance | Performs maintenance work when data is being ingested. |
| write to storage | Writes output to the object store. |
| write to target | Writes the data to the target location. |
| cleanup | Deletes old files that are unnecessary. This can include cleaning up unneeded files after compaction or removing other temporary files, such as batcher files once the data has been parsed. |
| table state maintenance | Collects and maintains metadata about files as they are written to tables. This metadata is later used to perform tasks such as maintaining the file system, running compactions, running queries, and more. |
| retention | Deletes old data and metadata that have passed the retention period as configured when the table was created. |
| build indices | Builds indices for materialized views by reading the raw data and creating small files for the data that are then compacted and merged together. |
| compact indices | Compacts indices for materialized views after they have been built. |
| aggregation | Builds and compacts indices to perform aggregation for aggregated outputs. |
| collect statistics | Gathers metadata from the ingestion or output job by generating indexes. |
| compact statistics | Compacts and merges the metadata index. |
| partition metadata | Processes metadata for partition management and maintenance. |
| partition maintenance | Creates new partitions and deletes old ones. |
| partition management | Creates new partitions and deletes old ones. |
| count distinct metadata | Collects the number of distinct values for a field. |
| event type metadata | Builds the metadata index for a field when an event type is set in Upsolver Classic. This allows us to filter by event type and show statistics per event type. |
| upsert metadata | Maintains metadata about primary keys in order to know how and where to perform updates when they arrive as events. |
| monitoring metadata | Ensures metadata is being written successfully. |
| dedup index | Builds the dedup index. This index is used to run IS_DUPLICATE calculations. |
| coordinate compaction | Coordinates partition compactions by checking available files. Simultaneously maintains other table metadata. |
| compaction | Compacts smaller files into larger ones to optimize query performance. Only performed when writing to a data lake output. |
| upsert compaction | Compacts data from multiple files to delete old rows that have a newer update. |
| compaction state maintenance | Performs maintenance work to ensure compaction state is healthy. |
| maintenance | Performs general maintenance tasks. |
| internal task | Performs tasks for working with connections to external environments. |
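One way to use stage_name when troubleshooting is to total the processing time per stage. The following Python sketch works on records that have already been fetched from the table as dictionaries. The function name summarize_by_stage and the sample values are hypothetical, and counting only records whose task_event_type is "finished" is an assumption made here so that intermediate records for the same task are not counted twice.

```python
from collections import defaultdict


def summarize_by_stage(records):
    """Count finished tasks and total their duration (ms) per stage_name."""
    summary = defaultdict(lambda: {"tasks": 0, "total_duration_ms": 0})
    for record in records:
        # Assumption: only "finished" records carry the final duration.
        if record["task_event_type"] != "finished":
            continue
        stage = summary[record["stage_name"]]
        stage["tasks"] += 1
        stage["total_duration_ms"] += record["duration"]
    return dict(summary)


# Made-up sample records using field names from the table above.
records = [
    {"stage_name": "parse data", "task_event_type": "finished", "duration": 1200},
    {"stage_name": "parse data", "task_event_type": "finished", "duration": 800},
    {"stage_name": "compaction", "task_event_type": "heartbeat", "duration": 300000},
    {"stage_name": "compaction", "task_event_type": "finished", "duration": 420000},
]

stage_summary = summarize_by_stage(records)
for name, stats in stage_summary.items():
    print(name, stats["tasks"], stats["total_duration_ms"])
```

Stages with an unexpectedly high total are a reasonable place to start looking when a job falls behind.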

Task event types

| Event type | Description |
| --- | --- |
| started | The task has begun being processed. |
| finished | The task has been successfully completed. |
| heartbeat | An indicator that the task is still running. This is sent every 5 minutes. It lets users determine whether a task is long-running and see its current state, since each heartbeat carries the current duration, bytes read, and so on. |
| canceled | The task was canceled. |
| no-resources | Indicates a lack of resources to start a task. This is usually due to a connection limitation. |
| failed | The task has failed. Check task_error_message to better understand the error encountered. |
| failed-build | Failed to build a task. |
| failed-recoverable | An intermittent error has occurred (e.g. reading a file that was modified while reading it). The task will retry and recover from the error and the resulting data will be consistent. |
| dry-run-failed | A task run by our automated testing process for a new version has failed. |
| ignored-dry-run-failure | The dry-run failure is ignored due to false positives. |
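The event types above can be combined to get a rough picture of task status. The Python sketch below takes records already fetched from the table and reports which tasks still appear to be running and which have failed. Several things in it are assumptions rather than documented behavior: that a task execution can be identified by task_name, task_start_time, and shard together; that a task whose most recent record is "started" or "heartbeat" is still in progress; and all helper names, task names, and sample values.

```python
from datetime import datetime

# Assumption: these event types mean the task has not reached an end state.
IN_PROGRESS = {"started", "heartbeat"}


def latest_events(records):
    """Keep the most recent record per task, ordered by log_processing_time."""
    latest = {}
    for rec in records:
        # Assumption: this triple identifies one task execution.
        key = (rec["task_name"], rec["task_start_time"], rec["shard"])
        seen = latest.get(key)
        if seen is None or rec["log_processing_time"] > seen["log_processing_time"]:
            latest[key] = rec
    return latest


def in_progress(records):
    """Names of tasks whose latest event is 'started' or 'heartbeat'."""
    return sorted(
        key[0]
        for key, rec in latest_events(records).items()
        if rec["task_event_type"] in IN_PROGRESS
    )


def failures(records):
    """(task_name, task_error_message) for every 'failed' record."""
    return [
        (rec["task_name"], rec["task_error_message"])
        for rec in records
        if rec["task_event_type"] == "failed"
    ]


def event(name, event_type, minute, second=0, error=None):
    """Build a made-up record for the sample data below."""
    return {
        "task_name": name,
        "task_start_time": datetime(2022, 7, 29, 12, 0),
        "shard": 0,
        "task_event_type": event_type,
        "task_error_message": error,
        "log_processing_time": datetime(2022, 7, 29, 12, minute, second),
    }


records = [
    event("job1_parse", "started", 1),
    event("job1_parse", "heartbeat", 6),
    event("job1_write", "started", 1),
    event("job1_write", "failed", 1, 5, error="example error"),
    event("job1_cleanup", "started", 1),
    event("job1_cleanup", "finished", 1, 3),
]

running = in_progress(records)
failed = failures(records)
print(running)
print(failed)
```

Because heartbeats are sent every 5 minutes, a task that shows up as in progress with several heartbeat records is a candidate long-running task.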