Job Execution Status

These metrics provide information about your job, enabling you to monitor performance and check for issues.

Last updated 11 months ago

Job executions currently running

Metric type

Informational

About this metric

The number of currently running job executions.

Timeframe

Now

More information

A job is scheduled to run on a defined time interval (e.g. every minute, hour, or day), and a job execution is the run that processes a specific time interval. As long as the job is up to date, a single job execution runs at a time, following the expected schedule.

When there is a backlog, multiple job executions can run concurrently, each handling a different time interval. This can happen, for example, when the job replays historical data, or when a spike in data volume causes the execution duration to exceed the time between intervals.

Run the following SQL command in a query window in Upsolver, replacing <job_id> with the Id for your job, which can be found under the Settings tab.

For additional columns, alter this statement and use SELECT *.

SELECT STRING_FORMAT('{0,number,#,###}', running_executions) AS running_executions, 'OK' AS running_executions_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';

Job executions currently in queue

Metric type

Warning

About this metric

The number of queued job executions pending.

Limits

Warn when > 0

Timeframe

Now

More information

This metric counts job executions that are ready to run but not yet running. For an up-to-date job, the value should be 0. During a replay, this number could be very high, potentially in the thousands; however, as the replay progresses, it should steadily decrease at the rate at which work is completed. Outside of a replay, a persistently high value can mean the cluster is not large enough to handle the workload. A value below 10 is generally acceptable, but an increasing value indicates the cluster is not keeping up with the workload.

Be aware that an increasing queue will cause latency issues with your data. However, if the number is constant over time and the latency is acceptable, this may not be an issue for you. An increasing queue should be investigated.
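As a sketch of how you might watch the queue alongside running executions, the following query reuses the same system.monitoring.jobs columns queried elsewhere on this page; the threshold of 10 mirrors the guidance above and is an assumption you should tune for your workload:

```sql
-- Sketch: compare queued vs. running executions for one job.
-- Assumes the system.monitoring.jobs columns used on this page;
-- the threshold of 10 follows the guidance above and should be tuned.
SELECT queued_executions,
       running_executions,
       CASE WHEN queued_executions > 10 THEN 'INVESTIGATE' ELSE 'OK' END AS backlog_status
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```

Running this periodically (or charting the two columns over time) makes it easier to tell a steady, acceptable queue from one that is growing.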

Run the following SQL command in a query window in Upsolver, replacing <job_id> with the Id for your job, which can be found under the Settings tab.

For additional columns, alter this statement and use SELECT *.

SELECT STRING_FORMAT('{0,number,#,###}', queued_executions) AS queued_executions, IF_ELSE(queued_executions > 0, 'WARNING', 'OK') AS queued_executions_severity 
FROM system.monitoring.jobs 
WHERE job_id = '<job_id>';

Consider increasing the cluster's server limit to allow more executions to run in parallel. This allows the job to remain up to date and to close any pending backlog.

Job executions completed - today

Metric type

Informational

About this metric

The number of job executions completed today.

Timeframe

Today (midnight UTC to now)

More information

The total number of job executions completed today. A job is scheduled to run on a defined time interval (e.g. every minute, hour, or day), and a job execution is the run that processes a single time interval.

Run the following SQL command in a query window in Upsolver, replacing <job_id> with the Id for your job, which can be found under the Settings tab. For additional columns, alter this statement and use SELECT *.

SELECT STRING_FORMAT('{0,number,#,###}', completed_executions_today) AS completed_executions_today, 'OK' AS completed_executions_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';

Job executions completed - lifetime

Metric type

Informational

About this metric

The number of job executions completed over the lifetime of the job.

Timeframe

Job lifetime

More information

The total number of job executions completed over the lifetime of the job. A job is scheduled to run on a defined time interval (e.g. every minute, hour, or day), and a job execution is the run that processes a single time interval.

Run the following SQL command in a query window in Upsolver, replacing <job_id> with the Id for your job, which can be found under the Settings tab. For additional columns, alter this statement and use SELECT *.

SELECT STRING_FORMAT('{0,number,#,###}', total_completed_executions) AS total_completed_executions, 'OK' AS total_completed_executions_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';

Job executions currently waiting for dependencies

Metric type

Informational

About this metric

The number of job executions that are waiting for a dependency to complete.

More information

The number of job executions currently waiting for other jobs to complete their work before they can run.

Run the following SQL command in a query window in Upsolver, replacing <job_id> with the Id for your job, which can be found under the Settings tab. For additional columns, alter this statement and use SELECT *.

SELECT STRING_FORMAT('{0,number,#,###}', executions_waiting_for_dependencies) AS executions_waiting_for_dependencies, 'OK' AS executions_waiting_for_dependencies_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';

Job executions currently retrying after failure

Metric type

Warning

About this metric

The number of job executions that encountered an error and are currently retrying.

Limits

Error when > 0

More information

The number of job executions that encountered an error and are currently retrying. Ideally, this should be 0. If a job encounters a transient error, the value returns to 0 once the retry succeeds; otherwise, investigation is required to fix the issue. Retries continue as long as the issue persists and stop only once it is resolved.

Run the following SQL command in a query window in Upsolver, replacing <job_id> with the Id for your job, which can be found under the Settings tab. For additional columns, alter this statement and use SELECT *.

SELECT STRING_FORMAT('{0,number,#,###}', executions_retrying_after_failure) AS executions_retrying_after_failure, IF_ELSE(executions_retrying_after_failure > 0, 'ERROR', 'OK') AS executions_retrying_after_failure_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';

See the logs.tasks.task_executions system table for details and error messages.

Run the following query to return tasks that have encountered an error:

SELECT job_id, job_name, stage_name, job_type, 
        task_name, task_start_time, task_end_time, task_error_message
FROM logs.tasks.task_executions 
WHERE task_error_message IS NOT NULL
ORDER BY job_id;
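To narrow the results to a single job and surface the most recent failures first, a variant of the query above (same table and columns) may help; as before, replace <job_id> with the Id for your job:

```sql
-- Sketch: most recent failed task executions for one job.
SELECT task_name, task_start_time, task_end_time, task_error_message
FROM logs.tasks.task_executions
WHERE job_id = '<job_id>'
  AND task_error_message IS NOT NULL
ORDER BY task_start_time DESC;
```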

Execution Failure Reason

Metric type

Warning

About this metric

The error message detailing why the job failed.

Timeframe

Now

More information

The error message detailing why the job failed, which will be unique to your job. You can adapt the query under the See All Events (SQL Syntax) tab to view the full error for the message relevant to your job.

Run the following SQL command in a query window in Upsolver, replacing <job_id> with the Id for your job, which can be found under the Settings tab. For additional columns, alter this statement and use SELECT *.

SELECT execution_failure_reason
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
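The per-metric queries on this page can also be combined into a single health check. The sketch below selects every system.monitoring.jobs column discussed above in one row; the severity logic mirrors the individual queries (retries are treated as an error, a non-empty queue as a warning):

```sql
-- Sketch: one-row health check combining the metrics on this page.
-- Uses only the system.monitoring.jobs columns shown in the queries above.
SELECT running_executions,
       queued_executions,
       completed_executions_today,
       total_completed_executions,
       executions_waiting_for_dependencies,
       executions_retrying_after_failure,
       execution_failure_reason,
       CASE
         WHEN executions_retrying_after_failure > 0 THEN 'ERROR'
         WHEN queued_executions > 0 THEN 'WARNING'
         ELSE 'OK'
       END AS job_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```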
