Job Execution Status
These metrics provide information about your job, enabling you to monitor performance and check for issues.
Job executions currently running
Metric type | Informational |
About this metric | The number of currently running job executions. |
Timeframe | Now |
More information
A job is scheduled to run at a defined time interval (e.g., every minute, hour, or day). A job execution is the run that processes one specific time interval. As long as the job is up to date, a single job execution runs at a time, following the expected schedule.
If there is a backlog, multiple job executions can run concurrently, each handling a different time interval. This can happen, for example, when the job is processing historical data, or when a spike in data volume causes an execution to take longer than the time between intervals.
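The relationship between time intervals and job executions can be sketched as follows. This is only an illustration of the concept, not how the scheduler is implemented; the function `pending_intervals` and its parameters are hypothetical names introduced for this example.

```python
from datetime import datetime, timedelta, timezone


def pending_intervals(last_completed_end, now, interval):
    """Return the start time of every fully elapsed interval not yet processed.

    Hypothetical helper: each returned start time stands for one job execution
    that is either running or waiting to run.
    """
    starts = []
    start = last_completed_end
    # An interval is eligible only once it has fully elapsed.
    while start + interval <= now:
        starts.append(start)
        start += interval
    return starts


interval = timedelta(minutes=1)
now = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)

# Up-to-date job: only the most recent interval is outstanding.
up_to_date = pending_intervals(
    datetime(2024, 1, 1, 12, 4, tzinfo=timezone.utc), now, interval
)

# Job that fell behind: five intervals are outstanding, so several
# executions may run concurrently, one per interval.
backlog = pending_intervals(
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), now, interval
)

print(len(up_to_date), len(backlog))
```

In this sketch the up-to-date job has one outstanding interval, while the job that is five minutes behind has five, each of which would be handled by its own execution.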
Job executions currently in queue
Metric type | Warning |
About this metric | The number of job executions that are queued and waiting to run. |
Limits | Warn when > 0 |
Timeframe | Now |
More information
This metric counts the job executions that are ready to run but have not yet started. For an up-to-date job, the value should be 0. During a replay, this number can be very high, potentially in the thousands, but it should decrease steadily as the replay works through the queue. Outside of a replay, a non-zero value can mean that the cluster is not large enough to handle the workload. A value below 10 is considered acceptable, but a value that keeps increasing indicates that the cluster is not keeping up with the workload.
An increasing queue adds latency to your data, so investigate whenever the queue grows. If the number stays constant over time and the latency is acceptable, this may not be an issue for you.
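The guidance above can be summarized as a small decision rule. The following is a minimal sketch that assumes you have collected a series of readings of this metric yourself; the function `assess_queue`, its labels, and the way it compares the first and last reading are assumptions made for illustration, not part of the product.

```python
def assess_queue(samples, acceptable=10):
    """Classify a series of queue-length readings, ordered oldest to newest.

    Hypothetical helper. The threshold of 10 comes from the guidance above;
    comparing only the first and last reading is a deliberate simplification.
    """
    if not samples:
        raise ValueError("at least one reading is required")

    first, latest = samples[0], samples[-1]

    if latest == 0:
        return "up-to-date"       # nothing is waiting to run
    if latest > first:
        return "investigate"      # queue is growing: latency will increase
    if latest < first:
        return "draining"         # e.g. a replay working through its backlog
    if latest < acceptable:
        return "acceptable"       # small and constant
    return "check-latency"        # constant but large: acceptable only if latency is


print(assess_queue([0, 0, 0]))
print(assess_queue([2, 5, 9]))
print(assess_queue([4000, 2500, 900]))
print(assess_queue([3, 3, 3]))
print(assess_queue([50, 50, 50]))
```

Note that a growing queue is flagged even while it is still below 10, because the trend matters more than the absolute value.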
Job executions completed - today
Metric type | Informational |
About this metric | The number of job executions completed today. |
Timeframe | Today (midnight UTC to now) |
More information
The total number of job executions completed today. A job is scheduled to run at a defined time interval (e.g., every minute, hour, or day). A job execution is the run that processes a single time interval.
Job executions completed - lifetime
Metric type | Informational |
About this metric | The number of job executions completed over the lifetime of the job. |
Timeframe | Job lifetime |
More information
The total number of job executions completed over the lifetime of the job. A job is scheduled to run at a defined time interval (e.g., every minute, hour, or day). A job execution is the run that processes a single time interval.
Job executions currently waiting for dependencies
Job executions currently retrying after failure
Metric type | Warning |
About this metric | The number of job executions that encountered an error and are currently retrying. |
Limits | Error when > 0 |
More information
The number of job executions that encountered an error and are currently retrying. Ideally, this should be 0. If a job encounters a transient error, the value returns to 0 once the retry succeeds. Otherwise, you need to investigate and fix the underlying issue: the execution keeps retrying for as long as the issue persists and stops only once it is resolved.
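The retry behavior described above, where an execution keeps retrying until the error clears, can be illustrated with a short sketch. The wait time between attempts is an assumption, since the actual retry policy is not described here, and `run_with_retry` and `make_flaky` are hypothetical names used only for this example.

```python
import time


def run_with_retry(execute, wait_seconds=1.0, sleep=time.sleep):
    """Call execute() until it succeeds; return its result and the attempt count.

    Hypothetical helper: there is no attempt limit, mirroring the description
    that retrying stops only once the issue is resolved.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return execute(), attempts
        except Exception:
            # While this loop is waiting, the execution would be counted
            # as "currently retrying after failure".
            sleep(wait_seconds)


def make_flaky(failures):
    """Build a stand-in execution that fails a fixed number of times, then succeeds."""
    remaining = {"count": failures}

    def execute():
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise RuntimeError("transient error")
        return "completed"

    return execute


result, attempts = run_with_retry(make_flaky(2), wait_seconds=0.0)
print(result, attempts)
```

With a transient error, the loop exits on its own after a few attempts. With a persistent error it never exits, which is why a value that stays above 0 calls for investigation.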
Execution Failure Reason
Metric type | Warning |
About this metric | The error message detailing why the job failed. |
Timeframe | Now |
More information
The error message detailing why the job failed. The message is specific to your job. To view the full error text, adapt the query under the See All Events (SQL Syntax) tab.