Job status

This section explains how to use metrics to monitor and administer your Upsolver jobs.

When you create a job in Upsolver, metrics are continuously gathered to monitor performance, issues, and errors. These metrics provide extensive information on the running of the job, the data processed by the job - whether an ingestion or transformation job - and details regarding your cluster. You can use these metrics to ensure your jobs run as expected, and that data is flowing efficiently.

The value given by a metric is useful on its own, though some metrics, when considered in combination with related metrics, provide a greater understanding of the workings of your job. It is helpful to refer to this guide to ensure you diagnose issues quickly and effectively.

All Jobs

To view metrics for a job, click the Jobs link on the menu to open the All Jobs page. This page provides an overview of all your jobs:



  • Name: The name of the job.

  • Status: Indicates whether the job is running or in another phase.

  • Backlog: The backlog of events being processed, with the delay measured in time, e.g. 2 Minutes or Up to date.

  • Errors Last Day: The number of errors encountered by the job since midnight (UTC).

  • Events Over Time: A graph of events processed since the job started. Hover your mouse over the graph to see the exact number of events processed at a point in time.

  • Created At: Indicates how long ago the job was created.

  • From... To: Icons representing the source and target of the job, showing where the data is being read from and loaded into.

Either click on a job to view its metrics, or use the Filters button to display the filter options. The filters enable you to search by job name, or you can use the Status or Errors Last Day filters, for example, to dig into specific job issues.

Job Status

A job can be in one of the following states, each indicated by a colored status icon:

  • The job is running.

  • The job is paused.

  • The job is deleting the intermediate data.

  • The target data has been deleted and the job has been dropped.

  • The job has reached its end date and all work is complete.

  • Following the job's user-defined completion date, the data has surpassed its retention period and is being deleted from the target.

  • Failed (Retrying): The job encountered fatal errors that are preventing, or will prevent, the job from proceeding.

Job Overview

Each job page includes the following metrics:

  • Status: the current state of the job. See Job Status for a full list of states.

  • Cluster: the name of the compute cluster running the job. Use the ALTER JOB command if you need to change the cluster.

  • Unresolved Errors (Last Hour): if errors have occurred within your job in the last hour, the count of errors displays here. Click on the metric card or navigate to the end of the page to view the Parse errors over time.
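For example, if you need to move a job to a different cluster, you can reassign it with ALTER JOB. The sketch below is a hedged example: the job name my_job and cluster name my_new_cluster are placeholders, and it assumes the COMPUTE_CLUSTER option is alterable for your job type:

```sql
-- Hypothetical example: reassign the job "my_job" to the compute
-- cluster "my_new_cluster" (both names are placeholders).
ALTER JOB my_job
    SET COMPUTE_CLUSTER = my_new_cluster;
```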

Job Details

Click the Job Details button to open the Information window, where you will find the SQL command used to create your job, along with its configured options. This is helpful if you want to make changes to your job using the ALTER JOB command.

This window includes information relevant to your job type, data source, and destination. You can also see which user created the job, when it was created, and when it was last modified.

Metric groups

Some metrics may not apply to the job type you created, or may only be relevant to particular data sources or target destinations. You can toggle the Summary / All button to control which metrics are visible for your job:

  • Summary: Toggle to display the most relevant metrics to your job.

  • All: Toggle to view all metrics that are applicable to your job.

Job metrics are categorized into three groups: Job Execution Status, Data Scanned, and Cluster.

Job Execution Status

The Job Execution Status metrics include all jobs running on the cluster:

  • The number of currently running job executions.

  • The number of pending job executions in the queue.

  • The number of job executions completed today.

  • The number of job executions completed over the lifetime of the job.

  • The number of job executions that are waiting for a dependency to complete.

  • The number of job executions that encountered an error and are currently retrying.

  • The error message detailing why the job failed.

Data Scanned

The following metrics provide monitoring information regarding the data processed by your job:

  • The total number of rows scanned by completed executions today. This is a measure of rows that were processed successfully.

  • The number of rows that were filtered out because they didn't pass the WHERE clause predicate defined in the job.

  • The average number of rows scanned per job execution.

  • The maximum number of rows scanned in a single job execution today.

  • The number of rows in the source table that have not yet been processed.

  • The number of files to load discovered by the job.

  • The number of bytes to load discovered in the source stream.

  • The number of items that failed to parse. This value represents a lower bound, as malformed items may also corrupt subsequent items in the same file.

  • The number of rows written to the target by the job.

  • The number of rows that were filtered out because they did not pass the HAVING clause predicate defined in the job.

  • The number of rows that were filtered out because some or all of the partition columns were NULL or an empty string.

  • The number of rows that were filtered out because some or all of the primary key columns were NULL.

  • The size of the data written by the job.

  • The number of columns written to by the job. This value can change over time if the query uses * in the SELECT clause.

  • The number of sparse columns written to today. A sparse column is a column that appears in fewer than 0.01% of all rows.
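Several of these metrics are driven directly by the predicates defined in the job: rows failing the WHERE clause and groups failing the HAVING clause are counted as filtered out. The sketch below is a hedged illustration, not a definitive implementation; the job, catalog, and table names are hypothetical:

```sql
-- Hypothetical transformation job: rows failing the WHERE predicate
-- count toward the rows-filtered-by-WHERE metric, and aggregated
-- groups failing the HAVING predicate count toward the
-- rows-filtered-by-HAVING metric.
CREATE JOB filter_orders_job
AS INSERT INTO default_glue_catalog.analytics.frequent_customers
   MAP_COLUMNS_BY_NAME
   SELECT customer_id,
          COUNT(*) AS order_count
   FROM default_glue_catalog.raw.orders
   WHERE order_total > 0        -- rows failing this are filtered by WHERE
   GROUP BY customer_id
   HAVING COUNT(*) > 10;        -- groups failing this are filtered by HAVING
```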


Cluster

The Cluster metrics help you monitor the performance of your cluster and diagnose issues:

  • Represents how much of the server's processing capacity is in use.

  • The number of job tasks pending execution in the cluster queue.

  • The percentage of time that the server spends on garbage collection rather than doing work.

  • The percentage of bytes re-loaded into memory from disk.

  • The number of server crashes that happened in the job's cluster today.

Progress over time

The Job Status page includes visualizations that provide immediate insight into the performance of your job over a given period of time. The visualizations are located below the metrics, and you can use the configurable Lifetime time picker to show data over a specific time span.

The visualizations include:


  • Shows the backlog being processed over time, based on the time picker range.

  • The number of events processed over time, based on the time picker.

  • CPU utilization of the cluster, showing the resources the job is consuming relative to the entire cluster utilization.

  • The number of errors that occurred over time, based on the time picker.

  • A table providing a view of the errors encountered by the job.

  • The number of parse errors that occurred over time, based on the time picker.

  • A table displaying information on the file and the resulting error message.
