
Job Status

This section explains how to use metrics to monitor and administer your Upsolver jobs.
When you create a job in Upsolver, metrics are continuously gathered to monitor performance, issues, and errors. These metrics provide extensive information about the running of the job, the data it processes (whether it is an ingestion or transformation job), and your cluster. You can use these metrics to ensure your jobs run as expected and that data is flowing efficiently.
Each metric value is useful on its own, though some metrics, when considered in combination with related metrics, provide a deeper understanding of how your job is working. Refer to this guide to help you diagnose issues quickly and effectively.

All Jobs

To view metrics for a job, click the Jobs link on the menu to open the Jobs page. This page provides an overview of all your jobs:
| Column | Description |
| --- | --- |
| Job | The name of the job. |
| Status | Indicates whether the job is running or in another phase. |
| Backlog | The backlog of events being processed, with the delay measured in time, e.g. 2 Minutes, Up to date. |
| Events Over Time | A graph of events processed since the job started. Hover over the graph to see the exact number of events processed at a point in time. |
| Created At | A time indicator showing how long ago the job was created. |
| Cluster | The name and status of the cluster that processes the job. Click on the cluster to view more details. |
| Source | The icon and name of the data source the data is read from, e.g. Kafka, PostgreSQL CDC. |
| Target | The icon and name of the data target the data is loaded into, e.g. Redshift, Snowflake. |
Either click on a job to view its metrics, or use the Filters button to display the filter options. The filters enable you to search by job name, or you can use the Status or Source filters, for example, to dig into specific job issues.

Status

Each job will be in one of the following statuses. The filters show only the statuses your jobs are currently in, so if your jobs collectively have the statuses Running, Completed, and Paused, for example, you will see filters for only those three statuses.
| Status | Icon Color | Description |
| --- | --- | --- |
| Running | Green | The job is running. |
| Failed (Retrying) | Red | The job encountered fatal errors that are preventing, or will prevent, the job from proceeding. |
| Warnings | Orange | There is a problem with the job or the cluster. Click on the job to view the warning message. |
| Deleting | Black | The job is deleting the intermediate data. |
| Paused | Grey | The job is paused. |
| Completed | Blue | The job has reached its end date and all work is complete. |
| Cluster Stopped | Grey | The cluster has stopped, preventing the job from running. |

Job Overview

Each job page includes the following metrics:
  • Status: the current state of the job. See Job Status for a full list of states.
  • Cluster: the name of the compute cluster that is running the job. Use the ALTER JOB command if you need to change the cluster, as shown in the sketch after this list.
  • Unresolved Errors (Last Hour): if errors have occurred within your job in the last hour, the error count is displayed here. Click on the metric card, or navigate to the end of the page, to view the Parse Errors Over Time visualization.
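As a minimal sketch of moving a job to a different cluster with ALTER JOB: the job and cluster names below are hypothetical, and COMPUTE_CLUSTER is assumed here to be the job property that controls cluster assignment, so check the ALTER JOB reference for the exact option name in your environment.

```sql
-- Hypothetical example: move the job "orders_ingestion" to the cluster
-- "analytics cluster". COMPUTE_CLUSTER is an assumed property name;
-- verify it against the ALTER JOB reference before running.
ALTER JOB orders_ingestion
    SET COMPUTE_CLUSTER = "analytics cluster";
```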

Job Details

Click the Job Details button to open the Information window. Here you will find the SQL command used to create your job, along with its configured options. This is helpful if you want to make changes to your job using the ALTER JOB command.
This window includes information relevant to your job type, data source, and destination. You can also see which user created the job, when it was created, and when it was last modified.

Metric Groups

Some metrics may not apply to the job type you created, or may be relevant only to particular data sources or target destinations. Toggle the Summary / All button to control which metrics are visible for your job:
  • Summary: displays the metrics most relevant to your job.
  • All: displays all metrics that are applicable to your job.
Job metrics are categorized into three groups:

Job Execution Status

The Job Execution Status metrics include executions from all jobs running on the cluster:
  • The number of currently running job executions.
  • The number of queued job executions pending.
  • The number of job executions completed today.
  • The number of job executions completed over the lifetime of the job.
  • The number of job executions that are waiting for a dependency to complete.
  • The number of job executions that encountered an error and are currently retrying.
  • The error message detailing why the job failed.

Data Scanned

The following metrics provide monitoring information regarding the data processed by your job:
  • The total number of rows scanned by completed executions today. This measures the rows that were processed successfully.
  • The number of rows that were filtered out because they did not pass the WHERE clause predicate defined in the job.
  • The average number of rows scanned per job execution.
  • The maximum number of rows scanned in a single job execution today.
  • The number of rows in the source table that have not been processed yet.
  • The number of files to load discovered by the job.
  • The number of bytes to load discovered in the source stream.
  • The number of items that failed to parse. This value represents a lower bound, as malformed items may also corrupt subsequent items in the same file.
  • The number of rows written to the target by the job.
  • The number of rows that were filtered out because they did not pass the HAVING clause predicate defined in the job.
  • The number of rows that were filtered out because some or all of the partition columns were NULL or an empty string.
  • The number of rows that were filtered out because some or all of the primary key columns were NULL.
  • The size of the data written by the job.
  • The number of columns written to by the job. This value can change over time if the query uses * in the SELECT clause.
  • The number of sparse columns written to today. A sparse column is a column that appears in fewer than 0.01% of all rows.
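Several of these metrics map directly to predicates in the job definition. As an illustrative sketch only (the catalog, table, and job names below are hypothetical, and the available options vary by job type), a transformation job with both WHERE and HAVING predicates might look like this:

```sql
-- Hypothetical transformation job showing where two of the filtered-rows
-- metrics come from. All object names below are illustrative, not real.
CREATE JOB load_high_value_customers
AS INSERT INTO my_catalog.analytics.high_value_customers
    SELECT customer_id,
           COUNT(*)   AS order_count,
           SUM(total) AS total_spent
    FROM my_catalog.raw.orders
    -- Rows failing this predicate are counted by the WHERE-clause
    -- filtered-rows metric.
    WHERE total > 0
    GROUP BY customer_id
    -- Groups failing this predicate are counted by the HAVING-clause
    -- filtered-rows metric.
    HAVING SUM(total) > 1000;
```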

Cluster

The Cluster metrics help you monitor the performance of, and diagnose issues with, your cluster:
  • Represents how much of the server's processing capacity is in use.
  • The number of job tasks pending execution in the cluster queue.
  • The percentage of time the server spends performing garbage collection rather than doing useful work.
  • The percentage of bytes re-loaded into memory from disk.
  • The number of server crashes that happened in the job's cluster today.

Progress Over Time

The Job Status page includes visualizations that provide immediate insight into the performance of your job over a given period of time. The visualizations are located below the metrics, and you can use the configurable Lifetime button to show data over a specific time span.
The visualizations include:
  • Delay: the backlog being processed over time, based on the time picker range.
  • The number of events processed over time, based on the time picker.
  • CPU utilization of the cluster, showing the resources the job is consuming relative to the entire cluster utilization.
  • The number of errors that occurred over time, based on the time picker.
  • A table listing the errors encountered by the job.
  • The number of parse errors that occurred over time, based on the time picker.
  • A table displaying information on the affected file and the resulting error message.