Data Scanned
These metrics provide insight into the data processed by tasks within your job.
Metric type
Informational
About this metric
The total number of rows scanned by completed executions today. This is a measure of rows that were processed successfully.
Timeframe
Today (midnight UTC to now)
This informational metric shows cumulative progress. If the value is 0, then the job has not yet processed anything or has not started.
If the number of completed executions is greater than 0, but the rows scanned in completed executions is 0, then your source doesn't contain any data. This value should increase in line with the number of completed executions.
Metric type
Warning
About this metric
The number of rows that were filtered out because they didn’t pass the WHERE clause predicate defined in the job.
Limits
Timeframe
Today (midnight UTC to now)
The number of rows that were filtered out because some or all of the primary key columns were NULL. If this behavior is intended, the rows can be filtered out in the WHERE clause.
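For example, a minimal sketch of a transformation query whose WHERE predicate deliberately drops rows with missing keys (the table and column names here are hypothetical):

    -- Rows with a NULL order_id are filtered out intentionally here,
    -- so they are counted by this metric instead of failing the
    -- primary key check later.
    SELECT order_id, customer_id, amount
    FROM raw_orders
    WHERE order_id IS NOT NULL;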
Metric type
Informational
About this metric
The average number of rows scanned per job execution.
Timeframe
Today (midnight UTC to now)
This informational metric shows how much work is taking place within each execution. If the number is low, there is a lot of overhead for each execution; if it is high, it may indicate high latency.
There is no target value for this metric; instead, view it in comparison with your expectations of how much work should be done in each execution.
Metric type
Warning
About this metric
The maximum number of rows scanned in a single job execution today.
Limits
Warn when > 1,000,000 AND 10 * today's average rows scanned per execution
Timeframe
Today (midnight UTC to now)
In streaming data, rows should arrive at a fixed cadence, meaning you should not see a cycle of a spike of data arriving and then no work. This value should be similar to the average rows scanned per execution, ensuring that spikes and dips are not happening and that some executions are not working harder than others. A big difference between the two values may be indicative of performance and latency issues.
Metric type
Informational
About this metric
The number of rows in the source table that have not been processed yet. Only rows that have been committed to the source table are included.
Metric type
Warning
About this metric
The number of files discovered by the job that have not yet been loaded.
Limits
Error when = 0
Timeframe
Today (midnight UTC to now)
This metric applies to ingestion jobs copying data from Amazon S3 and counts the number of discovered files that match the job but have not yet been parsed.
If your job didn't find any files, the pattern you used to discover the files needs correcting. The value can legitimately be 0 at the very start of the job; otherwise, you need to recreate the job with the correct file pattern.
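As a rough sketch of where the file pattern lives (the connection name, bucket, and FILE_PATTERN option below are assumptions; check the exact job options for your Upsolver version), an S3 ingestion job only discovers files that match its location and pattern:

    -- If FILE_PATTERN matches nothing, this metric stays at 0.
    CREATE JOB load_orders_raw
        FILE_PATTERN = '*.json'
    AS COPY FROM S3 my_s3_connection
        LOCATION = 's3://my-bucket/orders/'
    INTO default_glue_catalog.raw.orders_raw_data;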
Metric type
Informational
About this metric
The number of items that failed to parse. This value represents a lower bound, as malformed items may also corrupt subsequent items in the same file.
Limits
Error when > 0
Timeframe
Today (midnight UTC to now)
This metric only applies to ingestion jobs and counts the number of errors raised when a file or row could not be parsed. Generally, the value should be 0. If this value is above 0, you should understand why these parse errors exist, e.g. the file is in the wrong format, malformed, or corrupted. In this case, the job has failed to parse some of the events in the source location; see the job monitoring page for more information.
Metric type
Informational
About this metric
The number of rows written to the target by the job.
Timeframe
Today (midnight UTC to now)
Written rows relate to the number of rows scanned. A scanned row will result in a written row unless it was filtered out, or an aggregation reduced the number of rows written. For example, a job may scan 1,000,000 rows, perform an aggregation, and write the result as a single row. Conversely, a flattening operation to unnest data can result in more rows written than scanned.
If you expect scanned and written rows to match and they don't, you need to investigate the cause. Similarly, if you have a flattening operation that you expect to increase the number of written rows and this doesn't happen, investigation is required.
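To illustrate with generic SQL (the names here are hypothetical, not a specific Upsolver job):

    -- Aggregation: many scanned rows collapse into one written row
    -- per group, so rows written can be far lower than rows scanned.
    SELECT customer_id, COUNT(*) AS order_count
    FROM raw_orders
    GROUP BY customer_id;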
Metric type
Informational
About this metric
The number of rows that were filtered out because they didn't pass the HAVING clause predicate defined in the job.
Timeframe
Today (midnight UTC to now)
Rows are excluded when the aggregated results do not satisfy the HAVING clause predicate defined in the job.
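For instance, in a generic aggregation (hypothetical names), groups that fail the HAVING predicate are excluded from the output and counted by this metric:

    SELECT customer_id, SUM(amount) AS total_spend
    FROM raw_orders
    GROUP BY customer_id
    -- Groups with a total of 100 or less are filtered out here.
    HAVING SUM(amount) > 100;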
Metric type
Warning
About this metric
The number of rows that were filtered out because some or all of the partition columns were NULL or empty string.
Limits
Error when > 0
Timeframe
Today (midnight UTC to now)
If you are writing to a partitioned table and one of the partition columns contains a NULL value or an empty string, the row is filtered out. This is not usually intended behavior and flags a user error requiring investigation. If this behavior is intended, the rows can be filtered out in the WHERE clause.
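A minimal sketch, assuming a hypothetical partition column named event_date, of making the filtering explicit in the WHERE clause:

    -- Drop rows that would otherwise be rejected for having a NULL
    -- or empty partition value.
    SELECT *
    FROM raw_events
    WHERE event_date IS NOT NULL
      AND event_date <> '';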
Metric type
Warning
About this metric
The number of rows that were filtered out because some or all of the primary key columns were NULL.
Limits
Error when > 0
Timeframe
Today (midnight UTC to now)
Rows are filtered out when a primary key is NULL. If this behavior is intended, the rows can be filtered out in the WHERE clause.
Metric type
Informational
About this metric
The size of the data written by the job.
Timeframe
Today (midnight UTC to now)
This informational metric provides a sense of the scale of the data and how much work is being done. If this value is higher or lower than you expect, there is most likely a mistake in the configuration of the job.
Metric type
Warning
About this metric
The number of columns written to by the job. This value can change over time if the query uses * in the SELECT clause.
Limits
Warn when > 500
Timeframe
Today (midnight UTC to now)
This is a fixed number if you're not using a SELECT * statement. You can have as many columns as you want in Upsolver, but a large number of columns can cause problems downstream in query engines such as Athena or Glue. Furthermore, this may not be what the user intended, as it can be difficult to work with so many columns.
It is best practice to ensure you keep your tables to a maximum of a few hundred columns for downstream support and performance.
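As a sketch with hypothetical column names, an explicit projection keeps the written column count fixed, whereas SELECT * grows with whatever appears in the source:

    -- The target schema stays at exactly four columns, regardless of
    -- which new fields arrive in raw_orders.
    SELECT order_id, customer_id, amount, event_time
    FROM raw_orders;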
Metric type
Warning
About this metric
The number of sparse columns written to today. A sparse column is a column that appears in less than 0.01% of all rows.
Limits
Warn when > 50% of the number of columns
Timeframe
Today (midnight UTC to now)
This often happens when the job is writing to a high number of columns, but those columns only show up in one or two events.
If you have a lot of sparse columns in your data, this is often because of malformed data or unexpected results. This makes it hard to work with the data downstream, so it is best to transform the data so that there are fewer columns.
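One common remedy, sketched here in Athena-style SQL with hypothetical names (Upsolver's own functions for this may differ), is to fold rarely populated attributes into a single map column instead of writing hundreds of mostly-NULL columns:

    SELECT order_id,
           -- Two sparse attributes become entries in one map column.
           MAP(ARRAY['promo_code', 'gift_note'],
               ARRAY[promo_code, gift_note]) AS rare_attributes
    FROM raw_orders;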
For any of these metrics, you can run a SQL command in a query window in Upsolver, replacing <job_id> with the Id for your job. The Id for your job can be found under the job's Settings tab.
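As a sketch of the kind of query involved (the system table name and its columns are assumptions, not a confirmed Upsolver schema):

    -- Inspect a job's status and metrics; replace <job_id> with your job's Id.
    SELECT *
    FROM system.monitoring.jobs
    WHERE job_id = '<job_id>';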