Jobs

The jobs system table includes information about the active jobs in your organization.
From a worksheet within Upsolver, run the following query:
```sql
SELECT *
FROM system.information_schema.jobs;
-- If you have a lot of jobs, you can use LIMIT to reduce the result set:
-- LIMIT 50;
```
The system.information_schema.jobs table includes the following columns, which may differ depending on the type of job, e.g. ingestion, transformation, or materialized view, and on the data source or destination, e.g. database or content type. An example query that filters on these columns follows the table:
| Column | Type | Description |
| --- | --- | --- |
| comment | string | An optional description of the job. |
| created_at | datetime | The date and time when the job was created. |
| created_by | string | The name of the user who created the job. |
| id | string | The unique identifier of the job. |
| job_type | string | The type of job, e.g. copy (ingestion), insert (transformation). |
| joins_with[] | string | The names of the materialized views that the SELECT statement joins to in the transformation job. |
| modified_at | datetime | The date and time when the job was last modified. |
| modified_by | string | The name of the user who last modified the job. |
| name | string | The name of the job. |
| properties{} | string | See the Properties table below. |
| source | string | The name of the data source from which data is ingested. |
| source_type | string | The data source type, e.g. s3, kinesis, glue_catalog. |
| status | string | The job status, e.g. running, completed. For more information, see Job status. |
| target [target_name (string), target_type (string)] | string | The name of the target, e.g. default_glue_catalog.upsolver_samples.orders_raw_data, and the target type, e.g. glue_catalog, snowflake. |
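For example, the query below narrows the result set to a few commonly inspected columns. It is a minimal sketch: the status value is illustrative, so verify the exact values and casing returned in your environment before filtering on them.

```sql
-- List ingestion (copy) jobs that are currently running, newest first.
-- The status value 'Running' is illustrative; check the values returned
-- in your environment, as casing may differ.
SELECT name, job_type, status, created_by, created_at
FROM system.information_schema.jobs
WHERE job_type = 'copy'
  AND status = 'Running'
ORDER BY created_at DESC;
```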

Properties

The properties column contains the job options specified when the job was created or altered. The options may differ depending on the type of job, e.g. ingestion, transformation, or materialized view, and on the data source or destination. An example query that inspects this column follows the table:
| Column | Type | Description |
| --- | --- | --- |
| ADD_MISSING_COLUMNS | Boolean | Returns TRUE or FALSE. When TRUE, columns that don't exist in the target table are automatically added on discovery. |
| AGGREGATION_PARALLELISM | integer | The value that is optionally configured when an aggregation is included in a transformation job. |
| BULK_MAX_SIZE_BYTES | integer | The size in MB of each bulk insert into an Elasticsearch index. |
| COMMENT | string | An optional comment describing the job. |
| COMPRESSION | string | The source compression type. |
| COMPUTE_CLUSTER | string | The compute cluster that runs the job. |
| CONSUMER_PROPERTIES | string | The properties configured for the Kafka consumer that override the properties in the connection. |
| CONTENT_TYPE | string | The file format of the content being read by the job, together with the options specified for the CONTENT_TYPE option. See the Content types table below. |
| CREATE_TABLE_IF_MISSING | Boolean | Returns TRUE or FALSE to indicate whether a new table should be created on the target. |
| DATE_PATTERN | string | The date pattern of the partitions on the Amazon S3 bucket to read from. |
| DDL_FILTER[] | string | The list of DDL expressions that the job ignores when an error is returned from the Debezium engine. |
| DEDUPLICATE_WITH [WINDOW (string)] | string | Returns the timeframe window for applying deduplication. |
| DELETE_FILES_AFTER_LOAD | Boolean | Returns TRUE or FALSE to indicate whether files are deleted from the storage source after ingestion into the target. |
| END_AT | string | Returns the time at which the job stops ingesting data. |
| EXCLUDE_COLUMNS[] | string | Returns the columns that the job has been instructed to ignore on ingestion. |
| EXCLUDED_TABLES[] | string | Returns the tables that the job has been instructed to ignore on ingestion. |
| FAIL_ON_WRITE_ERROR | Boolean | Returns TRUE or FALSE to indicate whether the job should fail when an on-write error occurs. |
| FILE_PATTERN | string | The pattern that file names on the Amazon S3 bucket must match to be ingested. |
| HEARTBEAT_TABLE | string | The name of the heartbeat table to use when ingesting CDC data from PostgreSQL. |
| INITIAL_LOAD_PATTERN | string | The regex pattern for loading files separately from the specified date pattern. |
| INITIAL_LOAD_PREFIX | string | The prefix specifying that matching files should be loaded immediately when the job runs. |
| INTERMEDIATE_STORAGE_CONNECTION | string | The intermediate storage location name. |
| INTERMEDIATE_STORAGE_LOCATION | string | The location (URI) for intermediate storage. |
| KEEP_EXISTING_VALUES_WHEN_NULL | Boolean | Returns TRUE or FALSE to indicate whether updates to the table preserve the previous non-null value. |
| MAX_DELAY | string | Returns the timeframe instructing the job how far back to check for new files. |
| METADATA_RETENTION | string | Returns the length of time for which metadata statistics are retained. |
| OUTPUT_OFFSET | string | Returns the timeframe by which output data is shifted into the next file. |
| PARSE_JSON_COLUMNS | Boolean | Returns TRUE or FALSE to indicate whether the job should parse JSON columns into a struct matching the JSON value when ingesting from PostgreSQL. |
| PUBLICATION_NAME | string | The name of the publication in PostgreSQL to ingest data from. |
| READ_VIA_PARTITION_ROOT | Boolean | Returns TRUE or FALSE to indicate whether the ingestion job reads via the root partition of a PostgreSQL database. |
| READER_SHARDS | integer | Returns the number of readers used in parallel to run the stream. |
| RUN_INTERVAL | string | The timeframe that determines how often the job runs. |
| SKIP_FAILED_FILES | Boolean | Returns TRUE or FALSE to indicate whether the job should skip files that fail and continue executing. |
| SKIP_SNAPSHOTS | Boolean | Returns TRUE or FALSE to indicate whether an initial snapshot should be taken. |
| START_FROM | string | The time from which the job should start ingesting data. |
| STORE_RAW_DATA | Boolean | Returns TRUE or FALSE to indicate whether an additional copy of the data is stored in its original format. |
| WRITE_INTERVAL | string | The frequency at which the job writes data to the target. |
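To review the options a specific job was created or altered with, you can select the properties column directly. This is a minimal sketch: the job name is hypothetical, and the syntax for addressing individual keys inside properties{} may vary by version.

```sql
-- Inspect the stored job options for a single job.
-- 'load_orders_raw_data_from_s3' is a hypothetical job name.
SELECT name, properties
FROM system.information_schema.jobs
WHERE name = 'load_orders_raw_data_from_s3';
```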

Content types

The CONTENT_TYPE column comprises the content type options specified when the job was created. An example showing where these options are set follows the table:
| Column | Type | Description |
| --- | --- | --- |
| ALLOW_DUPLICATE_HEADERS | Boolean | Returns TRUE or FALSE to indicate whether repeated header names are allowed. |
| DELIMITER | string | The delimiter character used between columns in the CSV file. |
| HEADER[] | string | The column names if the first row in the file is a header row. |
| HEADER_LINE | string | The comma-separated list of header names. |
| INFER_TYPES | Boolean | Returns TRUE or FALSE to indicate whether data types are inferred or all data is treated as a string. |
| MAX_COLUMNS | integer | The number of columns allocated when reading a row from a CSV or TSV file. |
| MULTI_LINE | Boolean | Returns TRUE or FALSE to indicate whether the pattern is matched against the whole input. |
| NULL_VALUE | string | The value that, when matched in the file, is interpreted as null. |
| PATTERN | string | The Java regular expression value. |
| QUOTE_ESCAPE_CHAR | string | The character used to escape quotes inside an already quoted value. |
| SCHEMA_REGISTRY_URL | string | Returns the Avro schema registry URL. |
| SPLIT_ROOT_ARRAY | Boolean | Returns TRUE or FALSE to indicate whether a root object that is an array is parsed as separate events. |
| STORE_JSON_AS_STRING | Boolean | Returns TRUE or FALSE to indicate whether a copy of the original JSON is stored as a string value in an additional column. |
| STORE_ROOT_AS_STRING | Boolean | Returns TRUE or FALSE to indicate whether a copy of the XML is stored as a string in an additional column. |
| TYPE | string | The file type to be ingested, e.g. CSV, JSON, XML. |
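These values mirror the CONTENT_TYPE option supplied when the job was created. The sketch below shows the general shape of such a job definition; the connection name, S3 location, and target table are hypothetical, and the authoritative syntax is in the Jobs SQL command reference.

```sql
-- A minimal sketch of a CSV ingestion job with explicit content type
-- options; the connection, location, and target table are hypothetical.
CREATE SYNC JOB load_orders_csv
  CONTENT_TYPE = (
    TYPE = CSV
    INFER_TYPES = TRUE
    DELIMITER = ','
  )
AS COPY FROM S3 my_s3_connection
  LOCATION = 's3://my-bucket/orders/'
INTO default_glue_catalog.upsolver_samples.orders_raw_data;
```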
Please refer to the Jobs SQL command reference for more information on configuring job options.