Jobs

The jobs system table includes information about the active jobs in your organization.

From a worksheet within Upsolver, run the following query:

SELECT *
FROM system.information_schema.jobs;

-- If you have a lot of jobs, add a LIMIT clause to reduce the result set:
-- SELECT * FROM system.information_schema.jobs LIMIT 50;

The system.information_schema.jobs table includes the following columns, which may differ depending on the type of job, e.g. ingestion, transformation, or materialized view, and the data source or destination, e.g. database or content type:

| Column | Type | Description |
| --- | --- | --- |
| comment | string | An optional description of the job. |
| created_at | datetime | The date and time when the job was created. |
| created_by | string | The name of the user who created the job. |
| id | string | The unique identifier for the job. |
| job_type | string | The type of job, e.g. copy (ingestion) or insert (transformation). |
| joins_with[] | string | The names of the materialized views that the SELECT statement in a transformation job joins with. |
| modified_at | datetime | The date and time when the job was last modified. |
| modified_by | string | The name of the user who last modified the job. |
| name | string | The name of the job. |
| properties{} | string | The job options; see the Properties table below. |
| source | string | The name of the data source from which data is ingested. |
| source_type | string | The type of data source, e.g. s3, kinesis, glue_catalog. |
| status | string | The job status, e.g. running, completed. For more information, see Job status. |
| target [target_name (string), target_type (string)] | string | The name of the target, e.g. default_glue_catalog.upsolver_samples.orders_raw_data, and the target type, e.g. glue_catalog, snowflake. |
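Because this is an ordinary queryable table, you can filter on these columns directly. A minimal sketch, using the example values shown above (the exact casing of status and job_type values in your environment may differ):

-- List running ingestion (copy) jobs with a few selected columns.
SELECT name,
       created_by,
       created_at,
       source_type
FROM system.information_schema.jobs
WHERE status = 'running'
  AND job_type = 'copy';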

Properties

The properties column comprises information about the job options specified when the job was created or altered. The options may differ depending on the type of job, e.g. ingestion, transformation, or materialized view, and the data source or destination:

| Column | Type | Description |
| --- | --- | --- |
| ADD_MISSING_COLUMNS | Boolean | Returns TRUE or FALSE. When TRUE, columns that don't exist in the target table are automatically added on discovery. |
| AGGREGATION_PARALLELISM | integer | The value that is optionally configured when an aggregation is included in a transformation job. |
| BULK_MAX_SIZE_BYTES | integer | The size in MB of each bulk insert into an Elasticsearch index. |
| COMMENT | string | An optional comment describing the job. |
| COMPRESSION | string | The source compression type. |
| COMPUTE_CLUSTER | string | The compute cluster that runs the job. |
| CONSUMER_PROPERTIES | string | The properties configured for the Kafka consumer that override the properties in the connection. |
| CONTENT_TYPE | string | The file format of the content being read by the job, along with the options specified for the CONTENT_TYPE option. See the Content types table below. |
| CREATE_TABLE_IF_MISSING | Boolean | Returns TRUE or FALSE to indicate whether a new table should be created on the target. |
| DATE_PATTERN | string | The date pattern of the partitions on the Amazon S3 bucket to read from. |
| DDL_FILTER[] | string | The list of DDL expressions that the job ignores when an error is returned from the Debezium engine. |
| DEDUPLICATE_WITH [WINDOW (string)] | string | The timeframe window for applying deduplication. |
| DELETE_FILES_AFTER_LOAD | Boolean | Returns TRUE or FALSE to indicate whether files are deleted from the storage source after ingestion into the target. |
| END_AT | string | The time at which the job stops ingesting data. |
| EXCLUDE_COLUMNS[] | string | The columns that the job has been instructed to ignore on ingestion. |
| EXCLUDED_TABLES[] | string | The tables that the job has been instructed to ignore on ingestion. |
| FAIL_ON_WRITE_ERROR | Boolean | Returns TRUE or FALSE to indicate whether the job should fail when a write error occurs. |
| FILE_PATTERN | string | The file name pattern of the files on the Amazon S3 bucket to read from. |
| HEARTBEAT_TABLE | string | The name of the heartbeat table to use when ingesting CDC data from PostgreSQL. |
| INITIAL_LOAD_PATTERN | string | The regex pattern for loading files separately from the specified date pattern. |
| INITIAL_LOAD_PREFIX | string | The prefix specifying that files matching this value should be loaded immediately when the job runs. |
| INTERMEDIATE_STORAGE_CONNECTION | string | The intermediate storage location name. |
| INTERMEDIATE_STORAGE_LOCATION | string | The location (URI) for intermediate storage. |
| KEEP_EXISTING_VALUES_WHEN_NULL | Boolean | Returns TRUE or FALSE to indicate whether updates to the table preserve the previous non-null value. |
| MAX_DELAY | string | The timeframe that determines how far back the job checks for new files. |
| METADATA_RETENTION | string | The length of time for which metadata statistics are saved. |
| OUTPUT_OFFSET | string | The timeframe that determines how far back the job moves data into the next file. |
| PARSE_JSON_COLUMNS | Boolean | Returns TRUE or FALSE to indicate whether the job should parse JSON columns into a struct matching the JSON value when ingesting from PostgreSQL. |
| PUBLICATION_NAME | string | The name of the publication in PostgreSQL to ingest data from. |
| READ_VIA_PARTITION_ROOT | Boolean | Returns TRUE or FALSE to indicate whether the ingestion job should read via the root partition of a PostgreSQL database. |
| READER_SHARDS | integer | The number of readers used in parallel to run the stream. |
| RUN_INTERVAL | string | The timeframe that determines how often the job runs. |
| SKIP_FAILED_FILES | Boolean | Returns TRUE or FALSE to indicate whether the job should ignore files that fail and continue executing. |
| SKIP_SNAPSHOTS | Boolean | Returns TRUE or FALSE to indicate whether an initial snapshot should be taken. |
| START_FROM | string | The time from which the job starts ingesting data. |
| STORE_RAW_DATA | Boolean | Returns TRUE or FALSE to indicate whether an additional copy of the data is stored in its original format. |
| WRITE_INTERVAL | string | The frequency with which the job writes to the target. |
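To review the options a particular job was created or altered with, return the properties column alongside the job name. A minimal sketch, where 'orders_ingestion_job' is a hypothetical job name; replace it with a name returned by the query at the top of this page:

-- Inspect the stored job options for a single job.
SELECT name,
       properties
FROM system.information_schema.jobs
WHERE name = 'orders_ingestion_job';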

Content types

The CONTENT_TYPE column comprises the content type options specified when the job was created:

| Column | Type | Description |
| --- | --- | --- |
| ALLOW_DUPLICATE_HEADERS | Boolean | Returns TRUE or FALSE to indicate whether repeated headers are allowed. |
| DELIMITER | string | The delimiter character used for columns in the CSV file. |
| HEADER[] | string | The column names if the first row in the file is a header row. |
| HEADER_LINE | string | The comma-separated list of header names. |
| INFER_TYPES | Boolean | Returns TRUE or FALSE to indicate whether data types are inferred or all data is treated as strings. |
| MAX_COLUMNS | integer | The number of columns allocated when reading a row from a CSV or TSV file. |
| MULTI_LINE | Boolean | Returns TRUE or FALSE to indicate whether the pattern is matched against the whole input. |
| NULL_VALUE | string | Values in the file that match this value are interpreted as null. |
| PATTERN | string | The Java regular expression value. |
| QUOTE_ESCAPE_CHAR | string | The character used to escape quotes inside an already quoted value. |
| SCHEMA_REGISTRY_URL | string | The Avro schema registry URL. |
| SPLIT_ROOT_ARRAY | Boolean | Returns TRUE or FALSE to indicate whether a root object that is an array is parsed as separate events. |
| STORE_JSON_AS_STRING | Boolean | Returns TRUE or FALSE to indicate whether a copy of the original JSON is stored as a string value in an additional column. |
| STORE_ROOT_AS_STRING | Boolean | Returns TRUE or FALSE to indicate whether a copy of the XML is stored as a string in an additional column. |
| TYPE | string | The file type to be ingested, e.g. CSV, JSON, XML. |
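Content type options apply to file-based sources, so a common pattern is to review them together with the source details. A sketch along these lines, using the s3 source type shown in the jobs table above:

-- Review the source and stored options for jobs ingesting from Amazon S3.
SELECT name,
       source,
       properties
FROM system.information_schema.jobs
WHERE source_type = 's3';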

Please refer to the Jobs SQL command reference for more information on configuring job options.
