Jobs
The jobs system table includes information about the active jobs in your organization.
From a worksheet within Upsolver, run the following query:
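```sql
SELECT * FROM system.information_schema.jobs;
```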
The system.information_schema.jobs table includes the following columns, which may differ depending on the type of job, e.g. ingestion, transformation, or materialized view, and the data source or destination, e.g. database or content type:
Column | Type | Description |
---|---|---|
comment | string | An optional description for the job. |
created_at | datetime | The date and time when the job was created. |
created_by | string | The name of the user who created the job. |
id | string | The unique identifier for the job. |
job_type | string | The type of job, e.g. copy (ingestion) or insert (transformation). |
joins_with[] | string | The names of the materialized views that the job joins with. |
modified_at | datetime | The date and time when the job was last modified. |
modified_by | string | The name of the user who last modified the job. |
name | string | The name of the job. |
properties{} | string | See the Properties table below. |
source | string | The name of the data source where data is ingested from. |
source_type | string | The type of data source, e.g. s3, kinesis, glue_catalog. |
status | string | The job status, e.g. running, completed. For more information, see Job Status. |
target [ target_name (string), target_type (string) ] | string | The name of the target, e.g. default_glue_catalog.upsolver_samples.orders_raw_data, and the target type, e.g. glue_catalog, snowflake. |
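For example, a query along the following lines (a minimal sketch; the status value is illustrative and uses the lowercase form shown in the status column above) returns the name, type, and creator of each running job:

```sql
SELECT name, job_type, status, created_by
FROM system.information_schema.jobs
WHERE status = 'running';
```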
Properties
The properties column comprises information about the job options specified when the job was created or altered. The options may differ depending on the type of job, e.g. ingestion, transformation, or materialized view, and the data source or destination:
Column | Type | Description |
---|---|---|
ADD_MISSING_COLUMNS | Boolean | Returns TRUE or FALSE. When TRUE, columns that don't exist in the target table are automatically added on discovery. |
AGGREGATION_PARALLELISM | integer | The value that is optionally configured when an aggregation is included in a transformation job. |
BULK_MAX_SIZE_BYTES | integer | The size in MB of each bulk insert into an Elasticsearch index. |
COMMENT | string | Optional comment to describe the job. |
COMPRESSION | string | The source compression type. |
COMPUTE_CLUSTER | string | The compute cluster to run the job. |
CONSUMER_PROPERTIES | string | The properties configured for the Kafka consumer that override the properties in the connection. |
CONTENT_TYPE | string | The file format of the content being read by the job. The options specified for the content type are described in the Content types table below. |
CREATE_TABLE_IF_MISSING | Boolean | Returns TRUE or FALSE to indicate if a new table should be created on the target. |
DATE_PATTERN | string | The date pattern of the partitions on the Amazon S3 bucket to read from. |
DDL_FILTER[] | string | The list of DDL expressions that the job ignores when an error is returned from the Debezium engine. |
DEDUPLICATE_WITH [ WINDOW (string) ] | string | Returns the timeframe window for applying deduplication. |
DELETE_FILES_AFTER_LOAD | Boolean | Returns TRUE or FALSE to indicate if files are deleted from the storage source after ingestion into the target. |
END_AT | string | Returns the time to stop ingesting data. |
EXCLUDE_COLUMNS[] | string | Returns the columns that the job has been instructed to ignore on ingestion. |
EXCLUDED_TABLES[] | string | Returns the tables that the job has been instructed to ignore on ingestion. |
FAIL_ON_WRITE_ERROR | Boolean | Returns TRUE or FALSE to indicate if the job should fail when an on-write error occurs. |
FILE_PATTERN | string | The pattern that file names on the Amazon S3 bucket must match to be ingested. |
HEARTBEAT_TABLE | string | The name of the heartbeat table to use when ingesting CDC data from PostgreSQL. |
INITIAL_LOAD_PATTERN | string | The regex pattern for loading files separately from the specified date pattern. |
INITIAL_LOAD_PREFIX | string | The prefix to specify that files matching this value should be loaded immediately when the job runs. |
INTERMEDIATE_STORAGE_CONNECTION | string | The intermediate storage location name. |
INTERMEDIATE_STORAGE_LOCATION | string | The location (URI) for intermediate storage. |
KEEP_EXISTING_VALUES_WHEN_NULL | Boolean | Returns TRUE or FALSE to indicate whether updates to the table preserve the previous non-null value. |
MAX_DELAY | string | The timeframe specifying how far back the job checks for new files. |
METADATA_RETENTION | string | The length of time for which metadata statistics are retained. |
OUTPUT_OFFSET | string | The timeframe specifying how far back data is moved into the next file. |
PARSE_JSON_COLUMNS | Boolean | Returns TRUE or FALSE to indicate whether the job should parse JSON columns into a struct matching the JSON value when ingesting from PostgreSQL. |
PUBLICATION_NAME | string | The name of the publication in PostgreSQL to ingest data from. |
READ_VIA_PARTITION_ROOT | Boolean | Returns TRUE or FALSE to indicate if the ingestion job should read via the root partition of a PostgreSQL database. |
READER_SHARDS | integer | Returns the number of readers used in parallel to run the stream. |
RUN_INTERVAL | string | The timeframe that determines how often the job runs. |
SKIP_FAILED_FILES | Boolean | Returns TRUE or FALSE to indicate if the job should ignore files that fail and continue executing. |
SKIP_SNAPSHOTS | Boolean | Returns TRUE or FALSE to indicate whether an initial snapshot should be taken. |
START_FROM | string | The time when the job should start ingesting data. |
STORE_RAW_DATA | Boolean | Returns TRUE or FALSE to indicate if an additional copy of the data is stored in its original format. |
WRITE_INTERVAL | string | The frequency determining how often the job writes to the target. |
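To illustrate how these options surface in the properties column, the following sketch creates an ingestion job with several of them set. The connection, bucket, and table names are hypothetical, and the options available vary by job type; see the Jobs SQL command reference for the authoritative syntax:

```sql
-- Hypothetical connection, bucket, and table names
CREATE SYNC JOB load_orders_raw
    COMMENT = 'Ingest raw orders from S3'
    CONTENT_TYPE = JSON
    DELETE_FILES_AFTER_LOAD = FALSE
AS COPY FROM S3 my_s3_connection
    LOCATION = 's3://my-bucket/orders/'
INTO default_glue_catalog.my_database.orders_raw;
```

Querying system.information_schema.jobs after this job is created returns these option values in the properties column.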
Content types
The CONTENT_TYPE column comprises the content type options specified when the job was created:
Column | Type | Description |
---|---|---|
ALLOW_DUPLICATE_HEADERS | Boolean | Returns TRUE or FALSE to indicate if repeat headers are allowed. |
DELIMITER | string | The delimiter character used for columns in the CSV file. |
HEADER[] | string | The column names if the first row in the file is a header row. |
HEADER_LINE | string | The comma-separated list of header names. |
INFER_TYPES | Boolean | Returns TRUE or FALSE to indicate if the data type is inferred or all data is treated as a string. |
MAX_COLUMNS | integer | The number of columns allocated when reading a row from a CSV or TSV file. |
MULTI_LINE | Boolean | Returns TRUE or FALSE to indicate if the pattern is matched against the whole input. |
NULL_VALUE | string | The value that, when matched in the file, is interpreted as null. |
PATTERN | string | The Java regular expression value. |
QUOTE_ESCAPE_CHAR | string | The character used for escaping quotes inside an already quoted value. |
SCHEMA_REGISTRY_URL | string | Returns the Avro schema registry URL. |
SPLIT_ROOT_ARRAY | Boolean | Returns TRUE or FALSE to indicate if a root object that is an array is parsed as separate events. |
STORE_JSON_AS_STRING | Boolean | Returns TRUE or FALSE to indicate if a copy of the original JSON is stored as a string value in an additional column. |
STORE_ROOT_AS_STRING | Boolean | Returns TRUE or FALSE to indicate if a copy of the XML is stored as a string in an additional column. |
TYPE | string | The file type to be ingested, e.g. CSV, JSON, XML. |
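For example, a CSV ingestion job might set several of these options within CONTENT_TYPE. This is a minimal sketch with hypothetical connection, bucket, and table names; check the Jobs SQL command reference for the options supported by each file type:

```sql
-- Hypothetical connection, bucket, and table names
CREATE SYNC JOB load_csv_files
    CONTENT_TYPE = (
        TYPE = CSV
        INFER_TYPES = TRUE
        DELIMITER = ','
        NULL_VALUE = 'N/A'
    )
AS COPY FROM S3 my_s3_connection
    LOCATION = 's3://my-bucket/csv/'
INTO default_glue_catalog.my_database.csv_raw;
```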
For more information, please refer to the Jobs SQL command reference.