Content types
This page describes the job options for ingesting data from different types of content.
When reading in your data, additional options can be configured for the following content types:
CSV
INFER_TYPES
Type: Boolean
(Optional) When true, each column's data type is inferred as one of the following types: string, integer, double, Boolean.
When false, all data is treated as a string.
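To make the effect of INFER_TYPES concrete, here is a minimal Python sketch of per-column type inference. It is only an illustration: the function name `infer_column_type` and the exact parsing rules (for example, which spellings count as Boolean) are assumptions, not Upsolver's documented behavior.

```python
def infer_column_type(values):
    """Guess a type name for a column of raw CSV strings (illustrative only)."""

    def all_match(parse):
        # True when every value in the column parses without error.
        for value in values:
            try:
                parse(value)
            except ValueError:
                return False
        return True

    def parse_bool(value):
        # Assumption: only "true"/"false" (any case) count as Boolean.
        if value.strip().lower() not in ("true", "false"):
            raise ValueError(value)

    if not values:
        return "string"
    if all_match(int):
        return "integer"
    if all_match(float):
        return "double"
    if all_match(parse_bool):
        return "boolean"
    # Fallback: this is also what every column gets when inference is off.
    return "string"
```

With inference disabled, every column would simply be reported as a string, which corresponds to the fallback branch of the sketch.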
HEADER
Type: array
Default: Empty string
(Optional) A comma-separated list of column names.
When the CSV data includes a header as the first row, the HEADER property can be omitted. Omitting this property tells Upsolver that the data contains a header row, and it takes the following actions:
Use the first row for column names
Skip the first row when processing the data
If the source data does not include a header as the first row, meaning the first row contains actual data, you must include the HEADER property when creating a job. This tells Upsolver to take the following actions:
Use the provided HEADER property for column names
Do not skip the first row since it contains data
If your data does not include a header row and you do not set the HEADER property when creating the job, Upsolver assumes the first row is a header and does not process it as data.
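The header rules above can be sketched in Python with the standard `csv` module. The helper `read_rows` and its `header` argument are hypothetical names used only to mirror the described behavior; this is not Upsolver code.

```python
import csv
import io


def read_rows(text, header=None):
    """Return one dict per data row.

    header=None mimics omitting HEADER: the first row supplies the column
    names and is not returned as data. Otherwise the given names are used
    and every row, including the first, is returned.
    """
    reader = csv.reader(io.StringIO(text))
    names = header if header is not None else next(reader, [])
    return [dict(zip(names, row)) for row in reader]


with_header = "id,name\n1,ada\n2,bob\n"
without_header = "1,ada\n2,bob\n"

# Header row present, HEADER omitted: the first row names the columns.
rows_a = read_rows(with_header)

# No header row, HEADER supplied: both rows are kept as data.
rows_b = read_rows(without_header, header=["id", "name"])

# No header row and HEADER omitted: the first data row is consumed as names.
rows_c = read_rows(without_header)
```

In the last case only one row survives and its keys are the values of the first data row, which is the pitfall the final paragraph warns about.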
HEADER_LINE
Type: string
Default: Empty string
(Optional) A string containing a comma-separated list of header names. This is an alternative to HEADER.
DELIMITER
Type: text
Default: ,
(Optional) The delimiter used for columns in the CSV file.
QUOTE_ESCAPE_CHAR
Type: text
Default: "
(Optional) Defines the character used for escaping quotes inside an already quoted value.
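As a rough illustration of how DELIMITER and QUOTE_ESCAPE_CHAR interact, the sketch below parses a single line with Python's `csv` module. The function `parse_line` is hypothetical, and treating the default `"` as "quotes are escaped by doubling them" is an assumption based on common CSV conventions.

```python
import csv
import io


def parse_line(line, delimiter=",", quote_escape_char='"'):
    """Split one CSV line into fields (illustrative only)."""
    if quote_escape_char == '"':
        # Assumed default: a quote inside a quoted value is written as "".
        reader = csv.reader(io.StringIO(line), delimiter=delimiter, doublequote=True)
    else:
        # Any other character (for example a backslash) precedes the quote.
        reader = csv.reader(
            io.StringIO(line),
            delimiter=delimiter,
            doublequote=False,
            escapechar=quote_escape_char,
        )
    return next(reader)


# Semicolon-delimited line with doubled quotes inside a quoted value.
fields_default = parse_line('1;"say ""hi""";x', delimiter=";")

# Comma-delimited line where a backslash escapes the inner quotes.
fields_backslash = parse_line('1,"say \\"hi\\"",x', quote_escape_char="\\")
```

Both calls yield the same three fields, with the middle field containing the literal quotes.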
NULL_VALUE
Type: text
(Optional) Values in the CSV that match the provided value are interpreted as null.
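A small Python sketch of the NULL_VALUE substitution follows. The helper name `apply_null_value` is hypothetical, and exact, case-sensitive matching is an assumption.

```python
def apply_null_value(row, null_value):
    """Replace fields equal to null_value with None (illustrative only)."""
    # Assumption: the comparison is an exact, case-sensitive string match.
    return {key: (None if value == null_value else value) for key, value in row.items()}


cleaned = apply_null_value({"id": "7", "city": "N/A"}, null_value="N/A")
```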
MAX_COLUMNS
Type: integer
(Optional) The number of columns to allocate when reading a row. Note that larger values may perform poorly.
ALLOW_DUPLICATE_HEADERS
Type: Boolean
Default: false
(Optional) When true, repeat headers are allowed. Numeric suffixes are added for disambiguation.
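One way to picture the disambiguation is the Python sketch below. The suffix format (`_1`, `_2`, and so on) and the function name `dedupe_headers` are assumptions for illustration; the source does not specify the exact naming Upsolver produces.

```python
def dedupe_headers(names):
    """Append a numeric suffix to repeated header names (illustrative only)."""
    seen = {}
    result = []
    for name in names:
        count = seen.get(name, 0)
        seen[name] = count + 1
        # First occurrence keeps its name; later ones get a hypothetical _N suffix.
        result.append(name if count == 0 else f"{name}_{count}")
    return result


unique_names = dedupe_headers(["id", "name", "id", "id"])
```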
TSV
INFER_TYPES
Type: Boolean
(Optional) When true, each column's data type is inferred as one of the following types: string, integer, double, Boolean.
When false, all data is treated as a string.
HEADER
Type: array
Default: Empty string
(Optional) A comma-separated list of column names.
When the TSV data includes a header as the first row, the HEADER property can be omitted. Omitting this property tells Upsolver that the data contains a header row, and it takes the following actions:
Use the first row for column names
Skip the first row when processing the data
If the source data does not include a header as the first row, meaning the first row contains actual data, you must include the HEADER property when creating a job. This tells Upsolver to take the following actions:
Use the provided HEADER property for column names
Do not skip the first row since it contains data
If your data does not include a header row and you do not set the HEADER property when creating the job, Upsolver assumes the first row is a header and does not process it as data.
HEADER_LINE
Type: string
Default: Empty string
(Optional) A string containing a comma-separated list of header names. This is an alternative to HEADER.
NULL_VALUE
Type: text
(Optional) Values in the TSV that match the provided value are interpreted as null.
MAX_COLUMNS
Type: integer
(Optional) The number of columns to allocate when reading a row. Note that larger values may perform poorly.
ALLOW_DUPLICATE_HEADERS
Type: Boolean
Default: false
(Optional) When true, repeat headers are allowed. Numeric suffixes are added for disambiguation.
JSON
STORE_JSON_AS_STRING
Type: Boolean
Default: false
(Optional) When true, a copy of the original JSON is stored as a string value in an additional column.
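The effect of STORE_JSON_AS_STRING can be sketched in Python as follows. The function `parse_event` and the column name `raw_json` are hypothetical; the source does not say what the additional column is called.

```python
import json


def parse_event(raw, store_json_as_string=False, column="raw_json"):
    """Parse a JSON event, optionally keeping the original text (illustrative only)."""
    record = json.loads(raw)
    if store_json_as_string:
        # Hypothetical column name; the unparsed input is kept verbatim.
        record[column] = raw
    return record


raw = '{"user": "ada", "clicks": 3}'
event = parse_event(raw, store_json_as_string=True)
```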
SCHEMA_REGISTRY
Note that only Avro schemas are currently supported.
SCHEMA_REGISTRY_URL
Type: text
Avro schema registry URL. To support schema evolution, add {id} to the URL and Upsolver will embed the ID from the Avro header.
For example, https://schema-registry.service.yourdomain.com/schemas/ids/{id}
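The following Python sketch shows how an `{id}` placeholder could be resolved per message. It assumes the messages use the Confluent wire format (one zero "magic" byte followed by a 4-byte big-endian schema ID), which the source does not state explicitly; the function `schema_url` is hypothetical.

```python
import struct


def schema_url(template, message):
    """Fill {id} in the registry URL from a message header (illustrative only)."""
    # Assumed layout: 1 magic byte (0) + 4-byte big-endian schema ID + Avro payload.
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != 0:
        raise ValueError("unexpected magic byte")
    return template.replace("{id}", str(schema_id))


template = "https://schema-registry.service.yourdomain.com/schemas/ids/{id}"
message = b"\x00" + (42).to_bytes(4, "big") + b"avro-payload"
resolved = schema_url(template, message)
```

Because the ID is read from each message, records written with different schema versions resolve to different registry URLs, which is what makes schema evolution workable.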
FIXED_WIDTH
COLUMNS
Type: list
(Optional) An array of the name, start index, and end index for each column in the file.
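A minimal Python sketch of fixed-width parsing driven by such a column list is shown below. The indexing convention (zero-based start, exclusive end), the trimming of padding, and the helper name `parse_fixed_width` are all assumptions; check the actual convention before relying on it.

```python
def parse_fixed_width(line, columns):
    """Slice one line into named fields (illustrative only).

    columns is a list of (name, start, end) tuples. Assumption: start is
    zero-based and end is exclusive, as in Python slicing.
    """
    return {name: line[start:end].strip() for name, start, end in columns}


columns = [("id", 0, 4), ("name", 4, 12)]
record = parse_fixed_width("0042Ada     ", columns)
```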
INFER_TYPES
Type: Boolean
Default: false
(Optional) When true, each column's data type is inferred. When false, all data is treated as a string.
REGEX
See: Java Pattern
PATTERN
Type: text
(Optional) The pattern to match against the input. Named groups are extracted from the data.
MULTILINE
Type: Boolean
Default: false
(Optional) When true, the pattern is matched against the whole input. When false, it is matched against each line of the input.
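The difference between the two MULTILINE settings can be sketched in Python. Note that Python writes named groups as `(?P<name>...)`, whereas Java Pattern uses `(?<name>...)`. The function `extract` is hypothetical, and using a search rather than a full match is an assumption.

```python
import re


def extract(pattern, text, multiline=False):
    """Return one dict of named groups per match target (illustrative only)."""
    regex = re.compile(pattern)
    # multiline=True: one match attempt against the whole input.
    # multiline=False: one match attempt per line.
    targets = [text] if multiline else text.splitlines()
    records = []
    for target in targets:
        match = regex.search(target)
        if match:
            records.append(match.groupdict())
    return records


log = "INFO started\nWARN slow"
pattern = r"(?P<level>\w+) (?P<msg>.+)"
per_line = extract(pattern, log)
whole_input = extract(pattern, log, multiline=True)
```

Here `per_line` holds two records, one per log line, while `whole_input` holds a single record taken from the first match in the input.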
INFER_TYPES
Type: Boolean
Default: false
(Optional) When true, each column's data type is inferred. When false, all data is treated as a string.
SPLIT_LINES
PATTERN
Type: text
(Optional) A regular expression pattern to split the data by. If left empty, the data is split by lines.
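As an illustration of the splitting behavior, the Python sketch below splits by lines when no pattern is given and by the regular expression otherwise. The helper `split_records` is hypothetical, and dropping empty chunks is an assumption.

```python
import re


def split_records(data, pattern=""):
    """Split raw input into records (illustrative only)."""
    # An empty pattern falls back to splitting by lines.
    parts = re.split(pattern, data) if pattern else data.splitlines()
    # Assumption: empty chunks are not treated as records.
    return [part for part in parts if part]


by_line = split_records("a\nb\n")
by_pattern = split_records("a||b||c", r"\|\|")
```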
XML
STORE_ROOT_AS_STRING
Type: Boolean
Default: false
(Optional) When true, a copy of the XML is stored as a string in an additional column.