Content Types
When reading in your data, additional options can be configured for the following content types:
CSV
INFER_TYPES
Type: Boolean
(Optional) When true, each column's data type is inferred as one of the following types: string, integer, double, Boolean.
When false, all data is treated as a string.
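As a rough illustration of what per-column inference means, the sketch below classifies each raw value and then picks the narrowest of the four listed types that fits the whole column. The function names and the widening rules (integer to double, anything mixed to string) are assumptions made for this example, not Upsolver's documented algorithm.

```python
def infer_value_type(text: str) -> str:
    """Classify one raw field as 'boolean', 'integer', 'double', or 'string'."""
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return "boolean"
    try:
        int(lowered)
        return "integer"
    except ValueError:
        pass
    try:
        float(lowered)
        return "double"
    except ValueError:
        return "string"


def infer_column_type(values: list[str]) -> str:
    """Pick the narrowest type that fits every value in the column."""
    kinds = {infer_value_type(v) for v in values}
    if len(kinds) == 1:
        return kinds.pop()
    if kinds == {"integer", "double"}:
        return "double"  # assumed rule: widen integers to doubles
    return "string"      # assumed rule: mixed kinds fall back to string


print(infer_column_type(["1", "2", "3"]))    # integer
print(infer_column_type(["1", "2.5"]))       # double
print(infer_column_type(["true", "FALSE"]))  # boolean
print(infer_column_type(["1", "abc"]))       # string
```

With inference turned off, every one of these columns would simply be reported as string.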
HEADER
Type: array
Default: Empty string
(Optional) A comma-separated list of column names.
When the CSV data includes a header as the first row, the HEADER property can be omitted. Omitting this property tells Upsolver that a header row can be found in the data, and it takes the following actions:
Use the first row for column names
Skip the first row when processing the data
If the source data does not include a header as the first row, meaning the first row contains actual data, you must include the HEADER property when creating a job. This tells Upsolver to take the following actions:
Use the provided HEADER property for column names
Do not skip the first row since it contains data
If your data does not include a header row and you do not set a HEADER property when creating the job, Upsolver assumes the first row is a header and does not process it as data.
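The three cases described above can be mimicked with Python's csv module. This is only a sketch of the described behavior: read_rows and its header parameter are hypothetical stand-ins for the job and its HEADER property.

```python
import csv
import io
from typing import Optional


def read_rows(data: str, header: Optional[list[str]] = None) -> list[dict[str, str]]:
    """Return rows as dicts; `header` plays the role of the HEADER property."""
    rows = list(csv.reader(io.StringIO(data)))
    if header is None:
        # No HEADER given: the first row supplies the names and is skipped as data.
        names, body = rows[0], rows[1:]
    else:
        # HEADER given: every row, including the first, is treated as data.
        names, body = header, rows
    return [dict(zip(names, row)) for row in body]


with_header = "id,name\n1,Ann\n2,Bob\n"
without_header = "1,Ann\n2,Bob\n"

print(read_rows(with_header))
# [{'id': '1', 'name': 'Ann'}, {'id': '2', 'name': 'Bob'}]
print(read_rows(without_header, header=["id", "name"]))
# [{'id': '1', 'name': 'Ann'}, {'id': '2', 'name': 'Bob'}]
print(read_rows(without_header))
# [{'1': '2', 'Ann': 'Bob'}]
```

The last call shows the pitfall: with no header row and no HEADER value, the first data row is consumed as column names and is lost as data.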
HEADER_LINE
Type: string
Default: Empty string
(Optional) A string containing a comma-separated list of header names. This is an alternative to HEADER.
DELIMITER
Type: text
Default: ,
(Optional) The delimiter used to separate columns in the CSV file.
QUOTE_ESCAPE_CHAR
Type: text
Default: "
(Optional) Defines the character used for escaping quotes inside an already quoted value.
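To make the delimiter and quote-escaping options concrete, the following sketch parses a small sample with Python's csv module. The sample data and the choice of `;` as delimiter are made up for illustration. Reading the default escape character `"` as "a doubled quote inside a quoted value stands for one literal quote" is an assumption based on common CSV conventions.

```python
import csv
import io

# Hypothetical sample: ';' separates columns, and a quote inside a quoted
# value is escaped by doubling it ("").
data = 'id;comment\n1;"She said ""hi""; then left"\n'

reader = csv.reader(
    io.StringIO(data),
    delimiter=";",     # analogous to DELIMITER
    quotechar='"',
    doublequote=True,  # a doubled quote escapes a quote inside a quoted value
)
rows = list(reader)

print(rows[0])  # ['id', 'comment']
print(rows[1])  # ['1', 'She said "hi"; then left']
```

The delimiter that appears inside the quoted value is kept as part of the value rather than starting a new column.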
NULL_VALUE
Type: text
(Optional) Values in the CSV that match the provided value are interpreted as null.
MAX_COLUMNS
Type: integer
(Optional) The number of columns to allocate when reading a row. Note that larger values may perform poorly.
ALLOW_DUPLICATE_HEADERS
Type: Boolean
Default: false
(Optional) When true, repeat headers are allowed. Numeric suffixes are added for disambiguation.
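One possible way to add numeric suffixes to repeated header names is sketched below. The exact suffix format (`_1`, `_2`, ...) and the disambiguate helper are assumptions for illustration. The source only states that numeric suffixes are added.

```python
def disambiguate(headers: list[str]) -> list[str]:
    """Append a numeric suffix to the second and later occurrences of a name."""
    seen: dict[str, int] = {}
    result = []
    for name in headers:
        count = seen.get(name, 0)
        seen[name] = count + 1
        # The first occurrence keeps its name; repeats get _1, _2, ...
        result.append(name if count == 0 else f"{name}_{count}")
    return result


print(disambiguate(["id", "value", "value", "id", "value"]))
# ['id', 'value', 'value_1', 'id_1', 'value_2']
```

This simple version does not guard against a generated name colliding with a header that already exists in the file, such as a literal `value_1` column.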
TSV
INFER_TYPES
Type: Boolean
(Optional) When true, each column's data type is inferred as one of the following types: string, integer, double, Boolean.
When false, all data is treated as a string.
HEADER
Type: string
Default: Empty string
(Optional) A string containing a comma-separated list of column names.
When the TSV data includes a header as the first row, the HEADER property can be omitted. Omitting this property tells Upsolver that a header row can be found in the data, and it takes the following actions:
Use the first row for column names
Skip the first row when processing the data
If the source data does not include a header as the first row, meaning the first row contains actual data, you must include the HEADER property when creating a job. This tells Upsolver to take the following actions:
Use the provided HEADER property for column names
Do not skip the first row since it contains data
If your data does not include a header row and you do not set a HEADER property when creating the job, Upsolver assumes the first row is a header and does not process it as data.
HEADER_LINE
Type: string
Default: Empty string
(Optional) A string containing a comma-separated list of header names. This is an alternative to HEADER.
NULL_VALUE
Type: text
(Optional) Values in the TSV that match the provided value are interpreted as null.
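A minimal sketch of null-value matching on tab-separated input follows. The read_tsv helper, its null_value parameter, and the sample marker `N/A` are hypothetical. Exact, case-sensitive matching is an assumption.

```python
import csv
import io
from typing import Optional


def read_tsv(data: str, null_value: Optional[str] = None) -> list[list[Optional[str]]]:
    """Parse tab-separated text, replacing cells equal to null_value with None."""
    reader = csv.reader(io.StringIO(data), delimiter="\t")
    return [
        [None if null_value is not None and cell == null_value else cell for cell in row]
        for row in reader
    ]


data = "id\tcity\n1\tN/A\n2\tOslo\n"

print(read_tsv(data, null_value="N/A"))
# [['id', 'city'], ['1', None], ['2', 'Oslo']]
print(read_tsv(data))
# [['id', 'city'], ['1', 'N/A'], ['2', 'Oslo']]
```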
MAX_COLUMNS
Type: integer
(Optional) The number of columns to allocate when reading a row. Note that larger values may perform poorly.
ALLOW_DUPLICATE_HEADERS
Type: Boolean
Default: false
(Optional) When true, repeat headers are allowed. Numeric suffixes are added for disambiguation.
JSON
SPLIT_ROOT_ARRAY
Type: Boolean
Default: true
(Optional) When true, a root object that is an array is parsed as separate events. When false, it is parsed as a single event that contains only an array.
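The difference between the two settings can be sketched with Python's json module. The to_events function and its split_root_array parameter are illustrative names, not part of any Upsolver API.

```python
import json


def to_events(payload: str, split_root_array: bool = True) -> list:
    """Turn one JSON payload into a list of events."""
    root = json.loads(payload)
    if isinstance(root, list) and split_root_array:
        return root  # each array element becomes its own event
    return [root]    # the whole root, array or not, is a single event


payload = '[{"id": 1}, {"id": 2}]'

print(to_events(payload, split_root_array=True))
# [{'id': 1}, {'id': 2}]
print(to_events(payload, split_root_array=False))
# [[{'id': 1}, {'id': 2}]]
```

A payload whose root is an object rather than an array yields a single event under either setting.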
STORE_JSON_AS_STRING
Type: Boolean
Default: false
(Optional) When true, a copy of the original JSON is stored as a string value in an additional column.
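As a small sketch of keeping the raw payload next to the parsed fields, the example below adds the original text under an extra key. The parse_event helper and the column name raw_json are hypothetical, since the source does not name the additional column. The sketch also assumes the root of the payload is a JSON object.

```python
import json


def parse_event(payload: str, store_json_as_string: bool = False) -> dict:
    """Parse a JSON object and optionally keep the raw text in an extra column."""
    event = json.loads(payload)
    if store_json_as_string:
        # 'raw_json' is a made-up column name used only for this example.
        event = {**event, "raw_json": payload}
    return event


payload = '{"id": 1}'

print(parse_event(payload))
# {'id': 1}
print(parse_event(payload, store_json_as_string=True))
# {'id': 1, 'raw_json': '{"id": 1}'}
```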
AVRO_SCHEMA_REGISTRY
Note that only Avro schemas are currently supported.
SCHEMA_REGISTRY_URL
Type: text
Avro schema registry URL. To support schema evolution, add {id} to the URL and Upsolver will embed the id from the Avro header.
For example, https://schema-registry.service.yourdomain.com/schemas/ids/{id}
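To illustrate how an id taken from a message header could be substituted into the URL template, the sketch below assumes the Confluent wire format, in which each message starts with a zero magic byte followed by a 4-byte big-endian schema id. Whether Upsolver reads the id in exactly this way is an assumption, and schema_url is a hypothetical helper.

```python
import struct


def schema_url(template: str, message: bytes) -> str:
    """Fill {id} in the template with the schema id read from the message header."""
    # Assumed framing: 1 magic byte (0) followed by a 4-byte big-endian schema id.
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != 0:
        raise ValueError("unexpected magic byte")
    return template.replace("{id}", str(schema_id))


template = "https://schema-registry.service.yourdomain.com/schemas/ids/{id}"
message = b"\x00" + (42).to_bytes(4, "big") + b"avro-payload"

print(schema_url(template, message))
# https://schema-registry.service.yourdomain.com/schemas/ids/42
```

Because the id travels with every message, records written with different schema versions can each be resolved against the registry.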
FIXED_WIDTH
COLUMNS
Type: list
(Optional) An array of the name, start index, and end index for each column in the file.
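A column specification of this shape can be applied by slicing each line, as sketched below. The index convention used here (zero-based start, exclusive end), the trimming of padding, and the parse_fixed_width helper are assumptions for the example. The source does not state how Upsolver counts indexes.

```python
def parse_fixed_width(line: str, columns: list[tuple[str, int, int]]) -> dict[str, str]:
    """Slice one line into named fields using (name, start, end) triples."""
    # Assumed convention: zero-based start, exclusive end, padding stripped.
    return {name: line[start:end].strip() for name, start, end in columns}


columns = [("id", 0, 4), ("name", 4, 14), ("country", 14, 16)]
line = "0042" + "Ann".ljust(10) + "NO"  # 4 + 10 + 2 characters

print(parse_fixed_width(line, columns))
# {'id': '0042', 'name': 'Ann', 'country': 'NO'}
```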
INFER_TYPES
Type: Boolean
Default: false
(Optional) When true, each column's data type is inferred. When false, all data is treated as a string.
REGEX
See Java Pattern for more information.
PATTERN
Type: text
(Optional) The pattern to match against the input. Named groups are extracted from the data.
MULTILINE
Type: Boolean
Default: false
(Optional) When true, the pattern is matched against the whole input. When false, it is matched against each line of the input.
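The effect of matching per line versus against the whole input can be sketched with a pattern that spans two lines. Note that Python's re module writes named groups as `(?P<name>...)`, whereas Java's Pattern syntax uses `(?<name>...)`. The extract function, the sample data, and the choice to emit one event per match are assumptions for illustration.

```python
import re

# A record spans two lines, so the pattern contains a newline.
pattern = re.compile(r"id=(?P<id>\d+)\nname=(?P<name>\w+)")

data = "id=1\nname=Ann\nid=2\nname=Bob\n"


def extract(text: str, multiline: bool) -> list[dict[str, str]]:
    """Return the named groups of every match, per line or over the whole input."""
    if multiline:
        # Whole input: the pattern may cross line boundaries.
        return [m.groupdict() for m in pattern.finditer(text)]
    events = []
    for line in text.splitlines():
        m = pattern.search(line)
        if m:
            events.append(m.groupdict())
    return events


print(extract(data, multiline=True))
# [{'id': '1', 'name': 'Ann'}, {'id': '2', 'name': 'Bob'}]
print(extract(data, multiline=False))
# []
```

Matched line by line, the two-line pattern never sees both halves of a record, so nothing is extracted.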
INFER_TYPES
Type: Boolean
Default: false
(Optional) When true, each column's data type is inferred. When false, all data is treated as a string.
SPLIT_LINES
PATTERN
Type: text
(Optional) A regular expression pattern to split the data by. If left empty, the data is split by lines.
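Splitting by a pattern versus splitting by lines can be sketched as follows. The split_events helper is hypothetical, and dropping empty segments is an assumption made to keep the output tidy.

```python
import re
from typing import Optional


def split_events(data: str, pattern: Optional[str] = None) -> list[str]:
    """Split data by a regular expression, or by lines when no pattern is given."""
    parts = re.split(pattern, data) if pattern else data.splitlines()
    return [p for p in parts if p]  # assumed: empty segments are discarded


print(split_events("a\nb\nc\n"))
# ['a', 'b', 'c']
print(split_events("a; b;c", r";\s*"))
# ['a', 'b', 'c']
```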
XML
STORE_ROOT_AS_STRING
Type: Boolean
Default: false
(Optional) When true, a copy of the XML is stored as a string in an additional column.