Content type options

When reading in your data, additional options can be configured for the following content types:

CSV

CONTENT_TYPE = (
TYPE = CSV
INFER_TYPES = { TRUE | FALSE }
[ HEADER = '<header>' ]
[ DELIMITER = '<delimiter>' ]
[ QUOTE_ESCAPE_CHAR = '<char>' ]
[ NULL_VALUE = '<null_value>' ]
[ MAX_COLUMNS = <integer> ]
[ ALLOW_DUPLICATE_HEADERS = { TRUE | FALSE } ]
)

INFER_TYPES

Type: boolean
(Optional) When true, each column's data type is inferred as one of the following: string, integer, double, or boolean. When false, all data is treated as a string.

HEADER

Type: string
Default: Empty string
(Optional) A string containing a comma-separated list of column names.
When the CSV data includes a header as the first row, the HEADER property can be omitted. Omitting this property tells SQLake that a header row can be found in the data, and it takes the following actions:
  1. Use the first row for column names
  2. Skip the first row when processing the data
If the source data does not include a header as the first row, meaning the first row contains actual data, you must include the HEADER property when creating a job. This tells SQLake to take the following actions:
  1. Use the provided HEADER property for column names
  2. Do not skip the first row, since it contains data
If your data does not include a header row and you do not set the HEADER property when creating the job, SQLake assumes the first row is a header and does not process it.

DELIMITER

Type: text
Default: ,
(Optional) The delimiter used to separate columns in the CSV file.

QUOTE_ESCAPE_CHAR

Type: text
Default: "
(Optional) Defines the character used for escaping quotes inside an already quoted value.

NULL_VALUE

Type: text
(Optional) Values in the CSV that match the provided value are interpreted as null.

MAX_COLUMNS

Type: integer
(Optional) The maximum number of columns to allocate when reading a row. Note that larger values may degrade performance.

ALLOW_DUPLICATE_HEADERS

Type: boolean
Default: false
(Optional) When true, duplicate header names are allowed; numeric suffixes are appended to disambiguate them.
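
As an illustration, a pipe-delimited CSV source without a header row might be configured as follows (the column names and values here are hypothetical):

CONTENT_TYPE = (
TYPE = CSV
INFER_TYPES = TRUE
HEADER = 'order_id,customer,total'
DELIMITER = '|'
NULL_VALUE = 'NULL'
)

Because HEADER is provided, SQLake uses these names for the columns and processes the first row as data.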

TSV

CONTENT_TYPE = (
TYPE = TSV
INFER_TYPES = { TRUE | FALSE }
[ HEADER = '<header>' ]
[ NULL_VALUE = '<null_value>' ]
[ MAX_COLUMNS = <integer> ]
[ ALLOW_DUPLICATE_HEADERS = { TRUE | FALSE } ]
)

INFER_TYPES

Type: boolean
(Optional) When true, each column's data type is inferred as one of the following: string, integer, double, or boolean. When false, all data is treated as a string.

HEADER

Type: string
Default: Empty string
(Optional) A string containing a comma-separated list of column names.
When the TSV data includes a header as the first row, the HEADER property can be omitted. Omitting this property tells SQLake that a header row can be found in the data, and it takes the following actions:
  1. Use the first row for column names
  2. Skip the first row when processing the data
If the source data does not include a header as the first row, meaning the first row contains actual data, you must include the HEADER property when creating a job. This tells SQLake to take the following actions:
  1. Use the provided HEADER property for column names
  2. Do not skip the first row, since it contains data
If your data does not include a header row and you do not set the HEADER property when creating the job, SQLake assumes the first row is a header and does not process it.

NULL_VALUE

Type: text
(Optional) Values in the TSV that match the provided value are interpreted as null.

MAX_COLUMNS

Type: integer
(Optional) The maximum number of columns to allocate when reading a row. Note that larger values may degrade performance.

ALLOW_DUPLICATE_HEADERS

Type: boolean
Default: false
(Optional) When true, duplicate header names are allowed; numeric suffixes are appended to disambiguate them.
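
For example, a TSV source whose first row is a header and that uses \N to represent nulls might be configured as follows (values illustrative):

CONTENT_TYPE = (
TYPE = TSV
INFER_TYPES = TRUE
NULL_VALUE = '\N'
)

HEADER is omitted, so SQLake takes the column names from the first row and skips it when processing the data.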

JSON

CONTENT_TYPE = (
TYPE = JSON
[ SPLIT_ROOT_ARRAY = { TRUE | FALSE } ]
[ STORE_JSON_AS_STRING = { TRUE | FALSE } ]
)

SPLIT_ROOT_ARRAY

Type: boolean
Default: true
(Optional) When true, a root object that is an array is parsed as separate events. When false, it is parsed as a single event which contains only an array.

STORE_JSON_AS_STRING

Type: boolean
Default: false
(Optional) When true, a copy of the original JSON is stored as a string value in an additional column.
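
For example, to parse each element of a root-level array as a separate event while also keeping a string copy of the original JSON (SPLIT_ROOT_ARRAY is shown explicitly here even though TRUE is its default):

CONTENT_TYPE = (
TYPE = JSON
SPLIT_ROOT_ARRAY = TRUE
STORE_JSON_AS_STRING = TRUE
)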

SCHEMA_REGISTRY

Note that only Avro schemas are currently supported.
CONTENT_TYPE = (
TYPE = SCHEMA_REGISTRY
SCHEMA_REGISTRY_URL = '<url>'
)

SCHEMA_REGISTRY_URL

Type: text
The URL of the Avro schema registry. To support schema evolution, add {id} to the URL and Upsolver embeds the schema ID from the Avro header. For example: https://schema-registry.service.yourdomain.com/schemas/ids/{id}
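
Putting this together, a configuration that supports schema evolution looks like the following (the registry host is a placeholder for your own):

CONTENT_TYPE = (
TYPE = SCHEMA_REGISTRY
SCHEMA_REGISTRY_URL = 'https://schema-registry.service.yourdomain.com/schemas/ids/{id}'
)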

FIXED_WIDTH

CONTENT_TYPE = (
TYPE = FIXED_WIDTH
[ COLUMNS = ( (COLUMN_NAME = '<column_name>'
START_INDEX = <integer>
END_INDEX = <integer>) [,...] ) ]
[ INFER_TYPES = { TRUE | FALSE } ]
)

COLUMNS

Type: list
(Optional) An array of the name, start index, and end index for each column in the file.

INFER_TYPES

Type: boolean
Default: false
(Optional) When true, each column's data types are inferred. When false, all data is treated as a string.
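
For example, a fixed-width file whose rows contain a 10-character ID followed by a 20-character name might be configured as follows (the column names and index values are hypothetical; verify the indexing convention against your own data):

CONTENT_TYPE = (
TYPE = FIXED_WIDTH
COLUMNS = (
(COLUMN_NAME = 'id'
START_INDEX = 0
END_INDEX = 10),
(COLUMN_NAME = 'name'
START_INDEX = 10
END_INDEX = 30)
)
INFER_TYPES = TRUE
)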

REGEX

The pattern syntax follows Java's java.util.regex.Pattern.
CONTENT_TYPE = (
TYPE = REGEX
[ PATTERN = '<pattern>' ]
[ MULTILINE = { TRUE | FALSE } ]
[ INFER_TYPES = { TRUE | FALSE } ]
)

PATTERN

Type: text
(Optional) The pattern to match against the input. Named groups are extracted from the data.

MULTILINE

Type: boolean
Default: false
(Optional) When true, the pattern is matched against the whole input. When false, it is matched against each line of the input.

INFER_TYPES

Type: boolean
Default: false
(Optional) When true, each column's data types are inferred. When false, all data is treated as a string.
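
For example, to extract level and message fields from simple log lines using Java named capture groups (the pattern and group names here are illustrative):

CONTENT_TYPE = (
TYPE = REGEX
PATTERN = '(?<level>\w+): (?<message>.*)'
MULTILINE = FALSE
INFER_TYPES = FALSE
)

Each named group in the pattern is extracted as a column in the output.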

SPLIT_LINES

CONTENT_TYPE = (
TYPE = SPLIT_LINES
[ PATTERN = '<pattern>' ]
)

PATTERN

Type: text
(Optional) A regular expression pattern to split the data by. If left empty, the data is split by lines.
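
For example, to split the input on blank lines rather than on single newlines (the pattern is illustrative):

CONTENT_TYPE = (
TYPE = SPLIT_LINES
PATTERN = '\n\n'
)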

XML

CONTENT_TYPE = (
TYPE = XML
[ STORE_ROOT_AS_STRING = { TRUE | FALSE } ]
)

STORE_ROOT_AS_STRING

Type: boolean
Default: false
(Optional) When true, a copy of the XML is stored as a string in an additional column.
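
For example, to keep a string copy of each original XML document alongside the parsed columns:

CONTENT_TYPE = (
TYPE = XML
STORE_ROOT_AS_STRING = TRUE
)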