Content types
This page describes the job options for ingesting data from different types of content.
When reading in your data, additional options can be configured for the following content types:
CSV
CONTENT_TYPE = (
TYPE = CSV
INFER_TYPES = { TRUE | FALSE }
[ HEADER = '<header>' ]
[ HEADER_LINE = '<header>,<header>,...' ]
[ DELIMITER = '<delimiter>' ]
[ QUOTE_ESCAPE_CHAR = '<char>' ]
[ NULL_VALUE = '<null_value>' ]
[ MAX_COLUMNS = <integer> ]
[ ALLOW_DUPLICATE_HEADERS = { TRUE | FALSE } ]
)
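For example, a job ingesting CSV files might look like the following sketch (the connection, bucket, and table names are placeholders):

```sql
CREATE SYNC JOB load_orders_csv
    CONTENT_TYPE = (
        TYPE = CSV
        INFER_TYPES = TRUE
        DELIMITER = ','
        NULL_VALUE = 'NULL'
    )
    AS COPY FROM S3 my_s3_connection
        LOCATION = 's3://my-bucket/orders/'
    INTO my_catalog.my_schema.orders_raw;
```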
INFER_TYPES
Type: boolean
(Optional) When true, each column's data type is inferred as one of the following types: string, integer, double, or boolean. When false, all data is treated as a string.

HEADER
Type: string
Default: Empty string
(Optional) A string containing a comma-separated list of column names.
When the CSV data includes a header as the first row, the HEADER property can be omitted. Omitting this property tells SQLake that a header row can be found in the data, and it takes the following actions:
1. Uses the first row for column names
2. Skips the first row when processing the data
If the source data does not include a header as the first row, meaning the first row contains actual data, you must include the HEADER property when creating the job. This tells SQLake to take the following actions:
1. Use the provided HEADER property for column names
2. Do not skip the first row, since it contains data
If your data does not include a header row and you do not set a HEADER property when creating the job, SQLake assumes the first row is a header and does not process it.

HEADER_LINE
Type: string
Default: Empty string
(Optional) A string containing a comma-separated list of header names. This is an alternative to HEADER.

DELIMITER
Type: text
Default: ,
(Optional) The delimiter used for columns in the CSV file.

QUOTE_ESCAPE_CHAR
Type: text
Default: "
(Optional) Defines the character used for escaping quotes inside an already quoted value.

NULL_VALUE
Type: text
(Optional) Values in the CSV that match the provided value are interpreted as null.

MAX_COLUMNS
Type: integer
(Optional) The number of columns to allocate when reading a row. Note that larger values may perform poorly.

ALLOW_DUPLICATE_HEADERS
Type: boolean
Default: false
(Optional) When true, repeat headers are allowed. Numeric suffixes are added for disambiguation.

TSV
CONTENT_TYPE = (
TYPE = TSV
INFER_TYPES = { TRUE | FALSE }
[ HEADER = '<header>' ]
[ HEADER_LINE = '<header>,<header>,...' ]
[ NULL_VALUE = '<null_value>' ]
[ MAX_COLUMNS = <integer> ]
[ ALLOW_DUPLICATE_HEADERS = { TRUE | FALSE } ]
)
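A sketch of a TSV ingestion job where the files have no header row, so column names are supplied explicitly (the connection, location, table, and column names are placeholders):

```sql
CREATE SYNC JOB load_events_tsv
    CONTENT_TYPE = (
        TYPE = TSV
        INFER_TYPES = TRUE
        HEADER = 'event_time,user_id,action'
    )
    AS COPY FROM S3 my_s3_connection
        LOCATION = 's3://my-bucket/events/'
    INTO my_catalog.my_schema.events_raw;
```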
INFER_TYPES
Type: boolean
(Optional) When true, each column's data type is inferred as one of the following types: string, integer, double, or boolean. When false, all data is treated as a string.

HEADER
Type: string
Default: Empty string
(Optional) A string containing a comma-separated list of column names.
When the TSV data includes a header as the first row, the HEADER property can be omitted. Omitting this property tells SQLake that a header row can be found in the data, and it takes the following actions:
1. Uses the first row for column names
2. Skips the first row when processing the data
If the source data does not include a header as the first row, meaning the first row contains actual data, you must include the HEADER property when creating the job. This tells SQLake to take the following actions:
1. Use the provided HEADER property for column names
2. Do not skip the first row, since it contains data
If your data does not include a header row and you do not set a HEADER property when creating the job, SQLake assumes the first row is a header and does not process it.

HEADER_LINE
Type: string
Default: Empty string
(Optional) A string containing a comma-separated list of header names. This is an alternative to HEADER.

NULL_VALUE
Type: text
(Optional) Values in the TSV that match the provided value are interpreted as null.

MAX_COLUMNS
Type: integer
(Optional) The number of columns to allocate when reading a row. Note that larger values may perform poorly.

ALLOW_DUPLICATE_HEADERS
Type: boolean
Default: false
(Optional) When true, repeat headers are allowed. Numeric suffixes are added for disambiguation.

JSON
CONTENT_TYPE = (
TYPE = JSON
[ SPLIT_ROOT_ARRAY = { TRUE | FALSE } ]
[ STORE_JSON_AS_STRING = { TRUE | FALSE } ]
)
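A sketch of a JSON ingestion job that splits a root-level array into separate events (the connection, location, and table names are placeholders):

```sql
CREATE SYNC JOB load_clicks_json
    CONTENT_TYPE = (
        TYPE = JSON
        SPLIT_ROOT_ARRAY = TRUE
        STORE_JSON_AS_STRING = FALSE
    )
    AS COPY FROM S3 my_s3_connection
        LOCATION = 's3://my-bucket/clicks/'
    INTO my_catalog.my_schema.clicks_raw;
```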
SPLIT_ROOT_ARRAY
Type: boolean
Default: true
(Optional) When true, a root object that is an array is parsed as separate events. When false, it is parsed as a single event that contains only an array.

STORE_JSON_AS_STRING
Type: boolean
Default: false
(Optional) When true, a copy of the original JSON is stored as a string value in an additional column.

SCHEMA_REGISTRY
Note that only Avro schemas are currently supported.
CONTENT_TYPE = (
TYPE = SCHEMA_REGISTRY
SCHEMA_REGISTRY_URL = '<url>'
)
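A sketch of a job reading Avro data with a schema registry; here the source is a Kafka topic, and the connection, topic, table names, and registry domain are placeholders:

```sql
CREATE SYNC JOB load_avro_from_kafka
    CONTENT_TYPE = (
        TYPE = SCHEMA_REGISTRY
        SCHEMA_REGISTRY_URL = 'https://schema-registry.service.yourdomain.com/schemas/ids/{id}'
    )
    AS COPY FROM KAFKA my_kafka_connection
        TOPIC = 'orders'
    INTO my_catalog.my_schema.orders_raw;
```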
SCHEMA_REGISTRY_URL
Type: text
The Avro schema registry URL. To support schema evolution, add {id} to the URL, and Upsolver will embed the ID from the Avro header. For example: https://schema-registry.service.yourdomain.com/schemas/ids/{id}
FIXED_WIDTH
CONTENT_TYPE = (
TYPE = FIXED_WIDTH
[ COLUMNS = ( (COLUMN_NAME = '<column_name>'
START_INDEX = <integer>
END_INDEX = <integer>) [,...] ) ]
[ INFER_TYPES = { TRUE | FALSE } ]
)
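A sketch of a fixed-width ingestion job; the column names and index ranges below are illustrative, as are the connection, location, and table names:

```sql
CREATE SYNC JOB load_fixed_width
    CONTENT_TYPE = (
        TYPE = FIXED_WIDTH
        COLUMNS = (
            (COLUMN_NAME = 'id'   START_INDEX = 0  END_INDEX = 9),
            (COLUMN_NAME = 'name' START_INDEX = 10 END_INDEX = 39)
        )
        INFER_TYPES = TRUE
    )
    AS COPY FROM S3 my_s3_connection
        LOCATION = 's3://my-bucket/legacy/'
    INTO my_catalog.my_schema.legacy_raw;
```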
COLUMNS
Type: list
(Optional) An array of the name, start index, and end index for each column in the file.
INFER_TYPES
Type: boolean
Default: false
(Optional) When true, each column's data type is inferred. When false, all data is treated as a string.

REGEX
CONTENT_TYPE = (
TYPE = REGEX
[ PATTERN = '<pattern>' ]
[ MULTILINE = { TRUE | FALSE } ]
[ INFER_TYPES = { TRUE | FALSE } ]
)
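A sketch of a job that extracts columns from log lines using named groups; the pattern below would produce `level` and `message` columns, and the connection, location, and table names are placeholders:

```sql
CREATE SYNC JOB load_app_logs
    CONTENT_TYPE = (
        TYPE = REGEX
        PATTERN = '(?<level>[A-Z]+) (?<message>.*)'
        MULTILINE = FALSE
        INFER_TYPES = FALSE
    )
    AS COPY FROM S3 my_s3_connection
        LOCATION = 's3://my-bucket/logs/'
    INTO my_catalog.my_schema.logs_raw;
```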
PATTERN
Type: text
(Optional) The pattern to match against the input. Named groups are extracted from the data.
MULTILINE
Type: boolean
Default: false
(Optional) When true, the pattern is matched against the whole input. When false, it is matched against each line of the input.

INFER_TYPES
Type: boolean
Default: false
(Optional) When true, each column's data type is inferred. When false, all data is treated as a string.

SPLIT_LINES
CONTENT_TYPE = (
TYPE = SPLIT_LINES
PATTERN = '<pattern>'
)
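A sketch of a job that splits input on blank lines rather than single newlines; the pattern shown is only an example, and the connection, location, and table names are placeholders:

```sql
CREATE SYNC JOB load_multiline_records
    CONTENT_TYPE = (
        TYPE = SPLIT_LINES
        PATTERN = '\n\n'
    )
    AS COPY FROM S3 my_s3_connection
        LOCATION = 's3://my-bucket/records/'
    INTO my_catalog.my_schema.records_raw;
```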
PATTERN
Type: text
(Optional) A regular expression pattern to split the data by. If left empty, the data is split by lines.
XML
CONTENT_TYPE = (
TYPE = XML
[ STORE_ROOT_AS_STRING = { TRUE | FALSE } ]
)
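A sketch of an XML ingestion job that also keeps the original document as a string column (the connection, location, and table names are placeholders):

```sql
CREATE SYNC JOB load_invoices_xml
    CONTENT_TYPE = (
        TYPE = XML
        STORE_ROOT_AS_STRING = TRUE
    )
    AS COPY FROM S3 my_s3_connection
        LOCATION = 's3://my-bucket/invoices/'
    INTO my_catalog.my_schema.invoices_raw;
```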
STORE_ROOT_AS_STRING
Type: boolean
Default: false
(Optional) When true, a copy of the XML is stored as a string in an additional column.