Content type options
When reading in your data, additional options can be configured for the following content types:
CONTENT_TYPE = (
TYPE = CSV
INFER_TYPES = { TRUE | FALSE }
[ HEADER = '<header>' ]
[ DELIMITER = '<delimiter>' ]
[ QUOTE_ESCAPE_CHAR = '<char>' ]
[ NULL_VALUE = '<null_value>' ]
[ MAX_COLUMNS = <integer> ]
[ ALLOW_DUPLICATE_HEADERS = { TRUE | FALSE } ]
)
Type:
boolean
(Optional) When
true
, each column's data types are inferred as one of the following types: string
, integer
, double
, boolean
.When
false
, all data is treated as a string.Type:
string
Default: Empty string
(Optional) A string containing a comma separated list of column names.
When the CSV data include a header as the first row,
HEADER
property can be omitted. By omitting this property, it tells SQLake that a header row can be found in the data and it will take the following actions:- 1.Use the first row for column names
- 2.Skip the first row when processing the data
If the source data does not include a header as the first row, meaning the first row contains actual data, you must include the
HEADER
property when creating a JOB
. This tells SQLake to take the following actions:- 1.Use the provided
HEADER
property for column names - 2.Do not skip the first row since it contains data
If your data does not include a header row and you do not set a
HEADER
property when creating the job, SQLake will assume the first row is a header and not process it.Type:
text
Default:
,
(Optional) The delimiter used for columns in the CSV file
Type:
text
Default:
"
(Optional) Defines the character used for escaping quotes inside an already quoted value.
Type:
text
(Optional) Values in the CSV that match the provided value are interpreted as null.
Type:
integer
(Optional) The number of columns to allocate when reading a row. Note that larger values may perform poorly.
Type:
boolean
Default:
false
(Optional) When
true
, repeat headers are allowed. Numeric suffixes are added for disambiguation.CONTENT_TYPE = (
TYPE = TSV
INFER_TYPES = { TRUE | FALSE }
[ HEADER = '<header>' ]
[ NULL_VALUE = '<null_value>' ]
[ MAX_COLUMNS = <integer> ]
[ ALLOW_DUPLICATE_HEADERS = { TRUE | FALSE } ]
)
Type:
boolean
(Optional) When
true
, each column's data types are inferred as one of the following types: string
, integer
, double
, boolean
.When
false
, all data is treated as a string.Type:
string
Default: Empty string
(Optional) A string containing a comma separated list of column names.
When the TSV data include a header as the first row,
HEADER
property can be omitted. By omitting this property, it tells SQLake that a header row can be found in the data and it will take the following actions:- 1.Use the first row for column names
- 2.Skip the first row when processing the data
If the source data does not include a header as the first row, meaning the first row contains actual data, you must include the
HEADER
property when creating a JOB
. This tells SQLake to take the following actions:- 1.Use the provided
HEADER
property for column names - 2.Do not skip the first row since it contains data
If your data does not include a header row and you do not set a
HEADER
property when creating the job, SQLake will assume the first row is a header and not process it.Type:
text
(Optional) Values in the TSV that match the provided value are interpreted as null.
Type:
integer
(Optional) The number of columns to allocate when reading a row. Note that larger values may perform poorly.
Type:
boolean
Default:
false
(Optional) When
true
, repeat headers are allowed. Numeric suffixes are added for disambiguation.CONTENT_TYPE = (
TYPE = JSON
[ SPLIT_ROOT_ARRAY = { TRUE | FALSE } ]
[ STORE_JSON_AS_STRING = { TRUE | FALSE } ]
)
Type:
boolean
Default:
true
(Optional) When
true
, a root object that is an array is parsed as separate events. When false
, it is parsed as a single event which contains only an array.Type:
boolean
Default:
false
(Optional) When
true
, a copy of the original JSON is stored as a string value in an additional column.Note that only Avro schemas are currently supported.
CONTENT_TYPE = (
TYPE = SCHEMA_REGISTRY
SCHEMA_REGISTRY_URL = '<url>'
)
Type:
text
Avro schema registry URL. To support schema evolution add {id} to the URL and Upsolver will embed the id from the AVRO header. For example, https://schema-registry.service.yourdomain.com/schemas/ids/{id}
CONTENT_TYPE = (
TYPE = FIXED_WIDTH
[ COLUMNS = ( (COLUMN_NAME = '<column_name>'
START_INDEX = <integer>
END_INDEX = <integer>) [,...] ) ]
[ INFER_TYPES = { TRUE | FALSE } ]
)
Type:
list
(Optional) An array of the name, start index, and end index for each column in the file.
Type:
boolean
Default:
false
(Optional) When
true
, each column's data types are inferred. When false
, all data is treated as a string.CONTENT_TYPE = (
TYPE = REGEX
[ PATTERN = '<pattern>' ]
[ MULTILINE = { TRUE | FALSE } ]
[ INFER_TYPES = { TRUE | FALSE } ]
)
Type:
text
(Optional) The pattern to match against the input. Named groups are extracted from the data.
Type:
boolean
Default:
false
(Optional) When
true
, the pattern is matched against the whole input. When false
, it is matched against each line of the input.Type:
boolean
Default:
false
(Optional) When
true
, each column's data types are inferred. When false
, all data is treated as a string.CONTENT_TYPE = (
TYPE = SPLIT_LINES
PATTERN = '<pattern>'
)
Type:
text
(Optional) A regular expression pattern to split the data by. If left empty, the data is split by lines.
CONTENT_TYPE = (
TYPE = XML
[ STORE_ROOT_AS_STRING = { TRUE | FALSE } ]
)
Type:
boolean
Default:
false
(Optional) When
true
, a copy of the XML is stored as a string in an additional column.Last modified 2d ago