Data source properties
This article provides a list of data source properties and their corresponding descriptions.
Note: Properties listed below may not apply to all data sources.
You can modify certain properties by clicking on the pencil icon.
| Property | Description |
| --- | --- |
| Content Format | The content format of the ingested data source (Avro, Parquet, ORC, JSON, CSV, TSV, x-www-form-urlencoded, Protobuf, Avro-record, Avro with Schema Registry, or XML). |
| Compute Cluster | The compute cluster this data source runs on. |
| Target Storage | Where the ingested data is stored (the output storage). |
| Retention | The retention policy in Upsolver, in minutes, hours, or days. The data is deleted permanently after this period elapses (unless Soft Retention is set to Yes). By default, the data is kept forever. |
| Soft Retention | Prevents data deletion when the Upsolver retention policy takes effect. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted. |
| <Type> Connection | The connection string that represents a resource and its credentials. |
| Kafka Version | The Kafka version. |
| Kafka Topic | The Kafka topic. |
| Stream Name | The Kinesis stream name. |
| Read From Start | Whether the stream is read from the start. |
| Folder | The data folder (e.g. billing data). If not specified, the data is assumed to be at the top level of the hierarchy. |
| Date Pattern | The date pattern of the ingested files. |
| Start Ingestion From | When ingestion started. |
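The Folder and Date Pattern properties together determine where dated files are located. As a minimal sketch (not Upsolver's implementation), and assuming Java-style date tokens such as `yyyy/MM/dd`, the folder prefix scanned for a given day could be derived like this:

```python
from datetime import datetime

# Illustrative only: translate assumed Java-style date tokens into Python
# strftime codes, then build the dated folder prefix under the source folder.
TOKEN_MAP = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H"}

def date_prefix(folder: str, pattern: str, when: datetime) -> str:
    fmt = pattern
    for token, strftime_code in TOKEN_MAP.items():
        fmt = fmt.replace(token, strftime_code)
    return f"{folder.rstrip('/')}/{when.strftime(fmt)}"

print(date_prefix("billing", "yyyy/MM/dd", datetime(2021, 6, 3)))
# billing/2021/06/03
```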
| Property | Description |
| --- | --- |
| Infer Types | Whether to infer field types from the data. |
| Header | If applicable, the header of the file. |
| Delimiter | The delimiter between columns of data. |
| Null Value | If applicable, the default null value in the data. |
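To illustrate how the CSV options above interact, here is a minimal Python sketch (not Upsolver's implementation; the `|` delimiter and `\N` null marker are assumptions for the example). The delimiter splits columns, the header names the fields, and cells equal to the null value become missing values:

```python
import csv
import io

# Illustrative sample data: pipe-delimited with a header row and \N as null.
raw = "id|name|score\n1|alice|\\N\n2|bob|17\n"
NULL_VALUE = "\\N"  # assumed null marker for this example

reader = csv.DictReader(io.StringIO(raw), delimiter="|")
records = [
    {key: (None if value == NULL_VALUE else value) for key, value in row.items()}
    for row in reader
]
print(records)
# [{'id': '1', 'name': 'alice', 'score': None}, {'id': '2', 'name': 'bob', 'score': '17'}]
```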
| Property | Description |
| --- | --- |
| Split Root Array | If the content is an array of JSON records, select to split the array into separate records. |
| Store JSON as String | Whether to store the JSON in its original format in a separate field. |
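The Split Root Array behavior can be sketched as follows (a minimal illustration, assuming the input is a JSON array of records; each element becomes its own record rather than one record holding the whole array):

```python
import json

# Illustrative payload: a JSON array at the root.
payload = '[{"id": 1}, {"id": 2}, {"id": 3}]'

parsed = json.loads(payload)
# With "split root array" enabled, each array element is a separate record;
# a non-array root would yield a single record.
records = parsed if isinstance(parsed, list) else [parsed]
print(records)
# [{'id': 1}, {'id': 2}, {'id': 3}]
```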
| Property | Description |
| --- | --- |
| Real Time Statistics | Whether data source statistics are calculated in real time directly from the input stream. This is relevant for lookup tables, when answers must be available quickly and reflect the latest data. |
| Shards | The number of shards used to read the data; more shards mean faster processing. |
| Execution Parallelism | The number of independent shards used to parse the data, to increase parallelism and reduce latency. This should remain 1 in most cases. |
| End Read At | When to stop reading the stream. |
| Use SSL | Whether to use SSL. |
| Compression | The compression of the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None). |
| Store Raw Data | Whether to store an additional copy of the data in its original format. |
| Data Dump Date | The date the data starts. |
| Max Delay | The maximum delay after which data is no longer considered; any data that arrives delayed by more than the max delay is filtered out. |
| Interval | The sliding interval to wait for data. |
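The Max Delay filtering described above can be sketched as follows (an illustrative reading of the semantics, not Upsolver's code): an event is kept only when its arrival time lags its event time by no more than the configured delay.

```python
from datetime import datetime, timedelta

# Assumed configuration for this example: a one-hour max delay.
MAX_DELAY = timedelta(hours=1)

def keep(event_time: datetime, arrival_time: datetime) -> bool:
    """Keep the event only if it arrived within MAX_DELAY of its event time."""
    return arrival_time - event_time <= MAX_DELAY

now = datetime(2021, 6, 3, 12, 0)
print(keep(datetime(2021, 6, 3, 11, 30), now))  # True  (30 minutes late)
print(keep(datetime(2021, 6, 3, 10, 30), now))  # False (90 minutes late)
```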
| Property | Description |
| --- | --- |
| Prefix | The file prefix. |
| Pattern | The file pattern. |
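As an illustration of prefix and pattern filtering (a sketch assuming a glob-style pattern; the actual pattern syntax may differ): a file is ingested when its key starts with the prefix and matches the pattern.

```python
from fnmatch import fnmatch

# Assumed example values: the prefix and glob pattern are hypothetical.
PREFIX = "logs/"
PATTERN = "*.json.gz"

keys = ["logs/2021/06/03/a.json.gz", "logs/readme.txt", "other/b.json.gz"]
matches = [k for k in keys if k.startswith(PREFIX) and fnmatch(k, PATTERN)]
print(matches)
# ['logs/2021/06/03/a.json.gz']
```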
| Property | Description |
| --- | --- |
| File Name Pattern | The file name pattern. |
| Property | Description |
| --- | --- |
| Name | The data source name in Upsolver. |
| Description | A description of the data source. |
| Creation Time | The time this data source was created. |
| Created By | The user who created this data source. |
| Modified Time | If applicable, the time this data source was last modified. |
| Modified By | The user who last modified this data source. |
| Property | Description |
| --- | --- |
| Show Sparse Fields | Select to show fields with density under 0.01%. |
| Attached Workspaces | The workspaces attached to the data source. |