Data source properties
This article provides a list of data source properties and their corresponding descriptions.
Note: Properties listed below may not apply to all data sources.
You can modify certain properties by clicking the pencil icon.
Content format of ingested data source (Avro, Parquet, ORC, JSON, CSV, TSV, x-www-form-urlencoded, Protobuf, Avro-record, Avro with Schema Registry, or XML).
Compute cluster this data source is running on.
Where to store the data read (the output storage).
How long Upsolver retains the data, in minutes, hours, or days. The data is permanently deleted after this period elapses (unless Soft Retention is set to Yes). By default, the data is kept forever.
A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. S3 object) is not deleted.
The connection string that represents a resource and its credentials.
The Kafka version.
The Kafka topic.
The Kinesis stream name.
Read From Start
Whether the stream was read from the start.
The data folder (e.g. billing data). If this is not specified, the data is assumed to be at the top level of the hierarchy.
The date pattern of the files ingested.
Start Ingestion From
When ingestion started.
Whether or not to infer types.
If applicable, the header of the file.
The delimiter between columns of data.
If applicable, the default null value in the data.
Split Root Array
If the content is an array of JSON records, select to split the array content into separate records.
Store JSON as String
Whether to store the JSON in native format in a separate field.
Real Time Statistics
Whether the data source statistics are calculated in real time, directly from the input stream. This is relevant to lookup tables, where answers are required quickly and in real time.
The number of shards. More shards allow faster data processing.
The number of independent shards used to parse data, which increases parallelism and reduces latency. This should remain 1 in most cases.
End Read At
When to stop reading the stream.
Whether or not to use SSL.
The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).
Store Raw Data
Whether to store an additional copy of the data in its original format.
Data Dump Date
The date that the data starts.
The maximum delay allowed for incoming data. Any data that arrives delayed by more than the max delay is filtered out.
The sliding interval to wait for data.
The file prefix.
The file pattern.
File Name Pattern
The file name pattern.
Data source name in Upsolver.
Description of data source.
Time this data source was created.
Which user this data source was created by.
If applicable, time this data source was last modified.
Which user last modified this data source.
Show Sparse Fields
Select to show fields with density under 0.01%.
The workspaces attached to the data source.
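To make three of the behaviors described above more concrete (Split Root Array, the maximum delay filter, and Show Sparse Fields), here is a minimal Python sketch. It is not Upsolver's implementation. The function names are hypothetical, and the definition of density as the share of records that contain a field is an assumption made for this example.

```python
import json
from collections import Counter
from datetime import datetime, timedelta


def split_root_array(payload: str) -> list:
    """Split a JSON payload whose root is an array into separate records."""
    parsed = json.loads(payload)
    # A payload whose root is not an array is treated as a single record.
    return parsed if isinstance(parsed, list) else [parsed]


def within_max_delay(event_time: datetime, arrival_time: datetime,
                     max_delay: timedelta) -> bool:
    """Return True if the data arrived no more than max_delay after its event time."""
    return arrival_time - event_time <= max_delay


def sparse_fields(records: list, threshold: float = 0.0001) -> set:
    """Return the field names whose density is under the threshold.

    Assumption: density is the fraction of records that contain the field,
    so the default threshold of 0.0001 corresponds to 0.01%.
    """
    counts = Counter(key for record in records for key in record)
    total = len(records)
    return {name for name, seen in counts.items() if seen / total < threshold}
```

For example, `split_root_array('[{"id": 1}, {"id": 2}]')` yields two separate records, and `within_max_delay` returns `False` for data that arrives 30 minutes late when the maximum delay is 10 minutes, which is the case where the data would be filtered out.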