Data source properties
This article provides a list of data source properties and their corresponding descriptions.
Note: Properties listed below may not apply to all data sources.
You can modify certain properties by clicking the pencil icon next to them.
Property | Description |
Content Format | Content format of ingested data source (Avro, Parquet, ORC, JSON, CSV, TSV, x-www-form-urlencoded, Protobuf, Avro-record, Avro with Schema Registry, or XML). |
Compute Cluster | Compute cluster this data source is running on. |
Target Storage | Where to store the data read (the output storage). |
Retention | How long Upsolver retains the data, specified in minutes, hours, or days. The data is permanently deleted after this period elapses (unless Soft Retention is set to Yes). By default, the data is kept forever. |
Soft Retention | A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. S3 object) is not deleted. |
<Type> Connection | The connection string that represents a resource and its credentials. |
Kafka Version | The Kafka version. |
Kafka Topic | The Kafka topic. |
Stream Name | The Kinesis stream name. |
Read From Start | Whether the stream was read from the start. |
Folder | The data folder (e.g. billing data). If this is not specified, the data is assumed to be at the top level of the hierarchy. |
Date Pattern | The date pattern of the files ingested. |
Start Ingestion From | When ingestion started. |
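
The interaction between Retention and Soft Retention described above can be sketched as a small decision function. This is only an illustration of the documented behavior, not Upsolver's implementation; the function name `retention_action` and its return labels are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def retention_action(
    event_time: datetime,
    now: datetime,
    retention: Optional[timedelta],
    soft_retention: bool,
) -> str:
    """Return what happens to a piece of data under the described policy.

    Hypothetical helper: the labels "keep", "delete", and
    "purge_metadata_only" are illustrative, not Upsolver terms.
    """
    if retention is None:
        return "keep"  # default: data is kept forever
    if now - event_time <= retention:
        return "keep"  # still inside the retention period
    # Retention period has elapsed.
    if soft_retention:
        return "purge_metadata_only"  # underlying data (e.g. S3 object) stays
    return "delete"


now = datetime(2024, 1, 10)
print(retention_action(datetime(2024, 1, 1), now, timedelta(days=7), True))
```

With a 7-day retention and Soft Retention enabled, data from nine days ago has its metadata purged while the stored object is left in place.
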
Property | Description |
Infer Types | Whether or not to infer types. |
Header | If applicable, the header of the file. |
Delimiter | The delimiter between columns of data. |
Null Value | If applicable, the default null value in the data. |
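
To make the four properties above concrete, here is a minimal sketch of how a delimited file might be parsed with them, using Python's standard `csv` module. The function `parse_delimited` is hypothetical, and two behaviors are assumptions rather than documented facts: a supplied header means the file itself has no header row, and type inference only tries integers and floats.

```python
import csv
import io


def parse_delimited(text, delimiter=",", header=None, null_value=None, infer_types=False):
    """Parse delimited text into a list of dicts (illustrative only)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if header is None:
        # Assumption: with no explicit header, the first row names the columns.
        header, rows = rows[0], rows[1:]

    def convert(cell):
        if null_value is not None and cell == null_value:
            return None  # the configured null marker becomes a real null
        if infer_types:
            # Assumption: inference tries int, then float, else keeps the string.
            for cast in (int, float):
                try:
                    return cast(cell)
                except ValueError:
                    pass
        return cell

    return [dict(zip(header, (convert(c) for c in row))) for row in rows]


sample = "id\tprice\tnote\n1\t2.5\tNA\n"
print(parse_delimited(sample, delimiter="\t", null_value="NA", infer_types=True))
```
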
Property | Description |
Split Root Array | If the content is an array of JSON records, select to split the array content into separate records. |
Store JSON as String | Whether to store the JSON in native format in a separate field. |
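
The effect of Split Root Array can be illustrated with a short sketch. The helper `split_root_array` is hypothetical and only shows the general idea: when the root of the payload is a JSON array, each element becomes its own record.

```python
import json


def split_root_array(payload: str, split: bool = True) -> list:
    """Return the records contained in a JSON payload (illustrative only)."""
    parsed = json.loads(payload)
    if split and isinstance(parsed, list):
        return parsed  # one record per array element
    return [parsed]  # the whole payload is treated as a single record


payload = '[{"user": "a"}, {"user": "b"}]'
print(len(split_root_array(payload, split=True)))
print(len(split_root_array(payload, split=False)))
```
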
Property | Description |
Real Time Statistics | Whether the data source statistics are calculated in real time directly from the input stream. This is relevant to lookup tables, where answers are needed in real time with very low latency. |
Shards | The number of shards. More shards allow faster data processing. |
Execution Parallelism | The number of independent shards used to parse data. Increasing it raises parallelism and reduces latency. This should remain 1 in most cases. |
End Read At | When to stop reading the stream. |
Use SSL | Whether or not to use SSL. |
Compression | The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None). |
Store Raw Data | Whether to store an additional copy of the data in its original format. |
Data Dump Date | The date from which the data starts. |
Max Delay | The maximum delay allowed for incoming data. Any data that arrives later than the max delay is filtered out. |
Interval | The sliding interval to wait for data. |
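
The Max Delay rule above amounts to a filter on how late each event arrives. The sketch below shows one way to express it; the field names `event_time` and `arrival_time` and the function `filter_late` are assumptions made for illustration, and treating an event exactly at the limit as on time is also an assumption.

```python
from datetime import datetime, timedelta


def filter_late(events, max_delay: timedelta):
    """Keep only events whose arrival lag does not exceed max_delay."""
    return [
        e for e in events
        # Assumption: an event delayed by exactly max_delay is still accepted.
        if e["arrival_time"] - e["event_time"] <= max_delay
    ]


events = [
    {"id": 1, "event_time": datetime(2024, 1, 1, 12, 0), "arrival_time": datetime(2024, 1, 1, 12, 5)},
    {"id": 2, "event_time": datetime(2024, 1, 1, 12, 0), "arrival_time": datetime(2024, 1, 3, 12, 0)},
]
kept = filter_late(events, max_delay=timedelta(days=1))
print([e["id"] for e in kept])
```

Here the second event arrives two days after it occurred, so a one-day max delay filters it out.
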
Property | Description |
Prefix | The file prefix. |
Pattern | The file pattern. |
Property | Description |
File Name Pattern | The file name pattern. |
Property | Description |
Name | Data source name in Upsolver. |
Description | Description of data source. |
Creation Time | Time this data source was created. |
Created By | The user who created this data source. |
Modified Time | If applicable, time this data source was last modified. |
Modified By | The user who last modified this data source. |
Property | Description |
Show Sparse Fields | Select to show fields with density under 0.01%. |
Attached Workspaces | The workspaces attached to the data source. |
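
As a rough illustration of the 0.01% threshold used by Show Sparse Fields, the sketch below computes a density per field and lists the fields that fall under it. It assumes that density means the percentage of records in which a field has a non-null value; that definition and the function `sparse_fields` are assumptions, not taken from this article.

```python
def sparse_fields(records, threshold_percent=0.01):
    """Return field names whose density is below threshold_percent.

    Assumption: density = percentage of records where the field is non-null.
    """
    total = len(records)
    if total == 0:
        return []
    counts = {}
    for record in records:
        for field, value in record.items():
            if value is not None:
                counts[field] = counts.get(field, 0) + 1
    return sorted(
        field for field, count in counts.items()
        if 100.0 * count / total < threshold_percent
    )


records = [{"id": i} for i in range(20000)]
records[0]["rare"] = 1  # present in 1 of 20,000 records, i.e. 0.005%
print(sparse_fields(records))
```
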