Create a data source
This article shows how to create each type of data source with an API call.
All API calls require an API token.
Amazon S3 (Quick)
Connect to your AWS S3 Bucket.
In order for Upsolver to read events directly from your cloud storage, files should be partitioned by date and time (which defines the folder structure in the cloud storage).
A prerequisite for defining a cloud storage data source is providing Upsolver with the appropriate credentials for reading from your cloud storage. See: S3 connection
Fields
| Field | Name | Type | Description | Optional |
| --- | --- | --- | --- | --- |
| bucket | Bucket | String | The Amazon S3 bucket to read from. | |
| globFilePattern | Glob File Pattern | String | The pattern for files to ingest. | |
| datePattern | Date Pattern | String | The date pattern in the file name/folder structure (e.g. yyyy/MM/dd/HH/mm). The date format specification must be set according to Java DateTimeFormatter format. | |
| contentType | Content Format | ContentType | The format of the messages. Supported formats are: JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The body of the message should contain the message itself, which should not be URL-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are: Zip, GZip, Snappy and None. See: Content formats | |
| compression | Compression | Compression | The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None). | |
| displayData.name | Name | String | The data source name. | |
| displayData.description | Description | String | The data source description. | |
| softRetention | Soft Retention | Boolean | A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. S3 object) is not deleted. | |
| prefix | Folder | String | If the data resides in a sub folder within the defined cloud storage, specify this folder. | + |
| startExecutionFrom | Start Ingestion From | String (ISO-8601) | The time from which to ingest the data. Files from before this time (based on the provided date pattern) are ignored. If you leave this field empty, all files are ingested. | + |
| workspaces | Workspaces | String[] | The workspaces attached to this data source. | + |
Example 1
Example 2
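The original request examples are not reproduced here, so the following is only a minimal sketch of assembling a request body from the fields listed above. The helper name `build_s3_quick_payload`, the sample values, and the plain-string form of `contentType` and `compression` are assumptions; the real API may expect a different shape (for example, an object for the content type), and the endpoint and authentication header are deliberately left out.

```python
import json

# Field names taken from the field list above; everything else is illustrative.
REQUIRED = ("bucket", "globFilePattern", "datePattern", "contentType",
            "compression", "displayData", "softRetention")
OPTIONAL = ("prefix", "startExecutionFrom", "workspaces")


def build_s3_quick_payload(**fields):
    """Return a JSON body for an Amazon S3 (Quick) data source (hypothetical helper)."""
    missing = [name for name in REQUIRED if fields.get(name) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    unknown = set(fields) - set(REQUIRED) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # Optional fields passed as None are simply left out of the body.
    return json.dumps({k: v for k, v in fields.items() if v is not None}, indent=2)


payload = build_s3_quick_payload(
    bucket="my-bucket",                    # hypothetical bucket name
    globFilePattern="*",                   # assumed glob syntax
    datePattern="yyyy/MM/dd/HH/mm",
    contentType="JSON",                    # assumption: may be an object in the real API
    compression="None",
    displayData={"name": "Orders", "description": "Raw order events"},
    softRetention=False,
    prefix=None,                           # optional, omitted from the body
    startExecutionFrom="2023-01-01T00:00:00Z",
)
print(payload)
```

Nesting `displayData.name` and `displayData.description` under a single `displayData` object is a guess based on the dotted field names.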
Amazon S3 (Advanced)
Connect to your AWS S3 Bucket.
In order for Upsolver to read events directly from your cloud storage, files should be partitioned by date and time (which defines the folder structure in the cloud storage).
A prerequisite for defining a cloud storage data source is providing Upsolver with the appropriate credentials for reading from your cloud storage. See: S3 connection
Fields
| Field | Name | Type | Description | Optional |
| --- | --- | --- | --- | --- |
| displayData.name | Name | String | The data source name. | |
| displayData.description | Description | String | The data source description. | |
| sourceStorage | S3 Connection | String | The cloud storage to ingest files from. | |
| datePattern | Date Pattern | String | The date pattern in the file name/folder structure. For example: yyyy/MM/dd/HH/mm. The date format specification must be set according to Java DateTimeFormatter format. | |
| fileMatchPattern | File Name Pattern | FileNameMatcher | The file name pattern for the files to ingest. If all the files in the specified folders are relevant, specify All. The pattern given is matched against the file path starting from the bucket specified. | |
| contentType | Content Format | ContentType | The format of the messages. Supported formats are: JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The body of the message should contain the message itself, which should not be URL-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are: Zip, GZip, Snappy and None. See Content Formats. | |
| computeEnvironment | Compute Cluster | String | The compute cluster to run the calculation on. See Adding a Compute Cluster. | |
| destinationStorage | Target Storage | String | The data and metadata files for this data source will be stored in this storage. | |
| compression | Compression | Compression | The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None). | |
| interval | Interval | Int (Minutes) | The sliding interval to wait for data. | |
| prefix | Folder | String | If the data resides in a sub folder within the defined cloud storage, specify this folder. | + |
| startExecutionFrom | Start Ingestion From | String (ISO-8601) | The time from which to ingest the data. Files from before this time (based on the provided date pattern) are ignored. If you leave this field empty, all files are ingested. | + |
| retention | Retention | Int (Minutes) | The retention period for the data. | + |
| dataDumpDate | Data Dump Date | String (ISO-8601) | The date that the data starts. | + |
| maxDelay | Max Delay | Int (Minutes) | The maximum delay to consider the data, that is, any data that arrives delayed by more than the max delay is filtered out. | + |
Example
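To illustrate how datePattern, prefix and startExecutionFrom relate, here is a small sketch that derives a timestamp from a date-partitioned object key and decides whether the file would be ingested. It is not Upsolver's implementation: the `strptime` format is a hand translation of the Java pattern yyyy/MM/dd/HH/mm, the folder times are assumed to be UTC, and the function names and sample keys are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

# Hand-translated Python equivalent of the Java pattern yyyy/MM/dd/HH/mm.
PY_DATE_PATTERN = "%Y/%m/%d/%H/%M"
PATTERN_DEPTH = 5  # year, month, day, hour and minute folders


def folder_time(key: str, prefix: str = "") -> datetime:
    """Parse the date folders that follow the optional prefix in an object key."""
    relative = key[len(prefix):].lstrip("/")
    date_part = "/".join(relative.split("/")[:PATTERN_DEPTH])
    # Assumption: folder times are in UTC.
    return datetime.strptime(date_part, PY_DATE_PATTERN).replace(tzinfo=timezone.utc)


def is_ingested(key: str, start_execution_from: Optional[str], prefix: str = "") -> bool:
    """Files dated before startExecutionFrom are ignored; an empty value ingests all."""
    if start_execution_from is None:
        return True
    start = datetime.fromisoformat(start_execution_from)  # must include a UTC offset
    return folder_time(key, prefix) >= start


start = "2023-05-01T00:00:00+00:00"
print(is_ingested("events/2023/05/01/12/30/part-0.json", start, prefix="events/"))
print(is_ingested("events/2023/04/30/23/59/part-0.json", start, prefix="events/"))
```

The first key falls after the start time and is kept; the second is dated before it and is skipped.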
Amazon Kinesis Stream (Quick)
Connect to your Amazon Kinesis. Upsolver can read events from your Amazon Kinesis, according to the stream you define.
A prerequisite for defining an Amazon Kinesis stream connection is providing Upsolver with the appropriate credentials for reading from your Amazon Kinesis stream. See: Kinesis connection
Fields
| Field | Name | Type | Description | Optional |
| --- | --- | --- | --- | --- |
| region | Region | Region | Your AWS region. | |
| streamName | Stream | String | The name of the relevant Kinesis stream. | |
| contentType | Content Format | ContentType | The format of the messages. Supported formats are: JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The body of the message should contain the message itself, which should not be URL-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are: Zip, GZip, Snappy and None. See: Content formats | |
| compression | Compression | Compression | The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None). | |
| displayData.name | Name | String | The data source name. | |
| displayData.description | Description | String | The data source description. | |
| softRetention | Soft Retention | Boolean | A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. S3 object) is not deleted. | |
| workspaces | Workspaces | String[] | The workspaces attached to this data source. | + |
Example
Amazon Kinesis Stream (Advanced)
Connect to your Amazon Kinesis. Upsolver can read events from your Amazon Kinesis, according to the stream you define.
A prerequisite for defining an Amazon Kinesis stream connection is providing Upsolver with the appropriate credentials for reading from your Amazon Kinesis stream. See: Kinesis connection
Fields
| Field | Name | Type | Description | Optional |
| --- | --- | --- | --- | --- |
| displayData.name | Name | String | The data source name. | |
| displayData.description | Description | String | The data source description. | |
| contentType | Content Format | ContentType | The format of the messages. Supported formats are: JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The body of the message should contain the message itself, which should not be URL-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are: Zip, GZip, Snappy and None. See: Content formats | |
| kinesisConnection | Kinesis Connection | String | The AWS credentials to connect to Kinesis. | |
| streamName | Stream | String | The name of the relevant Kinesis stream. | |
| readFromStart | Read From Start | String | The time from which to ingest the data. Messages from before this time will be ignored. If you leave this field empty, all messages are ingested. | |
| computeEnvironment | Compute Cluster | String | The compute cluster to run the calculation on. See: Compute cluster | |
| connectionPointer | Target Storage | String | The data and metadata files for this data source will be stored in this storage. | |
| isOnline | Real Time Statistics | Boolean | Calculate this data source's statistics in real time directly from the input stream if a real time cluster is deployed. | |
| shards | Shards | Int | How many readers to use in parallel to read the stream. A recommended value would be to increase it by 1 for every 70 MB/s sent to your topic. | |
| parallelism | Parallelism | Int | The number of independent shards to parse data, to increase parallelism and reduce latency. This should remain 1 in most cases and be no more than the number of shards used to read the data from the source. | |
| compression | Compression | Compression | The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None). | |
| retention | Retention | Int (Minutes) | A retention period for the data in Upsolver. After this period of time passes, the data is deleted forever. | + |
| endExecutionAt | End Read At | String (ISO-8601) | If configured, stop reading after this date. | + |
Example
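The sizing guidance for shards and parallelism can be turned into a quick sanity check before sending a request. The sketch below takes one reading of "increase it by 1 for every 70 MB/s" (one reader per started 70 MB/s, never fewer than one); the function names are hypothetical and the rounding rule is an assumption, not a documented formula.

```python
import math

MB_PER_SHARD = 70  # from the shards description: one more reader per 70 MB/s


def recommended_shards(throughput_mb_per_s: float) -> int:
    """One reader per started 70 MB/s of input, with a minimum of one (assumed rounding)."""
    return max(1, math.ceil(throughput_mb_per_s / MB_PER_SHARD))


def check_parallelism(parallelism: int, shards: int) -> None:
    """parallelism should be at least 1 and no more than the number of shards."""
    if not 1 <= parallelism <= shards:
        raise ValueError(f"parallelism must be between 1 and {shards}, got {parallelism}")


def days_to_minutes(days: int) -> int:
    """retention is expressed in minutes."""
    return days * 24 * 60


shards = recommended_shards(200)   # a stream receiving roughly 200 MB/s
check_parallelism(1, shards)       # the default of 1 is always valid
print(shards, days_to_minutes(30))
```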
Amazon S3 over SQS
Connect to your AWS S3 Bucket using SQS Notifications.
Configure SQS notifications on your S3 bucket, and grant permission to read and delete messages from the SQS queue to the same access key and secret key that you gave Upsolver for reading from the S3 bucket. See: S3 over SQS connection
Fields
| Field | Name | Type | Description | Optional |
| --- | --- | --- | --- | --- |
| displayData.name | Name | String | The data source name. | |
| displayData.description | Description | String | The data source description. | |
| sourceStorage | Source Storage | String | The cloud storage to ingest files from. | |
| contentType | Content Format | ContentType | The format of the messages. Supported formats are: JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The body of the message should contain the message itself, which should not be URL-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are: Zip, GZip, Snappy and None. See: Content formats | |
| computeEnvironment | Compute Cluster | String | The compute cluster to run the calculation on. See: Compute cluster | |
| destinationStorage | Target Storage | String | The data and metadata files for this data source will be stored in this storage. | |
| compression | Compression | Compression | The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None). | |
| softRetention | Soft Retention | Boolean | A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. S3 object) is not deleted. | |
| executionParallelism | Parallelism | Int | The number of independent shards to parse data, to increase parallelism and reduce latency. This should remain 1 in most cases and be no more than the number of shards used to read the data from the source. | |
| prefix | Prefix | String | The prefix of the files or directories. To filter a specific directory, add a trailing /. | + |
| suffix | Suffix | String | The suffix of the files to read. | + |
| startExecutionFrom | Start Ingestion From | String (ISO-8601) | The time from which to ingest the data. Messages from before this time will be ignored. If you leave this field empty, all messages are ingested. | + |
| endExecutionAt | End Read At | String (ISO-8601) | If configured, stop reading after this date. | + |
| retention | Retention | Int (Minutes) | A retention period for the data in Upsolver. After this period of time passes, the data is deleted forever. | + |
Example
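The effect of the prefix and suffix filters, and of the trailing / on a directory prefix, can be sketched with plain string matching. This assumes the filters behave like simple starts-with and ends-with checks on the object key, which the source does not state explicitly; the function name and keys are hypothetical.

```python
def matches(key: str, prefix: str = "", suffix: str = "") -> bool:
    """Assumed semantics: the key must start with prefix and end with suffix."""
    return key.startswith(prefix) and key.endswith(suffix)


keys = ["logs/2023/a.json", "logs-archive/2022/b.json", "logs/2023/c.csv"]

# Without the trailing slash, "logs" also matches the sibling directory "logs-archive".
print([k for k in keys if matches(k, prefix="logs", suffix=".json")])

# With the trailing slash, only objects inside the "logs/" directory match.
print([k for k in keys if matches(k, prefix="logs/", suffix=".json")])
```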
Apache Kafka (Quick)
Connect to any topic on your Kafka Servers. Upsolver can read events from your Kafka cluster from the specified Kafka topic.
A prerequisite for defining a Kafka stream connection is providing Upsolver with the appropriate credentials for reading from your Kafka cluster. See: Kafka connection
Fields
| Field | Name | Type | Description | Optional |
| --- | --- | --- | --- | --- |
| kafkaHosts | Kafka Hosts | String | The Kafka hosts separated with commas (e.g. foo:9092,bar:9092). | |
| topicName | Kafka Topic | String | The Kafka topic to ingest the data from. | |
| contentType | Content Format | ContentType | The format of the messages. Supported formats are: JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The body of the message should contain the message itself, which should not be URL-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are: Zip, GZip, Snappy and None. See: Content formats | |
| compression | Compression | Compression | The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None). | |
| displayData.name | Name | String | The data source name. | |
| displayData.description | Description | String | The data source description. | |
| softRetention | Soft Retention | Boolean | A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. S3 object) is not deleted. | |
| maybekafkaVersion | Kafka Version | KafkaVersion | The version of the Kafka Servers. If unsure, use 0.10.x.x. | + |
| workspaces | Workspaces | String[] | The workspaces attached to this data source. | + |
Example
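Because kafkaHosts is a single comma-separated string, it is easy to send a malformed value. The sketch below checks the host:port format locally before a request is made; the function name is hypothetical and the check reflects only the format shown in the field description, not any validation the API itself performs.

```python
from typing import List, Tuple


def parse_kafka_hosts(kafka_hosts: str) -> List[Tuple[str, int]]:
    """Split a value such as "foo:9092,bar:9092" into (host, port) pairs."""
    pairs = []
    for entry in kafka_hosts.split(","):
        host, sep, port = entry.strip().rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        pairs.append((host, int(port)))
    return pairs


print(parse_kafka_hosts("foo:9092,bar:9092"))
```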
Apache Kafka (Advanced)
Connect to any topic on your Kafka Servers. Upsolver can read events from your Kafka cluster from the specified Kafka topic.
A prerequisite for defining a Kafka stream connection is providing Upsolver with the appropriate credentials for reading from your Kafka cluster. See: Kafka connection
Fields
| Field | Name | Type | Description | Optional |
| --- | --- | --- | --- | --- |
| displayData.name | Name | String | The data source name. | |
| displayData.description | Description | String | The data source description. | |
| kafkaVersion | Kafka Version | KafkaVersion | The version of the Kafka Servers. If unsure, use 0.10.x.x. | |
| kafkaHosts | Kafka Hosts | String | The Kafka hosts separated with commas. For example: foo:9092,bar:9092 | |
| topicName | Kafka Topic | String | The Kafka topic to ingest the data from. | |
| readFromStart | Read From Start | Boolean | Whether to read the data from the start of the topic or to begin from the end. | |
| contentType | Content Format | ContentType | The format of the messages. Supported formats are: JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The body of the message should contain the message itself, which should not be URL-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are: Zip, GZip, Snappy and None. See: Content formats | |
| computeEnvironment | Compute Cluster | String | The compute cluster to run the calculation on. See: Compute cluster | |
| connectionPointer | Target Storage | String | The data and metadata files for this data source will be stored in this storage. | |
| softRetention | Soft Retention | Boolean | A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. S3 object) is not deleted. | |
| shards | Shards | Int | How many readers to use in parallel to read the stream. A recommended value would be to increase it by 1 for every 70 MB/s sent to your topic. | |
| executionParallelism | Execution Parallelism | Int | The number of independent shards to parse data, to increase parallelism and reduce latency. This should remain 1 in most cases and be no more than the number of shards used to read the data from the source. | |
| isOnline | Real Time Statistics | Boolean | Calculate this data source's statistics in real time directly from the input stream if a real time cluster is deployed. | |
| useSsl | Use SSL | Boolean | Set this to true if your connection requires SSL. Contact us to ensure that your SSL certificate is supported. | |
| storeRawData | Store Raw Data | Boolean | Store an additional copy of the data in its original format. | |
| compression | Compression | Compression | The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None). | |
| consumerProperties | Kafka Consumer Properties | String | Extra properties for Kafka Consumer. | + |
| retention | Retention | Int (Minutes) | A retention period for the data in Upsolver. After this period of time passes, the data is deleted forever. | + |
| endExecutionAt | End Read At | String (ISO-8601) | If configured, stop reading after this date. | + |
| workspaces | Workspaces | String[] | The workspaces attached to this data source. | + |
Example
Azure Blob storage
Connect to your Azure Blob storage container.
In order for Upsolver to read events directly from your cloud storage, files should be partitioned by date and time (which defines the folder structure in the cloud storage).
A prerequisite for defining a cloud storage data source is providing Upsolver with the appropriate credentials for reading from your cloud storage. See: Azure Blob storage connection
Fields
| Field | Name | Type | Description | Optional |
| --- | --- | --- | --- | --- |
| displayData.name | Name | String | The data source name. | |
| displayData.description | Description | String | The data source description. | |
| sourceStorage | Azure Blob Storage Connection | String | The cloud storage to ingest files from. | |
| datePattern | Date Pattern | String | The date pattern in the file name/folder structure (e.g. yyyy/MM/dd/HH/mm). The date format specification must be set according to Java DateTimeFormatter format. | |
| fileMatchPattern | File Name Pattern | FileNameMatcher | The file name pattern for the files to ingest. If all the files in the specified folders are relevant, specify All. The pattern given is matched against the file path starting from the storage container specified. | |
| contentType | Content Format | ContentType | The format of the messages. Supported formats are: JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The body of the message should contain the message itself, which should not be URL-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are: Zip, GZip, Snappy and None. See: Content formats | |
| computeEnvironment | Compute Cluster | String | The compute cluster to run the calculation on. See: Compute cluster | |
| destinationStorage | Target Storage | String | The data and metadata files for this data source will be stored in this storage. | |
| softRetention | Soft Retention | Boolean | A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. S3 object) is not deleted. | |
| compression | Compression | Compression | The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None). | |
| interval | Interval | Int (Minutes) | The sliding interval to wait for data. | |
| prefix | Folder | String | If the data resides in a sub folder within the defined cloud storage, specify this folder. | + |
| initialLoadConfiguration | Initial Load Configuration | InitialLoadConfiguration | If you have initial data, enter a prefix and regex pattern to list the relevant data and select the required files. | + |
| startExecutionFrom | Start Ingestion From | String (ISO-8601) | The time from which to ingest the data. Files from before this time (based on the provided date pattern) are ignored. If you leave this field empty, all files are ingested. | + |
| retention | Retention | Int (Minutes) | A retention period for the data in Upsolver. After this period of time passes, the data is deleted forever. | + |
| dataDumpDate | Data Dump Date | String (ISO-8601) | The date that the data starts. | + |
| maxDelay | Max Delay | Int (Minutes) | The maximum delay to consider the data, that is, any data that arrives delayed by more than the max delay is filtered out. | + |
| workspaces | Workspaces | String[] | The workspaces attached to this data source. | + |
Example
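The maxDelay rule can be read as a simple comparison between when a file's data is dated and when it actually arrives. The sketch below encodes that reading; it assumes the delay is measured between the time implied by the date pattern and the arrival time, which the source does not spell out, and the function name and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def is_within_max_delay(data_time: datetime, arrival_time: datetime,
                        max_delay_minutes: int) -> bool:
    """Data delayed by more than maxDelay minutes is filtered out (assumed semantics)."""
    return arrival_time - data_time <= timedelta(minutes=max_delay_minutes)


data_time = datetime(2023, 5, 1, 12, 0)   # time implied by the folder structure
on_time = datetime(2023, 5, 1, 12, 45)    # arrives 45 minutes later
too_late = datetime(2023, 5, 1, 13, 30)   # arrives 90 minutes later

print(is_within_max_delay(data_time, on_time, max_delay_minutes=60))
print(is_within_max_delay(data_time, too_late, max_delay_minutes=60))
```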
Google Cloud Storage
Connect to your Google Storage Bucket.
In order for Upsolver to read events directly from your cloud storage, files should be partitioned by date and time (which defines the folder structure in the cloud storage).
A prerequisite for defining a cloud storage data source is providing Upsolver with the appropriate credentials for reading from your cloud storage. See: Google Storage connection
Fields
| Field | Name | Type | Description | Optional |
| --- | --- | --- | --- | --- |
| displayData.name | Name | String | The data source name. | |
| displayData.description | Description | String | The data source description. | |
| sourceStorage | Google Storage Connection | String | The cloud storage to ingest files from. | |
| datePattern | Date Pattern | String | The date pattern in the file name/folder structure (e.g. yyyy/MM/dd/HH/mm). The date format specification must be set according to Java DateTimeFormatter format. | |
| fileMatchPattern | File Name Pattern | FileNameMatcher | The file name pattern for the files to ingest. If all the files in the specified folders are relevant, specify All. The pattern given is matched against the file path starting from the storage source specified. | |
| contentType | Content Format | ContentType | The format of the messages. Supported formats are: JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The body of the message should contain the message itself, which should not be URL-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are: Zip, GZip, Snappy and None. See: Content formats | |
| computeEnvironment | Compute Cluster | String | The compute cluster to run the calculation on. See: Compute cluster | |
| destinationStorage | Target Storage | String | The data and metadata files for this data source will be stored in this storage. | |
| softRetention | Soft Retention | Boolean | A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. S3 object) is not deleted. | |
| compression | Compression | Compression | The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None). | |
| interval | Interval | Int (Minutes) | The sliding interval to wait for data. | |
| prefix | Folder | String | If the data resides in a sub folder within the defined cloud storage, specify this folder. | + |
| initialLoadConfiguration | Initial Load Configuration | InitialLoadConfiguration | If you have initial data, enter a prefix and regex pattern to list the relevant data and select the required files. | |
| startExecutionFrom | Start Ingestion From | String (ISO-8601) | The time from which to ingest the data. Files from before this time (based on the provided date pattern) are ignored. If you leave this field empty, all files are ingested. | + |
| retention | Retention | Int (Minutes) | A retention period for the data in Upsolver. After this period of time passes, the data is deleted forever. | + |
| dataDumpDate | Data Dump Date | String (ISO-8601) | The date that the data starts. | + |
| maxDelay | Max Delay | Int (Minutes) | The maximum delay to consider the data, that is, any data that arrives delayed by more than the max delay is filtered out. | + |
| workspaces | Workspaces | String[] | The workspaces attached to this data source. | + |
Example
Kinesis-backed HTTP
Connect your stream using HTTP requests from any source.
Once you create the connection, you will be provided with an HTTP endpoint. Upsolver receives the data as POST, with the data in the body, and the data is stored in a Kinesis Stream until processed by Upsolver.
Headers sent with the request are also ingested as part of the stream, so metadata can be added to the request header.
Fields
| Field | Name | Type | Description | Optional |
| --- | --- | --- | --- | --- |
| displayData.name | Name | String | The data source name. | |
| displayData.description | Description | String | The data source description. | |
| contentType | Content Format | ContentType | The format of the messages. Supported formats are: JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The body of the message should contain the message itself, which should not be URL-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are: Zip, GZip, Snappy and None. See: Content formats | |
| computeEnvironment | Compute Cluster | String | The compute cluster to run the calculation on. See: Compute cluster | |
| ingestionEnvironment | Ingestion Cluster | String | | |
| storageConnection | Target Storage | String | The data and metadata files for this Data Source will be stored in this storage. | |
| kinesisConnection | Kinesis Connection | String | The data and metadata files for this Data Source will be stored in this storage. | |
| softRetention | Soft Retention | Boolean | A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. S3 object) is not deleted. | |
| retention | Retention | Int (Minutes) | A retention period for the data in Upsolver. After this period of time passes, the data is deleted forever. | + |
| startExecutionFrom | Start Ingestion From | String (ISO-8601) | The time from which to ingest the data. Files from before this time (based on the provided date pattern) are ignored. If you leave this field empty, all files are ingested. | + |
| endExecutionAt | End Read At | String (ISO-8601) | If configured, stop reading after this date. | + |
| workspaces | Workspaces | String[] | The workspaces attached to this data source. | + |
Example
HTTP
Connect your stream using HTTP requests from any source.
Once you create the connection, you will be provided with an HTTP endpoint. Upsolver receives the data as POST, with the data in the body.
Headers sent with the request are also ingested as part of the stream, so metadata can be added to the request header.
Fields
| Field | Name | Type | Description | Optional |
| --- | --- | --- | --- | --- |
| displayData.name | Name | String | The data source name. | |
| displayData.description | Description | String | The data source description. | |
| contentType | Content Format | ContentType | The format of the messages. Supported formats are: JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The body of the message should contain the message itself, which should not be URL-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are: Zip, GZip, Snappy and None. See: Content formats | |
| computeEnvironment | Compute Cluster | String | The compute cluster to run the calculation on. See: Compute cluster | |
| connectionPointer | Target Storage | String | The data and metadata files for this Data Source are stored in this storage. | |
| softRetention | Soft Retention | Boolean | A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. S3 object) is not deleted. | |
| shards | Shards | Int | How many readers to use in parallel to read the stream. A recommended value would be to increase it by 1 for every 70 MB/s sent to your topic. | |
| retention | Retention | Int (Minutes) | A retention period for the data in Upsolver. After this period of time passes, the data is deleted forever. | + |
| endExecutionAt | End Read At | String (ISO-8601) | If configured, stop reading after this date. | + |
| workspaces | Workspaces | String[] | The workspaces attached to this data source. | + |
Example
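Once the data source exists, events are sent to its HTTP endpoint as POST requests, with the message in the body and any metadata in the headers. The sketch below builds such a request with Python's standard library without sending it. The endpoint URL, the `X-Source` header and the `Content-Type` value are placeholders chosen for illustration; the real endpoint is the one provided when the connection is created.

```python
import json
import urllib.request


def build_event_request(endpoint: str, event: dict, metadata: dict) -> urllib.request.Request:
    """Build a POST request carrying the raw message in the body and metadata in headers."""
    body = json.dumps(event).encode("utf-8")  # plain JSON body, not URL-encoded
    headers = {"Content-Type": "application/json", **metadata}  # content type is assumed
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


req = build_event_request(
    "https://example.invalid/ingest",   # placeholder, not a real endpoint
    {"orderId": 1, "amount": 9.99},
    {"X-Source": "checkout"},           # hypothetical metadata header
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send the request to a real endpoint.
```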