Create a data source

This article shows how to create different types of data sources using an API call.

This API enables you to create a new data source. All API calls require an API token.

POST https://api.upsolver.com/inputs
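The curl examples below can be reproduced in any HTTP client. As an illustration, here is a minimal Python sketch (standard library only) that builds the same authenticated POST request. The payload mirrors the quick Amazon S3 example later in this article; the token and bucket values are placeholders, and the request is built but not sent so the sketch runs without network access.

```python
import json
import urllib.request

API_URL = "https://api.upsolver.com/inputs"

def build_create_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a data source."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "content-type": "application/json",
            "Authorization": token,  # the raw API token, as in the curl examples
        },
        method="POST",
    )

# Payload mirroring the quick Amazon S3 example below (placeholder values)
payload = {
    "clazz": "QuickS3StorageInputRequest",
    "bucket": "my-bucket",
    "globFilePattern": "*",
    "datePattern": "yyyy/MM/dd/HH/mm",
    "contentType": {"clazz": "JsonContentType"},
    "compression": {"clazz": "AutoDetectCompression"},
    "displayData": {"name": "Example data source", "description": "Created via API"},
    "softRetention": True,
}

req = build_create_request("YOUR_TOKEN", payload)
# Sending is one more line once the request is built:
# response = urllib.request.urlopen(req)
```

The `urlopen` call is left commented out so the sketch can be read and run without hitting the API.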

Amazon S3 (Quick)

Connect to your AWS S3 Bucket.

In order for Upsolver to read events directly from your cloud storage, files should be partitioned by date and time (which defines the folder structure in the cloud storage).

A prerequisite for defining a cloud storage data source is providing Upsolver with the appropriate credentials for reading from your cloud storage. See: S3 connection

Fields

bucket (Bucket) - String
  The Amazon S3 bucket to read from.

globFilePattern (Glob File Pattern) - String
  The pattern for files to ingest.

datePattern (Date Pattern) - String
  The date pattern in the file name/folder structure (e.g. yyyy/MM/dd/HH/mm). The date format must follow the Java DateTimeFormatter specification.

contentType (Content Format) - ContentType
  The format of the messages. Supported formats: JSON, AVRO, CSV, TSV, ORC, Protobuf, and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be URL-encoded. Messages may be compressed; Upsolver automatically detects the compression type. Supported compression types: Zip, GZip, Snappy, and None. See: Content formats

compression (Compression) - Compression
  The compression of the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).

displayData.name (Name) - String
  The data source name.

displayData.description (Description) - String
  The data source description.

softRetention (Soft Retention) - Boolean
  Prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.

prefix (Folder) - String, optional
  If the data resides in a subfolder within the defined cloud storage, specify this folder.

startExecutionFrom (Start Ingestion From) - String (ISO-8601), optional
  The time from which to ingest data. Files from before this time (based on the provided date pattern) are ignored. If left empty, all files are ingested.

workspaces (Workspaces) - String[], optional
  The workspaces attached to this data source.
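To make the datePattern and globFilePattern fields concrete, the sketch below shows how a date pattern of yyyy/MM/dd/HH/mm maps to the folder layout Upsolver expects. The Java DateTimeFormatter pattern is translated to its Python strftime analogue purely for illustration (the API itself takes the Java-style pattern string), and the "data/" prefix and file name are hypothetical.

```python
import fnmatch
from datetime import datetime, timezone

# The API takes the Java DateTimeFormatter pattern; this is its strftime analogue.
java_date_pattern = "yyyy/MM/dd/HH/mm"
py_date_pattern = "%Y/%m/%d/%H/%M"

event_time = datetime(2023, 5, 17, 9, 30, tzinfo=timezone.utc)

# A file written at 09:30 UTC on 2023-05-17 would sit under this key
# (with "data/" as the optional prefix field):
key = f"data/{event_time.strftime(py_date_pattern)}/events.json"
# → 'data/2023/05/17/09/30/events.json'

# globFilePattern then selects files within those folders, e.g. "*.json":
assert fnmatch.fnmatch(key.rsplit("/", 1)[-1], "*.json")
```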

Example 1

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "QuickS3StorageInputRequest",
	"bucket" : "bucket",
	"globFilePattern" : "globFilePattern",
	"datePattern" : "yyyy/MM/dd/HH/mm",
	"contentType" : {
		"clazz" : "JsonContentType"
	},
	"compression" : {
		"clazz" : "AutoDetectCompression"
	},
	"displayData" : {
		"name" : "First Amazon S3 Data Source",
		"description" : "Description of first Amazon S3 data source"
	},
	"softRetention" : true
}' "https://api.upsolver.com/inputs/DATA-SOURCE-ID"

Example 2

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: token" \
-d '{
	"clazz": "QuickS3StorageInputRequest",
	"bucket": "upsolver-tutorials-orders",
	"globFilePattern": "*",
	"datePattern": "yyyy/MM/dd/HH",
	"prefix": "data/",
	"contentType": {
		"clazz": "JsonContentType"
	},
	"compression": {
		"clazz": "AutoDetectCompression"
	},
	"displayData": {
		"name": "API test data source",
		"description": "Description of first Amazon S3 data source"
	},
	"softRetention": false
}' "https://your-api.upsolver.com/inputs/"

Amazon S3 (Advanced)

Connect to your AWS S3 Bucket.

In order for Upsolver to read events directly from your cloud storage, files should be partitioned by date and time (which defines the folder structure in the cloud storage).

A prerequisite for defining a cloud storage data source is providing Upsolver with the appropriate credentials for reading from your cloud storage. See: S3 connection

Fields

displayData.name (Name) - String
  The data source name.

displayData.description (Description) - String
  The data source description.

sourceStorage (S3 Connection) - String
  The cloud storage to ingest files from.

datePattern (Date Pattern) - String
  The date pattern in the file name/folder structure (e.g. yyyy/MM/dd/HH/mm). The date format must follow the Java DateTimeFormatter specification.

fileMatchPattern (File Name Pattern) - FileNameMatcher
  The file name pattern for the files to ingest. If all the files in the specified folders are relevant, specify All. The pattern is matched against the file path starting from the specified bucket.

contentType (Content Format) - ContentType
  The format of the messages. Supported formats: JSON, AVRO, CSV, TSV, ORC, Protobuf, and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be URL-encoded. Messages may be compressed; Upsolver automatically detects the compression type. Supported compression types: Zip, GZip, Snappy, and None. See: Content formats

computeEnvironment (Compute Cluster) - String
  The compute cluster to run the calculation on. See: Adding a Compute Cluster

destinationStorage (Target Storage) - String
  The data and metadata files for this data source are stored in this storage.

compression (Compression) - Compression
  The compression of the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).

interval (Interval) - Int (Minutes)
  The sliding interval to wait for data.

prefix (Folder) - String, optional
  If the data resides in a subfolder within the defined cloud storage, specify this folder.

startExecutionFrom (Start Ingestion From) - String (ISO-8601), optional
  The time from which to ingest data. Files from before this time (based on the provided date pattern) are ignored. If left empty, all files are ingested.

retention (Retention) - Int (Minutes), optional
  The retention period for the data.

dataDumpDate (Data Dump Date) - String (ISO-8601), optional
  The date that the data starts.

maxDelay (Max Delay) - Int (Minutes), optional
  The maximum delay to allow for the data; any data that arrives delayed by more than the max delay is filtered out.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "CloudStorageInputRequest",
	"displayData" : {
		"name" : "First Amazon S3 Data Source",
		"description" : "Description of first Amazon S3 data source"
	},
	"sourceStorage" : "aa302f0a-e6ee-44aa-aa38-e28f1ff455f7",
	"datePattern" : "yyyy/MM/dd/HH/mm",
	"fileMatchPattern" : {
		"clazz" : "AllMatcher"
	},
	"contentType" : {
		"type" : "JsonContentType"
	},
	"computeEnvironment" : "53b314af-ffab-419a-9c2c-56032c6ef4c0",
	"destinationStorage" : "e019c0fe-bb80-4cf1-bc7b-aee579d8672e",
	"compression" : {
		"clazz" : "AutoDetectCompression"
	},
	"interval" : 120
}' "https://api.upsolver.com/api/v1/data-source/"

Amazon Kinesis Stream (Quick)

Connect to your Amazon Kinesis stream. Upsolver reads events from the stream you define.

A prerequisite for defining an Amazon Kinesis stream connection is providing Upsolver with the appropriate credentials for reading from your Amazon Kinesis stream. See: Kinesis connection

Fields

region (Region) - Region
  Your AWS region.

streamName (Stream) - String
  The name of the relevant Kinesis stream.

contentType (Content Format) - ContentType
  The format of the messages. Supported formats: JSON, AVRO, CSV, TSV, ORC, Protobuf, and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be URL-encoded. Messages may be compressed; Upsolver automatically detects the compression type. Supported compression types: Zip, GZip, Snappy, and None. See: Content formats

compression (Compression) - Compression
  The compression of the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).

displayData.name (Name) - String
  The data source name.

displayData.description (Description) - String
  The data source description.

softRetention (Soft Retention) - Boolean
  Prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.

workspaces (Workspaces) - String[], optional
  The workspaces attached to this data source.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "QuickKinesisInputRequest",
	"region" : "region",
	"streamName" : "streamName",
	"contentType" : {
		"type" : "JsonContentType"
	},
	"compression" : {
		"clazz" : "AutoDetectCompression"
	},
	"displayData" : {
		"name" : "First Amazon Kinesis Stream Data Source",
		"description" : "Description of first Amazon Kinesis Stream data source"
	},
	"softRetention" : true
}' "https://api.upsolver.com/api/v1/data-source/"

Amazon Kinesis Stream (Advanced)

Connect to your Amazon Kinesis stream. Upsolver reads events from the stream you define.

A prerequisite for defining an Amazon Kinesis stream connection is providing Upsolver with the appropriate credentials for reading from your Amazon Kinesis stream. See: Kinesis connection

Fields

displayData.name (Name) - String
  The data source name.

displayData.description (Description) - String
  The data source description.

contentType (Content Format) - ContentType
  The format of the messages. Supported formats: JSON, AVRO, CSV, TSV, ORC, Protobuf, and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be URL-encoded. Messages may be compressed; Upsolver automatically detects the compression type. Supported compression types: Zip, GZip, Snappy, and None. See: Content formats

kinesisConnection (Kinesis Connection) - String
  The AWS credentials to connect to Kinesis.

streamName (Stream) - String
  The name of the relevant Kinesis stream.

readFromStart (Read From Start) - String
  The time from which to ingest data. Messages from before this time are ignored. If left empty, all messages are ingested.

computeEnvironment (Compute Cluster) - String
  The compute cluster to run the calculation on. See: Compute cluster

connectionPointer (Target Storage) - String
  The data and metadata files for this data source are stored in this storage.

isOnline (Real Time Statistics) - Boolean
  Calculate this data source's statistics in real time directly from the input stream (if a real-time cluster is deployed).

shards (Shards) - Int
  How many readers to use in parallel to read the stream. As a rule of thumb, add one reader for every 70 MB/s sent to your stream.

parallelism (Parallelism) - Int
  The number of independent shards used to parse the data; increasing it increases parallelism and reduces latency. This should remain 1 in most cases and should not exceed the number of shards used to read the data from the source.

compression (Compression) - Compression
  The compression of the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).

retention (Retention) - Int (Minutes), optional
  A retention period for the data in Upsolver. After this period passes, the data is deleted forever.

endExecutionAt (End Read At) - String (ISO-8601), optional
  If configured, stop reading after this date.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "KinesisInputRequest",
	"displayData" : {
		"name" : "First Amazon Kinesis Stream Data Source",
		"description" : "Description of first Amazon Kinesis Stream data source"
	},
	"contentType" : {
		"type" : "JsonContentType"
	},
	"kinesisConnection" : "dcfc39c1-c458-4ec5-87f1-0c7f437ea17e",
	"streamName" : "streamName",
	"readFromStart" : true,
	"computeEnvironment" : "fc2a356d-c3ae-4756-a4a3-1b15158df5e7",
	"connectionPointer" : "4e3787cd-daac-42ea-8dd4-642c516a4a31",
	"isOnline" : true,
	"shards" : 1,
	"parallelism" : 1,
	"compression" : {
		"clazz" : "AutoDetectCompression"
	}
}' "https://api.upsolver.com/api/v1/data-source/"

Amazon S3 over SQS

Connect to your AWS S3 Bucket using SQS Notifications.

You will need to configure SQS notifications on your S3 bucket, and grant permission to read and delete messages from the SQS queue to the same access key and secret key you entered to give Upsolver permission to read from the S3 bucket. See: S3 over SQS connection

Fields

displayData.name (Name) - String
  The data source name.

displayData.description (Description) - String
  The data source description.

sourceStorage (Source Storage) - String
  The cloud storage to ingest files from.

contentType (Content Format) - ContentType
  The format of the messages. Supported formats: JSON, AVRO, CSV, TSV, ORC, Protobuf, and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be URL-encoded. Messages may be compressed; Upsolver automatically detects the compression type. Supported compression types: Zip, GZip, Snappy, and None. See: Content formats

computeEnvironment (Compute Cluster) - String
  The compute cluster to run the calculation on. See: Compute cluster

destinationStorage (Target Storage) - String
  The data and metadata files for this data source are stored in this storage.

compression (Compression) - Compression
  The compression of the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).

softRetention (Soft Retention) - Boolean
  Prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.

executionParallelism (Parallelism) - Int
  The number of independent shards used to parse the data; increasing it increases parallelism and reduces latency. This should remain 1 in most cases and should not exceed the number of shards used to read the data from the source.

prefix (Prefix) - String, optional
  The prefix of the files or directories. To filter a specific directory, add a trailing /.

suffix (Suffix) - String, optional
  The suffix of the files to read.

startExecutionFrom (Start Ingestion From) - String (ISO-8601), optional
  The time from which to ingest data. Messages from before this time are ignored. If left empty, all messages are ingested.

endExecutionAt (End Read At) - String (ISO-8601), optional
  If configured, stop reading after this date.

retention (Retention) - Int (Minutes), optional
  A retention period for the data in Upsolver. After this period passes, the data is deleted forever.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "S3OverSQSInputRequest",
	"displayData" : {
		"name" : "First S3 Over SQS Data Source",
		"description" : "Description of first S3 Over SQS data source"
	},
	"sourceStorage" : "6ee598b6-4928-4b07-b532-83a79464e5bb",
	"contentType" : {
		"type" : "JsonContentType"
	},
	"computeEnvironment" : "46c2945e-5a46-4d93-9e21-5a85653c28c5",
	"destinationStorage" : "2a08601c-52d2-425e-8d2c-324dbcea5858",
	"compression" : {
		"clazz" : "AutoDetectCompression"
	},
	"softRetention" : true,
	"executionParallelism" : 1
}' "https://api.upsolver.com/api/v1/data-source/"

Apache Kafka (Quick)

Connect to any topic on your Kafka servers. Upsolver reads events from the specified topic in your Kafka cluster.

A prerequisite for defining a Kafka stream connection is providing Upsolver with the appropriate credentials for reading from your Kafka cluster. See: Kafka connection

Fields

kafkaHosts (Kafka Hosts) - String
  The Kafka hosts, separated by commas (e.g. foo:9092,bar:9092).

topicName (Kafka Topic) - String
  The Kafka topic to ingest the data from.

contentType (Content Format) - ContentType
  The format of the messages. Supported formats: JSON, AVRO, CSV, TSV, ORC, Protobuf, and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be URL-encoded. Messages may be compressed; Upsolver automatically detects the compression type. Supported compression types: Zip, GZip, Snappy, and None. See: Content formats

compression (Compression) - Compression
  The compression of the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).

displayData.name (Name) - String
  The data source name.

displayData.description (Description) - String
  The data source description.

softRetention (Soft Retention) - Boolean
  Prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.

kafkaVersion (Kafka Version) - KafkaVersion, optional
  The version of the Kafka servers. If unsure, use 0.10.x.x.

workspaces (Workspaces) - String[], optional
  The workspaces attached to this data source.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "QuickKafkaInputRequest",
	"kafkaHosts" : "kafkaHosts",
	"topicName" : "topicName",
	"contentType" : {
		"type" : "JsonContentType"
	},
	"compression" : {
		"clazz" : "AutoDetectCompression"
	},
	"displayData" : {
		"name" : "First Kafka Data Source",
		"description" : "Description of first Kafka data source"
	},
	"softRetention" : true
}' "https://api.upsolver.com/api/v1/data-source/"

Apache Kafka (Advanced)

Connect to any topic on your Kafka servers. Upsolver reads events from the specified topic in your Kafka cluster.

A prerequisite for defining a Kafka stream connection is providing Upsolver with the appropriate credentials for reading from your Kafka cluster. See: Kafka connection

Fields

displayData.name (Name) - String
  The data source name.

displayData.description (Description) - String
  The data source description.

kafkaVersion (Kafka Version) - KafkaVersion
  The version of the Kafka servers. If unsure, use 0.10.x.x.

kafkaHosts (Kafka Hosts) - String
  The Kafka hosts, separated by commas (e.g. foo:9092,bar:9092).

topicName (Kafka Topic) - String
  The Kafka topic to ingest the data from.

readFromStart (Read From Start) - Boolean
  Whether to read the data from the start of the topic or to begin from the end.

contentType (Content Format) - ContentType
  The format of the messages. Supported formats: JSON, AVRO, CSV, TSV, ORC, Protobuf, and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be URL-encoded. Messages may be compressed; Upsolver automatically detects the compression type. Supported compression types: Zip, GZip, Snappy, and None. See: Content formats

computeEnvironment (Compute Cluster) - String
  The compute cluster to run the calculation on. See: Compute cluster

connectionPointer (Target Storage) - String
  The data and metadata files for this data source are stored in this storage.

softRetention (Soft Retention) - Boolean
  Prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.

shards (Shards) - Int
  How many readers to use in parallel to read the stream. As a rule of thumb, add one reader for every 70 MB/s sent to your topic.

executionParallelism (Execution Parallelism) - Int
  The number of independent shards used to parse the data; increasing it increases parallelism and reduces latency. This should remain 1 in most cases and should not exceed the number of shards used to read the data from the source.

isOnline (Real Time Statistics) - Boolean
  Calculate this data source's statistics in real time directly from the input stream (if a real-time cluster is deployed).

useSsl (Use SSL) - Boolean
  Set this to true if your connection requires SSL. Contact us to ensure that your SSL certificate is supported.

storeRawData (Store Raw Data) - Boolean
  Store an additional copy of the data in its original format.

compression (Compression) - Compression
  The compression of the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).

consumerProperties (Kafka Consumer Properties) - String, optional
  Extra properties for the Kafka consumer.

retention (Retention) - Int (Minutes), optional
  A retention period for the data in Upsolver. After this period passes, the data is deleted forever.

endExecutionAt (End Read At) - String (ISO-8601), optional
  If configured, stop reading after this date.

workspaces (Workspaces) - String[], optional
  The workspaces attached to this data source.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "KafkaInputRequest",
	"displayData" : {
		"name" : "First Kafka Data Source",
		"description" : "Description of first Kafka data source"
	},
	"kafkaVersion" : "0.10.x.x",
	"kafkaHosts" : "kafkaHosts",
	"topicName" : "topicName",
	"readFromStart" : true,
	"contentType" : {
		"type" : "JsonContentType"
	},
	"computeEnvironment" : "5d8b6e27-6004-48b2-8dc1-1db9f25880cb",
	"connectionPointer" : "36a1d237-1ff0-4574-972b-b071482f3d08",
	"softRetention" : true,
	"shards" : 1,
	"executionParallelism" : 1,
	"isOnline" : true,
	"useSsl" : true,
	"storeRawData" : true,
	"compression" : {
		"clazz" : "AutoDetectCompression"
	}
}' "https://api.upsolver.com/api/v1/data-source/"

Azure Blob storage

Connect to your Azure Blob storage container.

In order for Upsolver to read events directly from your cloud storage, files should be partitioned by date and time (which defines the folder structure in the cloud storage).

A prerequisite for defining a cloud storage data source is providing Upsolver with the appropriate credentials for reading from your cloud storage. See: Azure Blob storage connection

Fields

displayData.name (Name) - String
  The data source name.

displayData.description (Description) - String
  The data source description.

sourceStorage (Azure Blob Storage Connection) - String
  The cloud storage to ingest files from.

datePattern (Date Pattern) - String
  The date pattern in the file name/folder structure (e.g. yyyy/MM/dd/HH/mm). The date format must follow the Java DateTimeFormatter specification.

fileMatchPattern (File Name Pattern) - FileNameMatcher
  The file name pattern for the files to ingest. If all the files in the specified folders are relevant, specify All. The pattern is matched against the file path starting from the specified storage container.

contentType (Content Format) - ContentType
  The format of the messages. Supported formats: JSON, AVRO, CSV, TSV, ORC, Protobuf, and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be URL-encoded. Messages may be compressed; Upsolver automatically detects the compression type. Supported compression types: Zip, GZip, Snappy, and None. See: Content formats

computeEnvironment (Compute Cluster) - String
  The compute cluster to run the calculation on. See: Compute cluster

destinationStorage (Target Storage) - String
  The data and metadata files for this data source are stored in this storage.

softRetention (Soft Retention) - Boolean
  Prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.

compression (Compression) - Compression
  The compression of the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).

interval (Interval) - Int (Minutes)
  The sliding interval to wait for data.

prefix (Folder) - String, optional
  If the data resides in a subfolder within the defined cloud storage, specify this folder.

initialLoadConfiguration (Initial Load Configuration) - InitialLoadConfiguration, optional
  If you have initial data, enter a prefix and regex pattern to list the relevant data and select the required files.

startExecutionFrom (Start Ingestion From) - String (ISO-8601), optional
  The time from which to ingest data. Files from before this time (based on the provided date pattern) are ignored. If left empty, all files are ingested.

retention (Retention) - Int (Minutes), optional
  A retention period for the data in Upsolver. After this period passes, the data is deleted forever.

dataDumpDate (Data Dump Date) - String (ISO-8601), optional
  The date that the data starts.

maxDelay (Max Delay) - Int (Minutes), optional
  The maximum delay to allow for the data; any data that arrives delayed by more than the max delay is filtered out.

workspaces (Workspaces) - String[], optional
  The workspaces attached to this data source.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "AzureBlobStorageInputRequest",
	"displayData" : {
		"name" : "First Azure Blob Storage Data Source",
		"description" : "Description of first Azure Blob Storage data source"
	},
	"sourceStorage" : "2da1c848-9c44-4b4e-a226-4ffebf0d9c49",
	"datePattern" : "yyyy/MM/dd/HH/mm",
	"fileMatchPattern" : {
		"clazz" : "AllMatcher"
	},
	"contentType" : {
		"type" : "JsonContentType"
	},
	"computeEnvironment" : "1bce245a-0c76-4fe7-acb5-6f2bf4b64d8f",
	"destinationStorage" : "004974e0-e384-466e-8ac4-7429c30614e3",
	"softRetention" : true,
	"compression" : {
		"clazz" : "AutoDetectCompression"
	},
	"interval" : 120
}' "https://api.upsolver.com/api/v1/data-source/"

Google Cloud Storage

Connect to your Google Storage Bucket.

In order for Upsolver to read events directly from your cloud storage, files should be partitioned by date and time (which defines the folder structure in the cloud storage).

A prerequisite for defining a cloud storage data source is providing Upsolver with the appropriate credentials for reading from your cloud storage. See: Google Storage connection

Fields

displayData.name (Name) - String
  The data source name.

displayData.description (Description) - String
  The data source description.

sourceStorage (Google Storage Connection) - String
  The cloud storage to ingest files from.

datePattern (Date Pattern) - String
  The date pattern in the file name/folder structure (e.g. yyyy/MM/dd/HH/mm). The date format must follow the Java DateTimeFormatter specification.

fileMatchPattern (File Name Pattern) - FileNameMatcher
  The file name pattern for the files to ingest. If all the files in the specified folders are relevant, specify All. The pattern is matched against the file path starting from the specified storage source.

contentType (Content Format) - ContentType
  The format of the messages. Supported formats: JSON, AVRO, CSV, TSV, ORC, Protobuf, and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be URL-encoded. Messages may be compressed; Upsolver automatically detects the compression type. Supported compression types: Zip, GZip, Snappy, and None. See: Content formats

computeEnvironment (Compute Cluster) - String
  The compute cluster to run the calculation on. See: Compute cluster

destinationStorage (Target Storage) - String
  The data and metadata files for this data source are stored in this storage.

softRetention (Soft Retention) - Boolean
  Prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.

compression (Compression) - Compression
  The compression of the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).

interval (Interval) - Int (Minutes)
  The sliding interval to wait for data.

prefix (Folder) - String, optional
  If the data resides in a subfolder within the defined cloud storage, specify this folder.

initialLoadConfiguration (Initial Load Configuration) - InitialLoadConfiguration, optional
  If you have initial data, enter a prefix and regex pattern to list the relevant data and select the required files.

startExecutionFrom (Start Ingestion From) - String (ISO-8601), optional
  The time from which to ingest data. Files from before this time (based on the provided date pattern) are ignored. If left empty, all files are ingested.

retention (Retention) - Int (Minutes), optional
  A retention period for the data in Upsolver. After this period passes, the data is deleted forever.

dataDumpDate (Data Dump Date) - String (ISO-8601), optional
  The date that the data starts.

maxDelay (Max Delay) - Int (Minutes), optional
  The maximum delay to allow for the data; any data that arrives delayed by more than the max delay is filtered out.

workspaces (Workspaces) - String[], optional
  The workspaces attached to this data source.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "GoogleCloudStorageInputRequest",
	"displayData" : {
		"name" : "First Google Cloud Storage Data Source",
		"description" : "Description of first Google Cloud Storage data source"
	},
	"sourceStorage" : "0a617e70-c2b2-4870-a908-f4c864c4961f",
	"datePattern" : "yyyy/MM/dd/HH/mm",
	"fileMatchPattern" : {
		"clazz" : "AllMatcher"
	},
	"contentType" : {
		"type" : "JsonContentType"
	},
	"computeEnvironment" : "d4f6ee48-0782-4789-b661-4e19679a541a",
	"destinationStorage" : "8ddd7397-e1ed-4b21-a9cb-91aa923ea165",
	"softRetention" : true,
	"compression" : {
		"clazz" : "AutoDetectCompression"
	},
	"interval" : 120
}' "https://api.upsolver.com/api/v1/data-source/"

Kinesis-backed HTTP

Connect your stream using HTTP requests from any source.

Once you create the connection, you are provided with an HTTP endpoint. Upsolver receives the data as POST requests, with the data in the request body; the data is stored in a Kinesis stream until Upsolver processes it.

Headers sent with the request are also ingested as part of the stream, so metadata can be added to the request header.
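Once the data source exists, each event is a plain POST to the generated endpoint. The sketch below is only an illustration: the endpoint URL and the x-source-app header are made-up placeholders (Upsolver provides the real endpoint after creation), and the request is built but not sent.

```python
import json
import urllib.request

# Placeholder URL: Upsolver provides the real endpoint after the data source is created.
ENDPOINT = "https://your-endpoint.example/YOUR-DATA-SOURCE-ID"

def build_event_request(event: dict) -> urllib.request.Request:
    """Build a POST request carrying one event; custom headers become stream metadata."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "content-type": "application/json",
            # Any extra header is ingested as part of the stream:
            "x-source-app": "checkout-service",
        },
        method="POST",
    )

req = build_event_request({"user_id": 1, "action": "click"})
# Send with: urllib.request.urlopen(req)
```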

Fields

displayData.name (Name) - String
  The data source name.

displayData.description (Description) - String
  The data source description.

contentType (Content Format) - ContentType
  The format of the messages. Supported formats: JSON, AVRO, CSV, TSV, ORC, Protobuf, and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be URL-encoded. Messages may be compressed; Upsolver automatically detects the compression type. Supported compression types: Zip, GZip, Snappy, and None. See: Content formats

computeEnvironment (Compute Cluster) - String
  The compute cluster to run the calculation on. See: Compute cluster

ingestionEnvironment (Ingestion Cluster) - String

storageConnection (Target Storage) - String
  The data and metadata files for this data source are stored in this storage.

kinesisConnection (Kinesis Connection) - String
  The Kinesis connection used to store the incoming data until it is processed.

softRetention (Soft Retention) - Boolean
  Prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.

retention (Retention) - Int (Minutes), optional
  A retention period for the data in Upsolver. After this period passes, the data is deleted forever.

startExecutionFrom (Start Ingestion From) - String (ISO-8601), optional
  The time from which to ingest data. Data from before this time is ignored. If left empty, all data is ingested.

endExecutionAt (End Read At) - String (ISO-8601), optional
  If configured, stop reading after this date.

workspaces (Workspaces) - String[], optional
  The workspaces attached to this data source.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "KinesisHttpInputRequest",
	"displayData" : {
		"name" : "First Kinesis backed HTTP Data Source",
		"description" : "Description of first Kinesis backed HTTP data source"
	},
	"contentType" : {
		"type" : "JsonContentType"
	},
	"computeEnvironment" : "bdb99c82-4874-4e41-b735-090cc53af71e",
	"ingestionEnvironment" : "c71ba158-8022-494e-84e1-e0623533ef1c",
	"storageConnection" : "f4376c08-1b13-495b-904f-768effa818da",
	"kinesisConnection" : "44a93a4b-5083-4f26-90cb-c3ea7284e3d3",
	"softRetention" : true
}' "https://api.upsolver.com/api/v1/data-source/"

HTTP

Connect your stream using HTTP requests from any source.

Once you create the connection, you are provided with an HTTP endpoint. Upsolver receives the data as POST requests, with the data in the request body.

Headers sent with the request are also ingested as part of the stream, so metadata can be added to the request header.

Fields

displayData.name (Name) - String
  The data source name.

displayData.description (Description) - String
  The data source description.

contentType (Content Format) - ContentType
  The format of the messages. Supported formats: JSON, AVRO, CSV, TSV, ORC, Protobuf, and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be URL-encoded. Messages may be compressed; Upsolver automatically detects the compression type. Supported compression types: Zip, GZip, Snappy, and None. See: Content formats

computeEnvironment (Compute Cluster) - String
  The compute cluster to run the calculation on. See: Compute cluster

connectionPointer (Target Storage) - String
  The data and metadata files for this data source are stored in this storage.

softRetention (Soft Retention) - Boolean
  Prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.

shards (Shards) - Int
  How many readers to use in parallel to read the stream. As a rule of thumb, add one reader for every 70 MB/s of incoming data.

retention (Retention) - Int (Minutes), optional
  A retention period for the data in Upsolver. After this period passes, the data is deleted forever.

endExecutionAt (End Read At) - String (ISO-8601), optional
  If configured, stop reading after this date.

workspaces (Workspaces) - String[], optional
  The workspaces attached to this data source.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "KinesisHttpInputRequest",
	"displayData" : {
		"name" : "First Kinesis backed HTTP Data Source",
		"description" : "Description of first Kinesis backed HTTP data source"
	},
	"contentType" : {
		"type" : "JsonContentType"
	},
	"computeEnvironment" : "bdb99c82-4874-4e41-b735-090cc53af71e",
	"ingestionEnvironment" : "c71ba158-8022-494e-84e1-e0623533ef1c",
	"storageConnection" : "f4376c08-1b13-495b-904f-768effa818da",
	"kinesisConnection" : "44a93a4b-5083-4f26-90cb-c3ea7284e3d3",
	"softRetention" : true
}' "https://api.upsolver.com/api/v1/data-source/"
