Create a data source

This article provides a guide on how to create different types of data sources using an API call.

This API enables you to create a new data source. All API calls require an API token.

POST https://api.upsolver.com/inputs
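
Every creation request has the same shape: an HTTP POST to the endpoint above, with a content-type header, an Authorization header carrying your API token, and a JSON body whose clazz field selects the data source type to create. A minimal skeleton (the token and field values are placeholders; the sections below list the fields each type accepts):

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{ "clazz" : "<DATA-SOURCE-TYPE>", "displayData" : { "name" : "my data source" } }' \
"https://api.upsolver.com/inputs"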

Amazon S3 (Quick)

Connect to your AWS S3 Bucket.

In order for Upsolver to read events directly from your cloud storage, files should be partitioned by date and time (which defines the folder structure in the cloud storage).
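
For example, the data source created in Example 2 below (bucket upsolver-tutorials-orders, folder data/, date pattern yyyy/MM/dd/HH) expects objects laid out along these lines (the object names here are only illustrative):

s3://upsolver-tutorials-orders/data/2018/08/01/13/orders-0001.json
s3://upsolver-tutorials-orders/data/2018/08/01/14/orders-0002.json

The timestamp encoded in the folder path is what the datePattern field describes and what startExecutionFrom is compared against.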

A prerequisite for defining a cloud storage data source is providing Upsolver with the appropriate credentials for reading from your cloud storage. See: S3 connection.

Fields

- bucket (Bucket), String: The Amazon S3 bucket to read from.
- globFilePattern (Glob File Pattern), String: The pattern for files to ingest.
- datePattern (Date Pattern), String: The date pattern in the file name/folder structure (e.g. yyyy/MM/dd/HH/mm). The date format must follow the Java DateTimeFormatter format.
- contentType (Content Format), ContentType: The format of the messages. Supported formats are JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be url-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are Zip, GZip, Snappy and None. See: Content formats.
- compression (Compression), Compression: The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).
- displayData.name (Name), String: The data source name.
- displayData.description (Description), String: The data source description.
- softRetention (Soft Retention), Boolean: A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.
- prefix (Folder), String, optional: If the data resides in a subfolder within the defined cloud storage, specify this folder.
- startExecutionFrom (Start Ingestion From), String (ISO-8601), optional: The time from which to ingest data. Files from before this time (based on the provided date pattern) are ignored; if left empty, all files are ingested.
- workspaces (Workspaces), String[], optional: The workspaces attached to this data source.

Example 1

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "QuickS3StorageInputRequest",
	"bucket" : "bucket",
	"globFilePattern" : "globFilePattern",
	"datePattern" : "yyyy/MM/dd/HH/mm",
	"contentType" : {
		"clazz" : "JsonContentType"
	},
	"compression" : {
		"clazz" : "AutoDetectCompression"
	},
	"displayData" : {
		"name" : "First Amazon S3 Data Source",
		"description" : "Description of first Amazon S3 data source"
	},
	"softRetention" : true
}' "https://api.upsolver.com/inputs"

Example 2

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: token" \
-d '{
	"clazz" : "QuickS3StorageInputRequest",
	"bucket" : "upsolver-tutorials-orders",
	"globFilePattern" : "*",
	"datePattern" : "yyyy/MM/dd/HH",
	"prefix" : "data/",
	"contentType" : {
		"clazz" : "JsonContentType"
	},
	"compression" : {
		"clazz" : "AutoDetectCompression"
	},
	"displayData" : {
		"name" : "API test data source",
		"description" : "Description of first Amazon S3 data source"
	},
	"softRetention" : false
}' "https://your-api.upsolver.com/inputs/"

Amazon S3 (Advanced)

Connect to your AWS S3 Bucket.

In order for Upsolver to read events directly from your cloud storage, files should be partitioned by date and time (which defines the folder structure in the cloud storage).

A prerequisite for defining a cloud storage data source is providing Upsolver with the appropriate credentials for reading from your cloud storage. See: S3 connection.

Fields

- displayData.name (Name), String: The data source name.
- displayData.description (Description), String: The data source description.
- sourceStorage (S3 Connection), String: The cloud storage to ingest files from.
- datePattern (Date Pattern), String: The date pattern in the file name/folder structure (e.g. yyyy/MM/dd/HH/mm). The date format must follow the Java DateTimeFormatter format.
- fileMatchPattern (File Name Pattern), FileNameMatcher: The file name pattern for the files to ingest. If all the files in the specified folders are relevant, specify All. The pattern is matched against the file path starting from the specified bucket.
- contentType (Content Format), ContentType: The format of the messages. Supported formats are JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be url-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are Zip, GZip, Snappy and None. See: Content formats.
- computeEnvironment (Compute Cluster), String: The compute cluster to run the calculation on. See: Compute cluster.
- destinationStorage (Target Storage), String: The data and metadata files for this data source are stored in this storage.
- compression (Compression), Compression: The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).
- interval (Interval), Int (Minutes): The sliding interval to wait for data.
- prefix (Folder), String, optional: If the data resides in a subfolder within the defined cloud storage, specify this folder.
- startExecutionFrom (Start Ingestion From), String (ISO-8601), optional: The time from which to ingest data. Files from before this time (based on the provided date pattern) are ignored; if left empty, all files are ingested.
- retention (Retention), Int (Minutes), optional: The retention period for the data.
- dataDumpDate (Data Dump Date), String (ISO-8601), optional: The date that the data starts.
- maxDelay (Max Delay), Int (Minutes), optional: The maximum delay to consider the data; any data that arrives delayed by more than the max delay is filtered out.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "CloudStorageInputRequest",
	"displayData" : {
		"name" : "First Amazon S3 Data Source",
		"description" : "Description of first Amazon S3 data source"
	},
	"sourceStorage" : "aa302f0a-e6ee-44aa-aa38-e28f1ff455f7",
	"datePattern" : "yyyy/MM/dd/HH/mm",
	"fileMatchPattern" : {
		"clazz" : "AllMatcher"
	},
	"contentType" : {
		"type" : "JsonContentType"
	},
	"computeEnvironment" : "53b314af-ffab-419a-9c2c-56032c6ef4c0",
	"destinationStorage" : "e019c0fe-bb80-4cf1-bc7b-aee579d8672e",
	"compression" : {
		"clazz" : "AutoDetectCompression"
	},
	"interval" : 120
}' "https://api.upsolver.com/api/v1/data-source/"

Amazon Kinesis Stream (Quick)

Connect to your Amazon Kinesis. Upsolver can read events from the Kinesis stream you define.

A prerequisite for defining an Amazon Kinesis stream connection is providing Upsolver with the appropriate credentials for reading from your Amazon Kinesis stream. See: Kinesis connection.

Fields

- region (Region), Region: Your AWS region.
- streamName (Stream), String: The name of the relevant Kinesis stream.
- contentType (Content Format), ContentType: The format of the messages. Supported formats are JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be url-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are Zip, GZip, Snappy and None. See: Content formats.
- compression (Compression), Compression: The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).
- displayData.name (Name), String: The data source name.
- displayData.description (Description), String: The data source description.
- softRetention (Soft Retention), Boolean: A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.
- workspaces (Workspaces), String[], optional: The workspaces attached to this data source.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "QuickKinesisInputRequest",
	"region" : "region",
	"streamName" : "streamName",
	"contentType" : {
		"type" : "JsonContentType"
	},
	"compression" : {
		"clazz" : "AutoDetectCompression"
	},
	"displayData" : {
		"name" : "First Amazon Kinesis Stream Data Source",
		"description" : "Description of first Amazon Kinesis Stream data source"
	},
	"softRetention" : true
}' "https://api.upsolver.com/api/v1/data-source/"

Amazon Kinesis Stream (Advanced)

Connect to your Amazon Kinesis. Upsolver can read events from the Kinesis stream you define.

A prerequisite for defining an Amazon Kinesis stream connection is providing Upsolver with the appropriate credentials for reading from your Amazon Kinesis stream. See: Kinesis connection.

Fields

- displayData.name (Name), String: The data source name.
- displayData.description (Description), String: The data source description.
- contentType (Content Format), ContentType: The format of the messages. Supported formats are JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be url-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are Zip, GZip, Snappy and None. See: Content formats.
- kinesisConnection (Kinesis Connection), String: The AWS credentials to connect to Kinesis. See: Kinesis connection.
- streamName (Stream), String: The name of the relevant Kinesis stream.
- readFromStart (Read From Start), String: The time from which to ingest the data. Messages from before this time are ignored; if left empty, all messages are ingested.
- computeEnvironment (Compute Cluster), String: The compute cluster to run the calculation on. See: Compute cluster.
- connectionPointer (Target Storage), String: The data and metadata files for this data source are stored in this storage.
- isOnline (Real Time Statistics), Boolean: Calculate this data source's statistics in real time directly from the input stream if a real time cluster is deployed.
- shards (Shards), Int: How many readers to use in parallel to read the stream. A recommended value is to increase it by 1 for every 70 MB/s sent to your stream.
- parallelism (Parallelism), Int: The number of independent shards used for parsing data, to increase parallelism and reduce latency. This should remain 1 in most cases and be no more than the number of shards used to read the data from the source.
- compression (Compression), Compression: The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).
- retention (Retention), Int (Minutes), optional: A retention period for the data in Upsolver. After this period passes, the data is deleted forever.
- endExecutionAt (End Read At), String (ISO-8601), optional: If configured, stop reading after this date.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "KinesisInputRequest",
	"displayData" : {
		"name" : "First Amazon Kinesis Stream Data Source",
		"description" : "Description of first Amazon Kinesis Stream data source"
	},
	"contentType" : {
		"type" : "JsonContentType"
	},
	"kinesisConnection" : "dcfc39c1-c458-4ec5-87f1-0c7f437ea17e",
	"streamName" : "streamName",
	"readFromStart" : true,
	"computeEnvironment" : "fc2a356d-c3ae-4756-a4a3-1b15158df5e7",
	"connectionPointer" : "4e3787cd-daac-42ea-8dd4-642c516a4a31",
	"isOnline" : true,
	"shards" : 1,
	"parallelism" : 1,
	"compression" : {
		"clazz" : "AutoDetectCompression"
	}
}' "https://api.upsolver.com/api/v1/data-source/"

Amazon S3 over SQS

Connect to your AWS S3 Bucket using SQS notifications.

You will need to configure SQS notifications from your S3 bucket and grant permission to read and delete messages from the SQS queue to the same access key and secret key you entered to give Upsolver permission to read from the S3 bucket. See: S3 over SQS connection.

Fields

- displayData.name (Name), String: The data source name.
- displayData.description (Description), String: The data source description.
- sourceStorage (Source Storage), String: The cloud storage to ingest files from.
- contentType (Content Format), ContentType: The format of the messages. Supported formats are JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be url-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are Zip, GZip, Snappy and None. See: Content formats.
- computeEnvironment (Compute Cluster), String: The compute cluster to run the calculation on. See: Compute cluster.
- destinationStorage (Target Storage), String: The data and metadata files for this data source are stored in this storage.
- compression (Compression), Compression: The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).
- softRetention (Soft Retention), Boolean: A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.
- executionParallelism (Parallelism), Int: The number of independent shards used for parsing data, to increase parallelism and reduce latency. This should remain 1 in most cases and be no more than the number of shards used to read the data from the source.
- prefix (Prefix), String, optional: The prefix of the files or directories. To filter a specific directory, add a trailing /.
- suffix (Suffix), String, optional: The suffix of the files to read.
- startExecutionFrom (Start Ingestion From), String (ISO-8601), optional: The time from which to ingest data. Messages from before this time are ignored; if left empty, all messages are ingested.
- endExecutionAt (End Read At), String (ISO-8601), optional: If configured, stop reading after this date.
- retention (Retention), Int (Minutes), optional: A retention period for the data in Upsolver. After this period passes, the data is deleted forever.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "S3OverSQSInputRequest",
	"displayData" : {
		"name" : "First S3 Over SQS Data Source",
		"description" : "Description of first S3 Over SQS data source"
	},
	"sourceStorage" : "6ee598b6-4928-4b07-b532-83a79464e5bb",
	"contentType" : {
		"type" : "JsonContentType"
	},
	"computeEnvironment" : "46c2945e-5a46-4d93-9e21-5a85653c28c5",
	"destinationStorage" : "2a08601c-52d2-425e-8d2c-324dbcea5858",
	"compression" : {
		"clazz" : "AutoDetectCompression"
	},
	"softRetention" : true,
	"executionParallelism" : 1
}' "https://api.upsolver.com/api/v1/data-source/"

Apache Kafka (Quick)

Connect to any topic on your Kafka servers. Upsolver can read events from the specified topic in your Kafka cluster.

A prerequisite for defining a Kafka stream connection is providing Upsolver with the appropriate credentials for reading from your Kafka cluster. See: Kafka connection.

Fields

- kafkaHosts (Kafka Hosts), String: The Kafka hosts, separated by commas (e.g. foo:9092,bar:9092).
- topicName (Kafka Topic), String: The Kafka topic to ingest the data from.
- contentType (Content Format), ContentType: The format of the messages. Supported formats are JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be url-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are Zip, GZip, Snappy and None. See: Content formats.
- compression (Compression), Compression: The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).
- displayData.name (Name), String: The data source name.
- displayData.description (Description), String: The data source description.
- softRetention (Soft Retention), Boolean: A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.
- maybekafkaVersion (Kafka Version), KafkaVersion, optional: The version of the Kafka servers. If unsure, use 0.10.x.x.
- workspaces (Workspaces), String[], optional: The workspaces attached to this data source.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "QuickKafkaInputRequest",
	"kafkaHosts" : "kafkaHosts",
	"topicName" : "topicName",
	"contentType" : {
		"type" : "JsonContentType"
	},
	"compression" : {
		"clazz" : "AutoDetectCompression"
	},
	"displayData" : {
		"name" : "First Kafka Data Source",
		"description" : "Description of first Kafka data source"
	},
	"softRetention" : true
}' "https://api.upsolver.com/api/v1/data-source/"

Apache Kafka (Advanced)

Connect to any topic on your Kafka servers. Upsolver can read events from the specified topic in your Kafka cluster.

A prerequisite for defining a Kafka stream connection is providing Upsolver with the appropriate credentials for reading from your Kafka cluster. See: Kafka connection.

Fields

- displayData.name (Name), String: The data source name.
- displayData.description (Description), String: The data source description.
- kafkaVersion (Kafka Version), KafkaVersion: The version of the Kafka servers. If unsure, use 0.10.x.x.
- kafkaHosts (Kafka Hosts), String: The Kafka hosts, separated by commas (e.g. foo:9092,bar:9092).
- topicName (Kafka Topic), String: The Kafka topic to ingest the data from.
- readFromStart (Read From Start), Boolean: Whether to read the data from the start of the topic or to begin from the end.
- contentType (Content Format), ContentType: The format of the messages. Supported formats are JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be url-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are Zip, GZip, Snappy and None. See: Content formats.
- computeEnvironment (Compute Cluster), String: The compute cluster to run the calculation on. See: Compute cluster.
- connectionPointer (Target Storage), String: The data and metadata files for this data source are stored in this storage.
- softRetention (Soft Retention), Boolean: A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.
- shards (Shards), Int: How many readers to use in parallel to read the stream. A recommended value is to increase it by 1 for every 70 MB/s sent to your topic.
- executionParallelism (Execution Parallelism), Int: The number of independent shards used for parsing data, to increase parallelism and reduce latency. This should remain 1 in most cases and be no more than the number of shards used to read the data from the source.
- isOnline (Real Time Statistics), Boolean: Calculate this data source's statistics in real time directly from the input stream if a real time cluster is deployed.
- useSsl (Use SSL), Boolean: Set this to true if your connection requires SSL. Contact us to ensure that your SSL certificate is supported.
- storeRawData (Store Raw Data), Boolean: Store an additional copy of the data in its original format.
- compression (Compression), Compression: The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).
- consumerProperties (Kafka Consumer Properties), String, optional: Extra properties for the Kafka consumer.
- retention (Retention), Int (Minutes), optional: A retention period for the data in Upsolver. After this period passes, the data is deleted forever.
- endExecutionAt (End Read At), String (ISO-8601), optional: If configured, stop reading after this date.
- workspaces (Workspaces), String[], optional: The workspaces attached to this data source.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "KafkaInputRequest",
	"displayData" : {
		"name" : "First Kafka Data Source",
		"description" : "Description of first Kafka data source"
	},
	"kafkaVersion" : "0.10.x.x",
	"kafkaHosts" : "kafkaHosts",
	"topicName" : "topicName",
	"readFromStart" : true,
	"contentType" : {
		"type" : "JsonContentType"
	},
	"computeEnvironment" : "5d8b6e27-6004-48b2-8dc1-1db9f25880cb",
	"connectionPointer" : "36a1d237-1ff0-4574-972b-b071482f3d08",
	"softRetention" : true,
	"shards" : 1,
	"executionParallelism" : 1,
	"isOnline" : true,
	"useSsl" : true,
	"storeRawData" : true,
	"compression" : {
		"clazz" : "AutoDetectCompression"
	}
}' "https://api.upsolver.com/api/v1/data-source/"

Azure Blob storage

Connect to your Azure Blob storage container.

In order for Upsolver to read events directly from your cloud storage, files should be partitioned by date and time (which defines the folder structure in the cloud storage).

A prerequisite for defining a cloud storage data source is providing Upsolver with the appropriate credentials for reading from your cloud storage. See: Azure Blob storage connection.

Fields

- displayData.name (Name), String: The data source name.
- displayData.description (Description), String: The data source description.
- sourceStorage (Azure Blob Storage Connection), String: The cloud storage to ingest files from.
- datePattern (Date Pattern), String: The date pattern in the file name/folder structure (e.g. yyyy/MM/dd/HH/mm). The date format must follow the Java DateTimeFormatter format.
- fileMatchPattern (File Name Pattern), FileNameMatcher: The file name pattern for the files to ingest. If all the files in the specified folders are relevant, specify All. The pattern is matched against the file path starting from the specified storage container.
- contentType (Content Format), ContentType: The format of the messages. Supported formats are JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be url-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are Zip, GZip, Snappy and None. See: Content formats.
- computeEnvironment (Compute Cluster), String: The compute cluster to run the calculation on. See: Compute cluster.
- destinationStorage (Target Storage), String: The data and metadata files for this data source are stored in this storage.
- softRetention (Soft Retention), Boolean: A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.
- compression (Compression), Compression: The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).
- interval (Interval), Int (Minutes): The sliding interval to wait for data.
- prefix (Folder), String, optional: If the data resides in a subfolder within the defined cloud storage, specify this folder.
- initialLoadConfiguration (Initial Load Configuration), InitialLoadConfiguration, optional: If you have initial data, enter a prefix and regex pattern to list the relevant data and select the required files.
- startExecutionFrom (Start Ingestion From), String (ISO-8601), optional: The time from which to ingest data. Files from before this time (based on the provided date pattern) are ignored; if left empty, all files are ingested.
- retention (Retention), Int (Minutes), optional: A retention period for the data in Upsolver. After this amount of time has elapsed, the data is deleted forever.
- dataDumpDate (Data Dump Date), String (ISO-8601), optional: The date that the data starts.
- maxDelay (Max Delay), Int (Minutes), optional: The maximum delay to consider the data; any data that arrives delayed by more than the max delay is filtered out.
- workspaces (Workspaces), String[], optional: The workspaces attached to this data source.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "AzureBlobStorageInputRequest",
	"displayData" : {
		"name" : "First Azure Blob Storage Data Source",
		"description" : "Description of first Azure Blob Storage data source"
	},
	"sourceStorage" : "2da1c848-9c44-4b4e-a226-4ffebf0d9c49",
	"datePattern" : "yyyy/MM/dd/HH/mm",
	"fileMatchPattern" : {
		"clazz" : "AllMatcher"
	},
	"contentType" : {
		"type" : "JsonContentType"
	},
	"computeEnvironment" : "1bce245a-0c76-4fe7-acb5-6f2bf4b64d8f",
	"destinationStorage" : "004974e0-e384-466e-8ac4-7429c30614e3",
	"softRetention" : true,
	"compression" : {
		"clazz" : "AutoDetectCompression"
	},
	"interval" : 120
}' "https://api.upsolver.com/api/v1/data-source/"

Google Cloud Storage

Connect to your Google Storage Bucket.

In order for Upsolver to read events directly from your cloud storage, files should be partitioned by date and time (which defines the folder structure in the cloud storage).

A prerequisite for defining a cloud storage data source is providing Upsolver with the appropriate credentials for reading from your cloud storage. See: Google Storage connection.

Fields

- displayData.name (Name), String: The data source name.
- displayData.description (Description), String: The data source description.
- sourceStorage (Google Storage Connection), String: The cloud storage to ingest files from.
- datePattern (Date Pattern), String: The date pattern in the file name/folder structure (e.g. yyyy/MM/dd/HH/mm). The date format must follow the Java DateTimeFormatter format.
- fileMatchPattern (File Name Pattern), FileNameMatcher: The file name pattern for the files to ingest. If all the files in the specified folders are relevant, specify All. The pattern is matched against the file path starting from the specified storage source.
- contentType (Content Format), ContentType: The format of the messages. Supported formats are JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be url-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are Zip, GZip, Snappy and None. See: Content formats.
- computeEnvironment (Compute Cluster), String: The compute cluster to run the calculation on. See: Compute cluster.
- destinationStorage (Target Storage), String: The data and metadata files for this data source are stored in this storage.
- softRetention (Soft Retention), Boolean: A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.
- compression (Compression), Compression: The compression in the data (Zip, GZip, Snappy, SnappyUnframed, Tar, or None).
- interval (Interval), Int (Minutes): The sliding interval to wait for data.
- prefix (Folder), String, optional: If the data resides in a subfolder within the defined cloud storage, specify this folder.
- initialLoadConfiguration (Initial Load Configuration), InitialLoadConfiguration, optional: If you have initial data, enter a prefix and regex pattern to list the relevant data and select the required files.
- startExecutionFrom (Start Ingestion From), String (ISO-8601), optional: The time from which to ingest data. Files from before this time (based on the provided date pattern) are ignored; if left empty, all files are ingested.
- retention (Retention), Int (Minutes), optional: A retention period for the data in Upsolver. After this amount of time has elapsed, the data is deleted forever.
- dataDumpDate (Data Dump Date), String (ISO-8601), optional: The date that the data starts.
- maxDelay (Max Delay), Int (Minutes), optional: The maximum delay to consider the data; any data that arrives delayed by more than the max delay is filtered out.
- workspaces (Workspaces), String[], optional: The workspaces attached to this data source.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "GoogleCloudStorageInputRequest",
	"displayData" : {
		"name" : "First Google Cloud Storage Data Source",
		"description" : "Description of first Google Cloud Storage data source"
	},
	"sourceStorage" : "0a617e70-c2b2-4870-a908-f4c864c4961f",
	"datePattern" : "yyyy/MM/dd/HH/mm",
	"fileMatchPattern" : {
		"clazz" : "AllMatcher"
	},
	"contentType" : {
		"type" : "JsonContentType"
	},
	"computeEnvironment" : "d4f6ee48-0782-4789-b661-4e19679a541a",
	"destinationStorage" : "8ddd7397-e1ed-4b21-a9cb-91aa923ea165",
	"softRetention" : true,
	"compression" : {
		"clazz" : "AutoDetectCompression"
	},
	"interval" : 120
}' "https://api.upsolver.com/api/v1/data-source/"

Kinesis-backed HTTP

Connect your stream using HTTP requests from any source.

Once you create the connection, you will be provided with an HTTP endpoint. Upsolver receives the data as a POST request, with the data in the body, and stores it in a Kinesis stream until it is processed.

Headers sent with the request are also ingested as part of the stream, so metadata can be added to the request header.
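
For example, once the data source is created, an event can be pushed to the endpoint with a plain POST (the endpoint URL and the metadata header below are placeholders, not values defined by this API):

curl -X POST \
-H "content-type: application/json" \
-H "x-client-version: 1.2.3" \
-d '{ "event" : "page_view", "userId" : 123 }' \
"https://YOUR-UPSOLVER-HTTP-ENDPOINT"

Any headers sent with the request, such as x-client-version here, are ingested together with the event body.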

Fields

- displayData.name (Name), String: The data source name.
- displayData.description (Description), String: The data source description.
- contentType (Content Format), ContentType: The format of the messages. Supported formats are JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be url-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are Zip, GZip, Snappy and None. See: Content formats.
- computeEnvironment (Compute Cluster), String: The compute cluster to run the calculation on. See: Compute cluster.
- ingestionEnvironment (Ingestion Cluster), String.
- storageConnection (Target Storage), String: The data and metadata files for this data source are stored in this storage.
- kinesisConnection (Kinesis Connection), String: The Kinesis connection for the stream that stores the incoming data until it is processed.
- softRetention (Soft Retention), Boolean: A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.
- retention (Retention), Int (Minutes), optional: A retention period for the data in Upsolver. After this period passes, the data is deleted forever.
- startExecutionFrom (Start Ingestion From), String (ISO-8601), optional: The time from which to ingest data. Data from before this time is ignored; if left empty, all data is ingested.
- endExecutionAt (End Read At), String (ISO-8601), optional: If configured, stop reading after this date.
- workspaces (Workspaces), String[], optional: The workspaces attached to this data source.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "KinesisHttpInputRequest",
	"displayData" : {
		"name" : "First Kinesis backed HTTP Data Source",
		"description" : "Description of first Kinesis backed HTTP data source"
	},
	"contentType" : {
		"type" : "JsonContentType"
	},
	"computeEnvironment" : "bdb99c82-4874-4e41-b735-090cc53af71e",
	"ingestionEnvironment" : "c71ba158-8022-494e-84e1-e0623533ef1c",
	"storageConnection" : "f4376c08-1b13-495b-904f-768effa818da",
	"kinesisConnection" : "44a93a4b-5083-4f26-90cb-c3ea7284e3d3",
	"softRetention" : true
}' "https://api.upsolver.com/api/v1/data-source/"

HTTP

Connect your stream using HTTP requests from any source.

Once you create the connection, you will be provided with an HTTP endpoint. Upsolver receives the data as a POST request, with the data in the body.

Headers sent with the request are also ingested as part of the stream, so metadata can be added to the request header.

Fields

- displayData.name (Name), String: The data source name.
- displayData.description (Description), String: The data source description.
- contentType (Content Format), ContentType: The format of the messages. Supported formats are JSON, AVRO, CSV, TSV, ORC, Protobuf and x-www-form-urlencoded. For self-describing formats like JSON, the schema is auto-detected. The message body should contain the message itself, which should not be url-encoded. Messages can be compressed; Upsolver automatically detects the compression type. Supported compression types are Zip, GZip, Snappy and None. See: Content formats.
- computeEnvironment (Compute Cluster), String: The compute cluster to run the calculation on. See: Compute cluster.
- connectionPointer (Target Storage), String: The data and metadata files for this data source are stored in this storage.
- softRetention (Soft Retention), Boolean: A setting that prevents data deletion when the retention policy in Upsolver activates. When enabled, the metadata is purged but the underlying data (e.g. the S3 object) is not deleted.
- shards (Shards), Int: How many readers to use in parallel to read the stream. A recommended value is to increase it by 1 for every 70 MB/s sent to the stream.
- retention (Retention), Int (Minutes), optional: A retention period for the data in Upsolver. After this amount of time has elapsed, the data is deleted forever.
- endExecutionAt (End Read At), String (ISO-8601), optional: If configured, stop reading after this date.
- workspaces (Workspaces), String[], optional: The workspaces attached to this data source.

Example

curl -X POST \
-H "content-type: application/json" \
-H "Authorization: YOUR_TOKEN" \
-d '{
	"clazz" : "KinesisHttpInputRequest",
	"displayData" : {
		"name" : "First Kinesis backed HTTP Data Source",
		"description" : "Description of first Kinesis backed HTTP data source"
	},
	"contentType" : {
		"type" : "JsonContentType"
	},
	"computeEnvironment" : "bdb99c82-4874-4e41-b735-090cc53af71e",
	"ingestionEnvironment" : "c71ba158-8022-494e-84e1-e0623533ef1c",
	"storageConnection" : "f4376c08-1b13-495b-904f-768effa818da",
	"kinesisConnection" : "44a93a4b-5083-4f26-90cb-c3ea7284e3d3",
	"softRetention" : true
}' "https://api.upsolver.com/api/v1/data-source/"
