Data ingestion
This page provides an overview of how data ingestion and data sources work in Upsolver.
Upsolver supports ingesting data from various data source types. These data sources can be logically divided into two types:
Event based data sources
File based data sources
Upsolver lists files from the data source bucket, it creates a list of all the files that need to be loaded. The pull operation happens every minute by default. Upsolver then takes the list of files, parses the data and pushes them in a parsed folder. The parsed folders are going to be the same for both file based and event based data.
File based data sources Upsolver reinforces Exactly-once Semantics by
Send metadata on which files exist
Store that which files exist in the Kinesis stream
Read existing files information from the Kinesis stream to ensure exactly-once processing
Event based data sources
Event based data sources Upsolver reinforces exactly-once by:
Find the events up until which timestamp/offset have already been pulled
Write the information to the Kinesis stream
Read offset information from the Kinesis stream to ensure exactly-once processing
The Data Sources page shows all your active data sources in a grid view and you can filter the data sources according to type.
For example, you can select to view the Amazon S3 data sources only, select whether to view the Active or Deleted data sources, and flip between the grid and list views. You can also search for specific data sources.
The data source panel shows you the name of the data source, the file format ingested, when it was created, and whether it is Running. The graph shows you the volume of events over time, and you can mouse over the graph to view the details for a specific time period.
You can also view:
the compute cluster for the data source
how many fields are being ingested
number of events loaded to date
number of errors
rate of ingestion
Last updated