Data source UI

This article provides an overview of the different features that are available for each data source in the Upsolver UI.

Click on a data source in the Data Sources page to view more details for that data source.

The information on that data source is split into: the data source schema, data source samples, parse errors, lineage, and monitoring.

Data source schema

Upsolver maintains the hierarchical format of the ingested data.

The tree on the left includes the ingested data and headers, any fields added by Upsolver to the header (e.g. time), and any calculated fields that may have been added.

The pane on the right shows a graph of the volume of events over the lifetime of the data source. If a field is selected, it shows the events graph for that specific field.

You can also explore your data source from a different angle by splitting up the data and configuring a field to Use as Event Type.

This allows you to filter your data source by event type and see how the rest of the fields in your data set are affected by this selection; this can be useful when you have different types of data in the same stream, or if you have a very large data set and only want to view some of the data.

Note: The Use as Event Type feature is most useful when the field has a limited number of distinct values (up to a maximum of 999).

If a field is selected in the tree, the following metrics appear:

  • How many of the events in this data source include this field, expressed as a percentage (e.g. 20.81%).

  • The density in the hierarchy (how many of the events in this branch of the data hierarchy include this field), expressed as a percentage.

  • How many unique values appear in this field.

  • The total number of values ingested for this field.

  • The first time this field included a value (for example, a year ago).

  • The last time this field included a value (for example, 2 minutes ago).

  • The percentage distribution of the field values. These distribution values can be exported by clicking Export.

  • A time-series graph of the total number of events that include the selected field.

  • The most recent data values for the selected field and columns. You can change the columns that appear by clicking Choose Columns.
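
To make these metrics concrete, here is a minimal, illustrative Python sketch that computes the same kind of statistics over a list of event records. All names are hypothetical; this is not Upsolver's implementation.

```python
# Toy computation of the per-field metrics described above: density,
# distinct values, total values, first/last seen, and value distribution.
from collections import Counter
from datetime import datetime

def field_stats(events, field):
    values = [e[field] for e in events if field in e]
    times = [e["time"] for e in events if field in e]  # "time" header field added on ingestion
    counts = Counter(values)
    total = len(values)
    return {
        "density_pct": 100.0 * total / len(events) if events else 0.0,
        "distinct_values": len(counts),
        "total_values": total,
        "first_seen": min(times) if times else None,
        "last_seen": max(times) if times else None,
        # Percentage distribution of values, like the exported distribution report.
        "distribution_pct": {v: 100.0 * c / total for v, c in counts.items()},
    }

events = [
    {"time": datetime(2021, 1, 1), "event_type": "click"},
    {"time": datetime(2021, 1, 2), "event_type": "click"},
    {"time": datetime(2021, 1, 3), "event_type": "view"},
    {"time": datetime(2021, 1, 4)},  # field missing here, which lowers the density
]
print(field_stats(events, "event_type"))  # density_pct: 75.0, distinct_values: 2
```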

If a hierarchy element is selected (e.g. the overall data), the following metrics appear:

  • The number of fields in the selected hierarchy.

  • The number of keys in the selected hierarchy.

  • The number of arrays in the selected hierarchy.

  • A stacked bar chart (by data type) of the number of fields versus the density/distinct values, or a stacked bar chart of the number of fields by data type.

  • A list of the fields in the hierarchy element, including Type, Density, Top Values, Key, Distinct Values, Array, First Seen, and Last Seen.

To filter the data view

1. In the tree, select or search for the required field.

2. Mouse over the graph to view the information for a specific period.

Select and drag over a portion of the graph to review the events over a specific window of time.

3. To change the date range, click Lifetime above the graph. You can either select a given range of time under Quick Range or toggle to Custom Range to specify your own range.

4. To change the sample data columns, click Choose Columns, select the required fields, and then click Update.

To split data by event type

1. Select a field in the tree and then click Use as Event Type.

This triggers a process that scans all the data and creates new metadata partitioned according to the unique values of the selected event type field (a conceptual sketch follows these steps).

2. Read the warning and click OK.

3. You can now choose to partition by one of the values from the Event Type dropdown at the top of the page.

4. Click Clear to stop partitioning the data by event type.
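
Conceptually, the partitioning resembles the following illustrative Python sketch (hypothetical names; not Upsolver's implementation):

```python
# Group events by the distinct values of the chosen event type field, so each
# value can be viewed on its own. The feature supports up to 999 distinct values.
from collections import defaultdict

MAX_EVENT_TYPES = 999

def partition_by_event_type(events, field):
    partitions = defaultdict(list)
    for event in events:
        partitions[event.get(field, "<missing>")].append(event)
    if len(partitions) > MAX_EVENT_TYPES:
        raise ValueError(f"{field!r} has more than {MAX_EVENT_TYPES} distinct values")
    return partitions

events = [{"event_type": "click"}, {"event_type": "view"}, {"event_type": "click"}]
for event_type, subset in partition_by_event_type(events, "event_type").items():
    print(event_type, len(subset))  # click 2 / view 1
```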

Data source samples

The graph depicts the volume of events and any errors over the selected period of the data source; below it, 10 samples of the original data are displayed in hierarchical format. These sample values can be exported by clicking Export.

Parse errors

The graph depicts the volume of events and any errors over the selected period of the data source; below it, a list of the parse errors with the Time/File, Original Content, and Error is displayed (e.g. errors may occur due to file corruption).
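
For illustration, the following Python sketch shows how a parse-error record of this shape (Time/File, Original Content, Error) can arise when reading newline-delimited JSON. The names are hypothetical; this is not Upsolver's implementation.

```python
# Collect records that parse cleanly and capture the rest as parse errors.
import json
from datetime import datetime, timezone

def parse_lines(lines, filename):
    records, parse_errors = [], []
    for line in lines:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            parse_errors.append({
                "time_file": (datetime.now(timezone.utc), filename),
                "original_content": line,
                "error": str(exc),
            })
    return records, parse_errors

records, errors = parse_lines(['{"id": 1}', '{"id": 2,,}'], "events-001.json")
print(len(records), "records,", len(errors), "parse errors")  # 1 records, 1 parse errors
```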

Lineage

The outputs, lookup tables, and dashboards that use this data source are displayed here.

Monitoring

Monitoring is split into three tabs:

Summary

The following details are displayed:

  • The number of files currently being written from this data source to outputs.

  • The number of unresolved errors stemming from outputs created from this data source.

  • The number of files written to outputs from this data source.

Additionally, below this you will find two graphs:

  • A graph of the overall utilization of the cluster this data source is running on as well as the utilization by this specific data source.

  • A graph of the delay split by whether it's from ingestion or processing.

Progress

The graph on this page shows the progress of processing new data into the data source and details the following:

  • The speed at which the data is being ingested into the data source.

  • How far behind the system is in processing the data, in minutes.

  • The expected time of arrival of the data (e.g. when the system is ingesting data at about the same rate as it is being generated, this will be less than a minute).

Mouse over the graph to view the information for a specific period.
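
As an illustration of the delay metric, the sketch below computes the gap, in minutes, between the newest processed event and the current time (hypothetical names; not Upsolver's implementation):

```python
# "How far behind": minutes between the newest processed event and now. When
# ingestion keeps up with the stream, this stays under a minute.
from datetime import datetime, timedelta, timezone

def processing_delay_minutes(last_processed_event_time, now=None):
    now = now or datetime.now(timezone.utc)
    return max((now - last_processed_event_time).total_seconds() / 60.0, 0.0)

now = datetime.now(timezone.utc)
print(round(processing_delay_minutes(now - timedelta(seconds=45), now), 2))  # 0.75
```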

Errors

Any errors from outputs created from this data source will be displayed here.
