Pipeline Basics
Read this guide to understand how pipelines work in Upsolver.
Real-time ingestion and analytics in the data lake
Most organizations manage data that is continuously updated in real time, such as clickstream events collected from websites to understand user interaction and improve personalization. This is called streaming data.
But companies also process and analyze data in large batches -- for example, enriching user data with third-party information. This is batch data.
Both batch and streaming are integral to a company's data architecture. In this section, we illustrate how to implement both streaming and batch data analytics in the Upsolver data lake.
Below is a simple diagram that shows a high-level architecture of a data pipeline you can use to implement data analytics:
How does Upsolver merge streaming and batch processing?
Upsolver enables you to ingest both streaming and batch data with just one tool, using only familiar SQL syntax. Let's zoom in to understand how Upsolver manages data.
Now, let's look at the core components of Upsolver:
Connectors
Connectors store the connection details and credentials that Upsolver uses to read from source systems and write to targets, such as Amazon S3, Apache Kafka, or a data warehouse. You create a connector once and reuse it across jobs. For example:
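A minimal sketch of a connector definition in Upsolver's SQL syntax; the connection name, IAM role, and external ID below are placeholder values:

```sql
-- Create a reusable connection to Amazon S3.
-- Replace the role ARN and external ID with your own values.
CREATE S3 CONNECTION my_s3_connection
    AWS_ROLE = 'arn:aws:iam::111111111111:role/upsolver_role'
    EXTERNAL_ID = 'SAMPLE_EXTERNAL_ID';
```

Once created, the connection can be referenced by name from any ingestion or transformation job.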
Ingestion jobs
Ingestion jobs continuously copy raw data from a source system into a staging table in the data lake, inferring the schema as data arrives. Here's an example:
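A sketch of an ingestion job that copies raw data from S3 into a staging table; the job, connection, bucket, and table names are placeholders:

```sql
-- Continuously copy raw events from S3 into a staging table
-- in the data lake. Names below are illustrative.
CREATE JOB load_raw_events
    AS COPY FROM S3 my_s3_connection
        LOCATION = 's3://my-bucket/raw-events/'
    INTO default_glue_catalog.staging.raw_events;
```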
Transformation jobs
Transformation jobs read from staging tables, apply SQL transformations such as filtering, joins, and aggregations, and write the results to a target table ready for analytics. For example:
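A sketch of a transformation job that aggregates the staged events into a target table; table and column names are assumptions for illustration:

```sql
-- Read from the staging table, aggregate, and load the results
-- into a target table. Names below are illustrative.
CREATE JOB aggregate_events
    START_FROM = BEGINNING
    AS INSERT INTO default_glue_catalog.analytics.event_counts
        MAP_COLUMNS_BY_NAME
    SELECT event_type,
           COUNT(*) AS event_count
    FROM default_glue_catalog.staging.raw_events
    GROUP BY event_type;
```

Because the job is streaming-first, it keeps running and incrementally updates the target table as new events land in the staging table.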
Benefits of Upsolver Pipelines
1. Always on
Upsolver pipelines are always on. One of the main benefits of a streaming-first design is that pipelines do not need external scheduling or orchestration. This reduces the complexity of deploying and maintaining pipelines. Instead, Upsolver infers the necessary transformations and task progression from the SQL you write. There are no directed acyclic graphs (DAGs) to create and maintain, and you don't need a third-party orchestration tool such as Dagster, Astronomer, or Apache Airflow.
2. Observability and data quality
If you can understand the source and output data -- its structure, schema, data types, value distribution, and whether key fields contain null values -- then you can deliver reliable, fresh, and consistent datasets. Upsolver job monitoring provides graphs and metrics that indicate status at a glance. Upsolver also exposes system tables that contain all the tasks executed at different stages of the pipeline, such as:
Reading from a source system
Writing to the staging table
Transforming the data
Maintaining the data in the data lake
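These system tables can be queried with ordinary SQL. A hypothetical example (the system table name is an assumption; check the Upsolver reference for the actual catalog of system tables):

```sql
-- Inspect the status of running jobs at a glance.
-- The table name below is illustrative.
SELECT *
FROM system.monitoring.jobs;
```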
Learn More