Join multiple data streams for real-time analytics
This article provides a walkthrough on how to join multiple data streams for real-time analytics.
Before performing a join on multiple data streams, you must have already deployed Upsolver and created data sources.
Performing a join on multiple data streams is easy with Upsolver.
This guide shows how to join two data streams:
impressions: the primary data source, containing ad campaign information
clicks: the secondary data stream, tracking the number of clicks on an ad
Note: All clicks will be associated with an ad impression, but not all ad impressions will result in clicks.
1. Click Outputs on the left, then click New in the upper-right corner.
2. Select Amazon Athena as the data output.
3. Click Add to add as many data sources as you need. Click Next to continue.
1. Select the SQL window in the upper-right corner.
2. The sample SQL performs a LEFT OUTER JOIN between the impressions and clicks data streams.
Behind the scenes, the LEFT OUTER JOIN creates a lookup table, which lets users index data by a set of keys and then retrieve the results in milliseconds.
Read more about Upsolver lookup tables here.
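To make the join semantics concrete, here is a minimal Python sketch of a LEFT OUTER JOIN backed by a keyed lookup table. It is an illustration of the general technique only, not Upsolver's implementation or its SQL syntax. The field names (impression_id, campaign, click_time) and the sample events are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample events; the field names are assumptions for illustration.
impressions = [
    {"impression_id": "i1", "campaign": "spring_sale"},
    {"impression_id": "i2", "campaign": "spring_sale"},
    {"impression_id": "i3", "campaign": "brand_awareness"},
]
clicks = [
    {"impression_id": "i1", "click_time": "2021-01-01T00:00:05Z"},
    {"impression_id": "i1", "click_time": "2021-01-01T00:00:09Z"},
    {"impression_id": "i3", "click_time": "2021-01-01T00:01:30Z"},
]


def build_lookup(events, key):
    """Index events by the join key, similar in spirit to a lookup table."""
    table = defaultdict(list)
    for event in events:
        table[event[key]].append(event)
    return table


def left_outer_join(left, lookup, key):
    """Emit one row per left-side event, with the number of matching right-side events."""
    rows = []
    for event in left:
        # .get() returns an empty list for impressions that have no clicks.
        matches = lookup.get(event[key], [])
        rows.append({**event, "click_count": len(matches)})
    return rows


click_lookup = build_lookup(clicks, "impression_id")
joined = left_outer_join(impressions, click_lookup, "impression_id")
for row in joined:
    print(row)
```

Every impression appears in the result exactly once, including impressions with no clicks, which is the behavior that distinguishes a LEFT OUTER JOIN from an inner join.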
1. Define storage, database, and table information for your Athena environment and click Next.
2. Define the compute cluster that you would like to use and the time range of the data you would like to output.
Keep in mind that setting Ending At to Never means the output will be a continuous stream.
3. Click Deploy.
1. Click the Progress tab to confirm that the output data is up to date.
2. Run a query in Athena to make sure you get the correct results.
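One way to sanity-check a left-joined output is to compare the total row count with the number of rows that have a matching click. The sketch below uses Python's built-in sqlite3 module as a local stand-in for Athena, so that it runs without an AWS account. The table name joined_output, its columns, and the sample rows are hypothetical, and the query you run in Athena would need to match your own table.

```python
import sqlite3

# In-memory database standing in for the Athena table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE joined_output (impression_id TEXT, campaign TEXT, click_time TEXT)"
)
conn.executemany(
    "INSERT INTO joined_output VALUES (?, ?, ?)",
    [
        ("i1", "spring_sale", "2021-01-01T00:00:05Z"),
        ("i2", "spring_sale", None),  # impression with no click
        ("i3", "brand_awareness", "2021-01-01T00:01:30Z"),
    ],
)

# COUNT(*) counts every row; COUNT(click_time) skips rows where click_time is NULL.
total, clicked = conn.execute(
    "SELECT COUNT(*), COUNT(click_time) FROM joined_output"
).fetchone()
unclicked = total - clicked

print(f"impressions={total} clicked={clicked} unclicked={unclicked}")
```

In a correct left join, the total should match the number of impressions in the selected time range, and the clicked count should never exceed the total.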