Comment on page
Aggregate and output data
This quickstart article shows you how to aggregate data when transforming data in the staging zone.
Using familiar SQL syntax in your transformation job, you can aggregate the data in your staging table and load it into your target table.
Here's the code to create a job that loads aggregated data into your target table:
CREATE JOB group_testing
START_FROM = BEGINNING
RUN_INTERVAL = 1 MINUTE
ADD_MISSING_COLUMNS = TRUE
AS INSERT INTO default_glue_catalog.upsolver_samples.orders_aggregated_data MAP_COLUMNS_BY_NAME
SELECT customer.address.city AS customer_address_city,
SUM(nettotal) AS total_revenue,
COUNT(DISTINCT orderid) AS num_orders
GROUP BY customer.address.city;
Let's understand what this code does.
In this example, you create a job named group_testing, and set the target table for your data as orders_aggregated_data. The job option
START_FROMinstructs Upsolver to include all historical data, and the
RUN_INTERVALoption specifies that the job should execute every 1 minute. The option
ADD_MISSING_COLUMNSensures that all columns that are added to the source table in the future, are also added to the target table.
SELECTstatement, we are aggregating our sample orders by customer.address.city, calculating the total revenue per city. Let's break it down:
- SUM(nettotal) AS total_revenue: in this line, we sum the nettotal column and rename it to total_revenue. This provides us with the total revenue by city.
- COUNT(DISTINCT orderid) AS num_orders: we then count each the number of orders by city into a new column named num_orders.
- GROUP BY customer.address.city: this line groups our previous columns by customer.address.city, enabling us to view total revenue and the number of orders by city.
When the job is running and we have data in our target table, the final step is to query the data.
Here's the code for querying your data: