Output to a Target Table
This quickstart guide describes how to insert data from a staging table to a target table in Upsolver.
Creating an output table for refined data in Amazon Athena
Here's the code to create a target table in the data lake:
You don’t need to define all of your output columns, as Upsolver automatically adds any missing columns. If you do want to control which columns are added, you can define them in the CREATE TABLE
statement and then set the ADD_MISSING_COLUMNS
job property to FALSE in the job options.
Prepare your data for the refined zone
Now that you have created the target table, let's create a transformation job to read data from the staging table, transform it, and write the results into the target table inside your refined zone.
You must have data in your staging table to run this job.
Create a transformation job to read from staging and write to the target:
Let's walk through this job:
First, we create a job named transform_orders_and_insert_into_athena. The START_FROM
option instructs Upsolver to include all data going back in time, and ADD_MISSING_COLUMNS
ensures columns added in the future are added to the target table. In this example, the RUN_INTERVAL
specifies that the job executes every 1 minute.
Then, in the AS INSERT INTO
command, you set the insertion table as the target table you created in the previous step. The MAP_COLUMNS_BY_NAME
command tells Upsolver to match the source and target columns based on name.
In the SELECT
statement, you can perform transformations on your data:
Examples of the transformations you can apply to your data include renaming columns, such as changing orderid to order_id, or populating a new column with data from a different one. In the above example, buyeremail is masked using the MD5
function, and a new column, customer_id, is created.
Lastly, you specify the table from which you are copying your data, usually your staging table:
Query your data
Finally, to view your pipeline results, use a SELECT
statement to query your table.
Here's the code to view your pipeline data:
Including the LIMIT
option instructs the Upsolver query engine to limit the result set to 100 rows.
Last updated