Search results

    Edit on GitHub

    Mapping Data To a Desired Schema

    UpSQL enables to map your data into a desired schema. The process of such mapping in Upsolver is being done in two simple steps:

    1. Ingest your raw data to the Data Lake - this process is being done by configuring a Data Source in Upsolver.
    2. Map your data to a desired schema - this process is being done by configuring an Output using UpSql.

    To demonstrate this, let’s assume we have the following data in CSV format in a data source called Purchases (all values in this table are Strings):

    purchase_id customer_id product_name quantity unit_price
    1 1 Orange 3 0.25
    2 1 Apple 1 0.5
    3 1 Banana 2 0.25

    Define Simple Schema

    If we define a table as:

    SELECT customer_id, purchase_id, product_name
    FROM Purchases
    

    The resulting table will reflect that query as events stream in. The final table will contain the data:

    customer_id purchase_id product_name
    “1” “1” “Orange”
    “1” “2” “Apple”
    “1” “3” “Banana”

    Rename column names

    Renaming of column names is being done as follows using the AS statement:

    SELECT customer_id AS Customer, purchase_id AS Purchase, product_name AS Product
    FROM Purchases
    

    The column names in the resulting table have been renamed:

    Customer Purchase Product
    “1” “1” “Orange”
    “1” “2” “Apple”
    “1” “3” “Banana”

    Data Type Conversion

    Conversion of data types is being done using “:” next to the selected column name. We will demonstrate a conversion of the column quantity from the example data source Purchases (which is in String format) into BIGINT format.

    SELECT customer_id, purchase_id, product_name, quantity:BIGINT
    FROM Purchases
    

    The resulting table will reflect the conversion:

    customer_id purchase_id product_name quantity
    “1” “1” “Orange” 3
    “1” “2” “Apple” 1
    “1” “3” “Banana” 2

    Perform Calculations

    It is possible to perform inline calculations when defining the schema:

    If we define a table as:

    SELECT customer_id, purchase_id, product_name, 
          quantity:BIGINT * unit_price as total_cost:BIGINT
    FROM Purchases
    

    The result of the query is the following table which contains the calculated field total_cost:

    customer_id purchase_id product_name total_cost
    “1” “1” “Orange” 0.75
    “1” “2” “Apple” 0.5
    “1” “3” “Banana” 0.5
    It is also possible to first calculate the field and then just use it in the query:

    SET total_cost = quantity:BIGINT * unit_price:BIGINT;
    SELECT customer_id, purchase_id, product_name, total_cost
    FROM Purchases