Flattening Arrays

The FLATTEN_PATHS feature enables you to flatten the results of a query based on the provided paths. This feature can transform nested arrays within your data into separate rows in the output, providing a simple and efficient way to handle hierarchical data structures.

Syntax

The basic syntax to use FLATTEN_PATHS in a transformation job is as follows:

CREATE sync JOB job_name
    FLATTEN_PATHS = (path1, path2, ...)
AS INSERT INTO target_table
      SELECT *
      FROM source_table;

The FLATTEN_PATHS feature is currently only supported when writing to Upsolver tables.

Example: Flattening a Single Path

The following example demonstrates how to flatten the items array.

SQL Code

CREATE sync JOB flatten_single_path_example
    FLATTEN_PATHS = (items)
AS INSERT INTO orders_database.flattened_orders
       SELECT *
       FROM orders_database.raw_orders;

Input Row

{
  "name": "Order1",
  "items": [
    {"product": "apple", "quantity": 3, "categories": ["fruit", "fresh"]},
    {"product": "sugar", "quantity": 2, "categories": ["pantry"]}
  ]
}

Output Rows

{
  "name": "Order1",
  "product": "apple",
  "quantity": 3,
  "categories": ["fruit", "fresh"]
}
{
  "name": "Order1",
  "product": "sugar",
  "quantity": 2,
  "categories": ["pantry"]
}

Example: Flattening a Nested Array

SQL Code

CREATE sync JOB flatten_nested_path_example
    FLATTEN_PATHS = (items[].categories)
AS INSERT INTO orders_database.flattened_categories
      SELECT *
      FROM orders_database.raw_orders;

Input Row

{
  "name": "Order1",
  "items": [
    {"product": "apple", "quantity": 3, "categories": ["fruit", "fresh"]},
    {"product": "sugar", "quantity": 2, "categories": ["pantry"]}
  ]
}

Output Rows

{
  "name": "Order1",
  "product": "apple",
  "quantity": 3,
  "categories": "fruit"
}
{
  "name": "Order1",
  "product": "apple",
  "quantity": 3,
  "categories": "fresh"
}
{
  "name": "Order1",
  "product": "sugar",
  "quantity": 2,
  "categories": "pantry"
}

Example: Flattening Multiple Paths Creating a Cartesian Product

In this example, we flatten both categories and pricing, resulting in a Cartesian product since two independent arrays are used in the flattening.

SQL Code

CREATE sync JOB flatten_multiple_paths_example
    FLATTEN_PATHS = (categories, pricing)
AS INSERT INTO products_database.flattened_products
      SELECT *
      FROM products_database.raw_products;

Input Row

{
  "name": "Apple",
  "categories": ["fruit", "fresh"],
  "pricing": [ {"members": 2.99}, {"general": 3.99}]
}

Output Rows

Here, flattening both categories and pricing creates a Cartesian product of the two arrays, leading to combinations of each category with each pricing option.

{
  "name": "Apple",
  "categories": "fruit",
  "pricing": {"members": 2.99}
}
{
  "name": "Apple",
  "categories": "fruit",
  "pricing": {"general": 3.99}
}
{
  "name": "Apple",
  "categories": "fresh",
  "pricing": {"members": 2.99}
}
{
  "name": "Apple",
  "categories": "fresh",
  "pricing": {"general": 3.99}
}

The FLATTEN_PATHS feature provides an intuitive way to handle complex data structures, making it easier to work with nested arrays in your data processing tasks.

Last updated