LogoLogo
OverviewQuickstartsHow To GuidesReferenceArticlesSupport
Reference
Reference
  • Reference
  • ✨Learning Paths
    • Amazon Kinesis
    • Amazon S3
    • Apache Kafka
    • Confluent Cloud
    • Microsoft SQL Server
    • MongoDB
    • MySQL
    • PostgreSQL
  • SQL COMMANDS
    • Clusters
      • CREATE CLUSTER
      • ALTER CLUSTER
      • ROLL CLUSTER
      • STOP/START CLUSTER
      • DROP CLUSTER
    • Connections
      • CREATE CONNECTION
        • Amazon Kinesis
        • Amazon Redshift
        • Amazon S3
        • Apache Kafka
        • AWS Glue Data Catalog
        • ClickHouse
        • Confluent Cloud
        • Elasticsearch
        • Hive Metastore
        • Microsoft SQL Server
        • MongoDB
        • MySQL
        • Polaris Catalog
        • Iceberg REST catalogs
        • PostgreSQL
        • Snowflake
      • ALTER CONNECTION
      • DROP CONNECTION
      • CDC Connections with SSL
    • Jobs
      • CREATE JOB
        • Ingestion
          • Amazon Kinesis
          • Amazon S3
          • Apache Kafka
          • Confluent Kafka
          • Content Types
          • Microsoft SQL Server
          • MongoDB
          • MySQL
          • PostgreSQL
        • Replication
          • Microsoft SQL Server
          • MongoDB
          • MySQL
          • PostgreSQL
        • Transformation
          • INSERT
            • MAP_COLUMNS_BY_NAME
            • UNNEST
          • MERGE
          • SELECT
          • Job Options
            • Amazon Redshift
            • Amazon S3
            • Apache Iceberg
            • ClickHouse
            • Data Lake Tables
            • Elasticsearch
            • PostgreSQL
            • Snowflake
        • Monitoring
          • Amazon CloudWatch
          • Datadog
          • Dynatrace
      • ALTER JOB
      • PAUSE / RESUME JOB
      • DROP JOB
    • Materialized Views
      • CREATE MATERIALIZED VIEW
      • ALTER MATERIALIZED VIEW
      • DROP MATERIALIZED VIEW
      • Join with a Materialized View
    • Tables
      • CREATE TABLE
      • ALTER TABLE
      • DROP TABLE
    • Iceberg Tables
      • Upsolver Managed Tables
        • CREATE ICEBERG TABLE
        • ALTER ICEBERG TABLE
        • DROP ICEBERG TABLE
        • OPTIMIZE ICEBERG TABLE
      • External Iceberg Tables
        • CREATE EXTERNAL ICEBERG TABLE
        • DROP EXTERNAL ICEBERG TABLE
      • Mirror Iceberg Tables
        • CREATE MIRROR
        • ALTER MIRROR
        • PAUSE / RESUME MIRROR
        • DROP MIRROR
  • FUNCTIONS & OPERATORS
    • Data Types
    • Functions
      • Aggregate
        • APPROX_COUNT_DISTINCT
        • APPROX_COUNT_DISTINCT_EACH
        • AVG
        • AVG_EACH
        • AVG_TIME_SERIES
        • COLLECT_SET
        • COLLECT_SET_EACH
        • COUNT
        • COUNT(*)
        • COUNT(DISTINCT ...)
        • COUNT_EACH
        • COUNT_IF
        • DYNAMIC_SESSIONS
        • FIRST
        • FIRST_ARRAY
        • FIRST_EACH
        • FIRST_TIME_SERIES
        • LAST
        • LAST_ARRAY
        • LAST_EACH
        • LAST_K
        • LAST_K_EACH
        • LAST_TIME_SERIES
        • MAX
        • MAX_BY
        • MAX_EACH
        • MAX_TIME_SERIES
        • MIN
        • MIN_BY
        • MIN_EACH
        • MIN_TIME_SERIES
        • SESSION_COUNT
        • STD_DEV
        • STD_DEV_EACH
        • STRING_MAX
        • STRING_MAX_EACH
        • STRING_MIN_EACH
        • SUM
        • SUM_EACH
        • SUM_TIME_SERIES
        • WEIGHTED_AVERAGE
      • Array
        • ARRAY_DISTINCT
        • ARRAY_JOIN
        • ARRAY_MAX
        • ARRAY_MIN
        • ARRAY_SORT
        • ARRAY_SORT_DESC
        • ARRAY_SUM
        • COUNT_VALUES_IF
        • COUNT_VALUES
        • ELEMENT_AT
        • FIRST_ELEMENT
        • LAST_ELEMENT
        • VALUE_INDEX_IN_ARRAY
        • VALUE_INDEX_IN_ROW
      • Comparison
        • GREATEST
        • LEAST
      • Conditional
        • COALESCE
        • IF_ELSE
        • NULL_IF
      • Date & Time
        • ADD_TIME_ZONE_OFFSET
        • DATE
        • DATE_ADD
        • DATE_DIFF
        • DATE_TRUNC
        • DAY
        • DAY_OF_WEEK
        • DAY_OF_YEAR
        • EXTRACT_TIMESTAMP
        • EXTRACT
        • FORMAT_DATETIME
        • FROM_ISO8601_DATE
        • FROM_UNIXTIME
        • HOUR
        • MILLISECOND
        • MINUTE
        • MONTH
        • QUARTER
        • RUN_END_TIME
        • RUN_START_TIME
        • SECOND
        • SUBTRACT_TIME_ZONE_OFFSET
        • TO_UNIX_EPOCH_MILLIS
        • TO_UNIX_EPOCH_SECONDS
        • TO_UNIXTIME
        • WEEK
        • YEAR_OF_WEEK
        • YEAR
      • Filter
        • IS_DUPLICATE
        • NOT
      • Interval
        • PARSE_DURATION
      • Mathematical
        • ABS
        • CBRT
        • CEIL
        • CEILING
        • DEGREES
        • EXP
        • FLOOR
        • GET_SHARD_NUMBER
        • LN
        • LOG
        • LOG2
        • LOG10
        • MOD
        • MODULO
        • POW
        • POWER
        • RADIANS
        • RAND
        • RANDOM
        • RECIPROCAL
        • ROUND
        • SIGN
        • SORT_VALUES
        • SQRT
        • TRUNCATE
      • Regular Expressions
        • REGEXP_EXTRACT
        • REGEXP_EXTRACT_ALL
        • REGEXP_LIKE
        • REGEX_MATCH_POSITION
        • REGEX_NAMED_GROUPS
        • REGEXP_REPLACE
      • Spatial
        • ST_DISTANCE
        • ST_WGS84_DISTANCE
        • WKT_SPATIAL_CONTAINS
        • WKT_SPATIAL_INTERSECT
      • String
        • BASE64_DECODE
        • BASE64_TO_HEX
        • BYTES_SUBSTRING
        • CONCAT
        • DATE
        • JOIN_ARRAYS
        • LENGTH
        • LOWER
        • LPAD
        • LTRIM
        • MD5
        • PARSE_DATETIME
        • REPLACE
        • REVERSE
        • RPAD
        • RTRIM
        • SHA1
        • SHA3_512
        • SHA256
        • SHA512
        • SORT_VALUES
        • SPLIT
        • SPLIT_TO_RECORD
        • STRING_FORMAT
        • STRIP_MARGIN
        • STRIP_PREFIX
        • STRIP_SUFFIX
        • STRPOS
        • SUBSTR
        • SUBSTRING
        • TRANSLATE
        • TRIM_CHARS
        • TRIM
        • UPPER
        • UUID_GENERATOR
        • XX_HASH
      • Structural
        • FROM_KEY_VALUE
        • GET_RANGE
        • JOIN_ALL_BY_KEY
        • JSON_PATH
        • JSON_TO_RECORD
        • MAP_WITH_INDEX
        • QUERY_STRING_TO_RECORD
        • RECORD_TO_JSON
        • SORT_BY
        • TO_ARRAY
        • ZIP_WITH_INDEX
        • ZIP
      • Trigonometric
        • COS
        • SIN
        • TAN
        • TANH
      • Type Conversion
        • CAST
        • CHR
        • DECIMAL_TO_HEX
        • HEX_TO_DECIMAL
        • TO_BIGINT
        • TO_DOUBLE
        • TO_STRING
      • URL
        • TOP_PRIVATE_DOMAIN
        • URL_DECODE
        • URL_ENCODE
        • URL_PARSER
    • Operators
      • Comparison
      • Conditional
        • CASE
      • Logical
      • Mathematical
      • String
  • MONITORING
    • Clusters
    • Datasets
      • Ingested Data
        • Column
      • Lineage
      • Data Violations
      • Statistics
      • Maintenance
        • Compactions
        • Expire Snapshots
        • Orphan Files
      • Columns
      • Partitions
      • Properties
    • Job Status
      • Stream & File Sources
        • Monitoring
        • Graphs
        • Lineage
        • Settings
      • CDC Sources
        • Monitoring
        • Replication Settings
        • Job Settings
    • System Catalog
      • Information Schema
        • Clusters
        • Columns
        • Connections
        • Jobs
        • Mirrors
        • Tables
        • Users
        • Views
      • Insights
        • job_output_column_stats
        • dataset_column_stats
      • Monitoring
        • CDC Status
        • Clusters
        • Expectations
        • Jobs
        • Partition Statistics
        • Recent Compactions
        • Running Queries
        • Table Statistics
      • Task Executions Table
  • GENERAL
    • Common SQL Syntax
    • View Entity Syntax
    • Keyboard Shortcuts
Powered by GitBook
On this page
  • Example 1
  • Example 2
  • Example 3
  • Example 4
  1. SQL COMMANDS
  2. Jobs
  3. CREATE JOB
  4. Transformation
  5. INSERT

UNNEST

The UNNEST operator is useful for converting nested objects with arrays to flat tables. With Upsolver, UNNEST allows you to flatten arrays based on a full SELECT statement.

In certain cases, using UNNEST may produce a Cartesian product of the column's array values in your result. This means that the values in the flattened arrays will appear in every possible combination within your result.

For example. given the arrays [ 1, 2, 3 ] and [ 4, 5 ], their Cartesian product would be [ (1 , 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5) ].

If this is the desired result, enable the job option ALLOW_CARTESIAN_PRODUCTS to allow the flattening of the arrays.

Example 1

Sample data

{
    "values": [ 1, 2, 3 ],
    "name": "Oleg",
    "id": 123
}

Sample query

UNNEST(SELECT data.values[] AS value,
              data.name AS name,
              data.id AS id
       FROM my_data_source)

Result

Since the data was flattened based on the values[] array that contained three values, the result contains three rows from one source event.

value
name
id

1

Oleg

123

2

Oleg

123

3

Oleg

123

Example 2

Sample data

{
    "values": ["apple", "NY"],
    "type": ["fruit", "city"]
}

Sample query with Cartesian product

UNNEST(SELECT data.values[] AS value,
              data.type[] AS type
       FROM my_data_source)

Result

Since the selected arrays are independent without any shared context, the result contains a Cartesian product.

value
type

apple

fruit

NY

fruit

apple

city

NY

city

In this case, we can see that it doesn't make sense to pair NY with fruit and apple with city, so the ZIP function should be used to first combine the arrays into a single context.

Sample query with ZIP

UNNEST(SELECT zipped[].value AS value
              zipped[].type AS type
       FROM my_data_source
       LET zipped = ZIP('type,value', data.type[], data.values[]))

Result

Now we have achieved our desired output for our data.

value
type

apple

fruit

NY

city

Example 3

Sample data

{
    "orders": [{ "order_lines": [ 1, 2, 3 ], "name": "a" }, 
               { "order_lines": [ 4, 5, 6 ], "name": "b" }]
    "refunds": [ 1, 2 ]
}

Sample query without Cartesian product

UNNEST(SELECT orders[].name AS name, 
              orders[].order_lines[] AS line
       FROM my_data_source)

Result

Since there is a natural pairing between orders[].name and orders[].order_lines[] where each name has a corresponding order_lines array, using UNNEST on this query does not result in a Cartesian product.

name
line

a

1

a

2

a

3

b

4

b

5

b

6

Sample query with Cartesian product

UNNEST(SELECT orders[].name AS name, 
              orders[].order_lines[] AS line, 
              refunds[] AS refund
       FROM my_data_source)

Result

Since there is no relationship between orders[] and refunds[] in our data, using UNNEST on this query will result in a Cartesian product between the result of pairing orders[] and refunds[].

name
line
refund

a

1
1

a

2
1

a

3
1

b

4
1

b

5
1

b

6
1

a

1
2

a

2
2

a

3
2

b

4
2

b

5
2

b

6
2

Example 4

Sample data

{
    "orders": [{ "order_lines": [ 1, 2, 3 ], 
                 "name": "a", 
                 "order_date": [ "07/30/2021", "11/27/2021" ]}, 
               { "order_lines": [ 4, 5, 6 ], 
                 "name": "b", 
                 "order_date": [ "03/21/2021", "09/13/2021" ]}]
}

Sample query without Cartesian product

UNNEST(SELECT orders[].name AS name, 
              orders[].order_lines[] AS line
       FROM my_data_source)

Result

Since there is a natural pairing between orders[].name and orders[].order_lines[] where each name has a corresponding order_lines array, using UNNEST on this query does not result in a Cartesian product.

name
line

a

1

a

2

a

3

b

4

b

5

b

6

Sample query with Cartesian product

UNNEST(SELECT orders[].name AS name, 
              orders[].order_lines[] AS line, 
              orders[].order_date[] AS order_date
       FROM my_data_source)

Result

While orders[].order_lines[] and orders[].order_date[] are both part of orders[], there is no natural pairing between the values in the two respective arrays. As such, using UNNEST on this query will result in a Cartesian product.

name
line
order_date

1

a

07/30/2021

2

a

07/30/2021

3

a

07/30/2021

1

a

11/27/2021

2

a

11/27/2021

3

a

11/27/2021

4

b

03/21/2021

5

b

3/21/2021

6

b

3/21/2021

4

b

09/13/2021

5

b

09/13/2021

6

b

09/13/2021

To run an UNNEST query that produces a Cartesian Product you must set SKIP_VALIDATIONS = ('ALLOW_CARTESIAN_PRODUCT').

Last updated 11 months ago