VALUE_INDEX_IN_ARRAY

Calculates a 1-based positional index for each non-null value within a sub-array, resetting the index with each new sub-array. This is particularly useful for maintaining context-specific indexes within nested array structures.

Syntax

VALUE_INDEX_IN_ARRAY(array)

Arguments

array

Type: any

An array of any value types.

Example

Transformation job example

In this example, column orders contains the following two rows:

[{"products":[{"name":"Apple"},{"name":"Banana"}]},{"products":[{"name":"Cherry"},{"name":"Cherry"}]}]
[{"products":[{"name":"Apple","type":"Fruit"},{"name":"Banana"}]},{"products":[{"name":"Cherry"},{"name":"Cherry"},{"type":"Berry"}]}]

If we use this in the following job, it will output the query result below:

CREATE JOB demo_item_index
   RUN_INTERVAL = 1 MINUTE
   START_FROM = BEGINNING
   ADD_MISSING_COLUMNS = TRUE
AS INSERT INTO default_glue_catalog.upsolver_samples.orders_transformed_data 
 MAP_COLUMNS_BY_NAME
  SELECT
   *,
   VALUE_INDEX_IN_ARRAY(orders[].products[].name) AS orders[].products[].name_index,
   VALUE_INDEX_IN_ARRAY(orders[].products[].type) AS orders[].products[].type_index
  FROM default_glue_catalog.upsolver_samples.orders_raw_data
  WHERE time_filter();

Query result

The VALUE_INDEX_IN_ARRAY function calculates the positional index of the value in the products array and returns the following result:

[{"products":
[{"name":"Apple","name_index":1},{"name":"Banana","name_index":2}]},{"products":[{"name":"Cherry","name_index":1},{"name":"Cherry","name_index":2}]
}]
[{"products":
[{"name":"Apple","name_index":1,"type":"Fruit","type_index":1},{"name":"Banana","name_index":2}]},{"products":[{"name":"Cherry","name_index":1},{"name":"Cherry","name_index":2},{"type":"Berry","type_index":1}]}]

Unlike VALUE_INDEX_IN_ROW, VALUE_INDEX_IN_ARRAY resets the count for each products array. Each product is a separate context that provides an index for the position of name or type field in that specific array.

Last updated