BYTES_SUBSTRING

This page describes how to use the BYTES_SUBSTRING function to return a portion of a string value.

Returns a substring of the input value, using the offsets in bytes of the UTF-8 encoded byte representation. Partial characters and invalid UTF-8 code points are removed from the result.

Syntax

Use the following syntax to include the BYTES_SUBSTRING function in your query:

BYTES_SUBSTRING(value, startIndex, endIndex)

Arguments

The BYTES_SUBSTRING function requires the following arguments:

  • value input is the string upon which you want to perform the substring function.

  • startIndex begins at 1 and includes the first character in your string.

  • endIndex counts from the first character in the string, not from the startIndex position.

NameTypeDescription

value

string

The input value to perform the substring function on

startIndex

int

The inclusive start index in bytes

endIndex

int

The exclusive end index in bytes

Returns

string

Examples

The following example shows how the startIndex and endIndex arguments affect the Output value. Here, the endIndex value demonstrates that regardless of the startIndex value, the endIndex works from position 1 in the string to include 10 characters.

valuestartIndexendIndexOutput

'Hello World'

1

10

Hello Worl

'Hello World'

2

10

ello Worl

'⻤Hello Wor⻤'

2

10

Hello W

'⻤Hello Wor⻤'

1

10

⻤Hello W

'Hello'

1

10

Hello

Transformation job example

The script below includes the BYTES_SUBSTRING function in a SELECT statement within a transformation job, using the "Hello World" string example.

SQL

CREATE JOB function_operator_example
    ADD_MISSING_COLUMNS = true
    AS INSERT INTO default_glue_catalog.upsolver_samples.orders_transformed_data MAP_COLUMNS_BY_NAME
    SELECT value, startIndex, endIndex,
        BYTES_SUBSTRING(value, 1, 10) AS Output
    FROM default_glue_catalog.upsolver_samples.orders_raw_data
    LET value = 'Hello World',
        startIndex = 1,
        endIndex = 10      
    WHERE time_filter()
    LIMIT 1;

Query result

valuestartIndexendIndexOutput

'Hello World'

1

10

Hello Worl

Last updated