BYTES_SUBSTRING

Returns a substring of the input value, using the offsets in bytes of the UTF-8 encoded byte representation. Partial characters and invalid UTF-8 code points are removed from the result.

Syntax

Use the following syntax to include the BYTES_SUBSTRING function in your query:

BYTES_SUBSTRING(value, startIndex, endIndex)

Arguments

The BYTES_SUBSTRING function requires the following arguments:

  • value input is the string upon which you want to perform the substring function.

  • startIndex begins at 1 and includes the first character in your string.

  • endIndex counts from the first character in the string, not from the startIndex position.

Name
Type
Description

value

string

The input value to perform the substring function on

startIndex

int

The inclusive start index in bytes

endIndex

int

The exclusive end index in bytes

Returns

Returns a string value.


Examples

The following example shows how the startIndex and endIndex arguments affect the Output value. Here, the endIndex value demonstrates that regardless of the startIndex value, the endIndex works from position 1 in the string to include 10 characters.

value
startIndex
endIndex
Output

'Hello World'

1

10

Hello Worl

'Hello World'

2

10

ello Worl

'⻤Hello Wor⻤'

2

10

Hello W

'⻤Hello Wor⻤'

1

10

⻤Hello W

'Hello'

1

10

Hello

Transformation job example

The script below includes the BYTES_SUBSTRING function in a SELECT statement within a transformation job, using the "Hello World" string example.

SQL

CREATE JOB function_operator_example
    ADD_MISSING_COLUMNS = true
AS INSERT INTO default_glue_catalog.upsolver_samples.orders_transformed_data 
  MAP_COLUMNS_BY_NAME
    SELECT value, startIndex, endIndex,
        BYTES_SUBSTRING(value, 1, 10) AS Output
    FROM default_glue_catalog.upsolver_samples.orders_raw_data
    LET value = 'Hello World',
        startIndex = 1,
        endIndex = 10      
    WHERE $commit_time BETWEEN run_start_time() AND run_end_time()
    LIMIT 1;

Query result

value
startIndex
endIndex
Output

'Hello World'

1

10

Hello Worl

Last updated