BYTES_SUBSTRING
Returns a substring of the input value, using the offsets in bytes of the UTF-8 encoded byte representation. Partial characters and invalid UTF-8 code points are removed from the result.
Syntax
Use the following syntax to include the BYTES_SUBSTRING function in your query:
BYTES_SUBSTRING(value, startIndex, endIndex)Arguments
The BYTES_SUBSTRING function requires the following arguments:
valueinput is the string upon which you want to perform the substring function.startIndexbegins at1and includes the first character in your string.endIndexcounts from the first character in the string, not from thestartIndexposition.
value
string
The input value to perform the substring function on
startIndex
int
The inclusive start index in bytes
endIndex
int
The exclusive end index in bytes
Returns
Returns a string value.
Examples
The following example shows how the startIndex and endIndex arguments affect the Output value. Here, the endIndex value demonstrates that regardless of the startIndex value, the endIndex works from position 1 in the string to include 10 characters.
'Hello World'
1
10
Hello Worl
'Hello World'
2
10
ello Worl
'⻤Hello Wor⻤'
2
10
Hello W
'⻤Hello Wor⻤'
1
10
⻤Hello W
'Hello'
1
10
Hello
Transformation job example
The script below includes the BYTES_SUBSTRING function in a SELECT statement within a transformation job, using the "Hello World" string example.
SQL
CREATE JOB function_operator_example
ADD_MISSING_COLUMNS = true
AS INSERT INTO default_glue_catalog.upsolver_samples.orders_transformed_data
MAP_COLUMNS_BY_NAME
SELECT value, startIndex, endIndex,
BYTES_SUBSTRING(value, 1, 10) AS Output
FROM default_glue_catalog.upsolver_samples.orders_raw_data
LET value = 'Hello World',
startIndex = 1,
endIndex = 10
WHERE $commit_time BETWEEN run_start_time() AND run_end_time()
LIMIT 1;Query result
'Hello World'
1
10
Hello Worl
Last updated
