REGEX_NAMED_GROUPS

Matches a regular expression against the input string and returns a record whose fields are named after the pattern's named capture groups. If the pattern does not match, the result is null.

Syntax

REGEX_NAMED_GROUPS(pattern, allMatches, filterEmpty, input)

Arguments

| Name | Type | Description | Default Value |
| --- | --- | --- | --- |
| pattern | string | Regular expression pattern | |
| allMatches | Boolean | Return all matches of the pattern, not only the first one | false |
| filterEmpty | Boolean | Filter out empty matches | false |
| input | string | String to match against the pattern | |

Examples

| pattern | allMatches | filterEmpty | input | Output |
| --- | --- | --- | --- | --- |
| `'^(?:(?<scheme>.*?):\/)?\/?(?<domain>[^:\/\s]+)(?::(?<port>\d*))?(?:(\/\w+)*\/)(?<page>[\w\-\.]+[^#?\s]+)(?:.*)?$'` | false | false | `'https://www.domain.com/page.html'` | {scheme: https, domain: www.domain.com, port: null, page: page.html} |
| `'^(?:(?<scheme>.*?):\/)?\/?(?<domain>[^:\/\s]+)(?::(?<port>\d*))?(?:(\/\w+)*\/)(?<page>[\w\-\.]+[^#?\s]+)(?:.*)?$'` | false | false | `'http://www.domain.com:8080/page.html'` | {scheme: http, domain: www.domain.com, port: 8080, page: page.html} |
| `'^(?<digits>\d*)$'` | false | false | `'123'` | {digits: 123} |
| `'^(?<digits>\d*)$'` | false | false | `'foo'` | null |
| `'^(?<digits>\d*)$'` | false | false | `''` | {digits: ''} |
| `'^(?<digits>\d*)$'` | false | true | `''` | null |
| `'\bwww.(?<domain>[^.]*).com\b'` | true | false | `'www.upsolver.com'` | {domain: upsolver} |
| `'\bwww.(?<domain>[^.]*).com\b'` | true | false | `'www.a.com www.b.com'` | [{domain: a}, {domain: b}] |
| `'\bwww.(?<domain>[^.]*).com\b'` | false | false | `'www.a.com www.b.com'` | {domain: a} |
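
The rows above can be reproduced with a small local model of the two flags. The Python sketch below is an illustration only, not Upsolver's implementation. `regex_named_groups` is a hypothetical helper; it assumes that an "empty match" is one whose matched text is empty, and it always returns a list when `all_matches` is true, whereas the examples show a bare record when only one match is found. Python spells named groups `(?P<name>...)`, so the helper translates the `(?<name>...)` form first.

```python
import re


def regex_named_groups(pattern, all_matches, filter_empty, text):
    """Rough local analogue of REGEX_NAMED_GROUPS (hypothetical helper)."""
    # Translate (?<name>...) to Python's (?P<name>...), leaving the
    # lookbehind forms (?<=...) and (?<!...) untouched.
    py_pattern = re.sub(r"\(\?<(?![=!])", "(?P<", pattern)

    records = []
    for match in re.finditer(py_pattern, text):
        # Assumption: an "empty match" is one whose matched text is empty.
        if filter_empty and match.group(0) == "":
            continue
        # Named groups that did not take part in the match map to None.
        records.append(match.groupdict())
        if not all_matches:
            break  # only the first surviving match is wanted

    if all_matches:
        return records
    return records[0] if records else None
```

Under these assumptions, `regex_named_groups(r"^(?<digits>\d*)$", False, True, "")` returns `None`, in line with the `null` row for an empty input with filterEmpty set to true.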

Transformation job example

SQL

CREATE JOB function_operator_example
    ADD_MISSING_COLUMNS = true
AS INSERT INTO default_glue_catalog.upsolver_samples.orders_transformed_data 
  MAP_COLUMNS_BY_NAME
    SELECT pattern, allMatches, filterEmpty, input,
        REGEX_NAMED_GROUPS('^(?:(?<scheme>.*?):\/)?\/?(?<domain>[^:\/\s]+)(?::(?<port>\d*))?(?:(\/\w+)*\/)(?<page>[\w\-\.]+[^#?\s]+)(?:.*)?$', false, false, input) AS Output
    FROM default_glue_catalog.upsolver_samples.orders_raw_data
    LET pattern = '^(?:(?<scheme>.*?):\/)?\/?(?<domain>[^:\/\s]+)(?::(?<port>\d*))?(?:(\/\w+)*\/)(?<page>[\w\-\.]+[^#?\s]+)(?:.*)?$',
        allMatches = false,
        filterEmpty = false,
        input = 'https://www.domain.com/page.html'
    WHERE $commit_time BETWEEN run_start_time() AND run_end_time()
    LIMIT 1;

Query result

| pattern | allMatches | filterEmpty | input | Output |
| --- | --- | --- | --- | --- |
| `'^(?:(?<scheme>.*?):\/)?\/?(?<domain>[^:\/\s]+)(?::(?<port>\d*))?(?:(\/\w+)*\/)(?<page>[\w\-\.]+[^#?\s]+)(?:.*)?$'` | false | false | `'https://www.domain.com/page.html'` | {scheme: https, domain: www.domain.com, port: null, page: page.html} |
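
Before deploying a job, it can help to check a pattern like this one locally. The sketch below runs the URL pattern from the job through Python's `re` module. It is an approximation: Python's regex engine is not the one the job uses, named groups are rewritten as `(?P<name>...)`, and `parse_url` is a hypothetical helper name.

```python
import re

# URL pattern from the job above, with named groups in Python's (?P<name>...) form.
URL_PATTERN = re.compile(
    r"^(?:(?P<scheme>.*?):\/)?\/?(?P<domain>[^:\/\s]+)(?::(?P<port>\d*))?"
    r"(?:(\/\w+)*\/)(?P<page>[\w\-\.]+[^#?\s]+)(?:.*)?$"
)


def parse_url(url):
    """Return the named groups as a dict, or None when the pattern does not match."""
    match = URL_PATTERN.search(url)
    # Optional groups that did not participate (such as a missing port) are None.
    return match.groupdict() if match else None


print(parse_url("https://www.domain.com/page.html"))
# {'scheme': 'https', 'domain': 'www.domain.com', 'port': None, 'page': 'page.html'}
```

The unnamed group `(\/\w+)` is still captured by position, but it does not appear in the named-group dictionary, which mirrors the four-field record in the query result.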
