REGEX_NAMED_GROUPS

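Matches the regular expression against the input string and returns a record whose field names are the pattern's named groups and whose values are the text each group captured. Returns null when the pattern does not match.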

Syntax

REGEX_NAMED_GROUPS(pattern, allMatches, filterEmpty, input)

Arguments

| Name | Type | Description | Default Value |
| --- | --- | --- | --- |
| pattern | string | Regular Expression Pattern | |
| allMatches | boolean | Return all the matches of the pattern, and not only the first one | false |
| filterEmpty | boolean | Filter out empty matches | false |
| input | string | | |

Examples

| pattern | allMatches | filterEmpty | input | Output |
| --- | --- | --- | --- | --- |
| '^(?:(?<scheme>.*?):\/)?\/?(?<domain>[^:\/\s]+)(?::(?<port>\d*))?(?:(\/\w+)*\/)(?<page>[\w\-\.]+[^#?\s]+)(?:.*)?$' | false | false | 'https://www.domain.com/page.html' | {scheme: https, domain: www.domain.com, port: null, page: page.html} |
| '^(?:(?<scheme>.*?):\/)?\/?(?<domain>[^:\/\s]+)(?::(?<port>\d*))?(?:(\/\w+)*\/)(?<page>[\w\-\.]+[^#?\s]+)(?:.*)?$' | false | false | 'http://www.domain.com:8080/page.html' | {scheme: http, domain: www.domain.com, port: 8080, page: page.html} |
| '^(?<digits>\d*)$' | false | false | '123' | {digits: 123} |
| '^(?<digits>\d*)$' | false | false | 'foo' | null |
| '^(?<digits>\d*)$' | false | false | '' | {digits: ''} |
| '^(?<digits>\d*)$' | false | true | '' | null |
| '\bwww.(?<domain>[^.]*).com\b' | true | false | 'www.upsolver.com' | {domain: upsolver} |
| '\bwww.(?<domain>[^.]*).com\b' | true | false | 'www.a.com www.b.com' | [{domain: a}, {domain: b}] |
| '\bwww.(?<domain>[^.]*).com\b' | false | false | 'www.a.com www.b.com' | {domain: a} |
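The way allMatches and filterEmpty interact in the rows above can be approximated outside the platform. The following Python sketch is only an emulation inferred from the example table, not the function's actual implementation. The helper name `regex_named_groups` is hypothetical, and two behaviors are assumptions: that an "empty match" means the whole match is the empty string, and that a single match is returned as a bare record rather than a one-element list.

```python
import re

def regex_named_groups(pattern, all_matches, filter_empty, text):
    """Rough emulation of the documented behavior (hypothetical helper)."""
    # Python spells named groups (?P<name>...), so translate (?<name>...),
    # leaving lookbehinds such as (?<= and (?<! untouched.
    py_pattern = re.sub(r"\(\?<(?![=!])", "(?P<", pattern)
    records = []
    for m in re.finditer(py_pattern, text):
        # Assumption: an empty match is one whose matched text is "".
        if filter_empty and m.group(0) == "":
            continue
        records.append(m.groupdict())
        if not all_matches:
            break  # only the first (kept) match is wanted
    if not records:
        return None
    # Assumption from the table: one match is shown as a record, several as a list.
    return records[0] if len(records) == 1 else records

print(regex_named_groups(r"^(?<digits>\d*)$", False, False, "123"))  # {'digits': '123'}
print(regex_named_groups(r"^(?<digits>\d*)$", False, True, ""))      # None
print(regex_named_groups(r"\bwww.(?<domain>[^.]*).com\b", True, False,
                         "www.a.com www.b.com"))  # [{'domain': 'a'}, {'domain': 'b'}]
```

Note that Python returns every captured value as a string, so a numeric-looking group such as digits comes back as '123' rather than a number.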

Transformation job example

SQL

CREATE JOB function_operator_example
    ADD_MISSING_COLUMNS = true
    AS INSERT INTO default_glue_catalog.upsolver_samples.orders_transformed_data MAP_COLUMNS_BY_NAME
    SELECT pattern, allMatches, filterEmpty, input,
        REGEX_NAMED_GROUPS('^(?:(?<scheme>.*?):\/)?\/?(?<domain>[^:\/\s]+)(?::(?<port>\d*))?(?:(\/\w+)*\/)(?<page>[\w\-\.]+[^#?\s]+)(?:.*)?$', false, false, input) AS Output
    FROM default_glue_catalog.upsolver_samples.orders_raw_data
    LET pattern = '^(?:(?<scheme>.*?):\/)?\/?(?<domain>[^:\/\s]+)(?::(?<port>\d*))?(?:(\/\w+)*\/)(?<page>[\w\-\.]+[^#?\s]+)(?:.*)?$',
        allMatches = false,
        filterEmpty = false,
        input = 'https://www.domain.com/page.html'
    WHERE time_filter()
    LIMIT 1;

Query result

| pattern | allMatches | filterEmpty | input | Output |
| --- | --- | --- | --- | --- |
| '^(?:(?<scheme>.*?):\/)?\/?(?<domain>[^:\/\s]+)(?::(?<port>\d*))?(?:(\/\w+)*\/)(?<page>[\w\-\.]+[^#?\s]+)(?:.*)?$' | false | false | 'https://www.domain.com/page.html' | {scheme: https, domain: www.domain.com, port: null, page: page.html} |
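The null port in this result follows from how optional groups behave: the input has no `:port` segment, so the optional group containing port never takes part in the match. A small Python sketch can illustrate this, assuming Python's `re` module treats this pattern the same way as the engine behind the function. The pattern is the one from the job above, rewritten with Python's `(?P<name>...)` spelling; the `url_groups` helper is hypothetical.

```python
import re

# The URL pattern from the job, with (?<name>...) rewritten as (?P<name>...).
URL_PATTERN = re.compile(
    r"^(?:(?P<scheme>.*?):\/)?\/?(?P<domain>[^:\/\s]+)(?::(?P<port>\d*))?"
    r"(?:(\/\w+)*\/)(?P<page>[\w\-\.]+[^#?\s]+)(?:.*)?$"
)

def url_groups(url):
    """Return the named groups for a URL, or None if the pattern does not match."""
    m = URL_PATTERN.match(url)
    # groupdict() includes only named groups; the unnamed path group is omitted,
    # and a named group that did not participate is reported as None.
    return m.groupdict() if m else None

print(url_groups("https://www.domain.com/page.html"))
print(url_groups("http://www.domain.com:8080/page.html"))
```

With the first URL the port entry is None (shown as null in the result table); with the second it is the string '8080'.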
