REGEX_NAMED_GROUPS

Matches the regular expression against the input string and returns a record whose field names are the pattern's named groups. Returns null when the pattern does not match.

Syntax

REGEX_NAMED_GROUPS(pattern, allMatches, filterEmpty, input)

Arguments

| Name | Type | Description | Default Value |
| --- | --- | --- | --- |
| pattern | string | Regular Expression Pattern | |
| allMatches | Boolean | Return all the matches of the pattern, and not only the first one | false |
| filterEmpty | Boolean | Filter out empty matches | false |
| input | string | | |

Examples

| pattern | allMatches | filterEmpty | input | Output |
| --- | --- | --- | --- | --- |
| '^(?:(?<scheme>.*?):\/)?\/?(?<domain>[^:\/\s]+)(?::(?<port>\d*))?(?:(\/\w+)*\/)(?<page>[\w\-\.]+[^#?\s]+)(?:.*)?$' | false | false | 'https://www.domain.com/page.html' | {scheme: https, domain: www.domain.com, port: null, page: page.html} |
| '^(?:(?<scheme>.*?):\/)?\/?(?<domain>[^:\/\s]+)(?::(?<port>\d*))?(?:(\/\w+)*\/)(?<page>[\w\-\.]+[^#?\s]+)(?:.*)?$' | false | false | 'http://www.domain.com:8080/page.html' | {scheme: http, domain: www.domain.com, port: 8080, page: page.html} |
| '^(?<digits>\d*)$' | false | false | '123' | {digits: 123} |
| '^(?<digits>\d*)$' | false | false | 'foo' | null |
| '^(?<digits>\d*)$' | false | false | '' | {digits: ''} |
| '^(?<digits>\d*)$' | false | true | '' | null |
| '\bwww.(?<domain>[^.]*).com\b' | true | false | 'www.upsolver.com' | {domain: upsolver} |
| '\bwww.(?<domain>[^.]*).com\b' | true | false | 'www.a.com www.b.com' | [{domain: a}, {domain: b}] |
| '\bwww.(?<domain>[^.]*).com\b' | false | false | 'www.a.com www.b.com' | {domain: a} |
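
The behavior shown in the examples above can be approximated outside of a job for experimentation. The following is a minimal Python sketch, not the actual implementation: the helper name `regex_named_groups` is hypothetical, Python's `re` module stands in for the engine the function really uses, and two points are assumptions read off the examples rather than documented rules: `filterEmpty` is taken to drop matches whose matched text is empty, and a single match is returned as a bare record even when `allMatches` is true.

```python
import re


def regex_named_groups(pattern, all_matches, filter_empty, input_str):
    """Hypothetical emulation of REGEX_NAMED_GROUPS, based only on the examples."""
    # Python spells named groups (?P<name>...), so translate the (?<name>...)
    # form used in this page while leaving lookbehinds (?<= and (?<! untouched.
    py_pattern = re.sub(r"\(\?<(?![=!])", "(?P<", pattern)

    records = []
    for match in re.finditer(py_pattern, input_str):
        # Assumption: filterEmpty discards a match whose matched text is empty.
        if filter_empty and match.group(0) == "":
            continue
        # groupdict() holds only named groups; groups that did not
        # participate in the match map to None (shown as null above).
        records.append(match.groupdict())
        if not all_matches:
            break  # only the first surviving match is wanted

    if not records:
        return None
    # Assumption: the examples show a bare record when there is one match.
    return records[0] if len(records) == 1 else records


print(regex_named_groups(r"\bwww.(?<domain>[^.]*).com\b", True, False,
                         "www.a.com www.b.com"))
# prints [{'domain': 'a'}, {'domain': 'b'}]
```

Calling the same helper with `all_matches=False` on that input returns only `{'domain': 'a'}`, which mirrors the last row of the examples.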

Transformation job example

SQL

CREATE JOB function_operator_example
    ADD_MISSING_COLUMNS = true
AS INSERT INTO default_glue_catalog.upsolver_samples.orders_transformed_data 
  MAP_COLUMNS_BY_NAME
    SELECT pattern, allMatches, filterEmpty, input,
        REGEX_NAMED_GROUPS('^(?:(?<scheme>.*?):\/)?\/?(?<domain>[^:\/\s]+)(?::(?<port>\d*))?(?:(\/\w+)*\/)(?<page>[\w\-\.]+[^#?\s]+)(?:.*)?$', false, false, input) AS Output
    FROM default_glue_catalog.upsolver_samples.orders_raw_data
    LET pattern = '^(?:(?<scheme>.*?):\/)?\/?(?<domain>[^:\/\s]+)(?::(?<port>\d*))?(?:(\/\w+)*\/)(?<page>[\w\-\.]+[^#?\s]+)(?:.*)?$',
        allMatches = false,
        filterEmpty = false,
        input = 'https://www.domain.com/page.html'
    WHERE $commit_time BETWEEN run_start_time() AND run_end_time()
    LIMIT 1;

Query result

| pattern | allMatches | filterEmpty | input | Output |
| --- | --- | --- | --- | --- |
| '^(?:(?<scheme>.*?):\/)?\/?(?<domain>[^:\/\s]+)(?::(?<port>\d*))?(?:(\/\w+)*\/)(?<page>[\w\-\.]+[^#?\s]+)(?:.*)?$' | false | false | 'https://www.domain.com/page.html' | {scheme: https, domain: www.domain.com, port: null, page: page.html} |
