Working with Date Patterns

Before you create the landing zone where raw data is staged for Upsolver to ingest, keep the following in mind:

Date values should sort lexicographically in chronological order, so use a four-digit year and include leading zeros in months and days. For example, use /2022/02/04 for February 4, 2022, rather than /22/2/4.
Avoid placing a dynamically changing value, such as an application ID or job cluster ID, at the start of your prefix. If you need to include it, add it at the end. For example, use /2022/02/04/X34TFA2 instead of /X34TFA2/2022/02/04.
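The ordering guideline can be checked with a small Python sketch. It assumes that keys are listed in lexicographic order (Amazon S3 returns list results in UTF-8 binary order) and compares unpadded and zero-padded date prefixes. The helper names `unpadded_prefix` and `padded_prefix` are hypothetical and exist only for this illustration.

```python
from datetime import date

def unpadded_prefix(d: date) -> str:
    # Two-digit year and no leading zeros: NOT recommended.
    return f"/{d.year % 100}/{d.month}/{d.day}"

def padded_prefix(d: date) -> str:
    # Four-digit year with zero-padded month and day: recommended.
    return f"/{d:%Y}/{d:%m}/{d:%d}"

dates = [date(2022, 2, 4), date(2022, 10, 1), date(2022, 11, 15)]

# Sorting the strings mimics a lexicographic key listing.
print(sorted(unpadded_prefix(d) for d in dates))  # February sorts last
print(sorted(padded_prefix(d) for d in dates))    # chronological order

# A dynamic ID (the X34TFA2 example) goes after the date portion.
print(padded_prefix(date(2022, 2, 4)) + "/X34TFA2")
```

With the unpadded form, /22/2/4 sorts after /22/11/15 because the character "2" compares greater than "1", so the listing no longer follows the calendar.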

To ingest date-partitioned Amazon S3 data, configure the DATE_PATTERN property of your ingestion job. In Apache Hive-style partitions, each value is prefixed with a keyword that names it, for example, /year=2022/month=02/day=04/. To support this format, include the keywords in DATE_PATTERN as string literals enclosed in single quotes. Because DATE_PATTERN is itself a single-quoted string, each of those single quotes must be written as two single quotes (''), not as a double-quote character (").
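As a sketch of how a Hive-style path relates to its date pattern, the Python below builds the partition path for a given date and the matching unescaped pattern, in which keyword text is quoted and date fields are left bare. It assumes DATE_PATTERN follows Java DateTimeFormatter-style syntax, where letters such as y, M, and d are field symbols and quoted text is literal; the helpers `hive_prefix` and `hive_date_pattern` are hypothetical.

```python
from datetime import date

def hive_prefix(d: date) -> str:
    # Hive-style partition path: "<keyword>=<value>" at each level,
    # with zero-padded month and day so keys still sort chronologically.
    return f"/year={d:%Y}/month={d:%m}/day={d:%d}/"

def hive_date_pattern(fields: list[tuple[str, str]]) -> str:
    # Assumption: Java-style pattern syntax. Keyword text such as "year="
    # is wrapped in single quotes so it is read as a literal rather than
    # as pattern letters; the date field (yyyy, MM, dd) stays unquoted.
    parts = []
    for i, (keyword, field) in enumerate(fields):
        sep = "/" if i > 0 else ""
        parts.append(f"'{sep}{keyword}='{field}")
    return "".join(parts)

print(hive_prefix(date(2022, 2, 4)))
print(hive_date_pattern([("year", "yyyy"), ("month", "MM"), ("day", "dd")]))
```

The second line printed, 'year='yyyy'/month='MM'/day='dd, is the pattern before it is embedded in the single-quoted DATE_PATTERN string, which is why its quotes are then doubled.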

For the example above, the DATE_PATTERN is '''year=''yyyy''/month=''MM''/day=''dd'. The string literals in the pattern, such as ''year='' and ''/month='', are wrapped in two single-quote characters, not in double quotes.
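The doubling can be produced mechanically. This minimal Python sketch applies standard SQL string-literal escaping (wrap in single quotes, double every embedded single quote) to the unescaped pattern. The helper `sql_string_literal` is hypothetical, and the printed `DATE_PATTERN = ...` line is illustrative only; confirm the exact job syntax in the Upsolver reference.

```python
def sql_string_literal(text: str) -> str:
    # Wrap text in single quotes and escape each embedded single quote
    # by doubling it (standard SQL string-literal escaping).
    return "'" + text.replace("'", "''") + "'"

# Unescaped pattern: literal text in single quotes, date fields bare.
pattern = "'year='yyyy'/month='MM'/day='dd"
escaped = sql_string_literal(pattern)

# Illustrative option line built from the escaped pattern.
print(f"DATE_PATTERN = {escaped}")
```

The result, '''year=''yyyy''/month=''MM''/day=''dd', matches the value given in the text.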

If your folder path contains a single quote that must appear in a path literal, write it as two single quotes inside the pattern. Because each quote in the pattern is doubled again in the DATE_PATTERN string, one single quote in the path becomes four. For example, the folder year('0')=2000 is represented as '''year(''''0'''')=''yyyy/'. This can be confusing, so make sure you quote only the string-literal parts of the pattern and not the date value patterns such as yyyy.
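The two levels of escaping can be traced with a short Python sketch. `java_literal` quotes literal text for the date pattern (doubling embedded quotes), `sql_string_literal` escapes the whole pattern again for DATE_PATTERN, and `render` is a deliberately minimal checker that expands only quoted literals, '' escapes, and the yyyy, MM, and dd fields. All three names are hypothetical, and `render` assumes Java-style quoting rules; it is not a full date formatter.

```python
from datetime import date

def java_literal(text: str) -> str:
    # Quote literal text for a Java-style date pattern; a single quote
    # inside the literal is written as two single quotes.
    return "'" + text.replace("'", "''") + "'"

def sql_string_literal(text: str) -> str:
    # Escape the whole pattern as a SQL string (doubles each quote again).
    return "'" + text.replace("'", "''") + "'"

def render(pattern: str, d: date) -> str:
    # Minimal renderer for checking a pattern against a folder name.
    fields = {"yyyy": f"{d:%Y}", "MM": f"{d:%m}", "dd": f"{d:%d}"}
    out, i, in_quote = [], 0, False
    while i < len(pattern):
        if pattern.startswith("''", i):
            out.append("'")            # escaped single quote
            i += 2
        elif pattern[i] == "'":
            in_quote = not in_quote    # start or end of a literal
            i += 1
        else:
            token = next((t for t in fields if pattern.startswith(t, i)), None)
            if token and not in_quote:
                out.append(fields[token])  # date field
                i += len(token)
            else:
                out.append(pattern[i])     # literal character
                i += 1
    return "".join(out)

pattern = java_literal("year('0')=") + "yyyy/"
sql = sql_string_literal(pattern)
print(pattern)                          # the pattern itself
print(sql)                              # the DATE_PATTERN value
print(render(pattern, date(2000, 1, 1)))
```

Rendering the intermediate pattern back to year('0')=2000/ is a quick way to confirm that only the literal parts were quoted before applying the second round of escaping.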

Note that when no date pattern is defined, you cannot specify a time to start ingestion; all available data is ingested by default.
