Snowflake Schema Evolution
Understand how Upsolver manages changes to your source schema to keep your jobs running successfully.
Upsolver Schema Evolution
When a change occurs to the schema of your data source, Upsolver automatically detects the change during ingestion and dynamically applies it to the target. This change could be the addition or removal of a column or a change in data type for a column.
In a transformation job, you can use SELECT *
to specify that all columns are included in the pipeline. Upsolver performs schema discovery based on the metadata and continues to dynamically monitor changes as data is ingested.
Records are parsed to extract a mapping of column names and infer the column type according to the value detected in each record. Therefore, a column type is determined on the first encounter and adjusted as more values are parsed.
For example, consider the following job:
The job runs, and a record is ingested with the following structure:
{“details": "text"}
In this record, Upsolver discovers a column named details with a string type. Columns and rows are continuously added to the target table and, as the data is loaded, a different data type is discovered in the details column:
{"details": 1}
Now the job has found the same column name (details), with two different types - a string (VARCHAR) and a number (BIGINT) - and it needs to provide a coherent, updated details record.
Conflict resolution
How does Upsolver resolve this apparent conflict? This is where the evolution happens: under the hood, Upsolver allows the existence of multiple types for each column and applies type-casting where possible. The new types are added as DETAILS_BIGINT
and where the value encountered can be cast to a number, Upsolver applies the casting to populate the column for this record. If no such casting is supported by the target platform, e.g. Snowflake, a NULL value is used instead.
Special casting
Each data platform has its own methods for handling data types. For Snowflake, Upsolver uses the following mapping to perform casting that is otherwise not supported, where the value is denoted as v
:
Source Type | Target Type | Expression |
---|---|---|
BIGINT | TIMESTAMP | TRY_CAST($v / 1000 as TIMESTAMP) |
TIMESTAMP | BIGINT | EXTRACT(EPOCH_MILLISECONDS FROM $v) |
BIGINT | DATE | TRY_CAST(TRY_CAST($v * 86400 AS VARCHAR) AS DATE |
DATE | BIGINT | TRY_CAST(EXTRACT(EPOCH_SECOND FROM $v) / 86400 AS BIGINT) |