Snowflake Schema Evolution

Understand how Upsolver manages changes to your source schema to keep your jobs running successfully.

Upsolver Schema Evolution

When a change occurs to the schema of your data source, Upsolver automatically detects the change during ingestion and dynamically applies it to the target. This change could be the addition or removal of a column or a change in data type for a column.

In a transformation job, you can use SELECT * to specify that all columns are included in the pipeline. Upsolver performs schema discovery based on the metadata and continues to dynamically monitor changes as data is ingested.

Records are parsed to extract a mapping of column names and infer the column type according to the value detected in each record. Therefore, a column type is determined on the first encounter and adjusted as more values are parsed.

For example, consider the following job:

CREATE SYNC JOB snowflake_dynamic_schema_evolution
  START_FROM = BEGINNING
  RUN_INTERVAL = 5 MINUTES
  CREATE_TABLE_IF_MISSING = true
AS INSERT INTO SNOWFLAKE snowflake_connection.orders.order_history 
  MAP_COLUMNS_BY_NAME
     SELECT *
     FROM default_glue_catalog.upsolver_samples.orders
     WHERE $event_time between run_start_time() AND run_end_time();

The job runs, and a record is ingested with the following structure:

{“details": "text"}

In this record, Upsolver discovers a column named details with a string type. Columns and rows are continuously added to the target table and, as the data is loaded, a different data type is discovered in the details column:

{"details": 1}

Now the job has found the same column name (details), with two different types - a string (VARCHAR) and a number (BIGINT) - and it needs to provide a coherent, updated details record.

Conflict resolution

How does Upsolver resolve this apparent conflict? This is where the evolution happens: under the hood, Upsolver allows the existence of multiple types for each column and applies type-casting where possible. The new types are added as DETAILS_BIGINT and where the value encountered can be cast to a number, Upsolver applies the casting to populate the column for this record. If no such casting is supported by the target platform, e.g. Snowflake, a NULL value is used instead.

Special casting

Each data platform has its own methods for handling data types. For Snowflake, Upsolver uses the following mapping to perform casting that is otherwise not supported, where the value is denoted as v:

Source Type

Target Type

Expression

BIGINT

TIMESTAMP

TRY_CAST($v / 1000 as TIMESTAMP)

TIMESTAMP

BIGINT

EXTRACT(EPOCH_MILLISECONDS FROM $v)

BIGINT

DATE

TRY_CAST(TRY_CAST($v * 86400 AS VARCHAR) AS DATE

DATE

BIGINT

TRY_CAST(EXTRACT(EPOCH_SECOND FROM $v) / 86400 AS BIGINT)