Change log

Change log for SQLake. Release notes can be found here.


  • Support for AWS Glue Data Catalog as a target. This creates a table and maintains schema evolution in the target for every replicated table
  • Performance improvements in transformations of jobs / materialized views
Bug Fixes
  • Fixed new Kafka ingestion/data sources stalling when reading from the start in certain situations


Bug Fixes
  • Aggregated Jobs Update: for aggregated jobs (transformation jobs using GROUP BY) with an unspecified time window in their time filter, e.g. WHERE TIME_FILTER() or WHERE $event_time BETWEEN RUN_START_TIME() AND RUN_END_TIME(), the default aggregation window is now the job's interval, replacing the previous infinite-window default
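A rough sketch of the behavior above (job, table, and column names are illustrative, not taken from the product):

```sql
-- Rough sketch; job, table, and column names are illustrative.
CREATE JOB aggregate_orders
AS INSERT INTO default_glue_catalog.analytics.orders_agg
SELECT customer_id, COUNT(*) AS order_count
FROM default_glue_catalog.staging.orders
WHERE TIME_FILTER()  -- window unspecified: now defaults to the job's interval
GROUP BY customer_id;
```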


Bug Fixes
  • S3 Data Source: Resolved a race condition that could lead to duplicated ingestion of the same file in scenarios where an S3 data source is used with a date pattern that does not follow lexicographical order
  • Minor Bug Fixes


  • Enhancements
    • Support for placeholders (dynamic jobs) in transformation jobs writing to the Glue Data Catalog
  • Bug Fixes
    • Minor Bug Fixes


  • CDC Jobs:
Bug Fixes
  • Minor Bug Fixes


  • When using Avro Schema Registry content type with Debezium, support parsing JSON, Timestamp, and Date types
  • Jobs Page:
    • Support dropping a job from the UI
    • Support pausing/resuming a job via the UI and SQL (pause is not supported for ingestion or transformation jobs that write to a data lake table as a target)
Bug Fixes
  • Jobs List:
    • Fixed Backlog calculation to show more accurate times of the current running jobs
    • Fixed Events Over Time graph for Replication job and capped its range by the job running times
  • Minor bug fixes


  • Jobs page: Support the ability to filter by Job Status
Bug Fixes
  • Minor bug fixes



  • Added support for r7 instance types in Compute Clusters
Bug Fixes
  • Fixed a bug causing recent data from a partitioned table with a primary key to be discarded if the transformation job was filtering by at least one partition field
  • Fixed an issue preventing users from being able to create compute clusters
  • Fixed the replay cluster not being shut down in some situations
  • Fixed an issue with jobs reading from system tables
  • Fixed high CPU usage in Amazon S3 outputs while viewing Datasets


  • CDC Replication Group Jobs:
    • Support new write mode option OPERATION_TYPE_COLUMN for replication group APPEND write mode
Bug Fixes
  • Minor bug fixes


  • New status for Failed (Retrying) jobs: if a job is currently failing, for example because it encountered fatal errors that prevent or will prevent it from proceeding, its status is changed to Failed (Retrying)
  • S3 With SQS: Limit the size of bulk reads from SQS to ensure data is distributed evenly
Bug Fixes
  • Fixed a bug on the jobs page where the Events Over Time graph shows the wrong range
  • Minor bug fixes


  • Sidebar redesign and improvements
  • Introduce Datasets:
    • View output/destination schemas
    • Monitor data health, freshness, and volume when loaded into the destination
    • Monitor data quality violations
    • List all target tables
Bug Fixes
  • Filtered out the heartbeat table from the CDC status display for Postgres CDC
  • Fixed a bug causing loading of all tasks to fail if a task was created with a start time far in the future
  • SELECT a.* (without an alias) now behaves the same as SELECT a.* AS *, directly extracting the nested fields from object a and returning them as separate columns instead of one object


  • Added the KEEP_REPLICATION_SLOT option to ingestion and replication jobs that read from PostgreSQL. This preserves the replication slot created by the job when the job is dropped
Bug Fixes
  • Minor bug fixes
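A rough sketch of the new option (connection and table names are illustrative, and the exact option placement in the job definition may differ):

```sql
-- Rough sketch; connection and table names are illustrative.
CREATE SYNC JOB ingest_from_postgres
  KEEP_REPLICATION_SLOT = TRUE  -- keep the slot when the job is dropped
AS COPY FROM POSTGRES my_postgres_connection
INTO default_glue_catalog.staging.postgres_raw;
```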


  • Running queries can now be stopped via the UI with a new Cancel button next to a running query in the Event Log
  • Running queries in Upsolver's Query Engine can now be stopped using the following new SQL syntax: ABORT QUERY <query_id>
  • You can now see what queries are running in Upsolver's Query Engine by querying the system.monitoring.running_queries table
  • Improved performance during replay/initial-processing for outputs to Redshift
Bug Fixes
  • Fixed inconsistent catalog names in system.insights tables for Snowflake outputs
  • Fixed an issue where FLATTEN_PATH wasn't case insensitive with columns from a SELECT * statement
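The two query-control additions can be used together: find a running query in the system table, then abort it (the `<query_id>` placeholder stands in for a real id):

```sql
-- List queries currently running in Upsolver's Query Engine
SELECT * FROM system.monitoring.running_queries;

-- Stop one of them by id
ABORT QUERY <query_id>;
```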


Bug Fixes
  • Jobs: Fixed an issue causing NULL values when casting raw columns to multiple different types in the same query


Bug Fixes
  • Minor bug fixes


  • Apache Kafka and Confluent Kafka connections: support setting SASL username and password with dedicated parameters SASL_USERNAME and SASL_PASSWORD
  • Support new function: REGEXP_EXTRACT
  • Allow the creation of a new cloud integration in a new organization
  • Ingestion wizard: now supports Confluent Cloud as a source
  • Ingestion jobs: now support Amazon S3 via SQS
  • UI design improvements
Bug Fixes
  • When editing the number of shards in an Apache Kafka output job, Upsolver waits until the previous shards are completed before running the new shards
  • Fixed an issue where Query Engine requests timed out
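A rough sketch of the new dedicated SASL parameters (host and credentials are illustrative):

```sql
-- Rough sketch; host and credentials are illustrative.
CREATE KAFKA CONNECTION my_kafka
  HOSTS = ('broker-1.example.com:9092')
  SASL_USERNAME = 'my_user'
  SASL_PASSWORD = 'my_password';
```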




  • Support is now available for using an external Hive Metastore as a catalog
  • PostgreSQL CDC:
    • Tables that aren't included in the publication will not be part of the snapshot
  • Apache Kafka Jobs:
    • When copying data from Kafka topics, names are now treated as globs (stars match any number of chars, and question marks match one char)
  • Elasticsearch Jobs:
    • Write timestamp and date types as ISO-8601 strings in jobs that write to Elasticsearch
  • Support added for il-central-1 region. This region is currently only supported with private VPC deployments
  • Reduced the number of Amazon S3 API calls to lower S3 costs
Bug Fixes
  • Fixed an issue where synced transformation jobs with an interval smaller than that of a job writing to their source table did not read the respective data
  • Minor bug fixes


  • Snowflake Jobs:
    • SELECT * will preserve the original case of field names in variant columns
  • SQL: Allow altering EXPOSE_IN_CATALOG property in tables
  • Performance Improvement: Reduce the number of file operations when coordinating future table operations
  • Write Timestamp and Date types as ISO-8601 strings in jobs that write to Elasticsearch
  • Ingestion wizard:
Bug Fixes


  • Write Timestamp and Date types as ISO-8601 strings in string output jobs, for example: job to Amazon S3 with format JSON/CSV
  • Write Timestamp and Date types as ISO-8601 in RECORD_TO_JSON function
Bug Fixes
  • Performance improvements in CDC jobs
  • Performance improvements when querying the Upsolver Query Engine
  • Minor bug fixes


  • Users can now omit the connection type when specifying a source or target in jobs (e.g. INSERT INTO S3 catalog LOCATION = '...' can be replaced with INSERT INTO catalog LOCATION = '...')
  • Improved the performance of CDC jobs reading from databases with a large number of tables
  • Querying the information schema tables now returns Jobs and Materialized Views in the deleting state
  • Elasticsearch Jobs:
    • MERGE jobs now support deleting documents by using the WHEN MATCHED AND ... THEN DELETE syntax
  • Upgraded Avro and Parquet libraries to the latest versions
Bug Fixes
  • Major improvements when reading from a table with a large number of partitions
  • Minor bug fixes
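A rough sketch of the new Elasticsearch delete syntax; the connection name, PREFIX option, column names, and surrounding job definition are illustrative and may differ from the actual API:

```sql
-- Rough sketch; names and options are illustrative.
CREATE SYNC JOB sync_users_to_elasticsearch
AS MERGE INTO my_elasticsearch_connection PREFIX = 'users' AS target
USING (
  SELECT user_id, email, is_deleted
  FROM default_glue_catalog.staging.users
  WHERE TIME_FILTER()
) AS source
ON target.user_id = source.user_id
WHEN MATCHED AND source.is_deleted THEN DELETE  -- new: conditional delete
WHEN MATCHED THEN REPLACE
WHEN NOT MATCHED THEN INSERT MAP_COLUMNS_BY_NAME;
```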


Bug Fixes
  • Minor bug fixes


Bug Fixes
  • Minor bug fixes


Bug Fixes
  • Snowflake Jobs:
    • Fixed an issue with a custom insert/update expression causing the job to fail if the field is also mapped in the select statement
    • On auto-managed tables, Upsolver will not create an extra column if one of the following conversions happens:
      • Original column is Double and got a value of type Long
      • Original column is Timestamp and got a value of type Date
      • Original column is Varchar
      • Original column is Variant
      In all other cases, an extra column is created with the new type appended to the column name as a suffix. For example, if a column col was of type Bigint and got a Double value, a column COL_DOUBLE is created in the Snowflake table
    • Fixed a delay in Materialized View on Job List/Index page


Bug Fixes
  • Fixed the tree display for fields containing dots, e.g. turning {"a\.b": 1} into {"a.b": 1}.
  • Snowflake jobs: changed the file format to copy from Avro to JSON. This fixed an issue when ingesting records with sub-fields that have special characters.


  • Added VALUE_INDEX_IN_ROW() - this function receives an element of an array of records and returns the 1-based index of the element position (incrementing regardless of whether the array is nested). Null values are not counted.
  • Added VALUE_INDEX_IN_ARRAY() - this function receives an element of an array of records and returns the 1-based index of the element position (index resets to 1 for each sub-array). Null values are not counted.
  • Ingestion wizard - support added for creating a heartbeat table within the wizard.
Bug Fixes
  • Reduce the frequency of metadata queries to Snowflake in order to reduce the cost of COMPUTE SERVICES charged by Snowflake.
  • [BREAKING CHANGE] Fixed RECORD_TO_JSON on fields containing dots, e.g. turning {"a.b": 1} to {"a\.b": 1}.
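An illustration of the breaking change, using the escaping behavior described above (the table and column names are hypothetical):

```sql
-- Illustrative only: the table and column names are hypothetical.
SELECT RECORD_TO_JSON(my_record) AS json_out
FROM default_glue_catalog.staging.events;
-- before this fix: {"a.b": 1}
-- after this fix:  {"a\.b": 1}
```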


Bug Fixes
  • Minor bug fixes and improvements


  • New UUID() function returns a unique identifier (UUID) string.
  • PostgreSQL CDC: ignore rows from the heartbeat table
  • Upgraded Debezium version from 2.1.3 to 2.2.1
  • Ingestion wizard:
    • Supports compute cluster input (in case the organization has more than one compute cluster)
    • Supports basic expectations
  • The cluster version appears in the UI on the clusters page
  • Snowflake table: Show Variant columns statistics on field level
  • Sign-out is now available from the main screen
Bug Fixes
  • Fixed the conversion of float to double to preserve the perceived semantic value in CDC sources and in data sources that get Avro or Parquet
  • Minor bug fixes and improvements


  • Added $row_number system column to transformation jobs
  • [BREAKING CHANGE] Changed $row_number system field from 0-based to 1-based
  • Added $item_index system column, representing the source batch's row index. For example, in S3 sources, it will be the row index in a file
  • S3 outputs now support the inclusion of the shard number in the target path. This allows the use of output shards without overwriting the output files
  • User information and organization name now displayed on the main pages with the ability to switch between organizations
Bug Fixes
  • Fixed a bug in the IS_DUPLICATE function that caused the wrong results when the job is running with an interval higher than 1 minute
  • Fixed a bug reading Avro and Parquet files that caused fields of type Date to be ignored
  • Minor bug fixes


Bug Fixes
  • Fixed an issue reading from empty Kafka topics that contain empty partitions
  • Fixed a bug reading Avro files that use a named type more than once
  • Minor bug fixes
  • Snowflake table statistics are now available


Bug Fixes
  • Snowflake Merge Jobs: enforce the ON clause expression to prevent creating an array
  • Minor bug fixes


Bug Fixes
  • Job status page improvements
  • Minor bug fixes
  • CDC: PostgreSQL with partitioned tables - expose data.full_partition_table_name field specifying the name of the event's original partition
  • Error messages improvements


Bug Fixes
  • CASE WHEN now handles NULL as input and returns the ELSE value
  • CDC: Fixed a bug that caused ingested decimal type columns to be converted to base64-encoded binary strings


Bug Fixes
  • Fixed COLUMN_TRANSFORMATIONS with dependencies between them creating the wrong name for the nested column
  • Fixed target name column value for Snowflake outputs in the table
  • Validate that the first parameter in an ARRAY_JOIN is not a literal
  • Ingestion wizard now supports Amazon Kinesis source


Bug Fixes
  • Fixed the bug where TABLE_DATA_RETENTION could be disabled by disabling compactions
  • Dropping a table while specifying DELETE_DATA = true did not delete data files written by jobs with RUN_PARALLELISM > 1
  • Parquet Files are now distributed more evenly when ingesting data from Amazon S3 with high execution parallelism
  • Fixed a bug when selecting from large Materialized Views with predicates on key columns would return "Query exceeded input row limit"
  • Fixed a bug where a job reading from information_schema.columns does not write data into a table
  • Fixed a bug where querying can result in an error
  • BYTES_SUBSTRING position now starts from 1 as SUBSTRING (previously started from 0)
  • New SQL syntax is now supported:
    • SHOW CREATE JOB "Job name"
    • SHOW CREATE TABLE "Table name"
    • SHOW CREATE CLUSTER "Cluster name"
  • PostgreSQL CDC: Support reading partitioned tables (PostgreSQL 14+) by the root table name instead of the underlying partition table names
  • Snowflake: Added query tag to queries executed by Upsolver for easier cost tracking
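The new SHOW CREATE statements in use (entity names are illustrative):

```sql
-- Entity names are illustrative
SHOW CREATE JOB "my_job";
SHOW CREATE TABLE "my_table";
SHOW CREATE CLUSTER "my_cluster";
```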


Bug Fixes
  • Improved statistics in Job Status
  • Fixed the issue of inviting a member to the organization not working
  • Prevented the creation of sync jobs that read from system tables
  • Fixed a bug in jobs when writing to Amazon S3 with a dynamic location
  • Fixed a bug that caused some columns to be missing when reading from a table
  • Support querying all system tables using the syntax: SELECT $*
  • Information Schema: added a type_evolution column to the system table system.information_schema.columns to show all previous types of the column


Bug Fixes
  • Job Status page bug fixes
  • Improved error messages in the Ingestion Wizard
  • Added $event_date column to all transformation jobs that write to a Managed Upsolver table
  • SQL/AutoComplete: Show aggregation result fields
  • System Tables: added elastic IPs column to system.information_schema.clusters


Bug Fixes
  • Fixed an issue collecting field statistics and metadata for large data files with a large number of unique field names
  • Snowflake output jobs now support SELECT *, creating and managing the Snowflake table.
    • CDC to Snowflake SELECT *: Support ingested JSONB as a variant
  • Allow new syntax in jobs: START_FROM = NOW - INTERVAL '6' HOURS
  • Delete intermediate files after copy to Redshift
  • Copy From Features: Add a Deduplication option to the COPY FROM job
  • Added PARSE_JSON_COLUMNS option to CDC COPY FROM jobs. This will parse any JSON typed columns in the database as nested objects in the target table.
  • SQL/AutoComplete: Show aggregation result fields
  • Support partial flattening of arrays in jobs that write to Upsolver tables: FLATTEN_PATHS = (A)
  • Ingestion wizard - Easy Ingest to Snowflake:
    • Step-by-step wizard, no SQL, no data lake tables. Supports significant data quantities, streaming data, and strong ordering of data. Comes with deduplication and field hashing capabilities
  • Execution results and the event log are now stored outside the worksheet page, so users can return to the worksheet later and pick up where they left off
  • Job Status (beta)
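A rough sketch of the new relative START_FROM syntax (the connection, bucket, and table names are illustrative):

```sql
-- Rough sketch; connection, bucket, and table names are illustrative.
CREATE SYNC JOB copy_recent_orders
  START_FROM = NOW - INTERVAL '6' HOURS  -- relative start time (new syntax)
AS COPY FROM S3 my_s3_connection
   LOCATION = 's3://my-bucket/orders/'
INTO default_glue_catalog.staging.orders;
```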


Bug Fixes
  • Fixed the system catalog name from System to system
  • AvroRegistry content type: Support URL encoded authentication information
  • Snowflake: Support keeping old values on partial updates
  • Fixed "deleting" entities showing up in information_schema tables
  • Show all JDBC jobs on the tree
  • Show "Staging Location" in inspection panel of S3 Copy From jobs with enabled DELETE_AFTER_LOAD option
  • Added editor shortcuts to increase/decrease the font size (CMD +/- on Mac)
  • S3 output file Type options (set delimiter for S3 outputs)
  • SQLake S3 output: Allow overwrite
  • Expose editor shortcuts in the help panel widget
  • Display the original file path in the "copy from job" info
  • Functions: New function: RECORD_TO_JSON


Bug Fixes
  • JDBC Outputs: delete intermediate files after being written to the DB
  • The cluster catalog is now visible in the tree
  • Gather all system entities in the tree under a catalog named "System"
  • Improved AS OF syntax
  • Auto complete on Jobs
  • Daily usage graph and report are available
  • Ability to decide which query engine to use to run a select statement (Athena/Upsolver)
  • Information Schema: Add a table for columns


Bug Fixes
  • Fixed Kafka batcher tasks getting stuck when reading with a wildcard topic and deleting all the topics in Kafka
  • Show information schema catalog on the tree
  • Auto Complete on Information Schema tables and columns
  • Allow creating jobs from Information Schema tables
  • Add support for Timestamp, Date, and Decimal types in CDC and AVRO sources
  • Added support for bigserial in Postgres outputs
  • Support EXCLUDE_COLUMNS for a COPY FROM (ingestion) job
  • Memory allocation optimizations in Lookup Table Query servers


Bug fixes
  • Fixed an issue where creating a Kafka Data Source with a glob pattern that doesn't match any topics caused no response from the API
  • Upgraded Debezium to v2.1.2
  • Support transformation jobs to PostgreSQL
  • Expose security information within the app to allow easier AWS configuration to connect your own data
  • Memory allocation optimizations in Lookup Table Query servers
  • Allow altering a Materialized View's COMMENT
  • Display managed entities in the tree even when unable to connect to Athena
  • Support ignoring fields in COPY FROM jobs by specifying the EXCLUDE_COLUMNS option


Bug fixes
  • Fixed a rare issue that can cause duplicate data to be loaded into Redshift after copy failures
  • Use regional STS endpoints if available
  • Indication of an executed statement's status (success/failed) in the editor
  • Columns now appear immediately in the tree when creating a transformation job
  • Support CAST expression in the language
  • Renamed function TO_LONG to TO_BIGINT
  • SELECT * now returns columns from joined Materialized View


  • Enhancements
    • Add support for information_schema queries
    • Add support for SKIP_VALIDATION and SKIP_ALL_VALIDATION options.
      • DEPRECATION: ALLOW_EMPTY_SOURCES will be deprecated in favor of the new options.
    • Added validation to prevent explicitly mapping fields with a different data type to the defined output table columns.
  • Bug Fixes
    • Add support for hierarchical system columns in the tree


  • Enhancements
    • Support using non-fully-qualified names for tables and materialized views
    • Improved error message when trying to create a table with the same name as an existing one
    • Support querying without a WHERE clause (Infinite Window)
    • S3 Output Job: Support splitting files into folders
    • Home page redesign
    • Cluster management tab in the UI
  • Bug Fixes
    • Alter Cluster: Fixed altering a property to null not setting default values
    • Transformation Jobs: Fix missing columns mapping validations for partition and key columns
    • API: Support joining a materialized view with an array


  • Enhancements
    • Jobs monitoring will now show materialized views
    • API: Joins with a materialized view no longer require an alias for the MV
    • Auto completion for CREATE TABLE options
    • Added a new System Table jobs.transform_job_state that provides a summary status of all running transform jobs.
    • Remove RETENTION property from all transformation jobs
    • Improved query results tab
  • Bug Fixes
    • Fixed slow loading of Schemas under Athena connections in the tree
    • Job SQL Statement in the Inspection Pane no longer omits parentheses when they're required
    • Monitoring: Fixed the 'job_name' of aggregation stages to be the original 'job_name' instead of "Output Aggregation". This means logs in the System Table 'logs.task_executions' will now have a correct 'job_name' for aggregation stages.
    • Fixed MAP_COLUMNS_BY_NAME being required for S3 and Elasticsearch targets


  • Enhancements
    • New system columns added: $source_id , $shard_number , $row_number
    • Support Time Travel in joins!
    • Support running a SELECT query without a FROM clause
    • Support running a SELECT query reading from an Upsolver Classic Data Source
    • Support running UNNEST queries
    • Support selecting columns by their fully qualified name (e.g. catalog.schema.table.column)
    • Support selecting System Columns with glob patterns
  • Bug Fixes
    • Fixed slow replay progress for Snowflake and PostgreSQL outputs


  • Enhancements
    • Event log improvements, including an informative diagram for copy/transform jobs
    • Pipeline Monitoring: Expose filtered rows due to missing PK or Partition Column
    • CLI: Show only message on DDL commands success
    • Support using Classic Data Sources


  • Enhancements
    • Inviting a user to an organization is now supported
    • Expose the column data type in the tree
    • Improved CLI experience
    • New SQLake templates added (CDC MySQL, CDC PostgreSQL, Elasticsearch, Snowflake, Redshift)


  • Enhancements
    • Support Transformation Jobs to Elasticsearch


  • Enhancements
    • PostgreSQL CDC: Moved TABLE_INCLUDE_LIST and COLUMN_EXCLUDE_LIST from job options to source definition
    • MySQL CDC: Moved TABLE_INCLUDE_LIST and COLUMN_EXCLUDE_LIST from job options to source definition
    • API token management


  • Enhancements
    • Support Transformation Jobs to S3
    • Support Copy From PostgreSQL Jobs
    • New Home Page
    • Support private VPC integration


  • Enhancements
    • New system tables: running_tasks, failing_tasks
    • Support Copy From MySQL Jobs
    • Support Transformation Jobs to Redshift


  • Features
    • New system tables were added:
      • running_tasks
      • failing_tasks
      • copy_from_job_status
  • Changes
    • Preview is now limited to a fixed amount of input rows. Queries that are too large for preview will be aborted
  • Bug Fixes
    • Jobs: Fixed transformation jobs with an interval larger than one minute not handling cases where the start time or end time of the job was not fully aligned with that interval