Change log

Change log for Upsolver Classic

Release notes for may be found here.



  • Help widget adjustments (size and location have changed to prevent hiding application data)

Bug Fixes

  • Fixed snapshotting of tables sometimes getting stuck in CDC

  • Fixed a bug causing Amazon S3 outputs to be stuck on retention deletion when the output has multiple versions

  • Minor bug fixes



  • Upgraded libraries to include recent security patches to enhance system security and stability

Bug Fixes

  • Minor bug fixes



  • Upgraded libraries to include recent security patches to enhance system security and stability

Bug fixes

  • Minor bug fixes



  • Libraries upgrade to include security patches

Bug Fixes

  • Minor bug fixes


Bug Fixes

  • Run connection validation tasks on servers with Elastic IPs when Elastic IPs are enabled

  • MongoDB CDC:

    • Corrected the parsing of Decimal types to Double

    • Resolved errors encountered when replicating collections containing fields with types Regex, Min Key, and Max Key



  • Upgraded the Snowflake driver to 3.15.0

Bug Fixes

  • Minor bug fixes



  • For new entities, use the updated Parquet list structure (parquet.avro.write-old-list-structure = false) when writing Parquet files to S3 and Upsolver Tables

Bug Fixes

  • Fixed a bug that could skip data when reading from CDC sources

  • Minor bug fixes


Bug Fixes

  • Minor bug fixes


Bug Fixes

  • Minor bug fixes



Bug Fixes

  • Minor bug fixes


Bug Fixes

  • Minor bug fixes


Bug Fixes

  • Minor bug fixes


Bug Fixes

  • Minor bug fixes

  • Please note that, in addition to the updates listed in this release note, this release includes all enhancements and changes from previous versions.


Bug Fixes

  • Minor bug fixes



  • Performance Improvement in Transformations of Outputs / Lookup Tables

  • Allow downloading the outputs list grid as CSV

Bug Fixes

  • Apache Kafka Jobs:

    • Fixed new Kafka ingestion/data sources stalling when reading from the start in certain situations


Bug Fixes

  • Minor bug fixes


Bug Fixes

  • S3 Data Source: Resolved a race condition that could lead to duplicated ingestion of the same file in scenarios where an S3 data source is used with a date pattern that does not follow lexicographical order

  • Minor Bug Fixes
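
To illustrate what a date pattern that "does not follow lexicographical order" means in the S3 Data Source fix above, the sketch below compares two folder-naming schemes. The patterns and dates are illustrative assumptions, not Upsolver configuration: a zero-padded year/month/day pattern sorts chronologically as plain strings, while a day-first pattern does not.

```python
from datetime import date


def sorts_chronologically(dates, pattern):
    """Return True if names formatted with `pattern` sort in the same order as the dates."""
    ordered = sorted(dates)
    names = [d.strftime(pattern) for d in ordered]
    return names == sorted(names)


# Hypothetical sample dates, used only for the comparison.
sample = [date(2023, 1, 31), date(2023, 2, 1), date(2023, 12, 15)]

print(sorts_chronologically(sample, "%Y/%m/%d"))  # True: string order matches time order
print(sorts_chronologically(sample, "%d-%m-%Y"))  # False: "01-02-2023" sorts before "31-01-2023"
```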


Bug Fixes

  • Minor Bug Fixes


Bug Fixes

  • Minor Bug Fixes



  • When using Avro Schema Registry content format with Debezium, support parsing JSON type

Bug Fixes

  • Minor bug fixes


Bug Fixes

  • Minor bug fixes


  • Minor bug fixes



  • Added support for r7 instance types in Compute Clusters

Bug Fixes

  • Fixed an issue preventing users from being able to create compute clusters

  • Fixed the replay cluster not being shut down in some situations



  • New compaction metrics: Compaction delay, Number of files in WAL

Bug Fixes

  • Minor bug fixes



  • S3 With SQS: Limit the size of bulk reads from SQS to ensure data is distributed evenly

Bug Fixes

  • Minor bug fixes


Bug Fixes

  • Fixed some rare instances where a broken JDBC Data Source could interfere with other task executions

  • Fixed a bug causing the loading of all tasks to fail if a task was created with a start time far in the future


Bug Fixes

  • Minor bug fixes


Bug Fixes

  • Minor bug fixes


Bug Fixes

  • Fixed an issue causing CDC inputs to get stuck if a table failed to snapshot


Bug Fixes

  • Minor bug fixes


Bug Fixes

  • Kafka/Kinesis output: Fixed an issue that could cause events to arrive out of order when changing the number of shards in the output properties.



  • Snowflake Output: Changed the intermediate format from Avro to JSON. This change improves performance when writing to Snowflake and fixes an issue when writing to a column of type VARIANT with sub-fields that contain special characters in the field name

Bug Fixes

  • Minor bug fixes

  • Performance improvements when writing Parquet files



  • PostgreSQL CDC:

    • Tables that aren't included in the publication will not be part of the snapshot

  • Support added for il-central-1 region. This region is currently only supported with private VPC deployments

  • Elasticsearch Jobs:

    • Write timestamp and date types as ISO-8601 strings in jobs that write to Elasticsearch

  • Reduced the number of Amazon S3 API calls to lower S3 costs

Bug Fixes

  • Minor bug fixes



  • Write Timestamp and Date types as ISO-8601 strings in Elasticsearch output

  • Performance Improvement: Reduce the number of file operations when coordinating future table operations

Bug Fixes

  • Minor bug fixes



  • Write Timestamp and Date types as ISO-8601 strings in string outputs, for example: Amazon S3 output with format JSON/CSV

  • Write Timestamp and Date types as ISO-8601 in RECORD_TO_JSON function
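
As a rough illustration of the ISO-8601 representation mentioned above, the Python sketch below serializes a timestamp and a date the way a JSON output might carry them. The record layout and the `to_iso` helper are hypothetical, and the exact precision and time zone suffix that Upsolver emits are not specified in these notes.

```python
import json
from datetime import date, datetime, timezone


def to_iso(value):
    """JSON fallback encoder: render dates and timestamps as ISO-8601 strings."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


# Hypothetical record with one timestamp field and one date field.
record = {
    "event_time": datetime(2023, 5, 17, 8, 30, 0, tzinfo=timezone.utc),
    "event_date": date(2023, 5, 17),
}

encoded = json.dumps(record, default=to_iso)
print(encoded)
```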

Bug Fixes

  • Performance improvements in CDC data sources

  • Minor bug fixes



  • Improved the performance of CDC jobs reading from databases with a large number of tables

  • Upgraded Avro and Parquet libraries to the latest versions

Bug Fixes

  • Fixed the SQL Parser to parse the LOG function and DECAYED_SUM aggregation

  • Minor bug fixes


Bug Fixes

  • Minor bug fixes



  • Cluster version appears in the UI on the clusters page

Bug Fixes

  • Minor bug fixes



  • Updated Snowflake JDBC driver version to 3.13.33

Bug Fixes

  • Fix UI error, "The client couldn't connect to the API cluster."

  • Minor bug fixes


Bug Fixes

  • Minor bug fixes


Bug Fixes

  • Minor bug fixes


Bug Fixes

  • Minor bug fixes and improvements



  • New UUID() function returns a unique identifier (UUID) string

  • Upgraded Debezium version from 2.1.3 to 2.2.1

Bug Fixes

  • Fixed the conversion of float to double to preserve the perceived semantic value in CDC sources and in data sources that get Avro or Parquet.

  • Minor bug fixes
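
The float-to-double fix above concerns a well-known effect: widening a 32-bit float bit-for-bit exposes its rounding noise, so a value stored as 0.1 shows up as 0.10000000149011612. The sketch below shows one common way to keep the "perceived" value, by going through the shortest decimal string that still round-trips to the same 32-bit float. This is an assumption about the general technique, not a description of Upsolver's implementation, and all function names are hypothetical.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def widen_naive(f32):
    """Bit-exact widening: keeps the binary value and exposes float32 rounding noise."""
    return float(f32)


def widen_by_decimal(f32):
    """Parse the shortest decimal string that round-trips to the same float32."""
    for digits in range(1, 10):  # 9 significant digits always suffice for float32
        candidate = float(f"{f32:.{digits}g}")
        try:
            if to_float32(candidate) == f32:
                return candidate
        except OverflowError:
            # The rounded candidate exceeded the float32 range; try more digits.
            continue
    return f32


stored = to_float32(0.1)
print(widen_naive(stored))       # 0.10000000149011612
print(widen_by_decimal(stored))  # 0.1
```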



  • Add new headers in Data Sources: parser_shard_number and parser_row_number

Bug Fixes

  • Fixed a bug reading Avro and Parquet files that caused fields of type Date to be ignored

  • Minor bug fixes


Bug Fixes

  • Fixed an issue reading from empty Kafka topics that contain empty partitions

  • Fixed a bug reading Avro files that use a named type more than once

  • Minor bug fixes


Bug Fixes

  • Snowflake Merge Jobs: enforce the ON clause expression to prevent creating an array

  • Minor bug fixes


Bug Fixes

  • Minor bug fixes


  • CDC: PostgreSQL with partitioned tables - expose data.full_partition_table_name field specifying the name of the event's original partition

  • UI Performance improvements


Bug Fixes

  • CASE WHEN now handles NULL as input and returns the ELSE value

  • CDC: Fixed a bug that caused decimal type columns to be ingested as binary base64 strings


  • [BREAKING CHANGE] GET_SHARD_NUMBER function no longer requires arguments


Bug Fixes

  • Minor bug fixes


  • Validate that the first parameter in an ARRAY_JOIN is not a literal


Bug Fixes

  • Parquet Files are now distributed more evenly when ingesting data from Amazon S3 with high execution parallelism


  • Snowflake: Added query tag to queries executed by Upsolver for easier cost tracking


Bug Fixes

  • Minor bug fixes


Bug Fixes

  • Minor bug fixes

  • CDC PostgreSQL: Fixed a bug that caused the replication slot not to be deleted when deleting the Data Source

  • Athena Output: Filter out rows when the partition field value is an empty string (Partition cannot be an empty string)


Bug Fixes

  • Fixed an issue collecting field statistics and metadata for large data files with a large number of unique field names


  • Delete intermediate files after copy to Redshift


Bug Fixes

  • JDBC Data Source: Fixed a bug that would not close the JDBC connection in some situations when using fullLoadInterval

  • AvroRegistry content type: Support URL encoded authentication information

  • Snowflake: Support keeping old values on partial updates

  • Upgrade Debezium to version 2.1.3


Bug Fixes

  • JDBC Outputs: delete intermediate files after being written to the database

  • Revert Debezium to version 1.4


Bug Fixes

  • JDBC Outputs: delete intermediate files after being written to the database


Bug Fixes

  • Fixed Kafka batcher tasks getting stuck when reading with a wildcard topic and deleting all the topics in Kafka


  • Upgrade Debezium to V2.1.2

  • Add Debezium version header

  • Fixed an issue where creating a Kafka Data Source with a glob pattern that doesn't match any topics would cause no response in the API

  • Memory allocation optimizations in Lookup Table Query servers


Bug Fixes

  • Fixed memory leak on Elasticsearch outputs

  • Minor bug fixes


Bug Fixes

  • Fixed a rare issue that can cause duplicate data to be loaded into Redshift after copy failures

  • Fixed an issue where discovering a new partition / topic without any messages would cause Kafka / Kinesis Data Sources to hang until a message arrived.

  • Fixed an issue where creating a Kafka Data Source over a high number of topics would cause a CPU spike in the API


  • Use regional STS endpoints if available


  • Bug Fixes

    • Minor bug fixes


  • Bug Fixes

    • Fixed bucket region detection when using an Amazon S3 Private VPC endpoint

    • API: Fixed a bug that caused running a new output with a Lookup to fail when using a full history snapshot

    • Fixed an edge case that could cause data loss when editing stopped Athena output

  • Enhancements

    • Outputs: support window size override in non aggregated outputs


  • Bug Fixes

    • Monitoring: Fixed the 'operation_name' of aggregation steps to be the original 'operation_name' instead of "Output Aggregation". This means metrics reported via Monitoring Reports will now show aggregation step information under the correct 'operation_name'

    • Unsynchronized data sources no longer fail if they can't construct their consumers


  • Bug Fixes

    • SQL: Improved error messages and auto completion

  • Enhancements

    • Performance and memory improvements


  • Bug Fixes:

    • API: Prevent changing the end execution time for old output versions

    • API: Added validation to prevent creating Cloud Storage outputs with a date format that is not refined enough to include the Output Interval

    • Improved performance of Python UDF validations when uploading a new UDF

    • Fixed slow replay progress for Snowflake and PostgreSQL outputs


  • Enhancements

    • Added RAND function and added overload to RANDOM function that gets no arguments and returns a value between 0 and 1




    • Hive Metastore (Athena) Output: When using SELECT * with partition fields, if a field in the source is mapped to the partition field column, the field won't be written to the Parquet files because this value can't be queried

  • Enhancements

    • Kafka Data Sources: Support unsynced mode, which allows the stream to continue processing even when there are errors or a backlog from the topic

    • Add presto-compliant RANDOM() and RAND() functions

    • We now support clusters that mix both Intel-based (e.g. r6i, r6a) and ARM-based types (e.g. r6g) within the same Elastigroup

  • Bug Fixes

    • Fix deadlock between the indexing task and index entry deletion task that could end up waiting for each other when modifying an Athena output's data

    • When deploying clusters to a region, we now filter out instance types that don't exist in that region

    • Hive Metastore (Athena) Output: No longer calculates statistics for rows that were filtered out due to a missing partition field value. Previously, if a row was filtered out because the partition field value was missing or null, the row was still counted in Output Fields statistics and in Events over Time graphs.

    • Improved recovery mechanism when our configuration database is unavailable


  • Enhancements

    • Avoid out-of-order events per key in Kinesis outputs by sending each key only once within the same PutRecords request.

    • Improved performance of server boot time and memory usage for organizations that use a high number of shards.
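
The Kinesis enhancement above relies on the fact that records inside a single PutRecords request are not guaranteed to be stored in order, so two records with the same key are safer in separate requests. The sketch below shows one possible batching scheme under that assumption. It is not Upsolver's implementation; the function name and the `(key, payload)` record shape are hypothetical, and the default batch size reflects the 500-record limit of a PutRecords request.

```python
def batch_without_repeated_keys(records, max_batch_size=500):
    """Split (key, payload) pairs into batches with no repeated key per batch.

    Input order is preserved, so two records with the same key always land
    in different batches, with the earlier record in the earlier batch.
    """
    batches, current, seen = [], [], set()
    for key, payload in records:
        # Close the current batch when the key already appears in it or it is full.
        if key in seen or len(current) >= max_batch_size:
            batches.append(current)
            current, seen = [], set()
        current.append((key, payload))
        seen.add(key)
    if current:
        batches.append(current)
    return batches


events = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
for batch in batch_without_repeated_keys(events):
    print(batch)
```

Sending the resulting batches one after another, and waiting for each request to succeed before sending the next, keeps the per-key order intact.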


  • Bug Fixes

    • Fixed stack overflow in JDBC Data Source in some cases

    • API: Fixed a bug in generating the SQL statement when the SQL is not in sync with the output's definition

  • Enhancements

    • Minor performance improvements in data processing critical path

    • Improved the performance of server boot and periodic configuration load. This might improve the reliability and performance of data flow for organizations and clusters that have a lot of processing entities


  • Bug Fixes

    • Fixed a bug where Compactions would stop working when advancing the "End Execution At" property of the Hive Metastore Output after it has arrived (now > End Execution At).

    • API: Added validation to prevent creating connections with empty names

    • Minor Performance Enhancements


  • Bug Fixes

    • API: Fixed an issue that caused the SQL statement to be invalid after changing data source of an output

    • Fixed an issue when mapping numeric field to an upsert column of type string in JDBC outputs (Redshift, Snowflake, ...)

    • Fixed a rare bug where an internal metadata index would stop progressing, preventing compactions from occurring.

  • Enhancements

    • The Elasticsearch client version was upgraded from 6.x to 7.x in order to also support Elasticsearch 7 & 8 as output targets

    • Performance enhancements for clusters with a lot of tasks (more to come in the future)

    • Snowflake Output: Support writing to Transient Tables

    • Kafka Data Sources: Added an option to restart reading partition when the end offset of that partition is larger than the last offset read by the Data Source for the same partition. This should allow users to reset partitions.


  • Bug Fixes

    • Hive Metastore Output: performance improvements on calculating partition compaction trigger

    • Fixed a bug where Outputs with IS_DUPLICATE with big window sizes wouldn't be considered as completed

    • Fixed a bug where Outputs that depend on an Upsolver Output would run with a Runtime Delay based on the maximum Runtime Delay of all the versions of the Upsolver Output. The new behavior skips completed versions

    • Upsolver Query (Table output) was visible in the UI. This will now only be available via SQLake.

    • [BREAKING CHANGE] Simple S3 Data Source: changed the value of the time field to be the beginning of the minute instead of the end of the minute. This change will be applied only on new data sources

  • Enhancements

    • More informative errors when missing access to S3 resources


  • Bug Fixes:

    • API: Fixed being able to create a Kafka input with an invalid storage connection

    • API: ModifyServerFile changeset now adds the file if it does not exist

  • New Features:

    • Compression: Add ZStandard

    • Redshift Output: Support authentication with IAM

    • Redshift Output: Support Super type

    • Roles Anywhere - Hide internal access/secret keys for SoC2

  • Enhancements

    • Upgraded Kafka Client to Version 3.2.0

    • Upgraded Redshift to Version

    • Improved the reliability of the connection between User Clusters and the Configuration Database

    • Performance improvements in the Compaction Coordinator in Athena Outputs

    • Improve error messages.

    • Enlarged maximum number of shards, output shards and compaction shards in outputs to 512


Bug Fixes

  • Simple Cloud Storage Input: Improvements to file discovery


  • Athena Outputs: Enabled partition column types other than string

  • Performance improvements


Changes in this Release:

  • API: The return value of shards and related fields changed from number to struct. The struct contains executionParallelism which represents the old number. Customers using API endpoints related to data sources, lookup tables or outputs may need to update their code. Please contact our support for details.
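
For client code affected by the shards change above, a tolerant reader can accept both shapes during a migration. The sketch below is a minimal example under stated assumptions: only the `executionParallelism` field name comes from this note, while the helper name and the sample payloads are hypothetical and do not describe the full API response.

```python
def execution_parallelism(shards):
    """Return the parallelism from either the old numeric form or the new struct form."""
    if isinstance(shards, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise ValueError("shards must be a number or a struct")
    if isinstance(shards, int):
        return shards  # old API: a plain number
    if isinstance(shards, dict) and "executionParallelism" in shards:
        return int(shards["executionParallelism"])  # new API: a struct
    raise ValueError(f"Unrecognized shards value: {shards!r}")


print(execution_parallelism(4))                            # old shape
print(execution_parallelism({"executionParallelism": 4}))  # new shape
```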

Bug Fixes

  • SQL: Fixed the code generation of the decimal type

  • Compute Cluster: Fixed a bug that, in rare cases, would cause the Compute Cluster not to update against the configuration.

  • Monitoring: Fixed a bug that would not expose reporting tags to external monitors.

  • API: Fixed a bug that prevented the rolling of clusters.

  • API: Fixed a race condition that prevented multiple concurrent requests from working and returned Forbidden (403).

  • Snowflake Output: Fixed a bug when writing values to DATE columns.

  • CDC: Fixed a bug that failed to write data larger than 2GB.


  • Functions: Added the PRESTO_ZIP function

  • Python: Allow using URLLIB

  • CDC: Improved the binlog delay monitor

  • AWS VPC integration: Validated subnet IDs in Existing AWS VPC integration

  • Athena Output: Non-string partition columns now supported


Bug Fixes

  • Show scaling policy in the Cluster page.

  • Wurfl User Agent: fixed a bug that appeared when there was more than one wurfl file in the organization.

  • Fixed a bug that caused the metrics to stop being reported to external monitoring systems (Datadog / Influx).

  • Deprecated SPLIT, CONCAT and DATE_DIFF functions and introduced new functions:





  • Added function LN.

  • DATE_DIFF function now supports dynamic units.

  • LIKE operation now supports getting another field as a pattern.


Recently Implemented Changes (Currently Enabled)

As part of Upsolver's effort to adopt industry standards, we are gradually changing functions to be more Presto compatible. The functions that changed are CONCAT, SPLIT and DATE_DIFF.

CONCAT, SPLIT and DATE_DIFF are being deprecated. Henceforth, SQLs that attempt to use CONCAT, SPLIT and DATE_DIFF will include a warning message when executed. This behavior is designed to draw attention to the changes. Currently running outputs are NOT affected by these changes.

The change log summary:

Important: All information in this table, including planned versions and dates, is subject to change; the information is provided only as a guideline for updates you may make in the future.


Enabled by default in February, 2022

Functional Area

SQL Changes - Commands & Functions



  • Bug Fixes

    • MySQL Output: Fixed bug with boolean fields that were not written as expected.

    • Redshift Output: Fixed race condition in upsert tables that could cause rows not to get deleted in rare cases.

    • SQL:

      • Improved SQL editor responsiveness.

      • Fixed a bug in SQL parsing.

      • Fixed an exception arising when using infix operations.

      • Fixed join/match expressions not working correctly with >3 terms.

    • API:

      • Fixed an issue with distinct data sources that had the same name.

      • Prevented "SPLIT TABLE ON" on non-Athena Outputs.

      • Fixed name suggestion in hierarchical Athena outputs.

  • Enhancements

    • Azure Event Hubs: Support more features.

    • Streaming Output: Support setting an upsert key.

    • ContentTypes:

      • Support null values in TSV.

      • Support fixed width content type.

    • Oracle Object Storage: Various enhancements.

    • SQL: Support for WHERE filter in sub-select expressions.

    • S3 Data Source: Don't require AWS integration when creating S3 data source.

    • S3 Output: Support bucket-level access control.

    • UI: Added various annotations cluster graphs in the monitoring tab.


  • Enhancements

    • CSV Content Format: allows repeating header names in files.

    • Function changes: the CONCAT function was changed to ARRAY_JOIN.

      • ARRAY_JOIN - gets an array of strings and a delimiter and concatenates them.

      • CONCAT - now gets multiple arguments and concatenates them (like || in SQL).
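
The difference between the two functions can be sketched with a Python analogy. These helpers only mirror the behavior described in the bullets above; they are not Upsolver code, and details such as null handling are not modeled.

```python
def array_join(values, delimiter):
    """Analogy for ARRAY_JOIN: one array of strings plus a delimiter."""
    return delimiter.join(values)


def concat(*parts):
    """Analogy for the new CONCAT: any number of arguments, no delimiter (like || in SQL)."""
    return "".join(parts)


print(array_join(["a", "b", "c"], ","))  # a,b,c
print(concat("a", "b", "c"))             # abc
```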

  • Bug Fixes

    • Athena Output: fixed a performance issue when deleting files due to retention.

    • Clusters: Show "Additional Processing Units for Replay" only in Compute Clusters.

    • Redshift Spectrum: fixed boolean casting when running output with SELECT *

    • API: Show thrown errors from Hive Metastore.

    • SQL: Fixed a bug when joining with a sub-query.


  • Enhancements

    • Support dynamic position in ELEMENT_AT function.

    • Allow updating the boot script in Clusters.

    • Support fixed schema in S3 outputs with Avro format.

  • Bug Fixes

    • Fixed a bug when reading from multiple topics in Kafka Data Source.

    • API - Fixed column name suggester when mapping new fields in Athena Output.


  • Bug Fixes

    • API

      • Fixed a bug with Azure Integration not working in some regions

      • Fixed validation when updating Columns Retention in Hive Metastore outputs

      • Data Source Page: don't show statistics from the preview when querying on a time range without data

      • Show output's fields on outputs with SELECT *

    • SQL

      • Prevent SQL regeneration when updating duplicate handling (APPEND ON DUPLICATE or REPLACE ON DUPLICATE)

      • Added some validation errors when trying to create invalid state

    • Backend

      • Fixed a bug that caused duplicated rows when editing Hive Metastore output with upserts


  • Enhancements

    • Monitoring Reporters: Support Graphite

    • Hive Metastore Output: support splitting the output by schemas/databases in addition to splitting by table names. For example, if the value of the multi table field is "", the "foo" will be the schema/database name, and "bar" will be the table name

  • Bug Fixes

    • S3 Data Sources Advanced: Fixed a bug with Glob File Name pattern

    • Hive Metastore Output: save storage by deleting manifest files after their usage


  • Enhancements

    • Athena output: create Views with Glue API

  • Bug Fixes

    • Don't show completed dependencies in Lineage tab

    • Select * in Hive Metastore Output

      • Return the defined fields first

      • Removed the multi table column from the view definitions

    • Hive Metastore Output: fixed a bug when editing output with upserts

    • API: Allow changing the cluster size on Trial plans


  • Enhancements

    • Added a new modal and new SQL syntax for Table Name Suffix Field, which allow you to create multiple tables in Hive Metastore with a single output.

    • CDC Data source (MySQL) - added Destination part that allows replicating the source database to your data lake

    • Qubole Metastore: allow changing the time partition column type to String

  • Bug Fixes

    • Fixed health check parameters in Query clusters

    • Don't show deleting data sources in the main page

    • Hive Metastore output: added a cache layer in the Partition Manager that prevents redundant calls to the Metastore

    • API: Limit the number of running previews. This should fix high CPU usage of the API when many previews are running at the same time.


  • Enhancements

    • Support Select * in Redshift Spectrum

    • API: Support Select * and Upserts on Preview

    • Lookup Table: when running Output with a lookup to a Lookup Table, don't calculate the start/end times of the Lookup Table implicitly but use the original times.

  • Bug Fixes

    • SAML: Don't regenerate group when changing display name in Upsolver

    • Athena Output: fixed bug in Columns Retention

    • API: Fixed a bug that caused deleted inputs to not work

    • Snowflake Output: fixed columns casing

    • Removed "errors" outputs from outputs with Parquet format (Athena/S3)


  • Enhancements

    • CDC ingestion is more stable when scaling cluster

    • Previewing an output now takes its upsert definition into account

    • Compactions are now prioritized by urgency and age in order to prevent starvation

    • Support epoch time date pattern with prefixes in Cloud Storage Data Sources
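
The compaction change above (prioritizing by urgency and age to prevent starvation) follows a general scheduling idea often called aging: a task's score grows the longer it waits, so low-urgency work is eventually picked. The sketch below illustrates that idea only. The scoring formula, the `age_weight` parameter, and the task tuples are assumptions made for the example and do not describe Upsolver's scheduler.

```python
import heapq


def pick_order(tasks, age_weight=1.0):
    """Order tasks by descending score, where score = urgency + age_weight * age.

    `tasks` is a list of (name, urgency, age_minutes) tuples.
    """
    # heapq is a min-heap, so scores are negated to pop the highest score first.
    heap = [(-(urgency + age_weight * age), name) for name, urgency, age in tasks]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# A stale low-urgency task outranks a fresh high-urgency one once it has waited long enough.
pending = [("hot", 10, 0), ("stale", 1, 30), ("fresh", 1, 0)]
print(pick_order(pending))
```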

  • Bug Fixes

    • Fixed database name validation in Microsoft SQL Server Connection


  • Enhancements

    • HiveMetastoreClient: Better SET LOCATION method


  • Enhancements

    • Elasticsearch Output: Support Upsert Keys

    • CDC: Support Column Exclude List

    • Added SHA512 and SHA3_512 functions
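
For readers comparing values produced by the new hash functions with hashes computed elsewhere, the sketch below computes both digests with Python's standard library. It assumes UTF-8 input and hex-encoded output; the encoding that Upsolver's SHA512 and SHA3_512 return is not stated in these notes, and the helper names are hypothetical.

```python
import hashlib


def sha512_hex(text):
    """SHA-512 (SHA-2 family) digest of a UTF-8 string, hex encoded."""
    return hashlib.sha512(text.encode("utf-8")).hexdigest()


def sha3_512_hex(text):
    """SHA3-512 (SHA-3 family) digest of a UTF-8 string, hex encoded."""
    return hashlib.sha3_512(text.encode("utf-8")).hexdigest()


# Both digests are 512 bits (128 hex characters) but come from different algorithms.
print(sha512_hex("upsolver"))
print(sha3_512_hex("upsolver"))
```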

  • Bug Fixes

    • S3 Connection with SQS now works with paths that end with a slash


  • Enhancements

    • Added FROM_UNIXTIME function

    • Qubole Output: added an option to support changing column types

    • Hive Metastore Outputs: trigger more than one compaction if there is a backlog

    • Upsolver Output: support new field type: JSON. This type will be extracted when using as an Upsolver Data Source

    • CSV Content Format: support custom quote escape char

    • When duplicating output, copy the workspaces from the previous output
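
As a point of reference for the FROM_UNIXTIME function listed above, the sketch below shows the conventional meaning of such a conversion: seconds since 1970-01-01 UTC turned into a timestamp. Whether Upsolver's function expects seconds or milliseconds, and which time zone it applies, is an assumption here; the helper name is hypothetical.

```python
from datetime import datetime, timezone


def from_unixtime(seconds):
    """Convert seconds since the Unix epoch into a UTC timestamp."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(from_unixtime(0).isoformat())      # 1970-01-01T00:00:00+00:00
print(from_unixtime(86400).isoformat())  # 1970-01-02T00:00:00+00:00
```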

  • Bug Fixes

    • Fixed memory leak in External Hive Metastore outputs


  • Enhancements

    • Added External Hive Metastore to the output types list

    • Support SELECT * on External Hive Metastore when querying with PrestoDB and SparkSQL

    • Reference Data can now be deleted after output is not using it (i.e. output deleted or output completed and was edited)

    • Reference Data can't be created with the same name as another Reference data or Lookup table


  • Enhancements

    • Kafka Output - Allow ignoring messages that are too large (According to broker settings and producer settings)

    • Streaming Data Sources (Kafka, Kinesis, EventHubs) - Allow deleting offsets metadata files

    • API - Performance enhancements when updating Outputs / Lookup Tables

  • Bug Fixes

    • Hive Metastore: Fixed bug with SELECT *


  • Features

    • Support MAX/MIN aggregations on more data types

    • Support <,<=,>,>= on timestamps


  • Features

    • Support SELECT * in Hive Metastore Outputs, this will update the table definition every time a new field arrives

    • Oracle Object Storage Support

  • Bug Fixes

    • Aggregation calculated fields now work in SQL mode


  • Features

    • CDC (Change Data Capture) Data Sources

    • Dremio and PrestoDB Outputs

    • Stop/Start Data Sources

  • Enhancements

    • Allow setting Lazy Load on Lookup Tables using the Properties tab

    • Update base AMI image in AWS to Amazon Linux 2

  • Bug Fixes

    • Data Lake Output: Filter out partitions that were deleted due to retention compaction


  • Features

    • Hive Metastore: Allow creating an Output to External Hive Metastore

  • Enhancements

    • Lower latencies between dependencies in Compute Cluster


  • Features

    • Ahana Output

    • Starburst Output

  • Enhancements

    • Redshift: Allow inserting 'now' into date / time fields in order to set a column to the insertion time

  • Bug Fixes

    • Kinesis Stream Autocomplete: Filter out Upsolver internal streams

    • Fixed a bug in S3 IAM policy generation when the path ends with a slash

    • Avro Schema Registry: Don't treat HTTP errors as parse errors

    • SQL Parser: Don't regenerate the SQL when there is an expression that returns boolean with extra parentheses


  • Support Real Time Kafka Output - Support running Kafka Outputs on the Real Time cluster with ms latency

  • Hive Metastore Output with Upserts - fixed a bug that caused the compaction process to get stuck after edit

  • Hive Metastore Output with Upserts - support number as an upsert key

  • Lookup Tables: fixed a bug when using sharded lookup tables in outputs

  • API: show the current capacity when clicking Update Capacity button on Clusters page

  • API: fixed wrong validation on Kafka Outputs (support numbers on topic names)

  • Microsoft SQL Server Output: fixed create statement when primary key is empty

  • API: fixed a bug when removing mapping of fields


  • S3 Data Source with Parquet Content Format - split files by 200MB

  • Lookup Table - support compaction shards on lookup tables with multiple windows

  • SQL - fixed a bug generating the SQL when "Is Delete Field" is mapped to a column


  • Monitoring: Added three metrics to Hive Metastore Outputs

    • partitions-delay - The delay between now and the last partition time

    • data-loading-delay - The delay on loading data to the metastore

    • partitions-count - Number of partitions in the table

  • IS_DUPLICATE and Lookup from Data Sources: Don't omit key columns for new versions

  • Avro: Fixed escaping of [] in array namespaces

    • Fixes a bug in Snowflake Output with VARIANT column output with arrays


  • Azure: Support billing SaaS offering

  • DNS: Ability to sync Route53 records with private IP addresses for customers with own Spotinst Account

  • SSO/bugfix: attach endpoints don't have permissions

  • Partners: Support exporting logs and monitoring to external domain

  • Free Plan: Support upgrading account


  • Snowflake Output: Configurable DbDecimal

  • CSV Content Type: Don't ignore values starting with #

  • SQL: Support unmapped columns in JDBC outputs. New mapped columns will be created when deploying the output

  • Infra: performance improvements

  • Lookup Table: fixed a bug when using Delete column

  • Signup: Create sample data source on register

  • SQL: Fixed a bug with autocomplete Lookup Table names

  • SQL: Support Lookup time

  • Athena Output: Fixed a bug with editing Athena Output when Upsert Partition Fields is true



  • JDBC Data Sources: Fixed an issue that could cause it to get stuck and not read any data

  • JDBC Connections: Fixed an issue that would allow connections to be created with a concurrency of 0

  • Monitoring: Include the actual time an index is ready to be read from in the monitoring delay charts

  • Allow using anonymous credentials to access data in public S3 buckets

  • AppFlow: Autocomplete buckets and flow names during setup

  • Functions:

    • Added a Subtract Time Zone Feature to complement Add Time Zone

  • UI:

    • Show SQL Errors when deploying Outputs

    • Show indicative error message when Reference Data file couldn't be found


  • Deployment: Allow deploy Upsolver servers to Azure

  • Add support for Azure Event Hubs data source

  • Athena: Create Glue database if it doesn't exist

  • Functions: Fixed a bug in TO_DATE function

  • Function: Added new function: RECORD_TO_JSON

  • Query Cluster: Improvements in the underlying files cache

  • SQL: Show validation error when mapping an array to unrelated path

  • SQL: Show validation error when mapping null without specifying type

  • API: When creating data source, fixed a bug when previewing large file with tar compression

  • API: Fixed high CPU on boot


  • Kafka data source: support reading custom kafka headers

  • Metastore Output: support running Athena/Qubole output without partitioning by time

  • Snowflake Output: support Azure storage as the intermediate storage

  • Compute Cluster Infra: optimize threads when running low priority tasks

  • ETL: Improved target path inference for some scenarios

  • Monitoring Task: fixed failure when one of the monitoring reporters is not available

  • SQL: Fixed validation of inline functions in aggregations

  • Metastore Output: set the table location to the root path of the output

  • Qubole: allow defining if TIMESTAMP fields will be created as TIMESTAMP or BIGINT columns in the table per output

  • Qubole: Added feature flag to deprecate the "SET hive.on.master=?" statement

  • Elasticsearch Output: Fixed a bug that could cause high memory usage


  • Add Amazon AppFlow support

  • Zip Function - Added optional field names

  • API - Fixed validation message for Kafka input

  • Elasticsearch - upgraded client version


  • S3 Data source with Parquet Content Format - when the file is not a parquet file, handle it as a parse error

  • Added Free plan

  • SQL - Fixed a duplication issue when function target name and select target name are the same

  • Hive Metastore Output with Upsert keys - Trigger compactions in a better way to avoid compacting in a loop

  • SQL - Fixed target path inference of key columns with inline functions on aggregated outputs

  • API - Allow setting a higher number of shards in the output than the execution parallelism in the data source. This parallelizes the data by the data source files

  • Support "SELECT * " in cloud storage outputs with parquet content format

  • API - Fixed a bug that allowed creating more than one draft in the same output


  • Show number of sparse fields inside fields tree in inputs and outputs and allow to toggle the filter


  • Jdbc data source: use field types from the table definition

  • PostgreSQL output: support timestamptz data type

  • UI: New modal when adding multiple fields in tabular outputs to prevent cartesian product between unrelated arrays

  • No need to specify a target field for filters when creating a filter from the UI

  • Some bug fixes in API

2020/09/30 - SNAPSHOT

  • Query Agent - Support round robin

  • "No Local API" page - Show "Connection Established" instead of error when able to connect

  • Input creation preview - Filter big JSONs and let the user know about it


  • Performance improvements in internal cache mechanism

  • Performance improvements in Hive Metastore outputs

  • Fixed a bug that caused Hive Metastore outputs with upserts to get stuck after editing a new version

  • Avro w/ Schema Registry Content Format: Support Tagged Avro Schema Registry

  • Improved target path calculation of inline functions

  • Added validation when deploying a draft that the start time is not after the end time of the previous version

  • SQL: Disable automatic column name generation

  • Support cancelling pending integration


  • No Local API Page: Fixed showing "You can't connect" instead of "local DNS resolve" error

  • CloudFormation: link to the right region in deploy stack

  • Fewer API calls to Cloud Storage when checking completion of tasks

  • Calculated Function TO_DATE: Changed threshold to not return negative dates

  • Fixed a bug that prevented PostgreSQL outputs from altering column types


  • Support Workspaces in Clusters

  • Catch all errors from GCP / Azure and show in UI

  • Hive Metastore Outputs: the column names year, month, day, and hour are now reserved


  • Big performance improvements for replay in Kinesis & Kafka Data Sources

  • Big performance improvements for replay in Hive Metastore Outputs


  • Compute Cluster: IO Tasks will now run only on Master cluster and will never run on Replay Cluster

  • Compute Cluster: Option to limit number of Elastic IPs allocated for the cluster

  • Added XX_HASH and SORT_BY calculated functions

  • UI: Support literal inputs in aggregations


  • Performance improvements to Hive Metastore Outputs

  • Fixed a bug where very large Parquet file outputs could crash servers with OOM errors

  • Preview Output will now stop after 15 seconds instead of making the API server hang

  • Support Redshift and PostgreSQL in JDBC Data Source

  • UI: Output - New Partitions Modal


  • SQL now supports target site inference. This fixes many confusing bugs when using arrays with calculated functions

  • SQL: Fixed bug with throwing 500 errors on missing properties of calculated functions

  • Athena Output: new outputs will not nest compaction files for better compatibility support with external systems


  • Fixed bug when previewing completed Output with Lookups

  • Update Retention validation message is now dismissible

  • Regex and Split Content Formats have been added for better compatibility with custom data formats


  • MS SQL Server Output

  • Elasticsearch Output: Removed index_type argument, using _doc / doc by default

  • UI: overhauled the properties pages

  • UI: Split field statistics by Data Source in Output page


  • JSON_TO_RECORD calculated function: Allow whitespace in CSV mapping definition and improve exception handling

  • Athena Output: Faster replays when run compactions is set to false

  • Fewer red notification errors caused by internal errors

  • Aggregated Outputs now delete the intermediate aggregations immediately after outputting the data (instead of waiting for the retention period, if defined)


  • MySQL Output: Fixed bug with quote followed by delimiter char inside the data to output

  • Create Calculated Function: Fixed a bug with the default output path calculation

  • JDBC Data Source now supports creating new tables instead of only inserting data to existing tables

  • JDBC Connections: indicative validation error messages on creation


  • PostgreSQL Output

  • Writing logs to Customer Bucket now supports writing to specific path in the customer's bucket

  • SQL: Show indicative error when trying to filter subquery

  • MySQL Output: Fixed writing of date/time fields

  • UI: Refined the time range picker

  • New boolean operators and calculated functions: AND, OR, NOT, and IS DISTINCT FROM now work as they do in SQL

  • UI: Calculated Functions Gallery now matches the SQL syntax
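
For readers less familiar with SQL's boolean semantics, the following is a minimal Python sketch of how AND, OR, NOT, and IS DISTINCT FROM behave in standard SQL, with `None` standing in for NULL. The function names are illustrative only and are not part of Upsolver; the sketch assumes Upsolver follows the standard three-valued logic, as the entry above suggests.

```python
from typing import Optional


def sql_and(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # FALSE dominates: FALSE AND NULL is FALSE.
    if a is False or b is False:
        return False
    # Otherwise any NULL operand makes the result NULL.
    if a is None or b is None:
        return None
    return True


def sql_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    # TRUE dominates: TRUE OR NULL is TRUE.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def sql_not(a: Optional[bool]) -> Optional[bool]:
    # NOT NULL stays NULL.
    return None if a is None else not a


def is_distinct_from(a, b) -> bool:
    # Unlike "<>", this never yields NULL: two NULLs are not distinct,
    # and a NULL compared with a non-NULL value is distinct.
    if a is None or b is None:
        return (a is None) != (b is None)
    return a != b
```

The practical difference is that a plain inequality against NULL yields NULL (and so filters the row out), whereas IS DISTINCT FROM always yields a definite true or false.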


  • Redshift: Support configuring

  • Added TO_DATE calculated function (converts strings to dates without having to insert format)

  • Added APPROX_COUNT_DISTINCT_EACH aggregation

  • IAM Role Credentials: Assume role via the Server Role created in the AWS Integration

  • Booting a Cluster after stopping it for a while is faster

  • SQL: infer the null type instead of asking the user to explicitly specify the type of the null (null:string)

  • SQS: Allow configuring KMS key

  • UI: Fixes to "Add Lookup to Data Source" page

  • S3: Show the right action on access error


  • UI: Charts now show a shared crosshair between graphs

  • "Update Shards" error message is now more informative

  • Added deployment support to more AWS regions

  • Fixed rare case where AWS Redshift Output would duplicate data

  • Fixed a bug where multiple rows with the same Upsert Key would be inserted in the same output interval in Snowflake and Redshift Upsert Outputs


  • Git Integration: Don't cancel git integration after one failure to push changes

  • UI now allows operating aggregated outputs without key columns (Aggregate all data within the output interval)

  • UI: Refinements in the Fields Tree

  • Snowflake Output: Better replay performance with sparse Data Sources


  • Fixed a bug where the REPLACE calculated function could throw errors in some cases


  • Added RPAD, LPAD, STRPOS, DATE_ADD, and DATE_DIFF calculated functions

  • Private API now uses r5 instead of r4 instances in AWS by default

  • SQL: Better error messages for inline features


  • The "Archive" operation has been removed from the system. Deleted items can be seen using the "Trash" button in the list view

  • JDBC Data Source: Support Start Time

  • Multiple Bug Fixes in Snowflake Output

  • Added DATE_TRUNC calculated function

  • Fixed bug with copying big files in S3

  • UI Performance enhancements


  • Update Configuration of Upsert Outputs using the UI

  • Allow writing logs from Upsolver to a customer-requested location as well as to Upsolver

  • Dramatically reduced the number of API calls to Cloud Storage


  • Performance improvements and bug fixes


  • Data Sources:

    • JDBC: Added support for connecting to an Oracle DB

    • Bug fix for event type statistics breakdown in local APIs

    • Performance and cost improvements


  • Revised output preview screen

  • Minor bug fixes and improvements


  • Data Sources:

    • S3 Over SQS: Allow creating Data Sources from multiple connections with the same prefix

  • Outputs:

    • Added output to Snowflake

    • Monitoring improvements


  • Data Sources:

    • Added properties to Upsolver data source.

    • Kafka: Added support for custom consumer/producer properties.

  • Outputs:

    • UI improvements in sources fields tree

    • Kafka: Added support for custom consumer/producer properties.

  • Monitoring Reports:

    • Added Splunk export support


  • UI updates and performance improvements


  • Data Sources:

    • Split meta-data by Event Type field - you are now able to split and view your data source by the desired field in your data source.

  • Outputs:

    • SELECT * is supported for Upsolver and Elasticsearch outputs.

    • Added Amazon Kinesis connector.

    • Qubole connector now supports using an HTTPs proxy address to override the endpoint used to access Qubole.

  • IAM:

    • Added support for SAML with provisioning capabilities.


  • Clusters:

    • Compute cluster monitoring: Compute Units Graph was updated and now provides a breakdown of the compute units used by each task (Data Source/Output/Lookup Table).


  • Outputs:

    • Elasticsearch: Editing the connection string is now supported - as long as the new nodes belong to the same cluster.

    • Elasticsearch: Added support for setting the event to _doc.


  • Transform with SQL:

    • Added support for partitioning configuration.

    • Casting improvements.

  • Outputs:

    • Redshift: Added support for configuring fail on write error. If enabled, any error while copying data to Redshift causes the entire bulk to be skipped. The skipped manifest is saved aside for manual re-processing once the copy error has been fixed. If disabled, the same behavior occurs after 100K errors (the maximum allowed by Redshift).

  • Monitoring Reporting:

    • Fixed a bug that, in rare cases, caused a falsely reported delay.


  • Data Sources:

    • JDBC Data Source - added support for PostgreSQL.


  • Outputs:

    • Added UUID Generator Calculated Function

  • Transform with SQL:

    • Added support for SQL comments using -- (see example below)

    • Improved error messages

SELECT your_Select_clause -- your comment
FROM your_table -- another comment


  • Data Sources:

    • Parquet reader: support INT96 timestamps and non-canonical field names

    • Added support for LZO decompression

    • Added a JDBC connector

  • Outputs:

    • Support correcting a specific time frame in an output

    • Added UpdateSql programmatic API operation for creating outputs


  • IAM:

    • Multi-organization support

  • Outputs:

    • Support lazy load of lookup tables

    • Support querying lookup table in SQL

    • Support sharding of aggregated outputs

  • Data Sources:

    • Support S3 data source initial load configuration

    • Support non-lexicographic date patterns in S3

    • UI & performance improvements


  • Data Sources:

    • Support XML as content type


  • Performance improvements and bug fixes



  • Outputs:

    • Elasticsearch - Add option not to delete indices from Elasticsearch based on retention

  • Transform with SQL:

    • Support data source features

  • UI:

    • Outputs - Add support for filtering the Preview when in SQL mode

    • Performance improvements


  • Data Sources:

    • Support changing the number of shards using increments of one (instead of multiples of two)

  • Outputs:

    • Athena - add support for excluding partitions from the table

  • Transform with SQL:

    • Support default field names instead of col_x

    • Generate SQL for running Outputs

    • Refer to fields by index in the GROUP BY statement
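
To illustrate what referring to fields by index in GROUP BY means, here is a small sketch that uses Python's built-in SQLite as a stand-in engine. The table and column names are invented for the example, and SQLite is only an assumption-free way to show the general SQL behavior; Upsolver's own SQL dialect may differ in details.

```python
import sqlite3

# In-memory database with a tiny, made-up events table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (country TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("US", 10), ("US", 5), ("DE", 7)],
)

# Grouping by the column name.
by_name = conn.execute(
    "SELECT country, SUM(amount) FROM events GROUP BY country ORDER BY country"
).fetchall()

# Grouping by position: 1 refers to the first item in the SELECT list.
by_index = conn.execute(
    "SELECT country, SUM(amount) FROM events GROUP BY 1 ORDER BY 1"
).fetchall()

conn.close()
```

Both queries return the same rows; the positional form is mainly convenient when the grouped expression is long, such as an inline function call.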


  • UI improvements and bug fixes


  • Outputs:

    • Add support for Redshift Spectrum

    • Update table schema in Qubole is now optional (the default behavior would be to update)


  • Outputs:

    • Allow switching between raw and aggregated modes

    • Added QUERY_STRING_TO_RECORD calculated function for query string extractions

  • Transform with SQL:

    • Unify SQL code blocks into a single block
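
As a rough illustration of what query string extraction involves, the sketch below parses a URL query string into a record-like dictionary using Python's standard library. This is not the implementation of QUERY_STRING_TO_RECORD, and the helper name, the list-valued output shape, and the handling of repeated keys are all assumptions made for the example.

```python
from urllib.parse import parse_qsl


def query_string_to_record(query_string: str) -> dict:
    """Hypothetical helper: map each key to the list of its decoded values."""
    record: dict = {}
    # Drop a leading "?" if present; keep keys that have empty values.
    pairs = parse_qsl(query_string.lstrip("?"), keep_blank_values=True)
    for key, value in pairs:
        record.setdefault(key, []).append(value)
    return record


example = query_string_to_record("?utm_source=mail&id=7&id=9&q=hello%20world")
```

Values are kept as lists here because a key may legitimately repeat in a query string; a real extraction function could equally choose to keep only the first or last value.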


  • Athena Upserts: Update and delete existing data in your Data Lake

  • Transform with SQL:

    • Support having statement in Aggregated Outputs

    • Support DECIMAL types

    • Support Athena Upserts

  • S3 Output: JSON files will end with one "\n" instead of two "\n" (as stated in


  • When deploying an output, "Now" is resolved when submitting the form

  • Connections and Clusters can be attached to Workspaces

  • IAM: Lists of Data Sources, Outputs, Lookup Tables, Connections and Clusters are filtered by the user "list" permission


  • UI improvements

    • Fixed bug on lookup to COLLECT_SET_EACH column

  • Stability improvements


  • Allow changing default organization connection

  • Added decimal support to Athena Outputs

  • Allow turning off/on compactions in Athena Outputs

  • Better support for Data Sources with large amounts of fields

  • Notebook (Beta)





  • Various Performance Improvements in UI

  • Added ZIP Calculated Function to zip multiple arrays together

  • MySQL Output: Row is replaced if duplicate key is found

  • Notebook (Beta)

    • like / not like syntax (e.g. "name" like 'a__%')

    • not in syntax (e.g. "status" not in ("failed", "canceled"))

    • = as equality operator syntax (e.g. "status" = 'ok' instead of "status" == 'ok')

  • Better error messages
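
The like pattern in the Notebook entry above (`a__%`) uses the usual SQL wildcards: `_` matches exactly one character and `%` matches any run of characters, including none. The following Python sketch shows that matching rule; `sql_like` is a hypothetical helper written for this example, and it assumes case-sensitive matching with no escape character, which may not match the Notebook's exact behavior.

```python
import re


def sql_like(value: str, pattern: str) -> bool:
    """Hypothetical helper: evaluate a SQL-style LIKE pattern against a string."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any sequence of characters, possibly empty
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    # LIKE must match the whole value, not just a prefix.
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


def sql_not_like(value: str, pattern: str) -> bool:
    return not sql_like(value, pattern)
```

With the pattern `a__%`, a value must start with `a` and be at least three characters long, so `abc` and `abcdef` match while `ab` and `xbc` do not.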


  • Lookup Tables / API Playground:

    • Support querying multiple rows

    • Auto complete for keys

    • Querying on specific time range

  • Notebook (Beta): a better way to create enrichments


  • Calculated Functions: Added numeric in feature (e.g. "data.a":number in (1,2,3))

  • Parse Avro data using Confluent Schema Registry


  • Various Performance Improvements in UI

  • Show connection errors when creating/editing MySQL/Redshift Output

  • Fixed intermittent recoverable errors in tasks

  • Fixed delay when using the same connection for multiple Redshift/Elasticsearch Outputs


  • Experimental: updating / deleting rows in output to Athena. You can try it out by using the “Upsert Key” and “Is Delete Field” special fields


  • Ingestion - Added “index” header to all messages (useful when ingesting multiple events in one message)

  • Hive Metastore Outputs now drop duplicate logical partitions

  • API - list Output / Materialized Views returns faster

  • GDPR - Materialized Views now support deleting rows

  • Physical Deletion runs much faster with fewer operations on the underlying Cloud Storage

  • Retention is now set on Materialized Views created by DEDUP features


  • Data Source - Simplified creation of Kafka, Kinesis and AWS S3 Data Sources


  • Replay Cluster - Fixed some cases where the replay cluster might not shut down


  • Qubole Client - set hive.on.master and use database for all queries

  • Performance improvements for retention

  • Elasticsearch Output - Better retry mechanism


  • Athena - Switch to using Glue API for all DDL statements

  • Monitoring Tab - fix bug that would display some rows twice

  • Outputs page - Correct the range of some of the graphs

  • Add timeout to copy/read S3 requests to prevent processing delays

  • Data Source - show a preview of data immediately upon creation

  • Improve UI performance related to connections page


  • Dry run environment support

  • Monitoring - added written items and written bytes

  • Monitoring - added original-task-name tag to all metrics

  • Qubole - set hive.on.master=false

  • Permissions - added policy editor

  • Athena - reduce spam of Athena history

  • Athena - drop table when deleting an output if the option is selected

  • Kafka - support changing the number of shards in the UI

  • Some performance improvements

  • UI - Added multi-unmap fields (for Avishai)


  • Increase Kafka consumer version to 2.1.1

  • Monitor delay in managing partitions

  • Bug fix - add connection timeout to Elasticsearch connections

  • Remove dependency on Upsolver DynamoDB for servers starting up



  • Data Sources / Materialized Views / Outputs: Toggle between card view and table view


  • Translate Calculated Function: Show CSV Editor for the dictionary field

  • Cluster Details Page: show the elastic IPs of the Cluster

  • Outputs: Qubole Output

  • Outputs: Usability Improvements in Creation/Deploy flow

  • Upsolver Language: "data.str":string in ('a','b','c') syntax

  • Upsolver Language: supports coalesce operator

"data.str":string? # COALESCE("data.str":string, '')

"data.str":string?'default-value' # COALESCE("data.str":string, 'default-value')

"data.bool":boolean? # COALESCE("data.bool":boolean, false)

"data.bool":boolean?true # COALESCE("data.bool":boolean, true)

"data.number":number? # COALESCE("data.number":number, 0)

"data.number":number?2.5 # COALESCE("data.number":number, 2.5)
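
The equivalences listed above can be summarized as: a trailing `?` falls back to a per-type default (empty string, false, or 0), and `?value` falls back to the given value. The Python sketch below models that rule only; `coalesce_field` and the `TYPE_DEFAULTS` table are hypothetical names for this illustration and are not part of the Upsolver language.

```python
# Per-type fallbacks, taken from the COALESCE equivalences listed above.
TYPE_DEFAULTS = {"string": "", "boolean": False, "number": 0}


def coalesce_field(value, type_name: str, default=None):
    """Hypothetical model of the "field":type?default shorthand."""
    # With no explicit default, use the fallback for the declared type.
    fallback = TYPE_DEFAULTS[type_name] if default is None else default
    # Only a missing (None) value is replaced; False, 0 and "" are kept.
    return fallback if value is None else value
```

For example, `coalesce_field(None, "number", 2.5)` models `"data.number":number?2.5`, while `coalesce_field(None, "string")` models `"data.str":string?`.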


  • Output / Materialized Views: Added ability to edit the Data Sources from the properties tab (Only if the object isn't deployed yet)


  • Aggregated Output: Added option to add calculated fields over aggregations


  • Compute Cluster: Allow spinning up a "Replay" Cluster when needed

  • Outputs: Edit S3 and Upsolver Outputs

  • Filters: Improved UX (Whitelist and Blacklist Filters)

  • Materialized Views: Time Series Aggregations are shown as graphs in the Data Sample tab


  • Materialized Views: Added an API to iterate the MVs

  • Added Time Zone Offset Function

  • Outputs: Added automatic time field to Athena and Upsolver outputs

  • Calculated Fields: Support editing of calculated fields inputs and parameters

  • Users can now create readonly S3 Connections

  • Athena Output now supports setting of event time which is used for partitioning

  • Elasticsearch Output now supports retention

  • Various performance improvements to UI

  • Support filtering on time range in Data Source inspection page

  • Support for editing lookup enrichments

  • Monitoring now shows Materialized Views that are used in Lookup enrichments

  • Improvements to Auto Scaling

  • Support non string Key Columns in Materialized Views

  • Aggregated output doesn't change the type of the Key Columns to string anymore
