Change log

Change log for Upsolver Classic (app.upsolver.com)

Release notes for sqlake.upsolver.com may be found here.

2024.05.19-13.23

Enhancements

  • Help widget adjustments (size and location have changed to prevent hiding application data)

Bug Fixes

  • Fixed snapshotting of tables sometimes getting stuck in CDC

  • Fixed a bug causing Amazon S3 outputs to be stuck on retention deletion when the output has multiple versions

  • Minor bug fixes

2024.05.12-10.26

Enhancements

  • Upgraded libraries to include recent security patches to enhance system security and stability

Bug Fixes

  • Minor bug fixes

2024.05.09-08.33

Enhancements

  • Upgraded libraries to include recent security patches to enhance system security and stability

Bug Fixes

  • Minor bug fixes

2024.05.02-15.37

Enhancements

  • Upgraded libraries to include security patches

Bug Fixes

  • Minor bug fixes

2024.04.25-12.36

Bug Fixes

  • Run connection validation tasks on servers with Elastic IPs when Elastic IPs are enabled

  • MongoDB CDC:

    • Corrected the parsing of Decimal types to Double

    • Resolved errors encountered when replicating collections containing fields with types Regex, Min Key, and Max Key

2024.04.16-12.06

Enhancements

  • Upgraded the Snowflake driver to 3.15.0

Bug Fixes

  • Minor bug fixes

2024.04.04-09.33

Enhancements

  • For new entities, use the updated Parquet list structure (parquet.avro.write-old-list-structure = false) when writing Parquet files to S3 and Upsolver Tables

Bug Fixes

  • Fixed a bug that could skip data when reading from CDC sources

  • Minor bug fixes

2024.03.27-07.52

Bug Fixes

  • Minor bug fixes

2024.03.10-10.52

Bug Fixes

  • Minor bug fixes

2024.03.05-11.08

Bug Fixes

  • Minor bug fixes

2024.02.25-14.23

Bug Fixes

  • Minor bug fixes

2024.02.18-12.36

Bug Fixes

  • Minor bug fixes

2024.02.13-09.06

Bug Fixes

  • Minor bug fixes

  • Note: In addition to the updates listed here, this release includes all enhancements and changes from previous versions.

2024.02.08-09.59

Bug Fixes

  • Minor bug fixes

2024.01.16-08.45

Enhancements

  • Improved the performance of transformations in Outputs / Lookup Tables

  • Allow downloading the outputs list grid as CSV

Bug Fixes

  • Apache Kafka Jobs:

    • Fixed new Kafka ingestion/data sources stalling when reading from the start in certain situations

2024.01.09-14.59

Bug Fixes

  • Minor bug fixes

2024.01.02-13.40

Bug Fixes

  • S3 Data Source: Resolved a race condition that could lead to duplicated ingestion of the same file in scenarios where an S3 data source is used with a date pattern that does not follow lexicographical order

  • Minor Bug Fixes
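
To illustrate what "a date pattern that does not follow lexicographical order" means in the S3 Data Source fix above, here is a minimal Python sketch. It uses Python `strftime` patterns as a stand-in (an assumption; these are not Upsolver's date pattern syntax) and the helper name `sorts_chronologically` is hypothetical.

```python
from datetime import date


def sorts_chronologically(pattern: str, days: list[date]) -> bool:
    """Return True if the formatted dates sort as strings in the same
    order as the dates themselves sort in time."""
    ordered = sorted(days)
    keys = [d.strftime(pattern) for d in ordered]
    return keys == sorted(keys)


sample_days = [date(2023, 12, 31), date(2024, 1, 1), date(2024, 1, 2)]

# Year-first patterns keep string order equal to time order.
year_first = sorts_chronologically("%Y/%m/%d", sample_days)

# Day-first patterns do not: "31-12-2023" sorts after "01-01-2024".
day_first = sorts_chronologically("%d-%m-%Y", sample_days)
```

With a day-first pattern, listing paths in string order can visit files out of chronological order, which is the kind of scenario the fix refers to.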

2023.12.25-09.02

Bug Fixes

  • Minor Bug Fixes

2023.12.11-01.58

Bug Fixes

  • Minor Bug Fixes

2023.12.06-11.31

Enhancements

  • When using Avro Schema Registry content format with Debezium, support parsing JSON type

Bug Fixes

  • Minor bug fixes

2023.11.29-01.59

Bug Fixes

  • Minor bug fixes

2023.11.27-01.59

Bug Fixes

  • Minor bug fixes

2023.11.21-01.59

Enhancements

  • Added support for r7 instance types in Compute Clusters

Bug Fixes

  • Fixed an issue preventing users from being able to create compute clusters

  • Fixed the replay cluster not being shut down in some situations

2023.11.19-08.51

Enhancements

  • New compaction metrics: Compaction delay, Number of files in WAL

Bug Fixes

  • Minor bug fixes

2023.11.14-13.42

Enhancements

  • S3 With SQS: Limit the size of bulk reads from SQS to ensure data is distributed evenly

Bug Fixes

  • Minor bug fixes

2023.11.07-12.30

Bug Fixes

  • Fixed some rare instances where a broken JDBC Data Source could interfere with other task executions

  • Fixed a bug causing loading of all tasks to fail if a task was created with a start time far in the future

2023.10.30-11.38

Bug Fixes

  • Minor bug fixes

2023.10.25-08.55

Bug Fixes

  • Minor bug fixes

2023.10.17-17.45

Bug Fixes

  • Fixed an issue causing CDC inputs to get stuck if a table failed to snapshot

2023.10.11-11.26

Bug Fixes

  • Minor bug fixes

2023.10.04-10.37

Bug Fixes

  • Kafka/Kinesis output: Fixed an issue that could cause events to arrive out of order when changing the number of shards in the output properties.

2023.09.13-12.16

Enhancements

  • Snowflake Output: changed the intermediate format from Avro to JSON. This change improves performance when writing to Snowflake and fixes an issue with writing to a column of type VARIANT whose sub-fields contain special characters in the field name

Bug Fixes

  • Minor bug fixes

  • Performance improvements when writing Parquet files

2023.09.05-11.12

Enhancements

  • PostgreSQL CDC:

    • Tables that aren't included in the publication will not be part of the snapshot

  • Support added for il-central-1 region. This region is currently only supported with private VPC deployments

  • Elasticsearch Jobs:

    • Write timestamp and date types as ISO-8601 strings in jobs that write to Elasticsearch

  • Reduced the number of Amazon S3 API calls to lower S3 costs

Bug Fixes

  • Minor bug fixes

2023.08.29-15.15

Enhancements

  • Write Timestamp and Date types as ISO-8601 strings in Elasticsearch output

  • Performance Improvement: Reduce the number of file operations when coordinating future table operations

Bug Fixes

  • Minor bug fixes

2023.08.17-13.30

Enhancements

  • Write Timestamp and Date types as ISO-8601 strings in string outputs, for example: Amazon S3 output with format JSON/CSV

  • Write Timestamp and Date types as ISO-8601 in RECORD_TO_JSON function
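
As an illustration of the ISO-8601 change above, the following Python sketch serializes a record with timestamp and date fields to JSON. The record, the field names and the helper `to_iso8601` are hypothetical, and the exact precision and offset format Upsolver emits is not stated in these notes.

```python
import json
from datetime import date, datetime, timezone


def to_iso8601(value):
    """json.dumps fallback: render datetime and date values as ISO-8601 strings."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


# Hypothetical record with one timestamp field and one date field.
record = {
    "event_time": datetime(2023, 8, 17, 13, 30, 0, tzinfo=timezone.utc),
    "event_date": date(2023, 8, 17),
    "count": 3,
}

encoded = json.dumps(record, default=to_iso8601)
decoded = json.loads(encoded)
```

Here `decoded["event_time"]` is the string `2023-08-17T13:30:00+00:00` and `decoded["event_date"]` is `2023-08-17`.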

Bug Fixes

  • Performance improvements in CDC data sources

  • Minor bug fixes

2023.08.16-07.42

Enhancements

  • Improved the performance of CDC jobs reading from databases with a large number of tables

  • Upgraded Avro and Parquet libraries to the latest versions

Bug Fixes

  • Fixed the SQL Parser to parse the LOG function and DECAYED_SUM aggregation

  • Minor bug fixes

2023.08.02-01.57

Bug Fixes

  • Minor bug fixes

2023.07.31-02.02

Enhancements

  • Cluster version appears in the UI on the clusters page

Bug Fixes

  • Minor bug fixes

2023.07.27-01.59

Enhancements

  • Updated the Snowflake JDBC driver version to 3.13.33

Bug Fixes

  • Fixed the UI error "The client couldn't connect to the API cluster."

  • Minor bug fixes

2023.07.19-02.34

Bug Fixes

  • Minor bug fixes

2023.07.13-02.20

Bug Fixes

  • Minor bug fixes

2023.07.06-02.18

Bug Fixes

  • Minor bug fixes and improvements

2023.07.04-14.42

Enhancements

  • New UUID() function returns a unique identifier (UUID) string

  • Upgraded Debezium version from 2.1.3 to 2.2.1

Bug Fixes

  • Fixed the conversion of float to double to preserve the perceived semantic value in CDC sources and in data sources that get Avro or Parquet.

  • Minor bug fixes
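
The float-to-double fix above concerns values such as a 32-bit `0.1`, which reads as `0.10000000149011612` when widened bit-for-bit to 64 bits. The Python sketch below shows one way to widen while keeping the value a reader would expect, by going through the shortest decimal text that round-trips. This is an assumption about the general technique, not a description of Upsolver's implementation, and both helper names are hypothetical.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def widen_preserving_decimal(f32: float) -> float:
    """Widen a float32 value to a double via the shortest decimal text
    that still rounds back to the same float32."""
    for digits in range(1, 10):  # 9 significant digits always round-trip a float32
        text = f"{f32:.{digits}g}"
        if to_float32(float(text)) == f32:
            return float(text)
    return f32


stored = to_float32(0.1)                      # the value as held in 32 bits
naive = stored                                # bit-for-bit widening
preserved = widen_preserving_decimal(stored)  # decimal-preserving widening
```

`naive` compares unequal to the double `0.1`, while `preserved` compares equal to it and still rounds back to the same 32-bit value.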

2023.06.26-03.44

Enhancements

  • Add new headers in Data Sources: parser_shard_number and parser_row_number

Bug Fixes

  • Fixed a bug reading Avro and Parquet files that caused fields of type Date to be ignored

  • Minor bug fixes

2023.06.19-10.37

Bug Fixes

  • Fixed an issue reading from empty Kafka topics that contain empty partitions

  • Fixed a bug reading Avro files that use a named type more than once

  • Minor bug fixes

2023.06.12-08.57

Bug Fixes

  • Snowflake Merge Jobs: enforce the ON clause expression to prevent creating an array

  • Minor bug fixes

2023.06.05-11.39

Bug Fixes

  • Minor bug fixes

Enhancements

  • CDC: PostgreSQL with partitioned tables - expose data.full_partition_table_name field specifying the name of the event's original partition

  • UI Performance improvements

2023.05.28-18.43

Bug Fixes

  • CASE WHEN now handles NULL as input and returns the ELSE value

  • CDC: Fixed a bug that caused decimal type columns to be ingested as binary base64 strings

Enhancements

  • [BREAKING CHANGE] GET_SHARD_NUMBER function no longer requires arguments

2023.05.17-13.45

Bug Fixes

  • Minor bug fixes

Enhancements

  • Validate that the first parameter in an ARRAY_JOIN is not a literal

2023.05.15-02.23

Bug Fixes

  • Parquet Files are now distributed more evenly when ingesting data from Amazon S3 with high execution parallelism

Enhancements

  • Snowflake: Added query tag to queries executed by Upsolver for easier cost tracking

2023.05.04-07.39

Bug Fixes

  • Minor bug fixes

2023.04.27-07.52

Bug Fixes

  • Minor bug fixes

  • CDC PostgreSQL: Fixed a bug that caused the replication slot not to be deleted when deleting the Data Source

  • Athena Output: Filter out rows when the partition field value is an empty string (Partition cannot be an empty string)

2023.04.18-07.10

Bug Fixes

  • Fixed an issue collecting field statistics and metadata for large data files with a large number of unique field names

Enhancements

  • Delete intermediate files after copy to Redshift

2023.03.26-19.27

Bug Fixes

  • JDBC Data Source: Fixed a bug that would not close the JDBC connection in some situations when using fullLoadInterval

  • AvroRegistry content type: Support URL encoded authentication information

  • Snowflake: Support keeping old values on partial updates

  • Upgrade Debezium to version 2.1.3

2023.03.15-10.04

Bug Fixes

  • JDBC Outputs: delete intermediate files after being written to the database

  • Revert Debezium to version 1.4

2023.03.09-13.48

Bug Fixes

  • JDBC Outputs: delete intermediate files after being written to the database

2023.02.26-15.40

Bug Fixes

  • Fixed Kafka batcher tasks getting stuck when reading with a wildcard topic and deleting all the topics in Kafka

Enhancements

  • Upgrade Debezium to V2.1.2

  • Add Debezium version header

  • Fixed an issue where creating a Kafka Data Source with a glob pattern that doesn't match any topics would cause the API not to respond

  • Memory allocation optimizations in Lookup Table Query servers

2023.02.19-15.15

Bug Fixes

  • Fixed memory leak on Elasticsearch outputs

  • Minor bug fixes

2023.02.12-15.57

Bug Fixes

  • Fixed a rare issue that can cause duplicate data to be loaded into Redshift after copy failures

  • Fixed an issue where discovering a new partition / topic without any messages would cause Kafka / Kinesis Data Sources to hang until a message arrived.

  • Fixed an issue where creating a Kafka Data Source over a large number of topics would cause a CPU spike in the API

Enhancements

  • Use regional STS endpoints if available

2023.02.07-10.02

  • Bug Fixes

    • Minor bug fixes

2023.01.31-15.19

  • Bug Fixes

    • Fixed bucket region detection when using an Amazon S3 Private VPC endpoint

    • API: Fixed a bug that caused running a new output with a Lookup to fail when using a full history snapshot

    • Fixed an edge case that could cause data loss when editing a stopped Athena output

  • Enhancements

    • Outputs: support window size override in non-aggregated outputs

2023.01.22-16.42

  • Bug Fixes

    • Monitoring: Fixed the 'operation_name' of aggregation steps to be the original 'operation_name' instead of "Output Aggregation". This means metrics reported via Monitoring Reports will now show aggregation step information under the correct 'operation_name'

    • Unsynchronized data sources no longer fail if they can't construct their consumers

2023.01.16-14.02

  • Bug Fixes

    • SQL: Improved error messages and auto completion

  • Enhancements

    • Performance and memory improvements

2023.01.10-20.52

  • Bug Fixes:

    • API: Prevent changing the end execution time for old output versions

    • API: Added validation to prevent creating Cloud Storage outputs with a date format that is not refined enough to include the Output Interval

    • Improved performance of Python UDF validations when uploading a new UDF

    • Fixed slow replay progress for Snowflake and PostgreSQL outputs

2023.01.03-339

  • Enhancements

    • Added RAND function and added overload to RANDOM function that gets no arguments and returns a value between 0 and 1

2022

2022.12.29-325

  • BREAKING CHANGE

    • Hive Metastore (Athena) Output: When using SELECT * with partition fields, if there is a field in the source that is mapped to the partition field column, the field won't be written to the Parquet files because this value can't be queried

  • Enhancements

    • Kafka Data Sources: Support unsynced mode, which allows the stream to continue processing even when there are errors or a backlog from the topic

    • Added Presto-compliant RANDOM() and RAND() functions

    • We now support clusters that mix both Intel-based (e.g. r6i, r6a) and ARM-based types (e.g. r6g) within the same Elastigroup

  • Bug Fixes

    • Fix deadlock between the indexing task and index entry deletion task that could end up waiting for each other when modifying an Athena output's data

    • When deploying clusters to a region, we now filter out instance types that don't exist in that region

    • Hive Metastore (Athena) Output: Rows that are filtered out due to a missing partition field value are no longer included in statistics. Previously, if a row was filtered out because the partition field value was missing or null, the row was still counted in Output Fields statistics and in Events over Time graphs.

    • Improved recovery mechanism when our configuration database is unavailable

2022.12.18-235

  • Enhancements

    • Avoid out-of-order events per key in Kinesis outputs by sending the same key only once within the same PutRecords request.

    • Improved server boot time and memory usage for organizations that use a high number of shards.
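
The Kinesis change above can be pictured with the following Python sketch, which splits records into request-sized batches so that no batch contains the same key twice while each key's records stay in their original order across batches. It is a minimal illustration under stated assumptions, not Upsolver's implementation: the function name `batch_unique_keys` and the `(key, payload)` record shape are hypothetical, and the default of 500 reflects the PutRecords per-request record limit.

```python
def batch_unique_keys(records, max_batch_size=500):
    """Split (key, payload) pairs into batches where each key appears at
    most once per batch and per-key order is kept across batches."""
    batches = []
    last_batch_for_key = {}
    for key, payload in records:
        # The earliest usable batch is the one after the last batch holding this key.
        idx = last_batch_for_key.get(key, -1) + 1
        # Skip batches that are already full.
        while idx < len(batches) and len(batches[idx]) >= max_batch_size:
            idx += 1
        if idx == len(batches):
            batches.append([])
        batches[idx].append((key, payload))
        last_batch_for_key[key] = idx
    return batches


records = [("a", 1), ("b", 1), ("a", 2), ("a", 3), ("b", 2)]
batches = batch_unique_keys(records)
```

For the sample input this yields three batches, `[("a", 1), ("b", 1)]`, `[("a", 2), ("b", 2)]` and `[("a", 3)]`, which would be sent one after another.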

2022.12.15-638

  • Bug Fixes

    • Fixed stack overflow in JDBC Data Source in some cases

    • API: Fixed a bug in generating the SQL statement when the SQL is not in sync with the output's definition

  • Enhancements

    • Minor performance improvements in data processing critical path

    • Improved the performance of server boot and periodic configuration load. This may improve the reliability and performance of data flow for organizations and clusters that have many processing entities

2022.12.05-201

  • Bug Fixes

    • Fixed a bug where Compactions would stop working when advancing the "End Execution At" property of the Hive Metastore Output after it has arrived (now > End Execution At).

    • API: Added validation to prevent creating connections with empty names

    • Minor Performance Enhancements

2022.11.29-164

  • Bug Fixes

    • API: Fixed an issue that caused the SQL statement to be invalid after changing the data source of an output

    • Fixed an issue when mapping a numeric field to an upsert column of type string in JDBC outputs (Redshift, Snowflake, ...)

    • Fixed a rare bug where an internal metadata index would stop progressing, preventing compactions from occurring.

  • Enhancements

    • The Elasticsearch client version was upgraded from 6.x to 7.x in order to also support Elasticsearch 7 & 8 as output targets

    • Performance enhancements for clusters with a lot of tasks (more to come in the future)

    • Snowflake Output: Support writing to Transient Tables

    • Kafka Data Sources: Added an option to restart reading partition when the end offset of that partition is larger than the last offset read by the Data Source for the same partition. This should allow users to reset partitions.

2022.11.17-118

  • Bug Fixes

    • Hive Metastore Output: performance improvements on calculating partition compaction trigger

    • Fixed a bug where Outputs with IS_DUPLICATE with big window sizes wouldn't be considered as completed

    • Fixed a bug where Outputs that depend on an Upsolver Output would run with a Runtime Delay based on the maximum Runtime Delay of all the versions of the Upsolver Output. The new behavior skips completed versions

    • Upsolver Query (Table output) was visible in the UI. This will now only be available via SQLake.

    • [BREAKING CHANGE] Simple S3 Data Source: changed the value of the time field to be the beginning of the minute instead of the end of the minute. This change will be applied only on new data sources

  • Enhancements

    • More informative errors when missing access to S3 resources

2022.11.09-61

  • Bug Fixes:

    • API: Fixed being able to create a Kafka input with an invalid storage connection

    • API: ModifyServerFile changeset now adds the file if it does not exist

  • New Features:

    • Compression: Add ZStandard

    • Redshift Output: Support authentication with IAM

    • Redshift Output: Support Super type

    • Roles Anywhere - Hide internal access/secret keys for SoC2

  • Enhancements

    • Upgraded Kafka Client to Version 3.2.0

    • Upgraded Redshift to Version 2.1.0.9

    • Improved the reliability of the connection between User Clusters and the Configuration Database

    • Performance improvements in the Compaction Coordinator in Athena Outputs

    • Improved error messages

    • Increased the maximum number of shards, output shards and compaction shards in outputs to 512

2022/Mar/22

Bug Fixes

  • Simple Cloud Storage Input: Improvements to file discovery

Enhancements

  • Athena Outputs: Enabled partition column types other than string

  • Performance improvements

2022/Mar/21

Changes in this Release:

  • API: The return value of shards and related fields changed from number to struct. The struct contains executionParallelism which represents the old number. Customers using API endpoints related to data sources, lookup tables or outputs may need to update their code. Please contact our support for details.
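
For client code affected by the shards change above, a small compatibility helper can accept both shapes. The Python sketch below is an illustration only: `executionParallelism` is the field named in this note, while the helper name and the surrounding response dictionaries are hypothetical and other fields of the struct are not described here.

```python
def execution_parallelism(shards) -> int:
    """Return the parallelism from either the old numeric value or the
    new struct that carries it under "executionParallelism"."""
    if isinstance(shards, dict):
        return int(shards["executionParallelism"])
    return int(shards)


# Hypothetical response fragments, before and after the change.
old_response = {"shards": 4}
new_response = {"shards": {"executionParallelism": 4}}

old_value = execution_parallelism(old_response["shards"])
new_value = execution_parallelism(new_response["shards"])
```

Both calls return 4, so code written against the old numeric value keeps working while the migration is in progress.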

Bug Fixes

  • SQL: Fixed the code generation of the decimal type

  • Compute Cluster: Fixed a bug that would cause the Compute cluster, in rare cases, not to update against the configuration.

  • Monitoring: Fixed a bug that would not expose reporting tags to external monitors.

  • API: Fixed a bug that prevented the rolling of clusters.

  • API: Fixed a race condition that prevented multiple concurrent requests from working and returned Forbidden (403).

  • Snowflake Output: Fixed a bug when writing values to DATE columns.

  • CDC: Fixed a bug that failed to write data which was larger than 2GB.

Enhancements

  • Functions: Added the PRESTO_ZIP function

  • Python: Allow using urllib

  • CDC: Improved the binlog delay monitor

  • AWS VPC integration: Subnet IDs are now validated in the Existing AWS VPC integration

  • Athena Output: Non-string partition columns are now supported

2022/Feb/20

Bug Fixes

  • Show scaling policy in the Cluster page.

  • Wurfl User Agent: fixed a bug that appeared when there was more than one wurfl file in the organization.

  • Fixed a bug that caused the metrics to stop being reported to external monitoring systems (Datadog / Influx).

  • Deprecated SPLIT, CONCAT and DATE_DIFF functions and introduced new functions:

    • SPLIT: SPLIT_DELIMITER_FIRST & PRESTO_SPLIT

    • CONCAT: ARRAY_JOIN & PRESTO_CONCAT

    • DATE_DIFF: DATE_DIFF_PRECISE & PRESTO_DATE_DIFF

Enhancements

  • Added function LN.

  • DATE_DIFF function now supports dynamic units.

  • LIKE operation now supports getting another field as a pattern.

2022/Feb/08 ANNOUNCEMENT

Recently Implemented Changes (Currently Enabled)

As part of Upsolver's effort to adopt industry standards, we are gradually changing functions to be more Presto compatible. The functions that changed are CONCAT, SPLIT and DATE_DIFF.

CONCAT, SPLIT and DATE_DIFF are being deprecated. From now on, SQL statements that use CONCAT, SPLIT or DATE_DIFF will produce a warning message when executed. This behavior is designed to draw attention to the changes. Currently running outputs are NOT affected by these changes.

The change log summary:

Important: All information in this table, including planned versions and dates, is subject to change; the information is provided only as a guideline for updates you may make in the future.

Schedule: Enabled by default in February 2022

Functional Area: SQL Changes - Commands & Functions

2021

2021/10/05

  • Bug Fixes

    • MySQL Output: Fixed bug with boolean fields that were not written as expected.

    • Redshift Output: Fixed race condition in upsert tables that could cause rows not to get deleted in rare cases.

    • SQL:

      • Improved SQL editor responsiveness.

      • Fixed a bug in SQL parsing.

      • Fixed an exception arising when using infix operations.

      • Fixed join/match expressions not working correctly with >3 terms.

    • API:

      • Fixed an issue with distinct data sources that had the same name.

      • Prevented "SPLIT TABLE ON" on non-Athena Outputs.

      • Fixed name suggestion in hierarchical Athena outputs.

  • Enhancements

    • Azure Event Hubs: Support more features.

    • Streaming Output: Support setting an upsert key.

    • ContentTypes:

      • Support null values in TSV.

      • Support fixed width content type.

    • Oracle Object Storage: Various enhancements.

    • SQL: Support for WHERE filter in sub-select expressions.

    • S3 Data Source: Don't require AWS integration when creating S3 data source.

    • S3 Output: Support bucket-level access control.

    • UI: Added various annotations to cluster graphs in the monitoring tab.

2021/08/09

  • Enhancements

    • CSV Content Format: allows repeating header names in files.

    • Function changes: the CONCAT function was changed to ARRAY_JOIN.

      • ARRAY_JOIN - gets an array of strings and a delimiter and concatenates them.

      • CONCAT - now gets multiple arguments and concatenates them (like || in SQL).

  • Bug Fixes

    • Athena Output: fixed a performance issue when deleting files due to retention.

    • Clusters: Show "Additional Processing Units for Replay" only in Compute Clusters.

    • Redshift Spectrum: fixed boolean casting when running output with SELECT *

    • API: Show thrown errors from Hive Metastore.

    • SQL: Fixed a bug when joining with a sub-query.
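
The difference between ARRAY_JOIN and the new CONCAT in the 2021/08/09 function changes can be sketched with Python equivalents. These are rough analogues for illustration only: the Python function names are hypothetical, and null handling and type coercion in the actual SQL functions are not described in these notes.

```python
def array_join(values, delimiter):
    """Analogue of ARRAY_JOIN: one array of strings plus a delimiter."""
    return delimiter.join(values)


def concat(*args):
    """Analogue of the new CONCAT: any number of string arguments, like || in SQL."""
    return "".join(args)


joined = array_join(["2021", "08", "09"], "/")
glued = concat("user", "_", "id")
```

Here `joined` is `2021/08/09` and `glued` is `user_id`.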

2021/08/02

  • Enhancements

    • Support dynamic position in ELEMENT_AT function.

    • Allow updating the boot script in Clusters.

    • Support fixed schema in S3 outputs with Avro format.

  • Bug Fixes

    • Fixed a bug when reading from multiple topics in Kafka Data Source.

    • API - Fixed column name suggester when mapping new fields in Athena Output.

2021/07/19

  • Bug Fixes

    • API

      • Fixed a bug with Azure Integration not working in some regions

      • Fixed validation when updating Columns Retention in Hive Metastore outputs

      • Data Source Page: don't show statistics from the preview when querying on a time range without data

      • Show output's fields on outputs with SELECT *

    • SQL

      • Prevent SQL regeneration when updating duplicate handling (APPEND ON DUPLICATE or REPLACE ON DUPLICATE)

      • Added some validation errors when trying to create an invalid state

    • Backend

      • Fixed a bug that caused duplicated rows when editing Hive Metastore output with upserts

2021/07/11

  • Enhancements

    • Monitoring Reporters: Support Graphite

    • Hive Metastore Output: support splitting the output by schemas/databases in addition to splitting by table names. For example, if the value of the multi table field is "foo.bar", the "foo" will be the schema/database name, and "bar" will be the table name

  • Bug Fixes

    • S3 Data Sources Advanced: Fixed a bug with Glob File Name pattern

    • Hive Metastore Output: save storage by deleting manifest files after their usage
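
The schema/table split described in the 2021/07/11 Hive Metastore enhancement ("foo.bar" becomes schema "foo" and table "bar") can be sketched as below. The helper name and the fallback to a default schema for values without a dot are assumptions made for illustration, as is splitting only on the first dot.

```python
def split_multi_table_value(value: str, default_schema: str):
    """Split a multi table field value into (schema, table).

    "foo.bar" -> ("foo", "bar"). A value without a dot is treated as a
    table name in `default_schema` (an assumption, not documented behavior).
    """
    schema, separator, table = value.partition(".")
    if not separator:
        return default_schema, value
    return schema, table


qualified = split_multi_table_value("foo.bar", "default")
unqualified = split_multi_table_value("bar", "default")
```
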

2021/07/05

  • Enhancements

    • Athena output: create Views with Glue API

  • Bug Fixes

    • Don't show completed dependencies in Lineage tab

    • Select * in Hive Metastore Output

      • Return the defined fields first

      • Removed the multi table column from the view definitions

    • Hive Metastore Output: fixed a bug when editing output with upserts

    • API: Allow changing the cluster size on Trial plans

2021/06/28

  • Enhancements

    • Added a new modal and new SQL syntax for Table Name Suffix Field, which allow you to create multiple tables in Hive Metastore with a single output.

    • CDC Data source (MySQL) - added Destination part that allows replicating the source database to your data lake

    • Qubole Metastore: allow changing the time partition column type to String

  • Bug Fixes

    • Fixed health check parameters in Query clusters

    • Don't show deleting data sources in the main page

    • Hive Metastore output: added a cache layer in the Partition Manager that prevents redundant calls to the Metastore

    • API: Limit the number of running previews. This should fix high CPU usage of the API when many previews are running at the same time.

2021/06/21

  • Enhancements

    • Support Select * in Redshift Spectrum

    • API: Support Select * and Upserts on Preview

    • Lookup Table: when running Output with a lookup to a Lookup Table, don't calculate the start/end times of the Lookup Table implicitly but use the original times.

  • Bug Fixes

    • SAML: Don't regenerate group when changing display name in Upsolver

    • Athena Output: fixed bug in Columns Retention

    • API: Fixed a bug that caused deleted inputs to not work

    • Snowflake Output: fixed columns casing

    • Removed "errors" outputs from outputs with Parquet format (Athena/S3)

2021/06/13

  • Enhancements

    • CDC ingestion is more stable when scaling cluster

    • Previewing an output now takes its upsert definition into account

    • Compactions are now prioritized by urgency and age in order to prevent starvation

    • Support epoch time date pattern with prefixes in Cloud Storage Data Sources

  • Bug Fixes

    • Fixed database name validation in Microsoft SQL Server Connection

2021/06/07

  • Enhancements

    • HiveMetastoreClient: Better SET LOCATION method

2021/05/31

  • Enhancements

    • Elasticsearch Output: Support Upsert Keys

    • CDC: Support Column Exclude List

    • Added SHA512 and SHA3_512 functions

  • Bug Fixes

    • S3 Connection with SQS now works with paths that end with a slash
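
For readers who want to cross-check values produced by the SHA512 and SHA3_512 functions added in 2021/05/31, the same digests can be computed with Python's standard `hashlib`. Treat this as a sketch: the notes do not state the input encoding or the output format the SQL functions use, so UTF-8 input and lowercase hex output are assumptions, and the sample input is hypothetical.

```python
import hashlib

message = "user@example.com".encode("utf-8")  # hypothetical input value

# SHA-512 (SHA-2 family) and SHA3-512 are different algorithms with the
# same 512-bit output size, so their digests differ for the same input.
sha512_hex = hashlib.sha512(message).hexdigest()
sha3_512_hex = hashlib.sha3_512(message).hexdigest()
```

Each digest is 512 bits, which is 128 hexadecimal characters.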

2021/05/24

  • Enhancements

    • Added FROM_UNIXTIME function

    • Qubole Output: added an option to support changing column types

    • Hive Metastore Outputs: trigger more than one compaction if there is a backlog

    • Upsolver Output: support new field type: JSON. This type will be extracted when the output is used as an Upsolver Data Source

    • CSV Content Format: support custom quote escape char

    • When duplicating output, copy the workspaces from the previous output

  • Bug Fixes

    • Fixed memory leak in External Hive Metastore outputs

2021/05/12

  • Enhancements

    • Added External Hive Metastore to the output types list

    • Support SELECT * on External Hive Metastore when querying with PrestoDB and SparkSQL

    • Reference Data can now be deleted once no output is using it (i.e. the output was deleted, or the output completed and was edited)

    • Reference Data can't be created with the same name as another Reference data or Lookup table

2021/05/04

  • Enhancements

    • Kafka Output - Allow ignoring messages that are too large (According to broker settings and producer settings)

    • Streaming Data Sources (Kafka, Kinesis, EventHubs) - Allow deleting offsets metadata files

    • API - Performance enhancements when updating Outputs / Lookup Tables

  • Bug Fixes

    • Hive Metastore: Fixed bug with SELECT *

2021/05/03

  • Features

    • Support MAX/MIN aggregations on more data types

    • Support <,<=,>,>= on timestamps

2021/04/18

  • Features

    • Support SELECT * in Hive Metastore Outputs, this will update the table definition every time a new field arrives

    • Oracle Object Storage Support

  • Bug Fixes

    • Aggregation calculated fields now work in SQL mode

2021/04/04

  • Features

    • CDC (Change Data Capture) Data Sources

    • Dremio and PrestoDB Outputs

    • Stop/Start Data Sources

  • Enhancements

    • Allow setting Lazy Load on Lookup Tables using the Properties tab

    • Update base AMI image in AWS to Amazon Linux 2

  • Bug Fixes

    • Data Lake Output: Filter out partitions that were deleted due to retention compaction

2021/03/28

  • Features

    • Hive Metastore: Allow creating an Output to External Hive Metastore

  • Enhancements

    • Lower latencies between dependencies in Compute Cluster

2021/03/21

  • Features

    • Ahana Output

    • Starburst Output

  • Enhancements

    • Redshift: Allow inserting 'now' into date / time fields in order to set a column to the insertion time

  • Bug Fixes

    • Kinesis Stream Autocomplete: Filter out Upsolver internal streams

    • Fixed a bug in S3 IAM policy generation when the path ends with a slash

    • Avro Schema Registry: Don't treat HTTP errors as parse errors

    • SQL Parser: Don't regenerate the SQL when there is an expression that returns boolean with extra parentheses

2021/03/14

  • Support Real Time Kafka Output - Support running Kafka Outputs on the Real Time cluster with ms latency

  • Hive Metastore Output with Upserts - fixed a bug that caused the compaction process to get stuck after edit

  • Hive Metastore Output with Upserts - support number as an upsert key

  • Lookup Tables: fixed a bug when using sharded lookup tables in outputs

  • API: show the current capacity when clicking Update Capacity button on Clusters page

  • API: fixed wrong validation on Kafka Outputs (support numbers on topic names)

  • Microsoft SQL Server Output: fixed create statement when primary key is empty

  • API: fixed a bug when removing mapping of fields

2021/03/07

  • S3 Data Source with Parquet Content Format - split files by 200MB

  • Lookup Table - support compaction shards on lookup tables with multiple windows

  • SQL - fixed a bug generating the SQL when "Is Delete Field" is mapped to a column

2021/03/01

  • Monitoring: Added three metrics to Hive Metastore Outputs

    • partitions-delay - The delay between now and the last partition time

    • data-loading-delay - The delay on loading data to the metastore

    • partitions-count - Number of partitions in the table

  • IS_DUPLICATE and Lookup from Data Sources: Don't omit key columns for new versions

  • Avro: Fixed escaping of [] in array namespaces

    • Fixes a bug in Snowflake Output with VARIANT column output with arrays

2021/02/23

  • Azure: Support billing SaaS offering

  • DNS: Ability to sync Route53 records with private IP addresses for customers with own Spotinst Account

  • SSO/bugfix: attach endpoints don't have permissions

  • Partners: Support exporting logs and monitoring to external domain

  • Free Plan: Support upgrading account

2021/01/04

  • Snowflake Output: Configurable DbDecimal

  • CSV Content Type: Don't ignore values starting with #

  • SQL: Support unmapped columns in JDBC outputs. New mapped columns will be created when deploying the output

  • Infra: performance improvements

  • Lookup Table: fixed a bug when using Delete column

  • Signup: Create sample data source on register

  • SQL: Fixed a bug with autocomplete Lookup Table names

  • SQL: Support Lookup time

  • Athena Output: Fixed a bug with editing Athena Output when Upsert Partition Fields is true

2020

2020/12/08

  • JDBC Data Sources: Fixed an issue that could cause it to get stuck and not read any data

  • JDBC Connections: Fixed an issue that would allow connections to be created with a concurrency of 0

  • Monitoring: Include the actual time an index is ready to be read from in the monitoring delay charts

  • Allow using anonymous credentials to access data in public S3 buckets

  • AppFlow: Autocomplete buckets and flow names during setup

  • Functions:

    • Added a Subtract Time Zone Feature to complement Add Time Zone

  • UI:

    • Show SQL Errors when deploying Outputs

    • Show indicative error message when Reference Data file couldn't be found

2020/11/01

  • Deployment: Allow deploying Upsolver servers to Azure

  • Add support for Azure Event Hubs data source

  • Athena: Create Glue database if it doesn't exist

  • Functions: Fixed a bug in TO_DATE function

  • Function: Added new function: RECORD_TO_JSON

  • Query Cluster: Improvements in the underlying files cache

  • SQL: Show validation error when mapping an array to unrelated path

  • SQL: Show validation error when mapping null without specifying type

  • API: When creating data source, fixed a bug when previewing large file with tar compression

  • API: Fixed high CPU on boot

2020/10/21

  • Kafka data source: support reading custom Kafka headers

  • Metastore Output: support running Athena/Qubole output without partitioning by time

  • Snowflake Output: support Azure storage as the intermediate storage

  • Compute Cluster Infra: optimize threads when running low priority tasks

  • ETL: Improved target path inference for some scenarios

  • Monitoring Task: fixed failure when one of the monitoring reporters is not available

  • SQL: Fixed validation of inline functions in aggregations

  • Metastore Output: set the table location to the root path of the output

  • Qubole: allow defining if TIMESTAMP fields will be created as TIMESTAMP or BIGINT columns in the table per output

  • Qubole: Added feature flag to deprecate the "SET hive.on.master=?" statement

  • Elasticsearch Output: Fixed a bug that could cause high memory usage

2020/10/15

  • Add Amazon AppFlow support

  • Zip Function - Added optional field names

  • API - Fixed validation message for Kafka input

  • Elasticsearch - Upgraded client version

2020/10/13

  • S3 Data source with Parquet Content Format - when the file is not a parquet file, handle it as a parse error

  • Added Free plan

  • SQL - Fixed a duplication issue when function target name and select target name are the same

  • Hive Metastore Output with Upsert keys - Trigger compactions in a better way to avoid compacting in a loop

  • SQL - Fixed target path inference of key columns with inline functions on aggregated outputs

  • API - Allow setting a higher number of shards in the output than the execution parallelism in the data source. This parallelizes the data by the data source files

  • Support "SELECT * " in cloud storage outputs with parquet content format

  • API - Fixed a bug that allowed creating more than one draft in the same output

2020/10/05

  • Show number of sparse fields inside fields tree in inputs and outputs and allow to toggle the filter

2020/10/01

  • JDBC data source: use field types from the table definition

  • PostgreSQL output: support timestamptz data type

  • UI: New modal when adding multiple fields in tabular outputs to prevent cartesian product between unrelated arrays

  • No need to specify a target field for filters when creating a filter from the UI

  • Some bug fixes in API

2020/09/30 - SNAPSHOT

  • Query Agent - Support round robin

  • "No Local API" page - Show "Connection Established" instead of error when able to connect

  • Input creation preview - Filter big JSONs and let the user know about it

2020/09/23

  • Performance improvements in internal cache mechanism

  • Performance improvements in Hive Metastore outputs

  • Fixed a bug that caused Hive Metastore outputs with upserts to get stuck after editing a new version

  • Avro w/ Schema Registry Content Format: Support Tagged Avro Schema Registry

  • Improved target path calculation of inline functions

  • Added validation when deploying a draft that the start time is not after the end time of the previous version

  • SQL: Disable automatic column name generation

  • Support cancelling pending integration

2020/09/14

  • No Local API Page: Fixed showing "You can't connect" instead of "local DNS resolve" error

  • CloudFormation: link to the right region in deploy stack

  • Fewer API calls to Cloud Storage when checking completion of tasks

  • Calculated Function TO_DATE: Changed threshold to not return negative dates

  • Fixed a bug where PostgreSQL outputs did not allow altering column types

2020/09/07

  • Support Workspaces in Clusters

  • Catch all errors from GCP / Azure and show in UI

  • Hive Metastore Outputs: the column names year, month, day, and hour are now reserved

2020/08/31

  • Big performance improvements for replay in Kinesis & Kafka Data Sources

  • Big performance improvements for replay in Hive Metastore Outputs

2020/08/24

  • Compute Cluster: IO Tasks will now run only on Master cluster and will never run on Replay Cluster

  • Compute Cluster: Option to limit number of Elastic IPs allocated for the cluster

  • Added XX_HASH and SORT_BY calculated functions

  • UI : Support literal inputs in aggregations

2020/08/17

  • Performance improvements to Hive Metastore Outputs

  • Fixed a bug where very large Parquet file outputs could crash servers with OOM errors

  • Preview Output will now stop after 15 seconds instead of making the API server hang

  • Support Redshift and PostgreSQL in JDBC Data Source

  • UI: Output - New Partitions Modal

2020/08/10

  • SQL now supports target site inference, which fixes many confusing bugs when using arrays with calculated functions

  • SQL: Fixed bug with throwing 500 errors on missing properties of calculated functions

  • Athena Output: new outputs will not nest compaction files for better compatibility support with external systems

2020/08/03

  • Fixed bug when previewing completed Output with Lookups

  • Update Retention validation message is now dismissible

  • Regex and Split Content Formats have been added for better compatibility with custom data formats

2020/07/27

  • MS SQL Server Output

  • Elasticsearch Output: Removed index_type argument, using _doc / doc by default

  • UI: overhauled the properties pages

  • UI: Split field statistics by Data Source in Output page

2020/07/20

  • JSON_TO_RECORD calculated function: Allow whitespace in CSV mapping definition and improve exception handling

  • Athena Output: Faster replays when run compactions is set to false

  • Fewer red notification errors due to internal errors

  • Aggregated Outputs now delete the intermediate aggregations immediately after outputting the data (instead of waiting for the retention period, if defined)

2020/07/13

  • MySQL Output: Fixed bug with quote followed by delimiter char inside the data to output

  • Create Calculated Function: Fixed a bug with the default output path calculation

  • JDBC Data Source now supports creating new tables instead of only inserting data to existing tables

  • JDBC Connections: indicative validation error messages on creation

2020/07/06

  • PostgreSQL Output

  • Writing logs to Customer Bucket now supports writing to specific path in the customer's bucket

  • SQL: Show indicative error when trying to filter subquery

  • MySQL Output: Fixed writing of date/time fields

  • UI: Refined the time range picker

  • New boolean operators and calculated functions: AND, OR, NOT, and IS DISTINCT FROM now work like in SQL

  • UI: Calculated Functions Gallery now matches the SQL syntax

2020/06/29

  • Redshift: Support configuring

  • Added TO_DATE calculated function (converts strings to dates without having to insert format)

  • Added APPROX_COUNT_DISTINCT_EACH aggregation

  • IAM Role Credentials: Assume role via the Server Role created in the AWS Integration

  • Booting a Cluster after stopping it for a while is faster

  • SQL: infer the null type instead of asking the user to explicitly specify the type of the null (null:string)

  • SQS: Allow configuring KMS key

  • UI: Fixes to "Add Lookup to Data Source" page

  • S3: Show the right action on access error

2020/06/22

  • UI: Charts now show a shared crosshair between graphs

  • "Update Shards" error message is now more informative

  • Added deployment support to more AWS regions

  • Fixed rare case where AWS Redshift Output would duplicate data

  • Fixed a bug where multiple rows with the same Upsert Key would be inserted in the same output interval in Snowflake and Redshift Upsert Outputs

2020/06/15

  • Git Integration: Don't cancel git integration after one failure to push changes

  • UI now allows operating aggregated outputs without key columns (Aggregate all data within the output interval)

  • UI: Refinements in the Fields Tree

  • Snowflake Output: Better replay performance with sparse Data Sources

  • Added EXTRACT, MILLISECOND, SECOND, MINUTE, HOUR, DAY, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, WEEK, MONTH, QUARTER, YEAR, and YEAR_OF_WEEK date extraction calculated functions

  • Fixed a bug where the REPLACE calculated function could throw errors in some cases

2020/06/08

  • Added RPAD, LPAD, STRPOS, DATE_ADD, and DATE_DIFF calculated functions

  • Private API now uses r5 instead of r4 instances in AWS by default

  • SQL: Better error messages for inline features

2020/06/01

  • The "Archive" operation has been removed from the system. Deleted items can be seen using the "Trash" button in the list view

  • JDBC Data Source: Support Start Time

  • Multiple Bug Fixes in Snowflake Output

  • Added DATE_TRUNC calculated function

  • Fixed bug with copying big files in S3

  • UI Performance enhancements

2020/05/25

  • Update Configuration of Upsert Outputs using the UI

  • Allow writing logs from Upsolver to Customer requested location as well as Upsolver

  • Dramatically reduced the number of API calls to Cloud Storage

2020/05/18

  • Performance improvements and bug fixes

2020/05/04

  • Data Sources:

    • JDBC: Added support for connecting to an Oracle DB

    • Bug fix for event type statistics breakdown in local APIs

    • Performance and cost improvements

2020/04/27

  • Revised output preview screen

  • Minor bug fixes and improvements

2020/04/20

  • Data Sources:

    • S3 Over SQS: Allow creating Data Sources from multiple connections with the same prefix

  • Outputs:

    • Added output to Snowflake

    • Monitoring improvements

2020/04/06

  • Data Sources:

    • Added properties to Upsolver data source.

    • Kafka: Added support for custom consumer/producer properties.

  • Outputs:

    • UI improvements in sources fields tree

    • Kafka: Added support for custom consumer/producer properties.

  • Monitoring Reports:

    • Added Splunk export support

2020/03/30

  • UI updates and performance improvements

2020/03/16

  • Data Sources:

    • Split meta-data by Event Type field - you are now able to split and view your data source by the desired field in your data source.

  • Outputs:

    • SELECT * is supported for Upsolver and Elasticsearch outputs.

    • Added Amazon Kinesis connector.

    • Qubole connector now supports using an HTTPs proxy address to override the endpoint used to access Qubole.

  • IAM:

    • Added support for SAML with provisioning capabilities.

2020/03/09

  • Clusters:

    • Compute cluster monitoring: Compute Units Graph was updated and now provides a breakdown of the compute units used by each task (Data Source/Output/Lookup Table).

2020/02/24

  • Outputs:

    • Elasticsearch: Editing the connection string is now supported - as long as the new nodes belong to the same cluster.

    • Elasticsearch: Added support for setting the event to _doc.

2020/02/18

  • Transform with SQL:

    • Added support for partitioning configuration.

    • Casting improvements.

  • Outputs:

    • Redshift: Added support for configuring fail on write error. If enabled, any error while copying data to Redshift will cause the entire bulk to be skipped. The skipped manifest will be saved aside for manual re-processing once the copy error has been fixed. If disabled the same behavior will occur after 100K errors (The max allowed by Redshift).

  • Monitoring Reporting:

    • Fixed a bug that, in rare cases, caused a falsely reported delay.
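
The Redshift "fail on write error" behavior described above can be read as a simple decision rule. The Python sketch below models that rule only; the function name should_skip_bulk and the constant name are hypothetical, and treating exactly 100K errors as the point where the bulk is skipped is an assumption, since the entry does not state the boundary.

```python
# Hypothetical model of the rule in the entry above; not Upsolver's code.
REDSHIFT_MAX_ERRORS = 100_000  # "100K errors (The max allowed by Redshift)"


def should_skip_bulk(error_count, fail_on_write_error):
    """Return True when the whole bulk would be skipped and its manifest set aside."""
    if fail_on_write_error:
        # Enabled: any copy error skips the entire bulk.
        return error_count > 0
    # Disabled: the same happens once the error limit is reached.
    # Using >= here is an assumption about the exact boundary.
    return error_count >= REDSHIFT_MAX_ERRORS


print(should_skip_bulk(1, fail_on_write_error=True))
print(should_skip_bulk(1, fail_on_write_error=False))
```

With the option enabled a single bad row is enough to set the bulk aside, which favors strictness over throughput; with it disabled, small numbers of bad rows are tolerated.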

2020/02/10

  • Data Sources:

    • JDBC Data Source - added support for PostgreSQL.

2020/02/05

  • Outputs:

    • Added UUID Generator Calculated Function

  • Transform with SQL:

    • Added support for SQL comments using -- (see example below)

    • Improved error messages

SELECT your_Select_clause -- your comment
FROM your_table -- another comment

2020/01/30

  • Data Sources:

    • Parquet reader: support INT96 timestamps and non-canonical field names

    • Added support for LZO decompression

    • Added a JDBC connector

  • Outputs:

    • Support correcting a specific time frame in an output

    • Added UpdateSql programmatic API operation for creating outputs

2020/01/22

  • IAM:

    • Multi-organization support

  • Outputs:

    • Support lazy load of lookup tables

    • Support querying lookup table in SQL

    • Support sharding of aggregated outputs

  • Data Sources:

    • Support S3 data source initial load configuration

    • Support non-lexicographic date patterns in S3

    • UI & performance improvements

2020/01/13

  • Data Sources:

    • Support XML as content type

2020/01/06

  • Performance improvements and bug fixes

2019

2019/12/22

  • Outputs:

    • Elasticsearch - Add option not to delete indices from Elasticsearch based on retention

  • Transform with SQL:

    • Support data source features

  • UI:

    • Outputs - Add support for filtering the Preview when in SQL mode

    • Performance improvements

2019/12/16

  • Data Sources:

    • Support changing the number of shards using increments of one (instead of multiples of two)

  • Outputs:

    • Athena - add support for excluding partitions from the table

  • Transform with SQL:

    • Support default field names instead of col_x

    • Generate SQL for running Outputs

    • Refer to fields by index in the GROUP BY statement
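
Referring to fields by index in GROUP BY, as in the last item above, is a common SQL convention in which GROUP BY 1 means "group by the first selected column". The sketch below illustrates the idea with Python's built-in sqlite3 module as a stand-in engine; the table and column names are hypothetical, and Upsolver's own SQL dialect may differ in details.

```python
import sqlite3

# In-memory database with a small hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (country TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("US", 10), ("DE", 5), ("US", 7)],
)

# GROUP BY 1 groups by the first column of the SELECT list (country).
by_index = conn.execute(
    "SELECT country, SUM(amount) FROM events GROUP BY 1 ORDER BY 1"
).fetchall()

# Equivalent query that names the column explicitly.
by_name = conn.execute(
    "SELECT country, SUM(amount) FROM events GROUP BY country ORDER BY country"
).fetchall()

print(by_index == by_name)
```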

2019/12/09

  • UI improvements and bug fixes

2019/12/03

  • Outputs:

    • Add support for Redshift Spectrum

    • Update table schema in Qubole is now optional (the default behavior would be to update)

2019/11/17

  • Outputs:

    • Allow switching between raw and aggregated modes

    • Added QUERY_STRING_TO_RECORD calculated function for query string extractions

  • Transform with SQL:

    • Unify SQL code blocks into a single block

2019/11/10

  • Athena Upserts: Update and delete existing data in your Data Lake

  • Transform with SQL:

    • Support HAVING clause in Aggregated Outputs

    • Support DECIMAL types

    • Support Athena Upserts

  • S3 Output: JSON files will end with one "\n" instead of two "\n" (as stated in jsonlines.org)
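
The JSON Lines convention referenced in the S3 Output item above puts one JSON value per line, with each line terminated by a single newline. The Python sketch below shows output of that shape; the function name to_json_lines is hypothetical and this is not how Upsolver writes its files.

```python
import json


def to_json_lines(records):
    """Serialize records as JSON Lines: one object per line, each ending in one newline."""
    return "".join(json.dumps(record) + "\n" for record in records)


payload = to_json_lines([{"id": 1}, {"id": 2}])

# The payload ends with exactly one newline, not two.
print(payload.endswith("\n") and not payload.endswith("\n\n"))
```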

2019/10/31

  • When deploying an output, "Now" is resolved when submitting the form

  • Connections and Clusters can be attached to Workspaces

  • IAM: Lists of Data Sources, Outputs, Lookup Tables, Connections and Clusters are filtered by the user "list" permission

2019/09/23

  • UI improvements

    • Fixed bug on lookup to COLLECT_SET_EACH column

  • Stability improvements

2019/09/02

  • Allow changing default organization connection

  • Added decimal support to Athena Outputs

  • Allow turning off/on compactions in Athena Outputs

  • Better support for Data Sources with large amounts of fields

  • Notebook (Beta):

    • JOIN

    • GROUP BY

    • HAVING

2019/07/08

  • Various Performance Improvements in UI

  • Added ZIP Calculated Function to ZIP between multiple arrays

  • MySQL Output: Row is replaced if duplicate key is found

  • Notebook (Beta):

    • like / not like syntax (e.g. “name” like ‘a__%’)

    • not in syntax (e.g. “status” not in (“failed”, “canceled”))

    • = as equality operator syntax (e.g. “status” = ‘ok’ instead of “status” == ‘ok’)

  • Better error messages
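
The like pattern in the Notebook entry above (‘a__%’) can be read with standard SQL LIKE semantics, where % matches any run of characters and _ matches exactly one. The Python sketch below models that reading; the helper names like_to_regex and like are hypothetical, and whether the Notebook follows standard LIKE semantics in every detail (case sensitivity, escape characters) is an assumption.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True when the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


# 'a__%' requires an 'a' followed by at least two more characters.
print(like("abc", "a__%"))   # True
print(like("ab", "a__%"))    # False
```

Under this reading, not like is simply the negation of like for the same value and pattern.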

2019/06/24

  • Lookup Tables / API Playground:

    • Support querying multiple rows

    • Auto complete for keys

    • Querying on specific time range

  • Notebook (Beta): a better way to create enrichments

2019/06/17

  • Calculated Functions: Added numeric in feature (e.g. “data.a”:number in (1,2,3))

  • Parse Avro data using Confluent Schema Registry

2019/06/03

  • Various Performance Improvements in UI

  • Show connection errors when creating/editing MySQL/Redshift Output

  • Fixed intermittent recoverable errors in tasks

  • Fixed delay when using the same connection for multiple Redshift/Elasticsearch Outputs

2019/05/26

  • Experimental: updating / deleting rows in output to Athena. You can try it out by using the “Upsert Key” and “Is Delete Field” special fields

2019/05/19

  • Ingestion - Added “index” header to all messages (useful when ingesting multiple events in one message)

  • Hive Metastore Outputs now drop duplicate logical partitions

  • API - list Output / Materialized Views returns faster

  • GDPR - Materialized Views now support deleting rows

  • Physical Deletion runs much faster with fewer operations on the underlying Cloud Storage

  • Retention is now set on Materialized Views created by DEDUP features

2019/05/14

  • Data Source - Simplified creation of Kafka, Kinesis and AWS S3 Data Sources

2019/05/13

  • Replay Cluster - Fixes some cases where the replay cluster might not shut down

2019/05/06

  • Qubole Client - set hive.on.master and use database for all queries

  • Performance improvements for retention

  • Elasticsearch Output - Better retry mechanism

2019/04/22

  • Athena - Switch to using Glue API for all DDL statements

  • Monitoring Tab - fix bug that would display some rows twice

  • Outputs page - Correct the range of some of the graphs

  • Add timeout to copy/read S3 requests to prevent processing delays

  • Data Source - show a preview of data immediately upon creation

  • Improve UI performance related to connections page

2019/04/08

  • Dry run environment support

  • Monitoring - added written items and written bytes

  • Monitoring - added original-task-name tag to all metrics

  • Qubole - set hive.on.master=false

  • Permissions - added policy editor

  • Athena - reduce spam of Athena history

  • Athena - drop table when deleting an output if the option is selected

  • Kafka - support changing the number of shards in the UI

  • Some performance improvements

  • UI - Added multi-unmap fields (for Avishai)

2019/04/01

  • Increase Kafka consumer version to 2.1.1

  • Monitor delay in managing partitions

  • Bug fix - add connection timeout to ElasticSearch connections

  • Remove dependency on Upsolver DynamoDB for servers starting up

2018

2018/11/15

  • Data Sources / Materialized Views / Outputs: Toggle between card view and table view

2018/11/14

  • Translate Calculated Function: Show CSV Editor for the dictionary field

  • Cluster Details Page: show the elastic IPs of the Cluster

  • Outputs: Qubole Output

  • Outputs: Usability Improvements in Creation/Deploy flow

  • Upsolver Language: "data.str":string in ('a','b','c') syntax

  • Upsolver Language: supports coalesce operator

"data.str":string? # COALESCE("data.str":string, '')

"data.str":string?'default-value' # COALESCE("data.str":string, 'default-value')

"data.bool":boolean? # COALESCE("data.bool":boolean, false)

"data.bool":boolean?true # COALESCE("data.bool":boolean, true)

"data.number":number? # COALESCE("data.number":number, 0)

"data.number":number?2.5 # COALESCE("data.number":number, 2.5)

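The coalesce shorthand examples above imply a per-type implicit default ('' for string, false for boolean, 0 for number) that an explicit value after ? overrides. The Python sketch below models that behavior only; the function name coalesce and the IMPLICIT_DEFAULTS table are hypothetical, and this is not Upsolver's implementation.

```python
# Implicit defaults taken from the examples above.
IMPLICIT_DEFAULTS = {"string": "", "boolean": False, "number": 0}

# Sentinel so that an explicit default of None/False/0 is still honored.
_MISSING = object()


def coalesce(value, type_name, default=_MISSING):
    """Return value unless it is None; otherwise the explicit or implicit default."""
    if value is not None:
        return value
    if default is not _MISSING:
        return default
    return IMPLICIT_DEFAULTS[type_name]


print(coalesce(None, "number", 2.5))     # models "data.number":number?2.5
print(coalesce(False, "boolean", True))  # an existing false value is kept
```

Only a missing (null) value triggers the default, so a present false or 0 passes through unchanged.
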
2018/06/26

  • Output / Materialized Views: Added ability to edit the Data Sources from the properties tab (Only if the object isn't deployed yet)

2018/06/18

  • Aggregated Output: Added option to add calculated fields over aggregations

2018/06/17

  • Compute Cluster: Allow spinning up a "Replay" Cluster when needed

  • Outputs: Edit S3 and Upsolver Outputs

  • Filters: Improved UX (Whitelist and Blacklist Filters)

  • Materialized Views: Time Series Aggregations are shown as graphs in the Data Sample tab

2018/03/01

  • Materialized Views: Added an API to iterate the MVs

  • Added Time Zone Offset Function

  • Outputs: Added automatic time field to Athena and Upsolver outputs

  • Calculated Fields: Support editing of calculated fields inputs and parameters

  • Users can now create readonly S3 Connections

  • Athena Output now supports setting of event time which is used for partitioning

  • Elasticsearch Output now supports retention

  • Various performance improvements to UI

  • Support filtering on time range in Data Source inspection page

  • Support for editing lookup enrichments

  • Monitoring now shows Materialized Views that are used in Lookup enrichments

  • Improvements to Auto Scaling

  • Support non string Key Columns in Materialized Views

  • Aggregated output doesn't change the type of the Key Columns to string anymore
