Change log
Change log for Upsolver Classic (app.upsolver.com)
Release notes for sqlake.upsolver.com may be found here.
2025.01.13-09.11
Enhancements
Support setting custom properties for tables and views in Hive metastore connection
Bug fixes
Minor bug fixes
2025.01.08-07.52
Bug fixes
Minor bug fixes
2024.11.28-08.49
Bug fixes
Minor bug fixes
2024.11.14-07.22
Enhancements
Upgraded libraries to include recent security patches, enhancing system security and stability
Bug fixes
Minor bug fixes
2024.11.05-11.18
Enhancements
Upgraded libraries to include recent security patches, enhancing system security and stability
Bug fixes
Minor bug fixes
2024.10.07-11.42
Bug fixes
Minor bug fixes
2024.09.26-09.41
Bug fixes
Minor bug fixes
2024.09.18-13.03
Bug fixes
Minor bug fixes
2024.09.18-09.11
Bug fixes
Minor bug fixes
2024.09.10-07.45
Bug fixes
Minor bug fixes
2024.09.01-09.29
Bug fixes
Minor bug fixes
2024.08.26-09.46
Bug fixes
Minor bug fixes
2024.08.19-12.07
Bug Fixes
Minor bug fixes
2024.08.15-10.03
Enhancements
Use JSON as the intermediate format when writing to Redshift
Bug Fixes
Fixed a rare case where some task executions would stop running until the server is restarted
Fixed a bug where outputs with more shards than the data source failed to distribute rows evenly across shards when data correction was applied
2024.08.07-11.23
Enhancements
Athena Output:
Added support for using Field Transforms on columns with an array of primitive types
Bug Fixes
Minor bug fixes
2024.07.30-06.58
Enhancements
Reduced overhead of tasks discovery, especially in compute clusters with a lot of assigned entities or with a large number of shards
Bug Fixes
Minor bug fixes
2024.07.29-08.06
Enhancements
Athena Output:
Optimized the coordinator task to prevent long-running processes and avoid delays in the output
Performance improvements for high-loaded clusters
Bug Fixes
Minor bug fixes
2024.07.11-12.30
Bug Fixes
Minor bug fixes
2024.07.08-11.10
Bug Fixes
Fixed a bug occurring when the Primary Key is of numeric type and the table is large enough to require chunk splitting
Minor bug fixes
2024.06.30-08.33
Enhancements
Amazon S3: Reduced the number of upload file part requests by using larger parts
Performance improvement: Reduced task scheduling overhead in outputs that read from versioned Upsolver outputs
Bug Fixes
Disabled the creation of data corrections in output types that do not support it, such as JDBC Outputs
Minor UI bug fixes
2024.06.20-15.35
Enhancements
Implemented deletion of temporary intermediate files in Data Sources after their usage. This feature will be enabled gradually.
Bug Fixes
Reference Data: When an old version of an output is completed, stop loading the Reference Data file being used in this version. This prevents errors caused by deleted Reference Data files that are no longer used in the latest versions.
2024.06.10-17.20
Enhancements
Added support for shards in outputs writing to Elasticsearch
Bug Fixes
CDC Inputs: Implemented deletion of intermediate files that store the metadata for the source database
2024.06.04-11.16
Bug Fixes
Fix an issue with SAML that causes Azure to fail to parse the request
2024.06.02-15.50
Bug Fixes
Athena Output:
Optimized the performance of retention deletion
2024.05.28-11.58
Bug Fixes
Fixed an issue where 2-minute merge files were sometimes not properly deleted when deleting Materialized Views
Fixed a bug where temporary files wouldn't be deleted in the index creation task when the task failed
Minor bug fixes
2024.05.23-08.45
Enhancements
Upgraded the Amazon Redshift driver
Bug Fixes
Fixed CDC client sometimes becoming temporarily stuck after server replacement
Minor bug fixes
2024.05.19-13.23
Enhancements
Help widget adjustments (size and location have changed to prevent hiding application data)
Bug Fixes
Fixed Snapshotting of tables sometimes being stuck in CDC
Fixed a bug causing Amazon S3 outputs to be stuck on retention deletion when the output has multiple versions
Minor bug fixes
2024.05.12-10.26
Enhancements
Upgraded libraries to include recent security patches to enhance system security and stability
Bug Fixes
Minor bug fixes
2024.05.09-08.33
Enhancements
Upgraded libraries to include recent security patches to enhance system security and stability
Bug fixes
Minor bug fixes
2024.05.02-15.37
Enhancements
Libraries upgrade to include security patches
Bug Fixes
Minor bug fixes
2024.04.25-12.36
Bug Fixes
Run connection validation tasks on servers with Elastic IPs when Elastic IPs are enabled
MongoDB CDC:
Corrected the parsing of Decimal types to Double
Resolved errors encountered when replicating collections containing fields with types Regex, Min Key, and Max Key
2024.04.16-12.06
Enhancements
Upgraded the Snowflake driver to 3.15.0
Bug Fixes
Minor bug fixes
2024.04.04-09.33
Enhancements
For new entities, use the updated Parquet list structure (parquet.avro.write-old-list-structure = false) when writing Parquet files to S3 and Upsolver Tables
Bug Fixes
Fixed a bug that could skip data when reading from CDC sources
Minor bug fixes
2024.03.27-07.52
Bug Fixes
Minor bug fixes
2024.03.10-10.52
Bug Fixes
Minor bug fixes
2024.03.05-11.08
Enhancements
Bug Fixes
Minor bug fixes
2024.02.25-14.23
Bug Fixes
Minor bug fixes
2024.02.18-12.36
Bug Fixes
Minor bug fixes
2024.02.13-09.06
Bug Fixes
Minor bug fixes
Note: in addition to the updates listed here, this release includes all enhancements and changes from previous versions.
2024.02.08-09.59
Bug Fixes
Minor bug fixes
2024.01.16-08.45
Enhancements
Performance Improvement in Transformations of Outputs / Lookup Tables
Allow downloading the outputs list grid as CSV
Bug Fixes
Apache Kafka Jobs:
Fixed new Kafka ingestion/data sources stalling when reading from the start in certain situations
2024.01.09-14.59
Bug Fixes
Minor bug fixes
2024.01.02-13.40
Bug Fixes
S3 Data Source: Resolved a race condition that could lead to duplicated ingestion of the same file in scenarios where an S3 data source is used with a date pattern that does not follow lexicographical order
Minor Bug Fixes
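As an illustration of what "a date pattern that does not follow lexicographical order" means in the S3 Data Source fix above, the following minimal sketch compares string order with chronological order for two folder-naming patterns. The patterns and the helper name `is_lexicographic` are illustrative assumptions, not Upsolver configuration or code.

```python
from datetime import date


def is_lexicographic(pattern: str, days: list[date]) -> bool:
    """Return True if sorting the formatted strings matches sorting by date."""
    formatted = [d.strftime(pattern) for d in sorted(days)]
    return formatted == sorted(formatted)


# Three sample days that cross a year boundary and a month boundary.
days = [date(2023, 12, 31), date(2024, 1, 2), date(2024, 2, 1)]

# Year-first folders sort the same way as the dates themselves.
year_first = is_lexicographic("%Y/%m/%d", days)

# Day-first folders do not: "01-02-2024" sorts before "31-12-2023".
day_first = is_lexicographic("%d-%m-%Y", days)
```

With a day-first pattern, listing objects in key order no longer matches arrival order, which is the kind of setup the fix refers to.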
2023.12.25-09.02
Bug Fixes
Minor Bug Fixes
2023.12.11-01.58
Bug Fixes
Minor Bug Fixes
2023.12.06-11.31
Enhancements
When using Avro Schema Registry content format with Debezium, support parsing JSON type
Bug Fixes
Minor bug fixes
2023.11.29-01.59
Bug Fixes
Minor bug fixes
2023.11.27-01.59
Minor bug fixes
2023.11.21-01.59
Enhancements
Added support for r7 instance types in Compute Clusters
Bug Fixes
Fixed an issue preventing users from being able to create compute clusters
Fixed the replay cluster not being shut down in some situations
2023.11.19-08.51
Enhancements
New compaction metrics: Compaction delay, Number of files in WAL
Bug Fixes
Minor bug fixes
2023.11.14-13.42
Enhancements
S3 With SQS: Limit the size of bulk reads from SQS to ensure data is distributed evenly
Bug Fixes
Minor bug fixes
2023.11.07-12.30
Bug Fixes
Fixed some rare instances where a broken JDBC Data Source could interfere with other task executions
Fixed a bug causing loading of all tasks to fail if a task was created with a start time far in the future
2023.10.30-11.38
Bug Fixes
Minor bug fixes
2023.10.25-08.55
Bug Fixes
Minor bug fixes
2023.10.17-17.45
Bug Fixes
Fixed an issue causing CDC inputs to get stuck if a table failed to snapshot
2023.10.11-11.26
Bug Fixes
Minor bug fixes
2023.10.04-10.37
Bug Fixes
Kafka/Kinesis output: Fixed an issue that could cause events to arrive out of order when changing the number of shards in the output properties.
2023.09.13-12.16
Enhancements
Snowflake Output: changed the intermediate format from Avro to JSON. This change improves performance when writing to Snowflake and fixes an issue that occurred when writing to a column of type VARIANT whose sub-fields contain special characters in the field name
Bug Fixes
Minor bug fixes
Performance improvements when writing Parquet files
2023.09.05-11.12
Enhancements
PostgreSQL CDC:
Tables that aren't included in the publication will not be part of the snapshot
Support added for il-central-1 region. This region is currently only supported with private VPC deployments
Elasticsearch Jobs:
Write timestamp and date types as ISO-8601 strings in jobs that write to Elasticsearch
Reduced the number of Amazon S3 API calls to lower S3 costs
Bug Fixes
Minor bug fixes
2023.08.29-15.15
Enhancements
Write Timestamp and Date types as ISO-8601 strings in Elasticsearch output
Performance Improvement: Reduce the number of file operations when coordinating future table operations
Bug Fixes
Minor bug fixes
2023.08.17-13.30
Enhancements
Write Timestamp and Date types as ISO-8601 strings in string outputs, for example: Amazon S3 output with format JSON/CSV
Write Timestamp and Date types as ISO-8601 in RECORD_TO_JSON function
Bug Fixes
Performance improvements in CDC data sources
Minor bug fixes
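For readers unfamiliar with the format, here is a minimal sketch of what writing Timestamp and Date values as ISO-8601 strings in a JSON output can look like. It uses Python's standard library only. The field names are hypothetical, and the exact precision and time zone suffix that Upsolver emits are not stated in these notes, so treat the shape of the strings as an assumption.

```python
import json
from datetime import date, datetime, timezone


def to_iso8601(value):
    """JSON fallback encoder: turn datetime/date values into ISO-8601 strings."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


# Hypothetical row with one timestamp column and one date column.
row = {
    "event_time": datetime(2023, 8, 17, 13, 30, 0, tzinfo=timezone.utc),
    "event_date": date(2023, 8, 17),
}

# json.dumps calls to_iso8601 only for values it cannot encode natively.
encoded = json.dumps(row, default=to_iso8601)
```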
2023.08.16-07.42
Enhancements
Improved the performance of CDC jobs reading from databases with a large number of tables
Upgraded Avro and Parquet libraries to the latest versions
Bug Fixes
Fixed the SQL Parser to parse the LOG function and DECAYED_SUM aggregation
Minor bug fixes
2023.08.02-01.57
Bug Fixes
Minor bug fixes
2023.07.31-02.02
Enhancements
Cluster version appears in the UI on the clusters page
Bug Fixes
Minor bug fixes
2023.07.27-01.59
Enhancements
Updated Snowflake JDBC driver version to 3.13.33
Bug Fixes
Fix UI error, "The client couldn't connect to the API cluster."
Minor bug fixes
2023.07.19-02.34
Bug Fixes
Minor bug fixes
2023.07.13-02.20
Bug Fixes
Minor bug fixes
2023.07.06-02.18
Bug Fixes
Minor bug fixes and improvements
2023.07.04-14.42
Enhancements
New UUID() function returns a unique identifier (UUID) string
Upgraded Debezium version from 2.1.3 to 2.2.1
Bug Fixes
Fixed the conversion of float to double to preserve the perceived semantic value in CDC sources and in data sources that ingest Avro or Parquet.
Minor bug fixes
2023.06.26-03.44
Enhancements
Add new headers in Data Sources: parser_shard_number and parser_row_number
Bug Fixes
Fixed a bug reading Avro and Parquet files that caused fields of type Date to be ignored
Minor bug fixes
2023.06.19-10.37
Bug Fixes
Fixed an issue reading from empty Kafka topics that contain empty partitions
Fixed a bug reading Avro files that use a named type more than once
Minor bug fixes
2023.06.12-08.57
Bug Fixes
Snowflake Merge Jobs: enforce the ON clause expression to prevent creating an array
Minor bug fixes
2023.06.05-11.39
Bug Fixes
Minor bug fixes
Enhancements
CDC: PostgreSQL with partitioned tables - expose data.full_partition_table_name field specifying the name of the event's original partition
UI Performance improvements
2023.05.28-18.43
Bug Fixes
CASE WHEN now handles NULL as input and returns the ELSE value
CDC: Fixed the bug that caused the ingestion of a decimal type column to be converted to binary base64 string
Enhancements
[BREAKING CHANGE] GET_SHARD_NUMBER function no longer requires arguments
2023.05.17-13.45
Bug Fixes
Minor bug fixes
Enhancements
Validate that the first parameter in an ARRAY_JOIN is not a literal
2023.05.15-02.23
Bug Fixes
Parquet Files are now distributed more evenly when ingesting data from Amazon S3 with high execution parallelism
Enhancements
Snowflake: Added query tag to queries executed by Upsolver for easier cost tracking
2023.05.04-07.39
Bug Fixes
Minor bug fixes
2023.04.27-07.52
Bug Fixes
Minor bug fixes
CDC PostgreSQL: Fixed a bug that caused the replication slot not to be deleted when deleting the Data Source
Athena Output: Filter out rows when the partition field value is an empty string (Partition cannot be an empty string)
2023.04.18-07.10
Bug Fixes
Fixed an issue collecting field statistics and metadata for large data files with a large number of unique field names
Enhancements
Delete intermediate files after copy to Redshift
2023.03.26-19.27
Bug Fixes
JDBC Data Source: Fixed a bug that would not close the JDBC connection in some situations when using fullLoadInterval
AvroRegistry content type: Support URL encoded authentication information
Snowflake: Support keeping old values on partial updates
Upgrade Debezium to version 2.1.3
2023.03.15-10.04
Bug Fixes
JDBC Outputs: delete intermediate files after being written to the database
Revert Debezium to version 1.4
2023.03.09-13.48
Bug Fixes
JDBC Outputs: delete intermediate files after being written to the database
2023.02.26-15.40
Bug Fixes
Fixed Kafka batcher tasks getting stuck when reading with a wildcard topic and deleting all the topics in Kafka
Enhancements
Upgrade Debezium to V2.1.2
Add Debezium version header
Fixed an issue where creating a Kafka Data Source with a glob pattern that doesn't match any topics would cause no response in the API
Memory allocation optimizations in Lookup Table Query servers
2023.02.19-15.15
Bug Fixes
Fixed memory leak on Elasticsearch outputs
Minor bug fixes
2023.02.12-15.57
Bug Fixes
Fixed a rare issue that can cause duplicate data to be loaded into Redshift after copy failures
Fixed an issue where discovering a new partition / topic without any messages would cause Kafka / Kinesis Data Sources to hang until a message arrived.
Fixed an issue where creating a Kafka Data Source over a high number of topics would cause a CPU spike in the API
Enhancements
Use regional STS endpoints if available
2023.02.07-10.02
Bug Fixes
Minor bug fixes
2023.01.31-15.19
Bug Fixes
Fixed bucket region detection when using an Amazon S3 Private VPC endpoint
API: Fixed a bug that caused running a new output with a Lookup to fail when using a full history snapshot
Fixed an edge case that could cause data loss when editing a stopped Athena output
Enhancements
Outputs: support window size override in non aggregated outputs
2023.01.22-16.42
Bug Fixes
Monitoring: Fixed the 'operation_name' of aggregation steps to be the original 'operation_name' instead of "Output Aggregation". This means metrics reported via Monitoring Reports will now show aggregation step information under the correct 'operation_name'
Unsynchronized data sources no longer fail if they can't construct their consumers
2023.01.16-14.02
Bug Fixes
SQL: Improved error messages and auto completion
Enhancements
Performance and memory improvements
2023.01.10-20.52
Bug Fixes:
API: Prevent changing the end execution time for old output versions
API: Added validation to prevent creating Cloud Storage outputs with a date format that is not refined enough to include the Output Interval
Improved performance of Python UDF validations when uploading a new UDF
Fixed slow replay progress for Snowflake and PostgreSQL outputs
2023.01.03-339
Enhancements
Added RAND function and added overload to RANDOM function that gets no arguments and returns a value between 0 and 1
2022
2022.12.29-325
BREAKING CHANGE
Hive Metastore (Athena) Output: When using SELECT * with partition fields, if there is a field in the source that is mapped to the partition field column, the field won't be written to the Parquet files because this value can't be queried
Enhancements
Kafka Data Sources: Support unsynced mode, which allows the stream to continue processing even when there are errors or a backlog from the topic
Add Presto-compliant RANDOM() and RAND() functions
We now support clusters that mix both Intel-based (e.g. r6i, r6a) and ARM-based types (e.g. r6g) within the same Elastigroup
Bug Fixes
Fix deadlock between the indexing task and index entry deletion task that could end up waiting for each other when modifying an Athena output's data
When deploying clusters to a region, we now filter out instance types that don't exist in that region
Hive Metastore (Athena) Output: Do not calculate statistics for rows that were filtered out due to a missing partition field value. Previously, if a row was filtered out because the partition field value was missing or null, the row was still counted in Output Fields statistics and in Events over Time graphs.
Improved recovery mechanism when our configuration database is unavailable
2022.12.18-235
Enhancements
Avoid out of order per key in Kinesis outputs by sending the same key only once within the same PutRecords request.
Improved performance of server boot time and memory usage for organizations that use a high number of shards.
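The Kinesis enhancement in this release can be pictured with a minimal sketch like the one below: records are grouped into batches so that no key appears twice in the same batch, which means two records for one key never travel in the same PutRecords request. This is not Upsolver's implementation. The function name, the `(key, payload)` record shape and the choice to close a batch as soon as a repeated key is seen are assumptions made for illustration. The default of 500 reflects the documented per-request record limit of PutRecords.

```python
def batch_unique_keys(records, max_batch_size=500):
    """Group (key, payload) records into ordered batches.

    Each key appears at most once per batch, and records keep their
    original relative order across batches.
    """
    batches = []
    current, seen = [], set()
    for key, payload in records:
        # Close the current batch if the key repeats or the batch is full.
        if key in seen or len(current) >= max_batch_size:
            batches.append(current)
            current, seen = [], set()
        current.append((key, payload))
        seen.add(key)
    if current:
        batches.append(current)
    return batches


# Key "a" occurs twice, so its second record starts a new batch.
example = batch_unique_keys([("a", 1), ("b", 2), ("a", 3), ("c", 4)])
```

Sending the batches one after another, and waiting for each request to finish before sending the next, keeps records for the same key in order.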
2022.12.15-638
Bug Fixes
Fixed stack overflow in JDBC Data Source in some cases
API: Fixed a bug in generating the SQL statement when the SQL is not in sync with the output's definition
Enhancements
Minor performance improvements in data processing critical path
Improved performance of server boot and periodic configuration load. This might improve reliability and performance of data flow for organizations and clusters that have a lot of processing entities
2022.12.05-201
Bug Fixes
Fixed a bug where Compactions would stop working when advancing the "End Execution At" property of the Hive Metastore Output after it has arrived (now > End Execution At).
API: Added validation to prevent creating connections with empty names
Minor Performance Enhancements
2022.11.29-164
Bug Fixes
API: Fixed an issue that caused the SQL statement to be invalid after changing the data source of an output
Fixed an issue when mapping a numeric field to an upsert column of type string in JDBC outputs (Redshift, Snowflake, ...)
Fixed a rare bug where an internal metadata index would stop progressing, preventing compactions from occurring.
Enhancements
The Elasticsearch client version was upgraded from 6.x to 7.x in order to also support Elasticsearch 7 & 8 as output targets
Performance enhancements for clusters with a lot of tasks (more to come in the future)
Snowflake Output: Support writing to Transient Tables
Kafka Data Sources: Added an option to restart reading a partition when the end offset of that partition is larger than the last offset read by the Data Source for the same partition. This should allow users to reset partitions.
2022.11.17-118
Bug Fixes
Hive Metastore Output: performance improvements on calculating partition compaction trigger
Fixed a bug where Outputs with IS_DUPLICATE with big window sizes wouldn't be considered as completed
Fixed a bug where Outputs that depend on an Upsolver Output would run with a Runtime Delay based on the maximum Runtime Delay of all the versions of the Upsolver Output. The new behaviour skips completed versions
Upsolver Query (Table output) was visible in the UI. This will now only be available via SQLake.
[BREAKING CHANGE] Simple S3 Data Source: changed the value of the time field to be the beginning of the minute instead of the end of the minute. This change will be applied only on new data sources
Enhancements
More informative errors when missing access to S3 resources
2022.11.09-61
Bug Fixes:
API: Fixed being able to create a Kafka input with an invalid storage connection
API: ModifyServerFile changeset now adds the file if it does not exist
New Features:
Compression: Add ZStandard
Redshift Output: Support authentication with IAM
Redshift Output: Support Super type
Roles Anywhere - Hide internal access/secret keys for SoC2
Enhancements
Upgraded Kafka Client to Version 3.2.0
Upgraded Redshift to Version 2.1.0.9
Improved the reliability of the connection between User Clusters and the Configuration Database
Performance improvements in the Compaction Coordinator in Athena Outputs
Improve error messages.
Enlarged maximum number of shards, output shards and compaction shards in outputs to 512
2022/Mar/22
Bug Fixes
Simple Cloud Storage Input: Improvements to file discovery
Enhancements
Athena Outputs: Enabled partition column types other than string
Performance improvements
2022/Mar/21
Changes in this Release:
API: The return value of shards and related fields changed from number to struct. The struct contains executionParallelism which represents the old number. Customers using API endpoints related to data sources, lookup tables or outputs may need to update their code. Please contact our support for details.
Bug Fixes
SQL
Compute Cluster
Fixed a bug that would cause the Compute cluster, in rare cases, not
Monitoring
API
API
Fixed a race condition that prevented multiple concurrent requests to
Snowflake Output
Fixed a bug when writing values to DATE columns.
CDC
Fixed a bug that failed to write data which was larger than 2GB.
Enhancements
Functions
Python
CDC
AWS VPC integration
Validated subnet IDs in Existing AWS VPC integration
Athena Output
Non-string partition columns now supported
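For client code affected by the API change noted at the top of this release (shards changing from a number to a struct), a minimal sketch of a tolerant reader might look like this. Only the `executionParallelism` field is named in these notes. The helper name and the idea of accepting both shapes are assumptions, and any other fields in the struct are unknown here.

```python
def execution_parallelism(shards):
    """Return the shard count from either response shape.

    Old shape: a plain number, e.g. 4
    New shape: a struct (dict) with an "executionParallelism" entry
    """
    if isinstance(shards, int):
        return shards
    return shards["executionParallelism"]


# Both shapes yield the same number.
old_style = execution_parallelism(4)
new_style = execution_parallelism({"executionParallelism": 4})
```

A reader like this lets the same client work against responses produced before and after the change.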
2022/Feb/20
Bug Fixes
Show scaling policy in the Cluster page.
Wurfl User Agent: fixed a bug that appeared when there was more than one wurfl file in the organization.
Fixed a bug that caused the metrics to stop being reported to external monitoring systems (Datadog / Influx).
Deprecated SPLIT, CONCAT and DATE_DIFF functions and introduced new functions:
- SPLIT: SPLIT_DELIMITER_FIRST & PRESTO_SPLIT
- CONCAT: ARRAY_JOIN & PRESTO_CONCAT
- DATE_DIFF: DATE_DIFF_PRECISE & PRESTO_DATE_DIFF
Enhancements
Added function LN.
DATE_DIFF function now supports dynamic units.
LIKE operation now supports getting another field as a pattern.
2022/Feb/08 ANNOUNCEMENT
Recently Implemented Changes (Currently Enabled)
As part of Upsolver's effort to adopt industry standards, we are gradually changing functions to be more Presto compatible. The functions that changed are CONCAT, SPLIT and DATE_DIFF.
CONCAT, SPLIT and DATE_DIFF are being deprecated. Henceforth, SQLs that attempt to use CONCAT, SPLIT and DATE_DIFF will include a warning message when executed. This behavior is designed to draw attention to the changes. Currently running outputs are NOT affected by these changes.
The change log summary:
Important: All information in this table, including planned versions and dates, is subject to change; the information is provided only as a guideline for updates you may make in the future.
Schedule
Enabled by default in February, 2022
Functional Area
SQL Changes - Commands & Functions
ARRAY_JOIN The new function name for the deprecated CONCAT function.
Presto-compatible version of the CONCAT functions: PRESTO_CONCAT
SPLIT_DELIMITER_FIRST The new function name for the deprecated SPLIT Function.
Presto-compatible version of the SPLIT functions: PRESTO_SPLIT
DATE_DIFF_PRECISE The new function name for the deprecated DATE_DIFF Function.
Presto-compatible version of the DATE_DIFF functions: PRESTO_DATE_DIFF
2021
2021/10/05
Bug Fixes
MySQL Output: Fixed bug with boolean fields that were not written as expected.
Redshift Output: Fixed race condition in upsert tables that could cause rows not to get deleted in rare cases.
SQL:
Improved SQL editor responsiveness.
Fixed a bug in SQL parsing.
Fixed an exception arising when using infix operations.
Fixed join/match expressions not working correctly with >3 terms.
API:
Fixed an issue with distinct data sources that had the same name.
Prevented "SPLIT TABLE ON" on non-Athena Outputs.
Fixed name suggestion in hierarchical Athena outputs.
Enhancements
Azure Event Hubs: Support more features.
Streaming Output: Support setting an upsert key.
ContentTypes:
Support null values in TSV.
Support fixed width content type.
Oracle Object Storage: Various enhancements.
SQL: Support for WHERE filter in sub-select expressions.
S3 Data Source: Don't require AWS integration when creating S3 data source.
S3 Output: Support bucket-level access control.
UI: Added various annotations to cluster graphs in the monitoring tab.
2021/08/09
Enhancements
CSV Content Format: allows repeating header names in files.
Function changes: the CONCAT function was changed to ARRAY_JOIN.
ARRAY_JOIN - gets an array of strings and a delimiter and concatenates them.
CONCAT - now gets multiple arguments and concatenates them (like || in SQL).
Bug Fixes
Athena Output: fixed a performance issue when deleting files due to retention.
Clusters: Show "Additional Processing Units for Replay" only in Compute Clusters.
Redshift Spectrum: fixed boolean casting when running output with SELECT *
API: Show thrown errors from Hive Metastore.
SQL: Fixed a bug when joining with a sub-query.
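The difference between ARRAY_JOIN and the new CONCAT described in this release's enhancements can be sketched in a few lines of Python. These stand-ins only mirror the one-line descriptions given in these notes. How the real functions treat nulls or non-string arguments is not covered here, so the sketch assumes plain strings.

```python
def array_join(values, delimiter):
    """Join an array of strings with a delimiter (the old CONCAT behaviour)."""
    return delimiter.join(values)


def concat(*args):
    """Concatenate any number of string arguments, like || in SQL."""
    return "".join(args)


joined = array_join(["a", "b", "c"], ",")
glued = concat("a", "b", "c")
```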
2021/08/02
Enhancements
Support dynamic position in ELEMENT_AT function.
Allow updating the boot script in Clusters.
Support fixed schema in S3 outputs with Avro format.
Bug Fixes
Fixed a bug when reading from multiple topics in Kafka Data Source.
API - Fixed column name suggester when mapping new fields in Athena Output.
2021/07/19
Bug Fixes
API
Fixed a bug with Azure Integration not working in some regions
Fixed validation when updating Columns Retention in Hive Metastore outputs
Data Source Page: don't show statistics from the preview when querying on a time range without data
Show output's fields on outputs with SELECT *
SQL
Prevent SQL regeneration when updating duplicate handling (APPEND ON DUPLICATE or REPLACE ON DUPLICATE)
Added some validation errors when trying to create an invalid state
Backend
Fixed a bug that caused duplicated rows when editing Hive Metastore output with upserts
2021/07/11
Enhancements
Monitoring Reporters: Support Graphite
Hive Metastore Output: support splitting the output by schemas/databases in addition to splitting by table names. For example, if the value of the multi table field is "foo.bar", the "foo" will be the schema/database name, and "bar" will be the table name
Bug Fixes
S3 Data Sources Advanced: Fixed a bug with Glob File Name pattern
Hive Metastore Output: save storage by deleting manifest files after their usage
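The "foo.bar" example in the Hive Metastore enhancement above amounts to splitting the multi table field value into a schema/database part and a table part. The sketch below shows one way to express that. The function name, the fallback to a default schema when there is no dot, and splitting on the first dot only are all assumptions, since these notes do not say how such values are handled.

```python
def split_table_target(value, default_schema):
    """Split a multi table field value into (schema, table).

    "foo.bar" -> ("foo", "bar"). A value without a dot is treated as a
    bare table name in default_schema (an assumption for this sketch).
    """
    schema, dot, table = value.partition(".")
    if not dot:
        return default_schema, value
    return schema, table


qualified = split_table_target("foo.bar", default_schema="default")
bare = split_table_target("bar", default_schema="default")
```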
2021/07/05
Enhancements
Athena output: create Views with Glue API
Bug Fixes
Don't show completed dependencies in Lineage tab
Select * in Hive Metastore Output
Return the defined fields first
Removed the multi table column from the view definitions
Hive Metastore Output: fixed a bug when editing output with upserts
API: Allow changing the cluster size on Trial plans
2021/06/28
Enhancements
Added a new modal and new SQL syntax for the Table Name Suffix Field, which allow you to create multiple tables in a Hive Metastore output with a single output.
CDC Data source (MySQL) - added Destination part that allows replicating the source database to your data lake
Qubole Metastore: allow changing the time partition column type to String
Bug Fixes
Fixed health check parameters in Query clusters
Don't show deleting data sources in the main page
Hive Metastore output: added a cache layer in the Partition Manager that prevents redundant calls to the Metastore
API: Limit the number of running previews. This should fix high CPU usage of the API when many previews are running at the same time.
2021/06/21
Enhancements
Support Select * in Redshift Spectrum
API: Support Select * and Upserts on Preview
Lookup Table: when running Output with a lookup to a Lookup Table, don't calculate the start/end times of the Lookup Table implicitly but use the original times.
Bug Fixes
SAML: Don't regenerate group when changing display name in Upsolver
Athena Output: fixed bug in Columns Retention
API: Fixed a bug that caused deleted inputs to not work
Snowflake Output: fixed columns casing
Removed "errors" outputs from outputs with Parquet format (Athena/S3)
2021/06/13
Enhancements
CDC ingestion is more stable when scaling cluster
Previewing an output now considers its upsert definition
Compactions are now prioritized by urgency and age in order to prevent starvation
Support epoch time date pattern with prefixes in Cloud Storage Data Sources
Bug Fixes
Fixed database name validation in Microsoft SQL Server Connection
2021/06/07
Enhancements
HiveMetastoreClient: Better SET LOCATION method
2021/05/31
Enhancements
Elasticsearch Output: Support Upsert Keys
CDC: Support Column Exclude List
Added SHA512 and SHA3_512 functions
Bug Fixes
S3 Connection with SQS now works with paths that end with a slash
2021/05/24
Enhancements
Added FROM_UNIXTIME function
Qubole Output: added an option to support changing column types
Hive Metastore Outputs: trigger more than one compaction if there is a backlog
Upsolver Output: support new field type: JSON. This type will be extracted when using as an Upsolver Data Source
CSV Content Format: support custom quote escape char
When duplicating output, copy the workspaces from the previous output
Bug Fixes
Fixed memory leak in External Hive Metastore outputs
2021/05/12
Enhancements
Added External Hive Metastore to the output types list
Support SELECT * on External Hive Metastore when querying with PrestoDB and SparkSQL
Reference Data can now be deleted after output is not using it (i.e. output deleted or output completed and was edited)
Reference Data can't be created with the same name as another Reference data or Lookup table
2021/05/04
Enhancements
Kafka Output - Allow ignoring messages that are too large (According to broker settings and producer settings)
Streaming Data Sources (Kafka, Kinesis, EventHubs) - Allow deleting offsets metadata files
API - Performance enhancements when updating Outputs / Lookup Tables
Bug Fixes
Hive Metastore: Fixed bug with SELECT *
2021/05/03
Features
Support MAX/MIN aggregations on more data types
Support <,<=,>,>= on timestamps
2021/04/18
Features
Support SELECT * in Hive Metastore Outputs. This will update the table definition every time a new field arrives
Oracle Object Storage Support
Bug Fixes
Aggregation calculated fields now work in SQL mode
2021/04/04
Features
CDC (Change Data Capture) Data Sources
Dremio and PrestoDB Outputs
Stop/Start Data Sources
Enhancements
Allow setting Lazy Load on Lookup Tables using the Properties tab
Update base AMI image in AWS to Amazon Linux 2
Bug Fixes
Data Lake Output: Filter out partitions that were deleted due to retention compaction
2021/03/28
Features
Hive Metastore: Allow creating an Output to External Hive Metastore
Enhancements
Lower latencies between dependencies in Compute Cluster
2021/03/21
Features
Ahana Output
Starburst Output
Enhancements
Redshift: Allow inserting 'now' into date / time fields in order to set a column to the insertion time
Bug Fixes
Kinesis Stream Autocomplete filter out Upsolver Internal Streams
Fixed a bug in S3 IAM policy generation with a slash at the end of the path
Avro Schema Registry: Don't treat HTTP errors as parse errors
SQL Parser: Don't regenerate the SQL when there is an expression that returns boolean with extra parentheses
2021/03/14
Support Real Time Kafka Output - Support running Kafka Outputs on the Real Time cluster with ms latency
Hive Metastore Output with Upserts - fixed a bug that caused the compaction process to get stuck after edit
Hive Metastore Output with Upserts - support number as an upsert key
Lookup Tables: fixed a bug when using sharded lookup tables in outputs
API: show the current capacity when clicking Update Capacity button on Clusters page
API: fixed wrong validation on Kafka Outputs (support numbers on topic names)
Microsoft SQL Server Output: fixed create statement when primary key is empty
API: fixed a bug when removing mapping of fields
2021/03/07
S3 Data Source with Parquet Content Format - split files by 200MB
Lookup Table - support compaction shards on lookup tables with multiple windows
SQL - fixed a bug generating the SQL when "Is Delete Field" is mapped to a column
2021/03/01
Monitoring: Added three metrics to Hive Metastore Outputs
partitions-delay - The delay between now and the last partition time
data-loading-delay - The delay on loading data to the metastore
partitions-count - Number of partitions in the table
IS_DUPLICATE and Lookup from Data Sources: Don't omit key columns for new versions
Avro: Fixed escaping of [] in array namespaces
Fixes a bug in Snowflake Output with VARIANT column output with arrays
2021/02/23
Azure: Support billing SaaS offering
DNS: Ability to sync Route53 records with private IP addresses for customers with own Spotinst Account
SSO/bugfix: attach endpoints don't have permissions
Partners: Support exporting logs and monitoring to external domain
Free Plan: Support upgrading account
2021/01/04
Snowflake Output: Configurable DbDecimal
CSV Content Type: Don't ignore values starting with #
SQL: Support unmapped columns in JDBC outputs. New mapped columns will be created when deploying the output
Infra: performance improvements
Lookup Table: fixed a bug when using Delete column
Signup: Create sample data source on register
SQL: Fixed a bug with autocomplete Lookup Table names
SQL: Support Lookup time
Athena Output: Fixed a bug with editing Athena Output when Upsert Partition Fields is true
2020
2020/12/08
JDBC Data Sources: Fixed an issue that could cause it to get stuck and not read any data
JDBC Connections: Fixed an issue that would allow connections to be created with a concurrency of 0
Monitoring: Include the actual time an index is ready to be read from in the monitoring delay charts
Allow using anonymous credentials to access data in public S3 buckets
AppFlow: Autocomplete buckets and flow names during setup
Functions:
Added a Subtract Time Zone Feature to complement Add Time Zone
UI:
Show SQL Errors when deploying Outputs
Show indicative error message when Reference Data file couldn't be found
2020/11/01
Deployment: Allow deploying Upsolver servers to Azure
Add support for Azure Event Hubs data source
Athena: Create Glue database if it doesn't exist
Functions: Fixed a bug in TO_DATE function
Function: Added new function: RECORD_TO_JSON
Query Cluster: Improvements in the underlying files cache
SQL: Show validation error when mapping an array to unrelated path
SQL: Show validation error when mapping null without specifying type
API: When creating data source, fixed a bug when previewing large file with tar compression
API: Fixed high CPU on boot
2020/10/21
Kafka data source: support reading custom kafka headers
Metastore Output: support running Athena/Qubole output without partitioning by time
Snowflake Output: support Azure storage as the intermediate storage
Compute Cluster Infra: optimize threads when running low priority tasks
ETL: Improved target path inference for some scenarios
Monitoring Task: fixed failure when one of the monitoring reporters is not available
SQL: Fixed validation of inline functions in aggregations
Metastore Output: set the table location to the root path of the output
Qubole: allow defining if TIMESTAMP fields will be created as TIMESTAMP or BIGINT columns in the table per output
Qubole: Added feature flag to deprecate the "SET hive.on.master=?" statement
Elasticsearch Output: Fixed a bug that could cause high memory usage
2020/10/15
Add Amazon AppFlow support
Zip Function - Added optional field names
API - Fixed validation message for Kafka input
Elasticsearch - upgraded client version
2020/10/13
S3 Data source with Parquet Content Format - when the file is not a parquet file, handle it as a parse error
Added Free plan
SQL - Fixed a duplication issue when function target name and select target name are the same
Hive Metastore Output with Upsert keys - Trigger compactions in a better way to avoid compacting in a loop
SQL - Fixed target path inference of key columns with inline functions on aggregated outputs
API - Allow setting a higher number of shards in the output than the execution parallelism of the data source. The data is then parallelized by data source file
Support "SELECT * " in cloud storage outputs with parquet content format
API - Fixed a bug that allowed creating more than one draft in the same output
2020/10/05
Show number of sparse fields inside fields tree in inputs and outputs and allow to toggle the filter
2020/10/01
Jdbc data source: use field types from the table definition
PostgreSQL output: support timestamptz data type
UI: New modal when adding multiple fields in tabular outputs to prevent cartesian product between unrelated arrays
No need to specify a target field for filters when creating a filter from the UI
Some bug fixes in API
2020/09/30 - SNAPSHOT
Query Agent - Support round robin
"No Local API" page - Show "Connection Established" instead of error when able to connect
Input creation preview - Filter big JSONs and let the user know about it
2020/09/23
Performance improvements in internal cache mechanism
Performance improvements in Hive Metastore outputs
Fixed bug that caused Hive Metastore outputs with upserts to get stuck after editing a new version
Avro w/ Schema Registry Content Format: Support Tagged Avro Schema Registry
Improved target path calculation of inline functions
Added validation when deploying a draft that the start time is not after the end time of the previous version
SQL: Disable automatic column name generation
Support cancelling pending integration
2020/09/14
No Local API Page: Fixed showing "You can't connect" instead of "local DNS resolve" error
CloudFormation: link to the right region in deploy stack
Fewer API calls to Cloud Storage in order to check completion of tasks
Calculated Function TO_DATE: Changed threshold to not return negative dates
Fixed bug with PostgreSQL outputs not allowing column types to be altered
2020/09/07
Support Workspaces in Clusters
Catch all errors from GCP / Azure and show in UI
Hive Metastore Outputs: the column names year, month, day, and hour are now reserved
2020/08/31
Big performance improvements for replay in Kinesis & Kafka Data Sources
Big performance improvements for replay in Hive Metastore Outputs
2020/08/24
Compute Cluster: IO Tasks will now run only on Master cluster and will never run on Replay Cluster
Compute Cluster: Option to limit number of Elastic IPs allocated for the cluster
Added XX_HASH and SORT_BY calculated functions
UI: Support literal inputs in aggregations
2020/08/17
Performance improvements to Hive Metastore Outputs
Fixed bug where very large Parquet file outputs could make servers crash on OOM
Preview Output will now stop after 15 seconds instead of making the API server hang
Support Redshift and PostgreSQL in JDBC Data Source
UI: Output - New Partitions Modal
2020/08/10
SQL now supports target site inference. This fixes many confusing bugs when using arrays with calculated functions
SQL: Fixed bug with throwing 500 errors on missing properties of calculated functions
Athena Output: new outputs will not nest compaction files for better compatibility support with external systems
2020/08/03
Fixed bug when previewing completed Output with Lookups
Update Retention validation message is now dismissible
Regex and Split Content Formats have been added for better compatibility with custom data formats
2020/07/27
MS SQL Server Output
Elasticsearch Output: Removed index_type argument, using _doc/doc by default
UI: overhauled the properties pages
UI: Split field statistics by Data Source in Output page
2020/07/20
JSON_TO_RECORD calculated function: Allow whitespace in CSV mapping definition and improve exception handling
Athena Output: Faster replays when run compactions is set to false
Fewer red notification errors due to internal errors
Aggregated Outputs now delete the intermediate aggregations immediately after outputting the data (instead of waiting for the retention period, if defined)
2020/07/13
MySQL Output: Fixed bug with quote followed by delimiter char inside the data to output
Create Calculated Function: Fixed a bug with the default output path calculation
JDBC Data Source now supports creating new tables instead of only inserting data to existing tables
JDBC Connections: indicative validation error messages on creation
2020/07/06
PostgreSQL Output
Writing logs to Customer Bucket now supports writing to specific path in the customer's bucket
SQL: Show indicative error when trying to filter subquery
MySQL Output: Fixed writing of date/time fields
UI: Refined the time range picker
New boolean operators and calculated functions: AND, OR, NOT, and IS DISTINCT FROM now work like in SQL
UI: Calculated Functions Gallery now matches the SQL syntax
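For readers unfamiliar with what "work like in SQL" implies for AND, OR, NOT, and IS DISTINCT FROM, the following is a minimal Python sketch of standard SQL three-valued logic. It assumes Upsolver follows the standard SQL semantics here, which this change log does not spell out. The function names are hypothetical, and None stands in for SQL NULL.

```python
from typing import Any, Optional


def sql_and(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """SQL AND: FALSE wins over NULL, otherwise NULL propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def sql_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """SQL OR: TRUE wins over NULL, otherwise NULL propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def sql_not(a: Optional[bool]) -> Optional[bool]:
    """SQL NOT: NOT NULL is NULL."""
    return None if a is None else not a


def is_distinct_from(a: Any, b: Any) -> bool:
    """NULL-safe inequality: never returns NULL, and NULL equals NULL."""
    if a is None or b is None:
        return (a is None) != (b is None)
    return a != b
```

Unlike a plain inequality, is_distinct_from always yields a boolean, which is why it is convenient for comparing nullable fields.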
2020/06/29
Redshift: Support configuring
Added TO_DATE calculated function (converts strings to dates without having to insert format)
Added APPROX_COUNT_DISTINCT_EACH aggregation
IAM Role Credentials: Assume role via the Server Role created in the AWS Integration
Booting a Cluster after stopping it for a while is faster
SQL: infer null type instead of asking the user to explicitly insert the type of the null (null:string)
SQS: Allow configuring KMS key
UI: Fixes to "Add Lookup to Data Source" page
S3: Show the right action on access error
2020/06/22
UI: Charts now show a shared crosshair between graphs
"Update Shards" error message is now more informative
Added deployment support to more AWS regions
Fixed rare case where AWS Redshift Output would duplicate data
Fixed bug where multiple rows with the same Upsert Key would insert in the same output interval in Snowflake and Redshift Upsert Outputs
2020/06/15
Git Integration: Don't cancel git integration after one failure to push changes
UI now allows operating aggregated outputs without key columns (Aggregate all data within the output interval)
UI: Refinements in the Fields Tree
Snowflake Output: Better replay performance with sparse Data Sources
Added EXTRACT, MILLISECOND, SECOND, MINUTE, HOUR, DAY, DAY_OF_MONTH, DAY_OF_WEEK, DAY_OF_YEAR, WEEK, MONTH, QUARTER, YEAR, and YEAR_OF_WEEK date extraction calculated functions
Fixed bug where the REPLACE calculated function could throw errors in some cases
2020/06/08
Added RPAD, LPAD, STRPOS, DATE_ADD, and DATE_DIFF calculated functions
Private API now uses r5 instead of r4 instances in AWS by default
SQL: Better error messages for inline features
2020/06/01
The "Archive" operation has been removed from the system. Deleted items can be seen using the "Trash" button in the list view
JDBC Data Source: Support Start Time
Multiple Bug Fixes in Snowflake Output
Added DATE_TRUNC calculated function
Fixed bug with copying big files in S3
UI Performance enhancements
2020/05/25
Update Configuration of Upsert Outputs using the UI
Allow writing logs from Upsolver to Customer requested location as well as Upsolver
Dramatically reduced the number of API calls to Cloud Storage
2020/05/18
Performance improvements and bug fixes
2020/05/04
Data Sources:
JDBC: Added support for connecting to an Oracle DB
Bug fix for event type statistics breakdown in local APIs
Performance and cost improvements
2020/04/27
Revised output preview screen
Minor bug fixes and improvements
2020/04/20
Data Sources:
S3 Over SQS: Allow creating Data Sources from multiple connections with the same prefix
Outputs:
Added output to Snowflake
Monitoring improvements
2020/04/06
Data Sources:
Added properties to Upsolver data source.
Kafka: Added support for custom consumer/producer properties.
Outputs:
UI improvements in sources fields tree
Kafka: Added support for custom consumer/producer properties.
Monitoring Reports:
Added Splunk export support
2020/03/30
UI updates and performance improvements
2020/03/16
Data Sources:
Split meta-data by Event Type field - you are now able to split and view your data source by the desired field in your data source.
Outputs:
SELECT * is supported for Upsolver and Elasticsearch outputs.
Added Amazon Kinesis connector.
Qubole connector now supports using an HTTPS proxy address to override the endpoint used to access Qubole.
IAM:
Added support for SAML with provisioning capabilities.
2020/03/09
Clusters:
Compute cluster monitoring: Compute Units Graph was updated and now provides a breakdown of the compute units used by each task (Data Source/Output/Lookup Table).
2020/02/24
Outputs:
Elasticsearch: Editing the connection string is now supported - as long as the new nodes belong to the same cluster.
Elasticsearch: Added support for setting the event to _doc.
2020/02/18
Transform with SQL:
Added support for partitioning configuration.
Casting improvements.
Outputs:
Redshift: Added support for configuring fail on write error. If enabled, any error while copying data to Redshift will cause the entire bulk to be skipped. The skipped manifest will be saved aside for manual re-processing once the copy error has been fixed. If disabled the same behavior will occur after 100K errors (The max allowed by Redshift).
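The skip rule described in the Redshift entry above can be sketched as a small decision function. This is only an illustration of the stated behavior, not Upsolver's implementation: the names should_skip_bulk and REDSHIFT_MAX_ERRORS are hypothetical, and treating "after 100K errors" as "once the error count reaches 100,000" is an assumption.

```python
# Maximum error count mentioned in the entry above ("The max allowed by Redshift").
REDSHIFT_MAX_ERRORS = 100_000


def should_skip_bulk(error_count: int, fail_on_write_error: bool) -> bool:
    """Return True when the whole bulk would be skipped.

    With fail_on_write_error enabled, a single copy error is enough.
    With it disabled, the bulk is skipped only once the error count
    reaches the 100K limit (assumed here to be an inclusive threshold).
    """
    threshold = 1 if fail_on_write_error else REDSHIFT_MAX_ERRORS
    return error_count >= threshold
```

In both cases the entry states that the skipped manifest is saved for manual re-processing, so the flag only changes how many errors are tolerated before that happens.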
Monitoring Reporting:
Fixed a bug that caused a falsely reported delay in rare cases.
2020/02/10
Data Sources:
JDBC Data Source - added support for PostgreSQL.
2020/02/05
Outputs:
Added UUID Generator Calculated Function
Transform with SQL:
Added support for SQL comments using --
Improved error messages
2020/01/30
Data Sources:
Parquet reader: support INT96 timestamps and non-canonical field names
Added support for LZO decompression
Added a JDBC connector
Outputs:
Support correcting a specific time frame in an output
Added UpdateSql programmatic API operation for creating outputs
2020/01/22
IAM:
Multi-organization support
Outputs:
Support lazy load of lookup tables
Support querying lookup table in SQL
Support sharding of aggregated outputs
Data Sources:
Support S3 data source initial load configuration
Support non-lexicographic date patterns in S3
UI & performance improvements
2020/01/13
Data Sources:
Support XML as content type
2020/01/06
Performance improvements and bug fixes
2019
2019/12/22
Outputs:
Elasticsearch - Add option not to delete indices from Elasticsearch based on retention
Transform with SQL:
Support data source features
UI:
Outputs - Add support for filtering the Preview when in SQL mode
Performance improvements
2019/12/16
Data Sources:
Support changing the number of shards using increments of one (instead of multiples of two)
Outputs:
Athena - add support for excluding partitions from the table
Transform with SQL:
Support default field names instead of col_x
Generate SQL for running Outputs
Refer to fields by index in the GROUP BY statement
2019/12/09
UI improvements and bug fixes
2019/12/03
Outputs:
Add support for Redshift Spectrum
Update table schema in Qubole is now optional (the default behavior would be to update)
2019/11/17
Outputs:
Allow switching between raw and aggregated modes
Added QUERY_STRING_TO_RECORD calculated function for query string extractions
Transform with SQL:
Unify SQL code blocks into a single block
2019/11/10
Athena Upserts: Update and delete existing data in your Data Lake
Transform with SQL:
Support having statement in Aggregated Outputs
Support DECIMAL types
Support Athena Upserts
S3 Output: JSON files will end with one "\n" instead of two "\n" (as stated in jsonlines.org)
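To make the JSON Lines convention in the S3 Output entry above concrete, here is a minimal Python sketch that serializes records one per line, with exactly one trailing newline at the end of the file. The function name to_json_lines is hypothetical and this is not Upsolver's writer.

```python
import json
from typing import Any, Dict, Iterable


def to_json_lines(records: Iterable[Dict[str, Any]]) -> str:
    """Serialize records as JSON Lines: one JSON value per line,
    each line terminated by a single newline character."""
    return "".join(json.dumps(record) + "\n" for record in records)


content = to_json_lines([{"a": 1}, {"b": 2}])
print(repr(content))
```

Because every record carries its own terminator, the file ends with one newline rather than two, and concatenating two such files still yields valid JSON Lines.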
2019/10/31
When deploying an output, "Now" is resolved when submitting the form
Connections and Clusters can be attached to Workspaces
IAM: Lists of Data Sources, Outputs, Lookup Tables, Connections and Clusters are filtered by the user "list" permission
2019/09/23
UI improvements
Fixed bug on lookup to COLLECT_SET_EACH column
Stability improvements
2019/09/02
Allow changing default organization connection
Added decimal support to Athena Outputs
Allow turning off/on compactions in Athena Outputs
Better support for Data Sources with large amounts of fields
Notebook (Beta)
JOIN
GROUP BY
HAVING
2019/07/08
Various Performance Improvements in UI
Added ZIP Calculated Function to ZIP between multiple arrays
MySQL Output: Row is replaced if duplicate key is found
Notebook (Beta)
like / not like syntax (e.g. “name” like ‘a__%’)
not in syntax (e.g. “status” not in (“failed”, “canceled”))
= as equality operator syntax (e.g. “status” = ‘ok’ instead of “status” == ‘ok’)
Better error messages
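The like pattern in the Notebook entry above ('a__%') uses the usual SQL wildcards. The sketch below shows how such a pattern can be evaluated in Python, assuming standard SQL LIKE semantics (_ matches exactly one character, % matches any sequence) and no escape character. The helper names like_to_regex and like are hypothetical.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any sequence of characters, possibly empty
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def not_like(value: str, pattern: str) -> bool:
    """Negation of like, mirroring the not like syntax."""
    return not like(value, pattern)
```

Under these assumptions 'a__%' matches any value that starts with "a" and is at least three characters long.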
2019/06/24
Lookup Tables / API Playground
Support querying multiple rows
Auto complete for keys
Querying on specific time range
Notebook (Beta): a better way to create enrichments
2019/06/17
Calculated Functions: Added numeric in feature (e.g. “data.a”:number in (1,2,3))
Parse Avro data using Confluent Schema Registry
2019/06/03
Various Performance Improvements in UI
Show connection errors when creating/editing MySQL/Redshift Output
Fixed intermittent recoverable errors in tasks
Fixed delay when using the same connection for multiple Redshift/Elasticsearch Outputs
2019/05/26
Experimental: updating / deleting rows in output to Athena. You can try it out by using the “Upsert Key” and “Is Delete Field” special fields
2019/05/19
Ingestion - Added “index” header to all messages (useful when ingesting multiple events in one message)
Hive Metastore Outputs now drop duplicate logical partitions
API - list Output / Materialized Views returns faster
GDPR - Materialized Views now support deleting rows
Physical Deletion runs much faster with fewer operations on the underlying Cloud Storage
Retention is now set on Materialized Views created by DEDUP features
2019/05/14
Data Source - Simplified creation of Kafka, Kinesis and AWS S3 Data Sources
2019/05/13
Replay Cluster - Fixes some cases where the replay cluster might not shut down
2019/05/06
Qubole Client - set hive.on.master and use database for all queries
Performance improvements for retention
Elasticsearch Output - Better retry mechanism
2019/04/22
Athena - Switch to using Glue API for all DDL statements
Monitoring Tab - fix bug that would display some rows twice
Outputs page - Correct the range of some of the graphs
Add timeout to copy/read S3 requests to prevent processing delays
Data Source - show a preview of data immediately upon creation
Improve UI performance related to connections page
2019/04/08
Dry run environment support
Monitoring - added written items and written bytes
Monitoring - added original-task-name tag to all metrics
Qubole - set hive.on.master=false
Permissions - added policy editor
Athena - reduce spam of Athena history
Athena - drop table when deleting an output if the option is selected
Kafka - support changing the number of shards in the UI
Some performance improvements
UI - Added multi-unmap fields (for Avishai)
2019/04/01
Increase Kafka consumer version to 2.1.1
Monitor delay in managing partitions
Bug fix - add connection timeout to ElasticSearch connections
Remove dependency on Upsolver DynamoDB for servers starting up
2018
2018/11/15
Data Sources / Materialized Views / Outputs: Toggle between card view and table view
2018/11/14
Translate Calculated Function: Show CSV Editor for the dictionary field
Cluster Details Page: show the elastic IPs of the Cluster
Outputs: Qubole Output
Outputs: Usability Improvements in Creation/Deploy flow
Upsolver Language: "data.str":string in ('a','b','c') syntax
Upsolver Language: supports coalesce operator
"data.str":string? # COALESCE("data.str":string, '')
"data.str":string?'default-value' # COALESCE("data.str":string, 'default-value')
"data.bool":boolean? # COALESCE("data.bool":boolean, false)
"data.bool":boolean?true # COALESCE("data.bool":boolean, true)
"data.number":number? # COALESCE("data.number":number, 0)
"data.number":number?2.5 # COALESCE("data.number":number, 2.5)
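The coalesce examples above pair each shorthand with its COALESCE equivalent. As a rough illustration of that behavior, the Python sketch below returns the field value when present and otherwise falls back to an explicit default or to the implicit per-type default shown in the examples ('' for string, false for boolean, 0 for number). The names coalesce, IMPLICIT_DEFAULTS, and field_or_default are hypothetical, and None stands in for a missing value.

```python
from typing import Any, Dict, Optional

# Implicit defaults per type, taken from the examples above.
IMPLICIT_DEFAULTS: Dict[str, Any] = {"string": "", "boolean": False, "number": 0}


def coalesce(*values: Any) -> Any:
    """Return the first argument that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


def field_or_default(record: Dict[str, Any], path: str, type_name: str,
                     default: Optional[Any] = None) -> Any:
    """Mimic "path":type?default for a flat record keyed by field path."""
    fallback = IMPLICIT_DEFAULTS[type_name] if default is None else default
    return coalesce(record.get(path), fallback)


record = {"data.bool": False}
print(field_or_default(record, "data.str", "string"))           # missing, falls back to ''
print(field_or_default(record, "data.number", "number", 2.5))   # missing, falls back to 2.5
print(field_or_default(record, "data.bool", "boolean", True))   # present, stays False
```

Note that a present value of false or 0 is kept as is; only a missing value triggers the default.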
2018/06/26
Output / Materialized Views: Added ability to edit the Data Sources from the properties tab (Only if the object isn't deployed yet)
2018/06/18
Aggregated Output: Added option to add calculated fields over aggregations
2018/06/17
Compute Cluster: Allow to spin up "Replay" Cluster when needed
Outputs: Edit S3 and Upsolver Outputs
Filters: Improved UX (Whitelist and Blacklist Filters)
Materialized Views: Time Series Aggregations are shown as graphs in the Data Sample tab
2018/03/01
Materialized Views: Added an API to iterate the MVs
Added Time Zone Offset Function
Outputs: Added automatic time field to Athena and Upsolver outputs
Calculated Fields: Support editing of calculated fields inputs and parameters
Users can now create readonly S3 Connections
Athena Output now supports setting of event time which is used for partitioning
Elasticsearch Output now supports retention
Various performance improvements to UI
Support filtering on time range in Data Source inspection page
Support for editing lookup enrichments
Monitoring now shows Materialized Views that are used in Lookup enrichments
Improvements to Auto Scaling
Support non string Key Columns in Materialized Views
Aggregated output doesn't change the type of the Key Columns to string anymore