Change log
Change log for Upsolver Classic (app.upsolver.com)
Bug Fixes
- JDBC Outputs: delete intermediate files after being written to the DB
- Revert Debezium to version 1.4
​
Bug Fixes
- JDBC Outputs: delete intermediate files after being written to the DB
Bug Fixes
- Fixed Kafka batcher tasks getting stuck when reading with a wildcard topic and deleting all the topics in Kafka
Enhancements
- Upgrade Debezium to version 2.1.2
- Add Debezium version header
- Fixed an issue where creating a Kafka Data Source with a glob pattern that doesn't match any topics would get no response from the API
- Memory allocation optimizations in Lookup Table Query servers
​
Bug Fixes
- Fixed a memory leak in Elasticsearch Outputs
- Minor bug fixes
Bug Fixes
- Fixed a rare issue that can cause duplicate data to be loaded into Redshift after copy failures
- Fixed an issue where discovering a new partition / topic without any messages would cause Kafka / Kinesis Data Sources to hang until a message arrived.
- Fixed an issue where creating a Kafka Data Source over a large number of topics would cause a CPU spike in the API
Enhancements
- Use regional STS endpoints if available
- Bug Fixes
- Minor bug fixes
- Bug Fixes
- Fixed bucket region detection when using an S3 Private VPC endpoint
- API: Fixed a bug that caused running a new output with a Lookup to fail when using a full history snapshot
- Fixed an edge case that could cause data loss when editing a stopped Athena output
- Enhancements
- Outputs: support window size override in non-aggregated outputs
- Bug Fixes
- Monitoring: Fixed the 'operation_name' of aggregation steps to be the original 'operation_name' instead of "Output Aggregation". This means metrics reported via Monitoring Reports will now show aggregation step information under the correct 'operation_name'
- Unsynchronized data sources no longer fail if they can't construct their consumers
- Bug Fixes
- SQL: Improved error messages and auto completion
- Enhancements
- Performance and memory improvements
- Bug Fixes:
- API: Prevent changing the end execution time for old output versions
- API: Added validation to prevent creating Cloud Storage outputs with a date format that is not refined enough to include the Output Interval
- Improved performance of Python UDF validations when uploading a new UDF
- Fixed slow replay progress for Snowflake and PostgreSQL outputs
- Enhancements
- Added the RAND function and an overload of the RANDOM function that takes no arguments and returns a value between 0 and 1
- BREAKING CHANGE
- Hive Metastore (Athena) Output: When using SELECT * with partition fields, if a field in the source is mapped to the partition field column, the field won't be written to the Parquet files because this value can't be queried
- Enhancements
- Kafka Data Sources: Support unsynced mode, which allows the stream to continue processing even when there are errors or a backlog from the topic
- Add Presto-compliant RANDOM() and RAND() functions
- We now support clusters that mix both x86-based (e.g. r6i, r6a) and ARM-based (e.g. r6g) instance types within the same Elastigroup
- Bug Fixes
- Fix deadlock between the indexing task and index entry deletion task that could end up waiting for each other when modifying an Athena output's data
- When deploying clusters to a region, we now filter out instance types that don't exist in that region
- Hive Metastore (Athena) Output: statistics no longer include rows that were filtered out due to a missing partition field value. Previously, if a row was filtered out because the partition field value was missing or null, it was still counted in the Output Fields statistics and in the Events over Time graphs.
- Improved recovery mechanism when our configuration database is unavailable
- Enhancements
- Avoid out-of-order records per key in Kinesis outputs by sending each key at most once within a single PutRecords request.
- Improved performance of server boot time and memory usage for organizations that use a large number of shards.
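The Kinesis ordering fix above can be pictured with a small batching sketch. It is an illustration under stated assumptions, not Upsolver's implementation: `batch_unique_keys` and the `(partition key, payload)` record shape are hypothetical, and the 500-record cap reflects the documented PutRecords per-request limit. Because PutRecords does not guarantee ordering within a single request, keeping each key to one record per request and sending requests one after another avoids reordering for that key.

```python
from typing import Iterable, List, Tuple

# Hypothetical record shape: (partition key, payload)
Record = Tuple[str, bytes]


def batch_unique_keys(records: Iterable[Record], max_batch: int = 500) -> List[List[Record]]:
    """Split records into batches where each partition key appears at most once.

    Batches are meant to be sent sequentially, so per-key order is preserved.
    """
    batches: List[List[Record]] = []
    current: List[Record] = []
    seen = set()
    for key, payload in records:
        # Close the current batch if it is full or already holds this key.
        if current and (len(current) >= max_batch or key in seen):
            batches.append(current)
            current, seen = [], set()
        current.append((key, payload))
        seen.add(key)
    if current:
        batches.append(current)
    return batches
```

Each returned batch would map to one PutRecords call; a repeated key simply starts a new batch.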
- Bug Fixes
- Fixed stack overflow in JDBC Data Source in some cases
- API: Fixed a bug in generating the SQL statement when the SQL is not in sync with the output's definition
- Enhancements
- Minor performance improvements in data processing critical path
- Improved performance of server boot and periodic configuration load. This might improve reliability and performance of data flow for organizations and clusters that have many processing entities
- Bug Fixes
- Fixed a bug where Compactions would stop working when advancing the "End Execution At" property of the Hive Metastore Output after it has arrived (now > End Execution At).
- API: Added validation to prevent creating connections with empty names
- Minor Performance Enhancements
- Bug Fixes
- API: Fixed an issue that caused the SQL statement to be invalid after changing the data source of an output
- Fixed an issue when mapping a numeric field to an upsert column of type string in JDBC outputs (Redshift, Snowflake, ...)
- Fixed a rare bug where an internal metadata index would stop progressing, preventing compactions from occurring.
- Enhancements
- The Elasticsearch client version was upgraded from 6.x to 7.x in order to also support Elasticsearch 7 and 8 as output targets
- Performance enhancements for clusters with a lot of tasks (More to come in the future)
- Snowflake Output: Support writing to Transient Tables
- Kafka Data Sources: Added an option to restart reading a partition when the end offset of that partition is larger than the last offset read by the Data Source for the same partition. This should allow users to reset partitions.
- Bug Fixes
- Hive Metastore Output: performance improvements on calculating partition compaction trigger
- Fixed a bug where Outputs with IS_DUPLICATE with large window sizes wouldn't be considered as completed
- Fixed a bug where Outputs that depend on an Upsolver Output would run with a Runtime Delay based on the maximum Runtime Delay of all the versions of the Upsolver Output. The new behavior skips completed versions
- Upsolver Query (Table output) was visible in the UI. This will now only be available via SQLake.
- [BREAKING CHANGE] Simple S3 Data Source: changed the value of the time field to be the beginning of the minute instead of the end of the minute. This change will be applied only on new data sources
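The difference between the two time conventions in the Simple S3 Data Source breaking change can be sketched in Python. This assumes "end of the minute" means the start of the following minute, which is an interpretation and not stated in this change log; both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def minute_start(ts: datetime) -> datetime:
    """New convention: truncate to the beginning of the minute."""
    return ts.replace(second=0, microsecond=0)


def minute_end(ts: datetime) -> datetime:
    """Old convention (assumed): the start of the following minute."""
    return minute_start(ts) + timedelta(minutes=1)


ts = datetime(2022, 1, 1, 12, 34, 56, tzinfo=timezone.utc)
print(minute_start(ts).isoformat())  # 2022-01-01T12:34:00+00:00
print(minute_end(ts).isoformat())    # 2022-01-01T12:35:00+00:00
```

Under this reading, the same event lands one minute earlier in new data sources than in existing ones.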
- Enhancements
- More informative errors when missing access to S3 resources
- Bug Fixes:
- API: Fixed being able to create a Kafka input with an invalid storage connection
- API: ModifyServerFile changeset now adds the file if it does not exist
- New Features:
- Compression: Add ZStandard
- Redshift Output: Support authentication with IAM
- Redshift Output: Support Super type
- Roles Anywhere - Hide internal access/secret keys for SOC 2
- Enhancements
- Upgraded Kafka Client to Version 3.2.0
- Upgraded Redshift to Version 2.1.0.9
- Improved the reliability of the connection between User Clusters and the Configuration Database
- Performance improvements in the Compaction Coordinator in Athena Outputs
- Improved error messages.
- Increased the maximum number of shards, output shards and compaction shards in outputs to 512
​
Bug Fixes
- Simple Cloud Storage Input: Improvements to file discovery
Enhancements
- Athena Outputs: Enabled partition column types other than string
- Performance improvements
​
Changes in this Release:
- API: The return value of shards and related fields changed from number to struct. The struct contains executionParallelism which represents the old number. Customers using API endpoints related to data sources, lookup tables or outputs may need to update their code. Please contact our support for details.
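For API clients affected by the change above, one defensive option is to accept both shapes while migrating. The sketch below is a minimal example, assuming the response has already been parsed from JSON; the helper name `execution_parallelism` is hypothetical, and only the `executionParallelism` field name comes from the note above.

```python
from collections.abc import Mapping
from typing import Any, Union


def execution_parallelism(shards: Union[int, Mapping]) -> int:
    """Return the parallelism number from either the old or the new response shape."""
    if isinstance(shards, Mapping):
        # New shape: a struct that carries the old number in executionParallelism.
        return int(shards["executionParallelism"])
    # Old shape: a plain number.
    return int(shards)


print(execution_parallelism(4))                            # 4
print(execution_parallelism({"executionParallelism": 4}))  # 4
```

Other fields of the new struct are not listed here, so the helper deliberately ignores them.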
Bug Fixes
SQL
Compute Cluster
- Fixed a bug that would cause the Compute cluster, in rare cases, not
Monitoring
API
- Fixed a race condition that prevented multiple concurrent requests to
Snowflake Output
- Fixed a bug when writing values to DATE columns.
CDC
- Fixed a bug that failed to write data which was larger than 2GB.
Enhancements
Functions
Python
CDC
AWS VPC integration
- Validated subnet IDs in Existing AWS VPC integration
Athena Output
- Non-string partition columns now supported
Bug Fixes
- Show scaling policy in the Cluster page.
- WURFL User Agent: fixed a bug that appeared when there was more than one WURFL file in the organization.
- Fixed a bug that caused the metrics to stop being reported to external monitoring systems (Datadog / Influx).
- Deprecated SPLIT, CONCAT and DATE_DIFF functions and introduced new functions:
- SPLIT: SPLIT_DELIMITER_FIRST & PRESTO_SPLIT
- CONCAT: ARRAY_JOIN & PRESTO_CONCAT
- DATE_DIFF: DATE_DIFF_PRECISE & PRESTO_DATE_DIFF
Enhancements
- Added function LN.
- DATE_DIFF function now supports dynamic units.
- LIKE operation now supports getting another field as a pattern.
Recently Implemented Changes (Currently Enabled)
As part of Upsolver's effort to adopt industry standards, we are gradually changing functions to be more Presto compatible. The functions that changed are CONCAT, SPLIT and DATE_DIFF.
CONCAT, SPLIT and DATE_DIFF are being deprecated. From now on, SQL statements that use CONCAT, SPLIT or DATE_DIFF will produce a warning message when executed. This behavior is designed to draw attention to the changes. Currently running outputs are NOT affected by these changes.
The change log summary:
Important: All information in this table, including planned versions and dates, is subject to change; the information is provided only as a guideline for updates you may make in the future.
Enabled by default in February, 2022
SQL Changes - Commands & Functions
Behavior Change | Additional Notes |
---|---|
- Bug Fixes
- MySQL Output: Fixed bug with boolean fields that were not written as expected.
- Redshift Output: Fixed race condition in upsert tables that could cause rows not to get deleted in rare cases.
- SQL:
- Improved SQL editor responsiveness.
- Fixed a bug in SQL parsing.
- Fixed an exception arising when using infix operations.
- Fixed join/match expressions not working correctly with >3 terms.
- API:
- Fixed an issue with distinct data sources that had the same name.
- Prevented "SPLIT TABLE ON" on non-Athena Outputs.
- Fixed name suggestion in hierarchical Athena outputs.
- Enhancements
- Azure Event Hubs: Support more features.
- Streaming Output: Support setting an upsert key.
- ContentTypes:
- Support null values in TSV.
- Support fixed width content type.
- Oracle Object Storage: Various enhancements.
- SQL: Support for WHERE filter in sub-select expressions.
- S3 Data Source: Don't require AWS integration when creating S3 data source.
- S3 Output: Support bucket-level access control.
- UI: Added various annotations to cluster graphs in the monitoring tab.
- Enhancements
- CSV Content Format: allows repeating header names in files.
- Function changes: the CONCAT function was renamed to ARRAY_JOIN.
- ARRAY_JOIN - takes an array of strings and a delimiter and concatenates them.
- CONCAT - now takes multiple arguments and concatenates them (like || in SQL).
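A rough Python analogy for the two functions above, offered only to illustrate the difference in argument shape. The helper names mirror the SQL function names but are otherwise hypothetical, and NULL handling is not modeled.

```python
from typing import Sequence


def array_join(values: Sequence[str], delimiter: str) -> str:
    """Analogy for ARRAY_JOIN: one array argument plus a delimiter."""
    return delimiter.join(values)


def concat(*parts: str) -> str:
    """Analogy for the new CONCAT: any number of arguments, no delimiter."""
    return "".join(parts)


print(array_join(["a", "b", "c"], ","))  # a,b,c
print(concat("a", "b", "c"))             # abc
```

Statements that relied on the old CONCAT behavior with a delimiter correspond to `array_join` in this analogy.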
- Bug Fixes
- Athena Output: fixed a performance issue when deleting files due to retention.
- Clusters: Show "Additional Processing Units for Replay" only in Compute Clusters.
- Redshift Spectrum: fixed boolean casting when running output with SELECT *
- API: Show thrown errors from Hive Metastore.
- SQL: Fixed a bug when joining with a sub-query.
- Enhancements
- Support dynamic position in ELEMENT_AT function.
- Allow updating the boot script in Clusters.
- Support fixed schema in S3 outputs with Avro format.
- Bug Fixes
- Fixed a bug when reading from multiple topics in Kafka Data Source.
- API - Fixed column name suggester when mapping new fields in Athena Output.
- Bug Fixes
- API
- Fixed a bug with Azure Integration not working in some regions
- Fixed validation when updating Columns Retention in Hive Metastore outputs
- Data Source Page: don't show statistics from the preview when querying on a time range without data
- Show output's fields on outputs with SELECT *
- SQL
- Prevent SQL regeneration when updating duplicate handling (APPEND ON DUPLICATE or REPLACE ON DUPLICATE)
- Added some validation errors when trying to create an invalid state
- Backend
- Fixed a bug that caused duplicated rows when editing Hive Metastore output with upserts
- Enhancements
- Monitoring Reporters: Support Graphite
- Hive Metastore Output: support splitting the output by schemas/databases in addition to splitting by table names. For example, if the value of the multi table field is "foo.bar", the "foo" will be the schema/database name, and "bar" will be the table name
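The schema/table split described above can be sketched as follows. This is an assumption-laden illustration: the function name, the `default_schema` fallback for values without a dot, and splitting on the first dot only are all hypothetical choices, not documented behavior.

```python
from typing import Tuple


def split_multi_table_value(value: str, default_schema: str) -> Tuple[str, str]:
    """Split a multi table field value into (schema/database, table)."""
    schema, sep, table = value.partition(".")
    if not sep:
        # No dot: treat the whole value as a table name (assumed fallback).
        return default_schema, value
    return schema, table


print(split_multi_table_value("foo.bar", "default"))  # ('foo', 'bar')
print(split_multi_table_value("bar", "default"))      # ('default', 'bar')
```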
- Bug Fixes
- S3 Data Sources Advanced: Fixed a bug with Glob File Name pattern
- Hive Metastore Output: save storage by deleting manifest files after their usage
- Enhancements
- Athena output: create Views with Glue API
- Bug Fixes
- Don't show completed dependencies in Lineage tab
- Select * in Hive Metastore Output
- Return the defined fields first
- Removed the multi table column from the view definitions
- Hive Metastore Output: fixed a bug when editing output with upserts
- API: Allow changing the cluster size on Trial plans
- Enhancements
- Added a new modal and new SQL syntax for Table Name Suffix Field, which allows you to create multiple tables from a single Hive Metastore output.
- CDC Data source (MySQL) - added Destination part that allows replicating the source database to your data lake
- Qubole Metastore: allow changing the time partition column type to String
- Bug Fixes
- Fixed health check parameters in Query clusters
- Don't show deleting data sources in the main page
- Hive Metastore output: added a cache layer in the Partition Manager that prevents redundant calls to the Metastore
- API: Limit the number of running previews. This should fix high CPU usage of the API when many previews are running at the same time.
- Enhancements
- Support Select * in Redshift Spectrum
- API: Support Select * and Upserts on Preview
- Lookup Table: when running Output with a lookup to a Lookup Table, don't calculate the start/end times of the Lookup Table implicitly but use the original times.
- Bug Fixes
- SAML: Don't regenerate group when changing display name in Upsolver
- Athena Output: fixed bug in Columns Retention
- API: Fixed a bug that caused deleted inputs to not work
- Snowflake Output: fixed columns casing
- Removed "errors" outputs from outputs with Parquet format (Athena/S3)
- Enhancements
- CDC ingestion is more stable when scaling cluster
- Previewing an output now takes its upsert definition into account
- Compactions are now prioritized by urgency and age in order to prevent starvation
- Support epoch time date pattern with prefixes in Cloud Storage Data Sources
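To illustrate how combining urgency with age can prevent starvation of compactions, here is a hypothetical scoring sketch. The linear score, the `age_weight` parameter, and the tuple layout are assumptions made for the example; the actual prioritization logic is not described in this change log.

```python
from typing import List, Tuple

# Hypothetical task shape: (name, urgency, age in seconds)
Task = Tuple[str, float, float]


def prioritize(tasks: List[Task], age_weight: float = 0.01) -> List[Task]:
    """Order tasks by urgency plus an age bonus, highest score first.

    The age bonus grows without bound, so a low-urgency task that keeps
    waiting eventually outranks fresh high-urgency tasks.
    """
    return sorted(tasks, key=lambda t: t[1] + age_weight * t[2], reverse=True)


tasks = [
    ("fresh-urgent", 5.0, 0.0),
    ("old-low", 1.0, 1000.0),
    ("fresh-low", 1.0, 0.0),
]
print([name for name, _, _ in prioritize(tasks)])
# ['old-low', 'fresh-urgent', 'fresh-low']
```

A pure urgency ordering would leave `old-low` waiting indefinitely whenever urgent work keeps arriving, which is the starvation case the age term guards against.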
- Bug Fixes
- Fixed database name validation in Microsoft SQL Server Connection
- Enhancements
- HiveMetastoreClient: Better SET LOCATION method
- Enhancements
- Elasticsearch Output: Support Upsert Keys
- CDC: Support Column Exclude List
- Added SHA512 and SHA3_512 functions
- Bug Fixes
- S3 Connection with SQS now works with paths that end with a slash
- Enhancements
- Added FROM_UNIXTIME function
- Qubole Output: added an option to support changing column types
- Hive Metastore Outputs: trigger more than one compaction if there is a backlog
- Upsolver Output: support new field type: JSON. This type will be extracted when the output is used as an Upsolver Data Source
- CSV Content Format: support custom quote escape char
- When duplicating output, copy the workspaces from the previous output
- Bug Fixes
- Fixed memory leak in External Hive Metastore outputs
- Enhancements
- Added External Hive Metastore to the output types list
- Support SELECT * on External Hive Metastore when querying with PrestoDB and SparkSQL
- Reference Data can now be deleted once no output is using it (i.e. the output was deleted, or the output completed and was edited)
- Reference Data can't be created with the same name as another Reference data or Lookup table
- Enhancements
- Kafka Output - Allow ignoring messages that are too large (According to broker settings and producer settings)
- Streaming Data Sources (Kafka, Kinesis, EventHubs) - Allow deleting offsets metadata files
- API - Performance enhancements when updating Outputs / Lookup Tables
- Bug Fixes
- Hive Metastore: Fixed bug with SELECT *
- Features
- Support MAX/MIN aggregations on more data types
- Support <,<=,>,>= on timestamps
- Features
- Support SELECT * in Hive Metastore Outputs. This will update the table definition every time a new field arrives
- Oracle Object Storage Support
- Bug Fixes
- Aggregation calculated fields now work in SQL mode
- Features
- CDC (Change Data Capture) Data Sources
- Dremio and PrestoDB Outputs
- Stop/Start Data Sources
- Enhancements
- Allow setting Lazy Load on Lookup Tables using the Properties tab
- Update base AMI image in AWS to Amazon Linux 2
- Bug Fixes
- Data Lake Output: Filter out partitions that were deleted due to retention compaction
- Features
- Hive Metastore: Allow creating an Output to External Hive Metastore
- Enhancements
- Lower latencies between dependencies in Compute Cluster
- Features
- Ahana Output
- Starburst Output
- Enhancements
- Redshift: Allow inserting 'now' into date / time fields in order to set a column to the insertion time
- Bug Fixes
- Kinesis Stream Autocomplete: filter out Upsolver Internal Streams
- Fixed bug in S3 IAM policy generation when the path ends with a slash
- Avro Schema Registry: Don't treat HTTP errors as parse errors
- SQL Parser: Don't regenerate the SQL when there is an expression that returns boolean with extra parentheses
- Support Real Time Kafka Output - Support running Kafka Outputs on the Real Time cluster with ms latency
- Hive Metastore Output with Upserts - fixed a bug that caused the compaction process to get stuck after edit
- Hive Metastore Output with Upserts - support number as an upsert key
- Lookup Tables: fixed a bug when using sharded lookup tables in outputs
- API: show the current capacity when clicking Update Capacity button on Clusters page
- API: fixed wrong validation on Kafka Outputs (support numbers on topic names)
- Microsoft SQL Server Output: fixed create statement when primary key is empty
- API: fixed a bug when removing mapping of fields
- S3 Data Source with Parquet Content Format - split files by 200MB
- Lookup Table - support compaction shards on lookup tables with multiple windows
- SQL - fixed a bug generating the SQL when "Is Delete Field" is mapped to a column
- Monitoring: Added three metrics to Hive Metastore Outputs
- partitions-delay - The delay between now and the last partition time
- data-loading-delay - The delay on loading data to the metastore
- partitions-count - Number of partitions in the table
- IS_DUPLICATE and Lookup from Data Sources: Don't omit key columns for new versions
- Avro: Fixed escaping of [] in array namespaces
- Fixes a bug in Snowflake Output with VARIANT column output with arrays
- Azure: Support billing SaaS offering
- DNS: Ability to sync Route53 records with private IP addresses for customers with own Spotinst Account
- SSO/bugfix: attach endpoints don't have permissions
- Partners: Support exporting logs and monitoring to external domain
- Free Plan: Support upgrading account
- Snowflake Output: Configurable DbDecimal
- CSV Content Type: Don't ignore values starting with #
- SQL: Support unmapped columns in JDBC outputs. New mapped columns will be created when deploying the output
- Infra: performance improvements
- Lookup Table: fixed a bug when using Delete column
- Signup: Create sample data source on register
- SQL: Fixed a bug with autocomplete Lookup Table names
- SQL: Support Lookup time
- Athena Output: Fixed a bug with editing Athena Output when Upsert Partition Fields is true
- JDBC Data Sources: Fixed an issue that could cause it to get stuck and not read any data
- JDBC Connections: Fixed an issue that would allow connections to be created with a concurrency of 0
- Monitoring: Include the actual time an index is ready to be read from in the monitoring delay charts
- Allow using anonymous credentials to access data in public S3 buckets
- AppFlow: Autocomplete buckets and flow names during setup
- Functions:
- Added a Subtract Time Zone Feature to complement Add Time Zone
- UI:
- Show SQL Errors when deploying Outputs
- Show indicative error message when Reference Data file couldn't be found
- Deployment: Allow deploy Upsolver servers to Azure
- Add support for Azure Event Hubs data source
- Athena: Create Glue database if it doesn't exist
- Functions: Fixed a bug in TO_DATE function
- Function: Added new function: RECORD_TO_JSON
- Query Cluster: Improvements in the underlying files cache
- SQL: Show validation error when mapping an array to unrelated path
- SQL: Show validation error when mapping null without specifying type
- API: When creating data source, fixed a bug when previewing large file with tar compression
- API: Fixed high CPU on boot
- Kafka data source: support reading custom kafka headers
- Metastore Output: support running Athena/Qubole output without partitioning by time
- Snowflake Output: support Azure storage as the intermediate storage
- Compute Cluster Infra: optimize threads when running low priority tasks
- ETL: Improved target path inference for some scenarios
- Monitoring Task: fixed failure when one of the monitoring reporters is not available
- SQL: Fixed validation of inline functions in aggregations
- Metastore Output: set the table location to the root path of the output
- Qubole: allow defining if TIMESTAMP fields will be created as TIMESTAMP or BIGINT columns in the table per output
- Qubole: Added feature flag to deprecate the "SET hive.on.master=?" statement
- Elasticsearch Output: Fixed a bug that could cause high memory usage
- Add Amazon AppFlow support
- Zip Function - Added optional field names
- API - Fixed validation message for Kafka input
- Elasticsearch - upgraded client version
- S3 Data source with Parquet Content Format - when the file is not a parquet file, handle it as a parse error
- Added Free plan
- SQL - Fixed a duplication issue when function target name and select target name are the same
- Hive Metastore Output with Upsert keys - Trigger compactions in a better way to avoid compacting in a loop
- SQL - Fixed target path inference of key columns with inline functions on aggregated outputs
- API - Allow setting a higher number of shards in the output than the execution parallelism of the data source. This parallelizes the data by the data source files
- Support "SELECT * " in cloud storage outputs with parquet content format
- API - Fixed a bug that allowed creating more than one draft in the same output
- Show number of sparse fields inside fields tree in inputs and outputs and allow to toggle the filter
- JDBC data source: use field types from the table definition
- PostgreSQL output: support timestamptz data type
- UI: New modal when adding multiple fields in tabular outputs to prevent cartesian product between unrelated arrays
- No need to specify a target field for filters when creating a filter from the UI
- Some bug fixes in API
- Query Agent - Support round robin
- "No Local API" page - Show "Connection Established" instead of error when able to connect
- Input creation preview - Filter big JSONs and let the user know about it
- Performance improvements in internal cache mechanism
- Performance improvements in Hive Metastore outputs
- Fixed bug that caused Hive Metastore outputs with upserts to get stuck after editing a new version
- Avro w/ Schema Registry Content Format: Support Tagged Avro Schema Registry
- Improved target path calculation of inline functions
- Added validation when deploying a draft that the start time is not after the end time of the previous version
- SQL: Disable automatic column name generation
- Support cancelling pending integration
- No Local API Page: Fixed showing "You can't connect" instead of "local DNS resolve" error
- CloudFormation: link to the right region in deploy stack
- Fewer API calls to Cloud Storage when checking completion of tasks
- Calculated Function TO_DATE: Changed threshold to not return negative dates
- Fixed bug with PostgreSQL outputs not allowing to alter the column types
- Support Workspaces in Clusters
- Catch all errors from GCP / Azure and show in UI
- Hive Metastore Outputs: the column names year, month, day, and hour are now reserved
- Big performance improvements for replay in Kinesis & Kafka Data Sources
- Big performance improvements for replay in Hive Metastore Outputs
- Compute Cluster: IO Tasks will now run only on Master cluster and will never run on Replay Cluster
- Compute Cluster: Option to limit number of Elastic IPs allocated for the cluster
- Added XX_HASH and SORT_BY calculated functions
- UI: Support literal inputs in aggregations
- Performance improvements to Hive Metastore Outputs
- Fixed a bug where very large Parquet file outputs made servers crash with OOM
- Preview Output will now stop after 15 seconds instead of making the API server hang
- Support Redshift and PostgreSQL in JDBC Data Source
- UI: Output - New Partitions Modal
- SQL now supports target site inference. This fixes many confusing bugs when using arrays with calculated functions
- SQL: Fixed bug with throwing 500 errors on missing properties of calculated functions
- Athena Output: new outputs will not nest compaction files for better compatibility support with external systems
- Fixed bug when previewing completed Output with Lookups
- Update Retention validation message is now dismissible
- Regex and Split Content Formats have been added for better compatibility with custom data formats
- MS SQL Server Output