Change log
Change log for SQLake (sqlake.upsolver.com)
Release notes for app.upsolver.com can be found here.
2024.01.16-08.45
Enhancements
Support AWS Glue Data Catalog as a target. This will create a table and maintain schema evolution in the target for every replicated table
Performance improvement in transformations of jobs / materialized views
Bug Fixes
Fixed new Kafka ingestion/data sources stalling when reading from the start in certain situations
2024.01.09-14.59
Bug Fixes
[BREAKING CHANGE]
Aggregated Jobs Update: For aggregated jobs (transformation jobs using GROUP BY) with an unspecified time window in time filters, e.g. WHERE TIME_FILTER() or WHERE $event_time BETWEEN RUN_START_TIME() AND RUN_END_TIME(), the default aggregation window is now set to the job's interval, replacing the previous infinite window default
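The effect of this change on aggregation results can be illustrated with a small Python sketch. It is not Upsolver code: the events, the one-minute interval, and the count_by_key helper are all hypothetical, and the exact boundary handling of the real engine is not specified here.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical events as (event_time, key) pairs; not taken from a real job.
events = [
    (datetime(2024, 1, 9, 12, 0, 30), "a"),
    (datetime(2024, 1, 9, 12, 1, 10), "a"),
    (datetime(2024, 1, 9, 12, 1, 40), "b"),
]

def count_by_key(events, run_start, run_end, infinite_window):
    """Count events per key as seen by a single job run ending at run_end."""
    counts = Counter()
    for event_time, key in events:
        if event_time >= run_end:
            continue  # event is not yet visible to this run
        if infinite_window or event_time >= run_start:
            counts[key] += 1
    return dict(counts)

run_end = datetime(2024, 1, 9, 12, 2)
run_start = run_end - timedelta(minutes=1)  # assumed 1-minute job interval

old_default = count_by_key(events, run_start, run_end, infinite_window=True)
new_default = count_by_key(events, run_start, run_end, infinite_window=False)
print(old_default)  # {'a': 2, 'b': 1}
print(new_default)  # {'a': 1, 'b': 1}
```

Under these assumptions, a job that relied on the old default sees smaller aggregates after the change, because events older than one interval no longer contribute.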
2024.01.02-13.40
Bug Fixes
S3 Data Source: Resolved a race condition that could lead to duplicated ingestion of the same file in scenarios where an S3 data source is used with a date pattern that does not follow lexicographical order
Minor Bug Fixes
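For the S3 fix above, it may help to see what a date pattern that does not follow lexicographical order looks like. The sketch below uses Python strftime patterns as a stand-in for S3 date patterns; the is_lexicographic helper and the sample dates are hypothetical.

```python
from datetime import date

days = [date(2023, 12, 31), date(2024, 1, 1), date(2024, 1, 2)]

def is_lexicographic(pattern, days):
    """True if chronological order and string order agree for this pattern."""
    names = [d.strftime(pattern) for d in sorted(days)]
    return names == sorted(names)

print(is_lexicographic("%Y/%m/%d", days))  # True
print(is_lexicographic("%d-%m-%Y", days))  # False
```

A day-first pattern sorts 01-01-2024 before 31-12-2023, so listing files by name does not visit them in time order.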
2023.12.25-09.02
Enhancements
Placeholder (Dynamic Jobs) support in Transformation Jobs to Glue Data Catalog
Bug Fixes
Minor Bug Fixes
2023.12.11-01.58
Enhancements
CDC Jobs:
Incremental snapshot is now supported as the default behavior for the initial snapshot.
Bug Fixes
Minor Bug Fixes
2023.12.06-11.31
Enhancements
When using Avro Schema Registry content type with Debezium, support parsing JSON, Timestamp, and Date types
Jobs Page:
Support Drop job from the UI
Support Pause/Resume job via UI and SQL (Pause is not supported for ingestion or transformation jobs that write to a data lake table as a target)
Bug Fixes
Jobs List:
Fixed Backlog calculation to show more accurate times of the current running jobs
Fixed Events Over Time graph for Replication job and capped its range by the job running times
Minor bug fixes
2023.11.29-01.59
Enhancements
Jobs page: Support the ability to filter by Job Status
Bug Fixes
Minor bug fixes
2023.11.27-01.59
Breaking Change Notice: Deprecation of the CREATE_TABLE_IF_MISSING job option, specifically for copying into data lake tables. Please read this notice for more information.
2023.11.21-01.59
Enhancements
Added support for r7 instance types in Compute Clusters
Bug Fixes
Fixed a bug causing recent data from a partitioned table with a primary key to be discarded if the transformation job was filtering by at least one partition field
Fixed an issue preventing users from being able to create compute clusters
Fixed the replay cluster not being shut down in some situations
Fixed an issue of a job reading from a system table
Fixed an issue of high CPU in Amazon S3 outputs while viewing Datasets
2023.11.19-08.51
Enhancements
CDC Replication Group Jobs:
Support new write mode option OPERATION_TYPE_COLUMN for replication group APPEND write mode
Bug Fixes
Minor bug fixes
2023.11.14-13.42
Enhancements
New status for Failed (Retrying) jobs: If a job is currently failing, for example because it encountered fatal errors that prevent or will prevent it from proceeding, the job status changes to Failed (Retrying)
S3 With SQS: Limit the size of bulk reads from SQS to ensure data is distributed evenly
Bug Fixes
Fixed a bug on the jobs page where the Events Over Time graph shows the wrong range
Minor bug fixes
2023.11.07-12.30
Enhancements
Sidebar redesign and improvements
Introduce Datasets:
View outputs/destinations schema
Monitor data health, freshness, and volume when loaded into the destination
Monitor data quality violations
List all target tables
Bug Fixes
Filtered out the heartbeat table from the CDC status display for Postgres CDC
Fixed a bug causing loading of all tasks to fail if a task was created with a start time far in the future
SELECT a.* (without an alias) will behave the same as SELECT a.* AS *, directly extracting the nested fields from object a, and returning them as separate columns instead of one object
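The difference between returning one object and returning its nested fields as separate columns can be sketched in Python. The row and the select_star_from helper are hypothetical illustrations, not part of any Upsolver API.

```python
def select_star_from(row, field):
    """Return the nested fields of row[field] as separate top-level columns."""
    nested = row.get(field) or {}
    return dict(nested)

row = {"a": {"x": 1, "y": "two"}, "b": 3}

as_one_object = {"a": row["a"]}          # one column holding the whole object
as_columns = select_star_from(row, "a")  # one column per nested field
print(as_one_object)  # {'a': {'x': 1, 'y': 'two'}}
print(as_columns)     # {'x': 1, 'y': 'two'}
```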
2023.10.30-11.38
Enhancements
Added KEEP_REPLICATION_SLOT option to ingestion and replication jobs that read from PostgreSQL. This preserves the job's replication slot when the job is dropped
Bug Fixes
Minor bug fixes
2023.10.25-08.55
Enhancements
Running queries can now be stopped via the UI with a new Cancel button next to a running query in the Event Log
Running queries in Upsolver's Query Engine can now be stopped using the following new SQL syntax:
ABORT QUERY <query_id>
You can now see what queries are running in Upsolver's Query Engine by querying the system.monitoring.running_queries table
Improved performance during replay/initial-processing for outputs to Redshift
Bug Fixes
Fixed inconsistent catalog names in system.insights tables for Snowflake outputs
Fixed an issue where FLATTEN_PATH wasn't case insensitive with columns from a SELECT * statement
2023.10.17-17.45
Bug Fixes
Jobs: Fixed an issue causing NULL values when casting raw columns to multiple different types in the same query
2023.10.11-11.26
Enhancements
Improved query performance when selecting from the system.monitoring.jobs table
Added two new system tables for viewing written column statistics: system.insights.dataset_column_stats and system.insights.job_output_column_stats
Enabled integrating monitoring data from Upsolver into third-party systems to ensure reliability and performance of data pipelines:
Support for Amazon CloudWatch
Support for Datadog
Bug Fixes
Minor bug fixes
2023.10.04-10.37
Enhancements
Apache Kafka and Confluent Kafka connections: support setting SASL username and password with dedicated parameters SASL_USERNAME and SASL_PASSWORD
Support new function: REGEXP_EXTRACT
Allow the creation of a new cloud integration on a new organization
Ingestion wizard: now supports Confluent Cloud as a source
Ingestion jobs: now supports Amazon S3 via SQS
UI design improvements
Bug Fixes
When editing the number of shards in an Apache Kafka output job, Upsolver waits until the previous shards are completed before running the new shards
Fixed an issue where Query Engine requests timed out
2023.09.13-12.16
Enhancements
Support creating query clusters and attaching materialized views to query clusters in order to query them via HTTP API
Add IF EXISTS syntax to DROP statements, e.g. DROP TABLE IF EXISTS "my_table", to prevent the statement from failing if an entity does not exist. Applies to DROP CLUSTER, DROP CONNECTION, DROP TABLE, DROP JOB, and DROP MATERIALIZED VIEW.
Snowflake Jobs:
Create the Snowflake table when there are no dynamic columns and the CREATE_TABLE_IF_MISSING option is TRUE
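The IF EXISTS behavior described in this release follows the usual SQL convention. The sketch below shows that convention using SQLite from Python's standard library as a stand-in. It does not connect to Upsolver, and it assumes Upsolver's semantics match the description in the release note.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Without IF EXISTS, dropping a missing table raises an error.
try:
    conn.execute('DROP TABLE "my_table"')
    plain_drop_failed = False
except sqlite3.OperationalError:
    plain_drop_failed = True

# With IF EXISTS, the same drop succeeds and does nothing.
conn.execute('DROP TABLE IF EXISTS "my_table"')
conn.close()

print(plain_drop_failed)  # True
```

This makes teardown scripts safe to run repeatedly, since a second run no longer fails on entities that are already gone.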
2023.09.05-11.12
Enhancements
Support is now available for using an external Hive Metastore as a catalog
PostgreSQL CDC:
Tables that aren't included in the publication will not be part of the snapshot
Apache Kafka Jobs:
When copying data from Kafka topics, names are now treated as globs (stars match any number of chars, and question marks match one char)
Elasticsearch Jobs:
Write timestamp and date types as ISO-8601 strings in jobs that write to Elasticsearch
Support added for il-central-1 region. This region is currently only supported with private VPC deployments
Reduced the number of Amazon S3 API calls to lower S3 costs
Bug Fixes
Fixed synced transformation jobs not reading the respective data when their interval was smaller than that of one of the jobs writing to a source table
Minor bug fixes
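The glob matching introduced for Kafka topic names in this release (stars match any number of characters, question marks match one) can be approximated with Python's fnmatch module. The topic names are hypothetical. fnmatch also gives square brackets a special meaning, which the release note does not mention, so the sketch is only a close analogue.

```python
from fnmatch import fnmatchcase

topics = ["orders", "orders-eu", "orders-us", "order1", "payments"]

def match_topics(pattern, topics):
    """Return the topics whose full name matches the glob pattern."""
    return [t for t in topics if fnmatchcase(t, pattern)]

print(match_topics("orders*", topics))  # ['orders', 'orders-eu', 'orders-us']
print(match_topics("order?", topics))   # ['orders', 'order1']
```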
2023.08.29-15.15
Enhancements
Snowflake Jobs:
SELECT * will preserve the original case of field names in variant columns
SQL: Allow altering EXPOSE_IN_CATALOG property in tables
Performance Improvement: Reduce the number of file operations when coordinating future table operations
Write Timestamp and Date types as ISO-8601 strings in jobs that write to Elasticsearch
Ingestion wizard:
Support CDC from MongoDB source (Preview)
Bug Fixes
Jobs:
When using MAP_COLUMNS_BY_NAME, the EXCEPT columns list was fixed to be case-insensitive
2023.08.17-13.30
Enhancements
Write Timestamp and Date types as ISO-8601 strings in string output jobs, for example: job to Amazon S3 with format JSON/CSV
Write Timestamp and Date types as ISO-8601 in RECORD_TO_JSON function
Ingestion wizard: Support CDC from Microsoft SQL Server source (Preview)
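As an illustration of writing Timestamp and Date values as ISO-8601 strings in JSON output, here is a minimal Python sketch. The record and the to_json_value helper are hypothetical. The exact precision and time zone suffix that Upsolver emits are not stated in these notes, so the strings shown are what Python produces.

```python
import json
from datetime import date, datetime, timezone

def to_json_value(value):
    """Serialize date and datetime values as ISO-8601 strings."""
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    raise TypeError(f"Unsupported type: {type(value).__name__}")

record = {
    "created_at": datetime(2023, 8, 17, 13, 30, tzinfo=timezone.utc),
    "day": date(2023, 8, 17),
}
encoded = json.dumps(record, default=to_json_value)
print(encoded)
# {"created_at": "2023-08-17T13:30:00+00:00", "day": "2023-08-17"}
```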
Bug Fixes
Performance improvements in CDC jobs
Performance improvements when querying the Upsolver Query Engine
Minor bug fixes
2023.08.16-07.42
Enhancements
Users can now omit the connection type when specifying a source or target in jobs (e.g. INSERT INTO S3 catalog LOCATION = '...' can be replaced with INSERT INTO catalog LOCATION = '...')
Improved the performance of CDC jobs reading from databases with a large number of tables
Querying the information schema tables now returns Jobs and Materialized Views in deleting state
Elasticsearch Jobs:
MERGE jobs now support deleting documents by using the WHEN MATCHED AND ... THEN DELETE syntax
Upgraded Avro and Parquet libraries to the latest versions
Bug Fixes
Major improvements when reading from a table with a large number of partitions
Minor bug fixes
2023.08.02-01.57
Enhancements
CDC Jobs:
Elasticsearch Jobs:
Support deleting documents with MERGE jobs
Bug Fixes
Minor bug fixes
2023.07.31-02.02
Enhancements
Snowflake Jobs:
Support setting COMMIT_INTERVAL. This allows configuring different intervals for processing the job and writing to Snowflake
Bug Fixes
Minor bug fixes
2023.07.27-01.59
Enhancements
Elasticsearch Jobs:
Support setting routing (_routing) by using the new property ROUTING_FIELD_NAME
Added a new option to INDEX_PARTITION_SIZE property: NONE. This allows us to write to a single index name
CDC Jobs:
Add the ability to snapshot multiple tables at once (Microsoft SQL Server, MySQL, PostgreSQL)
Bug Fixes
Snowflake Jobs:
Fixed an issue with a custom insert/update expression causing the job to fail if the field is also mapped in the select statement
On auto-managed tables, Upsolver will not create an extra column if the following conversion happens:
Original column is Double and got a value of type Long
Original column is Timestamp and got a value of type Date
Original column is Varchar
Original column is Variant
In all other cases, we will create an extra column with the new type as the column name suffix. For example: if a column col was of type Bigint and got a Double value, we will create a column COL_DOUBLE in the Snowflake table
Fixed a delay in Materialized View on Job List/Index page
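The extra-column rule for auto-managed Snowflake tables described in this release can be written as a small decision function. This is a sketch of the rule as stated in the note, not Upsolver's implementation: the target_column helper is hypothetical, and treating an identical type as compatible and upper-casing the result are assumptions based on the COL_DOUBLE example.

```python
# (column type, incoming value type) pairs that need no extra column.
COMPATIBLE = {("DOUBLE", "LONG"), ("TIMESTAMP", "DATE")}
# Column types that accept any incoming value without an extra column.
ACCEPTS_ANYTHING = {"VARCHAR", "VARIANT"}

def target_column(name, column_type, value_type):
    """Return the column that receives a value of value_type."""
    column_type, value_type = column_type.upper(), value_type.upper()
    if (
        column_type == value_type  # assumption: same type never adds a column
        or column_type in ACCEPTS_ANYTHING
        or (column_type, value_type) in COMPATIBLE
    ):
        return name.upper()
    # Otherwise the new type becomes the column name suffix.
    return f"{name}_{value_type}".upper()

print(target_column("col", "Bigint", "Double"))  # COL_DOUBLE
print(target_column("col", "Double", "Long"))    # COL
```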
2023.07.19-02.34
Bug Fixes
Fixed tree on fields containing dots, e.g. turning {"a\.b": 1} to {"a.b": 1}.
Snowflake jobs: changed the file format used for copying from Avro to JSON. This fixed an issue when ingesting records with sub-fields that have special characters.
2023.07.13-02.20
Enhancements
Added VALUE_INDEX_IN_ROW() - this function receives an element of an array of records and returns the 1-based index of the element position (incrementing regardless of whether the array is nested). Null values are not counted.
Added VALUE_INDEX_IN_ARRAY() - this function receives an element of an array of records and returns the 1-based index of the element position (index resets to 1 for each sub-array). Null values are not counted.
Ingestion wizard - support added for creating a heartbeat table within the wizard.
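The difference between the two new index functions can be sketched in Python. This is one reading of the descriptions above, not the functions' actual implementation. The value_indexes helper and the nested list are hypothetical, and the real functions are called per element inside a query rather than on a whole list.

```python
def value_indexes(nested):
    """Return (value, index_in_row, index_in_array) for a list of sub-arrays."""
    result = []
    index_in_row = 0  # keeps incrementing across sub-arrays
    for sub_array in nested:
        index_in_array = 0  # resets for each sub-array
        for value in sub_array:
            if value is None:
                continue  # null values are not counted
            index_in_row += 1
            index_in_array += 1
            result.append((value, index_in_row, index_in_array))
    return result

indexes = value_indexes([["a", None, "b"], ["c"]])
print(indexes)  # [('a', 1, 1), ('b', 2, 2), ('c', 3, 1)]
```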
Bug Fixes
Reduce the frequency of metadata queries to Snowflake in order to reduce the cost of COMPUTE SERVICES charged by Snowflake.
[BREAKING CHANGE] Fixed RECORD_TO_JSON on fields containing dots, e.g. turning {"a.b": 1} to {"a\.b": 1}.
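The example above suggests that a backslash marks a literal dot inside a field name, as opposed to a dot that separates nested levels. That reading is an assumption. Under it, escaping and path splitting could look like the following Python sketch; both helpers are hypothetical, and escaped backslashes are not handled.

```python
import re

def escape_field_name(name):
    """Mark literal dots so the name is not read as a nested path."""
    return name.replace(".", "\\.")

def split_path(path):
    """Split on unescaped dots, then unescape each segment."""
    parts = re.split(r"(?<!\\)\.", path)
    return [part.replace("\\.", ".") for part in parts]

print(escape_field_name("a.b"))  # a\.b
print(split_path("a\\.b"))       # ['a.b']
print(split_path("a.b"))         # ['a', 'b']
```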
2023.07.06-02.18
Bug Fixes
Minor bug fixes and improvements
2023.07.04-14.42
Enhancements
New UUID() function returns a unique identifier (UUID) string.
PostgreSQL CDC: ignore rows from the heartbeat table
Upgraded Debezium version from 2.1.3 to 2.2.1
Ingestion wizard:
Supports compute cluster input (in case the organization has more than one compute cluster)
Supports basic expectation
The cluster version appears in the UI on the clusters page
Snowflake table: Show Variant columns statistics on field level
Sign-out is now available from the main screen
Bug Fixes
Fixed the conversion of float to double to preserve the perceived semantic value in CDC sources and in data sources that get Avro or Parquet
Minor bug fixes and improvements
2023.06.26-03.44
Enhancements
Added $row_number system column to transformation jobs
[BREAKING CHANGE] Changed $row_number system field from 0-based to 1-based
Added $item_index system column, representing the source batch's row index. For example, in S3 sources, it will be the row index in a file
S3 outputs now support the inclusion of the shard number in the target path. This allows the use of output shards without overwriting the output files
User information and organization name now displayed on the main pages with the ability to switch between organizations
Bug Fixes
Fixed a bug in the IS_DUPLICATE function that caused the wrong results when the job is running with an interval higher than 1 minute
Fixed a bug reading Avro and Parquet files that caused fields of type Date to be ignored
Minor bug fixes
2023.06.19-10.37
Bug Fixes
Fixed an issue reading from empty Kafka topics that contain empty partitions
Fixed a bug reading Avro files that use a named type more than once
Minor bug fixes
Enhancements
Snowflake table statistics are now available
2023.06.12-08.57
Bug Fixes
Snowflake Merge Jobs: enforce the ON clause expression to prevent creating an array
Minor bug fixes
Enhancements
Ingestion wizard - easy ingest to Snowflake, including:
Job monitoring improvements
2023.06.05-11.39
Bug Fixes
Job status page improvements
Minor bug fixes
Enhancements
CDC: PostgreSQL with partitioned tables - expose data.full_partition_table_name field specifying the name of the event's original partition
Error messages improvements
2023.05.28-18.43
Bug Fixes
CASE WHEN now handles NULL as input and returns the ELSE value
CDC: Fixed the bug that caused the ingestion of a decimal type column to be converted to binary base64 string
2023.05.17-13.45
Bug Fixes
COLUMN_TRANSFORMATIONS with dependencies between them created the wrong name for the nested column
Fixed target name column value for Snowflake outputs in the system.information_schema.jobs table
Enhancements
Validate that the first parameter in an ARRAY_JOIN is not a literal
Ingestion wizard now supports Amazon Kinesis source
2023.05.15-02.23
Bug Fixes
Fixed the bug where TABLE_DATA_RETENTION could be disabled by disabling compactions
Dropping a table while specifying DELETE_DATA = true did not delete data files written by jobs with RUN_PARALLELISM > 1
Parquet Files are now distributed more evenly when ingesting data from Amazon S3 with high execution parallelism
Fixed a bug where selecting from large Materialized Views with predicates on key columns would return "Query exceeded input row limit"
Fixed a bug where a job reading from information_schema.columns does not write data into a table
Fixed a bug where querying system.monitoring.jobs can result in an error
BYTES_SUBSTRING position now starts from 1, like SUBSTRING (previously started from 0)
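The move of BYTES_SUBSTRING to 1-based positions can be illustrated with a short Python sketch. The bytes_substring helper is hypothetical and its (position, length) parameters are an assumption. The real function's signature is not given in these notes.

```python
def bytes_substring(data: bytes, position: int, length: int) -> bytes:
    """Return length bytes starting at a 1-based position."""
    if position < 1:
        raise ValueError("position is 1-based and must be >= 1")
    start = position - 1  # convert to Python's 0-based offset
    return data[start:start + length]

print(bytes_substring(b"upsolver", 1, 2))  # b'up'
print(bytes_substring(b"upsolver", 3, 3))  # b'sol'
```

Queries written against the old 0-based behavior would need their start positions increased by one to return the same bytes.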
Enhancements
New SQL syntax is now supported:
SHOW CREATE JOB "Job name"
SHOW CREATE TABLE "Table name"
SHOW CREATE MATERIALIZED VIEW "MV name"
SHOW CREATE CLUSTER "Cluster name"
PostgreSQL CDC: Support reading 14+ partitioned tables by the root table name instead of the underlying partition table names
Snowflake: Added query tag to queries executed by Upsolver for easier cost tracking
2023.05.04-07.39
Bug Fixes
Improved statistics in Job Status
Fixed the issue of inviting a member to the organization not working
Prevented the creation of sync jobs that read from system tables
Fixed a bug in jobs when writing to Amazon S3 with a dynamic location
Fixed a bug that caused some columns to be missing when reading from a table
Enhancements
Support querying all system tables using the syntax:
SELECT $*
Information Schema: added a type_evolution column to the system table system.information_schema.columns to show all previous types of the column
2023.04.27-07.52
Bug Fixes
Job Status page bug fixes
Improved error messages in the Ingestion Wizard
Enhancements
Added $event_date column to all transformation jobs that write to a Managed Upsolver table
SQL/AutoComplete: Show aggregation result fields
System Tables: added elastic IPs column to system.information_schema.clusters
2023.04.18-07.10
Bug Fixes
Fixed an issue collecting field statistics and metadata for large data files with a large number of unique field names
Enhancements
Snowflake output job now supports SELECT *: creating and managing the Snowflake table.
CDC to Snowflake SELECT *: Support ingested JSONB as a variant
Allow syntax in JOB: START_FROM = NOW - INTERVAL '6' HOURS
Delete intermediate files after copy to Redshift
Copy From Features: Add a Deduplication option to the COPY FROM job
Added PARSE_JSON_COLUMNS option to CDC COPY FROM jobs. This will parse any JSON typed columns in the database as nested objects in the target table.
SQL/AutoComplete: Show aggregation result fields
Support partial flattening of arrays in jobs that write to Upsolver tables: FLATTEN_PATHS = (A)
Ingestion wizard - Easy Ingest to Snowflake:
Step-by-step wizard, no SQL, no data lake tables. Supports significant data quantities, streaming data, and strong ordering of data. Comes with deduplication and field hashing capabilities
Execution results and event log experience will be stored outside the worksheet page. Users can return to the worksheet later and start where they left off
Job Status (beta)
2023.03.26-19.27
Bug Fixes
Fixed the system catalog name from System to system
AvroRegistry content type: Support URL encoded authentication information
Snowflake: Support keeping old values on partial updates
Fixed "deleting" entities showing up in information_schema tables
Enhancements
Show all JDBC jobs on the tree
Show "Staging Location" in inspection panel of S3 Copy From jobs with enabled DELETE_AFTER_LOAD option
Add editor shortcuts to increase/decrease the font size (CMD +/- on Mac)
S3 output file Type options (set delimiter for S3 outputs)
SQLake S3 output: Allow overwrite
Expose editor shortcuts in the help panel widget
Display the original file path in the "copy from job" info
Functions: New function: RECORD_TO_JSON
2023.03.15-10.04
Bug Fixes
JDBC Outputs: delete intermediate files after being written to the DB
Enhancements
The cluster catalog is now visible in the tree
Gather all system entities in the tree under a catalog named "System"
Improved AS OF syntax
Auto complete on Jobs
Daily usage graph and report are available
Ability to decide which query engine to use to run a select statement (Athena/Upsolver)
Put all System entities under a catalog named "System"
Support AS OF syntax
Information Schema: Add a table for columns
2023.03.09-13.48
Bug Fixes
JDBC Outputs: delete intermediate files after being written to the DB
Enhancements
Cluster catalog is now visible in the tree
Gather all system entities in the tree under a catalog named "System"
Improved AS OF syntax
Auto complete on Jobs
2023.02.26-15.40
Bug Fixes
Fixed Kafka batcher tasks getting stuck when reading with a wildcard topic and deleting all the topics in Kafka
Enhancements
Show information schema catalog on the tree
Auto Complete on Information Schema tables and columns
Allow creating jobs from Information Schema tables
Add support for Timestamp, Date, and Decimal types in CDC and AVRO sources
Added support for bigserial in Postgres outputs
Support EXCLUDE_COLUMNS for a COPY FROM (ingestion) job
Memory allocation optimizations in Lookup Table Query servers
2023.02.19-15.15
Bug Fixes
Fixed an issue when creating a Kafka Data Source with glob pattern that doesn't match any topics would cause no response in the API.
Enhancements
Upgrade Debezium to version 2.1.2
Support transformation job to PostgreSQL
Expose security information within the app to allow easier AWS configuration to connect your own data
Memory allocation optimizations in Lookup Table Query servers
Allow altering the Materialized View COMMENT.
Display managed entities in the tree even when unable to connect to Athena.
Support ignoring fields in COPY FROM jobs by specifying the EXCLUDE_COLUMNS option.
2023.02.12-15.57
Bug Fixes
Fixed a rare issue that can cause duplicate data to be loaded into Redshift after copy failures
Enhancements
Use regional STS endpoints if available
Indication on an executed statement in the editor, success/failed
Column appears immediately on the tree on creating transformation job
Support CAST expression in the language
Renamed function TO_LONG to TO_BIGINT
SELECT * now returns columns from joined Materialized View
2023.01.31-15.19
Enhancements
Add support for information_schema queries
Add support for SKIP_VALIDATION and SKIP_ALL_VALIDATION options.
DEPRECATION: ALLOW_EMPTY_SOURCES will be deprecated in favor of the new options.
Added validation to prevent explicitly mapping fields with a different data type to the defined output table columns.
Bug Fixes
Add support for hierarchical system columns in the tree
2023.01.22-16.42
Enhancements
Support using non fully qualified names for tables and materialized views
Improve error message when trying to create a table with the same name as existing one
Support querying without WHERE statement (Infinite Window)
S3 Output Job: Support split files to folders
Home page redesign
Cluster management tab in the UI
Bug Fixes
Alter Cluster: Fixed altering a property to null not setting the default value
Transformation Jobs: Fix missing columns mapping validations for partition and key columns
API: Support join materialized view with array
2023.01.16-14.02
Enhancements
Jobs monitoring will now show materialized views
API: Join with a materialized view no longer requires an alias for the materialized view
Auto completion for CREATE TABLE options
Added a new System Table jobs.transform_job_state that provides a summary status of all running transform jobs.
Remove RETENTION property from all transformation jobs
Improved query results tab
Bug Fixes
Fixed slow loading of Schemas under Athena connections in the tree
Job SQL Statement in Inspection Pane doesn't omit parenthesis when they're required
Monitoring: Fixed the 'job_name' of aggregation stages to be the original 'job_name' instead of "Output Aggregation". This means logs in the System Table 'logs.task_executions' will now have a correct 'job_name' for aggregation stages.
Fixed MAP_COLUMNS_BY_NAME is not needed for S3, Elasticsearch targets
2023.01.10-20.52
Enhancements
New system columns added: $source_id, $shard_number, $row_number
Support Time Travel in joins!
Support running a SELECT query without a FROM clause
Support running a SELECT query reading from an Upsolver Classic Data Source
Support running UNNEST queries
Support selecting columns by their fully qualified name (e.g. catalog.schema.table.column)
Support select System Columns with glob patterns
Bug Fixes
Fixed slow replay progress for Snowflake and PostgreSQL outputs
2023.01.03-339
Enhancements
Event log improvements + Present informative diagram for copy/transform jobs
Pipeline Monitoring: Expose filtered rows due to missing PK or Partition Column
CLI: Show only message on DDL commands success
Support using Classic Data Sources
2022.12.29-325
Enhancements
Inviting a user to an organization is now supported
Expose the column data type in the tree
Improved CLI experience
New SQLake templates added (CDC MySQL, CDC PostgreSQL, Elasticsearch, Snowflake, Redshift)
2022.12.18-235
Enhancements
Support Transformation Jobs to Elasticsearch
2022.12.15-638
Enhancements
PostgreSQL CDC: Moved TABLE_INCLUDE_LIST and COLUMN_EXCLUDE_LIST from job options to source definition
MySQL CDC: Moved TABLE_INCLUDE_LIST and COLUMN_EXCLUDE_LIST from job options to source definition
API token management
2022.12.05-201
Enhancements
Support Transformation Jobs to S3
Support Copy From PostgreSQL Jobs
New Home Page
Support private VPC integration
2022.11.29-164
Enhancements
New system tables: running_tasks, failing_tasks
Support Copy From MySQL Jobs
Support Transformation Jobs to Redshift
2022.11.17-118
Features
New system tables were added:
running_tasks
failing_tasks
copy_from_job_status
Changes
Preview is now limited to a fixed amount of input rows. Queries that are too large for preview will be aborted
Bug Fixes
Jobs: transformation jobs with an interval larger than one minute did not handle cases where the start time or end time of the job was not fully aligned with that interval
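The alignment problem behind this fix can be illustrated with a Python sketch that cuts a job's time range into windows on interval boundaries and keeps the partial windows at either end. The note does not say how the fix was implemented, so this is only one possible approach. The aligned_windows helper and the sample times are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def aligned_windows(start, end, interval):
    """Split [start, end) into windows cut on multiples of interval."""
    windows = []
    cursor = start
    while cursor < end:
        # First interval boundary strictly after the cursor.
        next_boundary = EPOCH + ((cursor - EPOCH) // interval + 1) * interval
        window_end = min(next_boundary, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows

start = datetime(2022, 11, 17, 12, 2, tzinfo=timezone.utc)
end = datetime(2022, 11, 17, 12, 13, tzinfo=timezone.utc)
windows = aligned_windows(start, end, timedelta(minutes=5))
for window_start, window_end in windows:
    print(window_start.time(), window_end.time())
# 12:02:00 12:05:00
# 12:05:00 12:10:00
# 12:10:00 12:13:00
```

With a five-minute interval, a job that starts at 12:02 and ends at 12:13 gets two partial windows in addition to the one full window, so no part of the range is dropped.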