Change log

Change log for SQLake (sqlake.upsolver.com)

Release notes for app.upsolver.com can be found here.

2024.01.16-08.45

Enhancements

  • Replication Jobs:

    • Support AWS Glue Data Catalog as a target. This will create a table and maintain schema evolution in the target for every replicated table

  • Performance improvement in transformations of jobs / materialized views

Bug Fixes

  • Apache Kafka Jobs:

    • Fixed new Kafka ingestion/data sources stalling when reading from the start in certain situations

2024.01.09-14.59

Bug Fixes

  • [BREAKING CHANGE]

    • Aggregated Jobs Update: For aggregated jobs (transformation jobs using GROUP BY) with an unspecified time window in time filters, e.g. WHERE TIME_FILTER() or WHERE $event_time BETWEEN RUN_START_TIME() AND RUN_END_TIME(), the default aggregation window is now set to the job's interval, replacing the previous infinite window default
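
The effect of this change can be illustrated with a small Python sketch. It is not Upsolver code: the events, the 5-minute interval, the half-open window bounds, and both function names are hypothetical and chosen only to contrast the two defaults.

```python
from collections import Counter

# Hypothetical events as (minute_offset, key) pairs; illustration only.
EVENTS = [(0, "a"), (1, "a"), (4, "b"), (6, "a"), (7, "b"), (12, "a")]


def count_infinite_window(events, run_end):
    """Previous default: each run aggregates every event seen so far."""
    return Counter(key for ts, key in events if ts < run_end)


def count_interval_window(events, run_start, run_end):
    """New default: each run aggregates only events inside its own interval."""
    return Counter(key for ts, key in events if run_start <= ts < run_end)


if __name__ == "__main__":
    interval = 5  # assumed job interval, in minutes
    for run_start in range(0, 15, interval):
        run_end = run_start + interval
        print(
            f"run [{run_start}, {run_end}):",
            "infinite =", dict(count_infinite_window(EVENTS, run_end)),
            "interval =", dict(count_interval_window(EVENTS, run_start, run_end)),
        )
```

Under the old default the counts keep growing from run to run, while under the new default each run reports only what arrived during its own interval. Jobs that relied on the old behavior would presumably need to specify the window explicitly.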

2024.01.02-13.40

Bug Fixes

  • S3 Data Source: Resolved a race condition that could lead to duplicated ingestion of the same file in scenarios where an S3 data source is used with a date pattern that does not follow lexicographical order

  • Minor Bug Fixes
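
As a rough illustration of what "a date pattern that does not follow lexicographical order" means in the S3 fix above, the following Python sketch checks whether formatted dates sort the same way as the dates themselves. It uses Python strftime patterns as a stand-in for the data source's date pattern syntax, and the helper name is hypothetical.

```python
from datetime import date

DAYS = [date(2023, 12, 31), date(2024, 1, 1), date(2024, 1, 2)]


def sorts_chronologically(pattern, days):
    """Return True if sorting the formatted strings also sorts the dates."""
    formatted_in_date_order = [d.strftime(pattern) for d in sorted(days)]
    return sorted(formatted_in_date_order) == formatted_in_date_order


if __name__ == "__main__":
    # Year first: string order matches date order.
    print(sorts_chronologically("%Y/%m/%d", DAYS))
    # Day first: "31-12-2023" sorts after "01-01-2024", so the order breaks.
    print(sorts_chronologically("%d-%m-%Y", DAYS))
```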

2023.12.25-09.02

  • Enhancements

    • Placeholder (Dynamic Jobs) support in Transformation Jobs to Glue Data Catalog

  • Bug Fixes

    • Minor Bug Fixes

2023.12.11-01.58

Bug Fixes

  • Minor Bug Fixes

2023.12.06-11.31

Enhancements

  • When using Avro Schema Registry content type with Debezium, support parsing JSON, Timestamp, and Date types

  • Jobs Page:

    • Support dropping a job from the UI

    • Support Pause/Resume job via UI and SQL (Pause is not supported for ingestion or transformation jobs that write to a data lake table as a target)

Bug Fixes

  • Jobs List:

    • Fixed Backlog calculation to show more accurate times for the currently running jobs

    • Fixed Events Over Time graph for Replication job and capped its range by the job running times

  • Minor bug fixes

2023.11.29-01.59

Enhancements

  • Jobs page: Support the ability to filter by Job Status

Bug Fixes

  • Minor bug fixes

2023.11.27-01.59

2023.11.21-01.59

Enhancements

  • Added support for r7 instance types in Compute Clusters

Bug Fixes

  • Fixed a bug causing recent data from a partitioned table with a primary key to be discarded if the transformation job was filtering by at least one partition field

  • Fixed an issue preventing users from being able to create compute clusters

  • Fixed the replay cluster not being shut down in some situations

  • Fixed an issue of a job reading from a system table

  • Fixed high CPU usage in Amazon S3 outputs while viewing Datasets

2023.11.19-08.51

Enhancements

  • CDC Replication Group Jobs:

    • Support new write mode option OPERATION_TYPE_COLUMN for replication group APPEND write mode

Bug Fixes

  • Minor bug fixes

2023.11.14-13.42

Enhancements

  • New Failed (Retrying) job status: if a job is currently failing, for example because it encountered fatal errors that prevent or will prevent it from proceeding, its status changes to Failed (Retrying)

  • S3 With SQS: Limit the size of bulk reads from SQS to ensure data is distributed evenly

Bug Fixes

  • Fixed a bug on the jobs page where the Events Over Time graph shows the wrong range

  • Minor bug fixes

2023.11.07-12.30

Enhancements

  • Sidebar redesign and improvements

  • Introduce Datasets:

    • View outputs/destinations schema

    • Monitor data health, freshness, and volume when loaded into the destination

    • Monitor data quality violations

    • List all target tables

Bug Fixes

  • Filtered out the heartbeat table from the CDC status display for Postgres CDC

  • Fixed a bug causing loading of all tasks to fail if a task was created with a start time far in the future

  • SELECT a.* (without an alias) now behaves the same as SELECT a.* AS *: it extracts the nested fields directly from object a and returns them as separate columns instead of one object
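
The difference between returning object a as one column and expanding a.* into separate columns can be pictured with plain Python dictionaries. This is only a sketch of the resulting shape, not of how the query engine works; the function names and the sample row are hypothetical.

```python
def select_object(row, field):
    """Shape of selecting the object itself: one column holding the nested object."""
    return {field: row.get(field)}


def select_star_of(row, field):
    """Shape of selecting field.*: each nested field becomes its own column."""
    nested = row.get(field) or {}
    return dict(nested)


if __name__ == "__main__":
    row = {"a": {"x": 1, "y": 2}, "b": 3}
    print(select_object(row, "a"))
    print(select_star_of(row, "a"))
```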

2023.10.30-11.38

Enhancements

  • Added KEEP_REPLICATION_SLOT option to ingestion and replication jobs that read from PostgreSQL. This option preserves the replication slot that the job created after the job is dropped

Bug Fixes

  • Minor bug fixes

2023.10.25-08.55

Enhancements

  • Running queries can now be stopped via the UI with a new Cancel button next to a running query in the Event Log

  • Running queries in Upsolver's Query Engine can now be stopped using the following new SQL syntax: ABORT QUERY <query_id>

  • You can now see what queries are running in Upsolver's Query Engine by querying the system.monitoring.running_queries table

  • Improved performance during replay/initial-processing for outputs to Redshift

Bug Fixes

  • Fixed inconsistent catalog names in system.insights tables for Snowflake outputs

  • Fixed an issue where FLATTEN_PATH was not case-insensitive with columns from a SELECT * statement

2023.10.17-17.45

Bug Fixes

  • Jobs: Fixed an issue causing NULL values when casting raw columns to multiple different types in the same query

2023.10.11-11.26

Bug Fixes

  • Minor bug fixes

2023.10.04-10.37

Enhancements

  • Apache Kafka and Confluent Kafka connections: support setting SASL username and password with dedicated parameters SASL_USERNAME and SASL_PASSWORD

  • Support new function: REGEXP_EXTRACT

  • Allow the creation of a new cloud integration on a new organization

  • Ingestion wizard: now supports Confluent Cloud as a source

  • Ingestion jobs: now support Amazon S3 via SQS

  • UI design improvements

Bug Fixes

  • When editing the number of shards in an Apache Kafka output job, Upsolver now waits until the previous shards are completed before running the new shards

  • Fixed an issue causing Query Engine requests to time out

2023.09.13-12.16

2023.09.05-11.12

Enhancements

  • Support is now available for using an external Hive Metastore as a catalog

  • PostgreSQL CDC:

    • Tables that aren't included in the publication will not be part of the snapshot

  • Apache Kafka Jobs:

    • When copying data from Kafka topics, topic names are now treated as globs (a star matches any number of characters, and a question mark matches exactly one character)

  • Elasticsearch Jobs:

    • Write timestamp and date types as ISO-8601 strings in jobs that write to Elasticsearch

  • Support added for il-central-1 region. This region is currently only supported with private VPC deployments

  • Reduced the number of Amazon S3 API calls to lower S3 costs
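
The glob rules for Kafka topic names noted above (a star matches any number of characters, a question mark matches one) can be tried out with Python's standard fnmatch module. This is only an approximation of the matching behavior: fnmatch also supports bracket expressions, which the release note does not mention, and the topic names below are made up.

```python
from fnmatch import fnmatchcase

# Hypothetical topic names; illustration only.
TOPICS = ["orders", "orders_eu", "orders_us", "order1", "payments"]


def matching_topics(pattern, topics):
    """Return the topics whose full name matches the glob pattern (case-sensitive)."""
    return [topic for topic in topics if fnmatchcase(topic, pattern)]


if __name__ == "__main__":
    for pattern in ("orders_*", "order?", "orders*"):
        print(pattern, "->", matching_topics(pattern, TOPICS))
```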

Bug Fixes

  • Fixed synced transformation jobs not reading the respective data when their interval was smaller than that of one of the jobs writing to a source table

  • Minor bug fixes

2023.08.29-15.15

Enhancements

  • Snowflake Jobs:

    • SELECT * will preserve the original case of field names in variant columns

  • SQL: Allow altering EXPOSE_IN_CATALOG property in tables

  • Performance Improvement: Reduce the number of file operations when coordinating future table operations

  • Write Timestamp and Date types as ISO-8601 strings in jobs that write to Elasticsearch

  • Ingestion wizard:

2023.08.17-13.30

Enhancements

  • Write Timestamp and Date types as ISO-8601 strings in string output jobs, for example: job to Amazon S3 with format JSON/CSV

  • Write Timestamp and Date types as ISO-8601 in RECORD_TO_JSON function
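
For readers unfamiliar with the format, this is roughly what writing Timestamp and Date values as ISO-8601 strings looks like in a JSON output. The sketch uses Python's standard library; the record and the helper name are hypothetical, and the exact variant Upsolver emits (for example the UTC offset notation or fractional seconds) is not stated in these notes.

```python
import json
from datetime import date, datetime, timezone


def iso8601_default(value):
    """json.dumps fallback: serialize dates and timestamps as ISO-8601 strings."""
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


# Hypothetical record; illustration only.
RECORD = {
    "id": 1,
    "created_at": datetime(2023, 8, 17, 13, 30, tzinfo=timezone.utc),
    "day": date(2023, 8, 17),
}

if __name__ == "__main__":
    print(json.dumps(RECORD, default=iso8601_default))
```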

Bug Fixes

  • Performance improvements in CDC jobs

  • Performance improvements when querying the Upsolver Query Engine

  • Minor bug fixes

2023.08.16-07.42

Enhancements

  • Users can now omit the connection type when specifying a source or target in jobs (e.g. INSERT INTO S3 catalog LOCATION = '...' can be replaced with INSERT INTO catalog LOCATION = '...')

  • Improved the performance of CDC jobs reading from databases with a large number of tables

  • Querying the information schema tables now returns Jobs and Materialized Views in deleting state

  • Elasticsearch Jobs:

    • MERGE jobs now support deleting documents by using the WHEN MATCHED AND ... THEN DELETE syntax

  • Upgraded Avro and Parquet libraries to the latest versions

Bug Fixes

  • Major improvements when reading from a table with a large number of partitions

  • Minor bug fixes

2023.08.02-01.57

Bug Fixes

  • Minor bug fixes

2023.07.31-02.02

Bug Fixes

  • Minor bug fixes

2023.07.27-01.59

Bug Fixes

  • Snowflake Jobs:

    • Fixed an issue with a custom insert/update expression causing the job to fail if the field is also mapped in the select statement

    • On auto-managed tables, Upsolver will not create an extra column if one of the following conversions happens:

      • Original column is Double and got a value of type Long

      • Original column is Timestamp and got a value of type Date

      • Original column is Varchar

      • Original column is Variant

      In all other cases, Upsolver creates an extra column with the new type as the column name suffix. For example, if a column col was of type Bigint and got a Double value, Upsolver creates a column COL_DOUBLE in the Snowflake table

    • Fixed a delay in Materialized View on Job List/Index page
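
The column rule for auto-managed Snowflake tables described above can be restated as a small decision function. This is a sketch of the rule as written in the note, not Upsolver's implementation: the function name and type labels are hypothetical, and upper-casing the column name is an assumption drawn from the COL_DOUBLE example.

```python
# (original column type, incoming value type) pairs that reuse the existing column.
COMPATIBLE_PAIRS = {("DOUBLE", "LONG"), ("TIMESTAMP", "DATE")}
# Column types that accept any incoming value without a new column.
ABSORBING_TYPES = {"VARCHAR", "VARIANT"}


def target_column(name, column_type, value_type):
    """Return the column that receives a value of value_type."""
    column_type, value_type = column_type.upper(), value_type.upper()
    same_type = column_type == value_type
    absorbed = column_type in ABSORBING_TYPES
    compatible = (column_type, value_type) in COMPATIBLE_PAIRS
    if same_type or absorbed or compatible:
        return name.upper()
    # Otherwise an extra column is created with the new type as a suffix.
    return f"{name.upper()}_{value_type}"


if __name__ == "__main__":
    print(target_column("col", "Bigint", "Double"))
    print(target_column("col", "Double", "Long"))
```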

2023.07.19-02.34

Bug Fixes

  • Fixed tree on fields containing dots, e.g. turning {"a\.b": 1} to {"a.b": 1}.

  • Snowflake jobs: changed the file format to copy from Avro to JSON. This fixed an issue when ingesting records with sub-fields that have special characters.

2023.07.13-02.20

Enhancements

  • Added VALUE_INDEX_IN_ROW() - this function receives an element of an array of records and returns the 1-based index of the element position (incrementing regardless of whether the array is nested). Null values are not counted.

  • Added VALUE_INDEX_IN_ARRAY() - this function receives an element of an array of records and returns the 1-based index of the element position (index resets to 1 for each sub-array). Null values are not counted.

  • Ingestion wizard - support added for creating a heartbeat table within the wizard.
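
The difference between VALUE_INDEX_IN_ROW() and VALUE_INDEX_IN_ARRAY() described above is easiest to see on a nested array. The Python sketch below mirrors the wording of the note under two assumptions: that "null values are not counted" means nulls do not advance the index, and that the input is an array of sub-arrays. The function names and the sample data are hypothetical.

```python
def index_in_row(nested):
    """1-based index that keeps incrementing across sub-arrays; nulls are skipped."""
    result, counter = [], 0
    for sub_array in nested:
        for value in sub_array:
            if value is None:
                continue
            counter += 1
            result.append((value, counter))
    return result


def index_in_array(nested):
    """1-based index that resets to 1 for each sub-array; nulls are skipped."""
    result = []
    for sub_array in nested:
        counter = 0
        for value in sub_array:
            if value is None:
                continue
            counter += 1
            result.append((value, counter))
    return result


if __name__ == "__main__":
    data = [["a", None, "b"], ["c"]]
    print(index_in_row(data))
    print(index_in_array(data))
```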

Bug Fixes

  • Reduce the frequency of metadata queries to Snowflake in order to reduce the cost of COMPUTE SERVICES charged by Snowflake.

  • [BREAKING CHANGE] Fixed RECORD_TO_JSON on fields containing dots, e.g. turning {"a.b": 1} to {"a\.b": 1}.

2023.07.06-02.18

Bug Fixes

  • Minor bug fixes and improvements

2023.07.04-14.42

Enhancements

  • New UUID() function returns a unique identifier (UUID) string.

  • PostgreSQL CDC: ignore rows from the heartbeat table

  • Upgraded Debezium version from 2.1.3 to 2.2.1

  • Ingestion wizard:

    • Supports compute cluster input (in case the organization has more than one compute cluster)

    • Supports basic expectation

  • The cluster version appears in the UI on the clusters page

  • Snowflake table: Show Variant columns statistics on field level

  • Sign-out is now available from the main screen

Bug Fixes

  • Fixed the conversion of float to double to preserve the perceived semantic value in CDC sources and in data sources that get Avro or Parquet

  • Minor bug fixes and improvements

2023.06.26-03.44

Enhancements

  • Added $row_number system column to transformation jobs

  • [BREAKING CHANGE] Changed $row_number system field from 0-based to 1-based

  • Added $item_index system column, representing the source batch's row index. For example, in S3 sources, it will be the row index in a file

  • S3 outputs now support the inclusion of the shard number in the target path. This allows the use of output shards without overwriting the output files

  • User information and organization name now displayed on the main pages with the ability to switch between organizations

Bug Fixes

  • Fixed a bug in the IS_DUPLICATE function that caused wrong results when the job runs with an interval longer than 1 minute

  • Fixed a bug reading Avro and Parquet files that caused fields of type Date to be ignored

  • Minor bug fixes

2023.06.19-10.37

Bug Fixes

  • Fixed an issue reading from empty Kafka topics that contain empty partitions

  • Fixed a bug reading Avro files that use a named type more than once

  • Minor bug fixes

Enhancements

  • Snowflake table statistics are now available

2023.06.12-08.57

Bug Fixes

  • Snowflake Merge Jobs: enforce the ON clause expression to prevent creating an array

  • Minor bug fixes

2023.06.05-11.39

Bug Fixes

  • Job status page improvements

  • Minor bug fixes

Enhancements

  • CDC: PostgreSQL with partitioned tables - expose data.full_partition_table_name field specifying the name of the event's original partition

  • Error message improvements

2023.05.28-18.43

Bug Fixes

  • CASE WHEN now handles NULL as input and returns the ELSE value

  • CDC: Fixed a bug that caused decimal type columns to be ingested as base64-encoded binary strings

2023.05.17-13.45

Bug Fixes

  • Fixed COLUMN_TRANSFORMATIONS with dependencies between them creating the wrong name for the nested column

  • Fixed target name column value for Snowflake outputs in the system.information_schema.jobs table

Enhancements

  • Validate that the first parameter in an ARRAY_JOIN is not a literal

  • Ingestion wizard now supports Amazon Kinesis source

2023.05.15-02.23

Bug Fixes

  • Fixed the bug where TABLE_DATA_RETENTION could be disabled by disabling compactions

  • Fixed dropping a table with DELETE_DATA = true not deleting data files written by jobs with RUN_PARALLELISM > 1

  • Parquet Files are now distributed more evenly when ingesting data from Amazon S3 with high execution parallelism

  • Fixed a bug where selecting from large Materialized Views with predicates on key columns would return "Query exceeded input row limit"

  • Fixed a bug where a job reading from information_schema.columns does not write data into a table

  • Fixed a bug where querying system.monitoring.jobs can result in an error

  • BYTES_SUBSTRING position now starts from 1, like SUBSTRING (previously it started from 0)
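
To make the BYTES_SUBSTRING indexing change concrete, here is a minimal Python sketch of 1-based substring semantics on bytes. It is not the function's real signature: the helper name, the parameter order, and the use of a length as the third argument are all assumptions made for illustration.

```python
def substring_one_based(data: bytes, position: int, length: int) -> bytes:
    """Return `length` bytes starting at the 1-based `position`."""
    if position < 1:
        raise ValueError("position is 1-based and must be >= 1")
    start = position - 1  # convert to Python's 0-based offset
    return data[start:start + length]


if __name__ == "__main__":
    # Position 1 now refers to the first byte (previously this would have been position 0).
    print(substring_one_based(b"upsolver", 1, 3))
```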

Enhancements

  • New SQL syntax is now supported:

    • SHOW CREATE JOB "Job name"

    • SHOW CREATE TABLE "Table name"

    • SHOW CREATE MATERIALIZED VIEW "MV name"

    • SHOW CREATE CLUSTER "Cluster name"

  • PostgreSQL CDC: Support reading partitioned tables (PostgreSQL 14+) by the root table name instead of the underlying partition table names

  • Snowflake: Added query tag to queries executed by Upsolver for easier cost tracking

2023.05.04-07.39

Bug Fixes

  • Improved statistics in Job Status

  • Fixed an issue that prevented inviting a member to the organization

  • Prevented the creation of sync jobs that read from system tables

  • Fixed a bug in jobs when writing to Amazon S3 with a dynamic location

  • Fixed a bug that caused some columns to be missing when reading from a table

Enhancements

  • Support querying all system tables using the syntax: SELECT $*

  • Information Schema: added a type_evolution column to the system table system.information_schema.columns to show all previous types of the column

2023.04.27-07.52

Bug Fixes

  • Job Status page bug fixes

  • Improved error messages in the Ingestion Wizard

Enhancements

  • Added $event_date column to all transformation jobs that write to a Managed Upsolver table

  • SQL/AutoComplete: Show aggregation result fields

  • System Tables: added elastic IPs column to system.information_schema.clusters

2023.04.18-07.10

Bug Fixes

  • Fixed an issue collecting field statistics and metadata for large data files with a large number of unique field names

Enhancements

  • Snowflake output jobs now support SELECT *, with Upsolver creating and managing the Snowflake table.

    • CDC to Snowflake SELECT *: Support ingested JSONB as a variant

  • Allow syntax in JOB: START_FROM = NOW - INTERVAL '6' HOURS

  • Delete intermediate files after copy to Redshift

  • Copy From Features: Add a Deduplication option to the COPY FROM job

  • Added PARSE_JSON_COLUMNS option to CDC COPY FROM jobs. This will parse any JSON typed columns in the database as nested objects in the target table.

  • SQL/AutoComplete: Show aggregation result fields

  • Support partial flattening of arrays in jobs that write to Upsolver tables: FLATTEN_PATHS = (A)

  • Ingestion wizard - Easy Ingest to Snowflake:

    • Step-by-step wizard, no SQL, no data lake tables. Supports significant data quantities, streaming data, and strong ordering of data. Comes with deduplication and field hashing capabilities

  • Execution results and the event log are now stored outside the worksheet page, so users can return to the worksheet later and pick up where they left off

  • Job Status (beta)

2023.03.26-19.27

Bug Fixes

  • Fixed the system catalog name from System to system

  • AvroRegistry content type: Support URL encoded authentication information

  • Snowflake: Support keeping old values on partial updates

  • Fixed "deleting" entities showing up in information_schema tables

Enhancements

  • Show all JDBC jobs on the tree

  • Show "Staging Location" in inspection panel of S3 Copy From jobs with enabled DELETE_AFTER_LOAD option

  • Add editor shortcuts to increase/decrease the font size (CMD +/- on Mac)

  • S3 output file Type options (set delimiter for S3 outputs)

  • SQLake S3 output: Allow overwrite

  • Expose editor shortcuts in the help panel widget

  • Display the original file path in the "copy from job" info

  • Functions: New function: RECORD_TO_JSON

2023.03.15-10.04

Bug Fixes

  • JDBC Outputs: delete intermediate files after being written to the DB

Enhancements

  • The cluster catalog is now visible in the tree

  • Gather all system entities in the tree under a catalog named "System"

  • Improved AS OF syntax

  • Auto complete on Jobs

  • Daily usage graph and report are available

  • Ability to decide which query engine to use to run a select statement (Athena/Upsolver)

  • Support AS OF syntax

  • Information Schema: Add a table for columns

2023.03.09-13.48

Bug Fixes

  • JDBC Outputs: delete intermediate files after being written to the DB

Enhancements

  • Cluster catalog is now visible in the tree

  • Gather all system entities in the tree under a catalog named "System"

  • Improved AS OF syntax

  • Auto complete on Jobs

2023.02.26-15.40

Bug Fixes

  • Fixed Kafka batcher tasks getting stuck when reading with a wildcard topic and deleting all the topics in Kafka

Enhancements

  • Show information schema catalog on the tree

  • Auto Complete on Information Schema tables and columns

  • Allow creating jobs from Information Schema tables

  • Add support for Timestamp, Date, and Decimal types in CDC and AVRO sources

  • Added support for bigserial in Postgres outputs

  • Support EXCLUDE_COLUMNS for a COPY FROM (ingestion) job

  • Memory allocation optimizations in Lookup Table Query servers

2023.02.19-15.15

Bug Fixes

  • Fixed an issue where creating a Kafka Data Source with a glob pattern that doesn't match any topics would cause no response in the API.

Enhancements

  • Upgraded Debezium to version 2.1.2

  • Support transformation job to PostgreSQL

  • Expose security information within the app to allow easier AWS configuration to connect your own data

  • Memory allocation optimizations in Lookup Table Query servers

  • Allow altering the Materialized View COMMENT.

  • Display managed entities in the tree even when unable to connect to Athena.

  • Support ignoring fields in COPY FROM jobs by specifying the EXCLUDE_COLUMNS option.

2023.02.12-15.57

Bug Fixes

  • Fixed a rare issue that can cause duplicate data to be loaded into Redshift after copy failures

Enhancements

  • Use regional STS endpoints if available

  • The editor now indicates whether an executed statement succeeded or failed

  • Columns appear in the tree immediately after creating a transformation job

  • Support CAST expression in the language

  • Renamed function TO_LONG to TO_BIGINT

  • SELECT * now returns columns from joined Materialized View

2023.01.31-15.19

  • Enhancements

    • Add support for information_schema queries

    • Add support for SKIP_VALIDATION and SKIP_ALL_VALIDATION options.

      • DEPRECATION: ALLOW_EMPTY_SOURCES will be deprecated in favor of the new options.

    • Added validation to prevent explicitly mapping fields to output table columns that are defined with a different data type.

    • Bug Fixes

    • Add support for hierarchical system columns in the tree

2023.01.22-16.42

  • Enhancements

    • Support using non fully qualified names for tables and materialized views

    • Improve the error message when trying to create a table with the same name as an existing one

    • Support querying without a WHERE clause (Infinite Window)

    • S3 Output Job: Support splitting files into folders

    • Home page redesign

    • Cluster management tab in the UI

  • Bug Fixes

    • Alter Cluster: Fixed altering a property to null not setting its default value

    • Transformation Jobs: Fixed missing column mapping validations for partition and key columns

    • API: Support joining a materialized view with an array

2023.01.16-14.02

  • Enhancements

    • Jobs monitoring will now show materialized views

    • API: Joining with a materialized view no longer requires an alias for the materialized view

    • Auto completion for CREATE TABLE options

    • Added a new System Table jobs.transform_job_state that provides a summary status of all running transform jobs.

    • Remove RETENTION property from all transformation jobs

    • Improved query results tab

  • Bug Fixes

    • Fixed slow loading of Schemas under Athena connections in the tree

    • Job SQL Statement in Inspection Pane no longer omits parentheses when they're required

    • Monitoring: Fixed the 'job_name' of aggregation stages to be the original 'job_name' instead of "Output Aggregation". This means logs in the System Table 'logs.task_executions' will now have a correct 'job_name' for aggregation stages.

    • Fixed MAP_COLUMNS_BY_NAME being required for S3 and Elasticsearch targets, where it is not needed

2023.01.10-20.52

  • Enhancements

    • New system columns added: $source_id, $shard_number, $row_number

    • Support Time Travel in joins!

    • Support running a SELECT query without a FROM clause

    • Support running a SELECT query reading from an Upsolver Classic Data Source

    • Support running UNNEST queries

    • Support selecting columns by their fully qualified name (e.g. catalog.schema.table.column)

    • Support selecting system columns with glob patterns

  • Bug Fixes

    • Fixed slow replay progress for Snowflake and PostgreSQL outputs

2023.01.03-339

  • Enhancements

    • Event log improvements + Present informative diagram for copy/transform jobs

    • Pipeline Monitoring: Expose filtered rows due to missing PK or Partition Column

    • CLI: Show only a message when DDL commands succeed

    • Support using Classic Data Sources

2022.12.29-325

  • Enhancements

    • Inviting a user to an organization is now supported

    • Expose the column data type in the tree

    • Improved CLI experience

    • New SQLake templates added (CDC MySQL, CDC PostgreSQL, Elasticsearch, Snowflake, Redshift)

2022.12.18-235

  • Enhancements

    • Support Transformation Jobs to Elasticsearch

2022.12.15-638

  • Enhancements

    • PostgreSQL CDC: Moved TABLE_INCLUDE_LIST and COLUMN_EXCLUDE_LIST from job options to source definition

    • MySQL CDC: Moved TABLE_INCLUDE_LIST and COLUMN_EXCLUDE_LIST from job options to source definition

    • API token management

2022.12.05-201

  • Enhancements

    • Support Transformation Jobs to S3

    • Support Copy From PostgreSQL Jobs

    • New Home Page

    • Support private VPC integration

2022.11.29-164

  • Enhancements

    • New system tables: running_tasks, failing_tasks

    • Support Copy From MySQL Jobs

    • Support Transformation Jobs to Redshift

2022.11.17-118

  • Features

    • New system tables were added:

      • running_tasks

      • failing_tasks

      • copy_from_job_status

  • Changes

    • Preview is now limited to a fixed amount of input rows. Queries that are too large for preview will be aborted

  • Bug Fixes

    • Jobs: transformation jobs with an interval larger than one minute did not handle cases where the start time or end time of the job was not fully aligned with that interval
