LogoLogo
OverviewQuickstartsHow To GuidesReferenceArticlesSupport
Reference
Reference
  • Reference
  • ✨Learning Paths
    • Amazon Kinesis
    • Amazon S3
    • Apache Kafka
    • Confluent Cloud
    • Microsoft SQL Server
    • MongoDB
    • MySQL
    • PostgreSQL
  • SQL COMMANDS
    • Clusters
      • CREATE CLUSTER
      • ALTER CLUSTER
      • ROLL CLUSTER
      • STOP/START CLUSTER
      • DROP CLUSTER
    • Connections
      • CREATE CONNECTION
        • Amazon Kinesis
        • Amazon Redshift
        • Amazon S3
        • Apache Kafka
        • AWS Glue Data Catalog
        • ClickHouse
        • Confluent Cloud
        • Elasticsearch
        • Hive Metastore
        • Microsoft SQL Server
        • MongoDB
        • MySQL
        • Polaris Catalog
        • Iceberg REST catalogs
        • PostgreSQL
        • Snowflake
      • ALTER CONNECTION
      • DROP CONNECTION
      • CDC Connections with SSL
    • Jobs
      • CREATE JOB
        • Ingestion
          • Amazon Kinesis
          • Amazon S3
          • Apache Kafka
          • Confluent Kafka
          • Content Types
          • Microsoft SQL Server
          • MongoDB
          • MySQL
          • PostgreSQL
        • Replication
          • Microsoft SQL Server
          • MongoDB
          • MySQL
          • PostgreSQL
        • Transformation
          • INSERT
            • MAP_COLUMNS_BY_NAME
            • UNNEST
          • MERGE
          • SELECT
          • Job Options
            • Amazon Redshift
            • Amazon S3
            • Apache Iceberg
            • ClickHouse
            • Data Lake Tables
            • Elasticsearch
            • PostgreSQL
            • Snowflake
        • Monitoring
          • Amazon CloudWatch
          • Datadog
          • Dynatrace
      • ALTER JOB
      • PAUSE / RESUME JOB
      • DROP JOB
    • Materialized Views
      • CREATE MATERIALIZED VIEW
      • ALTER MATERIALIZED VIEW
      • DROP MATERIALIZED VIEW
      • Join with a Materialized View
    • Tables
      • CREATE TABLE
      • ALTER TABLE
      • DROP TABLE
    • Iceberg Tables
      • Upsolver Managed Tables
        • CREATE ICEBERG TABLE
        • ALTER ICEBERG TABLE
        • DROP ICEBERG TABLE
        • OPTIMIZE ICEBERG TABLE
      • External Iceberg Tables
        • CREATE EXTERNAL ICEBERG TABLE
        • DROP EXTERNAL ICEBERG TABLE
      • Mirror Iceberg Tables
        • CREATE MIRROR
        • ALTER MIRROR
        • PAUSE / RESUME MIRROR
        • DROP MIRROR
  • FUNCTIONS & OPERATORS
    • Data Types
    • Functions
      • Aggregate
        • APPROX_COUNT_DISTINCT
        • APPROX_COUNT_DISTINCT_EACH
        • AVG
        • AVG_EACH
        • AVG_TIME_SERIES
        • COLLECT_SET
        • COLLECT_SET_EACH
        • COUNT
        • COUNT(*)
        • COUNT(DISTINCT ...)
        • COUNT_EACH
        • COUNT_IF
        • DYNAMIC_SESSIONS
        • FIRST
        • FIRST_ARRAY
        • FIRST_EACH
        • FIRST_TIME_SERIES
        • LAST
        • LAST_ARRAY
        • LAST_EACH
        • LAST_K
        • LAST_K_EACH
        • LAST_TIME_SERIES
        • MAX
        • MAX_BY
        • MAX_EACH
        • MAX_TIME_SERIES
        • MIN
        • MIN_BY
        • MIN_EACH
        • MIN_TIME_SERIES
        • SESSION_COUNT
        • STD_DEV
        • STD_DEV_EACH
        • STRING_MAX
        • STRING_MAX_EACH
        • STRING_MIN_EACH
        • SUM
        • SUM_EACH
        • SUM_TIME_SERIES
        • WEIGHTED_AVERAGE
      • Array
        • ARRAY_DISTINCT
        • ARRAY_JOIN
        • ARRAY_MAX
        • ARRAY_MIN
        • ARRAY_SORT
        • ARRAY_SORT_DESC
        • ARRAY_SUM
        • COUNT_VALUES_IF
        • COUNT_VALUES
        • ELEMENT_AT
        • FIRST_ELEMENT
        • LAST_ELEMENT
        • VALUE_INDEX_IN_ARRAY
        • VALUE_INDEX_IN_ROW
      • Comparison
        • GREATEST
        • LEAST
      • Conditional
        • COALESCE
        • IF_ELSE
        • NULL_IF
      • Date & Time
        • ADD_TIME_ZONE_OFFSET
        • DATE
        • DATE_ADD
        • DATE_DIFF
        • DATE_TRUNC
        • DAY
        • DAY_OF_WEEK
        • DAY_OF_YEAR
        • EXTRACT_TIMESTAMP
        • EXTRACT
        • FORMAT_DATETIME
        • FROM_ISO8601_DATE
        • FROM_UNIXTIME
        • HOUR
        • MILLISECOND
        • MINUTE
        • MONTH
        • QUARTER
        • RUN_END_TIME
        • RUN_START_TIME
        • SECOND
        • SUBTRACT_TIME_ZONE_OFFSET
        • TO_UNIX_EPOCH_MILLIS
        • TO_UNIX_EPOCH_SECONDS
        • TO_UNIXTIME
        • WEEK
        • YEAR_OF_WEEK
        • YEAR
      • Filter
        • IS_DUPLICATE
        • NOT
      • Interval
        • PARSE_DURATION
      • Mathematical
        • ABS
        • CBRT
        • CEIL
        • CEILING
        • DEGREES
        • EXP
        • FLOOR
        • GET_SHARD_NUMBER
        • LN
        • LOG
        • LOG2
        • LOG10
        • MOD
        • MODULO
        • POW
        • POWER
        • RADIANS
        • RAND
        • RANDOM
        • RECIPROCAL
        • ROUND
        • SIGN
        • SORT_VALUES
        • SQRT
        • TRUNCATE
      • Regular Expressions
        • REGEXP_EXTRACT
        • REGEXP_EXTRACT_ALL
        • REGEXP_LIKE
        • REGEX_MATCH_POSITION
        • REGEX_NAMED_GROUPS
        • REGEXP_REPLACE
      • Spatial
        • ST_DISTANCE
        • ST_WGS84_DISTANCE
        • WKT_SPATIAL_CONTAINS
        • WKT_SPATIAL_INTERSECT
      • String
        • BASE64_DECODE
        • BASE64_TO_HEX
        • BYTES_SUBSTRING
        • CONCAT
        • DATE
        • JOIN_ARRAYS
        • LENGTH
        • LOWER
        • LPAD
        • LTRIM
        • MD5
        • PARSE_DATETIME
        • REPLACE
        • REVERSE
        • RPAD
        • RTRIM
        • SHA1
        • SHA3_512
        • SHA256
        • SHA512
        • SORT_VALUES
        • SPLIT
        • SPLIT_TO_RECORD
        • STRING_FORMAT
        • STRIP_MARGIN
        • STRIP_PREFIX
        • STRIP_SUFFIX
        • STRPOS
        • SUBSTR
        • SUBSTRING
        • TRANSLATE
        • TRIM_CHARS
        • TRIM
        • UPPER
        • UUID_GENERATOR
        • XX_HASH
      • Structural
        • FROM_KEY_VALUE
        • GET_RANGE
        • JOIN_ALL_BY_KEY
        • JSON_PATH
        • JSON_TO_RECORD
        • MAP_WITH_INDEX
        • QUERY_STRING_TO_RECORD
        • RECORD_TO_JSON
        • SORT_BY
        • TO_ARRAY
        • ZIP_WITH_INDEX
        • ZIP
      • Trigonometric
        • COS
        • SIN
        • TAN
        • TANH
      • Type Conversion
        • CAST
        • CHR
        • DECIMAL_TO_HEX
        • HEX_TO_DECIMAL
        • TO_BIGINT
        • TO_DOUBLE
        • TO_STRING
      • URL
        • TOP_PRIVATE_DOMAIN
        • URL_DECODE
        • URL_ENCODE
        • URL_PARSER
    • Operators
      • Comparison
      • Conditional
        • CASE
      • Logical
      • Mathematical
      • String
  • MONITORING
    • Clusters
    • Datasets
      • Ingested Data
        • Column
      • Lineage
      • Data Violations
      • Statistics
      • Maintenance
        • Compactions
        • Expire Snapshots
        • Orphan Files
      • Columns
      • Partitions
      • Properties
    • Job Status
      • Stream & File Sources
        • Monitoring
        • Graphs
        • Lineage
        • Settings
      • CDC Sources
        • Monitoring
        • Replication Settings
        • Job Settings
    • System Catalog
      • Information Schema
        • Clusters
        • Columns
        • Connections
        • Jobs
        • Mirrors
        • Tables
        • Users
        • Views
      • Insights
        • job_output_column_stats
        • dataset_column_stats
      • Monitoring
        • CDC Status
        • Clusters
        • Expectations
        • Jobs
        • Partition Statistics
        • Recent Compactions
        • Running Queries
        • Table Statistics
      • Task Executions Table
  • GENERAL
    • Common SQL Syntax
    • View Entity Syntax
    • Keyboard Shortcuts
Powered by GitBook
On this page
  • Cluster Types
  • Compute Cluster
  • Query Cluster
  • Syntax
  • Cluster Options
  1. SQL COMMANDS
  2. Clusters

CREATE CLUSTER

Cluster Types

Compute Cluster

A compute cluster is a group of servers that is used to process and maintain the data that is held in tables and transformed or loaded in jobs.

Jobs are executed on compute clusters. You can create multiple compute clusters with different settings to separate different use cases and workloads.

Query Cluster

A query cluster is used for querying materialized views in real time. Query clusters host the materialized views in memory for millisecond response times.

Syntax

CREATE { COMPUTE | QUERY } CLUSTER <cluster_name>
  MIN_INSTANCES = <min_instance_count>
  MAX_INSTANCES = <max_instance_count>
  [ ALLOW_MAINTENANCE_ACCESS = { TRUE | FALSE } ]
  [ COMMENT = '<comment' ]
  [ INSTANCE_TYPE_FAMILY = '<instance_type_family>' ]
  [ ON_DEMAND_INSTANCES = <on_demand_instance_count> ]
  [ MAX_REPLAY_INSTANCES = <max_replay_instance_count> ]
  [ SCALING_STRATEGY = { 
      LOW_COST | LOW_LATENCY | CONSISTENT_LOW_LATENCY | NO_SCALING } ]
  [ STATIC_PUBLIC_IPS = <number_of_static_ips> ]
  [ VPC_CONNECTION = <connection_name> ]

Cluster Options

MIN_INSTANCES — editable

Type: int

The lower limit for the number of instances to use when autoscaling the cluster.

MAX_INSTANCES — editable

Type: int

The upper limit for the number of instances to use when autoscaling the cluster.

ALLOW_MAINTENANCE_ACCESS — editable

Type: Boolean

(Optional) When enabled it will allow Upsolver Support access to the instances for debugging purposes. This only enables SSH access. Network access is still controlled by the network settings of the VPC in which the cluster is running.

Defaults to FALSE.

COMMENT — editable

Type: text

(Optional) Adds a comment or description to the cluster for documentation or clarification purposes.

INSTANCE_TYPE_FAMILY — editable

Type: int

(Optional) The instance type family to use. Upsolver uses similar instance types from the same family and size to achieve better spot instance availability and server uptime. For example, if r5.xlarge is configured then both r5.xlarge and r5d.xlarge will be used.

Defaults to 'r5.xlarge'

MAX_REPLAY_INSTANCES — editable

Type: int

(Optional) Replay Instances are instances that are started automatically when new jobs are added to the cluster. They are used to process the backlog of data without interfering with the latency of production workloads already running on the cluster. After the new jobs are up-to-date, the replay instances will automatically be shut down and the job moved back to the main instances.

Replay cluster instances operate independently and have different IP addresses than the main cluster. Usually, tasks that write to external resources such as databases are executed on the main cluster to maintain IP whitelisting. If you run into access issues, consider either turning off the replay cluster or using a separate, dedicated cluster.

ON_DEMAND_INSTANCES — editable

Type: int

(Optional) Specifies the minimum number of guaranteed on-demand instances allocated to the cluster. While this sets a floor, there may sometimes be more on-demand instances if there's low spot availability.

By default, Upsolver uses ephemeral Spot Instances for data processing. Setting this value allows you to configure how many on-demand Instances are within the cluster.

SCALING_STRATEGY — editable

Type: Enum

(Optional) Determines the scaling strategy of the cluster. Options are:

  • LOW_COST: Prioritize cost savings.

  • LOW_LATENCY: Prioritize reducing latency.

  • CONSISTENT_LOW_LATENCY: Maintain consistently low latency.

  • NO_SCALING: No scaling, static number of instances.

Defaults to LOW_LATENCY.

STATIC_PUBLIC_IPS — editable

Type: int

(Optional) The number of static IPs to create and associate with the cluster's instances. This parameter is useful when the cluster needs to connect to external systems that are behind a firewall. Configuring static IPs will allow the opening of those IPs in the target system's firewall.

The value is the number of static IPs to use and can be lower than the total number of instances in the cluster. If configured this way, the instances that get the static IP will be used to make requests to external resources.

VPC_CONNECTION

Type: identifier

(Optional) Use the VPC_CONNECTION option when you have multiple VPC connections for the same organization, otherwise running this command results in an error.

Last updated 9 months ago

MIN_INSTANCES
MAX_INSTANCES
ALLOW_MAINTENANCE_ACCESS
COMMENT
INSTANT_TYPE_FAMILY
MAX_REPLAY_INSTANCES
ON_DEMAND_INSTANCES
SCALING_STRATEGY
STATIC_PUBLIC_IPS
VPC_CONNECTION