LogoLogo
OverviewQuickstartsHow To GuidesReferenceArticlesSupport
Reference
Reference
  • Reference
  • ✨Learning Paths
    • Amazon Kinesis
    • Amazon S3
    • Apache Kafka
    • Confluent Cloud
    • Microsoft SQL Server
    • MongoDB
    • MySQL
    • PostgreSQL
  • SQL COMMANDS
    • Clusters
      • CREATE CLUSTER
      • ALTER CLUSTER
      • ROLL CLUSTER
      • STOP/START CLUSTER
      • DROP CLUSTER
    • Connections
      • CREATE CONNECTION
        • Amazon Kinesis
        • Amazon Redshift
        • Amazon S3
        • Apache Kafka
        • AWS Glue Data Catalog
        • ClickHouse
        • Confluent Cloud
        • Elasticsearch
        • Hive Metastore
        • Microsoft SQL Server
        • MongoDB
        • MySQL
        • Polaris Catalog
        • Iceberg REST catalogs
        • PostgreSQL
        • Snowflake
      • ALTER CONNECTION
      • DROP CONNECTION
      • CDC Connections with SSL
    • Jobs
      • CREATE JOB
        • Ingestion
          • Amazon Kinesis
          • Amazon S3
          • Apache Kafka
          • Confluent Kafka
          • Content Types
          • Microsoft SQL Server
          • MongoDB
          • MySQL
          • PostgreSQL
        • Replication
          • Microsoft SQL Server
          • MongoDB
          • MySQL
          • PostgreSQL
        • Transformation
          • INSERT
            • MAP_COLUMNS_BY_NAME
            • UNNEST
          • MERGE
          • SELECT
          • Job Options
            • Amazon Redshift
            • Amazon S3
            • Apache Iceberg
            • ClickHouse
            • Data Lake Tables
            • Elasticsearch
            • PostgreSQL
            • Snowflake
        • Monitoring
          • Amazon CloudWatch
          • Datadog
          • Dynatrace
      • ALTER JOB
      • PAUSE / RESUME JOB
      • DROP JOB
    • Materialized Views
      • CREATE MATERIALIZED VIEW
      • ALTER MATERIALIZED VIEW
      • DROP MATERIALIZED VIEW
      • Join with a Materialized View
    • Tables
      • CREATE TABLE
      • ALTER TABLE
      • DROP TABLE
    • Iceberg Tables
      • Upsolver Managed Tables
        • CREATE ICEBERG TABLE
        • ALTER ICEBERG TABLE
        • DROP ICEBERG TABLE
        • OPTIMIZE ICEBERG TABLE
      • External Iceberg Tables
        • CREATE EXTERNAL ICEBERG TABLE
        • DROP EXTERNAL ICEBERG TABLE
      • Mirror Iceberg Tables
        • CREATE MIRROR
        • ALTER MIRROR
        • PAUSE / RESUME MIRROR
        • DROP MIRROR
  • FUNCTIONS & OPERATORS
    • Data Types
    • Functions
      • Aggregate
        • APPROX_COUNT_DISTINCT
        • APPROX_COUNT_DISTINCT_EACH
        • AVG
        • AVG_EACH
        • AVG_TIME_SERIES
        • COLLECT_SET
        • COLLECT_SET_EACH
        • COUNT
        • COUNT(*)
        • COUNT(DISTINCT ...)
        • COUNT_EACH
        • COUNT_IF
        • DYNAMIC_SESSIONS
        • FIRST
        • FIRST_ARRAY
        • FIRST_EACH
        • FIRST_TIME_SERIES
        • LAST
        • LAST_ARRAY
        • LAST_EACH
        • LAST_K
        • LAST_K_EACH
        • LAST_TIME_SERIES
        • MAX
        • MAX_BY
        • MAX_EACH
        • MAX_TIME_SERIES
        • MIN
        • MIN_BY
        • MIN_EACH
        • MIN_TIME_SERIES
        • SESSION_COUNT
        • STD_DEV
        • STD_DEV_EACH
        • STRING_MAX
        • STRING_MAX_EACH
        • STRING_MIN_EACH
        • SUM
        • SUM_EACH
        • SUM_TIME_SERIES
        • WEIGHTED_AVERAGE
      • Array
        • ARRAY_DISTINCT
        • ARRAY_JOIN
        • ARRAY_MAX
        • ARRAY_MIN
        • ARRAY_SORT
        • ARRAY_SORT_DESC
        • ARRAY_SUM
        • COUNT_VALUES_IF
        • COUNT_VALUES
        • ELEMENT_AT
        • FIRST_ELEMENT
        • LAST_ELEMENT
        • VALUE_INDEX_IN_ARRAY
        • VALUE_INDEX_IN_ROW
      • Comparison
        • GREATEST
        • LEAST
      • Conditional
        • COALESCE
        • IF_ELSE
        • NULL_IF
      • Date & Time
        • ADD_TIME_ZONE_OFFSET
        • DATE
        • DATE_ADD
        • DATE_DIFF
        • DATE_TRUNC
        • DAY
        • DAY_OF_WEEK
        • DAY_OF_YEAR
        • EXTRACT_TIMESTAMP
        • EXTRACT
        • FORMAT_DATETIME
        • FROM_ISO8601_DATE
        • FROM_UNIXTIME
        • HOUR
        • MILLISECOND
        • MINUTE
        • MONTH
        • QUARTER
        • RUN_END_TIME
        • RUN_START_TIME
        • SECOND
        • SUBTRACT_TIME_ZONE_OFFSET
        • TO_UNIX_EPOCH_MILLIS
        • TO_UNIX_EPOCH_SECONDS
        • TO_UNIXTIME
        • WEEK
        • YEAR_OF_WEEK
        • YEAR
      • Filter
        • IS_DUPLICATE
        • NOT
      • Interval
        • PARSE_DURATION
      • Mathematical
        • ABS
        • CBRT
        • CEIL
        • CEILING
        • DEGREES
        • EXP
        • FLOOR
        • GET_SHARD_NUMBER
        • LN
        • LOG
        • LOG2
        • LOG10
        • MOD
        • MODULO
        • POW
        • POWER
        • RADIANS
        • RAND
        • RANDOM
        • RECIPROCAL
        • ROUND
        • SIGN
        • SORT_VALUES
        • SQRT
        • TRUNCATE
      • Regular Expressions
        • REGEXP_EXTRACT
        • REGEXP_EXTRACT_ALL
        • REGEXP_LIKE
        • REGEX_MATCH_POSITION
        • REGEX_NAMED_GROUPS
        • REGEXP_REPLACE
      • Spatial
        • ST_DISTANCE
        • ST_WGS84_DISTANCE
        • WKT_SPATIAL_CONTAINS
        • WKT_SPATIAL_INTERSECT
      • String
        • BASE64_DECODE
        • BASE64_TO_HEX
        • BYTES_SUBSTRING
        • CONCAT
        • DATE
        • JOIN_ARRAYS
        • LENGTH
        • LOWER
        • LPAD
        • LTRIM
        • MD5
        • PARSE_DATETIME
        • REPLACE
        • REVERSE
        • RPAD
        • RTRIM
        • SHA1
        • SHA3_512
        • SHA256
        • SHA512
        • SORT_VALUES
        • SPLIT
        • SPLIT_TO_RECORD
        • STRING_FORMAT
        • STRIP_MARGIN
        • STRIP_PREFIX
        • STRIP_SUFFIX
        • STRPOS
        • SUBSTR
        • SUBSTRING
        • TRANSLATE
        • TRIM_CHARS
        • TRIM
        • UPPER
        • UUID_GENERATOR
        • XX_HASH
      • Structural
        • FROM_KEY_VALUE
        • GET_RANGE
        • JOIN_ALL_BY_KEY
        • JSON_PATH
        • JSON_TO_RECORD
        • MAP_WITH_INDEX
        • QUERY_STRING_TO_RECORD
        • RECORD_TO_JSON
        • SORT_BY
        • TO_ARRAY
        • ZIP_WITH_INDEX
        • ZIP
      • Trigonometric
        • COS
        • SIN
        • TAN
        • TANH
      • Type Conversion
        • CAST
        • CHR
        • DECIMAL_TO_HEX
        • HEX_TO_DECIMAL
        • TO_BIGINT
        • TO_DOUBLE
        • TO_STRING
      • URL
        • TOP_PRIVATE_DOMAIN
        • URL_DECODE
        • URL_ENCODE
        • URL_PARSER
    • Operators
      • Comparison
      • Conditional
        • CASE
      • Logical
      • Mathematical
      • String
  • MONITORING
    • Clusters
    • Datasets
      • Ingested Data
        • Column
      • Lineage
      • Data Violations
      • Statistics
      • Maintenance
        • Compactions
        • Expire Snapshots
        • Orphan Files
      • Columns
      • Partitions
      • Properties
    • Job Status
      • Stream & File Sources
        • Monitoring
        • Graphs
        • Lineage
        • Settings
      • CDC Sources
        • Monitoring
        • Replication Settings
        • Job Settings
    • System Catalog
      • Information Schema
        • Clusters
        • Columns
        • Connections
        • Jobs
        • Mirrors
        • Tables
        • Users
        • Views
      • Insights
        • job_output_column_stats
        • dataset_column_stats
      • Monitoring
        • CDC Status
        • Clusters
        • Expectations
        • Jobs
        • Partition Statistics
        • Recent Compactions
        • Running Queries
        • Table Statistics
      • Task Executions Table
  • GENERAL
    • Common SQL Syntax
    • View Entity Syntax
    • Keyboard Shortcuts
Powered by GitBook
On this page
  • Syntax
  • Connection options
  • Minimum example
  • Full example
  1. SQL COMMANDS
  2. Connections
  3. CREATE CONNECTION

AWS Glue Data Catalog

Last updated 11 months ago

In order to create and work with data lake tables and Apache Iceberg tables within Upsolver, you first need to establish a connection with a metadata store such as AWS Glue Data Catalog.

The tables created from your AWS Glue Data Catalog connection can also be queried from Athena and from within the Upsolver UI.

Note that when you integrate Upsolver with your AWS account, there is a Glue Data Catalog connection created by default, but you may still wish to create your own connection for specific access configurations.

See the how-to guide to .

Syntax

CREATE GLUE_CATALOG CONNECTION <connection_identifier> 
 [ { AWS_ROLE = '<role_arn>' 
    EXTERNAL_ID = '<external_id>'
  | AWS_ACCESS_KEY_ID = '<key_id>' 
    AWS_SECRET_ACCESS_KEY = '<key>' } ]
 DEFAULT_STORAGE_CONNECTION = <identifier>
 DEFAULT_STORAGE_LOCATION = 's3://<bucket>/<folder-path>/'
 [ REGION = '<region>' ]  
 [ DATABASE_DISPLAY_FILTER[S] = { '<database_name>' | ('<database_name>' [, ...]) } ]
 [ COMMENT = '<comment>' ] 

Jump to

  • AWS_ROLE

  • EXTERNAL_ID

  • AWS_ACCESS_KEY_ID

  • AWS_SECRET_ACCESS_KEY

  • DEFAULT_STORAGE_CONNECTION

  • DEFAULT_STORAGE_LOCATION

  • DATABASE_DISPLAY_FILTER[S]

  • COMMENT

Connection options

AWS_ROLE — editable

Type: text

(Optional) The AWS IAM role ARN. Used in conjunction with EXTERNAL_ID.

If omitted, the role created when integrating Upsolver with the AWS account is used.

EXTERNAL_ID — editable

Type: text

(Optional) The external ID of the role to assume. Used in conjunction with AWS_ROLE.

If omitted, the role created when integrating Upsolver with the AWS account is used.

AWS_ACCESS_KEY_ID — editable

Type: text

(Optional) The AWS access key ID. Used in conjunction with AWS_SECRET_ACCESS_KEY.

If omitted, the role created when integrating Upsolver with the AWS account is used.

AWS_SECRET_ACCESS_KEY — editable

Type: text

(Optional) The AWS secret key corresponding to the provided AWS_ACCESS_KEY_ID.

If omitted, the role created when integrating Upsolver with the AWS account is used.

DEFAULT_STORAGE_CONNECTION

Type: identifier

An Amazon S3 connection with the appropriate credentials to write to the DEFAULT_STORAGE_LOCATION provided.

DEFAULT_STORAGE_LOCATION

Type: text

The Amazon S3 path that serves as the default storage location for the underlying files associated with tables created under this metastore connection.

REGION

Type: text

Default: Region in which Upsolver is deployed within your AWS account

(Optional) The region your Glue Catalog is in.

DATABASE_DISPLAY_FILTER[S] — editable

Type: text | list

(Optional) A single database or the list of databases to show. If left empty, all databases are visible.

COMMENT — editable

Type: text

(Optional) A description or comment regarding this connection.

Minimum example

CREATE GLUE_CATALOG CONNECTION my_glue_catalog_connection
    DEFAULT_STORAGE_CONNECTION = my_s3_storage_connection
    DEFAULT_STORAGE_LOCATION = 's3://sqlake/my_glue_catalog_table_files/';

This example uses the default credentials from Upsolver's integration with AWS.

Additionally, this example assumes that you have created the Amazon S3 connection my_s3_storage_connection with proper write permissions to the specified storage location.

Refer to Amazon S3 for more information on creating a connection using SQL.

Full example

CREATE GLUE_CATALOG CONNECTION my_glue_catalog_connection
    AWS_ROLE = 'arn:aws:iam::123456789012:role/upsolver-sqlake-role'
    DEFAULT_STORAGE_CONNECTION = my_s3_storage_connection
    DEFAULT_STORAGE_LOCATION = 's3://sqlake/my_glue_catalog_table_files/'
    REGION = 'us-east-1'
    DATABASE_DISPLAY_FILTERS = ('demo_db', 'prod_db')
    COMMENT = 'glue catalog connection example';

See the how-to guide to .

Deploy Upsolver on AWS
Deploy Upsolver on AWS