AWS Glue Data Catalog

This page describes how to create and maintain connections to your AWS Glue Data Catalog.

Last updated 12 months ago

An AWS Glue Data Catalog connection is a metadata store connection, which you need in order to create Upsolver-managed tables. When you create a table in Upsolver using an AWS Glue Data Catalog connection, its underlying files are stored in Amazon S3 and a pointer to the table is created in your AWS Glue Data Catalog.

AWS Glue Data Catalog connections also double as Athena connections in Upsolver. If you want to write your transformed data into an Athena table, first make sure you have an AWS Glue Data Catalog connection with credentials that allow writing to your intended location.

Note that an AWS Glue Data Catalog connection is created by default when you deploy Upsolver on your AWS account.

See Deploy Upsolver on AWS for more information.

Create an AWS Glue Data Catalog connection

Simple example

An AWS Glue Data Catalog connection can be created as follows:

CREATE GLUE_CATALOG CONNECTION my_glue_catalog_connection
    DEFAULT_STORAGE_CONNECTION = my_s3_storage_connection
    DEFAULT_STORAGE_LOCATION = 's3://sqlake/my_glue_catalog_table_files/';

Note that the connection in this example is created based on the default credentials derived from Upsolver's integration with your AWS account.

Additionally, you need an Amazon S3 connection with write permissions, referenced by the DEFAULT_STORAGE_CONNECTION option, in order to create any AWS Glue Data Catalog connection.

Full example

The following example also creates an AWS Glue Data Catalog connection but additionally configures credentials by providing a specific role:

CREATE GLUE_CATALOG CONNECTION my_glue_catalog_connection
    AWS_ROLE = 'arn:aws:iam::123456789012:role/upsolver-sqlake-role'
    DEFAULT_STORAGE_CONNECTION = my_s3_storage_connection
    DEFAULT_STORAGE_LOCATION = 's3://sqlake/my_glue_catalog_table_files/'
    REGION = 'us-east-1'
    DATABASE_DISPLAY_FILTERS = ('demo_db', 'prod_db')
    COMMENT = 'glue catalog connection example';

To establish a connection with specific permissions, either configure the AWS_ROLE option, as in the example above, optionally together with the EXTERNAL_ID option, or configure the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY options to provide the credentials for working with your AWS Glue Data Catalog.
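As a sketch of the two credential styles, the statements below use only the options named above. The connection names, role ARN, external ID, and key values are hypothetical placeholders; substitute your own and check the SQL command reference for the exact requirements of each option.

```sql
-- Option 1: assume an IAM role, optionally with an external ID
-- (the ARN and external ID are placeholders)
CREATE GLUE_CATALOG CONNECTION my_glue_role_connection
    AWS_ROLE = 'arn:aws:iam::123456789012:role/upsolver-sqlake-role'
    EXTERNAL_ID = 'my-external-id'
    DEFAULT_STORAGE_CONNECTION = my_s3_storage_connection
    DEFAULT_STORAGE_LOCATION = 's3://sqlake/my_glue_catalog_table_files/';

-- Option 2: authenticate with an access key pair instead of a role
-- (the key values are placeholders; never commit real keys to source control)
CREATE GLUE_CATALOG CONNECTION my_glue_key_connection
    AWS_ACCESS_KEY_ID = 'my-access-key-id'
    AWS_SECRET_ACCESS_KEY = 'my-secret-access-key'
    DEFAULT_STORAGE_CONNECTION = my_s3_storage_connection
    DEFAULT_STORAGE_LOCATION = 's3://sqlake/my_glue_catalog_table_files/';
```

Where possible, a role is usually preferable to long-lived access keys, because AWS issues temporary credentials when the role is assumed.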

Additionally, the REGION option can be used to provide the region of your AWS Glue Data Catalog.

You can also limit the databases displayed within your catalog by listing their names in the DATABASE_DISPLAY_FILTER[S] option (spelled either DATABASE_DISPLAY_FILTER or DATABASE_DISPLAY_FILTERS).

Finally, use the COMMENT option to add a description to your connection.

Alter a Glue Catalog connection

Some connection options are mutable, so you can often alter an existing AWS Glue Data Catalog connection with a SQL command instead of creating a new one.

For example, take the AWS Glue Data Catalog connection we created previously, based on default credentials:

CREATE GLUE_CATALOG CONNECTION my_glue_catalog_connection
    DEFAULT_STORAGE_CONNECTION = my_s3_storage_connection
    DEFAULT_STORAGE_LOCATION = 's3://sqlake/my_glue_catalog_table_files/';

To change the connection's permissions and keep everything else the same without creating a new connection, you can run the following command:

ALTER GLUE_CATALOG CONNECTION my_glue_catalog_connection
    SET AWS_ROLE = 'arn:aws:iam::123456789012:role/new-sqlake-role'; 

Note that some options such as REGION cannot be altered once the connection has been created.
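Because REGION is fixed at creation time, one way to work with a Glue Data Catalog in a different region is to create a separate connection for it. The sketch below assumes a hypothetical connection name (my_glue_catalog_connection_eu), the eu-west-1 region, and that my_s3_storage_connection can write to the hypothetical storage path shown.

```sql
-- REGION cannot be changed with ALTER, so define a second connection
-- (hypothetical connection name, region, and storage path)
CREATE GLUE_CATALOG CONNECTION my_glue_catalog_connection_eu
    DEFAULT_STORAGE_CONNECTION = my_s3_storage_connection
    DEFAULT_STORAGE_LOCATION = 's3://sqlake/my_glue_catalog_table_files_eu/'
    REGION = 'eu-west-1';
```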

Drop a Glue Catalog connection

If you no longer need a connection, you can easily drop it with the following SQL command:

DROP CONNECTION my_glue_catalog_connection; 

However, be aware that the connection cannot be deleted if existing tables or jobs depend upon the connection.
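If tables or jobs still depend on the connection, one approach is to drop them first and then drop the connection. The sketch below assumes a hypothetical job (my_ingestion_job) that writes to a hypothetical table (my_db.my_table) in this catalog. Confirm the exact DROP JOB and DROP TABLE options in the SQL command reference before running it, because dropping jobs and tables can be destructive.

```sql
-- 1. Drop the job that writes into the table (hypothetical job name)
DROP JOB my_ingestion_job;

-- 2. Drop the table registered through this connection
--    (hypothetical identifier in <connection>.<database>.<table> form)
DROP TABLE my_glue_catalog_connection.my_db.my_table;

-- 3. With no remaining dependents, the connection can be dropped
DROP CONNECTION my_glue_catalog_connection;
```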


Learn More

To discover which connection options are mutable and to learn more about each option, see the SQL command reference for AWS Glue Data Catalog.
