AWS Glue Data Catalog
This page describes how to create and maintain connections to your AWS Glue Data Catalog.
A Glue Catalog connection is a metadata store connection, which is required to create Upsolver-managed tables.
When a table is created in Upsolver using a Glue Catalog connection, its underlying files are stored in Amazon S3 and a pointer to the table is created in your Glue Catalog.
To learn more about tables in Upsolver, see Tables.
Glue Catalog connections also double as Athena connections in Upsolver. If your goal is to write transformed data into an Athena table, first ensure that you have a Glue Catalog connection whose credentials allow writing to your intended location.
Note that a Glue Catalog connection is created by default when you deploy Upsolver on your AWS account.
Create a Glue Catalog connection
Simple example
A Glue Catalog connection can be created as follows:
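A minimal sketch of such a statement, assuming a placeholder connection name `my_glue_catalog_connection`; confirm the exact syntax against the Glue Catalog connection SQL reference:

```sql
-- Sketch only: my_glue_catalog_connection is a placeholder name.
-- No credential options are given, so the default credentials apply.
CREATE GLUE_CATALOG CONNECTION my_glue_catalog_connection;
```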
Note that the connection in this example is created based on the default credentials derived from Upsolver's integration with your AWS account.
Additionally, you need an Amazon S3 connection with write permissions in order to create any Glue Catalog connection.
See: Connect to your Amazon S3 bucket
Full example
The following example also creates a Glue Catalog connection but additionally configures credentials by providing a specific role:
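A sketch of what such a statement might look like, using only the options named on this page. The connection name, role ARN, external ID, region, database names, and comment are all placeholder values, and the exact option syntax should be checked against the Glue Catalog connection SQL reference:

```sql
-- Sketch only: every value below is a placeholder, not a real resource.
CREATE GLUE_CATALOG CONNECTION my_glue_catalog_connection
    -- Role-based credentials for accessing the Glue Catalog
    AWS_ROLE = 'arn:aws:iam::111111111111:role/<upsolver-role-*>'
    EXTERNAL_ID = '12345678'
    -- Region of the Glue Catalog
    REGION = 'us-east-1'
    -- Limit which databases are displayed for this catalog
    DATABASE_DISPLAY_FILTERS = ('database1', 'database2')
    -- Free-text description of the connection
    COMMENT = 'my glue catalog connection';
```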
To establish a connection with specific permissions, you can configure the AWS_ROLE and EXTERNAL_ID options as per the example above, or you can configure the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY options to provide the credentials to work with your Glue Catalog.
Additionally, the REGION option can be used to provide the region of your Glue Catalog.
You can also limit the list of databases displayed within your catalog by providing the list of database names using DATABASE_DISPLAY_FILTER[S].
Finally, using the COMMENT option, you can add a description for your connection.
For the full list of connection options with syntax and detailed descriptions, see Glue Catalog connection with SQL.
Once you've created your connection, you are ready to move on to the next step of building your data pipeline: reading your data into Upsolver with an ingestion job.
Alter a Glue Catalog connection
Many connection options are mutable, so in some cases you can run a SQL command to alter an existing Glue Catalog connection rather than create a new one.
For example, consider the Glue Catalog connection created in the simple example earlier on this page, which uses default credentials.
To change the connection's permissions and keep everything else the same without creating a new connection, you can run the following command:
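A sketch of such an alter statement, assuming the placeholder connection name `my_glue_catalog_connection` and placeholder role values; the `SET` form shown here should be confirmed against the Glue Catalog connection SQL reference:

```sql
-- Sketch only: the connection name, role ARN, and external ID are placeholders.
-- Switches the connection to role-based credentials; other options are unchanged.
ALTER GLUE_CATALOG CONNECTION my_glue_catalog_connection
    SET AWS_ROLE = 'arn:aws:iam::111111111111:role/<upsolver-role-*>'
    SET EXTERNAL_ID = '12345678';
```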
Note that some options, such as REGION, cannot be altered once the connection has been created.
To check which specific connection options are mutable, see Glue Catalog connection with SQL.
Drop a Glue Catalog connection
If you no longer need a connection, you can easily drop it with the following SQL command:
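For illustration, assuming the placeholder connection name `my_glue_catalog_connection`:

```sql
-- Sketch only: my_glue_catalog_connection is a placeholder name.
DROP CONNECTION my_glue_catalog_connection;
```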
However, be aware that a connection cannot be dropped while existing tables or jobs depend on it.
For more details, see DROP CONNECTION.