AWS Glue Data Catalog setup guide

Follow these steps to use AWS Glue Data Catalog as your target.

Step 1 - Connect to AWS Glue Data Catalog

Select an existing AWS Glue Data Catalog connection, or create a new one.

Create a new AWS Glue Data Catalog connection

Authentication Method

We recommend using role-based access.

To define the correct permissions for the role, follow the S3 access configuration guide and the AWS Glue Data Catalog configuration guide to create an IAM policy.
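The role needs permissions on both the S3 storage location and the Glue Data Catalog. The sketch below builds such a policy document in Python. The bucket name, prefix, and the exact action lists are illustrative assumptions, not the policy published in the configuration guides above, which remain the authoritative source.

```python
import json


def build_policy(bucket: str, prefix: str) -> dict:
    """Return an illustrative IAM policy document (as a dict) for a storage prefix.

    The action lists are assumptions for this sketch; the S3 access and AWS Glue
    Data Catalog configuration guides remain the authoritative source.
    """
    prefix = prefix.strip("/")
    objects = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permissions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level permissions are limited to the storage prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": objects,
            },
            {
                # Catalog permissions for reading and creating databases and tables.
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase", "glue:GetDatabases",
                    "glue:GetTable", "glue:GetTables",
                    "glue:CreateTable", "glue:UpdateTable",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix; the printed JSON is a policy document body.
    print(json.dumps(build_policy("my-bucket", "upsolver/tables"), indent=2))
```

Scoping object permissions to the storage prefix keeps the role from touching unrelated data in the same bucket.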

If you use an access key ID and secret access key instead, follow the AWS Account and Access Keys guide.

Set your Amazon S3 connection where tables will be stored

Encryption Key

By default, Upsolver uses the default encryption configured on the S3 bucket to read the files. Alternatively, you can provide either the Base64 text representation of the encryption key or the ARN of an existing AWS KMS key.
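The two accepted forms look quite different. The sketch below generates a random key in Base64 text form and tells a KMS key ARN apart from a Base64 key. It assumes a 256-bit (AES-256) key, as used by S3 server-side encryption with customer-provided keys; the key length requirement and the helper names are assumptions, not something this guide specifies.

```python
import base64
import os
import re

# Matches a KMS key ARN such as arn:aws:kms:us-east-1:111122223333:key/<key-id>.
KMS_KEY_ARN = re.compile(r"^arn:aws[a-zA-Z-]*:kms:[a-z0-9-]+:\d{12}:key/[A-Za-z0-9-]+$")


def new_base64_key(num_bytes: int = 32) -> str:
    """Generate random key bytes (256 bits by default) and return their Base64 text."""
    return base64.b64encode(os.urandom(num_bytes)).decode("ascii")


def classify_key(value: str) -> str:
    """Return 'kms-arn' or 'base64-key'; raise ValueError for anything else.

    The 256-bit length check is an assumption of this sketch.
    """
    if KMS_KEY_ARN.match(value):
        return "kms-arn"
    try:
        raw = base64.b64decode(value, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        raise ValueError("not a KMS key ARN or valid Base64 text")
    if len(raw) != 32:
        raise ValueError(f"expected a 256-bit key, got {len(raw) * 8} bits")
    return "base64-key"
```

Keep a generated key somewhere safe: data encrypted with a customer-provided key cannot be read without it.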

Default Storage Location

Set the storage location where target tables will be stored.

Example: s3://<bucket>/<data_storage_prefix>/
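A storage location has two parts, the bucket and the key prefix. The sketch below splits a location into those parts and normalizes the prefix to end with a slash; the helper name and the trailing-slash convention are assumptions for illustration.

```python
from urllib.parse import urlparse


def parse_storage_location(uri: str) -> tuple[str, str]:
    """Split an s3://<bucket>/<prefix>/ location into (bucket, prefix)."""
    parsed = urlparse(uri.strip())  # urlparse lowercases the scheme, so S3:// also works
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected s3://<bucket>/<prefix>/, got {uri!r}")
    prefix = parsed.path.lstrip("/")
    # Normalize to a trailing slash so the prefix reads as a folder.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return parsed.netloc, prefix
```

For example, parse_storage_location("s3://my-bucket/data/tables") returns ("my-bucket", "data/tables/").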

Region

Select the Region where your AWS Glue Data Catalog is hosted.
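Each AWS account has a separate Glue Data Catalog in every Region, so the Region is part of every catalog resource ARN. The sketch below builds the catalog, database, and table ARNs; it assumes the standard aws partition, and the account ID, database, and table names are hypothetical.

```python
def glue_arns(region: str, account_id: str, database: str, table: str) -> dict[str, str]:
    """Build Glue Data Catalog ARNs; the Region selects which catalog is addressed."""
    base = f"arn:aws:glue:{region}:{account_id}"
    return {
        "catalog": f"{base}:catalog",
        "database": f"{base}:database/{database}",
        "table": f"{base}:table/{database}/{table}",
    }
```

A table created in us-east-1 is therefore not visible through the catalog in another Region.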

Step 2 - Select table format

Choose whether the data written to the target is stored in the "Upsolver managed Iceberg" format or the "Upsolver managed Parquet" format, which uses a Hive-like table structure.

Step 3 - Select where to ingest the data

Select an existing schema for the ingested data.

If you are ingesting into a single table, provide a name for the new table. Table names in AWS Glue must be in lowercase.
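Because Glue table names must be lowercase, it can help to normalize a proposed name before entering it. The sketch below lowercases the name and, as an extra assumption not stated in this guide, rejects anything outside letters, digits, and underscores; the helper name is hypothetical.

```python
import re

# Conservative character set assumed for this sketch: lowercase letters, digits, underscore.
_ALLOWED = re.compile(r"^[a-z0-9_]+$")


def to_glue_table_name(name: str) -> str:
    """Lowercase a proposed table name and reject characters outside [a-z0-9_]."""
    candidate = name.strip().lower()
    if not _ALLOWED.match(candidate):
        raise ValueError(f"unsupported table name: {name!r}")
    return candidate
```

For example, to_glue_table_name("Orders_2024") returns "orders_2024".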

If the source is a database (MySQL or PostgreSQL), Upsolver creates new tables in the selected schema.

When ingesting multiple source schemas into AWS Glue Data Catalog, you have the following options:

  1. Ingest all tables into a single AWS Glue Data Catalog schema and add the source schema name to every new table created in AWS Glue Data Catalog.

  2. Map every source schema into a target schema in AWS Glue Data Catalog.
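The two options can be sketched as functions that map a source (schema, table) pair to a target (schema, table) pair. The underscore-joined naming in option 1 is a hypothetical convention for illustration; Upsolver's actual naming may differ.

```python
def map_single_schema(source_schema: str, source_table: str, target_schema: str) -> tuple[str, str]:
    # Option 1: every table lands in one target schema; the source schema name is
    # folded into the table name (hypothetical "<schema>_<table>" convention).
    return target_schema, f"{source_schema}_{source_table}".lower()


def map_per_schema(source_schema: str, source_table: str,
                   schema_map: dict[str, str]) -> tuple[str, str]:
    # Option 2: each source schema has its own target schema; the table name is kept.
    return schema_map[source_schema], source_table.lower()
```

Including the source schema name in option 1 keeps same-named tables apart: sales.orders and archive.orders become raw.sales_orders and raw.archive_orders instead of colliding.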
