AWS Glue Data Catalog setup guide
Follow these steps to use AWS Glue Data Catalog as your target.
Step 1 - Connect to AWS Glue Data Catalog
Select an existing AWS Glue Data Catalog connection, or create a new one.
Create a new AWS Glue Data Catalog connection
Authentication Method
Role-based access is the recommended method.
To define the correct permissions for the role, follow the S3 access configuration guide and the AWS Glue Data Catalog configuration guide to create an IAM policy.
If you use an access key ID and secret access key instead, follow the AWS Account and Access Keys guide.
Set your Amazon S3 connection where tables will be stored
Encryption Key
By default, Upsolver uses the default encryption defined in the AWS bucket to read the files. Alternatively, you can provide either the Base64 text representation of an encryption key or the ARN of an existing AWS KMS key.
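As a rough illustration of the two accepted forms, the sketch below generates a random key and returns its Base64 text representation, and runs a loose shape check on a KMS key ARN. The 256-bit key length, the function names, and the ARN pattern are assumptions made for illustration; they are not taken from Upsolver's documentation.

```python
import base64
import re
import secrets

# Loose pattern for a KMS key ARN (assumption: key-ID resource, 12-digit account ID).
KMS_ARN_RE = re.compile(r"^arn:aws[a-z-]*:kms:[a-z0-9-]+:\d{12}:key/[A-Za-z0-9-]+$")


def new_base64_key(num_bytes: int = 32) -> str:
    """Return a random key as Base64 text (32 bytes = 256 bits, an assumed length)."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def looks_like_kms_key_arn(value: str) -> bool:
    """Return True if the value has the general shape of a KMS key ARN."""
    return KMS_ARN_RE.match(value) is not None
```

The ARN check only catches obvious paste errors before the value is entered in the connection form; it does not confirm that the key exists or that the role can use it.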
Default Storage Location
Set the storage location where target tables will be stored.
Example: s3://<bucket>/<data_storage_prefix>/
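To check a storage location before entering it, a small helper like the following can split the value into its bucket and prefix. This is a minimal sketch: the function name is hypothetical, and appending a trailing slash to the prefix is an assumption based on the example format above.

```python
from urllib.parse import urlparse


def split_storage_location(location: str) -> tuple[str, str]:
    """Split an s3://<bucket>/<prefix>/ location into (bucket, prefix)."""
    parsed = urlparse(location)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected s3://<bucket>/<prefix>/, got {location!r}")
    prefix = parsed.path.lstrip("/")
    # Assumption: normalize the prefix so it always ends with a slash.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return parsed.netloc, prefix
```

For example, `split_storage_location("s3://my-bucket/warehouse/tables")` returns `("my-bucket", "warehouse/tables/")`, where `my-bucket` and the prefix are placeholder values.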
Region
Select the Region where your AWS Glue Data Catalog is hosted.
Step 2 - Select table format
Choose whether data written to the target is stored in "Upsolver managed Iceberg" format or "Upsolver managed Parquet" format, which is a Hive-like table structure.
Step 3 - Select where to ingest the data
Select an existing schema for the ingested data.
If you are ingesting into a single table, provide a name for the new table. Table names in AWS Glue must be in lowercase.
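Because table names must be lowercase, it can help to normalize a proposed name before entering it. The helper below is a hypothetical sketch; replacing characters outside letters, digits, and underscores is an extra assumption, since the only stated requirement is lowercase.

```python
import re


def to_glue_table_name(name: str) -> str:
    """Lowercase a proposed table name and replace other characters with underscores."""
    lowered = name.strip().lower()
    # Assumption: keep only a-z, 0-9, and underscore; everything else becomes "_".
    return re.sub(r"[^a-z0-9_]", "_", lowered)
```

For example, `to_glue_table_name("Orders-2024")` returns `"orders_2024"`.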
If the source is a database (MySQL or PostgreSQL), Upsolver creates new tables in the selected schema.
When ingesting multiple source schemas into AWS Glue Data Catalog, you have the following options:
Ingest all tables into a single AWS Glue Data Catalog schema and add the source schema name to every new table created in AWS Glue Data Catalog.
Map every source schema into a target schema in AWS Glue Data Catalog.
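The difference between the two options can be sketched as two naming functions. This is only an illustration: the function names are hypothetical, and the `<source_schema>_<table>` pattern is an assumption, because the guide states only that the source schema name is added to each new table name.

```python
def single_schema_target(target_schema: str, source_schema: str, table: str) -> tuple[str, str]:
    """Option 1: one target schema, with the source schema name added to the table name."""
    # Assumption: the schema name is added as a prefix joined by an underscore.
    return target_schema, f"{source_schema}_{table}".lower()


def mapped_schema_target(schema_map: dict[str, str], source_schema: str, table: str) -> tuple[str, str]:
    """Option 2: each source schema maps to its own target schema."""
    return schema_map[source_schema], table.lower()
```

With a source table `Orders` in schema `sales`, the first option would yield something like `("lake", "sales_orders")`, while the second, given the mapping `{"sales": "sales_lake"}`, would yield `("sales_lake", "orders")`. The first keeps everything in one place at the cost of longer table names; the second preserves the source layout.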