AWS Glue Data Catalog

Follow these steps to use AWS Glue Data Catalog as your target.

Step 1 - Connect to AWS Glue Data Catalog

Create a new connection

Click Create a new connection, if it is not already selected.

In the Name your connection field, enter a name for this connection. Note that this connection will be available to other users in your organization.

Set the storage location where target tables will be stored in the S3 Target Bucket field, using the format:

S3:///<data_storage_prefix>

Select the region where your AWS Glue Data Catalog is hosted in the Catalog Region select list.

Use an existing connection

By default, if you have already created a connection, Upsolver selects Use an existing connection, and your AWS Glue Data Catalog connection is populated in the list.

For organizations with multiple connections, select the target connection you want to use.

Step 2 - Configure AWS access

To allow Upsolver to access the catalog and write to the target bucket, follow the AWS configuration assistant link.

For the Authentication Method, we recommend using Role-based access. Paste the ARN of the role you created in AWS IAM into the Role ARN field, as explained in the configuration guide.
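Before pasting a value into the Role ARN field, it can help to check that it has the general shape of an IAM role ARN. The following is a minimal sketch in Python, not part of Upsolver or the wizard; the helper name `looks_like_role_arn` and the sample role names are hypothetical. It only checks the format and does not verify that the role exists or grants the required permissions.

```python
import re

# IAM role ARNs have the form:
#   arn:<partition>:iam::<12-digit account id>:role/<optional path/><role name>
# The helper name and the sample values below are illustrative assumptions.
ROLE_ARN_PATTERN = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):iam::[0-9]{12}:role/[A-Za-z0-9_+=,.@/-]+"
)


def looks_like_role_arn(value: str) -> bool:
    """Return True if the value is shaped like an IAM role ARN."""
    # Strip whitespace that is often picked up when copying from a console.
    return ROLE_ARN_PATTERN.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    print(looks_like_role_arn("arn:aws:iam::123456789012:role/example-glue-access"))
    print(looks_like_role_arn("arn:aws:iam::123456789012:user/example-user"))
```

A check like this catches common copy-and-paste mistakes, such as pasting a user ARN or a truncated account ID, before the connection is tested.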

If you use AccessKey/Secret Key instead, ensure that the user provided to Upsolver has the permissions required to access AWS Glue Data Catalog and Amazon S3, as explained in the configuration guide.

Step 3 - Select table format

Choose the target format in which to store your data:

  • Upsolver managed Iceberg

  • Upsolver managed Hive (compatibility mode)

Step 4 - Select where to ingest the data

In this step, you need to configure the mapping of source schemas to target schemas. Upsolver will automatically create new tables in the selected target schemas.

First, define a default target schema. All tables from all source schemas are replicated to this schema unless you define specific manual mappings.

When ingesting multiple source schemas into the AWS Glue Data Catalog, you have the following options:

  1. Ingest all tables into a single AWS Glue Data Catalog schema and prepend the source schema name to every new table created in the AWS Glue Data Catalog. Use the prefix {source_schema}_ for this purpose.

  2. Map each source schema to a corresponding target schema in the AWS Glue Data Catalog.
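The two options above can be illustrated with a short Python sketch of how a source table might resolve to a target schema and table name. This is an illustration of the naming logic only, not Upsolver's implementation; the function `resolve_target`, its parameters, and the sample schema names are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def resolve_target(
    source_schema: str,
    source_table: str,
    default_target_schema: str,
    schema_mappings: Optional[Dict[str, str]] = None,
    prefix_with_source_schema: bool = False,
) -> Tuple[str, str]:
    """Return (target_schema, target_table) for one source table."""
    mappings = schema_mappings or {}
    # Option 2: a manual mapping for this source schema takes precedence;
    # otherwise the table falls back to the default target schema.
    target_schema = mappings.get(source_schema, default_target_schema)
    # Option 1: prepend the source schema name, mirroring {source_schema}_,
    # so that same-named tables from different schemas do not collide.
    if prefix_with_source_schema:
        target_table = f"{source_schema}_{source_table}"
    else:
        target_table = source_table
    return target_schema, target_table


if __name__ == "__main__":
    # Option 1: one shared target schema with prefixed table names.
    print(resolve_target("sales", "orders", "analytics",
                         prefix_with_source_schema=True))
    # Option 2: each source schema mapped to its own target schema.
    print(resolve_target("sales", "orders", "analytics",
                         schema_mappings={"sales": "sales_lake"}))
```

With the first option, `sales.orders` would land in the default schema as `sales_orders`. With the second, it keeps the name `orders` but is written to the mapped schema, and any source schema without a mapping still goes to the default target schema.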
