Jobs

Jobs enable you to ingest and transform your data. They can read from external sources and write data into a table, or read from a table and write to another table or an external target.

In Upsolver, there are three types of data movement jobs: ingestion, replication, and transformation. Each of these job types can be created as either a synchronized or a non-synchronized job. You can also create a monitoring job to export job metrics to a third-party monitoring platform.

  1. Ingestion jobs copy data from a supported source and insert the raw data into a staging table in your Amazon S3-based data lake, or ingest it directly into supported targets. These jobs automatically detect and infer the schema and, when writing to data lake tables, update the AWS Glue Data Catalog with column names, types, and partition information. You can also apply transformations if your data needs to be altered before it is loaded into the target.

  2. Replication jobs copy change data capture (CDC) events from CDC-enabled databases into one or more target schemas in Snowflake. Multiple replication groups can share a single data source, and each group can be configured individually to write to its own schema with its own interval and options.

  3. Transformation jobs insert and merge data into tables in your data lake, data warehouse, and other targets. You can transform, model, join, and aggregate data before inserting the results into the target system. Using a primary key, transformation jobs can insert, update, and delete rows automatically.

  4. Monitoring jobs stream job metrics to Amazon CloudWatch and Datadog, so you can monitor your pipeline jobs from a centralized dashboard.
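To illustrate how a primary key lets transformation jobs insert, update, and delete rows, here is a minimal conceptual sketch in plain Python. It is not Upsolver's implementation or job syntax. The function `merge_by_primary_key`, the `id` key column, and the `is_deleted` delete marker are hypothetical names chosen for this example. The sketch assumes that an incoming row with a new key is inserted, an incoming row with an existing key replaces the stored row, and a flagged row deletes the key.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def merge_by_primary_key(
    target: Dict[Any, Row],
    incoming: Iterable[Row],
    key: str = "id",                 # hypothetical primary key column
    delete_flag: str = "is_deleted",  # hypothetical delete marker column
) -> Dict[Any, Row]:
    """Apply incoming rows to a table keyed by primary key (conceptual model only)."""
    result = dict(target)  # copy so the caller's table is left unchanged
    for row in incoming:
        pk = row[key]
        if row.get(delete_flag):
            # Delete: remove the row; a no-op if the key is not present.
            result.pop(pk, None)
        else:
            # Insert or update: store the row without the delete-marker column.
            result[pk] = {k: v for k, v in row.items() if k != delete_flag}
    return result


target = {1: {"id": 1, "status": "new"}, 2: {"id": 2, "status": "new"}}
events = [
    {"id": 1, "status": "shipped"},  # key 1 exists, so the row is updated
    {"id": 2, "is_deleted": True},   # key 2 is flagged, so the row is deleted
    {"id": 3, "status": "new"},      # key 3 is new, so the row is inserted
]
merged = merge_by_primary_key(target, events)
print(sorted(merged.items()))
# [(1, {'id': 1, 'status': 'shipped'}), (3, {'id': 3, 'status': 'new'})]
```

Because rows are applied in arrival order, the last event for a given key determines its final state in this model.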

Before you create a job, make sure you understand the difference between synchronized and non-synchronized jobs.
