An INSERT job defines a query that pulls in a set of data based on the given SELECT statement and inserts it into the designated target. This query is then run periodically based on the RUN_INTERVAL defined within the job.
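As a minimal sketch, an INSERT job could look like the following. The catalog, database, and table names are hypothetical, and the exact set of job options available varies by setup:

```sql
-- Hypothetical job: every minute, pull newly arrived rows from a
-- staging table and write them to the target table.
CREATE JOB load_orders
    RUN_INTERVAL = 1 MINUTE
AS INSERT INTO default_glue_catalog.demo_db.orders_target
MAP_COLUMNS_BY_NAME
SELECT order_id,
       customer_id,
       order_total
FROM default_glue_catalog.demo_db.orders_raw;
```

Here RUN_INTERVAL = 1 MINUTE causes the SELECT statement to be re-executed every minute against new data.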
By default, the INSERT job simply appends the new data to the target table. For data lake tables, those created using CREATE TABLE in SQLake and managed by the AWS Glue Data Catalog, the job performs an upsert operation when a primary key is defined (update if the PK exists, insert if it does not).
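As an illustrative sketch, a data lake table declared with a primary key causes subsequent INSERT jobs into it to upsert rather than append. All names here are hypothetical, and the exact primary-key clause syntax may differ in your environment:

```sql
-- Hypothetical table: rows sharing a customer_id are updated in place;
-- rows with new customer_ids are inserted.
CREATE TABLE default_glue_catalog.demo_db.customers (
    customer_id string,
    email string
)
PRIMARY KEY customer_id;
```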
When writing to non-data-lake targets, like Snowflake and Redshift, the INSERT job will not recognize a primary key, even if one is defined on the target table, and will only perform an INSERT operation. If you want to upsert, refer to the
Additionally, if two jobs writing to the same table rewrite the same primary key, it is nondeterministic which job's data ends up in the table, so this is not recommended.
Valid identifiers match the following format:
identifier = "([^"]|"")*"|[A-Za-z_][A-Za-z0-9_]*;
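For illustration, both identifier forms from the grammar above could appear in a single query. The names here are hypothetical:

```sql
-- Unquoted identifiers: a letter or underscore followed by letters,
-- digits, or underscores. Quoted identifiers: any characters, with a
-- literal double quote written as "".
SELECT order_id    AS order_id,            -- unquoted
       order_total AS "total (USD)",       -- quoted: spaces and symbols allowed
       note        AS "a ""quoted"" word"  -- embedded quotes are doubled
FROM default_glue_catalog.demo_db.orders_raw;
```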
| SNOWFLAKE <catalog_name>.<schema_name>.<table_name>
| REDSHIFT <catalog_name>.<schema_name>.<table_name>
| S3 BUCKET = <bucket_name>
The MAP_COLUMNS_BY_NAME keyword maps columns from the SELECT statement to the table by the names of the returned columns in the query. Columns listed after EXCEPT are excluded from the final table.
When MAP_COLUMNS_BY_NAME is set, each field in the SELECT statement must either be given an alias, be a simple column reference, or be a star expression. The columns are then mapped to the fields of the same name in the table.
Otherwise, columns are mapped by ordinal position, with reference to the provided column list or the order of the columns in the table.
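The two mapping modes can be contrasted with a sketch. All job, table, and column names are hypothetical:

```sql
-- By name: the alias order_total must match a column in the target table.
CREATE JOB load_by_name
AS INSERT INTO default_glue_catalog.demo_db.orders_target
MAP_COLUMNS_BY_NAME
SELECT o.total AS order_total,
       o.id    AS order_id
FROM default_glue_catalog.demo_db.orders_raw o;

-- By position: without MAP_COLUMNS_BY_NAME, the first selected expression
-- maps to the first column in the provided column list, and so on.
CREATE JOB load_by_position
AS INSERT INTO default_glue_catalog.demo_db.orders_target (order_total, order_id)
SELECT o.total,
       o.id
FROM default_glue_catalog.demo_db.orders_raw o;
```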
When using a star query in the SELECT statement, MAP_COLUMNS_BY_NAME is required. Additionally, all primary key and partition columns must be mapped.
Columns listed after the EXCEPT keyword are not written to your target table.
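A star query with an exclusion might be sketched as follows. The names are hypothetical, and the EXCEPT placement shown is one plausible form:

```sql
-- Star query: MAP_COLUMNS_BY_NAME is required, and columns named after
-- EXCEPT are dropped before writing to the target table.
CREATE JOB copy_orders
    RUN_INTERVAL = 5 MINUTES
AS INSERT INTO default_glue_catalog.demo_db.orders_clean
MAP_COLUMNS_BY_NAME EXCEPT internal_debug_field
SELECT *
FROM default_glue_catalog.demo_db.orders_raw;
```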