Problem Ingesting Amazon S3 Data

If you are ingesting data from an Amazon S3 bucket that is not partitioned by date, note that the START_FROM option is set to NOW.

This means that if no new data has arrived since the job started running, no data has been ingested into your staging table.

To resolve this, set the START_FROM option to BEGINNING or to a specific timestamp for which data exists. Note that this is only possible when reading from a bucket partitioned by date, with the layout described by the DATE_PATTERN option.

Example:

If the list of files is:

  • s3://bucket/input/a/2019/01/01/00/00/file.json
  • s3://bucket/input/a/2019/01/01/00/01/file.json
  • s3://bucket/input/a/2019/01/01/00/02/file.json
  • s3://bucket/input/a/2019/01/01/00/03/file.json

You can read your data from these files as follows:

CREATE JOB copy_from_s3
    CONTENT_TYPE = JSON
    -- Start reading from this timestamp rather than from NOW
    START_FROM = timestamp '2019-01-01'
    -- The date layout of the keys under the prefix
    DATE_PATTERN = 'yyyy/MM/dd/HH/mm'
AS COPY FROM S3 my_s3_connection
    BUCKET = 'bucket'
    PREFIX = 'input/a'
INTO default_glue_catalog.schema_name.table_name;
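
Alternatively, if you want to ingest all files already present under the prefix rather than starting from a specific timestamp, START_FROM also accepts BEGINNING. Below is a minimal sketch of the same job using this option; the job name is hypothetical, and the connection, bucket, and table names are reused from the example above:

CREATE JOB copy_from_s3_from_beginning
    CONTENT_TYPE = JSON
    -- BEGINNING ingests all existing files under the prefix;
    -- like a specific timestamp, it requires a date-partitioned bucket
    START_FROM = BEGINNING
    DATE_PATTERN = 'yyyy/MM/dd/HH/mm'
AS COPY FROM S3 my_s3_connection
    BUCKET = 'bucket'
    PREFIX = 'input/a'
INTO default_glue_catalog.schema_name.table_name;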

If you are still experiencing issues, please raise a ticket via the Upsolver Support Portal.