Amazon S3 setup guide
Follow these steps to use Amazon S3 as your source.
Select an existing S3 connection, or create a new one.
It is recommended to use
- If your S3 bucket runs on a different AWS account than the one running Upsolver, you need to create trust between the role and the account running Upsolver. Follow the Role-based setup guide to create a trusted AWS Role and find your
By default, Upsolver uses the default encryption defined in the AWS bucket to read the files. Alternatively, you can provide the Base64 text representation of the encryption key to use or an ARN for an existing AWS KMS key.
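As an illustration of the Base64 option, the following sketch converts raw key bytes into their Base64 text representation. The 256-bit key length and the use of `os.urandom` to generate it are assumptions made for the example, not requirements stated in this guide. In practice you would encode the key that already protects your files.

```python
import base64
import os


def key_to_base64(raw_key: bytes) -> str:
    """Return the Base64 text representation of raw key bytes."""
    return base64.b64encode(raw_key).decode("ascii")


# Hypothetical 256-bit key generated only for demonstration.
raw_key = os.urandom(32)
encoded_key = key_to_base64(raw_key)

# The encoded text decodes back to the original bytes.
assert base64.b64decode(encoded_key) == raw_key
print(encoded_key)
```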
Select a bucket to get started. Upsolver will attempt to list your buckets if the s3:ListAllMyBuckets permission was provided by the connection above. As an alternative, you can specify the name of your bucket (e.g. s3://upsolver-samples).
Select a folder to ingest, or leave empty to ingest the entire bucket.
Upsolver ingests all files in the selected location by default. If you want to ingest only some of them, you can use a regular expression to define which files will be ingested.
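Before you enter a regular expression, you can check it against a few sample object keys. The sketch below uses Python's `re` module. The sample keys, the pattern, and the choice to search the whole key rather than only the file name are assumptions for illustration. The regular expression dialect Upsolver uses and the part of the path it matches against may differ.

```python
import re


def select_files(keys, pattern):
    """Return the keys that the pattern matches anywhere in the key."""
    compiled = re.compile(pattern)
    return [key for key in keys if compiled.search(key)]


# Hypothetical object keys in the selected folder.
sample_keys = [
    "events/2023/01/01/part-0001.json",
    "events/2023/01/01/_SUCCESS",
    "events/2023/01/01/part-0002.json.gz",
]

# Keep only JSON files, optionally gzip-compressed.
json_pattern = r"\.json(\.gz)?$"
selected = select_files(sample_keys, json_pattern)
print(selected)
```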
If your source files are partitioned by a date pattern, Upsolver can load existing and new files using the pattern. This affects the order of files loaded and avoids delays when many changes occur across the bucket.
By default, Upsolver will list and ingest files in the ingest job’s bucket and folder as soon as they are discovered. When you set a date pattern, Upsolver uses the date in the folder path to understand when new files are added. The date in the path is used to process data in order of arrival. If files are added to a folder named with a future date, these files will not be ingested until that date becomes the present.
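The effect of a date pattern can be sketched in a few lines. This is not Upsolver's implementation. It assumes a hypothetical `events/` prefix followed by a `yyyy/MM/dd` folder layout, and it only illustrates the two behaviors described above: files are ordered by the date in their path, and files under a future date are held back.

```python
from datetime import date, datetime


def date_from_key(key, prefix="events/", fmt="%Y/%m/%d"):
    """Parse the date folder that directly follows the prefix."""
    # A yyyy/MM/dd folder path is always 10 characters long.
    date_part = key[len(prefix):len(prefix) + 10]
    return datetime.strptime(date_part, fmt).date()


def ingestible(keys, today, prefix="events/"):
    """Return keys ordered by path date, skipping future-dated folders."""
    dated = [(date_from_key(key, prefix), key) for key in keys]
    return [key for path_date, key in sorted(dated) if path_date <= today]


# Hypothetical keys; the last one sits in a folder dated in the future.
dated_keys = [
    "events/2023/01/02/b.json",
    "events/2023/01/01/a.json",
    "events/2023/01/05/c.json",
]
ready = ingestible(dated_keys, today=date(2023, 1, 3))
print(ready)
```

In this sketch, `c.json` becomes eligible only once `today` reaches 2023-01-05.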
Delete the source files following ingestion
To discover new files, when a date pattern is not set, Upsolver lists the top-level prefix and performs a diff to detect newly created files. It then lists the paths adjacent to these newly added files and assumes that if a file was added here, others will be as well. This process is performed at regular intervals to ensure files are not missed.
For buckets with few files and predictable changes, this works well. However, for buckets with many changes across millions of files and hundreds of prefixes, the scanning and diffing process may result in ingestion and processing delays.
To optimize this process, consider setting the Delete files option to TRUE. This moves ingested files to another staging location, leaving the source folder empty and making it easier and faster for Upsolver to discover new files. Be aware that configuring Upsolver to move ingested files could impact other systems if they depend on the same raw files.
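To make the list-and-diff discovery described in this section concrete, here is a simplified sketch. The function names and sample keys are hypothetical, and Upsolver's actual implementation is not shown in this guide. The sketch compares two listings to find new files, then derives the folders those files landed in, which are the paths that would be listed next.

```python
def discover_new(previous_listing, current_listing):
    """Return keys present in the current listing but not the previous one."""
    return sorted(set(current_listing) - set(previous_listing))


def adjacent_prefixes(new_keys):
    """Return the folders containing the newly discovered keys."""
    return sorted({key.rsplit("/", 1)[0] + "/" for key in new_keys if "/" in key})


# Hypothetical listings taken at two consecutive intervals.
previous_listing = ["logs/app1/0001.json", "logs/app2/0001.json"]
current_listing = [
    "logs/app1/0001.json",
    "logs/app1/0002.json",
    "logs/app2/0001.json",
    "logs/app3/0001.json",
]

new_files = discover_new(previous_listing, current_listing)
prefixes_to_watch = adjacent_prefixes(new_files)
print(new_files)
print(prefixes_to_watch)
```

The cost of the diff grows with the size of both listings, which is why keeping the source folder empty makes discovery faster.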
When you select a bucket and folder, Upsolver will attempt to load a sample of the files.
If Upsolver did not load any sample files, try the following:
1. Verify that the location on your bucket contains files.
2. Select a Content type that matches the content type of your stream.