Data Ingestion — VPC Flow Logs
This how-to guide shows you how to ingest, retrieve, and view data for your VPC Flow Logs.
VPC Flow Logs is a feature that enables you to capture information about the IP traffic going to and from network interfaces in your VPC. You can publish flow log data to Amazon CloudWatch Logs or Amazon S3. After you create a flow log, you can retrieve and view its data in your chosen destination.
VPC flow logs can help you:
Diagnose overly restrictive security group rules
Monitor traffic that is reaching your instance
Determine the direction of traffic to and from network interfaces
Visit the Amazon AWS documentation to learn more about VPC Flow Logs.
Upsolver helps you ingest your VPC flow logs and perform minor transformations before loading your data into Amazon Athena for analysis. What sets Upsolver apart from other tools is its SQL-only solution and its scalable, robust streaming capabilities.
You ingest your VPC flow logs in Upsolver in five steps:
Connect Upsolver to your Amazon S3 bucket
Connect to your AWS Glue Data Catalog
Create an S3 storage connection
Create a staging table for your VPC flow logs
Ingest data from S3 into your staging table
To transfer your data, you must create an Upsolver connection. This connection gives you the ability to configure the AWS IAM credentials that Upsolver needs to access the data.
Here's the code. The snippet below is a representative sketch in Upsolver's SQL syntax; the connection name, role ARN, and external ID are placeholders for your own values:
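```sql
-- Connect Upsolver to S3 via an IAM role that Upsolver can assume.
-- The role ARN and external ID below are placeholders.
CREATE S3 CONNECTION vpc_flow_logs_source
    AWS_ROLE = 'arn:aws:iam::111111111111:role/upsolver_access_role'
    EXTERNAL_ID = 'my-external-id'
    READ_ONLY = TRUE;
```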
A Glue Catalog connection in Upsolver serves as a metadata store connection. It enables you to create Upsolver-managed tables that also double as Athena tables.
Here's the code, again as a sketch; the connection name, role ARN, and region are placeholders:
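```sql
-- Connect to the AWS Glue Data Catalog that serves as the metadata store.
-- Tables created through this connection are also queryable in Athena.
-- If you run Upsolver with Upsolver Cloud, you may also want to set a
-- default storage connection and location here (see the next step).
CREATE GLUE_CATALOG CONNECTION my_glue_catalog
    AWS_ROLE = 'arn:aws:iam::111111111111:role/upsolver_access_role'
    REGION = 'us-east-1';
```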
The tables you create in Upsolver all have underlying files stored in a specified storage location:
If you deploy Upsolver in your customer VPC, you can find an Upsolver bucket created during the integration process that serves as a default storage location.
If you use Upsolver with Upsolver Cloud, you should create an additional S3 connection that serves as an underlying storage location. This ensures your data stays within your account.
Here's the code; as before, the connection name, role ARN, and external ID are placeholders:
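```sql
-- A second S3 connection that serves as the underlying storage location
-- for Upsolver-managed tables, keeping table files in your own account.
CREATE S3 CONNECTION my_s3_storage
    AWS_ROLE = 'arn:aws:iam::111111111111:role/upsolver_access_role'
    EXTERNAL_ID = 'my-external-id';
```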
Before you can transform and output your data, you must ingest it into Upsolver. To do this, copy your data into an Upsolver-managed staging table.
Note that staging tables cannot have primary keys and can only be partitioned on time-based columns, as shown below:
You must set STORAGE_CONNECTION and STORAGE_LOCATION together to configure the storage location of the table's underlying files.
Note that:
If you deploy Upsolver in your customer VPC, you can omit these options, as there's an Upsolver bucket created during the integration process that serves as a default storage location.
If you use Upsolver with Upsolver Cloud, you must define these two parameters to ensure your data stays within your account.
If you're using Upsolver with Upsolver Cloud, create the staging table explicitly to define the storage location for the table's underlying files.
Here's the code, sketched with placeholder catalog, database, table, and bucket names:
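```sql
-- Create the staging table for raw VPC flow logs. The empty column list ()
-- lets Upsolver infer the schema from the ingested data. Staging tables
-- have no primary key and are partitioned on a time-based column.
CREATE TABLE my_glue_catalog.my_database.vpc_flow_logs_staging()
    PARTITIONED BY $event_date
    STORAGE_CONNECTION = my_s3_storage
    STORAGE_LOCATION = 's3://my-upsolver-bucket/vpc-flow-logs/staging/';
```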
Next, query your table to ensure everything is working properly.
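Once the ingestion job described below is running, a quick sanity check might look like this (names as in the earlier sketches):

```sql
-- Preview a few ingested rows to confirm data is flowing into the table.
SELECT *
FROM my_glue_catalog.my_database.vpc_flow_logs_staging
LIMIT 10;
```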
By adopting and implementing familiar SQL syntax, you can use Upsolver to create data pipelines and organize your data to easily perform analytics and ML.
As your business needs evolve, so can your data. In the future, you can create additional jobs that use the same staging table as their source, while your pipelines keep your data fresh indefinitely.
For future reference, you can copy your AWS_ROLE from your user page.
You might notice this code looks similar to the code you used to create an S3 connection in step 1. Both connections are important. The first one gives you access to your account in S3 and permission to take specific actions. You use the second connection to load your data into a staging table; that is, it provides you with a direct connection to your data.
Using the COPY FROM statement, you copy the data from the S3 connection you created in step 1 and load it into your staging table by specifying the location in your code. Be sure to note the bucket from which you draw your data to ensure you only process the data you wish to see. The job below is a sketch; the job name, bucket, prefix, and content type are placeholders to adapt to where and how your flow logs are published:
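```sql
-- Continuously copy VPC flow log files from S3 into the staging table.
-- BUCKET and PREFIX point at the flow log delivery location; CONTENT_TYPE
-- must match the format in which your flow logs are published.
CREATE JOB load_vpc_flow_logs
    CONTENT_TYPE = PARQUET
AS COPY FROM S3 vpc_flow_logs_source
    BUCKET = 'my-vpc-flow-logs-bucket'
    PREFIX = 'AWSLogs/'
INTO my_glue_catalog.my_database.vpc_flow_logs_staging;
```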
At this point, you have a connection to your raw VPC flow log data and have ingested it into a staging table. The next step is to perform data analytics; see the transformation guide, where we walk you through the various transformations you can apply to your data.