Use Upsolver to index less data into Splunk
This guide shows how to use Upsolver to aggregate log data before indexing it into Splunk, reducing the volume of indexed data and thereby your Splunk cost.
The Upsolver architecture keeps your data free of vendor lock-in: the same data can be analyzed in many ways, including with SQL engines, machine learning tools, and search. Many Upsolver users run Athena to query log data with SQL.
1. Click Outputs on the left, then click New in the upper right corner.
2. Select Amazon S3 as your data output.
3. Give the data output a name and define your output format (tabular or hierarchical).
4. Click Next to continue.
1. Select the SQL window from the upper right corner. Keep in mind that everything you do in the UI is reflected in SQL, and vice versa.
2. The sample SQL below aggregates multiple values over a given time period, reducing the amount of data sent to Splunk.
SELECT data."account-id" AS ACCOUNT_ID,
       data.action AS action,
       SUM(TO_NUMBER(data.bytes)) AS SUM_BYTES,
       SUM(TO_NUMBER(data.packets)) AS SUM_PACKETS,
       COUNT(*) AS count
  FROM "<your-data-source>" -- replace with your Upsolver data source
 GROUP BY data."account-id", data.action
3. Click Properties in the upper right corner.
4. Under Scheduling, change the Output Interval to your desired length.
This property defines how frequently Upsolver writes the aggregated data; the default is 1 minute.
5. Click Run in the upper right corner.
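The rollup the SQL performs can be sketched in plain Python. The field names follow the SQL above; the sample events and their values are hypothetical:

```python
from collections import defaultdict

def aggregate_flow_logs(events):
    """Roll up raw log events into one row per (account-id, action),
    mirroring the SUM/COUNT columns in the Upsolver output SQL."""
    totals = defaultdict(lambda: {"sum_bytes": 0, "sum_packets": 0, "count": 0})
    for e in events:
        key = (e["account-id"], e["action"])
        totals[key]["sum_bytes"] += int(e["bytes"])
        totals[key]["sum_packets"] += int(e["packets"])
        totals[key]["count"] += 1
    return [
        {"account_id": account, "action": action, **agg}
        for (account, action), agg in totals.items()
    ]

# Three raw events collapse into two aggregated rows.
raw = [
    {"account-id": "123", "action": "ACCEPT", "bytes": "100", "packets": "2"},
    {"account-id": "123", "action": "ACCEPT", "bytes": "300", "packets": "4"},
    {"account-id": "123", "action": "REJECT", "bytes": "50", "packets": "1"},
]
rows = aggregate_flow_logs(raw)
```

Because Splunk pricing is driven by indexed volume, shrinking many raw events into a few per-interval rows like this is what reduces cost.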
1. Define the Output Format and the S3 Connection information; then click Next.
Keep in mind that Upsolver supports all file types.
2. Define the compute cluster that you would like to use and the time range of the data you would like to output.
Keep in mind that setting Ending At to Never means the output will be a continuous stream.
3. Click Deploy.
While waiting for the data to write to the output, configure the Splunk environment to read from S3.
1. After logging in, click Find More Apps.
2. Find the Splunk Add-on for Amazon Web Services app and click Install.
3. The installation may take a few seconds, and Splunk will prompt you to restart. Click Restart Now.
4. Log in to your Splunk environment again and click the Splunk Enterprise logo. Then click Splunk Add-on for AWS.
5. Click the Configuration tab and then click Add on the right.
6. Give your Account a name (make sure to remember this name; we will use it for the data input next). Fill out your AWS Access Key (Key ID) and Secret Key information, then click Add.
7. Click Settings > Data inputs in the upper right corner of the Splunk UI.
8. Find and select the AWS S3 data input (most likely on page 2).
9. Give the data input a name and fill out your AWS Account information. It should be the same Account Name from step 6.
10. Give it a bucket name. This must match the bucket name on your AWS account where the output data is being stored.
11. Change the Polling interval to 10. Define Key prefix as your S3 folder path.
12. Scroll down and check More settings to configure additional setting options.
13. Change Set sourcetype to From list, and select json_no_timestamp from the Select sourcetype from list dropdown. Then click Next.
14. Click Start searching.
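With the json_no_timestamp sourcetype, Splunk indexes each line of the S3 object as one JSON event. A minimal sketch of that line-per-event parsing, using hypothetical field values shaped like the aggregated output:

```python
import json

# Each line in the fetched S3 object is one aggregated record
# (hypothetical values); json_no_timestamp indexes it without
# attempting timestamp extraction from the event body.
s3_object_body = "\n".join([
    '{"account_id": "123", "action": "ACCEPT", "sum_bytes": 400, "sum_packets": 6, "count": 2}',
    '{"account_id": "123", "action": "REJECT", "sum_bytes": 50, "sum_packets": 1, "count": 1}',
])

# One parsed event per non-empty line.
events = [json.loads(line) for line in s3_object_body.splitlines() if line.strip()]
```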
1. Click Data Summary under What to Search.
2. Click Sourcetype, then json_no_timestamp.
3. Verify that the indexed data matches the aggregated data from Upsolver. Success!
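One way to sanity-check the result is to confirm that the aggregates preserve the raw totals while the number of indexed events shrinks. A minimal sketch with hypothetical numbers:

```python
# Hypothetical per-event byte counts before aggregation,
# and the aggregated rows that Splunk actually indexed.
raw_bytes = [100, 300, 50]
aggregated = [
    {"sum_bytes": 400, "count": 2},  # ACCEPT rollup
    {"sum_bytes": 50, "count": 1},   # REJECT rollup
]

# Totals must match; the indexed event count is what shrinks.
assert sum(raw_bytes) == sum(r["sum_bytes"] for r in aggregated)
assert sum(r["count"] for r in aggregated) == len(raw_bytes)
indexed_events = len(aggregated)  # 2 events indexed instead of 3
```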