Splunk SmartStore lets your Splunk indexers keep warm buckets in remote object storage such as S3 rather than on local disk; hot buckets stay local until they roll to warm.
Here’s how it looks in practice. Imagine you have a Splunk Enterprise environment with a single indexer.
# On your Splunk indexer, list bucket status for an index with the
# dbinspect search command:
/opt/splunk/bin/splunk search '| dbinspect index=main | table bucketId, state, path, sizeOnDiskMB'
# You'll see rows like this for buckets on local disk:
# ...
# bucketId: main~1~<guid>
# state: warm
# sizeOnDiskMB: 1228
# path: /opt/splunk/var/lib/splunk/defaultdb/db/db_1678886400_1678880000_1
# ...
# Now, imagine you've configured SmartStore with an S3 bucket.
# SmartStore is configured in indexes.conf (not server.conf), starting
# with a remote volume stanza:
[volume:remote_store]
# Marks this volume as remote object storage
storageType = remote
# The S3 bucket that holds your SmartStore data
path = s3://my-splunk-smartstore-bucket
# The S3 endpoint URL; change it if you're not using AWS public S3
# For example, for MinIO: remote.s3.endpoint = http://minio.example.com:9000
remote.s3.endpoint = https://s3.us-east-1.amazonaws.com
# The region used to sign S3 requests
remote.s3.auth_region = us-east-1
# If using IAM roles or instance profiles, you don't need these.
# Otherwise, provide your S3 access key and secret key.
# remote.s3.access_key = YOUR_ACCESS_KEY
# remote.s3.secret_key = YOUR_SECRET_KEY
# And in indexes.conf, for each index you want to use SmartStore:
[my_index]
homePath = $SPLUNK_DB/my_index/db
coldPath = $SPLUNK_DB/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb
# Setting remotePath is what enables SmartStore for this index;
# $_index_name expands to the index name and becomes the base path
# within the S3 bucket, keeping each index's data organized.
remotePath = volume:remote_store/$_index_name
# After applying these configurations and restarting Splunk, buckets
# that roll from hot to warm are uploaded to the remote volume:
/opt/splunk/bin/splunk search '| dbinspect index=my_index | table bucketId, state, path, sizeOnDiskMB'
# ...
# bucketId: my_index~1~<guid>
# state: warm
# sizeOnDiskMB: 1228
# path: /opt/splunk/var/lib/splunk/my_index/db/db_1678886400_1678880000_1
# ...
# Note that dbinspect still reports the local cache path; the master
# copy now lives under s3://my-splunk-smartstore-bucket/my_index/. You
# can confirm this from the S3 side, for example with the AWS CLI:
# aws s3 ls s3://my-splunk-smartstore-bucket/my_index/ --recursive
The core problem SmartStore solves is the cost and operational overhead of massive amounts of local storage for hot and warm data. Traditionally, Splunk keeps all hot and warm buckets on fast, local NVMe or SSDs. As your data volume grows, this means buying more disks, managing RAID arrays, and dealing with disk failures. SmartStore offloads this to S3 (or S3-compatible object storage), which is far cheaper per terabyte and offers virtually infinite scalability and durability.
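To make the sizing argument concrete, here's a back-of-the-envelope sketch. This is plain Python, not Splunk tooling; the ingest rate, retention, and cache size are illustrative assumptions, and index compression is ignored for simplicity:

```python
# Back-of-the-envelope local-disk sizing, classic vs. SmartStore.
# All inputs are hypothetical examples, not Splunk defaults.

def local_storage_needed_tb(daily_ingest_gb, retention_days,
                            smartstore=False, cache_tb=2.0, hot_days=1):
    """Rough local-disk footprint for a single indexer, in TB."""
    if not smartstore:
        # Classic model: every hot and warm bucket lives on local disk
        # for the full retention period.
        return daily_ingest_gb * retention_days / 1024
    # SmartStore model: local disk holds only hot buckets plus the
    # warm-bucket cache; everything else lives in S3.
    return daily_ingest_gb * hot_days / 1024 + cache_tb

classic = local_storage_needed_tb(500, 90)                 # ~43.9 TB
smart = local_storage_needed_tb(500, 90, smartstore=True)  # ~2.5 TB
print(f"classic: {classic:.1f} TB, smartstore: {smart:.1f} TB")
```

The point of the sketch: with SmartStore, local disk becomes a function of cache size rather than retention, so retention can grow without buying more indexer disk.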
When Splunk writes new data, it still writes it to local disk first, into hot buckets. When a bucket rolls from hot to warm, the cache manager uploads it to remote storage; once the upload succeeds, the master copy of the bucket lives in S3 and the local copy becomes a cache entry. When a search needs a warm bucket, the cache manager first checks the local cache; on a miss, it fetches the bucket from S3. With SmartStore there is effectively no cold tier: warm buckets simply remain in remote storage until they freeze, and coldPath exists mainly to satisfy configuration requirements.
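The lifecycle above can be sketched as a toy model. This is illustrative Python, not Splunk internals; the class and method names are invented for the sketch:

```python
# Toy model of the SmartStore bucket lifecycle: hot buckets are local,
# rolling to warm uploads a copy to remote storage, and searches pull
# evicted warm buckets back into the local cache on a miss.

class SmartStoreIndexer:
    def __init__(self):
        self.local = {}      # bucket_id -> state ("hot" or "warm")
        self.remote = set()  # bucket ids whose master copy is in S3
        self.fetches = 0     # cache misses served from remote storage

    def write(self, bucket_id):
        # New data always lands in a local hot bucket first.
        self.local[bucket_id] = "hot"

    def roll_to_warm(self, bucket_id):
        # On hot->warm roll, the bucket is uploaded; the local copy
        # stays behind as a cache entry until evicted.
        assert self.local[bucket_id] == "hot"
        self.local[bucket_id] = "warm"
        self.remote.add(bucket_id)

    def evict(self, bucket_id):
        # Eviction removes only the cached copy; the master copy in
        # S3 is untouched.
        assert bucket_id in self.remote, "never evict an un-uploaded bucket"
        del self.local[bucket_id]

    def search(self, bucket_id):
        # Cache hit: serve locally. Cache miss: fetch from remote.
        if bucket_id not in self.local:
            self.fetches += 1
            self.local[bucket_id] = "warm"
        return self.local[bucket_id]

idx = SmartStoreIndexer()
idx.write("b1"); idx.roll_to_warm("b1"); idx.evict("b1")
print(idx.search("b1"), idx.fetches)  # warm 1
```

Note the invariant the sketch encodes: a bucket is never evicted before its upload completes, so a cache miss can always be repaired from S3.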
The key levers you control are in indexes.conf and server.conf. Setting remotePath on an index to a volume:remote_store reference is what enables SmartStore for that index, and the path suffix (conventionally $_index_name) defines a logical directory within your S3 bucket, keeping each index's data organized. In the [volume:remote_store] stanza, storageType, path, and the remote.s3.* settings configure the connection to your object store. Because buckets are uploaded when they roll from hot to warm, the usual roll controls in indexes.conf (maxDataSize, maxHotSpanSecs) govern how quickly data moves to remote storage, and maxGlobalDataSizeMB caps an index's total size across local and remote copies.
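As a sketch, those roll-related knobs combine in indexes.conf like this (setting names are from the indexes.conf spec; the values are illustrative, not recommendations):

```ini
[my_index]
remotePath = volume:remote_store/$_index_name
# Roll hot buckets to warm (and so trigger upload) at 750MB or
# 24 hours of span, whichever comes first
maxDataSize = 750
maxHotSpanSecs = 86400
# Cap the index's total size across local and remote storage;
# the oldest buckets freeze once the cap is reached
maxGlobalDataSizeMB = 512000
```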
What people often miss is that when a bucket is uploaded to remote storage, Splunk doesn't immediately delete the local copy. It maintains a local cache, which is crucial for performance: frequently searched warm buckets are served directly from local SSDs without hitting S3. The cache manager evicts older or less-used buckets to make space for new ones, and you can influence this behavior with max_cache_size, eviction_padding, and hotlist_recency_secs under the [cachemanager] stanza in server.conf. Eviction is an asynchronous background task, so you won't see immediate disk space reclamation when a bucket is uploaded; space is reclaimed over time as the cache fills up.
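A sketch of those cache manager settings in server.conf (values are illustrative, not recommendations):

```ini
[cachemanager]
# Upper bound on the local cache size, in MB
max_cache_size = 1024000
# Extra space (MB) to keep free, beyond minFreeSpace, before evicting
eviction_padding = 5120
# Protect buckets containing data from the last day from eviction
hotlist_recency_secs = 86400
```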
The next thing you’ll want to understand is how to manage the lifecycle of data in S3, including setting up S3 Lifecycle Policies to move data to cheaper tiers like Glacier or delete it entirely.