Summary indexes are one of the most effective ways to speed up repeated searches in Splunk.
Let’s say you’re running a daily report on failed login attempts across your entire environment. Without a summary index, Splunk has to re-scan gigabytes or even terabytes of raw event data every single time you run that report. This can take minutes, or worse, hours. A summary index pre-computes the results of your search and stores them in a smaller, more manageable index. When you run your report, Splunk queries this tiny summary index instead of the massive raw data, making it near-instantaneous.
Here’s how it works in practice. Imagine you want to track the count of error events per hour for the last 7 days.
First, you’d create a saved search that runs periodically (e.g., hourly) to populate your summary index. This search would look something like this:
index=main sourcetype=web_logs status=500
| bin _time span=1h
| stats count by _time
This search finds all 500 status codes in your web_logs sourcetype within the main index, bins them into hourly buckets, and counts them.
Now, you need to configure this search to write to a summary index. In Splunk Web, go to Settings > Searches, reports, and alerts. Find your saved search and choose Edit > Edit Summary Indexing. Check Enable summary indexing, then select or create a destination summary index. Let’s call it summary_web_errors.
Summary indexing requires the search to be scheduled, so also set a schedule for the saved search, say, every hour at 5 minutes past the hour, so each run can summarize the previous complete hour.
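If you prefer to populate the summary from SPL itself, the collect command writes a search’s results directly into any index. A sketch, assuming the summary_web_errors index already exists and the search is scheduled hourly (the earliest/latest bounds are an assumption to keep each run covering exactly one complete hour):

index=main sourcetype=web_logs status=500 earliest=-1h@h latest=@h
| bin _time span=1h
| stats count by _time
| collect index=summary_web_errors

Snapping the time range to hour boundaries this way avoids gaps or double-counting if a run is slightly late.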
Once this summary search has run for a while, you’ll have data in your summary_web_errors index. To query it, your search becomes incredibly simple:
index=summary_web_errors
| timechart span=1h sum(count)
Notice how we’re now querying index=summary_web_errors and summing a field named count (the result of the stats count command in our original search). The timechart command re-aggregates these pre-computed hourly counts into the trend line you want. This query will run in seconds, regardless of how much raw data your original search was processing.
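The same summarized data can also be rolled up at coarser granularities without touching the raw events. For example, a daily total over the last 7 days (the errors field name is just an illustrative rename):

index=summary_web_errors earliest=-7d@d
| timechart span=1d sum(count) as errors

Because the summary holds hourly pre-aggregates, Splunk only has to add up at most 24 rows per day here.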
The key components you control are:
- The Summary Search: This is the core logic that defines what data you want to pre-compute. It should be as specific as possible to reduce the amount of data it needs to process and store.
- The Schedule: How often the summary search runs. This dictates the freshness of your summarized data. For hourly reports, an hourly schedule is typical. For daily reports, a daily schedule might suffice.
- The Summary Index: A dedicated index where the pre-computed results are stored. This keeps your raw data indexes clean and your summary index lean.
- The Reporting Search: The query you run to view your summarized data. This search should be simple and directly query the summary index.
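Behind the Splunk Web dialog, these settings live in savedsearches.conf. A sketch of what the stanza might look like, using the standard summary-indexing keys (verify the exact key names against your Splunk version’s savedsearches.conf reference):

[Hourly Web Error Summary]
search = index=main sourcetype=web_logs status=500 | bin _time span=1h | stats count by _time
enableSched = 1
cron_schedule = 5 * * * *
action.summary_index = 1
action.summary_index._name = summary_web_errors

Keeping the stanza in a version-controlled app makes the summary pipeline reproducible across environments.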
It’s crucial to understand that the summary search doesn’t just store the raw events; it stores the results of the search. If your summary search uses stats, chart, top, or rare, those aggregated results are what get indexed. If your summary search is just index=foo, then the raw events are indexed into the summary index, which defeats the purpose. Always include an aggregation command in your summary search.
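Splunk also ships si- variants of the reporting commands (sistats, sitimechart, sichart, sitop, sirare) designed specifically for summary indexing: they store intermediate statistical results so the reporting search can simply re-run the plain command against the summary. A sketch of the pattern, assuming the same index names as above:

Populating (scheduled) search:
index=main sourcetype=web_logs status=500
| sitimechart span=1h count

Reporting search:
index=summary_web_errors
| timechart span=1h count

The plain stats approach shown earlier also works, but the si- commands handle cases (such as averages and distinct counts) that cannot be correctly re-aggregated from simple stats output.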
The most common pitfall is forgetting to include an aggregation command in the summary search. If you simply run index=my_app and direct its output to a summary index, Splunk will re-index all raw events from my_app into the summary index. This doesn’t save you any storage or search time because you’re just moving data around. The power comes from pre-aggregating, which collapses many raw events into a handful of result rows.
The next step is to explore how to use summary indexes for more complex analytical queries and how to manage their growth.