Splunk Workload Management lets you guarantee search performance for your most critical dashboards and alerts, even when your environment is swamped.
Let’s see it in action. Imagine you have a Splunk environment with two heavy-duty search jobs running: a scheduled report that takes 30 minutes and a user-initiated search that’s also 30 minutes. Both are competing for the same resources. Splunk’s default behavior is to just queue them up. Now, let’s say you have a critical dashboard that must load within 5 minutes. Without Workload Management, that dashboard might get stuck behind those long-running searches, frustrating users and potentially causing missed alerts if it’s tied to an incident response workflow.
Here’s how we fix that. We’ll use Splunk’s Workload Management to create a "priority" workload pool that gets preferential treatment.
First, we need to define our workload pools. Pools live in workload_pools.conf (Workload Management requires Linux hosts with cgroups, and Splunk recommends managing it through Settings > Workload Management in Splunk Web, which writes this file for you). We’ll create a [workload_pool:critical_dashboards] pool and a [workload_pool:default] pool that catches everything else.
# $SPLUNK_HOME/etc/system/local/workload_pools.conf
[general]
# Enabling workload management requires Linux with cgroups
enabled = true
[workload_pool:critical_dashboards]
# Relative shares, not hard caps: this pool is entitled to
# 70% of CPU and memory when pools compete
cpu_weight = 70
mem_weight = 70
category = search
[workload_pool:default]
# This pool gets the remaining share
cpu_weight = 30
mem_weight = 30
category = search
# Searches that match no rule land here
default_category_pool = true
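If you edit this file by hand rather than through the UI, a quick sanity check helps. Here is a minimal sketch in plain Python (not a Splunk tool; the POOLS_CONF string and validate_pools helper are our own invention) that parses a workload_pools.conf-style snippet and checks two invariants: every pool declares both weights, and exactly one pool is flagged as the default:

```python
import configparser

# Hypothetical copy of the pools config from above
POOLS_CONF = """
[general]
enabled = true

[workload_pool:critical_dashboards]
cpu_weight = 70
mem_weight = 70
category = search

[workload_pool:default]
cpu_weight = 30
mem_weight = 30
category = search
default_category_pool = true
"""

def validate_pools(conf_text):
    """Return a list of problems found in a pools config snippet."""
    cp = configparser.ConfigParser()
    cp.read_string(conf_text)
    pools = {s: cp[s] for s in cp.sections() if s.startswith("workload_pool:")}
    problems = []
    defaults = 0
    for name, pool in pools.items():
        # Every pool needs both weights defined
        for key in ("cpu_weight", "mem_weight"):
            if key not in pool:
                problems.append(f"{name} is missing {key}")
        # Count pools flagged as the fallback for unmatched searches
        if pool.getboolean("default_category_pool", fallback=False):
            defaults += 1
    if defaults != 1:
        problems.append(f"expected exactly 1 default_category_pool, found {defaults}")
    return problems

print(validate_pools(POOLS_CONF))  # → []
```

The same approach extends to other invariants you care about, such as whether the weights per category sum to a sensible total.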
Next, we need to tell Splunk which searches belong to which pool. We do this with workload rules in workload_rules.conf, each of which maps a predicate over search properties (such as app, role, user, index, or search_type) to a pool. For our critical dashboards, we’ll route any search run in the search app by a user holding the dashboard_user role into the critical_dashboards pool; there is no predicate that directly identifies "dashboard" searches, so app plus role is the usual proxy.
# $SPLUNK_HOME/etc/system/local/workload_rules.conf
[workload_rule:critical_dashboards_rule]
# Rules are evaluated in ascending order; the first match wins
order = 1
predicate = app=search AND role=dashboard_user
workload_pool = critical_dashboards
No catch-all rule is needed: any search that matches no rule lands in the pool flagged default_category_pool = true.
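To build intuition for how rule matching behaves, here is a toy matcher (illustrative Python, not Splunk’s implementation; route and parse_predicate are invented names, and the real predicate syntax is richer) that evaluates simple key=value AND key=value predicates in ascending order and falls back to the default pool:

```python
def parse_predicate(predicate):
    """Split 'app=search AND role=dashboard_user' into {'app': 'search', ...}."""
    return dict(term.split("=", 1) for term in predicate.split(" AND "))

def route(search_attrs, rules, default_pool="default"):
    """Return the pool of the first rule (by order) whose predicate matches."""
    for rule in sorted(rules, key=lambda r: r["order"]):
        wanted = parse_predicate(rule["predicate"])
        if all(search_attrs.get(k) == v for k, v in wanted.items()):
            return rule["workload_pool"]
    # No rule matched: fall back to the default pool
    return default_pool

rules = [{"order": 1,
          "predicate": "app=search AND role=dashboard_user",
          "workload_pool": "critical_dashboards"}]

# A dashboard search from the right role lands in the critical pool...
print(route({"app": "search", "role": "dashboard_user"}, rules))  # → critical_dashboards
# ...while a search that matches no rule falls back to the default pool.
print(route({"app": "search", "role": "power"}, rules))  # → default
```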
If you manage these settings through Splunk Web or the REST API, they take effect without a restart; if you edit the .conf files by hand, restart Splunk (or reload the configuration) for the changes to apply.
Now, when dashboard_user runs a search from a dashboard in the search app, it is assigned to the critical_dashboards pool. The weights are relative shares, so under contention that pool is entitled to roughly 70% of CPU and memory (70 out of a total weight of 100). The two 30-minute searches, assuming they don’t match the rule’s predicate, fall into the default pool and compete for the remaining 30%. This means that even when the default pool is saturated, your critical dashboard searches get the lion’s share of resources and a much better chance of completing within their SLA.
The actual resource allocation isn’t a static partition. The weights are enforced by Linux cgroups, which adjust allocations dynamically based on current load: if the critical_dashboards pool is using less than its 70% CPU share, the default pool can temporarily borrow the idle capacity, but as soon as a critical search needs it back, the default pool is squeezed down to its own share.
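The share arithmetic is worth making concrete. Under a weight-based scheme like the CPU shares that cgroups implement, each busy pool receives its weight divided by the sum of the busy pools’ weights, which is exactly why idle capacity flows to whoever needs it. A small illustrative sketch (our own code, not Splunk’s):

```python
def cpu_share(pool, weights, active):
    """Fraction of CPU a pool gets when only the `active` pools have running searches."""
    busy = {p: w for p, w in weights.items() if p in active}
    # Shares are proportional among busy pools only
    return busy[pool] / sum(busy.values()) if pool in busy else 0.0

weights = {"critical_dashboards": 70, "default": 30}

# Both pools busy: shares follow the 70/30 weights.
print(cpu_share("critical_dashboards", weights, {"critical_dashboards", "default"}))  # → 0.7
# Critical pool idle: the default pool borrows the whole machine.
print(cpu_share("default", weights, {"default"}))  # → 1.0
```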
The key levers you control are the cpu_weight and mem_weight settings on each pool, plus the predicates and the order of your workload rules. The search_type predicate attribute is particularly useful for differentiating ad-hoc user searches from scheduled reports.
One common pitfall is treating cpu_weight as a hard cap. It is a proportional share that only bites under contention: a single resource-hungry search in the critical pool can use more than 70% of the CPU when the rest of the system is idle, and is throttled back toward its share only when other pools need their cycles. Memory behaves more strictly: a pool that exhausts its memory allocation can see its searches fail, so size mem_weight with your heaviest legitimate search in mind.
The next thing you’ll likely want to configure is admission rules, which filter searches before they run (for example, blocking overly broad all-time searches from certain roles) so a runaway query never enters a pool in the first place.