The eval command in Splunk’s Search Processing Language (SPL) isn’t just for creating new fields; it’s the engine that lets you perform arbitrary calculations on your data, transforming raw logs into actionable insights.
Let’s see eval in action. Imagine you have web server logs with fields like bytes_sent and bytes_received. You want to calculate the total data transferred for each transaction.
index=web sourcetype=apache
| eval total_bytes = bytes_sent + bytes_received
| stats sum(total_bytes) by clientip
Here, eval total_bytes = bytes_sent + bytes_received creates a new field total_bytes by summing the values of bytes_sent and bytes_received for each event. Then, stats sum(total_bytes) by clientip aggregates this new field, showing the total data transferred per client IP.
The core problem eval solves is the gap between raw log data and the structured metrics you need for analysis, alerting, and reporting. Logs often contain pieces of information scattered across different fields, or they might be in formats that aren’t directly usable for calculations. eval bridges this by allowing you to:
- Perform Arithmetic Operations: Addition, subtraction, multiplication, division.
- Apply String Functions: Concatenation, substring extraction, case conversion.
- Utilize Conditional Logic: if(), case() for branching calculations.
- Leverage Mathematical Functions: abs(), round(), pow(), log(), sqrt().
- Work with Dates and Times: Extracting components, calculating durations.
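Several of these capabilities combine naturally in one search. The following sketch buckets web responses by size using case() with a true() catch-all, then normalizes the label with upper(); the size_bucket field name and the byte thresholds are illustrative, not from the original logs:

index=web sourcetype=apache
| eval size_bucket = case(bytes_sent < 1024, "small", bytes_sent < 1048576, "medium", true(), "large")
| eval size_label = upper(size_bucket)
| stats count by size_label

case() evaluates its condition/value pairs left to right and returns the value for the first condition that is true, so the true() clause acts as a default.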
Consider a scenario where you’re tracking API request latency. Your logs might have request_time (a Unix timestamp) and response_time (also a Unix timestamp). You want to know how long each request took.
index=api sourcetype=myapi
| eval duration_ms = (response_time - request_time) * 1000
| stats avg(duration_ms) as avg_latency_ms, max(duration_ms) as max_latency_ms by endpoint
In this example, (response_time - request_time) gives the duration in seconds. Multiplying by 1000 converts it to milliseconds. The stats command then calculates the average and maximum latency for each API endpoint.
The eval command supports a rich set of operators and functions, detailed in the Splunk documentation. You can chain multiple eval commands or perform several calculations within a single eval statement using commas.
index=network sourcetype=firewall
| eval src_port_str = tostring(src_port)
| eval dest_port_str = tostring(dest_port)
| eval connection_id = src_ip + ":" + src_port_str + " -> " + dest_ip + ":" + dest_port_str
| stats count by connection_id
Here, we first convert numeric port fields to strings using tostring() so they can be concatenated. Then, we construct a unique connection_id string by combining source and destination IP addresses and ports. This allows us to count unique network connections.
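Because eval assignments separated by commas are evaluated left to right, and later assignments can reference fields created earlier in the same statement, the three eval commands above can be collapsed into a single statement:

index=network sourcetype=firewall
| eval src_port_str = tostring(src_port), dest_port_str = tostring(dest_port), connection_id = src_ip + ":" + src_port_str + " -> " + dest_ip + ":" + dest_port_str
| stats count by connection_id

Whether to chain or collapse is mostly a readability choice; separate eval commands are easier to comment out while debugging a search.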
When dealing with floating-point numbers, be mindful of potential precision issues. While Splunk generally handles these well, for highly sensitive financial calculations you might consider rounding with round() or formatting with functions such as printf(). For instance, to round avg_latency_ms to two decimal places:
index=api sourcetype=myapi
| eval duration_ms = (response_time - request_time) * 1000
| stats avg(duration_ms) as avg_latency_ms, max(duration_ms) as max_latency_ms by endpoint
| eval avg_latency_ms = round(avg_latency_ms, 2)
This eval avg_latency_ms = round(avg_latency_ms, 2) ensures that the reported average latency is presented with a maximum of two decimal places, improving readability.
Perhaps the most surprising aspect of eval is its capacity for complex string manipulation and pattern matching directly within SPL, using functions such as split(), mvindex(), match(), and replace(), often obviating the need for external parsing tools before data ingestion. For example, extracting a specific piece of information from a free-form log message with split() and mvindex():
index=app sourcetype=myapp
| eval user_id = mvindex(split(log_message, "user="), 1)
| eval user_id = mvindex(split(user_id, " "), 0)
| stats count by user_id
This sequence of eval commands first splits log_message on "user=" and takes the second piece (mvindex(..., 1)), then splits that result on the space character and takes the first piece (mvindex(..., 0)), isolating a user ID embedded in a message such as "INFO Processing request for user=jsmith processing_id=123".
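The same extraction can often be done in a single step with eval's replace() function and a regular expression. This sketch assumes the user ID is the run of non-whitespace characters following "user=":

index=app sourcetype=myapp
| eval user_id = replace(log_message, ".*user=(\S+).*", "\1")
| stats count by user_id

Here \1 refers to the first capture group in the pattern. Note that for field extraction specifically, the rex command is also a common alternative.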
Once you’ve mastered eval for calculations, your next step is often using these calculated fields in more sophisticated statistical analyses or for creating visualizations in dashboards.