transforms.conf and props.conf are where Splunk gets serious about understanding your data, but they’re also where things get messy fast if you don’t know the hidden rules.
Let’s watch Splunk ingest some raw web server logs and see how transforms.conf and props.conf shape them into something useful.
Imagine this raw Apache access log line:
192.168.1.100 - - [10/Oct/2023:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1234 "http://example.com/referrer" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
Without any configuration, Splunk sees this as a single, unparsed event. We want to break this down.
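First, transforms.conf. This file holds regex-based transforms that props.conf can apply either at index time (TRANSFORMS-<class>) or at search time (REPORT-<class>). Here we’ll apply one at index time, so think of it as pre-processing: the fields are written as the event is indexed.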
Here’s a stanza in transforms.conf to pull out some key info:
[web_log_parser]
REGEX = ^(?<ip>\S+) \S+ \S+ \[(?<timestamp>[^\]]+)\] "(?<method>\S+) (?<uri>\S+) (?<protocol>\S+)" (?<status>\d{3}) (?<bytes>\d+) "(?<referrer>[^"]*)" "(?<user_agent>[^"]*)"
FORMAT = ip::$1 method::$3 uri::$4 status::$6 bytes::$7 referrer::"$8" user_agent::"$9"
WRITE_META = true
When Splunk processes data linked to this stanza (we’ll link it in props.conf later), it applies the REGEX. The (?<fieldname>...) syntax names each capture group for readability. Groups are also numbered from left to right, so $1 is ip, $2 is timestamp, $3 is method, and so on. The referrer and user agent are matched with [^"]* rather than \S+ because a user agent string contains spaces. The FORMAT line then tells Splunk what to write out: for an index-time transform it is a space-separated list of field::value pairs, with quotes around any value that may contain spaces. WRITE_META = true is crucial: it tells Splunk to append those pairs to the event’s indexed metadata (_meta). Without it, or an explicit DEST_KEY, an index-time transform has nowhere to write its output. Settings such as MV_ADD apply only to search-time extractions and have no effect in an index-time transform like this one.
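Fields written at index time generally also need a matching entry in fields.conf so that searches treat them as indexed fields. A minimal sketch, assuming the field names from the FORMAT line above (only two are shown; the others would follow the same pattern):
# fields.conf (sketch; add one stanza per index-time field)
[ip]
INDEXED = true

[status]
INDEXED = true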
Now, props.conf. This is where you tell Splunk how to process data as it’s being indexed. It’s about defining source types, setting up field extraction rules, and more.
We’ll link our transforms.conf stanza here and define a source type:
[source::...access_log.txt]
TRANSFORMS-web = web_log_parser
sourcetype = apache_access
This stanza tells Splunk: "For any data coming from a file ending in access_log.txt, apply the web_log_parser transformations defined in transforms.conf, and assign it the sourcetype apache_access."
So, after these configurations, our single raw log line becomes a Splunk event with individual fields:
ip="192.168.1.100" method="GET" uri="/index.html" status=200 bytes=1234 referrer="http://example.com/referrer" user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
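Notice the timestamp: 10/Oct/2023:10:00:00 +0000. Capturing it in our regex does not set the event’s time. Timestamp recognition is a separate step that props.conf controls per sourcetype. Splunk can often guess a common format like this one, but to be sure it becomes the event’s _time, which time-based searches rely on, we spell it out in one more props.conf stanza: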
[apache_access]
TIME_PREFIX = \[
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
SHOULD_LINEMERGE = false
TRUNCATE = 0
Here’s the breakdown:
[apache_access]: This stanza applies to data with the sourcetype apache_access.
TIME_PREFIX = \[: A regex that matches the text immediately before the timestamp, here the opening square bracket. Splunk starts looking for the timestamp right after the match, so no capture group is needed.
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z: This is the critical part. It defines the exact format of the timestamp string. %d is the day, %b is the abbreviated month name (Oct), %Y is the year, %H is the hour, %M is the minute, %S is the second, and %z is the UTC offset.
SHOULD_LINEMERGE = false: For simple logs like this, each raw line is a single event. This prevents Splunk from trying to combine multiple lines into one event.
TRUNCATE = 0: This disables the per-line length limit (10,000 bytes by default), so long lines are never cut off. That is useful for debugging and if you need to re-parse or extract more fields later, but on production data a generous finite limit is usually safer, since a malformed input could otherwise produce an enormous event.
With this final props.conf stanza, Splunk will correctly parse 10/Oct/2023:10:00:00 +0000 into its internal timestamp format. You can now search using _time and filter by date and time ranges.
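As an optional tightening, MAX_TIMESTAMP_LOOKAHEAD limits how many characters after the TIME_PREFIX match Splunk scans for a timestamp. A sketch, assuming the timestamp always has the 26-character shape of the sample line; the value 30 is an illustrative choice with a little slack, not part of the original configuration:
[apache_access]
# Added to the existing sourcetype stanza; the other settings stay as they are.
# Assumption: timestamps always look like 10/Oct/2023:10:00:00 +0000 (26 characters).
MAX_TIMESTAMP_LOOKAHEAD = 30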
The actual mechanism at play is that transforms.conf defines reusable extraction patterns, and props.conf ties those patterns to specific data inputs (based on source or sourcetype) and defines how Splunk should interpret event boundaries and timestamps. The FORMAT in transforms.conf is effectively a way to "flatten" the captured regex groups into key-value pairs that Splunk can then index. The timestamp parsing in props.conf uses a specific syntax to map the raw string to a usable time object, enabling all time-based operations.
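The one thing most people don’t realize about index-time transforms is that the FORMAT directive isn’t just about creating key-value pairs; it also decides which captured groups from the REGEX actually get written to the index. Only the pairs that FORMAT emits are stored, so a group that FORMAT never references is effectively discarded after the transformation, even if it was captured by the REGEX. In the stanza above, timestamp and protocol are captured but never referenced in FORMAT, so neither becomes a field. Search-time extractions (REPORT-<class>) behave differently: there, named capture groups become fields on their own.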
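For contrast, here is a sketch of wiring a similar regex at search time. The stanza name web_log_parser_search and the class name web_search are hypothetical, and the regex is shortened for illustration. With REPORT-<class>, the named capture groups are extracted as fields directly, so no FORMAT line is needed:
# transforms.conf (hypothetical search-time variant)
[web_log_parser_search]
REGEX = ^(?<ip>\S+) \S+ \S+ \[[^\]]+\] "(?<method>\S+) (?<uri>\S+) (?<protocol>\S+)" (?<status>\d{3})

# props.conf (references the transform at search time instead of index time)
[apache_access]
REPORT-web_search = web_log_parser_search
Search-time extractions like this can be changed without re-indexing data, which is one reason to prefer them when index-time fields are not strictly needed.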
Once you’ve got your sourcetype and time parsing set up, the next logical step is to explore how to use props.conf for more advanced event segmentation, like breaking multi-line stack traces into single events.