inputs.conf’s monitor stanza is the workhorse for collecting log files in Splunk, but its simplicity hides a few surprising behaviors.
Let’s see it in action. Imagine you have a web server generating access logs in /var/log/apache2/access.log. To get Splunk to watch this file and index its contents, you’d add a stanza like this to your Splunk Universal Forwarder’s inputs.conf:
[monitor:///var/log/apache2/access.log]
index = weblogs
sourcetype = apache_access
When Splunk starts, it reads this configuration and checks /var/log/apache2/access.log. To decide whether it has seen the file before, Splunk computes a CRC over the beginning of the file (the first 256 bytes by default) and looks that value up in its internal checkpoint database, commonly called the fishbucket, under $SPLUNK_HOME/var/lib/splunk/fishbucket. If there is no matching record, the file is treated as new and read from the beginning. If there is a match, the record holds the seek pointer, that is, the byte offset of the last data read, and Splunk resumes from that point. New lines appended to the file are read and, on a Universal Forwarder, forwarded to the indexers.
The core problem the monitor input solves is continuous, reliable log ingestion. Instead of manually tailing files or writing custom scripts for every log source, you declare the paths and Splunk does the rest. It follows files across rotation, resumes where it left off after a restart or a network interruption, and relies on its saved read positions to avoid both gaps and duplicates.
Internally, the monitor input periodically checks each file it is watching, compares the file’s current size and modification time against its recorded state, and reads any new data. This is a polling mechanism rather than event-driven notification, and its cost grows with the number of files matched, so it stays cheap as long as stanzas are scoped to the files you actually need.
Here’s a typical inputs.conf for monitoring multiple log files:
[monitor:///var/log/apache2/access.log]
index = weblogs
sourcetype = apache_access
disabled = false
[monitor:///var/log/apache2/error.log]
index = weblogs
sourcetype = apache_error
disabled = false
[monitor:///var/log/syslog]
index = os
sourcetype = syslog
disabled = false
[monitor:///opt/myapp/logs/*.log]
index = myapp
sourcetype = myapp_log
The index parameter specifies which Splunk index the data should be sent to. The sourcetype parameter tells Splunk how to parse and categorize the data, which in turn determines line breaking, timestamp extraction, and the fields available at search time. disabled = false is the default, so stating it is redundant, but it documents intent and overrides a disabled = true inherited from another configuration layer.
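The wildcard stanza for /opt/myapp/logs can also be written as a directory monitor with path filters. The following is a sketch rather than a recommended configuration: whitelist, blacklist, and recursive are standard monitor settings, but the debug file naming is a hypothetical assumption made for illustration.

```ini
# Hypothetical variant of the /opt/myapp/logs input.
[monitor:///opt/myapp/logs]
index = myapp
sourcetype = myapp_log
# Do not descend into subdirectories (the default is to recurse).
recursive = false
# Regular expressions are matched against the full path of each file.
whitelist = \.log$
# Assumed naming: skip files such as debug-2024.log. If a path matches
# both filters, the blacklist wins and the file is not monitored.
blacklist = /debug[^/]*\.log$
```

Filters like these keep the stanza count low when one directory holds a mix of files you do and do not want.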
When Splunk restarts, it consults its saved read positions again. Because a file is identified by a CRC of its leading bytes (the first 256 by default) rather than by its name or inode, two cases cause confusion. If a monitored file is deleted and recreated with different leading content, Splunk treats it as a new file and reads it from the beginning. Conversely, if a new file begins with the same bytes as one Splunk has already read, for example logs that all start with an identical header, Splunk can mistake it for a file it has seen and skip it. Two settings tune this check: initCrcLength sets how many leading bytes go into the CRC, and crcSalt mixes an extra string into it. The special value crcSalt = <SOURCE> uses the file’s full path as the salt, so files with identical beginnings but different paths are tracked separately.
[monitor:///var/log/apache2/access.log]
index = weblogs
sourcetype = apache_access
crcSalt = <SOURCE>
Use crcSalt = <SOURCE> with care on rotated logs. A plain rename rotation (access.log becomes access.log.1 and a new access.log is created) is already handled by the default check: the renamed file keeps its content, so its CRC still matches the saved record and Splunk does not read it again. With the path salted in, the renamed file gets a different CRC, and if the input also matches the rotated name, Splunk re-indexes the whole file. The stanza above is safe only because it matches access.log alone. Reserve crcSalt for files that genuinely share the same leading content.
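Splunk identifies a monitored file by a CRC of its first 256 bytes by default, so files that share a long identical header can look the same to it. One way to handle that without salting the path is to lengthen the CRC window with initCrcLength. A minimal sketch, assuming a hypothetical application whose log files all begin with the same banner of several hundred bytes:

```ini
# Hypothetical input: every file starts with the same long banner.
[monitor:///opt/myapp/logs/*.log]
index = myapp
sourcetype = myapp_log
# Compute the identifying CRC over the first 1024 bytes instead of 256,
# so the comparison reaches past the shared banner.
initCrcLength = 1024
```

Because the CRC still depends only on content, a file renamed during rotation keeps its identity and is not read again.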
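Another important, often overlooked aspect is the ignoreOlderThan setting. It tells Splunk to skip files whose modification time falls outside the specified window. For example, ignoreOlderThan = 1d makes Splunk ignore any file that was last modified more than one day ago. The check applies to the file’s modification time, not to the timestamps of the events inside it, and a file that has been placed on the ignore list is not checked again until Splunk restarts, even if it is updated later. Choose a window that files you still want can never exceed. The setting is useful for preventing accidental indexing of historical archives that might be present in a monitored directory.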
[monitor:///var/log/archive/*.log]
index = archive_logs
sourcetype = old_log
ignoreOlderThan = 7d
This configuration ensures that only files in the archive directory modified within the last seven days are ingested, keeping your Splunk index cleaner and more relevant.
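ignoreOlderThan filters on file modification time only, so it is often paired with a path filter. The sketch below assumes the archive directory also holds compressed copies with a hypothetical .gz suffix; the monitor input can read compressed files, so excluding them avoids ingesting the same data twice.

```ini
# Hypothetical: /var/log/archive holds both plain and compressed logs.
[monitor:///var/log/archive]
index = archive_logs
sourcetype = old_log
# Skip files last modified more than seven days ago.
ignoreOlderThan = 7d
# Never read the compressed copies, whatever their age.
blacklist = \.gz$
```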
The one thing that often trips up new users is how Splunk handles file permissions and ownership. The user account running the Splunk Universal Forwarder needs read permission on the log files and execute (search) permission on every directory in the path leading to them. If Splunk cannot read a file, it logs a "permission denied" style error in splunkd.log and skips that file until the permissions are corrected. The other monitor inputs keep working, but the affected source will have a data gap.
The next common challenge is configuring sourcetypes effectively for complex log formats, especially when dealing with multiline events.