Splunk scripted inputs are a powerful way to pull in data that doesn’t fit into Splunk’s standard collection methods.

Let’s say you have a custom application spitting out metrics to a unique endpoint, or a legacy system that only exposes data via a proprietary API. Splunk’s built-in forwarders and log collection might not cut it. That’s where scripted inputs shine. They allow you to run any script – Python, Bash, PowerShell, you name it – on your Splunk instance, and whatever that script outputs to standard output (stdout) gets ingested by Splunk as events.
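The contract is simple enough to show in full. Here is a minimal, hypothetical scripted input (the name and event text are just illustrations) whose single printed line would arrive in Splunk as one event:

```python
#!/usr/bin/env python3
# hello_input.py -- a hypothetical, minimal scripted input.
# Every line printed to stdout becomes one event in Splunk.
import datetime

now = datetime.datetime.now(datetime.timezone.utc).isoformat()
line = f"{now} heartbeat=ok"
print(line)
```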

Imagine a scenario where you need to monitor the health of a custom microservice. This service exposes a /healthz endpoint that returns JSON. We can write a simple Python script to fetch this data and print it out.

Here’s a Python script, let’s call it microservice_health.py:

import requests
import json
import sys
import datetime

try:
    # A timeout keeps a hung endpoint from blocking the input past its interval
    response = requests.get("http://your-microservice-host:8080/healthz", timeout=10)
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    health_data = response.json()

    # Add a timezone-aware timestamp to the data
    health_data['timestamp'] = datetime.datetime.now(datetime.timezone.utc).isoformat()

    print(json.dumps(health_data))

except requests.exceptions.RequestException as e:
    # Log the error to stderr, which Splunk records in its internal logs
    print(f"Error fetching microservice health: {e}", file=sys.stderr)
    sys.exit(1)  # Exit with a non-zero status to indicate failure
except ValueError:
    # response.json() raises a ValueError subclass when the body isn't valid JSON
    print("Error decoding JSON response from microservice.", file=sys.stderr)
    sys.exit(1)

To make this work, save the script on the Splunk instance that should collect the data — typically a Heavy Forwarder or a standalone all-in-one instance, depending on your architecture — in an app’s bin directory, such as /opt/splunk/etc/apps/my_custom_app/bin/. Splunk only executes scripts from recognized bin directories, and the script must be executable by the user Splunk runs as.

Now, within Splunk, you need to configure this script as an input. Go to Settings > Data inputs and click Add new next to Script (Splunk Web’s label for scripted inputs).

You’ll see fields like:

  • Script path: the bin directory that contains your script, e.g., /opt/splunk/etc/apps/my_custom_app/bin/.
  • Script name / Command: microservice_health.py.
  • Interval: how often to run the script, in seconds or as a cron expression. Let’s set it to 60.
  • Source type: assign a source type, e.g., my:microservice:health.
  • Index: specify the index to send the data to, e.g., main.

Note that scripted inputs have no “output mode” setting — Splunk captures stdout as raw event text. To have fields created from JSON automatically, set KV_MODE = json for your source type in props.conf (or INDEXED_EXTRACTIONS = json for index-time extraction).
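If you prefer configuration files over the UI, a sketch of the equivalent setup (using this article’s example app name, source type, and index) lives in the app’s inputs.conf and props.conf:

```ini
# $SPLUNK_HOME/etc/apps/my_custom_app/local/inputs.conf
[script://$SPLUNK_HOME/etc/apps/my_custom_app/bin/microservice_health.py]
interval = 60
sourcetype = my:microservice:health
index = main
disabled = false

# $SPLUNK_HOME/etc/apps/my_custom_app/local/props.conf
[my:microservice:health]
KV_MODE = json
```

A restart or a debug/refresh is needed for Splunk to pick up new conf files.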

Once saved, Splunk will execute /opt/splunk/etc/apps/my_custom_app/bin/microservice_health.py every 60 seconds. If the script outputs valid JSON to stdout, Splunk will ingest it.

Let’s look at what happens internally. When Splunk runs a scripted input, it executes the script as a subprocess. The interval dictates how frequently this subprocess is spawned, and its standard output is captured. Each line of stdout becomes a raw event by default (subject to the sourcetype’s line-breaking rules); if the source type is configured for JSON extraction, Splunk turns the JSON keys into searchable fields. The sourcetype is then applied, and the event is sent to the specified index. Errors written to standard error (stderr) by the script are logged by Splunk in splunkd.log (searchable via the _internal index), which is invaluable for debugging.
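You can approximate those mechanics in a few lines of Python — spawn a script, capture stdout and stderr as separate streams. Here an inline command stands in for microservice_health.py, purely for illustration:

```python
import subprocess
import sys

# Simulate how a scripted input is run: a subprocess whose streams are captured.
# The inline -c program stands in for a real script on disk.
cmd = [sys.executable, "-c",
       "import sys; print('{\"status\": \"ok\"}'); print('a warning', file=sys.stderr)"]
proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)

events = proc.stdout.splitlines()   # each line would become an event
errors = proc.stderr.splitlines()   # Splunk logs these internally instead
```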

The real power comes from pairing JSON output with JSON field extraction. If your script outputs something like:

{"status": "ok", "latency_ms": 55, "dependencies": {"db": "connected"}}

Splunk will automatically create fields like status, latency_ms, and dependencies.db. You can then search for status=ok or dependencies.db=connected.
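The dotted names mirror a straightforward flattening of the nested structure. As a sketch of the naming scheme (an illustration, not Splunk’s internal code):

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested JSON into Splunk-style dotted field names."""
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            fields.update(flatten(value, prefix=f"{name}."))
        else:
            fields[name] = value
    return fields

event = json.loads('{"status": "ok", "latency_ms": 55, "dependencies": {"db": "connected"}}')
fields = flatten(event)
# fields -> {"status": "ok", "latency_ms": 55, "dependencies.db": "connected"}
```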

If your script doesn’t emit JSON (or the sourcetype isn’t configured to extract it), Splunk treats the output as raw text. You’d then need Splunk’s extraction mechanisms (like rex, or the extract/kv command for key=value pairs) in your searches to parse the data. For example, if your script output health=ok latency=55ms db=connected, you’d search like index=main sourcetype=my:microservice:health | rex "latency=(?<latency_ms>\d+)ms" to get the latency. JSON is almost always preferred for structured data.
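The rex pattern is an ordinary PCRE-style named capture group, so you can sanity-check it in Python before dropping it into a search (Python spells the group ?P<...>, SPL spells it ?<...>):

```python
import re

raw = "health=ok latency=55ms db=connected"
# Same pattern as the rex example; Python requires the ?P<name> form.
match = re.search(r"latency=(?P<latency_ms>\d+)ms", raw)
latency_ms = int(match.group("latency_ms"))
# latency_ms -> 55
```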

The flexibility extends to what the script can do. It’s not just about fetching data. Your script could:

  • Query a database and output results.
  • Interact with cloud APIs (AWS, Azure, GCP).
  • Process files and extract specific information.
  • Run system commands and parse their output.
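The database case, for instance, can be sketched with Python’s built-in sqlite3 module — the schema and rows below are made up for illustration. Each row is printed as one JSON object per line, so each row becomes one event:

```python
import json
import sqlite3

# Hypothetical example: an in-memory DB stands in for your real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "shipped"), (2, "pending")])

lines = []
for row in conn.execute("SELECT id, status FROM orders"):
    line = json.dumps({"order_id": row[0], "status": row[1]})
    lines.append(line)
    print(line)  # one JSON object per line -> one event per row
conn.close()
```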

The key is that the script’s primary output for Splunk must be directed to stdout. Anything sent to stderr is generally treated as an error message by Splunk and logged internally, which is a good practice for debugging.

A common pitfall is emitting JSON that Splunk can’t parse as events — for example, a pretty-printed object spread across several lines that gets split into several events, or a sourcetype without JSON field extraction configured. Either way, you end up with raw text and spend time extracting fields that Splunk could have created automatically. Keep each event as one complete JSON object on a single line of stdout.
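A cheap safeguard is a small emit helper (a sketch — the function name is ours) that serializes with json.dumps defaults, which never insert newlines, and round-trips the line before printing:

```python
import json

def emit(event: dict) -> str:
    """Serialize an event as a single line of JSON for a scripted input."""
    line = json.dumps(event)   # default separators produce exactly one line
    json.loads(line)           # round-trip check: the line is valid JSON
    print(line)
    return line

out = emit({"status": "ok", "latency_ms": 55})
```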

Once you have your scripted inputs running, the next logical step is often to build dashboards and alerts based on the custom data you’re now collecting.

Want structured learning?

Take the full Splunk course →