Datadog’s Agent scrapes metrics from a source when it’s configured to do so.
Let’s watch it happen. Imagine we have a simple Python Flask application running locally. We want Datadog to collect its request count and latency.
First, we need to tell the Agent where to look for these metrics. Prometheus-format scraping is handled by the Agent’s OpenMetrics check, which gets its own configuration file under the Agent’s conf.d directory — typically /etc/datadog-agent/conf.d/openmetrics.d/conf.yaml on Linux, or /opt/datadog-agent/etc/conf.d/openmetrics.d/conf.yaml on macOS:
```yaml
init_config:

instances:
  - openmetrics_endpoint: http://localhost:5000/metrics
    namespace: flask_app
    metrics:
      - requests_total
      - request_latency_seconds
    tags:
      - environment:dev
      - service:my-flask-service
```
The metrics list tells the check which of the exposed metrics to collect; entries can be exact names or regular expressions.
Now, let’s run our Flask app. A minimal example would look like this:
```python
from flask import Flask, request
import functools
import time

import prometheus_client

app = Flask(__name__)

# Create some Prometheus metrics
REQUEST_COUNT = prometheus_client.Counter(
    'requests_total', 'Total number of requests received', ['method', 'endpoint'])
REQUEST_LATENCY = prometheus_client.Histogram(
    'request_latency_seconds', 'Request latency in seconds', ['method', 'endpoint'])

# Decorator that records a count and a latency sample for each request
def metrics_decorator(func):
    @functools.wraps(func)  # preserve the view function's name for Flask's routing
    def wrapper(*args, **kwargs):
        start_time = time.time()
        # The request proxy is valid here because the wrapper only runs
        # inside a request context
        method = request.method
        endpoint = request.path
        # Increment the request count
        REQUEST_COUNT.labels(method=method, endpoint=endpoint).inc()
        # Execute the actual view function
        result = func(*args, **kwargs)
        # Record the latency
        REQUEST_LATENCY.labels(method=method, endpoint=endpoint).observe(
            time.time() - start_time)
        return result
    return wrapper

@app.route('/')
@metrics_decorator
def index():
    time.sleep(0.1)  # simulate some work
    return "Hello, World!"

@app.route('/about')
@metrics_decorator
def about():
    time.sleep(0.05)  # simulate some work
    return "About Page"

# Expose metrics at the /metrics endpoint in Prometheus exposition format
@app.route('/metrics')
def metrics():
    return (prometheus_client.generate_latest(), 200,
            {'Content-Type': prometheus_client.CONTENT_TYPE_LATEST})

if __name__ == '__main__':
    app.run(debug=True, port=5000)
```
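One subtlety worth calling out: Flask registers routes by the view function’s name, so a metrics decorator must preserve that name with functools.wraps — otherwise the second decorated route fails with “View function mapping is overwriting an existing endpoint function: wrapper”. A stripped-down, Flask-free sketch of the pattern:

```python
import functools
import time

def metrics_decorator(func):
    # functools.wraps copies __name__ (among other attributes) from the
    # wrapped function onto the wrapper; Flask uses that name as the endpoint
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        elapsed = time.time() - start  # this is what the histogram would observe
        return result
    return wrapper

@metrics_decorator
def index():
    return "Hello, World!"

print(index.__name__)  # 'index' rather than 'wrapper'
```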
When the Agent is restarted (for example, sudo systemctl restart datadog-agent on systemd-based Linux), it reads this check configuration. It sees openmetrics_endpoint: http://localhost:5000/metrics and begins issuing HTTP GET requests to that URL at a regular interval — every 15 seconds by default, configurable via min_collection_interval in the instance.
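For example, to scrape half as often, the standard per-instance knob min_collection_interval can be set (values here are hypothetical):

```yaml
instances:
  - openmetrics_endpoint: http://localhost:5000/metrics
    namespace: flask_app
    min_collection_interval: 30  # scrape every 30 seconds instead of 15
```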
The response from http://localhost:5000/metrics will be in Prometheus exposition format. The Agent parses this, applying the namespace: flask_app and the tags defined in the configuration to each metric it discovers. So a counter like requests_total{method="GET",endpoint="/"} scraped from the source appears in Datadog under the flask_app namespace with the tags environment:dev and service:my-flask-service. (The exact name mapping depends on the check version — the OpenMetrics V2 check, for example, strips the _total suffix from counters and submits flask_app.requests.count.)
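Concretely, the body the Agent fetches from /metrics looks something like this (sample values are illustrative):

```
# HELP requests_total Total number of requests received
# TYPE requests_total counter
requests_total{method="GET",endpoint="/"} 42.0
# HELP request_latency_seconds Request latency in seconds
# TYPE request_latency_seconds histogram
request_latency_seconds_bucket{le="0.1",method="GET",endpoint="/"} 40.0
request_latency_seconds_bucket{le="+Inf",method="GET",endpoint="/"} 42.0
request_latency_seconds_count{method="GET",endpoint="/"} 42.0
request_latency_seconds_sum{method="GET",endpoint="/"} 4.5
```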
The namespace is crucial for organizing your metrics. It acts as a prefix, preventing naming collisions and providing a clear hierarchy. The tags allow you to filter, group, and alert on your metrics later in Datadog.
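Once the metrics arrive, those tags become query dimensions. A hypothetical metrics-explorer query, assuming the counter lands as flask_app.requests_total, might look like:

```
sum:flask_app.requests_total{environment:dev} by {endpoint}
```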
The core problem the OpenMetrics check solves is remote metric collection. Instead of the Agent needing to understand the internal workings of every application (like querying a database or a custom API endpoint directly), it simply asks for a standardized Prometheus output. The application is responsible for exposing its metrics in that format. This decouples the Agent from application-specific logic.
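To see why this contract is so convenient, here is a toy parser for the exposition format. This is purely illustrative — not how the Agent is actually implemented — but it shows that any scraper can consume the same text without knowing anything about the application behind it:

```python
# Toy parser for the Prometheus text exposition format (illustrative only)
text = """# HELP requests_total Total number of requests received
# TYPE requests_total counter
requests_total{method="GET",endpoint="/"} 42.0
"""

def parse_exposition(body):
    samples = []
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):  # skip blanks and HELP/TYPE lines
            continue
        name_part, value = line.rsplit(' ', 1)
        labels = {}
        if '{' in name_part:
            name, raw = name_part.split('{', 1)
            for pair in raw.rstrip('}').split(','):
                key, val = pair.split('=', 1)
                labels[key] = val.strip('"')
        else:
            name = name_part
        samples.append((name, labels, float(value)))
    return samples

print(parse_exposition(text))
# → [('requests_total', {'method': 'GET', 'endpoint': '/'}, 42.0)]
```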
The endpoint can be anything that speaks the Prometheus exposition format. This could be a dedicated /metrics endpoint in your web application, or a standalone Prometheus exporter like node_exporter or redis_exporter running on a different host. Each source gets its own entry under instances, where you define how the Agent should interpret it.
A timeout parameter, also configurable per instance, determines how long the Agent will wait for a response from the metric source before giving up on that particular scrape; timeout: 5 (seconds) is a common choice. This prevents a slow or unresponsive metric endpoint from blocking the Agent’s collection loop for too long.
A lesser-known aspect is how the Agent handles multiple instances. If you define several entries under instances, the Agent scrapes each one independently. This means a single Agent configuration can monitor several different applications, or even different endpoints of the same application, each with its own namespace and tags.
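With the OpenMetrics check, a two-source setup might look like this (the second endpoint assumes a redis_exporter on its default port 9121; all names here are illustrative):

```yaml
init_config:

instances:
  - openmetrics_endpoint: http://localhost:5000/metrics
    namespace: flask_app
    metrics:
      - requests_total
    tags:
      - service:my-flask-service
  - openmetrics_endpoint: http://localhost:9121/metrics
    namespace: redis
    metrics:
      - "redis_.*"
    tags:
      - service:my-redis
```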
The next concept you’ll likely dive into is how to effectively query and visualize these scraped metrics within the Datadog platform.