Traefik can expose a Prometheus metrics endpoint, but it isn’t wired up out of the box: you enable it in Traefik’s static configuration, and even then Prometheus won’t collect anything until you point a scrape job at it.
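If metrics aren’t already on in your setup, enabling them is a one-line change in Traefik’s static configuration. A minimal sketch (the file name and port are the usual defaults, but adjust for your deployment):

```yaml
# traefik.yml (static configuration)
metrics:
  prometheus: {}   # serves /metrics on the traefik entrypoint, port 8080 by default
```

The equivalent CLI flag is `--metrics.prometheus=true`.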

Let’s see Traefik spitting out metrics in real time. First, make sure Traefik is running with metrics enabled and has some routes configured. You can then curl its metrics endpoint:

curl http://localhost:8080/metrics

You’ll see a stream of text like this, which is Prometheus’s exposition format:

# HELP traefik_api_request_duration_seconds Traefik API request duration.
# TYPE traefik_api_request_duration_seconds histogram
traefik_api_request_duration_seconds_bucket{handler="/api/rawdata",le="0.001"} 0
traefik_api_request_duration_seconds_bucket{handler="/api/rawdata",le="0.005"} 0
traefik_api_request_duration_seconds_bucket{handler="/api/rawdata",le="0.01"} 0
...
# HELP traefik_entrypoint_request_duration_seconds Traefik entrypoint request duration.
# TYPE traefik_entrypoint_request_duration_seconds histogram
traefik_entrypoint_request_duration_seconds_bucket{entrypoint="web",le="0.001"} 0
traefik_entrypoint_request_duration_seconds_bucket{entrypoint="web",le="0.005"} 0
traefik_entrypoint_request_duration_seconds_bucket{entrypoint="web",le="0.01"} 0
...
# HELP traefik_router_requests_total Total number of requests handled by a router.
# TYPE traefik_router_requests_total counter
traefik_router_requests_total{entrypoint="web",router="my-router",service="my-service"} 123

This output represents the internal state of Traefik: request counts, durations, error rates, and more, all tagged with useful labels like entrypoint, router, and service.

The core problem Traefik’s metrics solve is providing deep visibility into your ingress traffic before it even hits your application services. You can see how Traefik itself is performing: Is it overloaded? Are specific routers failing? Are there network issues between Traefik and your backends? This allows you to debug ingress-level problems without needing to deploy instrumentation within every single application.

Traefik exposes these metrics via an HTTP endpoint, typically on port 8080 (the Traefik API port) at the /metrics path. The endpoint appears once metrics.prometheus is set in the static configuration. When Prometheus scrapes it, it collects time-series data for each metric. Each metric has a name (e.g., traefik_router_requests_total) and can carry multiple labels (key-value pairs that categorize the metric, like router="my-router"). Prometheus stores this data and makes it queryable via PromQL.
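A minimal Prometheus scrape job for this endpoint might look like the following (the job name and target address are assumptions for a local setup):

```yaml
# prometheus.yml
scrape_configs:
  - job_name: traefik                  # hypothetical job name
    static_configs:
      - targets: ["localhost:8080"]    # Traefik's metrics endpoint; path defaults to /metrics
```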

To visualize these metrics, you’ll typically use Grafana. You add Prometheus as a data source in Grafana and then create dashboards. For Traefik, common metrics to visualize include:

  • traefik_router_requests_total: To see overall request volume per router.
  • traefik_router_request_duration_seconds: To monitor request latency handled by Traefik.
  • traefik_router_responses_bytes_total: To track bandwidth usage.
  • traefik_entrypoint_request_duration_seconds: To understand latency at the entrypoint level.
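In Grafana panels, these metrics are usually wrapped in PromQL functions rather than graphed raw. For example (metric names as exposed by recent Traefik versions; the entrypoint name is a placeholder):

```promql
# Requests per second, per router, over the last 5 minutes
sum by (router) (rate(traefik_router_requests_total[5m]))

# p95 latency at the "web" entrypoint
histogram_quantile(0.95,
  sum by (le) (rate(traefik_entrypoint_request_duration_seconds_bucket{entrypoint="web"}[5m])))
```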

The key to effectively using these metrics is understanding the labels. For instance, traefik_router_requests_total{router="api", service="api"} tells you the total requests to the api router, which is serving the api service. If you have multiple entrypoints (e.g., web for HTTP and websecure for HTTPS), you’ll see metrics broken down by entrypoint.
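Because labels are just dimensions, the same counter can be sliced along any of them. For instance, to compare traffic volume across entrypoints (HTTP vs. HTTPS), you might aggregate like this:

```promql
sum by (entrypoint) (rate(traefik_router_requests_total[5m]))
```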

A subtle but powerful aspect of Traefik’s metrics is how they track the health of the connection between Traefik and your backend services, not just the application’s response. Metrics like traefik_service_request_duration_seconds (if enabled and configured) can reveal if Traefik is experiencing slow responses from a particular backend service, even if the application itself is healthy. This distinction is crucial for pinpointing the layer of failure.
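One way to exploit that distinction is to query backend-side latency directly (assuming service metrics are enabled in your setup):

```promql
# p95 time Traefik spends waiting on each backend service
histogram_quantile(0.95,
  sum by (service, le) (rate(traefik_service_request_duration_seconds_bucket[5m])))
```

If this number is high while your application’s own latency metrics look healthy, the problem likely sits in the network or connection handling between Traefik and the backend.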

The most surprising thing most users don’t realize is that Traefik records these metrics at its own layer, independently of whether the backend ever responds successfully. A request that times out against the backend still increments traefik_router_requests_total, just tagged with a gateway-error status code. This is why you can see a healthy-looking request count for a router even while its backend service is failing: Traefik has registered the "request" at its own level, and you need the code label to separate successes from failures.

Once you have Prometheus scraping Traefik and Grafana visualizing the data, the next logical step is to set up alerting on these metrics.
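As a starting point, a Prometheus alerting rule on the ingress error rate might look like this (the group name, alert name, and 5% threshold are illustrative, not recommendations):

```yaml
groups:
  - name: traefik                      # hypothetical rule group
    rules:
      - alert: TraefikHigh5xxRate
        expr: |
          sum(rate(traefik_service_requests_total{code=~"5.."}[5m]))
            / sum(rate(traefik_service_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "More than 5% of requests through Traefik are returning 5xx"
```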

Want structured learning?

Take the full Traefik course →