Performance budgets are a surprisingly effective way to force engineering teams to confront the reality of their system’s performance before it causes user-facing problems.
Let’s imagine we’re running an e-commerce site. A core user journey is "add to cart." We want to measure how long this takes and how often it fails.
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'frontend'
    static_configs:
      - targets: ['localhost:9090'] # assuming Prometheus is running locally for this example
  - job_name: 'backend_add_to_cart'
    static_configs:
      - targets: ['backend-service:8080'] # your actual backend service address
```
Here’s a simplified Go service that simulates an "add to cart" operation, exposing Prometheus metrics:
```go
package main

import (
	"log"
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	addToCartLatency = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "add_to_cart_latency_seconds",
		Help:    "Latency of the add to cart operation.",
		Buckets: []float64{0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, 15.0, 30.0},
	}, []string{"status"})

	addToCartRequests = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "add_to_cart_requests_total",
		Help: "Total number of add to cart requests.",
	}, []string{"status"})
)

func addToCartHandler(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	status := "success"
	// Record latency and count for every request, whatever the outcome.
	defer func() {
		duration := time.Since(start).Seconds()
		addToCartLatency.WithLabelValues(status).Observe(duration)
		addToCartRequests.WithLabelValues(status).Inc()
	}()

	// Simulate work.
	if rand.Float64() < 0.05 { // 5% chance of error
		status = "error"
		http.Error(w, "Internal Server Error", http.StatusInternalServerError)
		return
	}
	time.Sleep(time.Duration(rand.Intn(1000)) * time.Millisecond) // simulate latency up to 1 second

	w.WriteHeader(http.StatusOK)
	w.Write([]byte("Item added to cart"))
}

func main() {
	http.HandleFunc("/add-to-cart", addToCartHandler)
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```
This service exposes `/metrics`, which Prometheus scrapes. The `add_to_cart_latency_seconds` histogram tracks latencies, and `add_to_cart_requests_total` counts requests, both labeled by `status` (`success`/`error`).
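When Prometheus scrapes `/metrics`, it receives plain-text exposition along these lines (abridged to a few buckets, with hypothetical counts). The cumulative `_bucket` series, one per `le` upper bound, are what quantile estimation is later built on:

```text
# HELP add_to_cart_latency_seconds Latency of the add to cart operation.
# TYPE add_to_cart_latency_seconds histogram
add_to_cart_latency_seconds_bucket{status="success",le="0.1"} 12
add_to_cart_latency_seconds_bucket{status="success",le="0.25"} 31
add_to_cart_latency_seconds_bucket{status="success",le="0.5"} 58
add_to_cart_latency_seconds_bucket{status="success",le="1"} 95
add_to_cart_latency_seconds_bucket{status="success",le="+Inf"} 100
add_to_cart_latency_seconds_sum{status="success"} 48.7
add_to_cart_latency_seconds_count{status="success"} 100
# HELP add_to_cart_requests_total Total number of add to cart requests.
# TYPE add_to_cart_requests_total counter
add_to_cart_requests_total{status="error"} 5
add_to_cart_requests_total{status="success"} 100
```

Note that the buckets are cumulative: each counts every observation less than or equal to its `le` bound, so `le="+Inf"` always equals the total count.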
The core problem performance budgets solve is the "it works on my machine" or "it’s fast enough" syndrome. Without a concrete target, teams optimize for perceived performance or convenience, not user experience. A performance budget turns subjective goals into objective, measurable SLOs (Service Level Objectives).
Here’s how we’d define a budget for the "add to cart" journey: 99% of requests should complete in under 2 seconds, and the error rate should stay below 0.5%. One important detail: alerting rules are evaluated by Prometheus itself, not by Alertmanager, so they live in a rule file loaded through the `rule_files` section of prometheus.yml. Alertmanager only routes the notifications that firing alerts produce:

```yaml
# alertmanager.yml (notification routing only)
route:
  group_by: ['alertname']
  receiver: 'slack-notifications'
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: '<your_slack_webhook_url>'
        channel: '#alerts'
```

```yaml
# performance_budgets_rules.yml (referenced from prometheus.yml via rule_files)
groups:
  - name: performance-budgets
    rules:
      - alert: HighAddToCartLatency
        expr: |
          histogram_quantile(0.99, sum by (le) (rate(add_to_cart_latency_seconds_bucket[5m]))) > 2.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Add to cart latency is too high."
          description: "99th percentile latency for add to cart has exceeded 2.0 seconds for the last 5 minutes."
      - alert: HighAddToCartErrorRate
        expr: |
          sum(rate(add_to_cart_requests_total{status="error"}[5m])) / sum(rate(add_to_cart_requests_total[5m])) * 100 > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Add to cart error rate is too high."
          description: "Error rate for add to cart requests has exceeded 0.5% for the last 5 minutes."
```
These rules tell Prometheus:

- `HighAddToCartLatency`: if the 99th percentile of add-to-cart request durations (computed with `histogram_quantile` over a 5-minute `rate` of the histogram buckets) exceeds 2.0 seconds for 5 consecutive minutes, fire a warning.
- `HighAddToCartErrorRate`: if the rate of error requests (`status="error"`) divided by the total request rate exceeds 0.5% for 5 consecutive minutes, fire a warning.
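Before committing to alert thresholds, the same building blocks can be explored interactively in the Prometheus expression browser. A couple of illustrative queries against the metrics above (the 0.95 quantile here is just an example, not part of the budget):

```promql
# 95th percentile add-to-cart latency over the last 5 minutes
histogram_quantile(0.95, sum by (le) (rate(add_to_cart_latency_seconds_bucket[5m])))

# request throughput broken down by status
sum by (status) (rate(add_to_cart_requests_total[5m]))
```

Graphing these over a week of traffic is a sensible way to pick thresholds that reflect real behavior rather than guesses.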
The "why it works" is simple: when these alerts fire, it’s not a vague "users are complaining." It’s a precise, data-driven indicator that a specific user journey is degraded, and the system is configured to automatically notify the team responsible. This creates a feedback loop where performance is a first-class citizen, not an afterthought. Teams are incentivized to optimize their code, infrastructure, and dependencies to stay within these defined boundaries.
The most surprising thing about performance budgets is how effectively they shift the culture around performance. Instead of performance being a reactive firefighting activity or a feature team’s "nice-to-have," it becomes a proactive, non-negotiable requirement for shipping code. Teams will start making conscious decisions during design and implementation: "Will this change violate our latency budget?" or "Can we afford this extra API call if it impacts our error rate budget?"
The `histogram_quantile` function is where the magic happens for latency budgets. It uses the pre-aggregated histogram buckets to estimate the quantile, which is far more efficient than storing and processing every individual latency observation. This is crucial for high-throughput systems where keeping raw data would be infeasible. The `rate()` function converts the cumulative bucket counters into per-second rates, so we measure the current performance trend, not just a historical average. The `for: 5m` clause ensures we react to sustained degradation rather than transient blips.
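The bucket-based estimate can be illustrated with a small sketch. This is not Prometheus’s actual implementation, and the cumulative counts are hypothetical, but it shows the core idea: find the bucket the target rank falls into, then linearly interpolate between that bucket’s bounds.

```go
package main

import "fmt"

// bucket mirrors one cumulative histogram bucket as exposed in
// add_to_cart_latency_seconds_bucket: count is the number of
// observations less than or equal to upperBound (the "le" label).
type bucket struct {
	upperBound float64
	count      float64
}

// estimateQuantile sketches the linear-interpolation idea behind
// histogram_quantile: locate the bucket containing the target rank,
// then interpolate between the bucket's lower and upper bounds.
func estimateQuantile(q float64, buckets []bucket) float64 {
	total := buckets[len(buckets)-1].count
	rank := q * total
	lowerBound, lowerCount := 0.0, 0.0
	for _, b := range buckets {
		if b.count >= rank {
			frac := (rank - lowerCount) / (b.count - lowerCount)
			return lowerBound + frac*(b.upperBound-lowerBound)
		}
		lowerBound, lowerCount = b.upperBound, b.count
	}
	return buckets[len(buckets)-1].upperBound
}

func main() {
	// Hypothetical cumulative counts over a subset of the buckets
	// configured in the service above.
	buckets := []bucket{{0.1, 10}, {0.25, 40}, {0.5, 70}, {1.0, 95}, {2.5, 100}}
	// Rank 99 of 100 lands in the (1.0, 2.5] bucket, 80% of the way in.
	fmt.Printf("estimated p99: %.2f s\n", estimateQuantile(0.99, buckets))
}
```

One consequence worth knowing: the estimate’s accuracy depends entirely on bucket layout. If the budget threshold is 2 seconds, having bucket bounds near 2.0 (as the service’s `Buckets` list does with 1.0 and 2.5) keeps the interpolation error around the threshold small.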
The next concept to explore is how to use these budgets to drive automated remediation, rather than just alerts.