The most surprising thing about Tempo’s gossip-based memberlist is how it prioritizes availability over absolute consistency, making it resilient to network partitions.

Let’s see this in action. Imagine a Tempo cluster with three instances: tempo-01, tempo-02, and tempo-03. They’re configured to use gossip for memberlist coordination.

# tempo.yaml
# memberlist is a single top-level block shared by every component,
# not repeated under each one.
memberlist:
  bind_addr: ["0.0.0.0"]
  bind_port: 7946 # Gossip port
  # Pointing to at least one other member is sufficient for bootstrapping.
  # In a real cluster, you'd list multiple seeds for redundancy.
  join_members:
    - "tempo-01:7946"
    - "tempo-02:7946"

# Components that maintain a ring point their key-value store at memberlist:
distributor:
  ring:
    kvstore:
      store: memberlist

ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist

Here, a single top-level memberlist block is shared by all Tempo components (distributor, ingester, querier, and so on). bind_addr and bind_port define the interface and port Tempo listens on for gossip messages. The join_members list provides initial "seed" nodes that a new member will contact to discover others in the cluster. Components that need a ring, such as the distributor and ingester, then set their ring's kvstore to memberlist so ring state is shared over gossip rather than through an external store like Consul or etcd.

When tempo-03 starts, it first tries to connect to tempo-01:7946 and tempo-02:7946. Once connected, it receives the current list of all known members in the cluster. From then on, every member periodically gossips node state (alive, suspect, dead) to a random subset of other members. Tempo builds on HashiCorp's memberlist library, which implements the SWIM protocol (Scalable Weakly-consistent Infection-style Process Group Membership); SWIM ensures that information about cluster membership eventually propagates throughout the network.

The beauty of this is how it handles failures. If tempo-02 suddenly becomes unreachable (a pulled network cable, a crashed instance), its peers' probes start failing; they first mark it suspect, and after a suspicion timeout with no rebuttal, dead. That verdict is then gossiped to the other members. Tempo doesn't wait for tempo-02 to explicitly say goodbye; it infers its departure. This is crucial for maintaining an up-to-date view of the cluster without requiring a centralized coordination service that could become a single point of failure.

Tempo uses this memberlist to know which ingesters are available to receive traces, which distributors can accept spans, and which queriers can answer queries. If tempo-02 goes down, distributors will stop sending data to it, and queriers will stop including it in their search scope. The system continues to operate, albeit with reduced capacity.
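In effect, each distributor filters its view of the memberlist down to healthy ingesters and hashes incoming traces across them. The sketch below is a simplified, hypothetical illustration of that idea; Tempo's real distributor routes through a token-based hash ring with a replication factor, not simple modulo hashing.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// pickIngester chooses a target for a trace ID from the currently healthy
// members. Hypothetical sketch only: Tempo uses a consistent hash ring,
// which avoids reshuffling most traces when membership changes.
func pickIngester(traceID string, members map[string]bool) (string, bool) {
	var healthy []string
	for name, up := range members {
		if up {
			healthy = append(healthy, name)
		}
	}
	if len(healthy) == 0 {
		return "", false // no healthy ingester: the write cannot be placed
	}
	sort.Strings(healthy) // map iteration order is random; sort for determinism
	h := fnv.New32a()
	h.Write([]byte(traceID))
	return healthy[int(h.Sum32())%len(healthy)], true
}

func main() {
	// tempo-02 has been gossiped as dead, so it is filtered out up front.
	members := map[string]bool{"tempo-01": true, "tempo-02": false, "tempo-03": true}
	target, ok := pickIngester("abc123", members)
	fmt.Println(target, ok) // tempo-02 is never chosen
}
```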

A key aspect of this gossip protocol is that it doesn’t guarantee that every node has the exact same view of the memberlist at any given millisecond. Instead, it guarantees eventual consistency. This means that over time, all nodes will converge on the same membership list. For a distributed tracing system, this probabilistic convergence is usually sufficient. A distributor might briefly send a span to an ingester that has just crashed but hasn’t yet been marked as dead by its neighbors. That span might be lost. Similarly, a querier might briefly query a node that is about to go offline. These are acceptable trade-offs for avoiding a centralized bottleneck.

How quickly the cluster converges is governed by the gossip_interval and gossip_nodes settings within the memberlist configuration. gossip_interval (e.g., "100ms") determines how often a node initiates a gossip round. gossip_nodes (e.g., 3) dictates how many random members a node gossips to in each round. Increasing gossip_nodes speeds up convergence but increases network traffic; decreasing gossip_interval does the same.
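A slightly more aggressive tuning than the defaults might look like this; the values below are illustrative, not recommendations, and both keys live in the shared top-level memberlist block:

```yaml
memberlist:
  gossip_interval: 100ms # how often each node initiates a gossip round
  gossip_nodes: 3        # how many random peers receive each round
```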

To inspect the memberlist, Tempo exposes a status page on its HTTP port. For example, using curl against a Tempo instance:

curl http://localhost:3200/memberlist

This renders the instance's current view of the cluster: each known member and its state, plus gossip health details. You can also watch the memberlist through Prometheus metrics. The memberlist client exposes metrics such as memberlist_client_cluster_members_count, which reports how many members each instance currently knows about; alerting when that number drops below the expected cluster size is a simple way to catch partitions early.

If you are experiencing issues where nodes are not being discovered, it's often due to network connectivity problems between nodes on the gossip port. Ensure that the port (default 7946) is open and reachable between all Tempo nodes. A common mistake is configuring join_members with addresses, such as internal IPs, that are not routable from where the other nodes are starting.

The next challenge you’ll often encounter is managing the lifecycle of these members, particularly during upgrades or scaling events, where you need to gracefully bring nodes online and offline without disrupting tracing availability.
