Leader Election & Daemons

When you run multiple instances of your application, background work like outbox processing must not run on every node simultaneously. Emit handles this through two cooperating systems: leader election decides which node is in charge, and daemon coordination distributes the actual work across the cluster.

Leader election

Emit uses a heartbeat-based lease model. Every node periodically writes a heartbeat to the database and attempts to acquire a leadership lease using an atomic compare-and-swap. The database provides the consistency guarantee; no external consensus service is needed.

Each node is identified by a stable Guid generated once at startup, exposed through INodeIdentity. The heartbeat worker, distributed traces, and metrics all use this same identity, so you can correlate behavior back to a specific instance. Inject INodeIdentity to read the current node’s ID. Register a custom implementation before AddEmit() if you need to control the value (useful in environments where you want a stable identity tied to a pod name or container ID rather than a random GUID).
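As a sketch, a custom identity derived from a pod name might look like the following. This assumes INodeIdentity exposes the node's Guid as a property; PodNodeIdentity, the POD_NAME variable, and the hashing scheme are illustrative, not part of Emit:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical implementation: derives a deterministic node ID from the
// pod name so restarts of the same pod keep the same identity.
public sealed class PodNodeIdentity : INodeIdentity
{
    public Guid NodeId { get; } = DeriveFromPodName();

    private static Guid DeriveFromPodName()
    {
        var name = Environment.GetEnvironmentVariable("POD_NAME")
                   ?? Environment.MachineName;
        // MD5 yields exactly 16 bytes, which is what Guid expects.
        using var md5 = MD5.Create();
        return new Guid(md5.ComputeHash(Encoding.UTF8.GetBytes(name)));
    }
}
```

Register it before AddEmit() (for example, `services.AddSingleton<INodeIdentity, PodNodeIdentity>()`) so Emit picks it up instead of generating a random Guid.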

The lease model works like this:

  • Every live node writes a heartbeat on a configurable interval (default: 15 seconds).
  • The leader holds a time-limited lease (default: 60 seconds). It must renew the lease before it expires or another node can claim it.
  • If a node stops heartbeating (crash, network partition, clean shutdown), its lease expires and the next node to attempt renewal acquires leadership.

There is no separate failure detector or quorum calculation. If the database can be reached, leadership can be acquired. If it cannot, nothing runs, which is the safe outcome.
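The compare-and-swap at the heart of the lease can be sketched with an in-memory stand-in for the database row; the real implementation performs the same check-and-write atomically in the database, and all names here are illustrative:

```csharp
using System;

// In-memory model of the leadership lease row. A lock stands in for the
// database transaction that makes the compare-and-swap atomic.
public sealed class LeaseTable
{
    private Guid? _holder;
    private DateTime _expiresUtc;
    private readonly object _gate = new();

    // Returns true if 'candidate' now holds the lease (fresh acquisition
    // or renewal); false if another node's lease is still valid.
    public bool TryAcquireOrRenew(Guid candidate, DateTime nowUtc, TimeSpan leaseDuration)
    {
        lock (_gate)
        {
            var free = _holder is null || nowUtc >= _expiresUtc;
            if (free || _holder == candidate)
            {
                _holder = candidate;
                _expiresUtc = nowUtc + leaseDuration;
                return true;
            }
            return false;
        }
    }
}
```

Because renewal and acquisition are the same operation, a crashed leader needs no special handling: its lease simply expires, and the next node whose heartbeat cycle runs the compare-and-swap wins.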

Observing leadership changes

Register an ILeaderElectionObserver to react when leadership changes on the current node:

  • OnLeaderElectedAsync: this node just became the leader.
  • OnLeaderLostAsync: this node just lost leadership.
  • OnNodeRegisteredAsync: a node has joined the cluster (visible to all nodes).
  • OnNodeRemovedAsync: a node has left the cluster (visible to all nodes).

The first two are useful for triggering leader-only work outside the daemon system. The latter two give you visibility into cluster membership if you want to log or monitor it.
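A minimal observer that just logs these events might look like the sketch below. The method signatures shown are assumptions based on the names above (the real interface may take parameters such as the affected node's identity), so check the actual interface before copying this:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: signatures are guessed from the method names listed above.
public sealed class LoggingElectionObserver : ILeaderElectionObserver
{
    public Task OnLeaderElectedAsync(CancellationToken ct)
    { Console.WriteLine("This node became the leader."); return Task.CompletedTask; }

    public Task OnLeaderLostAsync(CancellationToken ct)
    { Console.WriteLine("This node lost leadership."); return Task.CompletedTask; }

    public Task OnNodeRegisteredAsync(CancellationToken ct)
    { Console.WriteLine("A node joined the cluster."); return Task.CompletedTask; }

    public Task OnNodeRemovedAsync(CancellationToken ct)
    { Console.WriteLine("A node left the cluster."); return Task.CompletedTask; }
}
```

Registration would follow the usual DI pattern, for example `services.AddSingleton<ILeaderElectionObserver, LoggingElectionObserver>()`.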

Querying leadership status

Inject ILeaderElectionService to check the current node’s role at any point:

public class MyService(ILeaderElectionService leaderElection)
{
    public void DoWork()
    {
        if (leaderElection.IsLeader)
        {
            // Only the leader reaches this branch
        }

        // This token cancels when the node loses leadership.
        // Pass it to long-running leader-only operations.
        var token = leaderElection.LeadershipToken;
    }
}

LeadershipToken is particularly useful for cooperative cancellation: pass it into any long-running operation so it stops naturally when this node is no longer the leader, rather than having to poll IsLeader in a loop.
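For example, a leader-only polling loop can simply honor the token and exit when it fires. This is a sketch; ProcessBatchAsync is a placeholder for your own work:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public async Task RunLeaderLoopAsync(ILeaderElectionService leaderElection)
{
    var token = leaderElection.LeadershipToken;
    try
    {
        while (true)
        {
            await ProcessBatchAsync(token); // placeholder for real leader-only work
            // Task.Delay throws OperationCanceledException the moment
            // leadership is lost, so the loop never sleeps through a handover.
            await Task.Delay(TimeSpan.FromSeconds(5), token);
        }
    }
    catch (OperationCanceledException)
    {
        // Leadership lost: stop cleanly. Another node will take over.
    }
}
```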

Daemon coordination

The leader does not run all background work itself. Instead, it acts as a coordinator that assigns work units called daemons to nodes in the cluster. This distributes load and handles the reality that nodes come and go.

A daemon is any background task that implements IDaemonAgent. The outbox worker (emit:outbox) is the primary built-in daemon; additional daemons may be registered for other recurring work.
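A custom daemon might be sketched as below. This page does not show the members of IDaemonAgent, so the ExecuteAsync shape here is an assumption, not the real contract; consult the API reference for the actual interface:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: the assumed contract is a single cancellable execute loop.
public sealed class CleanupDaemon : IDaemonAgent
{
    public async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // stoppingToken fires when the assignment is revoked, so honoring
        // it is what makes the drain step work.
        while (!stoppingToken.IsCancellationRequested)
        {
            // ... recurring work here ...
            await Task.Delay(TimeSpan.FromMinutes(1), stoppingToken);
        }
    }
}
```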

How assignment works

On each heartbeat cycle, the leader reviews all daemon assignments: it assigns unassigned daemons to the least-loaded live node and reclaims assignments from nodes that have stopped heartbeating.

Every node (leader included) polls for assignments directed at it. When a node receives an assignment, it acknowledges and starts the daemon. When an assignment is revoked, the node drains in-flight work and confirms it has stopped before the leader reassigns elsewhere.
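The leader's "least-loaded live node" choice can be sketched as a pure function over cluster state. This is an in-memory model with illustrative names, not Emit's internals:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Assignment
{
    // Picks the live node with the fewest current daemon assignments.
    // 'assignments' maps daemon name -> node name. Returns null if no
    // live nodes exist (nothing can be assigned).
    public static string? PickLeastLoaded(
        IReadOnlyCollection<string> liveNodes,
        IReadOnlyDictionary<string, string> assignments)
    {
        return liveNodes
            .OrderBy(n => assignments.Count(a => a.Value == n))
            .ThenBy(n => n, StringComparer.Ordinal) // deterministic tie-break
            .FirstOrDefault();
    }
}
```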

Assignment lifecycle

  • Assigning: the leader assigned the daemon; waiting for the node to acknowledge.
  • Active: the node acknowledged; the daemon is running.
  • Revoking: the leader requested shutdown; waiting for the node to drain and confirm.
  • Deleted: the node confirmed shutdown; the assignment is cleaned up.

If a node fails to acknowledge within AcknowledgeTimeout (default: 30 seconds) or fails to drain within DrainTimeout (default: 30 seconds), the leader treats it as unresponsive and force-reassigns the daemon to another node.
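The lifecycle above is a small state machine. A sketch of the transitions, including the timeout paths, in plain C# (names are illustrative):

```csharp
using System;

public enum DaemonState { Assigning, Active, Revoking, Deleted }

public static class DaemonLifecycle
{
    // Normal-path transitions driven by node confirmations.
    public static DaemonState OnNodeConfirmed(DaemonState s) => s switch
    {
        DaemonState.Assigning => DaemonState.Active,  // node acknowledged
        DaemonState.Revoking  => DaemonState.Deleted, // node drained and confirmed
        _ => s, // Active/Deleted have no confirmation transition
    };

    // Timeout path: the leader stops waiting and force-reassigns the daemon.
    public static bool IsTimedOut(
        DaemonState s, TimeSpan waited, TimeSpan ackTimeout, TimeSpan drainTimeout) => s switch
    {
        DaemonState.Assigning => waited > ackTimeout,   // no acknowledgement
        DaemonState.Revoking  => waited > drainTimeout, // no drain confirmation
        _ => false, // Active daemons are monitored via heartbeats, not timeouts
    };
}
```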

Configuration

The key options and their defaults:

  • HeartbeatInterval (default: 15 seconds): how often each node writes a heartbeat.
  • LeaseDuration (default: 60 seconds): how long a leadership lease is valid before expiry.
  • QueryTimeout (default: 5 seconds): timeout for individual database operations in the election system.
  • NodeRegistrationTtl (default: 90 seconds): how long a node’s registration persists after its last heartbeat.
  • InstanceId (default: null, falls back to MachineName): override to assign a stable, human-readable identity to this node.
  • AcknowledgeTimeout (default: 30 seconds): time allowed for a node to acknowledge a daemon assignment.
  • DrainTimeout (default: 30 seconds): time allowed for a node to drain in-flight work when a daemon is revoked.

LeaseDuration should be meaningfully larger than HeartbeatInterval. The default ratio of 4:1 gives the leader four missed heartbeats before another node takes over, which is enough tolerance for transient database latency without leaving work uncoordinated for long after a real failure.
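These options are typically set where Emit is registered. The sketch below assumes an options-lambda overload of AddEmit() and an options type with the property names listed above; check the Configuration Reference for the exact shape:

```csharp
builder.Services.AddEmit(options =>
{
    options.HeartbeatInterval = TimeSpan.FromSeconds(10);
    options.LeaseDuration     = TimeSpan.FromSeconds(40); // keep roughly 4:1 with HeartbeatInterval
    options.InstanceId        = Environment.GetEnvironmentVariable("POD_NAME");
});
```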

See Configuration Reference for the full option set.