Leader Election & Daemons

When you run multiple instances of your application, background work like outbox processing must not run on every node simultaneously. Emit handles this through two cooperating systems: leader election decides which node is in charge, and daemon coordination distributes the actual work across the cluster.

Leader election

Emit uses a heartbeat-based lease model. Every node periodically writes a heartbeat to the database and attempts to acquire a leadership lease using an atomic compare-and-swap. The database provides the consistency guarantee; no external consensus service is needed.

Each node is identified by a stable Guid generated once at startup, exposed through INodeIdentity. The heartbeat worker, distributed traces, and metrics all use this same identity, so you can correlate behavior back to a specific instance. Inject INodeIdentity to read the current node’s ID. Register a custom implementation before AddEmit() if you need to control the value (useful in environments where you want a stable identity tied to a pod name or container ID rather than a random GUID).
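As a sketch, a custom identity derived from a pod name might look like the following. This assumes INodeIdentity exposes the node's Guid as a property; PodNodeIdentity, the POD_NAME variable, and the hashing scheme are illustrative, not part of Emit:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical implementation: derives a deterministic node ID from the
// pod name so restarts of the same pod keep the same identity.
public sealed class PodNodeIdentity : INodeIdentity
{
    public Guid NodeId { get; } = DeriveFromPodName();

    private static Guid DeriveFromPodName()
    {
        var name = Environment.GetEnvironmentVariable("POD_NAME")
                   ?? Environment.MachineName;
        // MD5 yields exactly 16 bytes, which is what Guid expects.
        using var md5 = MD5.Create();
        return new Guid(md5.ComputeHash(Encoding.UTF8.GetBytes(name)));
    }
}
```

Register it before AddEmit() (for example, `services.AddSingleton<INodeIdentity, PodNodeIdentity>()`) so Emit picks it up instead of generating a random Guid.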

The lease model works like this:

  • Every live node writes a heartbeat on a configurable interval (default: 15 seconds).
  • The leader holds a time-limited lease (default: 60 seconds). It must renew the lease before it expires or another node can claim it.
  • If a node stops heartbeating (crash, network partition, clean shutdown), its lease expires and the next node to attempt renewal acquires leadership.

There is no separate failure detector or quorum calculation. If the database can be reached, leadership can be acquired. If it cannot, nothing runs, which is the safe outcome.
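The compare-and-swap at the heart of the lease can be sketched with an in-memory stand-in for the database row; the real implementation performs the same check-and-write atomically in the database, and all names here are illustrative:

```csharp
using System;

// In-memory model of the leadership lease row. A lock stands in for the
// database transaction that makes the compare-and-swap atomic.
public sealed class LeaseTable
{
    private Guid? _holder;
    private DateTime _expiresUtc;
    private readonly object _gate = new();

    // Returns true if 'candidate' now holds the lease (fresh acquisition
    // or renewal); false if another node's lease is still valid.
    public bool TryAcquireOrRenew(Guid candidate, DateTime nowUtc, TimeSpan leaseDuration)
    {
        lock (_gate)
        {
            var free = _holder is null || nowUtc >= _expiresUtc;
            if (free || _holder == candidate)
            {
                _holder = candidate;
                _expiresUtc = nowUtc + leaseDuration;
                return true;
            }
            return false;
        }
    }
}
```

Because renewal and acquisition are the same operation, a crashed leader needs no special handling: its lease simply expires, and the next node whose heartbeat cycle runs the compare-and-swap wins.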

Observing leadership changes

Register an ILeaderElectionObserver to react when leadership changes on the current node:

  • OnLeaderElectedAsync: this node just became the leader.
  • OnLeaderLostAsync: this node just lost leadership.
  • OnNodeRegisteredAsync: a node has joined the cluster (visible to all nodes).
  • OnNodeRemovedAsync: a node has left the cluster (visible to all nodes).

The first two are useful for triggering leader-only work outside the daemon system. The latter two give you visibility into cluster membership if you want to log or monitor it.
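A minimal observer that just logs these events might look like the sketch below. The method signatures shown are assumptions based on the names above (the real interface may take parameters such as the affected node's identity), so check the actual interface before copying this:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: signatures are guessed from the method names listed above.
public sealed class LoggingElectionObserver : ILeaderElectionObserver
{
    public Task OnLeaderElectedAsync(CancellationToken ct)
    { Console.WriteLine("This node became the leader."); return Task.CompletedTask; }

    public Task OnLeaderLostAsync(CancellationToken ct)
    { Console.WriteLine("This node lost leadership."); return Task.CompletedTask; }

    public Task OnNodeRegisteredAsync(CancellationToken ct)
    { Console.WriteLine("A node joined the cluster."); return Task.CompletedTask; }

    public Task OnNodeRemovedAsync(CancellationToken ct)
    { Console.WriteLine("A node left the cluster."); return Task.CompletedTask; }
}
```

Registration would follow the usual DI pattern, for example `services.AddSingleton<ILeaderElectionObserver, LoggingElectionObserver>()`.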

Querying leadership status

Inject ILeaderElectionService to check the current node’s role at any point:

public class MyService(ILeaderElectionService leaderElection)
{
    public void DoWork()
    {
        if (leaderElection.IsLeader)
        {
            // Only the leader reaches this branch
        }

        // This token cancels when the node loses leadership.
        // Pass it to long-running leader-only operations.
        var token = leaderElection.LeadershipToken;
    }
}

LeadershipToken is particularly useful for cooperative cancellation: pass it into any long-running operation so it stops naturally when this node is no longer the leader, rather than having to poll IsLeader in a loop.
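For example, a leader-only polling loop can simply honor the token and exit when it fires. This is a sketch; ProcessBatchAsync is a placeholder for your own work:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public async Task RunLeaderLoopAsync(ILeaderElectionService leaderElection)
{
    var token = leaderElection.LeadershipToken;
    try
    {
        while (true)
        {
            await ProcessBatchAsync(token); // placeholder for real leader-only work
            // Task.Delay throws OperationCanceledException the moment
            // leadership is lost, so the loop never sleeps through a handover.
            await Task.Delay(TimeSpan.FromSeconds(5), token);
        }
    }
    catch (OperationCanceledException)
    {
        // Leadership lost: stop cleanly. Another node will take over.
    }
}
```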

Daemon coordination

The leader does not run all background work itself. Instead, it acts as a coordinator that assigns work units called daemons to nodes in the cluster. This distributes load and handles the reality that nodes come and go.

A daemon is any background task that implements IDaemonAgent. The outbox worker (emit:outbox) is the primary built-in daemon; additional daemons may be registered for other recurring work.
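A custom daemon might be sketched as below. This page does not show the members of IDaemonAgent, so the ExecuteAsync shape here is an assumption, not the real contract; consult the API reference for the actual interface:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: the assumed contract is a single cancellable execute loop.
public sealed class CleanupDaemon : IDaemonAgent
{
    public async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // stoppingToken fires when the assignment is revoked, so honoring
        // it is what makes the drain step work.
        while (!stoppingToken.IsCancellationRequested)
        {
            // ... recurring work here ...
            await Task.Delay(TimeSpan.FromMinutes(1), stoppingToken);
        }
    }
}
```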

How assignment works

On each heartbeat cycle, the leader reviews all daemon assignments: it assigns unassigned daemons to the least-loaded live node and reclaims assignments from nodes that have stopped heartbeating.

Every node (leader included) polls for assignments directed at it. When a node receives an assignment, it acknowledges and starts the daemon. When an assignment is revoked, the node drains in-flight work and confirms it has stopped before the leader reassigns elsewhere.
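The leader's "least-loaded live node" choice can be sketched as a pure function over cluster state. This is an in-memory model with illustrative names, not Emit's internals:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Assignment
{
    // Picks the live node with the fewest current daemon assignments.
    // 'assignments' maps daemon name -> node name. Returns null if no
    // live nodes exist (nothing can be assigned).
    public static string? PickLeastLoaded(
        IReadOnlyCollection<string> liveNodes,
        IReadOnlyDictionary<string, string> assignments)
    {
        return liveNodes
            .OrderBy(n => assignments.Count(a => a.Value == n))
            .ThenBy(n => n, StringComparer.Ordinal) // deterministic tie-break
            .FirstOrDefault();
    }
}
```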

Assignment lifecycle

  • Assigning: the leader assigned the daemon; waiting for the node to acknowledge.
  • Active: the node acknowledged; the daemon is running.
  • Revoking: the leader requested shutdown; waiting for the node to drain and confirm.
  • Deleted: the node confirmed shutdown; the assignment is cleaned up.

If a node fails to acknowledge within AcknowledgeTimeout (default: 30 seconds) or fails to drain within DrainTimeout (default: 30 seconds), the leader treats it as unresponsive and force-reassigns the daemon to another node.
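The lifecycle above is a small state machine. A sketch of the transitions, including the timeout paths, in plain C# (names are illustrative):

```csharp
using System;

public enum DaemonState { Assigning, Active, Revoking, Deleted }

public static class DaemonLifecycle
{
    // Normal-path transitions driven by node confirmations.
    public static DaemonState OnNodeConfirmed(DaemonState s) => s switch
    {
        DaemonState.Assigning => DaemonState.Active,  // node acknowledged
        DaemonState.Revoking  => DaemonState.Deleted, // node drained and confirmed
        _ => s, // Active/Deleted have no confirmation transition
    };

    // Timeout path: the leader stops waiting and force-reassigns the daemon.
    public static bool IsTimedOut(
        DaemonState s, TimeSpan waited, TimeSpan ackTimeout, TimeSpan drainTimeout) => s switch
    {
        DaemonState.Assigning => waited > ackTimeout,   // no acknowledgement
        DaemonState.Revoking  => waited > drainTimeout, // no drain confirmation
        _ => false, // Active daemons are monitored via heartbeats, not timeouts
    };
}
```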

Configuration

The key options and their defaults:

  • HeartbeatInterval (default: 15 seconds): how often each node writes a heartbeat.
  • LeaseDuration (default: 60 seconds): how long a leadership lease is valid before expiry.
  • QueryTimeout (default: 5 seconds): timeout for individual database operations in the election system.
  • NodeRegistrationTtl (default: 90 seconds): how long a node’s registration persists after its last heartbeat.
  • InstanceId (default: null, falls back to MachineName): override to assign a stable, human-readable identity to this node.
  • AcknowledgeTimeout (default: 30 seconds): time allowed for a node to acknowledge a daemon assignment.
  • DrainTimeout (default: 30 seconds): time allowed for a node to drain in-flight work when a daemon is revoked.

LeaseDuration should be meaningfully larger than HeartbeatInterval. The default ratio of 4:1 gives the leader four missed heartbeats before another node takes over, which is enough tolerance for transient database latency without leaving work uncoordinated for long after a real failure.
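These options are typically set where Emit is registered. The sketch below assumes an options-lambda overload of AddEmit() and an options type with the property names listed above; check the Configuration Reference for the exact shape:

```csharp
builder.Services.AddEmit(options =>
{
    options.HeartbeatInterval = TimeSpan.FromSeconds(10);
    options.LeaseDuration     = TimeSpan.FromSeconds(40); // keep roughly 4:1 with HeartbeatInterval
    options.InstanceId        = Environment.GetEnvironmentVariable("POD_NAME");
});
```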

See Configuration Reference for the full option set.