Reliability

This document defines the reliability conventions for idempotent HTTP commands, durable event dispatch, retries, and failure handling. Read it before implementing any operation where a retry could create duplicate state or where losing a side effect would harm users or the business.

This convention extends docs/decisions/outbox-pattern-as-reliability-escalation.md (Outbox pattern), docs/decisions/transaction-pipeline-behaviors.md (Transaction pipeline behaviors), and docs/conventions/backend/observability.md.


1. Reliability Decision Ladder

Use the smallest reliable pattern that satisfies the business requirement.

| Requirement | Pattern |
| --- | --- |
| A user retry must not create duplicates | Idempotency key |
| A domain event may be handled in-process and can be retried manually | In-process Application.Reactions handler |
| A domain event must not be lost after commit | Outbox |
| Work must run later or on a schedule | Background job |
| Work must survive process restarts and scale across instances | Durable background job backed by PostgreSQL |
| Work must coordinate across services | Outbox plus an external broker, documented in a project ADR |

Do not use an external queue, scheduler, or broker without a project ADR. The default stack is ASP.NET Core hosted services plus PostgreSQL-backed tables owned by Infrastructure.


2. Idempotency Keys

State-changing POST and PATCH endpoints that can be retried by a browser, mobile app, worker, payment provider, or reverse proxy MUST support the Idempotency-Key header.

An endpoint requires idempotency when any of the following is true:

  • The operation creates a user-visible resource.
  • The operation triggers an external side effect.
  • The operation may be retried automatically by a client or gateway.
  • The operation is expensive enough that duplicate execution matters.

The Idempotency-Key value is generated by the client. It is unique per logical operation and reused only for retries of the same operation.

    // GOOD: endpoint requires an idempotency key for a create command
    private static async Task<IResult> HandleAsync(
        CreateOrderRequest request,
        [FromHeader(Name = "Idempotency-Key")] string idempotencyKey,
        ICommandMediator commandMediator,
        CancellationToken cancellationToken)
    {
        var command = request.ToCommand(idempotencyKey);
        var result = await commandMediator.SendAsync(command, cancellationToken);
        return Results.Created($"/orders/{result.OrderId.Value}", result.ToResponse());
    }

    // BAD: retriable POST has no idempotency key
    private static async Task<IResult> HandleAsync(
        CreateOrderRequest request,
        ICommandMediator commandMediator,
        CancellationToken cancellationToken)
    {
        var result = await commandMediator.SendAsync(request.ToCommand(), cancellationToken);
        return Results.Created($"/orders/{result.OrderId.Value}", result.ToResponse());
    }
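
On the client side, the rule that a key is generated once per logical operation and reused only for retries might look like the sketch below. The IdempotentRequestFactory class, its method names, and the choice of a GUID as the key format are assumptions made for illustration; the convention only requires that the value be unique per logical operation.

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical client-side helper; not part of the convention itself.
public static class IdempotentRequestFactory
{
    // Generate once per logical operation, then keep the value for every retry.
    public static string NewKey() => Guid.NewGuid().ToString("N");

    // Builds a fresh request object per attempt but always attaches the same key.
    public static HttpRequestMessage CreateOrderRequest(
        Uri endpoint,
        string jsonBody,
        string idempotencyKey)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, endpoint)
        {
            Content = new StringContent(jsonBody, Encoding.UTF8, "application/json")
        };
        request.Headers.Add("Idempotency-Key", idempotencyKey);
        return request;
    }
}
```

A caller would invoke NewKey() before the first attempt and pass the same value to CreateOrderRequest for each retry of that operation. A new logical operation gets a new key.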

Storage Contract

Idempotency is enforced in Infrastructure with a table such as IdempotencyRecords.

| Column | Purpose |
| --- | --- |
| Key | The client-provided idempotency key |
| UserId or TenantId | Scopes keys so different users cannot collide |
| RequestHash | Hash of method, path, and normalized body |
| Status | Started, Completed, or Failed |
| ResponseStatusCode | Status code returned on the first successful execution |
| ResponseBody | Serialized response body for replay |
| CreatedAtUtc | Cleanup and audit |
| ExpiresAtUtc | Retention boundary |

The (scope, key) pair MUST be unique. A repeated key with a different RequestHash MUST return 409 Conflict.
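
The hash and conflict rules above can be sketched as two pure functions. Everything named here (IdempotencyCheck, StoredRecord, IdempotencyDecision) is hypothetical, SHA-256 is an assumed choice of hash, and body normalization is assumed to happen before the call. How a non-completed record (Started or Failed) is handled is not specified by this convention, so the sketch only reports it.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical names; the convention fixes the behavior, not these types.
public enum IdempotencyDecision { Execute, ReplayStoredResponse, Conflict, NotCompleted }

public sealed record StoredRecord(string RequestHash, string Status);

public static class IdempotencyCheck
{
    // Hash of method, path, and an already-normalized body.
    public static string ComputeRequestHash(string method, string path, string normalizedBody)
    {
        var input = $"{method.ToUpperInvariant()}\n{path}\n{normalizedBody}";
        var digest = SHA256.HashData(Encoding.UTF8.GetBytes(input));
        return Convert.ToHexString(digest);
    }

    // existing is the row found for (scope, key), or null when there is none.
    public static IdempotencyDecision Decide(StoredRecord? existing, string requestHash)
    {
        if (existing is null)
        {
            return IdempotencyDecision.Execute;
        }

        if (!string.Equals(existing.RequestHash, requestHash, StringComparison.Ordinal))
        {
            return IdempotencyDecision.Conflict; // maps to 409 Conflict
        }

        return existing.Status == "Completed"
            ? IdempotencyDecision.ReplayStoredResponse
            : IdempotencyDecision.NotCompleted; // assumption: handling is project-specific
    }
}
```

The unique constraint on (scope, key) is still what prevents two concurrent first requests from both executing. The in-memory decision above only covers the lookup result.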

Idempotency records are written in the same command transaction as the aggregate change. The SaveChangesCommandPostHandler remains the only place that commits; endpoints and services do not call SaveChangesAsync separately. See docs/blueprints/backend/idempotency.md for the pipeline integration.

Background Jobs

Background jobs that dispatch commands MUST reuse the same idempotency infrastructure as HTTP endpoints. Construct a deterministic IdempotencyKey from the job’s own identifier and the command’s logical operation.

    // GOOD: job idempotency key is deterministic from job ID and command type
    var idempotencyKey = $"job:{jobRunId}:send-reminder:{orderId.Value}";
    var command = new SendOrderReminderCommand
    {
        OrderId = orderId,
        IdempotencyKey = idempotencyKey
    };
    await _commandMediator.SendAsync(command, cancellationToken);

    // BAD: job dispatches command with no idempotency key on a retriable operation
    await _commandMediator.SendAsync(new SendOrderReminderCommand { OrderId = orderId }, cancellationToken);

See docs/blueprints/backend/idempotency.md for the HTTP and Infrastructure wiring.


3. Outbox Escalation

Use the Outbox pattern when a domain event has a documented delivery requirement. Examples:

  • Payment, billing, compliance, audit, or security notifications.
  • Cross-system state changes.
  • User notifications that must survive process restarts.
  • Any event where manual reconstruction from logs is not acceptable.

Ownership

Infrastructure owns the outbox storage and dispatcher. Application.Reactions owns the event handler intent and narrow interfaces. Command handlers do not write directly to the outbox.

The transaction post-handler collects domain events from tracked aggregates, serializes them, and writes outbox rows before calling SaveChangesAsync. A hosted service or worker process dispatches outbox rows after commit.
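
The collect-and-serialize step described above might look like the following sketch. IDomainEvent, OrderPlaced, OutboxRow, and OutboxMapper are hypothetical types, System.Text.Json is an assumed serializer, and using the CLR type name as EventType is an assumption that only holds while the type is never renamed. The rows would be added to the unit of work before the post-handler calls SaveChangesAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical shapes for illustration; real projects define their own.
public interface IDomainEvent
{
    DateTime OccurredAtUtc { get; }
}

public sealed record OrderPlaced(Guid OrderId, DateTime OccurredAtUtc) : IDomainEvent;

public sealed record OutboxRow(Guid Id, string EventType, string Payload, DateTime OccurredAtUtc);

public static class OutboxMapper
{
    // Turns the domain events collected from tracked aggregates into outbox rows.
    public static IReadOnlyList<OutboxRow> ToRows(IEnumerable<IDomainEvent> events) =>
        events
            .Select(e => new OutboxRow(
                Guid.NewGuid(),
                e.GetType().Name,                          // assumption: stands in for a stable event name
                JsonSerializer.Serialize(e, e.GetType()),  // serialize the concrete type, not the interface
                e.OccurredAtUtc))
            .ToList();
}
```

Because the rows are written in the same transaction as the aggregate change, either both are committed or neither is.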

Table Shape

| Column | Purpose |
| --- | --- |
| Id | Unique outbox message ID |
| EventType | Stable event type name |
| Payload | Serialized event payload |
| OccurredAtUtc | When the aggregate raised the event |
| ProcessedAtUtc | Null until dispatch succeeds |
| AttemptCount | Retry tracking |
| NextAttemptAtUtc | Backoff scheduling |
| LastError | Last failure summary |

Consumer Idempotency

Every outbox consumer MUST be idempotent. If the dispatcher crashes after external delivery but before it marks the row processed, it delivers the same event again on the next pass.

    // GOOD: consumer ignores duplicate event IDs
    public async Task NotifySubscribersAsync(
        OutboxMessageId messageId,
        PostId postId,
        CancellationToken cancellationToken)
    {
        if (await _sentNotifications.ExistsAsync(messageId, cancellationToken))
        {
            return;
        }

        await _emailSender.SendPostPublishedAsync(postId, cancellationToken);
        await _sentNotifications.RecordAsync(messageId, cancellationToken);
    }

    // BAD: duplicate delivery sends duplicate email
    public async Task NotifySubscribersAsync(
        PostId postId,
        CancellationToken cancellationToken)
    {
        await _emailSender.SendPostPublishedAsync(postId, cancellationToken);
    }

4. Migration Path From In-Process to Outbox

When a project escalates from in-process event handling to outbox-backed dispatch:

  1. Add an ADR naming the events that require durable delivery.
  2. Add the outbox table migration.
  3. Update the transaction post-handler to write outbox rows before SaveChangesAsync.
  4. Add an Infrastructure dispatcher hosted service or a separate worker.
  5. Make every consumer idempotent.
  6. Add health checks and metrics for the pending message count, the failed message count, and the age of the oldest outbox message.
  7. Keep the Application.Reactions handler signatures stable unless the event payload itself changes.

Do not dispatch outbox messages inside the request transaction. The request commits the outbox row. The dispatcher handles delivery after commit.
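
A single dispatcher pass, run after commit and outside any request transaction, can be sketched as below. PendingOutboxRow and OutboxDispatchPass are hypothetical; the rows are assumed to have been loaded from storage already, and the deliver and backoff delegates stand in for project code. For brevity the sketch reschedules on any failure, whereas a real dispatcher would first apply the retry rules in this document and persist the updated columns.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-memory view of an outbox row.
public sealed class PendingOutboxRow
{
    public Guid Id { get; init; }
    public string EventType { get; init; } = "";
    public string Payload { get; init; } = "";
    public DateTime? ProcessedAtUtc { get; set; }
    public int AttemptCount { get; set; }
    public DateTime? NextAttemptAtUtc { get; set; }
    public string? LastError { get; set; }
}

public static class OutboxDispatchPass
{
    // One polling pass: deliver every due row and record the outcome on the row.
    public static async Task RunAsync(
        IEnumerable<PendingOutboxRow> rows,
        Func<PendingOutboxRow, CancellationToken, Task> deliver,
        Func<int, TimeSpan> backoff,
        DateTime nowUtc,
        CancellationToken cancellationToken)
    {
        foreach (var row in rows)
        {
            if (row.ProcessedAtUtc is not null) continue;                    // already dispatched
            if (row.NextAttemptAtUtc is { } due && due > nowUtc) continue;   // still backing off

            row.AttemptCount++;
            try
            {
                await deliver(row, cancellationToken);
                // A crash between delivery and this assignment causes a redelivery,
                // which is why every consumer must be idempotent.
                row.ProcessedAtUtc = nowUtc;
                row.LastError = null;
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                row.LastError = ex.Message;
                row.NextAttemptAtUtc = nowUtc + backoff(row.AttemptCount);
            }
        }
    }
}
```

A hosted service would call a pass like this on a timer, loading only rows where ProcessedAtUtc is null.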


5. Retry Rules

Retry only transient infrastructure failures. Do not retry validation failures, domain exceptions, not-found exceptions, or authorization failures.

Retries MUST use bounded exponential backoff with jitter. Infinite tight loops are forbidden.
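
One way to satisfy the backoff rule is the "full jitter" variant sketched below, where each delay is a random value between zero and a capped exponential ceiling. The RetryBackoff class, its parameter names, and the choice of full jitter are assumptions; the convention only requires that the backoff be exponential, bounded, and jittered.

```csharp
using System;

// Hypothetical helper; base delay, cap, and attempt limit are project choices.
public static class RetryBackoff
{
    // Full jitter: a random delay in [0, min(maxDelay, baseDelay * 2^(attempt - 1))).
    public static TimeSpan NextDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay, Random random)
    {
        if (attempt < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(attempt), "Attempts are counted from 1.");
        }

        var exponent = Math.Min(attempt - 1, 30); // keeps the power within a sane range
        var ceilingMs = Math.Min(
            maxDelay.TotalMilliseconds,
            baseDelay.TotalMilliseconds * Math.Pow(2, exponent));

        return TimeSpan.FromMilliseconds(random.NextDouble() * ceilingMs);
    }

    // Bounds the loop: stop once the configured number of attempts is used up.
    public static bool ShouldGiveUp(int attempt, int maxAttempts) => attempt >= maxAttempts;
}
```

The cap bounds each individual wait, and ShouldGiveUp bounds the number of waits, so the loop cannot run forever or spin without delay.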

| Failure | Retry? |
| --- | --- |
| HTTP 408, 429, 502, 503, 504 from an external service | Yes |
| Database deadlock or transient network failure | Yes |
| CommandValidationException or QueryValidationException | No |
| DomainException | No |
| AggregateNotFoundException | No |
| HTTP 400, 401, 403, 404 from an external service | No |
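
The HTTP rows of the table can be expressed as a small predicate. RetryClassifier is a hypothetical name, and treating every status code the table does not list as non-retryable is an assumption made to keep the default conservative.

```csharp
using System.Net;

// Hypothetical helper covering only the HTTP rows of the retry table.
public static class RetryClassifier
{
    public static bool IsRetryableStatus(HttpStatusCode statusCode) => (int)statusCode switch
    {
        408 or 429 or 502 or 503 or 504 => true, // transient failures listed as retryable
        _ => false                               // includes 400, 401, 403, 404; unlisted codes are an assumption
    };
}
```

Exception-based failures such as validation or domain exceptions never reach this check; they are simply not caught by the retry loop.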

Every retry loop MUST log the operation name, attempt count, next attempt time, and correlation ID. See docs/conventions/backend/observability.md.