Reliability
This document defines the reliability conventions for idempotent HTTP commands, durable event dispatch, retries, and failure handling. Read it before implementing any operation where a retry could create duplicate state or where losing a side effect would harm users or the business.
This convention extends docs/decisions/outbox-pattern-as-reliability-escalation.md (Outbox pattern), docs/decisions/transaction-pipeline-behaviors.md (Transaction pipeline behaviors), and docs/conventions/backend/observability.md.
1. Reliability Decision Ladder
Use the smallest reliable pattern that satisfies the business requirement.
| Requirement | Pattern |
|---|---|
| A user retry must not create duplicates | Idempotency key |
| A domain event may be handled in-process and can be retried manually | In-process Application.Reactions handler |
| A domain event must not be lost after commit | Outbox |
| Work must run later or on a schedule | Background job |
| Work must survive process restarts and scale across instances | Durable background job backed by PostgreSQL |
| Work must coordinate across services | Outbox plus an external broker, documented in a project ADR |
Do not use an external queue, scheduler, or broker without a project ADR. The default stack is ASP.NET Core hosted services plus PostgreSQL-backed tables owned by Infrastructure.
2. Idempotency Keys
State-changing POST and PATCH endpoints that can be retried by a browser, mobile app, worker, payment provider, or reverse proxy MUST support the Idempotency-Key header.
The endpoint requires idempotency when any of these are true:
- The operation creates a user-visible resource.
- The operation triggers an external side effect.
- The operation may be retried automatically by a client or gateway.
- The operation is expensive enough that duplicate execution matters.
The Idempotency-Key value is generated by the client. It is unique per logical operation and reused only for retries of the same operation.
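On the client side, the rule above can be sketched as follows. This is a minimal illustration, not part of the convention: `IdempotentRequestFactory`, `NewKey`, and `CreateOrderRequest` are hypothetical names, and the GUID format is an assumption (any sufficiently unique opaque string would do).

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical client helper, shown only to illustrate
// "one key per logical operation, reused on retries".
public static class IdempotentRequestFactory
{
    // Generate the key once, when the logical operation starts.
    public static string NewKey() => Guid.NewGuid().ToString("N");

    // Build a fresh HttpRequestMessage per attempt (a message instance
    // cannot be sent twice), always carrying the same key.
    public static HttpRequestMessage CreateOrderRequest(
        Uri endpoint,
        string jsonBody,
        string idempotencyKey)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, endpoint)
        {
            Content = new StringContent(jsonBody, Encoding.UTF8, "application/json")
        };
        request.Headers.Add("Idempotency-Key", idempotencyKey);
        return request;
    }
}
```

A caller would call `NewKey()` once per user action and pass the same value to every retry of that action. A new user action gets a new key.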
```csharp
// GOOD: endpoint requires an idempotency key for a create command
private static async Task<IResult> HandleAsync(
    CreateOrderRequest request,
    [FromHeader(Name = "Idempotency-Key")] string idempotencyKey,
    ICommandMediator commandMediator,
    CancellationToken cancellationToken)
{
    var command = request.ToCommand(idempotencyKey);
    var result = await commandMediator.SendAsync(command, cancellationToken);

    return Results.Created($"/orders/{result.OrderId.Value}", result.ToResponse());
}

// BAD: retriable POST has no idempotency key
private static async Task<IResult> HandleAsync(
    CreateOrderRequest request,
    ICommandMediator commandMediator,
    CancellationToken cancellationToken)
{
    var result = await commandMediator.SendAsync(request.ToCommand(), cancellationToken);
    return Results.Created($"/orders/{result.OrderId.Value}", result.ToResponse());
}
```

Storage Contract
Idempotency is enforced in Infrastructure with a table such as IdempotencyRecords.
| Column | Purpose |
|---|---|
| Key | The client-provided idempotency key |
| UserId or TenantId | Scopes keys so different users cannot collide |
| RequestHash | Hash of method, path, and normalized body |
| Status | Started, Completed, or Failed |
| ResponseStatusCode | Status code returned on the first successful execution |
| ResponseBody | Serialized response body for replay |
| CreatedAtUtc | Cleanup and audit |
| ExpiresAtUtc | Retention boundary |
The (scope, key) pair MUST be unique. A repeated key with a different RequestHash MUST return 409 Conflict.
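The uniqueness and conflict rules can be sketched as a small decision function. This is an illustration under stated assumptions: `IdempotencyCheck`, `IdempotencyDecision`, and the in-memory dictionary are hypothetical stand-ins for the Infrastructure table, SHA-256 is an assumed hash choice, and body normalization is assumed to have happened before hashing. The sketch ignores the Status column, so it does not cover requests that are still in progress.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public enum IdempotencyDecision { Execute, Replay, Conflict }

// Hypothetical helper; the real check lives in Infrastructure.
public static class IdempotencyCheck
{
    // Hash of method, path, and the already-normalized body.
    public static string ComputeRequestHash(string method, string path, string normalizedBody)
    {
        var input = $"{method.ToUpperInvariant()}\n{path}\n{normalizedBody}";
        var bytes = SHA256.HashData(Encoding.UTF8.GetBytes(input));
        return Convert.ToHexString(bytes);
    }

    // The dictionary stands in for a table with a unique (scope, key) index.
    public static IdempotencyDecision Decide(
        IDictionary<(string Scope, string Key), string> storedHashes,
        string scope,
        string key,
        string requestHash)
    {
        if (!storedHashes.TryGetValue((scope, key), out var existingHash))
        {
            // First time this (scope, key) is seen: record it and run the command.
            storedHashes[(scope, key)] = requestHash;
            return IdempotencyDecision.Execute;
        }

        // Same key, same request: replay the stored response.
        // Same key, different request: the caller returns 409 Conflict.
        return existingHash == requestHash
            ? IdempotencyDecision.Replay
            : IdempotencyDecision.Conflict;
    }
}
```

Because the scope is part of the lookup key, two users who happen to send the same key value do not collide.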
Idempotency records are written in the same command transaction as the aggregate change. The SaveChangesCommandPostHandler remains the only place that commits. See docs/blueprints/backend/idempotency.md for the pipeline integration (no separate SaveChangesAsync in the endpoint or service).
Background Jobs
Background jobs that dispatch commands MUST reuse the same idempotency infrastructure as HTTP endpoints. Construct a deterministic IdempotencyKey from the job’s own identifier and the command’s logical operation.
```csharp
// GOOD: job idempotency key is deterministic from job ID and command type
var idempotencyKey = $"job:{jobRunId}:send-reminder:{orderId.Value}";
var command = new SendOrderReminderCommand
{
    OrderId = orderId,
    IdempotencyKey = idempotencyKey
};
await _commandMediator.SendAsync(command, cancellationToken);

// BAD: job dispatches command with no idempotency key on a retriable operation
await _commandMediator.SendAsync(new SendOrderReminderCommand { OrderId = orderId }, cancellationToken);
```

See docs/blueprints/backend/idempotency.md for the HTTP and Infrastructure wiring.
3. Outbox Escalation
Use the Outbox pattern when a domain event has a documented delivery requirement. Examples:
- Payment, billing, compliance, audit, or security notifications.
- Cross-system state changes.
- User notifications that must survive process restarts.
- Any event where manual reconstruction from logs is not acceptable.
Ownership
Infrastructure owns the outbox storage and dispatcher. Application.Reactions owns the event handler intent and narrow interfaces. Command handlers do not write directly to the outbox.
The transaction post-handler collects domain events from tracked aggregates, serializes them, and writes outbox rows before calling SaveChangesAsync. A hosted service or worker process dispatches outbox rows after commit.
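The collect-and-serialize step can be sketched as a pure mapping from domain events to outbox rows. Everything named here is an assumption for illustration: `OrderPlaced`, `OutboxRow`, `OutboxWriter`, and the `"order-placed"` event type name are hypothetical, and `System.Text.Json` is an assumed serializer choice. The actual post-handler would add the resulting rows to the same unit of work that it then commits.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical domain event and row shape, for illustration only.
public sealed record OrderPlaced(Guid OrderId, DateTime OccurredAtUtc);

public sealed record OutboxRow(
    Guid Id,
    string EventType,
    string Payload,
    DateTime OccurredAtUtc,
    DateTime? ProcessedAtUtc,
    int AttemptCount);

public static class OutboxWriter
{
    // Maps events raised by tracked aggregates to rows that are saved
    // in the same transaction as the aggregate change.
    public static List<OutboxRow> ToOutboxRows(IEnumerable<OrderPlaced> events)
    {
        var rows = new List<OutboxRow>();
        foreach (var domainEvent in events)
        {
            rows.Add(new OutboxRow(
                Id: Guid.NewGuid(),
                EventType: "order-placed", // stable name, independent of the CLR type name
                Payload: JsonSerializer.Serialize(domainEvent),
                OccurredAtUtc: domainEvent.OccurredAtUtc,
                ProcessedAtUtc: null,      // stays null until dispatch succeeds
                AttemptCount: 0));
        }
        return rows;
    }
}
```

Keeping the mapping free of I/O means the only database write remains the single commit in the post-handler.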
Table Shape
| Column | Purpose |
|---|---|
| Id | Unique outbox message ID |
| EventType | Stable event type name |
| Payload | Serialized event payload |
| OccurredAtUtc | When the aggregate raised the event |
| ProcessedAtUtc | Null until dispatch succeeds |
| AttemptCount | Retry tracking |
| NextAttemptAtUtc | Backoff scheduling |
| LastError | Last failure summary |
Consumer Idempotency
Every outbox consumer MUST be idempotent. The outbox dispatcher may deliver an event more than once after a crash between external delivery and marking the row processed.
```csharp
// GOOD: consumer ignores duplicate event IDs
public async Task NotifySubscribersAsync(
    OutboxMessageId messageId,
    PostId postId,
    CancellationToken cancellationToken)
{
    if (await _sentNotifications.ExistsAsync(messageId, cancellationToken))
    {
        return;
    }

    await _emailSender.SendPostPublishedAsync(postId, cancellationToken);
    await _sentNotifications.RecordAsync(messageId, cancellationToken);
}

// BAD: duplicate delivery sends duplicate email
public async Task NotifySubscribersAsync(
    PostId postId,
    CancellationToken cancellationToken)
{
    await _emailSender.SendPostPublishedAsync(postId, cancellationToken);
}
```

4. Migration Path From In-Process to Outbox
When a project escalates from in-process event handling to outbox-backed dispatch:
- Add an ADR naming the events that require durable delivery.
- Add the outbox table migration.
- Update the transaction post-handler to write outbox rows before SaveChangesAsync.
- Add an Infrastructure dispatcher hosted service or a separate worker.
- Make every consumer idempotent.
- Add health checks and metrics for pending, failed, and oldest outbox message age.
- Keep the Application.Reactions handler signatures stable unless the event payload itself changes.
Do not dispatch outbox messages inside the request transaction. The request commits the outbox row. The dispatcher handles delivery after commit.
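One pass of a dispatcher that runs after commit might look like the sketch below. It is an in-memory illustration, not the Infrastructure implementation: `PendingOutboxMessage`, `OutboxDispatcher`, and the `deliver` and `backoff` delegates are hypothetical, and a real dispatcher would load due rows from PostgreSQL and persist the updated columns.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-memory view of an outbox row.
public sealed class PendingOutboxMessage
{
    public Guid Id { get; init; }
    public string EventType { get; init; } = "";
    public string Payload { get; init; } = "";
    public DateTime? ProcessedAtUtc { get; set; }
    public int AttemptCount { get; set; }
    public DateTime? NextAttemptAtUtc { get; set; }
    public string? LastError { get; set; }
}

public static class OutboxDispatcher
{
    // Delivers every unprocessed message whose next attempt time has passed.
    public static async Task DispatchDueAsync(
        IEnumerable<PendingOutboxMessage> messages,
        Func<PendingOutboxMessage, CancellationToken, Task> deliver,
        Func<int, TimeSpan> backoff,
        DateTime nowUtc,
        CancellationToken cancellationToken)
    {
        foreach (var message in messages)
        {
            var isDue = message.ProcessedAtUtc is null
                && (message.NextAttemptAtUtc is null || message.NextAttemptAtUtc <= nowUtc);
            if (!isDue)
            {
                continue;
            }

            message.AttemptCount++;
            try
            {
                await deliver(message, cancellationToken);
                message.ProcessedAtUtc = nowUtc;
                message.LastError = null;
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // Leave the row unprocessed and schedule the next attempt.
                message.LastError = ex.Message;
                message.NextAttemptAtUtc = nowUtc + backoff(message.AttemptCount);
            }
        }
    }
}
```

A crash between `deliver` returning and the row being marked processed leads to a second delivery on the next pass, so consumers still need to be idempotent.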
5. Retry Rules
Retry only transient infrastructure failures. Do not retry validation failures, domain exceptions, not-found exceptions, or authorization failures.
Retries MUST use bounded exponential backoff with jitter. Infinite tight loops are forbidden.
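A minimal sketch of a bounded delay calculation follows. It uses the "full jitter" variant (a uniformly random delay between zero and the capped exponential ceiling), which is one common choice and an assumption here, not something this convention prescribes. `RetryBackoff` and its parameter names are hypothetical.

```csharp
using System;

// Hypothetical helper illustrating capped exponential backoff with jitter.
public static class RetryBackoff
{
    // attempt is 1-based. The ceiling is min(maxDelay, baseDelay * 2^(attempt - 1));
    // the returned delay is uniformly random in [0, ceiling).
    public static TimeSpan NextDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay, Random random)
    {
        if (attempt < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(attempt));
        }

        // Clamp the exponent so very large attempt counts stay well-behaved.
        var exponent = Math.Min(attempt - 1, 30);
        var ceilingMs = Math.Min(
            maxDelay.TotalMilliseconds,
            baseDelay.TotalMilliseconds * Math.Pow(2, exponent));

        return TimeSpan.FromMilliseconds(random.NextDouble() * ceilingMs);
    }

    // Bounds the number of attempts so a loop can never spin forever.
    public static bool HasAttemptsLeft(int attempt, int maxAttempts) => attempt < maxAttempts;
}
```

With a base delay of 200 ms and a cap of 30 s, the ceiling grows as 200 ms, 400 ms, 800 ms, and so on until it reaches the cap. The jitter spreads concurrent retries so they do not all fire at the same instant.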
| Failure | Retry? |
|---|---|
| HTTP 408, 429, 502, 503, 504 from an external service | Yes |
| Database deadlock or transient network failure | Yes |
| CommandValidationException or QueryValidationException | No |
| DomainException | No |
| AggregateNotFoundException | No |
| HTTP 400, 401, 403, 404 from an external service | No |
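The HTTP rows of the table can be encoded as a small classifier. `ExternalServiceRetryPolicy` is a hypothetical name. The table does not list every status code, so treating all unlisted codes as non-retryable is an assumption of this sketch. The project exception types in the table are left out because their definitions are not shown here.

```csharp
using System;
using System.Net;

// Hypothetical classifier for responses from an external service.
public static class ExternalServiceRetryPolicy
{
    // Retry only the transient statuses listed in the table:
    // 408, 429, 502, 503, and 504. Everything else, including
    // 400, 401, 403, and 404, is not retried.
    public static bool IsRetryable(HttpStatusCode status) =>
        (int)status is 408 or 429 or 502 or 503 or 504;
}
```

Defaulting to "do not retry" keeps a new or unexpected status code from triggering a retry loop by accident.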
Every retry loop MUST log the operation name, attempt count, next attempt time, and correlation ID. See docs/conventions/backend/observability.md.
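A retry loop that emits the four required fields might look like the sketch below. `LoggedRetry` and all of its parameters are hypothetical. The `Action<string>` sink stands in for whatever structured logger the observability convention prescribes, and the `key=value` message format is an assumption.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical retry wrapper; the log sink is a stand-in for a real logger.
public static class LoggedRetry
{
    public static async Task<T> ExecuteAsync<T>(
        string operationName,
        string correlationId,
        int maxAttempts,
        Func<int, TimeSpan> delayForAttempt,
        Func<Exception, bool> isTransient,
        Func<CancellationToken, Task<T>> operation,
        Action<string> log,
        CancellationToken cancellationToken)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(cancellationToken);
            }
            // Non-transient failures and the final attempt propagate to the caller.
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                var delay = delayForAttempt(attempt);
                var nextAttemptAtUtc = DateTime.UtcNow + delay;
                log($"operation={operationName} attempt={attempt} " +
                    $"nextAttemptAtUtc={nextAttemptAtUtc:O} correlationId={correlationId}");
                await Task.Delay(delay, cancellationToken);
            }
        }
    }
}
```

The exception filter keeps the loop bounded: once `attempt` reaches `maxAttempts`, the exception is no longer caught and surfaces to the caller.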