Runbook: Roll Back a Release
This runbook covers reverting to the previous production image after a failed deployment.
Schema rollback is not covered here. If the failed deployment included a migration, coordinate with the engineering lead before rolling back the application image: the previous application version may not be compatible with the migrated schema.
Prerequisites
- The previous known-good image digest is available (from the deployment notes or CI pipeline history).
- You have write access to the deployment target.
- If the failed deployment included a migration, the engineering lead has assessed whether it is safe to leave in place.
When to Roll Back
Roll back when:
- Health checks fail after the new image is deployed and do not recover within 5 minutes.
- Error rates spike significantly above baseline within 10 minutes of deployment.
- A critical regression is confirmed in production monitoring.
- The team decides the risk of the new version outweighs the cost of rollback.
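The two measurable triggers above can be expressed as a small shell check. This is a sketch that assumes the caller reads the numbers from monitoring. The runbook does not define "significantly above baseline", so `SPIKE_FACTOR` (default 2) is a hypothetical threshold the team would need to choose, and `should_roll_back` is a hypothetical name. The last two triggers (a confirmed regression, a team decision) are judgment calls and are not encoded.

```bash
#!/usr/bin/env bash
# ASSUMPTION (hypothetical): "significantly above baseline" means more than
# SPIKE_FACTOR times the baseline error rate. Tune this for your service.
SPIKE_FACTOR="${SPIKE_FACTOR:-2}"

# should_roll_back <secs_health_failing> <current_error_rate> <baseline_error_rate> <secs_since_deploy>
# Seconds arguments must be integers; error rates may be decimals.
# Returns 0 if a trigger is met (roll back), 1 otherwise, and prints the reason.
should_roll_back() {
  local failing_secs="$1" current="$2" baseline="$3" since_deploy="$4"

  # Trigger: health checks have failed and not recovered within 5 minutes.
  if [ "$failing_secs" -gt 300 ]; then
    echo "roll back: health checks failing for ${failing_secs}s"
    return 0
  fi

  # Trigger: error rate spike within 10 minutes of deployment.
  # awk does the floating-point comparison; exit status 0 means "spiked".
  if [ "$since_deploy" -le 600 ] &&
     awk -v c="$current" -v b="$baseline" -v f="$SPIKE_FACTOR" 'BEGIN { exit !(c > b * f) }'; then
    echo "roll back: error rate $current exceeds ${SPIKE_FACTOR}x baseline $baseline"
    return 0
  fi

  echo "no automatic trigger met; check for confirmed regressions or a team decision"
  return 1
}
```

For example, `should_roll_back 0 0.10 0.01 120` reports a spike, while the same error rate 15 minutes after deployment falls outside the 10-minute window and returns 1.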
Steps
1. Identify the previous image digest
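```bash
# From deployment notes or CI pipeline history; each service has its own digest.
PREVIOUS_API_IMAGE="ghcr.io/your-org/your-project/api@sha256:<previous-api-digest>"
PREVIOUS_WORKER_IMAGE="ghcr.io/your-org/your-project/worker@sha256:<previous-worker-digest>"
PREVIOUS_WEB_IMAGE="ghcr.io/your-org/your-project/web@sha256:<previous-web-digest>"
```
2. Deploy the previous image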
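Before running the deployment, a local format check can catch a placeholder or a mutable tag that was never replaced with a real digest. This is a minimal sketch: `require_pinned_image` is a hypothetical helper, and it only checks the shape of the reference, not that the digest actually exists in the registry.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the reference is pinned to a full
# sha256 digest. Format check only; it does not contact the registry.
require_pinned_image() {
  local ref="$1"
  # Expected shape: <registry>/<path>@sha256:<64 lowercase hex characters>
  if [[ "$ref" =~ ^[^@[:space:]]+@sha256:[0-9a-f]{64}$ ]]; then
    return 0
  fi
  echo "not a digest-pinned image reference: $ref" >&2
  return 1
}
```

Run it once per image reference from step 1; a non-zero exit means that reference still needs a real digest.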
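```bash
# The quoted 'EOF' sends the script verbatim; nothing expands on the local side.
ssh deploy@your-vps << 'EOF'
set -eu  # stop at the first failing command, e.g. a failed cd or pull
cd /opt/your-app
# Substitute each service's own previous digest from step 1.
export API_IMAGE="ghcr.io/your-org/your-project/api@sha256:<previous-api-digest>"
export WORKER_IMAGE="ghcr.io/your-org/your-project/worker@sha256:<previous-worker-digest>"
export WEB_IMAGE="ghcr.io/your-org/your-project/web@sha256:<previous-web-digest>"
docker compose -f infra/docker-compose.prod.yml pull
docker compose -f infra/docker-compose.prod.yml up -d
EOF
```
3. Verify health checks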
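```bash
# Poll up to 12 times, 10 seconds apart; --max-time stops a hung request stalling the loop.
HEALTHY=0
for i in $(seq 1 12); do
  STATUS=$(curl -s --max-time 5 -o /dev/null -w "%{http_code}" https://api.yourdomain.com/health)
  echo "Health check $i: $STATUS"
  if [ "$STATUS" = "200" ]; then HEALTHY=1; break; fi
  sleep 10
done
[ "$HEALTHY" = "1" ] || echo "Health check did not return 200 after 12 attempts" >&2
```
4. Confirm error rate returns to baseline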
Monitor the error rate dashboard for 5 minutes after rollback. Confirm the spike has resolved.
5. Assess migration state
If a migration was applied as part of the failed deployment:
- Check whether the previous application version is compatible with the current schema.
- Forward-compatible migrations (adding columns with defaults, adding tables) are usually safe to leave in place.
- Destructive or breaking schema changes require an additional migration to restore the previous state. If a database restore is needed, see docs/runbooks/restore-database-backup.md.
6. Record the rollback
Post a note to the team channel with:
- Date and time of rollback.
- Image rolled back to.
- Image rolled back from.
- Reason for rollback.
- Migration state (left in place or reverted).
- Owner of the follow-up investigation.
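The note can be produced from a small template so that none of the fields above is skipped. This is only a sketch: `rollback_note` is a hypothetical helper, the UTC timestamp is an assumption, and posting the text to the team channel is left to the team's own tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the rollback note with every required field.
# Usage: rollback_note <rolled_back_to> <rolled_back_from> <reason> <migration_state> <owner>
rollback_note() {
  if [ "$#" -ne 5 ]; then
    echo "usage: rollback_note <to> <from> <reason> <migration_state> <owner>" >&2
    return 1
  fi
  # ASSUMPTION: timestamps in UTC so readers in different time zones agree.
  printf 'Rollback at: %s\n' "$(date -u '+%Y-%m-%d %H:%M UTC')"
  printf 'Rolled back to: %s\n' "$1"
  printf 'Rolled back from: %s\n' "$2"
  printf 'Reason: %s\n' "$3"
  printf 'Migration state: %s\n' "$4"
  printf 'Follow-up owner: %s\n' "$5"
}
```

Requiring exactly five arguments makes a missing field fail loudly instead of producing an incomplete note.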
After Rollback
The failed release must be investigated before re-deployment. Do not re-deploy the same image without identifying and fixing the cause of the failure.
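One way to enforce this mechanically is a denylist of rolled-back image references that the deploy path checks first. A fixed release is a new build with a new digest, so it passes; only the exact rolled-back reference is refused. This is a hypothetical sketch: the file `rolled-back-images.txt` and both function names are assumptions, and the check does not replace the investigation itself.

```bash
#!/usr/bin/env bash
# ASSUMPTION (hypothetical): a plain-text file, one image reference per line.
ROLLED_BACK_LIST="${ROLLED_BACK_LIST:-./rolled-back-images.txt}"

# Record an image reference as rolled back.
mark_rolled_back() {
  printf '%s\n' "$1" >> "$ROLLED_BACK_LIST"
}

# Return 1 if the candidate image reference is on the rolled-back list.
check_not_rolled_back() {
  local candidate="$1"
  # -F fixed string, -x whole-line match, -q quiet.
  if [ -f "$ROLLED_BACK_LIST" ] && grep -Fxq -- "$candidate" "$ROLLED_BACK_LIST"; then
    echo "refusing to redeploy rolled-back image: $candidate" >&2
    return 1
  fi
  return 0
}
```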
Open a postmortem ticket with:
- Timeline of events.
- What broke (based on logs and metrics).
- What was deployed.
- How it was detected.
- Immediate remediation.
- Preventative actions.