Release, Rollout, and Rollback Runbook
This document covers the full lifecycle of a code change from development to production: versioning scheme, CI pipeline steps, staged rollout, zero-downtime deployment, rollback procedure, and post-release verification.
Versioning
The project uses Semantic Versioning (MAJOR.MINOR.PATCH):
| Increment | When |
|---|---|
| PATCH (e.g. 1.0.1) | Bug fix, security patch, config-only change |
| MINOR (e.g. 1.1.0) | New feature, new optional field, new API endpoint |
| MAJOR (e.g. 2.0.0) | Breaking API change, major migration, architecture change |
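The increment rules in the table can be sketched in a few lines of Python. The helper name `next_version` and the change labels "patch", "minor", and "major" are illustrative assumptions, not part of the project.

```python
def next_version(current: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a given change type.

    `current` may carry a leading "v" (as in the Git tag v1.1.0).
    `change` is one of "patch", "minor", "major" (hypothetical labels).
    """
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    if change == "major":
        return f"{major + 1}.0.0"          # breaking change resets minor and patch
    if change == "minor":
        return f"{major}.{minor + 1}.0"    # new feature resets patch
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(next_version("1.0.1", "minor"))  # 1.1.0
```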
Version is tracked in two places:
- Git tag: git tag v1.1.0
- Docker image tag: denbi-registry:v1.1.0
The SPECTACULAR_SETTINGS["VERSION"] value in config/settings.py must match the
API's own version, which is updated independently from the application version
and changes only when API contracts change.
Branch and Release Workflow
- Feature branch: all development in feature/* or fix/* branches.
- Pull Request: CI runs tests, linting, audit before merge.
- Merge to main: triggers CI build of denbi-registry:main (unstable tag).
- Tag a release: git tag v1.1.0 && git push origin v1.1.0 triggers production build.
- Deploy to staging: automated or manual after CI passes on the tag.
- Deploy to production: manual gate after staging verification.
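The mapping from a pushed Git ref to the image that CI should produce, as listed above, might look like the sketch below. The function name `image_for_ref` is hypothetical, and the strict vMAJOR.MINOR.PATCH pattern is an assumption (the CI trigger itself only requires a `v` prefix).

```python
import re

# Assumption: release tags follow vMAJOR.MINOR.PATCH exactly.
RELEASE_TAG = re.compile(r"^v\d+\.\d+\.\d+$")


def image_for_ref(ref: str):
    """Return the image a Git ref should produce, or None if nothing is built."""
    if ref == "refs/heads/main":
        return "denbi-registry:main"  # unstable tag built on every merge to main
    prefix = "refs/tags/"
    if ref.startswith(prefix) and RELEASE_TAG.match(ref[len(prefix):]):
        return f"denbi-registry:{ref[len(prefix):]}"  # production build
    return None  # feature/* and fix/* branches only run tests


print(image_for_ref("refs/tags/v1.1.0"))  # denbi-registry:v1.1.0
```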
CI Pipeline (GitHub Actions / GitLab CI)
GitHub Actions example
Tests, linting, and audits run natively on the GitHub runner (no Docker needed for the test stage). The build job validates the Docker image separately.
# .github/workflows/ci.yml
name: CI
on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: pip
          cache-dependency-path: requirements/development.txt
      - name: Install dependencies
        run: pip install -r requirements/development.txt
      - name: Run tests
        run: pytest tests/
        env:
          DJANGO_SETTINGS_MODULE: config.settings_test
          SECRET_KEY: ci-only-not-a-real-key
          DB_PASSWORD: ci
          REDIS_PASSWORD: ci
      - name: Lint
        run: |
          ruff check apps/ config/ tests/
          ruff format --check apps/ config/ tests/
      - name: Audit
        run: pip-audit -r requirements/production.txt
  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker compose build
      - name: Push image (on tag)
        if: startsWith(github.ref, 'refs/tags/v')
        env:
          IMAGE_TAG: ${{ github.ref_name }}
        run: |
          docker build -t ghcr.io/denbi/service-registry:${IMAGE_TAG} .
          docker push ghcr.io/denbi/service-registry:${IMAGE_TAG}
          docker tag ghcr.io/denbi/service-registry:${IMAGE_TAG} \
            ghcr.io/denbi/service-registry:latest
          docker push ghcr.io/denbi/service-registry:latest
GitLab CI example
# .gitlab-ci.yml
stages: [test, build, deploy-staging, deploy-production]
test:
  stage: test
  image: python:3.12-slim
  before_script:
    - pip install -r requirements/development.txt
  script:
    - pytest tests/
    - ruff check apps/ config/ tests/
    - pip-audit -r requirements/production.txt
  variables:
    DJANGO_SETTINGS_MODULE: config.settings_test
    SECRET_KEY: ci-only
    DB_PASSWORD: ci
    REDIS_PASSWORD: ci
build:
  stage: build
  only: [tags]
  script:
    - docker build -t registry.gitlab.com/$CI_PROJECT_PATH:$CI_COMMIT_TAG .
    - docker push registry.gitlab.com/$CI_PROJECT_PATH:$CI_COMMIT_TAG
deploy-staging:
  stage: deploy-staging
  only: [tags]
  environment: staging
  script:
    - ssh $DEPLOY_USER@$STAGING_HOST "IMAGE_TAG=$CI_COMMIT_TAG /opt/denbi/scripts/deploy.sh"
deploy-production:
  stage: deploy-production
  only: [tags]
  environment: production
  when: manual  # Requires explicit click in GitLab UI
  script:
    - ssh $DEPLOY_USER@$PROD_HOST "IMAGE_TAG=$CI_COMMIT_TAG /opt/denbi/scripts/deploy.sh"
Deployment Script
Place this on the server at /opt/denbi/scripts/deploy.sh:
#!/usr/bin/env bash
# /opt/denbi/scripts/deploy.sh
# Usage: IMAGE_TAG=v1.1.0 ./deploy.sh
set -euo pipefail

IMAGE_TAG=${IMAGE_TAG:-latest}
COMPOSE_DIR=/opt/denbi/service-registry
COMPOSE="docker compose -f docker-compose.yml -f docker-compose.prod.yml"

echo "=== Deploying denbi-registry:${IMAGE_TAG} ==="
cd "$COMPOSE_DIR"

# 1. Pull the new image
docker pull "ghcr.io/denbi/service-registry:${IMAGE_TAG}"
docker tag "ghcr.io/denbi/service-registry:${IMAGE_TAG}" denbi-registry:current

# 2. Rolling restart: start new containers before stopping old
#    The container entrypoint runs migrations automatically on startup.
#    Static files are baked into the image at build time, so no collectstatic is needed.
echo "--- Restarting web, worker, beat ---"
IMAGE_TAG="${IMAGE_TAG}" $COMPOSE up -d --no-deps web worker beat

# 3. Wait for the health check to pass (12 attempts, 5 seconds apart)
echo "--- Waiting for health check ---"
for i in $(seq 1 12); do
  STATUS=$(curl -sf http://localhost:8000/health/ready/ | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('status','error'))" 2>/dev/null || echo "error")
  if [ "$STATUS" = "ok" ]; then
    echo "Health check passed."
    break
  fi
  echo "  Waiting ($i/12)..."
  sleep 5
done

if [ "$STATUS" != "ok" ]; then
  echo "ERROR: Health check failed after 60s. Check logs:" >&2
  $COMPOSE logs web --tail 50
  exit 1
fi

echo "=== Deployment complete: ${IMAGE_TAG} ==="
Make it executable: chmod +x /opt/denbi/scripts/deploy.sh
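The health-check wait in the script (12 attempts, 5 seconds apart) can also be expressed in Python, which may be easier to unit-test than the shell loop. This is a sketch under assumptions: the function names `fetch_status` and `wait_until_ready` are hypothetical, and the `fetch` and `sleep` parameters exist only so the loop can be exercised without a running service.

```python
import json
import time
import urllib.request


def fetch_status(url: str) -> str:
    """Return the "status" field of the JSON health response, or "error"."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return json.load(resp).get("status", "error")
    except (OSError, ValueError, AttributeError):
        # Connection failures, HTTP errors, invalid JSON, or a non-object body.
        return "error"


def wait_until_ready(url, attempts=12, delay=5.0, fetch=fetch_status, sleep=time.sleep):
    """Poll `url` until it reports "ok"; return False once all attempts fail."""
    for attempt in range(1, attempts + 1):
        if fetch(url) == "ok":
            return True
        if attempt < attempts:
            sleep(delay)  # unlike the shell loop, no sleep after the final attempt
    return False
```

A caller would typically exit non-zero when `wait_until_ready("http://localhost:8000/health/ready/")` returns False, mirroring the script's failure branch.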
Zero-Downtime Deployment (Standard)
For regular releases with non-destructive migrations (adding optional fields, adding tables, adding indices), run the deployment script as described above (IMAGE_TAG=v1.1.0 /opt/denbi/scripts/deploy.sh).
This is zero-downtime because:
- Migrations run before new code starts (new schema is backward-compatible)
- docker compose up -d --no-deps starts new containers before removing old
- Nginx continues serving requests throughout
Maintenance Window Deployment
Required when migrations are destructive (renaming columns, dropping columns, changing column types). These cannot be backward-compatible.
# 1. Enable maintenance page on host Nginx
sudo cp /var/www/denbi-registry/errors/upstream_down.html \
/var/www/denbi-registry/maintenance.html
# (Configure host Nginx to serve this file instead of proxying)
# 2. Stop the application (keep DB and Redis running)
docker compose stop web worker beat
# 3. Apply the migration (use run --rm to bypass the normal entrypoint auto-migrate
#    so you can verify the migration manually before bringing traffic back)
docker compose run --rm web python manage.py migrate
# 4. Deploy the new image (entrypoint will detect no pending migrations and proceed)
IMAGE_TAG=v2.0.0 /opt/denbi/scripts/deploy.sh
# 5. Remove maintenance page / re-enable proxy
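The choice between the standard path and the maintenance-window path depends on which kinds of schema changes a release contains. The sketch below classifies a release by Django migration operation class names. The function name `deployment_mode` and the return labels are hypothetical, and matching on names alone is a coarse assumption: an AddField for a required column without a default, for example, would still need review.

```python
# Operation names are Django migration operation classes.
SAFE_OPERATIONS = {"AddField", "CreateModel", "AddIndex"}            # additive changes
DESTRUCTIVE_OPERATIONS = {"RenameField", "RemoveField", "AlterField"}  # rename, drop, retype


def deployment_mode(operation_names):
    """Pick a deployment path from the migration operations in a release."""
    names = set(operation_names)
    if names & DESTRUCTIVE_OPERATIONS:
        return "maintenance-window"
    if names <= SAFE_OPERATIONS:
        return "zero-downtime"  # also covers releases with no migrations
    return "needs-review"       # anything not listed is left to a human


print(deployment_mode(["AddField", "AddIndex"]))  # zero-downtime
```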
Rollback Procedure
Rollback application code (no migration rollback needed)
If the new release has a bug but no schema changes:
# Restart with the previous image — no migration step needed
IMAGE_TAG=v1.0.0 docker compose \
-f docker-compose.yml -f docker-compose.prod.yml \
up -d --no-deps web worker beat
# Verify
curl https://service-registry.bi.denbi.de/health/ready/
Rollback including a migration
Only possible if the migration is reversible. Check with:
docker compose run --rm web python manage.py sqlmigrate submissions 0003 --backwards
# If this fails, the migration is not reversible — restore from backup instead.
If reversible:
# 1. Roll back to the previous migration
docker compose run --rm web python manage.py migrate submissions 0002
# 2. Deploy previous image
IMAGE_TAG=v1.0.0 docker compose \
-f docker-compose.yml -f docker-compose.prod.yml \
up -d --no-deps web worker beat
Full rollback from database backup
If the migration cannot be reversed and the new code is broken:
# 1. Stop application
docker compose stop web worker beat
# 2. Restore database
docker compose exec db psql -U denbi postgres -c "DROP DATABASE denbi_registry;"
docker compose exec db psql -U denbi postgres -c "CREATE DATABASE denbi_registry;"
docker compose exec -T db psql -U denbi denbi_registry < /path/to/backup.sql
# 3. Deploy previous image
IMAGE_TAG=v1.0.0 /opt/denbi/scripts/deploy.sh
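The three rollback paths above differ only in whether the release changed the schema and whether that change is reversible. A small decision helper makes the ordering of steps explicit. The function name `rollback_plan` and the step wording are illustrative assumptions.

```python
def rollback_plan(schema_changed: bool, reversible: bool = False) -> list:
    """Return the ordered rollback steps for a failed release."""
    if not schema_changed:
        # Code-only bug: the previous image runs against the current schema.
        return ["redeploy previous image"]
    if reversible:
        # Undo the migration first, then bring back the old code.
        return ["migrate back to previous migration", "redeploy previous image"]
    # Irreversible migration: the database must come from backup.
    return [
        "stop web, worker, beat",
        "restore database from backup",
        "redeploy previous image",
    ]


for step in rollback_plan(schema_changed=True, reversible=True):
    print(step)
```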
Staging Verification Checklist
Run through this on staging after every deployment, before promoting to production.
Functional checks:
- [ ] GET /health/ready/ returns {"status": "ok"}
- [ ] GET / loads the home page without errors
- [ ] GET /register/ loads the full form
- [ ] Section B of the form shows EDAM Topics and EDAM Operations searchable fields
- [ ] Typing "prote" in the EDAM Topics field filters to proteomics-related terms
- [ ] Entering https://bio.tools/blast in the bio.tools URL field and tabbing out triggers the prefill banner
- [ ] Clicking "Apply prefill" populates name, description, and EDAM fields from bio.tools
- [ ] GET /captcha/ returns 200 with JSON containing algorithm, challenge, salt, signature, and maxNumber fields
- [ ] GET /captcha/ response has Cache-Control: no-store header
- [ ] The ALTCHA widget appears on /register/ (checkbox or spinner visible below Section G)
- [ ] Clicking the Submit button on /register/ triggers the ALTCHA proof-of-work solve (spinner, then checkmark) before the form posts
- [ ] Submit a test registration with EDAM terms selected → confirm redirect to success page with API key
- [ ] Copy the API key → go to /update/ → enter key → form pre-populates including EDAM selections
- [ ] Submit an update → confirm notification email received
- [ ] GET /api/docs/ loads Swagger UI
- [ ] GET /api/schema/ returns 200 with valid OpenAPI YAML
- [ ] POST /api/v1/submissions/ with valid JSON payload returns 201 with api_key
- [ ] GET /api/v1/submissions/{id}/ response includes edam_topics, edam_operations, and biotoolsrecord fields
- [ ] GET /api/v1/edam/?branch=topic returns list of EDAM topic terms (no auth required)
- [ ] GET /api/v1/edam/topic_0121/ returns full Proteomics term with definition and parent
- [ ] After bio.tools sync runs: GET /api/v1/biotools/blast/ returns structured record with functions
- [ ] Admin portal at /<ADMIN_URL_PREFIX>/ loads and shows submissions list
- [ ] Admin → EDAM Ontology → EDAM Terms shows ~4000 terms
- [ ] Admin → bio.tools Integration shows sync status for submissions with bio.tools URLs
- [ ] Approve a submission via admin → status email sent
Security checks:
- [ ] http:// redirects to https:// (301)
- [ ] Strict-Transport-Security header present in response
- [ ] X-Frame-Options: DENY present
- [ ] API call without auth returns 403 (not 401, not 500)
- [ ] Invalid API key returns same 403 as revoked key
- [ ] GET /api/v1/edam/ returns 200 without any Authorization header (public endpoint)
- [ ] GET /api/v1/biotools/ without admin API key returns 403
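The two header checks in the security list lend themselves to automation. The sketch below validates a response's headers once they have been fetched by whatever HTTP client is in use. The function name `security_header_failures` is hypothetical, and the check covers only the Strict-Transport-Security and X-Frame-Options items.

```python
def security_header_failures(headers: dict) -> list:
    """Return a list of failed header checks (empty when all pass)."""
    # Header names are case-insensitive, so normalize before looking them up.
    normalized = {name.lower(): value for name, value in headers.items()}
    failures = []
    if "strict-transport-security" not in normalized:
        failures.append("Strict-Transport-Security header missing")
    if normalized.get("x-frame-options", "").upper() != "DENY":
        failures.append("X-Frame-Options is not DENY")
    return failures


print(security_header_failures({"X-Frame-Options": "DENY"}))
# ['Strict-Transport-Security header missing']
```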
Monitoring After Release
# Watch live logs for errors
docker compose logs -f web | grep -E "ERROR|WARNING|CRITICAL"
# List tasks that workers are executing right now
docker compose exec worker celery -A config inspect active
# List ETA/countdown tasks held by workers. Periodic beat entries (such as
# sync-biotools-daily) do not appear here; confirm those in the beat logs.
docker compose exec worker celery -A config inspect scheduled
docker compose logs --tail 50 beat
# List tasks prefetched by workers but not yet started (this does not show failed tasks)
docker compose exec worker celery -A config inspect reserved
bio.tools Sync Health
After releasing a version that adds or changes bio.tools integration:
# Count records with sync errors
docker compose exec web python manage.py shell -c "
from apps.biotools.models import BioToolsRecord
errors = BioToolsRecord.objects.exclude(sync_error='')
print(f'{errors.count()} records with sync errors:')
for r in errors: print(f' {r.biotools_id}: {r.sync_error[:80]}')
"
# Manually trigger a full sync if the scheduled task missed
docker compose exec web python manage.py sync_biotools
EDAM Term Count
After any deployment that updates EDAM (or after running sync_edam):
# Python strings use single quotes only, so they do not end the shell's double-quoted argument
docker compose exec web python manage.py shell -c "
from apps.edam.models import EdamTerm
from django.db.models import Count
qs = EdamTerm.objects.values('branch').annotate(n=Count('id')).order_by('branch')
for row in qs: print('  {branch:12s}: {n}'.format(**row))
print('  {:12s}: {}'.format('TOTAL', EdamTerm.objects.count()))
"
Set up an alert if /health/ready/ returns non-200 for more than 60 seconds.
Tools: Uptime Kuma (self-hosted), Healthchecks.io, or your institution's monitoring stack.
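The alert rule (non-200 for more than 60 seconds) is simple enough to state precisely. The sketch below evaluates it over a chronological list of probe results. The function name `should_alert` and the sample format are assumptions; hosted tools such as those listed above implement their own variants of this logic.

```python
def should_alert(samples, threshold_seconds=60):
    """Decide whether to alert from (timestamp_seconds, http_status) samples.

    `samples` must be in chronological order. The alert fires only when the
    current run of non-200 responses has lasted longer than the threshold.
    """
    failing_since = None
    for timestamp, status in samples:
        if status == 200:
            failing_since = None          # any success resets the failure window
        elif failing_since is None:
            failing_since = timestamp     # first failure of a new run
    if failing_since is None:
        return False
    return samples[-1][0] - failing_since > threshold_seconds


print(should_alert([(0, 200), (10, 503), (40, 503), (80, 503)]))  # True
```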
EDAM Ontology Releases
EDAM publishes new releases several times a year. When a new release is out:
- Check the EDAM changelog for any deprecated terms your submissions may be using.
- Run the sync on staging first: docker compose exec web python manage.py sync_edam --dry-run
- Apply on staging and verify term counts look correct.
- Apply on production during a low-traffic period by running sync_edam without --dry-run.
- No migration, no restart, no downtime needed: terms upsert in place.
This is a PATCH-level release (no code change, data-only) and does not require
going through the full CI/deploy pipeline.
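One way to make "verify term counts look correct" concrete is to compare per-branch counts taken before and after the staging sync and flag any branch that shrank noticeably. This is a sketch under assumptions: the function name `suspicious_branches`, the 5% tolerance, and the example counts are all illustrative, not project values.

```python
def suspicious_branches(before: dict, after: dict, max_drop: float = 0.05) -> list:
    """Return branches whose term count fell by more than `max_drop` (a fraction)."""
    flagged = []
    for branch, old_count in sorted(before.items()):
        new_count = after.get(branch, 0)  # a branch missing after the sync counts as empty
        if old_count > 0 and (old_count - new_count) / old_count > max_drop:
            flagged.append(branch)
    return flagged


# Illustrative numbers only.
before = {"topic": 200, "operation": 100}
after = {"topic": 198, "operation": 80}
print(suspicious_branches(before, after))  # ['operation']
```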