Release, Rollout, and Rollback Runbook

This document covers the full lifecycle of a code change from development to production: versioning scheme, CI pipeline steps, staged rollout, zero-downtime deployment, rollback procedure, and post-release verification.

Versioning

The project uses Semantic Versioning (MAJOR.MINOR.PATCH):

Increment	When
`PATCH` (e.g. 1.0.1)	Bug fix, security patch, config-only change
`MINOR` (e.g. 1.1.0)	New feature, new optional field, new API endpoint
`MAJOR` (e.g. 2.0.0)	Breaking API change, major migration, architecture change
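The increment rules in the table can be sketched as a small helper. This is an illustration only; `bump_version` is a hypothetical name and not part of the project.

```python
def bump_version(version: str, level: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given increment level."""
    # Accept both "1.0.0" and the Git tag form "v1.0.0".
    major, minor, patch = (int(part) for part in version.removeprefix("v").split("."))
    if level == "major":
        return f"{major + 1}.0.0"          # breaking change: reset MINOR and PATCH
    if level == "minor":
        return f"{major}.{minor + 1}.0"    # new feature: reset PATCH
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown increment level: {level!r}")


print(bump_version("1.0.0", "patch"))  # 1.0.1
print(bump_version("1.0.1", "minor"))  # 1.1.0
print(bump_version("v1.1.0", "major"))  # 2.0.0
```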

Version is tracked in two places:

Git tag: git tag v1.1.0
Docker image tag: denbi-registry:v1.1.0

SPECTACULAR_SETTINGS["VERSION"] in config/settings.py holds the API's own version. It is updated independently of the application version, and only when the API contract changes:

SPECTACULAR_SETTINGS = {
    "VERSION": "1.0.0",   # Update when API surface changes
    ...
}

Branch and Release Workflow

feature/xyz  →  main  →  tag v1.1.0  →  Docker image  →  staging  →  production

Feature branch: all development in feature/* or fix/* branches.
Pull Request: CI runs tests, linting, audit before merge.
Merge to main: triggers CI build of denbi-registry:main (unstable tag).
Tag a release: git tag v1.1.0 && git push origin v1.1.0 triggers production build.
Deploy to staging: automated or manual after CI passes on the tag.
Deploy to production: manual gate after staging verification.
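As a rough sketch of how a Git ref maps to published image tags in the workflow above: the function name `image_tags_for_ref` is hypothetical, and the strict `vMAJOR.MINOR.PATCH` pattern is an assumption (the CI examples in this document match any tag starting with `v`).

```python
import re

# Assumption: release tags follow vMAJOR.MINOR.PATCH exactly.
RELEASE_TAG = re.compile(r"^v\d+\.\d+\.\d+$")


def image_tags_for_ref(ref: str) -> list[str]:
    """Map a full Git ref to the Docker image tags that would be published."""
    if ref == "refs/heads/main":
        return ["main"]  # unstable build from main
    if ref.startswith("refs/tags/"):
        tag = ref.removeprefix("refs/tags/")
        if RELEASE_TAG.match(tag):
            return [tag, "latest"]  # versioned release image plus moving tag
    return []  # feature/* and fix/* branches publish nothing


print(image_tags_for_ref("refs/heads/main"))         # ['main']
print(image_tags_for_ref("refs/tags/v1.1.0"))        # ['v1.1.0', 'latest']
print(image_tags_for_ref("refs/heads/feature/xyz"))  # []
```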

CI Pipeline (GitHub Actions / GitLab CI)

GitHub Actions example

Tests, linting, and audits run natively on the GitHub runner (no Docker needed for the test stage). The build job validates the Docker image separately.

# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: pip
          cache-dependency-path: requirements/development.txt
      - name: Install dependencies
        run: pip install -r requirements/development.txt
      - name: Run tests
        run: pytest tests/
        env:
          DJANGO_SETTINGS_MODULE: config.settings_test
          SECRET_KEY: ci-only-not-a-real-key
          DB_PASSWORD: ci
          REDIS_PASSWORD: ci
      - name: Lint
        run: |
          ruff check apps/ config/ tests/
          ruff format --check apps/ config/ tests/
      - name: Audit
        run: pip-audit -r requirements/production.txt

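  build:
    needs: test
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write   # required to push to ghcr.io with GITHUB_TOKEN
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker compose build
      # Assumes the image is published under this repository's GitHub Packages namespace.
      - name: Log in to ghcr.io (on tag)
        if: startsWith(github.ref, 'refs/tags/v')
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Push image (on tag)
        if: startsWith(github.ref, 'refs/tags/v')
        env:
          IMAGE_TAG: ${{ github.ref_name }}
        run: |
          docker build -t ghcr.io/denbi/service-registry:${IMAGE_TAG} .
          docker push ghcr.io/denbi/service-registry:${IMAGE_TAG}
          docker tag ghcr.io/denbi/service-registry:${IMAGE_TAG} \
                     ghcr.io/denbi/service-registry:latest
          docker push ghcr.io/denbi/service-registry:latest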

GitLab CI example

# .gitlab-ci.yml
stages: [test, build, deploy-staging, deploy-production]

test:
  stage: test
  image: python:3.12-slim
  before_script:
    - pip install -r requirements/development.txt
  script:
    - pytest tests/
    - ruff check apps/ config/ tests/
    - ruff format --check apps/ config/ tests/
    - pip-audit -r requirements/production.txt
  variables:
    DJANGO_SETTINGS_MODULE: config.settings_test
    SECRET_KEY: ci-only
    DB_PASSWORD: ci
    REDIS_PASSWORD: ci

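build:
  stage: build
  only: [tags]
  # Assumes a runner with Docker access (shell executor or docker-in-docker)
  # and the GitLab container registry enabled for this project.
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t registry.gitlab.com/$CI_PROJECT_PATH:$CI_COMMIT_TAG .
    - docker push registry.gitlab.com/$CI_PROJECT_PATH:$CI_COMMIT_TAG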

deploy-staging:
  stage: deploy-staging
  only: [tags]
  environment: staging
  script:
    - ssh $DEPLOY_USER@$STAGING_HOST "IMAGE_TAG=$CI_COMMIT_TAG /opt/denbi/scripts/deploy.sh"

deploy-production:
  stage: deploy-production
  only: [tags]
  environment: production
  when: manual # Requires explicit click in GitLab UI
  script:
    - ssh $DEPLOY_USER@$PROD_HOST "IMAGE_TAG=$CI_COMMIT_TAG /opt/denbi/scripts/deploy.sh"

Deployment Script

Place this on the server at /opt/denbi/scripts/deploy.sh:

#!/usr/bin/env bash
# /opt/denbi/scripts/deploy.sh
# Usage: IMAGE_TAG=v1.1.0 ./deploy.sh
set -euo pipefail

IMAGE_TAG=${IMAGE_TAG:-latest}
COMPOSE_DIR=/opt/denbi/service-registry
COMPOSE="docker compose -f docker-compose.yml -f docker-compose.prod.yml"

echo "=== Deploying denbi-registry:${IMAGE_TAG} ==="
cd "$COMPOSE_DIR"

# 1. Pull the new image
docker pull "ghcr.io/denbi/service-registry:${IMAGE_TAG}"
docker tag  "ghcr.io/denbi/service-registry:${IMAGE_TAG}" denbi-registry:current

# 2. Recreate the application containers only (db and Redis are left untouched)
#    The container entrypoint runs migrations automatically on startup.
#    Static files are baked into the image at build time — no collectstatic needed.
echo "--- Restarting web, worker, beat ---"
IMAGE_TAG="${IMAGE_TAG}" $COMPOSE up -d --no-deps web worker beat

# 3. Wait for health check to pass
echo "--- Waiting for health check ---"
for i in $(seq 1 12); do
  STATUS=$(curl -sf http://localhost:8000/health/ready/ | python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('status','error'))" 2>/dev/null || echo "error")
  if [ "$STATUS" = "ok" ]; then
    echo "Health check passed."
    break
  fi
  echo "  Waiting ($i/12)..."
  sleep 5
done

if [ "$STATUS" != "ok" ]; then
  echo "ERROR: Health check failed after 60s. Check logs:" >&2
  $COMPOSE logs web --tail 50
  exit 1
fi

echo "=== Deployment complete: ${IMAGE_TAG} ==="

Make it executable: chmod +x /opt/denbi/scripts/deploy.sh
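The health-check wait in the script can also be expressed in Python, which is easier to unit-test than the shell loop. The sketch below is not part of the project: `fetch_status` and `wait_until_ready` are hypothetical names, and unlike the shell loop it does not sleep after the final failed attempt.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_status(url: str) -> str:
    """Return the "status" field of the readiness endpoint, or "error" on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return json.load(response).get("status", "error")
    except (OSError, ValueError, AttributeError):
        # Connection errors, HTTP errors, invalid JSON, or a non-object JSON body.
        return "error"


def wait_until_ready(
    fetch: Callable[[], str],
    attempts: int = 12,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until fetch() returns "ok"; give up after the given number of attempts."""
    for attempt in range(1, attempts + 1):
        if fetch() == "ok":
            return True
        if attempt < attempts:
            sleep(delay)
    return False


# Typical call on the server (same URL as in deploy.sh):
#   ready = wait_until_ready(lambda: fetch_status("http://localhost:8000/health/ready/"))
```

Passing `fetch` and `sleep` as parameters keeps the retry logic testable without a running server.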

Zero-Downtime Deployment (Standard)

For regular releases with non-destructive migrations (adding optional fields, adding tables, adding indices):

# On the server
IMAGE_TAG=v1.1.0 /opt/denbi/scripts/deploy.sh

This is zero-downtime because:

Migrations run before the new code starts serving, and the new schema stays backward-compatible with the old code
docker compose up -d --no-deps recreates only web, worker, and beat; the database and Redis keep running
Nginx continues serving requests throughout

Maintenance Window Deployment

Required when migrations are destructive (renaming columns, dropping columns, changing column types). These cannot be backward-compatible.

# 1. Enable maintenance page on host Nginx
sudo cp /var/www/denbi-registry/errors/upstream_down.html \
        /var/www/denbi-registry/maintenance.html
# (Configure host Nginx to serve this file instead of proxying)

# 2. Stop the application (keep DB and Redis running)
docker compose stop web worker beat

# 3. Apply the migration with the new image in a one-off container (run --rm),
#    so you can verify the migration manually before bringing traffic back
IMAGE_TAG=v2.0.0 docker compose -f docker-compose.yml -f docker-compose.prod.yml \
  run --rm web python manage.py migrate

# 4. Deploy the new image (entrypoint will detect no pending migrations and proceed)
IMAGE_TAG=v2.0.0 /opt/denbi/scripts/deploy.sh

# 5. Remove maintenance page / re-enable proxy
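
Choosing between the standard path and a maintenance window comes down to which migration operations a release contains. A minimal heuristic, assuming you have the Django operation class names of the pending migrations at hand (`operations_needing_window` is a hypothetical helper, and the allow-list is an assumption derived from the categories named in this runbook):

```python
# Django migration operations this runbook treats as non-destructive
# (adding tables, adding fields, adding indices).
# Caveat: AddField is only safe when the new field is optional.
ADDITIVE_OPERATIONS = {"CreateModel", "AddField", "AddIndex"}


def operations_needing_window(operation_names: list[str]) -> list[str]:
    """Return every operation that is not known to be purely additive."""
    return [name for name in operation_names if name not in ADDITIVE_OPERATIONS]


standard = operations_needing_window(["AddField", "AddIndex"])
breaking = operations_needing_window(["AddField", "RenameField", "RemoveField"])
print(standard)  # []
print(breaking)  # ['RenameField', 'RemoveField']
```

An empty result suggests the standard deployment is sufficient. Anything else, including operations the allow-list does not know, deserves a manual review before skipping the maintenance window.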

Rollback Procedure

Rollback application code (no migration rollback needed)

If the new release has a bug but no schema changes:

# Restart with the previous image — no migration step needed
IMAGE_TAG=v1.0.0 docker compose \
  -f docker-compose.yml -f docker-compose.prod.yml \
  up -d --no-deps web worker beat

# Verify
curl https://service-registry.bi.denbi.de/health/ready/

Rollback including a migration

Only possible if the migration is reversible. Check with:

docker compose run --rm web python manage.py sqlmigrate submissions 0003 --backwards
# If this fails, the migration is not reversible — restore from backup instead.

If reversible:

# 1. Roll back to the previous migration
#    (run this while the new image is still deployed; the old image does not contain migration 0003)
docker compose run --rm web python manage.py migrate submissions 0002

# 2. Deploy previous image
IMAGE_TAG=v1.0.0 docker compose \
  -f docker-compose.yml -f docker-compose.prod.yml \
  up -d --no-deps web worker beat

Full rollback from database backup

If the migration cannot be reversed and the new code is broken:

# 1. Stop application
docker compose stop web worker beat

# 2. Restore database
docker compose exec db psql -U denbi postgres -c "DROP DATABASE denbi_registry;"
docker compose exec db psql -U denbi postgres -c "CREATE DATABASE denbi_registry;"
docker compose exec -T db psql -U denbi denbi_registry < /path/to/backup.sql

# 3. Deploy previous image
IMAGE_TAG=v1.0.0 /opt/denbi/scripts/deploy.sh
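
The three rollback paths above form a short decision tree. The following sketch only encodes that decision; the names `RollbackPath` and `choose_rollback` are illustrative and do not exist in the project.

```python
from enum import Enum


class RollbackPath(Enum):
    CODE_ONLY = "redeploy the previous image"
    REVERSE_MIGRATION = "migrate back, then redeploy the previous image"
    RESTORE_BACKUP = "restore the database backup, then redeploy the previous image"


def choose_rollback(schema_changed: bool, migration_reversible: bool) -> RollbackPath:
    """Pick the rollback procedure for a broken release."""
    if not schema_changed:
        return RollbackPath.CODE_ONLY
    if migration_reversible:
        return RollbackPath.REVERSE_MIGRATION
    return RollbackPath.RESTORE_BACKUP


print(choose_rollback(schema_changed=False, migration_reversible=False).name)  # CODE_ONLY
print(choose_rollback(schema_changed=True, migration_reversible=True).name)    # REVERSE_MIGRATION
print(choose_rollback(schema_changed=True, migration_reversible=False).name)   # RESTORE_BACKUP
```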

Staging Verification Checklist

Run through this on staging after every deployment, before promoting to production.

Functional checks:

[ ] GET /health/ready/ returns {"status": "ok"}
[ ] GET / loads the home page without errors
[ ] GET /register/ loads the full form
[ ] Section B of the form shows EDAM Topics and EDAM Operations searchable fields
[ ] Typing "prote" in the EDAM Topics field filters to proteomics-related terms
[ ] Entering https://bio.tools/blast in the bio.tools URL field and tabbing out triggers the prefill banner
[ ] Clicking "Apply prefill" populates name, description, and EDAM fields from bio.tools
[ ] GET /captcha/ returns 200 with JSON containing algorithm, challenge, salt, signature, and maxNumber fields
[ ] GET /captcha/ response has Cache-Control: no-store header
[ ] The ALTCHA widget appears on /register/ (checkbox or spinner visible below Section G)
[ ] Clicking the Submit button on /register/ triggers the ALTCHA proof-of-work solve (spinner, then checkmark) before the form posts
[ ] Submit a test registration with EDAM terms selected → confirm redirect to success page with API key
[ ] Copy the API key → go to /update/ → enter key → form pre-populates including EDAM selections
[ ] Submit an update → confirm notification email received
[ ] GET /api/docs/ loads Swagger UI
[ ] GET /api/schema/ returns 200 with valid OpenAPI YAML
[ ] POST /api/v1/submissions/ with valid JSON payload returns 201 with api_key
[ ] GET /api/v1/submissions/{id}/ response includes edam_topics, edam_operations, and biotoolsrecord fields
[ ] GET /api/v1/edam/?branch=topic returns list of EDAM topic terms (no auth required)
[ ] GET /api/v1/edam/topic_0121/ returns full Proteomics term with definition and parent
[ ] After bio.tools sync runs: GET /api/v1/biotools/blast/ returns structured record with functions
[ ] Admin portal at /<ADMIN_URL_PREFIX>/ loads and shows submissions list
[ ] Admin → EDAM Ontology → EDAM Terms shows ~4000 terms
[ ] Admin → bio.tools Integration shows sync status for submissions with bio.tools URLs
[ ] Approve a submission via admin → status email sent
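
Some of these checks lend themselves to automation. As one example, the two /captcha/ items could be verified with a function like the one below. It is a sketch: `check_captcha_response` is a hypothetical helper that takes an already-fetched status, headers, and decoded JSON body, and the sample values are placeholders.

```python
REQUIRED_CAPTCHA_FIELDS = {"algorithm", "challenge", "salt", "signature", "maxNumber"}


def check_captcha_response(status: int, headers: dict[str, str], body: dict) -> list[str]:
    """Return a list of problems found in a GET /captcha/ response (empty means pass)."""
    problems = []
    if status != 200:
        problems.append(f"expected status 200, got {status}")
    missing = sorted(REQUIRED_CAPTCHA_FIELDS - set(body))
    if missing:
        problems.append("missing fields: " + ", ".join(missing))
    # Header names are case-insensitive, so normalise before the lookup.
    cache_control = {k.lower(): v for k, v in headers.items()}.get("cache-control", "")
    if "no-store" not in cache_control.lower():
        problems.append("Cache-Control does not contain no-store")
    return problems


# Placeholder response with the signature field and the header missing.
sample_body = {"algorithm": "x", "challenge": "x", "salt": "x", "maxNumber": 1}
print(check_captcha_response(200, {}, sample_body))
# ['missing fields: signature', 'Cache-Control does not contain no-store']
```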

Security checks:

[ ] http:// redirects to https:// (301)
[ ] Strict-Transport-Security header present in response
[ ] X-Frame-Options: DENY present
[ ] API call without auth returns 403 (not 401, not 500)
[ ] Invalid API key returns same 403 as revoked key
[ ] GET /api/v1/edam/ returns 200 without any Authorization header (public endpoint)
[ ] GET /api/v1/biotools/ without admin API key returns 403
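
The redirect and header items in the security list can be checked the same way. The helpers below are hypothetical and operate on values you have already collected, for example from `curl -I` output.

```python
def check_https_redirect(status: int, location: str) -> list[str]:
    """Check that a plain-HTTP request is answered with a 301 to an https:// URL."""
    problems = []
    if status != 301:
        problems.append(f"expected status 301, got {status}")
    if not location.startswith("https://"):
        problems.append("Location does not point to https://")
    return problems


def check_security_headers(headers: dict[str, str]) -> list[str]:
    """Check for the HSTS header and X-Frame-Options: DENY (names are case-insensitive)."""
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    problems = []
    if "strict-transport-security" not in normalized:
        problems.append("Strict-Transport-Security header missing")
    if normalized.get("x-frame-options", "").upper() != "DENY":
        problems.append("X-Frame-Options is not DENY")
    return problems


print(check_https_redirect(301, "https://service-registry.bi.denbi.de/"))  # []
print(check_security_headers({"X-Frame-Options": "SAMEORIGIN"}))
# ['Strict-Transport-Security header missing', 'X-Frame-Options is not DENY']
```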

Monitoring After Release

# Watch live logs for errors
docker compose logs -f web | grep -E "ERROR|WARNING|CRITICAL"

# Check Celery task queue
docker compose exec worker celery -A config inspect active

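# List tasks with an ETA or countdown that workers are currently holding.
# Periodic tasks such as sync-biotools-daily are dispatched by beat; check `docker compose logs beat` for them.
docker compose exec worker celery -A config inspect scheduled

# List tasks prefetched by workers but not yet started (a list that keeps growing suggests a stuck worker)
docker compose exec worker celery -A config inspect reserved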

bio.tools Sync Health

After releasing a version that adds or changes bio.tools integration:

# Count records with sync errors
docker compose exec web python manage.py shell -c "
from apps.biotools.models import BioToolsRecord
errors = BioToolsRecord.objects.exclude(sync_error='')
print(f'{errors.count()} records with sync errors:')
for r in errors: print(f'  {r.biotools_id}: {r.sync_error[:80]}')
"

# Manually trigger a full sync if the scheduled task missed
docker compose exec web python manage.py sync_biotools

EDAM Term Count

After any deployment that updates EDAM (or after running sync_edam):

docker compose exec web python manage.py shell -c "
from apps.edam.models import EdamTerm
from django.db.models import Count
qs = EdamTerm.objects.values('branch').annotate(n=Count('id')).order_by('branch')
# Single quotes only: a double quote here would end the shell string early
for row in qs: print('  {:12s}: {}'.format(row['branch'], row['n']))
print('  {:12s}: {}'.format('TOTAL', EdamTerm.objects.count()))
"

Set up an alert if /health/ready/ returns non-200 for more than 60 seconds. Tools: Uptime Kuma (self-hosted), Healthchecks.io, or your institution's monitoring stack.
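
If none of those tools is available, the alert rule itself is simple to express. The class below is a sketch with a hypothetical name (`ReadinessAlert`). It expects the caller to poll /health/ready/ and pass in a timestamp in seconds together with the HTTP status code.

```python
from typing import Optional


class ReadinessAlert:
    """Fire once the endpoint has returned non-200 for longer than the threshold."""

    def __init__(self, threshold_seconds: float = 60.0) -> None:
        self.threshold_seconds = threshold_seconds
        self.failing_since: Optional[float] = None

    def observe(self, timestamp: float, status_code: int) -> bool:
        """Record one probe result; return True if an alert should be raised."""
        if status_code == 200:
            self.failing_since = None  # recovered: reset the outage window
            return False
        if self.failing_since is None:
            self.failing_since = timestamp  # first failure of this outage
        return timestamp - self.failing_since > self.threshold_seconds


alert = ReadinessAlert()
print(alert.observe(0, 503))   # False (outage just started)
print(alert.observe(30, 503))  # False (30 s of failures)
print(alert.observe(61, 503))  # True  (more than 60 s of failures)
print(alert.observe(70, 200))  # False (recovered)
```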

EDAM Ontology Releases

EDAM publishes new releases several times a year. When a new release is out:

Check the EDAM changelog for any deprecated terms your submissions may be using.
Run the sync on staging first: docker compose exec web python manage.py sync_edam --dry-run
Apply on staging and verify term counts look correct.

Apply on production during a low-traffic period:

docker compose exec web python manage.py sync_edam

No migration, no restart, no downtime needed — terms upsert in place.

This is a PATCH-level release (no code change, data-only) and does not require going through the full CI/deploy pipeline.
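
To make "verify term counts look correct" less subjective, the per-branch counts from before and after the staging sync can be compared. The helper name, the 5% threshold, and the sample numbers below are all assumptions chosen for illustration.

```python
def suspicious_branches(
    before: dict[str, int], after: dict[str, int], max_drop: float = 0.05
) -> list[str]:
    """Return branches whose term count fell by more than max_drop (as a fraction)."""
    flagged = []
    for branch, old_count in sorted(before.items()):
        new_count = after.get(branch, 0)  # a branch that vanished counts as zero
        if old_count > 0 and (old_count - new_count) / old_count > max_drop:
            flagged.append(branch)
    return flagged


# Made-up counts, not real EDAM figures.
before = {"operation": 700, "topic": 200}
after = {"operation": 600, "topic": 205}
print(suspicious_branches(before, after))  # ['operation']
```

A flagged branch does not necessarily mean the sync is wrong, since EDAM does deprecate terms, but it is a cue to read the changelog before applying the release on production.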