Control Center Deployment Runbook

Historical Runbook

This runbook reflects the retired engine-era operating model and is preserved for reference only. It is not part of the current golden path.

Complexity: MEDIUM
Time Required: 20-40 minutes
Owner: Platform Team


Pre-flight Checklist

  • Latest changes merged into main
  • python tools/export_dashboard_data.py executed and the updated JSON committed
  • npm run lint and npm run build succeed in frontend/control-center-ui
  • pytest apps/control-center/tests passes (run from control-center-venv)
  • SKUs verified (tools/sku_audit.py)
  • Documentation updated (docs/control-center/*, docs/runbooks/*)
  • CI pipeline (.github/workflows/control-center-ci.yml) passing
  • Deployment notice posted in Slack #alerts-info
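The command-based items in this checklist could be scripted so that the first failure stops the run. The following is a minimal sketch, assuming it is started from the repository root inside control-center-venv. `PREFLIGHT_STEPS` and `run_preflight` are hypothetical names, and invoking tools/sku_audit.py with `python` is an assumption; the other commands and paths are taken from the checklist.

```python
import subprocess

# (description, command, working directory) for the scriptable checklist items.
# Commands and paths come from the checklist; this structure is illustrative.
PREFLIGHT_STEPS = [
    ("export dashboard data", ["python", "tools/export_dashboard_data.py"], "."),
    ("frontend lint", ["npm", "run", "lint"], "frontend/control-center-ui"),
    ("frontend build", ["npm", "run", "build"], "frontend/control-center-ui"),
    ("backend tests", ["pytest", "apps/control-center/tests"], "."),
    # Assumption: the audit script is runnable directly with python.
    ("SKU audit", ["python", "tools/sku_audit.py"], "."),
]


def run_preflight(steps=PREFLIGHT_STEPS, runner=subprocess.run):
    """Run each step in order; return the description of the first failure, or None."""
    for description, command, cwd in steps:
        result = runner(command, cwd=cwd)
        if result.returncode != 0:
            return description
    return None
```

Calling `run_preflight()` returns `None` when every step exits with status 0. The `runner` parameter exists only so the sequence can be exercised without running the real tools.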

Package & Push

1. Build Control Center Image

# Build image
docker build -t ghcr.io/egintegrations/control-center:<tag> apps/control-center

# Push to registry
docker push ghcr.io/egintegrations/control-center:<tag>

# Verify image
docker pull ghcr.io/egintegrations/control-center:<tag>
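Because `<tag>` is a placeholder, it is easy to paste these commands with the placeholder still in place. The sketch below shows one way to guard against that. `image_ref` is a hypothetical helper, not part of the repository, and the pattern follows Docker's documented tag rules (ASCII letters, digits, underscores, periods and dashes; no leading period or dash; at most 128 characters).

```python
import re

# Docker tag grammar: first character is a letter, digit or underscore,
# followed by up to 127 letters, digits, underscores, periods or dashes.
_TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def image_ref(tag: str, repository: str = "ghcr.io/egintegrations/control-center") -> str:
    """Return 'repository:tag', raising ValueError if the tag is not a valid Docker tag."""
    if not _TAG_RE.fullmatch(tag):
        raise ValueError(f"invalid image tag: {tag!r}")
    return f"{repository}:{tag}"
```

An unfilled `<tag>` fails the check because angle brackets are not allowed in a tag, so the mistake surfaces before anything is built or pushed.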

2. Update Helm Chart

Edit: charts/control-center/values.yaml

image:
  tag: "<tag>"
dashboardData:
  enabled: true

Note: The ConfigMap is sourced from resources/dashboard-data.json. Ensure it's regenerated before deployment.
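The tag edit can also be made programmatically. This is a rough sketch using only the standard library, assuming `tag` is nested directly under a top-level `image:` key with normal YAML indentation. `set_image_tag` is a hypothetical helper; a YAML library would be more robust if one is available in the environment.

```python
import re


def set_image_tag(values_text: str, tag: str) -> str:
    """Return values.yaml text with the `tag:` key under top-level `image:` set to `tag`."""
    lines = values_text.splitlines()
    in_image = False
    for i, line in enumerate(lines):
        if line.lstrip().startswith("#"):
            continue  # comments do not open or close a block
        if re.match(r"^\S", line):
            # A non-indented line starts a new top-level key.
            in_image = line.startswith("image:")
        elif in_image and re.match(r"^\s+tag:", line):
            indent = line[: len(line) - len(line.lstrip())]
            lines[i] = f'{indent}tag: "{tag}"'
            return "\n".join(lines) + "\n"
    raise ValueError("no image.tag key found")
```

Only the first `tag:` inside the `image:` block is rewritten, so a `tag` key under any other top-level section is left alone.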

3. Commit Changes

git add charts/control-center/values.yaml
git commit -m "feat(control-center): update to version <tag>

- Updated dashboard data
- [List other changes]

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>"

git push origin main

Promote via Argo CD

# Set context to HQ cluster
kubectl config use-context <hq-cluster>

# Apply application manifest
kubectl apply -f k8s/argocd/application-control-center.yaml

# Sync application
argocd app sync control-center

# Wait for health check
argocd app wait control-center --health --timeout 600

Validation

1. UI Verification

Open Control Center UI:

https://control-center.egintegrations.com/ui/#/

Check:

  • Dashboard metrics load correctly
  • Pricing tab renders without errors
  • SKU tab displays properly
  • Finance tab shows data
  • No console errors in browser developer tools

2. API Verification

# Test key endpoints
curl https://control-center.egintegrations.com/api/skus
curl https://control-center.egintegrations.com/api/engines
curl https://control-center.egintegrations.com/api/clients
curl https://control-center.egintegrations.com/api/finance/summary

# Test POST endpoint (dry-run)
curl -X POST https://control-center.egintegrations.com/api/engines/<id>/modules \
-H "Content-Type: application/json" \
-d '{"dry_run": true}'
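The GET checks above can be collapsed into one pass that reports only the endpoints that misbehave. This is a minimal sketch, assuming the endpoints answer unauthenticated requests (as the curl commands imply) and that any 2xx status counts as success. `http_status` and `failing_endpoints` are hypothetical names.

```python
import urllib.error
import urllib.request

BASE_URL = "https://control-center.egintegrations.com"
GET_ENDPOINTS = ["/api/skus", "/api/engines", "/api/clients", "/api/finance/summary"]


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except (urllib.error.URLError, TimeoutError):
        return 0  # DNS failure, refused connection, timeout, ...


def failing_endpoints(base_url=BASE_URL, endpoints=GET_ENDPOINTS, fetch=http_status):
    """Return {path: status} for every endpoint that did not answer with a 2xx status."""
    results = {path: fetch(base_url + path) for path in endpoints}
    return {path: code for path, code in results.items() if not 200 <= code < 300}
```

An empty dictionary from `failing_endpoints()` means every GET endpoint responded successfully. The POST dry-run is left out on purpose, since it needs a real engine id.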

3. Argo CD Status

# Check application status
argocd app get control-center

# Should show:
# Health Status: Healthy
# Sync Status: Synced
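For scripted checks, the same two fields can be read from the JSON form of the command (`argocd app get control-center -o json`), where they appear as `status.health.status` and `status.sync.status`. A small sketch; `argo_status` and `is_deploy_ok` are hypothetical helper names.

```python
import json


def argo_status(app_json: str) -> tuple[str, str]:
    """Extract (health, sync) from the output of `argocd app get <app> -o json`."""
    status = json.loads(app_json).get("status", {})
    health = status.get("health", {}).get("status", "Unknown")
    sync = status.get("sync", {}).get("status", "Unknown")
    return health, sync


def is_deploy_ok(app_json: str) -> bool:
    """True only when the application is both Healthy and Synced."""
    return argo_status(app_json) == ("Healthy", "Synced")
```

Missing fields are reported as "Unknown" rather than raising, so a partially populated status is treated as not ready.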

4. Monitoring Verification

  • Uptime Kuma shows Control Center as UP
  • PostHog receiving events from Control Center
  • Slack alerts configured and tested
  • CrowdSec monitoring Control Center endpoints

5. Post Deployment

Post in Slack #alerts-info:

✅ Control Center deployed successfully
Version: <tag>
Status: Healthy
Dashboard: https://control-center.egintegrations.com/ui
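If the notice is sent by a script rather than by hand, the message can be generated from the tag. The sketch below assumes a Slack incoming webhook is configured for #alerts-info; the webhook URL is not part of this runbook and must be supplied by the caller. `deploy_notice` and `slack_request` are hypothetical helpers.

```python
import json
import urllib.request


def deploy_notice(tag: str, status: str = "Healthy") -> str:
    """Render the success message in the format shown above."""
    return "\n".join([
        "✅ Control Center deployed successfully",
        f"Version: {tag}",
        f"Status: {status}",
        "Dashboard: https://control-center.egintegrations.com/ui",
    ])


def slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a Slack incoming-webhook request carrying `text`."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result of `slack_request(...)` to `urllib.request.urlopen` sends the notice. Keeping construction and sending separate makes the message easy to review first.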

Rollback

If deployment fails or issues are discovered:

1. Revert Changes

# Option A: Git revert
git revert <commit-hash>
git push origin main

# Option B: Check out the previous tag locally to rebuild from it
# (this leaves a detached HEAD and does not change main)
git checkout <previous-tag>

2. Re-run Promotion

# Rebuild previous version
docker build -t ghcr.io/egintegrations/control-center:<previous-tag> apps/control-center
docker push ghcr.io/egintegrations/control-center:<previous-tag>

# Update Helm values to previous tag
# Edit charts/control-center/values.yaml

# Sync with Argo CD
argocd app sync control-center
argocd app wait control-center --health --timeout 600

3. Revalidate

Run the validation steps above to confirm that the rollback succeeded.

4. Post Rollback Notice

Post in Slack #alerts-critical:

⚠️ Control Center rolled back
From: <failed-tag>
To: <previous-tag>
Reason: [Brief description]
Status: [Current status]

Troubleshooting

Build Fails

Check:

# Verify dependencies
cd apps/control-center
pip install -r requirements.txt

# Run tests
pytest tests/

# Check linting
ruff check .

UI Not Loading

Check:

# Frontend build
cd frontend/control-center-ui
npm install
npm run build

# Check console errors in browser
# Check API endpoints responding

API Returning Errors

Check:

# Check pod logs
kubectl logs -n hq deployment/control-center --tail=100

# Check database connection
kubectl exec -n hq deployment/control-center -- env | grep DATABASE

# Test database (set DB_POD to the name of the database pod first)
kubectl exec -n hq $DB_POD -- psql -U postgres -c "SELECT 1"
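The `env | grep DATABASE` check prints values verbatim, which may include a connection string with a password. Before pasting that output into Slack or a ticket, the credentials can be masked. This is a sketch assuming the value is a URL-style connection string; the variable name `DATABASE_URL` used in the usage note is hypothetical, as are the helper names.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url_password(url: str) -> str:
    """Replace the password in a URL such as postgresql://user:pass@host/db with ***."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    host = parts.hostname or ""  # note: urlsplit lowercases the hostname
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def redact_env_line(line: str) -> str:
    """Redact the value of a KEY=value line when the value looks like a URL."""
    key, sep, value = line.partition("=")
    return f"{key}{sep}{redact_url_password(value)}" if "://" in value else line
```

For example, `redact_env_line("DATABASE_URL=postgresql://app:s3cret@db/cc")` yields `DATABASE_URL=postgresql://app:***@db/cc`, while lines without a URL value pass through unchanged.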

Argo CD Sync Fails

Check:

# Check application status
argocd app get control-center

# Check application pod logs for errors
# (sync errors themselves are reported by `argocd app get` above)
argocd app logs control-center

# Manual sync with prune
argocd app sync control-center --prune

Notes

Development Environment

  • Use control-center-venv (Python 3.10) for backend work to match production runtime
  • Keep Pricing Calculator.xlsx and EGI NomenClature.xlsx under resources/
  • Regenerate JSON from Excel files before each release:
    python tools/export_dashboard_data.py
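A quick local check can flag a forgotten export by comparing file modification times. This is a heuristic sketch only: `is_export_stale` is a hypothetical helper, and modification times are reset by a fresh git checkout, so it is meaningful only in a working tree where the workbooks were actually edited.

```python
from pathlib import Path


def is_export_stale(json_path, source_paths) -> bool:
    """True if the JSON is missing or older than any existing source workbook."""
    json_file = Path(json_path)
    if not json_file.exists():
        return True
    json_mtime = json_file.stat().st_mtime
    return any(
        Path(src).exists() and Path(src).stat().st_mtime > json_mtime
        for src in source_paths
    )
```

A call such as `is_export_stale("resources/dashboard-data.json", ["resources/Pricing Calculator.xlsx", "resources/EGI NomenClature.xlsx"])` returning `True` suggests the export script should be rerun.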

Module Management

When adding new modules/engines:

  • Update MANIFEST file
  • Trigger module toggle updates in Control Center
  • Update documentation
  • Add to SKU validator

Monitoring Integration

Uptime Kuma:

  • Monitor: Control Center UI availability
  • Monitor: Control Center API health endpoint
  • Alert: Service down >2 minutes → Slack #alerts-critical
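The "down for more than 2 minutes" rule is evaluated by the monitoring tool itself. The sketch below only illustrates the logic, assuming a chronological list of check results. `down_longer_than` is a hypothetical function and is not part of Uptime Kuma.

```python
def down_longer_than(checks, threshold_seconds=120):
    """Decide whether the current outage has lasted longer than the threshold.

    `checks` is a chronological list of (timestamp_seconds, is_up) tuples.
    Only the trailing, uninterrupted run of failed checks is considered.
    """
    first_down = None
    for timestamp, is_up in checks:
        if is_up:
            first_down = None  # any successful check ends the outage
        elif first_down is None:
            first_down = timestamp  # first failure of a new outage
    if first_down is None:
        return False
    return checks[-1][0] - first_down > threshold_seconds
```

With checks every 60 seconds, an alert under this rule fires on the check after the outage passes the 120-second mark, since a span of exactly 120 seconds is not yet "more than" two minutes.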

PostHog:

  • Track: User interactions with Control Center
  • Track: API endpoint usage
  • Dashboard: Control Center Analytics

CrowdSec:

  • Monitor: API abuse attempts
  • Monitor: Unauthorized access attempts
  • Action: Auto-ban suspicious IPs

Deployment Checklist

Before Deployment:

  • All tests passing
  • Documentation updated
  • CI pipeline green
  • Backup database
  • Post deployment notice in Slack

During Deployment:

  • Image built and pushed
  • Helm values updated
  • Argo CD synced
  • Health checks pass

After Deployment:

  • UI verification complete
  • API verification complete
  • Monitoring active
  • Post success notice in Slack
  • Document any issues

Emergency Rollback Procedure

If critical issues are discovered immediately after deployment:

# 1. Scale down new deployment
kubectl scale deployment control-center --replicas=0 -n hq

# 2. Restore database from backup (if needed)
# See: runbooks/03-database-backup-restore.md

# 3. Deploy previous version
# (optionally pass a history ID from `argocd app history control-center`)
argocd app rollback control-center

# 4. Scale up previous deployment
kubectl scale deployment control-center --replicas=3 -n hq

# 5. Verify service restored
curl https://control-center.egintegrations.com/api/engines

# 6. Post emergency rollback notice in Slack #alerts-critical
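During an incident it can help to print the exact commands for review before running any of them. The sketch below renders the executable steps of the procedure above as shell-quoted strings. `emergency_rollback_plan` and `render` are hypothetical helpers, the defaults mirror the values used in this runbook, and the `--fail` flag on curl is an addition so that an HTTP error produces a non-zero exit status.

```python
import shlex


def emergency_rollback_plan(app="control-center", namespace="hq", replicas=3):
    """Return the executable steps of the emergency rollback as argument lists."""
    return [
        ["kubectl", "scale", "deployment", app, "--replicas=0", "-n", namespace],
        ["argocd", "app", "rollback", app],
        ["kubectl", "scale", "deployment", app, f"--replicas={replicas}", "-n", namespace],
        # --fail makes curl exit non-zero on HTTP errors (not in the original step).
        ["curl", "--fail", "https://control-center.egintegrations.com/api/engines"],
    ]


def render(plan):
    """Render each step as a shell-safe command line for review or logging."""
    return [shlex.join(command) for command in plan]
```

The database restore and the Slack notice are omitted because they are manual steps. Each argument list can be passed to `subprocess.run` once the printed plan has been checked.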

Last Updated: 2026-03-25
Version: 1.1