Control Center Deployment Runbook
This runbook reflects the retired engine-era operating model and is preserved for reference only. It is not part of the current golden path.
Complexity: MEDIUM
Time Required: 20-40 minutes
Owner: Platform Team
Pre-flight Checklist
- Latest changes merged into main
- python tools/export_dashboard_data.py executed (commit updated JSON)
- npm run lint and npm run build succeed in frontend/control-center-ui
- pytest apps/control-center/tests passes (run from control-center-venv)
- SKUs verified (tools/sku_audit.py)
- Documentation updated (docs/control-center/*, docs/runbooks/*)
- CI pipeline (.github/workflows/control-center-ci.yml) passing
- Post deployment notice in Slack #alerts-info
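The command-based items in the checklist above can be run in one pass. The following is a minimal sketch, not part of the original tooling: `run_check` and `preflight_failures` are hypothetical names, and the example invocations in the comments assume the commands are run from the repository root.

```shell
# Hypothetical helper: run one pre-flight command and record the result.
# Usage: run_check "<label>" <command> [args...]
preflight_failures=0

run_check() {
  local label="$1"
  shift
  # The command's own output is discarded; only pass/fail is reported.
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    preflight_failures=$((preflight_failures + 1))
  fi
}

# Example invocations (assumed to run from the repository root):
#   run_check "export dashboard data" python tools/export_dashboard_data.py
#   run_check "frontend lint"  npm --prefix frontend/control-center-ui run lint
#   run_check "frontend build" npm --prefix frontend/control-center-ui run build
#   run_check "backend tests"  pytest apps/control-center/tests
#   run_check "SKU audit"      python tools/sku_audit.py
#   [ "$preflight_failures" -eq 0 ] || echo "Pre-flight failed; do not deploy."
```

Call `run_check` directly rather than inside `$(...)`, since a subshell would not update the failure counter. The manual items (documentation, CI status, Slack notice) still need to be confirmed by hand.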
Package & Push
1. Build Control Center Image
# Build image
docker build -t ghcr.io/egintegrations/control-center:<tag> apps/control-center
# Push to registry
docker push ghcr.io/egintegrations/control-center:<tag>
# Verify image
docker pull ghcr.io/egintegrations/control-center:<tag>
2. Update Helm Chart
Edit: charts/control-center/values.yaml
image:
tag: "<tag>"
dashboardData:
enabled: true
Note: The ConfigMap is sourced from resources/dashboard-data.json. Ensure it's regenerated before deployment.
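If the values file is edited by script rather than by hand, the tag update could look like the sketch below. `set_image_tag` is a hypothetical helper, and it assumes `tag:` is indented under a top-level `image:` key, which the flattened listing above does not confirm. Review the resulting diff before committing.

```shell
# Hypothetical helper: set image.tag in a Helm values file.
# Assumes "tag:" is indented beneath a top-level "image:" key.
# Usage: set_image_tag charts/control-center/values.yaml <tag>
set_image_tag() {
  local file="$1" tag="$2" tmp
  tmp="$(mktemp)" || return 1
  awk -v tag="$tag" '
    /^image:/   { in_image = 1; print; next }   # entering the image block
    /^[^ \t#]/  { in_image = 0 }                # any other top-level key ends it
    in_image && !replaced && /^[ \t]+tag:/ {
      sub(/tag:.*/, "tag: \"" tag "\"")         # keep the original indentation
      replaced = 1
    }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"    # rewrite in place, keep file mode
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Only the first `tag:` line inside the `image:` block is rewritten, so other keys such as `dashboardData` are left untouched.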
3. Commit Changes
git add charts/control-center/values.yaml
git commit -m "feat(control-center): update to version <tag>

- Updated dashboard data
- [List other changes]

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>"
git push origin main
Promote via Argo CD
# Set context to HQ cluster
kubectl config use-context <hq-cluster>
# Apply application manifest
kubectl apply -f k8s/argocd/application-control-center.yaml
# Sync application
argocd app sync control-center
# Wait for health check
argocd app wait control-center --health --timeout 600
Validation
1. UI Verification
Open Control Center UI:
https://control-center.egintegrations.com/ui/#/
Check:
- Dashboard metrics load correctly
- Pricing tab renders without errors
- SKU tab displays properly
- Finance tab shows data
- No console errors in browser developer tools
2. API Verification
# Test key endpoints
curl https://control-center.egintegrations.com/api/skus
curl https://control-center.egintegrations.com/api/engines
curl https://control-center.egintegrations.com/api/clients
curl https://control-center.egintegrations.com/api/finance/summary
# Test POST endpoint (dry-run)
curl -X POST https://control-center.egintegrations.com/api/engines/<id>/modules \
-H "Content-Type: application/json" \
-d '{"dry_run": true}'
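The GET checks above can be wrapped in a loop that fails loudly on any non-2xx response. This is a sketch under stated assumptions: `check_endpoints` is a hypothetical helper, a 2xx status is taken to mean "healthy", and the endpoints are assumed to be reachable without authentication (the runbook does not say either way).

```shell
# Hypothetical helper: request each path and report its HTTP status.
# Returns non-zero if any path answers with something other than 2xx.
# Usage: check_endpoints <base-url> <path> [<path>...]
check_endpoints() {
  local base="$1"
  shift
  local failed=0 path code
  for path in "$@"; do
    # -w '%{http_code}' prints only the status code; the body is discarded.
    code="$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 \
      "${base}${path}" 2>/dev/null)" || code="000"
    case "$code" in
      2??) printf 'OK    %s %s\n' "$code" "$path" ;;
      *)   printf 'FAIL  %s %s\n' "$code" "$path"; failed=1 ;;
    esac
  done
  return "$failed"
}

# Example, using the endpoints listed in this runbook:
#   check_endpoints https://control-center.egintegrations.com \
#     /api/skus /api/engines /api/clients /api/finance/summary
```

A status of `000` means curl could not complete the request at all (DNS, TLS, or timeout), which is worth distinguishing from an application-level error.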
3. Argo CD Status
# Check application status
argocd app get control-center
# Should show:
# Health Status: Healthy
# Sync Status: Synced
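To turn the expected status lines above into a pass/fail check, the text report can be piped through a small filter. This is a sketch only: `argocd_status_ok` is a hypothetical helper, and it assumes the report contains lines beginning with `Health Status:` and `Sync Status:` as this runbook expects; the exact layout may differ between Argo CD versions.

```shell
# Hypothetical helper: read an `argocd app get` report on stdin and
# succeed only if it shows Health Status: Healthy and Sync Status: Synced.
# Usage: argocd app get control-center | argocd_status_ok
argocd_status_ok() {
  local report
  report="$(cat)"
  printf '%s\n' "$report" | grep -Eq '^Health Status:[[:space:]]+Healthy' || return 1
  printf '%s\n' "$report" | grep -Eq '^Sync Status:[[:space:]]+Synced' || return 1
}
```

The sync line is matched by prefix, so a value such as `Synced to main (abc1234)` still passes while `OutOfSync` does not.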
4. Monitoring Verification
- Uptime Kuma shows Control Center as UP
- PostHog receiving events from Control Center
- Slack alerts configured and tested
- CrowdSec monitoring Control Center endpoints
5. Post Deployment
Post in Slack #alerts-info:
✅ Control Center deployed successfully
Version: <tag>
Status: Healthy
Dashboard: https://control-center.egintegrations.com/ui
Rollback
If deployment fails or issues are discovered:
1. Revert Changes
# Option A: Git revert
git revert <commit-hash>
git push origin main
# Option B: Check out the previous tag (detached HEAD; main is left unchanged)
git checkout <previous-tag>
2. Re-run Promotion
# Rebuild previous version
docker build -t ghcr.io/egintegrations/control-center:<previous-tag> apps/control-center
docker push ghcr.io/egintegrations/control-center:<previous-tag>
# Update Helm values to previous tag
# Edit charts/control-center/values.yaml
# Sync with Argo CD
argocd app sync control-center
argocd app wait control-center --health --timeout 600
3. Revalidate
Run the validation steps above to confirm the rollback succeeded.
4. Post Rollback Notice
Post in Slack #alerts-critical:
⚠️ Control Center rolled back
From: <failed-tag>
To: <previous-tag>
Reason: [Brief description]
Status: [Current status]
Troubleshooting
Build Fails
Check:
# Verify dependencies
cd apps/control-center
pip install -r requirements.txt
# Run tests
pytest tests/
# Check linting
ruff check .
UI Not Loading
Check:
# Frontend build
cd frontend/control-center-ui
npm install
npm run build
# Check the browser console for errors
# Check that the API endpoints are responding
API Returning Errors
Check:
# Check pod logs
kubectl logs -n hq deployment/control-center --tail=100
# Check database connection
kubectl exec -n hq deployment/control-center -- env | grep DATABASE
# Test database (set DB_POD to the name of the PostgreSQL pod first)
kubectl exec -n hq $DB_POD -- psql -U postgres -c "SELECT 1"
Argo CD Sync Fails
Check:
# Check application status
argocd app get control-center
# Check for sync errors
argocd app logs control-center
# Manual sync with prune
argocd app sync control-center --prune
Notes
Development Environment
- Use control-center-venv (Python 3.10) for backend work to match production runtime
- Keep resources/Pricing Calculator.xlsx and EGI NomenClature.xlsx under resources/
- Regenerate JSON from Excel files before each release:
python tools/export_dashboard_data.py
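A quick local check can catch a forgotten regeneration. The sketch below compares file modification times. `dashboard_data_is_fresh` is a hypothetical helper, and the assumption that the export script writes resources/dashboard-data.json comes from the ConfigMap note earlier in this runbook rather than from the script itself. Modification times are reset by a fresh git checkout, so treat this as a working-copy heuristic only.

```shell
# Hypothetical helper: succeed only if the JSON file exists, is non-empty,
# and is newer than every source file given after it.
# Usage: dashboard_data_is_fresh <json> <source> [<source>...]
dashboard_data_is_fresh() {
  local json="$1"
  shift
  [ -s "$json" ] || return 1            # missing or empty JSON is never fresh
  local src
  for src in "$@"; do
    [ "$json" -nt "$src" ] || return 1  # JSON must be newer than each source
  done
  return 0
}

# Example (paths taken from this runbook; quote the names containing spaces):
#   dashboard_data_is_fresh resources/dashboard-data.json \
#     "resources/Pricing Calculator.xlsx" "resources/EGI NomenClature.xlsx" \
#     || python tools/export_dashboard_data.py
```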
Module Management
When adding new modules/engines:
- Update the MANIFEST file
- Trigger module toggle updates in Control Center
- Update documentation
- Add to SKU validator
Monitoring Integration
Uptime Kuma:
- Monitor: Control Center UI availability
- Monitor: Control Center API health endpoint
- Alert: Service down >2 minutes → Slack #alerts-critical
PostHog:
- Track: User interactions with Control Center
- Track: API endpoint usage
- Dashboard: Control Center Analytics
CrowdSec:
- Monitor: API abuse attempts
- Monitor: Unauthorized access attempts
- Action: Auto-ban suspicious IPs
Deployment Checklist
Before Deployment:
- All tests passing
- Documentation updated
- CI pipeline green
- Backup database
- Post deployment notice in Slack
During Deployment:
- Image built and pushed
- Helm values updated
- Argo CD synced
- Health checks pass
After Deployment:
- UI verification complete
- API verification complete
- Monitoring active
- Post success notice in Slack
- Document any issues
Emergency Rollback Procedure
If critical issues are discovered immediately after deployment:
# 1. Scale down new deployment
kubectl scale deployment control-center --replicas=0 -n hq
# 2. Restore database from backup (if needed)
# See: runbooks/03-database-backup-restore.md
# 3. Deploy previous version
argocd app rollback control-center
# 4. Scale up previous deployment
kubectl scale deployment control-center --replicas=3 -n hq
# 5. Verify service restored
curl https://control-center.egintegrations.com/api/engines
# 6. Post emergency rollback notice in Slack #alerts-critical
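The command steps above can be rehearsed before they are needed. The following sketch wraps them in a function with a dry-run mode; `run` and `emergency_rollback` are hypothetical names, and the namespace and replica count defaults are copied from the commands in this procedure. The database restore (step 2) and the Slack notice (step 6) remain manual. Note also that Argo CD refuses `app rollback` while automated sync is enabled on the application, so check the sync policy first.

```shell
# Hypothetical wrapper: with DRY_RUN=1, print each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper covering steps 1, 3, 4 and 5 of the procedure above.
# Usage: emergency_rollback [namespace] [replicas]
emergency_rollback() {
  local ns="${1:-hq}" replicas="${2:-3}"
  # 1. Scale down the new deployment
  run kubectl scale deployment control-center --replicas=0 -n "$ns" || return 1
  # 2. Database restore is manual; see runbooks/03-database-backup-restore.md
  # 3. Deploy the previous version
  run argocd app rollback control-center || return 1
  # 4. Scale the deployment back up
  run kubectl scale deployment control-center --replicas="$replicas" -n "$ns" || return 1
  # 5. Verify the service is restored (-f makes curl fail on HTTP errors)
  run curl -fsS https://control-center.egintegrations.com/api/engines
}
```

Running `DRY_RUN=1 emergency_rollback` prints the four commands in order without touching the cluster, which is a cheap way to confirm the context and arguments before executing for real.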
Last Updated: 2026-03-25
Version: 1.1