Runbook: Deploy New Engine for Client
This runbook reflects the retired engine-era operating model and is preserved for reference only. It is not part of the current golden path.
Complexity: MEDIUM
Time Required: 30-60 minutes
Owner: Platform Team
Overview
This runbook walks through deploying a new engine for a client using one of the EGI templates.
Prerequisites
- Client added to Control Center
- Kubernetes namespace created
- GitHub repository created (if using CI/CD)
- GHCR access configured
- Helm installed locally
- Uptime Kuma monitor prepared for new engine
- Slack notification channel ready
Step 1: Choose Template (2 minutes)
Available Templates:
- api-fastapi - REST API with database
- bot-worker - Background task processing
- rag-engine - LLM-powered Q&A system
Selection Criteria:
- Need API endpoints? → api-fastapi
- Need background jobs? → bot-worker
- Need document search/chat? → rag-engine
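The selection criteria above can be written as a small lookup. This is only a sketch: the need labels ("api", "background-jobs", "document-search") are hypothetical names chosen for illustration and are not part of any EGI tooling.

```python
# Hypothetical need labels mapped to the three template names listed above.
TEMPLATE_BY_NEED = {
    "api": "api-fastapi",
    "background-jobs": "bot-worker",
    "document-search": "rag-engine",
}


def choose_template(need: str) -> str:
    """Return the template directory name for a given need label."""
    try:
        return TEMPLATE_BY_NEED[need]
    except KeyError:
        options = ", ".join(sorted(TEMPLATE_BY_NEED))
        raise ValueError(f"unknown need {need!r}; expected one of: {options}") from None
```

For example, `choose_template("background-jobs")` returns the directory name to copy in Step 2.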
Step 2: Copy Template (5 minutes)
# Navigate to templates
cd /Users/elliottgodwin/Desktop/egi-engine/templates
# Copy chosen template
cp -r api-fastapi ~/client-projects/{client-name}-{module}
# Navigate to new directory
cd ~/client-projects/{client-name}-{module}
# Example:
# cp -r api-fastapi ~/client-projects/acme-corp-api
# cd ~/client-projects/acme-corp-api
Step 3: Configure Engine (10 minutes)
A. Update Engine Identification
Edit: app/main.py (or worker/main.py for worker)
# Change these lines:
ENGINE_ID = os.getenv("ENGINE_ID", "{client-name}.{module}")
ENGINE_SKU = os.getenv("ENGINE_SKU", "{X.Y.Z}.{TYPE}.{PLATFORM}.{YYYYMMDD}")
# Example:
ENGINE_ID = os.getenv("ENGINE_ID", "acme-corp.api")
ENGINE_SKU = os.getenv("ENGINE_SKU", "1.0.0.API.EKS.20260325")
Generate SKU:
cd /Users/elliottgodwin/Desktop/egi-engine
python standards/sku/gen_sku.py -M 1 -m 0 -p 0 -t API -P EKS
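To sanity-check a SKU before pasting it into app/main.py, the format `{X.Y.Z}.{TYPE}.{PLATFORM}.{YYYYMMDD}` can be rebuilt and validated in a few lines. This is a sketch inferred from the example SKU above, not the implementation of gen_sku.py; the function names and the regular expression are assumptions, and the real script may accept other TYPE or PLATFORM values.

```python
import re
from datetime import date

# Pattern inferred from the example "1.0.0.API.EKS.20260325" (assumption).
SKU_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.([A-Z]+)\.([A-Z0-9]+)\.(\d{8})$")


def build_sku(major, minor, patch, engine_type, platform, on=None):
    """Assemble a SKU string; defaults to today's date when `on` is omitted."""
    on = on or date.today()
    return f"{major}.{minor}.{patch}.{engine_type}.{platform}.{on:%Y%m%d}"


def is_valid_sku(sku):
    """True when the string matches the inferred SKU layout."""
    return SKU_PATTERN.match(sku) is not None
```

Calling `build_sku(1, 0, 0, "API", "EKS", date(2026, 3, 25))` reproduces the example value shown in section A.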
B. Update Helm Values
Edit: chart/values.yaml
# Update image repository
image:
  repository: ghcr.io/egintegrations/{client-name}-{module}
  tag: "main"
# Update environment variables
env:
  ENGINE_ID: "{client-name}.{module}"
  ENGINE_SKU: "{generated_sku}"
  DATABASE_URL: "postgresql://..." # If using database
C. Update README
Edit: README.md
- Replace "my-new-api" with actual name
- Update deployment instructions
- Add client-specific notes
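Before committing, it can help to confirm that no template placeholders were left behind in the files edited in this step. The sketch below scans for the curly-brace placeholders used in this runbook. The helper names are hypothetical, and the check may report false positives in files that legitimately contain the same brace syntax (for example, Python f-strings).

```python
import re
from pathlib import Path

# Placeholder names as they appear in this runbook, e.g. {client-name}.
PLACEHOLDER = re.compile(
    r"\{(client-name|client-slug|module|generated_sku|template-name)\}"
)


def find_placeholders(text):
    """Return the sorted, de-duplicated placeholder names still present in text."""
    return sorted(set(PLACEHOLDER.findall(text)))


def scan_files(paths):
    """Map each existing file to the placeholders it still contains."""
    report = {}
    for path in map(Path, paths):
        if path.is_file():
            leftover = find_placeholders(path.read_text(encoding="utf-8"))
            if leftover:
                report[str(path)] = leftover
    return report


if __name__ == "__main__":
    # Files edited in Step 3; run from the root of the copied template.
    files = ["app/main.py", "chart/values.yaml", "README.md"]
    for name, leftover in scan_files(files).items():
        print(f"{name}: unreplaced {leftover}")
```

An empty report means none of the listed placeholders remain in the scanned files.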
Step 4: Initialize Git Repository (5 minutes)
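# Initialize git with main as the initial branch (requires Git 2.28+),
# so the push to origin main below works regardless of init.defaultBranch
git init -b main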
# Add all files
git add .
# Initial commit
git commit -m "feat: initialize {client-name} {module} from EGI template
Based on: templates/{template-name}
Client: {client-name}
Module: {module}"
# Create GitHub repository
gh repo create {client-name}-{module} --private --source=. --remote=origin
# Push to GitHub
git push -u origin main
Step 5: Configure GitHub Actions (3 minutes)
Verify workflow file exists:
ls .github/workflows/build.yaml
Add GitHub Secret for GHCR:
- Go to: https://github.com/{org}/{repo}/settings/secrets/actions
- Click "New repository secret"
- Name: GHCR_PAT
- Value: Your GitHub Personal Access Token
- Click "Add secret"
Create PAT if needed:
- Go to: https://github.com/settings/tokens
- Generate new token (classic)
- Scopes: write:packages, read:packages
- Copy token immediately
Step 6: Build and Push Image (10 minutes)
Option A: Manual Build (Faster for First Deploy)
# Login to GHCR
echo "$GHCR_PAT" | docker login ghcr.io -u {github_username} --password-stdin
# Build image
docker build -t ghcr.io/egintegrations/{client-name}-{module}:main .
# Push image
docker push ghcr.io/egintegrations/{client-name}-{module}:main
# Verify image exists
docker pull ghcr.io/egintegrations/{client-name}-{module}:main
Option B: GitHub Actions (Automated)
# Push triggers automatic build
git push origin main
# Monitor build
# Go to: https://github.com/{org}/{repo}/actions
# Wait for green checkmark (2-5 minutes)
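Every name used in this runbook (engine ID, Helm release, image reference, namespace) derives from the client name and module. The sketch below collects those conventions in one place to reduce copy-paste mistakes. `EngineNames` is a hypothetical helper, and it assumes that client-slug equals client-name, which holds for the acme-corp example but may not hold for every client.

```python
from dataclasses import dataclass

REGISTRY = "ghcr.io/egintegrations"  # registry path used throughout this runbook


@dataclass(frozen=True)
class EngineNames:
    client: str  # e.g. "acme-corp"
    module: str  # e.g. "api"

    @property
    def engine_id(self):
        # Matches ENGINE_ID: {client-name}.{module}
        return f"{self.client}.{self.module}"

    @property
    def release(self):
        # Helm release and repository name: {client-name}-{module}
        return f"{self.client}-{self.module}"

    @property
    def namespace(self):
        # Assumption: client-slug is identical to client-name.
        return f"client-{self.client}"

    def image(self, tag="main"):
        return f"{REGISTRY}/{self.release}:{tag}"
```

For example, `EngineNames("acme-corp", "api").image()` yields the image reference built and pushed in Option A.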
Step 7: Create Kubernetes Namespace (1 minute)
# Create the namespace if it doesn't already exist (the command errors if it does)
kubectl create namespace client-{client-slug}
# Example:
kubectl create namespace client-acme-corp
# Verify
kubectl get namespace client-{client-slug}
Step 8: Deploy with Helm (5 minutes)
# Deploy
helm upgrade --install {client-name}-{module} ./chart \
--set image.repository=ghcr.io/egintegrations/{client-name}-{module} \
--set image.tag=main \
--namespace client-{client-slug} \
--create-namespace \
--wait
# Example:
helm upgrade --install acme-corp-api ./chart \
--set image.repository=ghcr.io/egintegrations/acme-corp-api \
--set image.tag=main \
--namespace client-acme-corp \
--create-namespace \
--wait
Expected output:
Release "acme-corp-api" has been installed. Happy Helming!
NAME: acme-corp-api
LAST DEPLOYED: ...
NAMESPACE: client-acme-corp
STATUS: deployed
REVISION: 1
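If the deploy is scripted rather than typed, the same Helm invocation can be assembled as an argument list, which avoids shell quoting problems. This is a sketch only: `helm_deploy_args` and `deploy` are hypothetical helpers, the flags are exactly those shown in the command above, and the dry-run default prints the command instead of running it.

```python
import shlex
import subprocess


def helm_deploy_args(client, module, namespace, tag="main", chart="./chart"):
    """Build the `helm upgrade --install` argument list used in this step."""
    release = f"{client}-{module}"
    return [
        "helm", "upgrade", "--install", release, chart,
        "--set", f"image.repository=ghcr.io/egintegrations/{release}",
        "--set", f"image.tag={tag}",
        "--namespace", namespace,
        "--create-namespace",
        "--wait",
    ]


def deploy(client, module, namespace, dry_run=True):
    """Print the command when dry_run is True; otherwise run it and return the exit code."""
    args = helm_deploy_args(client, module, namespace)
    if dry_run:
        print(shlex.join(args))
        return 0
    return subprocess.run(args, check=False).returncode
```

Running `deploy("acme-corp", "api", "client-acme-corp")` prints the example command; pass `dry_run=False` to execute it.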
Step 9: Verify Deployment (5 minutes)
A. Check Pods
# Check pod status
kubectl get pods -n client-{client-slug}
# Should show:
# NAME READY STATUS RESTARTS AGE
# acme-corp-api-xxx 1/1 Running 0 2m
# Check logs
kubectl logs -f -n client-{client-slug} deployment/{client-name}-{module}
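The pod check above can also be automated by reading `kubectl get pods -o json` instead of the table output. The sketch below treats a namespace as healthy only when every pod is in phase Running and every container reports ready. The helper names are hypothetical, and the all-pods rule is an assumption that may be too strict for namespaces that also run completed Jobs.

```python
import json
import subprocess


def pods_ready(pod_list):
    """True when every pod is Running and all of its containers report ready."""
    items = pod_list.get("items", [])
    if not items:
        return False  # no pods at all is treated as not ready
    for pod in items:
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        if status.get("phase") != "Running" or not containers:
            return False
        if not all(c.get("ready") for c in containers):
            return False
    return True


def fetch_pods(namespace):
    """Return the parsed pod list for a namespace (requires kubectl access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Usage: `pods_ready(fetch_pods("client-acme-corp"))`.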
B. Test Health Endpoint
# Port forward
kubectl port-forward -n client-{client-slug} svc/{client-name}-{module} 8080:80
# Test (in another terminal)
curl http://localhost:8080/.well-known/engine-status
# Should return JSON with:
# - id: "{client-name}.{module}"
# - sku: "{sku}"
# - status.state: "ok"
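The three fields listed above can be checked programmatically while the port-forward is running. This sketch assumes the response is a JSON object with top-level `id` and `sku` keys and a nested `status.state`, as the comments above describe; any other fields are ignored, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen


def check_engine_status(payload, expected_id, expected_sku):
    """Return a list of problems; an empty list means the payload looks healthy."""
    problems = []
    if payload.get("id") != expected_id:
        problems.append(f"id is {payload.get('id')!r}, expected {expected_id!r}")
    if payload.get("sku") != expected_sku:
        problems.append(f"sku is {payload.get('sku')!r}, expected {expected_sku!r}")
    status = payload.get("status")
    state = status.get("state") if isinstance(status, dict) else None
    if state != "ok":
        problems.append(f"status.state is {state!r}, expected 'ok'")
    return problems


def fetch_engine_status(base_url="http://localhost:8080", timeout=5):
    """Fetch the status document through the port-forward opened above."""
    with urlopen(f"{base_url}/.well-known/engine-status", timeout=timeout) as resp:
        return json.load(resp)
```

Usage: `check_engine_status(fetch_engine_status(), "acme-corp.api", "1.0.0.API.EKS.20260325")` should return an empty list for a healthy engine.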
C. Check in Control Center
# Trigger refresh
curl -X POST https://control-center.egintegrations.com/api/refresh
# Wait 2-3 minutes, then check UI
# Open: https://control-center.egintegrations.com/ui
# Engines tab should show new engine
Step 10: Register Engine (5 minutes)
Add to engines inventory:
# Edit engines.yaml
cd /Users/elliottgodwin/Desktop/egi-engine
nano engines/engines.yaml
Add entry:
- id: {client-name}.{module}
  client: {client-name}
  environment: production
  module: {module}
  type: {TYPE} # API, BOT, RAG, etc.
  platform: EKS
  sku: {generated_sku}
  runtime: kubernetes
  namespace: client-{client-slug}
  base_url: http://{client-name}-{module}.client-{client-slug}.svc.cluster.local
  status_path: /.well-known/engine-status
  poll_interval_seconds: 900
  links:
    dashboard: https://grafana.egintegrations.com
    logs: https://grafana.egintegrations.com/explore
  notes: "Deployed on {date}"
Commit and push:
git add engines/engines.yaml
git commit -m "feat: register {client-name} {module} engine"
git push
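The inventory entry follows directly from the client name, module, type, and SKU, so it can be rendered rather than typed by hand. This is a sketch using only the standard library. `render_engine_entry` is a hypothetical helper, it assumes client-slug equals client-name, and the ISO date format in `notes` is an assumption. Review the output before pasting it into engines/engines.yaml.

```python
from datetime import date

# Field layout copied from the entry shown in this step.
ENTRY_TEMPLATE = """\
- id: {client}.{module}
  client: {client}
  environment: production
  module: {module}
  type: {engine_type}
  platform: {platform}
  sku: {sku}
  runtime: kubernetes
  namespace: {namespace}
  base_url: http://{client}-{module}.{namespace}.svc.cluster.local
  status_path: /.well-known/engine-status
  poll_interval_seconds: 900
  links:
    dashboard: https://grafana.egintegrations.com
    logs: https://grafana.egintegrations.com/explore
  notes: "Deployed on {deployed_on}"
"""


def render_engine_entry(client, module, engine_type, sku, platform="EKS", deployed_on=None):
    """Render one engines.yaml list item as text."""
    deployed_on = deployed_on or date.today()
    return ENTRY_TEMPLATE.format(
        client=client,
        module=module,
        engine_type=engine_type,
        platform=platform,
        sku=sku,
        namespace=f"client-{client}",  # assumption: slug equals client name
        deployed_on=deployed_on.isoformat(),
    )


if __name__ == "__main__":
    print(render_engine_entry("acme-corp", "api", "API", "1.0.0.API.EKS.20260325"))
```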
Step 11: Configure Monitoring (5 minutes)
A. Uptime Kuma
- Log into Uptime Kuma dashboard
- Click "Add New Monitor"
- Configure:
  - Monitor Type: HTTP(s)
  - Friendly Name: {client-name}.{module}
  - URL: Health check endpoint
  - Heartbeat Interval: 60 seconds
  - Retries: 3
  - Notifications: Slack #alerts-warning
- Save monitor
B. Slack Notifications
Configure alert routing in Slack:
- Critical alerts → #alerts-critical
- Warning alerts → #alerts-warning
- Info alerts → #alerts-info
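If alerts are posted by a script rather than configured in the Slack UI, the routing above reduces to a lookup from severity to channel. The sketch below is illustrative only: `channel_for` is a hypothetical helper, and rejecting unknown severities (instead of falling back to a default channel) is an assumption.

```python
# Severity to Slack channel, as listed above.
ALERT_CHANNELS = {
    "critical": "#alerts-critical",
    "warning": "#alerts-warning",
    "info": "#alerts-info",
}


def channel_for(severity):
    """Return the Slack channel for a severity label (case-insensitive)."""
    key = severity.strip().lower()
    if key not in ALERT_CHANNELS:
        raise ValueError(f"unknown severity {severity!r}")
    return ALERT_CHANNELS[key]
```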
C. PostHog Analytics (Optional)
If engine exposes user analytics:
- Add PostHog API key to environment variables
- Configure event tracking
- Create custom dashboard in PostHog
D. CrowdSec Security (Optional)
If engine is public-facing:
- Configure CrowdSec agent for namespace
- Set up IP ban rules
- Monitor security events in CrowdSec dashboard
Step 12: Verify Everything (5 minutes)
Checklist:
- Pod is Running (1/1)
- Health endpoint returns 200
- Logs show no errors
- Engine appears in Control Center (green status)
- Uptime Kuma monitor active and reporting UP
- Entry added to engines.yaml
- Slack notifications configured
- PostHog analytics configured (if applicable)
- CrowdSec rules applied (if public-facing)
- Documentation updated
Post-Deployment
1. Document
Create deployment record:
# In client project folder
cat > DEPLOYMENT.md <<EOF
# Deployment Record
**Date:** $(date)
**Engine:** {client-name}.{module}
**SKU:** {sku}
**Namespace:** client-{client-slug}
**Image:** ghcr.io/egintegrations/{client-name}-{module}:main
## Deployed By
- Name: {your_name}
- Date: $(date)
## Configuration
- Database: {if applicable}
- External Services: {if applicable}
- Environment Variables: {list important ones}
## Monitoring
- Uptime Kuma: ✅
- Slack Alerts: ✅ (#alerts-warning, #alerts-critical)
- PostHog Analytics: {✅ or N/A}
- CrowdSec Security: {✅ or N/A}
## Verification
- Health Check: ✅
- Control Center: ✅
- Monitoring: ✅
## Notes
{any special notes}
EOF
2. Notify Team
Post in Slack with:
- Engine name and ID
- Deployment status
- Monitoring links (Uptime Kuma, PostHog)
- Support contact
- Next steps
3. Set Up Alerts
Verify alerts are configured:
- Uptime Kuma: Pod down, health check failure
- PostHog: Usage analytics, error tracking
- CrowdSec: Security threats, IP bans
- Resource alerts: High CPU/memory via Kubernetes metrics
Rollback Procedure
If deployment fails:
# Delete deployment
helm uninstall {client-name}-{module} -n client-{client-slug}
# Or rollback to previous version
helm rollback {client-name}-{module} -n client-{client-slug}
# Check status
helm history {client-name}-{module} -n client-{client-slug}
Post rollback notification in Slack #alerts-critical.
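The runbook does not say when to uninstall and when to roll back. One reasonable rule, sketched below as an assumption, is to uninstall when the failed release is at revision 1 (there is no earlier revision to return to) and to roll back otherwise. `rollback_command` is a hypothetical helper; read the current revision from `helm history` first.

```python
def rollback_command(release, namespace, current_revision):
    """Choose between `helm uninstall` and `helm rollback` for a failed deploy."""
    if current_revision <= 1:
        # First install failed: nothing to roll back to, so remove the release.
        return ["helm", "uninstall", release, "-n", namespace]
    # With no revision argument, helm rolls back to the previous release.
    return ["helm", "rollback", release, "-n", namespace]
```

For example, `rollback_command("acme-corp-api", "client-acme-corp", 3)` returns the rollback form of the command.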
Troubleshooting
Pod Not Starting
See: runbooks/01-engine-down.md
Image Pull Failed
# Check image exists
docker pull ghcr.io/egintegrations/{client-name}-{module}:main
# Check image pull secrets
kubectl get secrets -n client-{client-slug}
# Create if missing (see Step 6)
Health Check Failing
# Check logs
kubectl logs -n client-{client-slug} deployment/{client-name}-{module}
# Common issues:
# - Database not accessible
# - Missing environment variables
# - Port mismatch
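For the missing-environment-variable case, a quick check inside the container (or in the engine's startup code) can list what is absent. This is a sketch: the variable names come from chart/values.yaml in Step 3, `missing_env` is a hypothetical helper, and DATABASE_URL should be added to the required list only for engines that use a database.

```python
import os

# Variables set in chart/values.yaml in Step 3.
REQUIRED_VARS = ("ENGINE_ID", "ENGINE_SKU")


def missing_env(required=REQUIRED_VARS, environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env()
    print("missing:", ", ".join(missing) if missing else "none")
```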
Next Steps
After successful deployment:
- Set up CI/CD for automatic deployments
- Configure ingress (if public-facing)
- Add custom domain
- Set up automated backups
- Create client-specific dashboard in PostHog
- Review CrowdSec security logs weekly
Last Updated: 2026-03-25
Version: 1.1