Runbook: Deploy New Engine for Client

Historical Runbook

This runbook reflects the retired engine-era operating model and is preserved for reference only. It is not part of the current golden path.

Complexity: MEDIUM
Time Required: 30-60 minutes
Owner: Platform Team


Overview

This runbook walks through deploying a new engine for a client using one of the EGI templates.


Prerequisites

  • Client added to Control Center
  • Kubernetes namespace created (or create it in Step 7)
  • GitHub repository created (if using CI/CD; Step 4 can create it)
  • GHCR access configured
  • Helm installed locally
  • Uptime Kuma monitor prepared for new engine
  • Slack notification channel ready

Step 1: Choose Template (2 minutes)

Available Templates:

  • api-fastapi - REST API with database
  • bot-worker - Background task processing
  • rag-engine - LLM-powered Q&A system

Selection Criteria:

  • Need API endpoints? → api-fastapi
  • Need background jobs? → bot-worker
  • Need document search/chat? → rag-engine

Step 2: Copy Template (5 minutes)

# Navigate to templates
cd /Users/elliottgodwin/Desktop/egi-engine/templates

# Copy the chosen template (api-fastapi shown; substitute bot-worker or rag-engine as needed)
cp -r api-fastapi ~/client-projects/{client-name}-{module}

# Navigate to new directory
cd ~/client-projects/{client-name}-{module}

# Example:
# cp -r api-fastapi ~/client-projects/acme-corp-api
# cd ~/client-projects/acme-corp-api

Step 3: Configure Engine (10 minutes)

A. Update Engine Identification

Edit: app/main.py (or worker/main.py for worker)

# Change these lines:
ENGINE_ID = os.getenv("ENGINE_ID", "{client-name}.{module}")
ENGINE_SKU = os.getenv("ENGINE_SKU", "{X.Y.Z}.{TYPE}.{PLATFORM}.{YYYYMMDD}")

# Example:
ENGINE_ID = os.getenv("ENGINE_ID", "acme-corp.api")
ENGINE_SKU = os.getenv("ENGINE_SKU", "1.0.0.API.EKS.20260325")

Generate SKU:

cd /Users/elliottgodwin/Desktop/egi-engine
python standards/sku/gen_sku.py -M 1 -m 0 -p 0 -t API -P EKS
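
For reference, the SKU layout shown above (X.Y.Z.TYPE.PLATFORM.YYYYMMDD) can be assembled as in the following minimal Python sketch. It is not the contents of gen_sku.py, which this runbook does not show. The name build_sku and its parameters are hypothetical, and defaulting the date stamp to today is an assumption.

```python
from datetime import date
from typing import Optional


def build_sku(major: int, minor: int, patch: int,
              engine_type: str, platform: str,
              on: Optional[date] = None) -> str:
    """Assemble an SKU string in the form X.Y.Z.TYPE.PLATFORM.YYYYMMDD."""
    # Assumption: the trailing stamp is the build date; default to today.
    stamp = (on or date.today()).strftime("%Y%m%d")
    return f"{major}.{minor}.{patch}.{engine_type.upper()}.{platform.upper()}.{stamp}"


# Reproduces the example SKU used in this runbook.
print(build_sku(1, 0, 0, "API", "EKS", date(2026, 3, 25)))  # 1.0.0.API.EKS.20260325
```

Passing the date explicitly keeps the result reproducible. Omit it to stamp the SKU with the current date.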

B. Update Helm Values

Edit: chart/values.yaml

# Update image repository
image:
  repository: ghcr.io/egintegrations/{client-name}-{module}
  tag: "main"

# Update environment variables
env:
  ENGINE_ID: "{client-name}.{module}"
  ENGINE_SKU: "{generated_sku}"
  DATABASE_URL: "postgresql://..." # If using database

C. Update README

Edit: README.md

  • Replace "my-new-api" with actual name
  • Update deployment instructions
  • Add client-specific notes
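
Before committing, it can help to confirm that no template placeholders were left behind in the files edited in this step. The Python sketch below is one possible check, not part of the EGI tooling. The token list is taken from the placeholders used in this runbook, and find_placeholders and scan_files are hypothetical names. A token such as {module} can also appear legitimately in a Python f-string, so treat hits in app/main.py as prompts to look, not as errors.

```python
from pathlib import Path

# Placeholder tokens that appear in this runbook's templates (assumed complete).
TOKENS = ("{client-name}", "{module}", "{generated_sku}", "my-new-api")


def find_placeholders(text: str) -> list:
    """Return the placeholder tokens still present in text, sorted."""
    return sorted(token for token in TOKENS if token in text)


def scan_files(paths: list) -> dict:
    """Map each existing file to its leftover placeholders; clean files are omitted."""
    report = {}
    for name in paths:
        path = Path(name)
        if path.is_file():
            found = find_placeholders(path.read_text(encoding="utf-8"))
            if found:
                report[name] = found
    return report


if __name__ == "__main__":
    # Run from the project root created in Step 2.
    for name, tokens in scan_files(["app/main.py", "chart/values.yaml", "README.md"]).items():
        print(f"{name}: {', '.join(tokens)}")
```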

Step 4: Initialize Git Repository (5 minutes)

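# Initialize git with main as the initial branch (requires Git 2.28+),
# so that the push to origin main below succeeds
git init -b main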

# Add all files
git add .

# Initial commit
git commit -m "feat: initialize {client-name} {module} from EGI template

Based on: templates/{template-name}
Client: {client-name}
Module: {module}"

# Create GitHub repository
gh repo create {client-name}-{module} --private --source=. --remote=origin

# Push to GitHub
git push -u origin main

Step 5: Configure GitHub Actions (3 minutes)

Verify workflow file exists:

ls .github/workflows/build.yaml

Add GitHub Secret for GHCR:

  1. Go to: https://github.com/{org}/{repo}/settings/secrets/actions
  2. Click "New repository secret"
  3. Name: GHCR_PAT
  4. Value: Your GitHub Personal Access Token
  5. Click "Add secret"

Create a PAT if needed. To push images to GHCR, a classic token needs the write:packages scope.


Step 6: Build and Push Image (10 minutes)

Option A: Manual Build (Faster for First Deploy)

# Login to GHCR
echo "$GHCR_PAT" | docker login ghcr.io -u {github_username} --password-stdin

# Build image
docker build -t ghcr.io/egintegrations/{client-name}-{module}:main .

# Push image
docker push ghcr.io/egintegrations/{client-name}-{module}:main

# Verify image exists
docker pull ghcr.io/egintegrations/{client-name}-{module}:main

Option B: GitHub Actions (Automated)

# Push triggers automatic build
git push origin main

# Monitor build
# Go to: https://github.com/{org}/{repo}/actions

# Wait for green checkmark (2-5 minutes)

Step 7: Create Kubernetes Namespace (1 minute)

# Create the namespace if it doesn't exist
kubectl create namespace client-{client-slug}

# Example:
kubectl create namespace client-acme-corp

# Verify
kubectl get namespace client-{client-slug}
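
Kubernetes namespace names must be valid RFC 1123 DNS labels: at most 63 characters, lowercase alphanumerics or "-", starting and ending with an alphanumeric. A client slug with uppercase letters or underscores will therefore be rejected by kubectl. The Python sketch below checks the client-{client-slug} name up front. The function name namespace_for is hypothetical.

```python
import re

# RFC 1123 DNS label: lowercase alphanumerics and '-', alphanumeric at both ends.
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def namespace_for(client_slug: str) -> str:
    """Return 'client-<slug>' after checking it is a valid namespace name."""
    name = f"client-{client_slug}"
    if len(name) > 63 or not DNS_LABEL.fullmatch(name):
        raise ValueError(f"{name!r} is not a valid Kubernetes namespace name")
    return name


print(namespace_for("acme-corp"))  # client-acme-corp
```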

Step 8: Deploy with Helm (5 minutes)

# Deploy
helm upgrade --install {client-name}-{module} ./chart \
  --set image.repository=ghcr.io/egintegrations/{client-name}-{module} \
  --set image.tag=main \
  --namespace client-{client-slug} \
  --create-namespace \
  --wait

# Example:
helm upgrade --install acme-corp-api ./chart \
  --set image.repository=ghcr.io/egintegrations/acme-corp-api \
  --set image.tag=main \
  --namespace client-acme-corp \
  --create-namespace \
  --wait

Expected output:

Release "acme-corp-api" has been installed. Happy Helming!
NAME: acme-corp-api
LAST DEPLOYED: ...
NAMESPACE: client-acme-corp
STATUS: deployed
REVISION: 1
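
The release name, image repository, and namespace in the Helm command all derive from the same three values, so they are easy to get out of sync when typed by hand. As a sketch only, the Python below builds the argument list from those values. helm_deploy_args is a hypothetical helper, and the flags are exactly those used in the command above.

```python
import shlex

REGISTRY = "ghcr.io/egintegrations"  # registry prefix used throughout this runbook


def helm_deploy_args(client_name: str, module: str, client_slug: str,
                     tag: str = "main") -> list:
    """Build the 'helm upgrade --install' argument list used in Step 8."""
    release = f"{client_name}-{module}"
    return [
        "helm", "upgrade", "--install", release, "./chart",
        "--set", f"image.repository={REGISTRY}/{release}",
        "--set", f"image.tag={tag}",
        "--namespace", f"client-{client_slug}",
        "--create-namespace",
        "--wait",
    ]


args = helm_deploy_args("acme-corp", "api", "acme-corp")
# Print the command for review; to run it, pass args to subprocess.run(args, check=True).
print(shlex.join(args))
```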

Step 9: Verify Deployment (5 minutes)

A. Check Pods

# Check pod status
kubectl get pods -n client-{client-slug}

# Should show:
# NAME                READY   STATUS    RESTARTS   AGE
# acme-corp-api-xxx   1/1     Running   0          2m

# Check logs
kubectl logs -f -n client-{client-slug} deployment/{client-name}-{module}

B. Test Health Endpoint

# Port forward
kubectl port-forward -n client-{client-slug} svc/{client-name}-{module} 8080:80

# Test (in another terminal)
curl http://localhost:8080/.well-known/engine-status

# Should return JSON with:
# - id: "{client-name}.{module}"
# - sku: "{sku}"
# - status.state: "ok"
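
The three checks listed above can be automated. The Python sketch below assumes the response is a JSON object with top-level id and sku fields and a nested status object containing state. That shape is inferred from the field names in this step, and the real payload may carry more fields. check_engine_status is a hypothetical name.

```python
import json


def check_engine_status(payload: dict, expected_id: str, expected_sku: str) -> list:
    """Return a list of problems; an empty list means the payload passes."""
    problems = []
    if payload.get("id") != expected_id:
        problems.append(f"id is {payload.get('id')!r}, expected {expected_id!r}")
    if payload.get("sku") != expected_sku:
        problems.append(f"sku is {payload.get('sku')!r}, expected {expected_sku!r}")
    status = payload.get("status")
    state = status.get("state") if isinstance(status, dict) else None
    if state != "ok":
        problems.append(f"status.state is {state!r}, expected 'ok'")
    return problems


# Sample body for illustration; in practice, parse the output of the curl call above.
sample = json.loads(
    '{"id": "acme-corp.api", "sku": "1.0.0.API.EKS.20260325", "status": {"state": "ok"}}'
)
print(check_engine_status(sample, "acme-corp.api", "1.0.0.API.EKS.20260325"))  # []
```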

C. Check in Control Center

# Trigger refresh
curl -X POST https://control-center.egintegrations.com/api/refresh

# Wait 2-3 minutes, then check UI
# Open: https://control-center.egintegrations.com/ui
# Engines tab should show new engine

Step 10: Register Engine (5 minutes)

Add to engines inventory:

# Edit engines.yaml
cd /Users/elliottgodwin/Desktop/egi-engine
nano engines/engines.yaml

Add entry:

- id: {client-name}.{module}
  client: {client-name}
  environment: production
  module: {module}
  type: {TYPE} # API, BOT, RAG, etc.
  platform: EKS
  sku: {generated_sku}
  runtime: kubernetes
  namespace: client-{client-slug}
  base_url: http://{client-name}-{module}.client-{client-slug}.svc.cluster.local
  status_path: /.well-known/engine-status
  poll_interval_seconds: 900
  links:
    dashboard: https://grafana.egintegrations.com
    logs: https://grafana.egintegrations.com/explore
  notes: "Deployed on {date}"
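
Several fields in this entry (id, namespace, base_url) are derived from the client name, module, and client slug. The Python sketch below generates the entry from those inputs so the derived fields stay consistent. engine_entry and render_entry are hypothetical helpers, and the renderer handles only the shape shown above: flat keys plus the one nested links map.

```python
def engine_entry(client_name: str, module: str, client_slug: str,
                 engine_type: str, sku: str, deployed_on: str) -> dict:
    """Build one engines.yaml entry with the fields listed in Step 10."""
    name = f"{client_name}-{module}"
    namespace = f"client-{client_slug}"
    return {
        "id": f"{client_name}.{module}",
        "client": client_name,
        "environment": "production",
        "module": module,
        "type": engine_type,
        "platform": "EKS",
        "sku": sku,
        "runtime": "kubernetes",
        "namespace": namespace,
        "base_url": f"http://{name}.{namespace}.svc.cluster.local",
        "status_path": "/.well-known/engine-status",
        "poll_interval_seconds": 900,
        "links": {
            "dashboard": "https://grafana.egintegrations.com",
            "logs": "https://grafana.egintegrations.com/explore",
        },
        "notes": f"Deployed on {deployed_on}",
    }


def render_entry(entry: dict) -> str:
    """Render the entry as a single YAML list item."""
    lines = []
    for index, (key, value) in enumerate(entry.items()):
        prefix = "- " if index == 0 else "  "
        if isinstance(value, dict):
            lines.append(f"{prefix}{key}:")
            lines.extend(f"    {k}: {v}" for k, v in value.items())
        elif key == "notes":
            lines.append(f'{prefix}{key}: "{value}"')  # quoted, as in the entry above
        else:
            lines.append(f"{prefix}{key}: {value}")
    return "\n".join(lines)


entry = engine_entry("acme-corp", "api", "acme-corp", "API",
                     "1.0.0.API.EKS.20260325", "2026-03-25")
print(render_entry(entry))
```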

Commit and push:

git add engines/engines.yaml
git commit -m "feat: register {client-name} {module} engine"
git push

Step 11: Configure Monitoring (5 minutes)

A. Uptime Kuma

  1. Log into Uptime Kuma dashboard
  2. Click "Add New Monitor"
  3. Configure:
    • Monitor Type: HTTP(s)
    • Friendly Name: {client-name}.{module}
    • URL: The engine's health check endpoint (the /.well-known/engine-status path)
    • Heartbeat Interval: 60 seconds
    • Retries: 3
    • Notifications: Slack #alerts-warning
  4. Save monitor

B. Slack Notifications

Configure alert routing in Slack:

  • Critical alerts → #alerts-critical
  • Warning alerts → #alerts-warning
  • Info alerts → #alerts-info

C. PostHog Analytics (Optional)

If engine exposes user analytics:

  1. Add PostHog API key to environment variables
  2. Configure event tracking
  3. Create custom dashboard in PostHog

D. CrowdSec Security (Optional)

If engine is public-facing:

  1. Configure CrowdSec agent for namespace
  2. Set up IP ban rules
  3. Monitor security events in CrowdSec dashboard

Step 12: Verify Everything (5 minutes)

Checklist:

  • Pod is Running (1/1)
  • Health endpoint returns 200
  • Logs show no errors
  • Engine appears in Control Center (green status)
  • Uptime Kuma monitor active and reporting UP
  • Entry added to engines.yaml
  • Slack notifications configured
  • PostHog analytics configured (if applicable)
  • CrowdSec rules applied (if public-facing)
  • Documentation updated

Post-Deployment

1. Document

Create deployment record:

# In client project folder
cat > DEPLOYMENT.md <<EOF
# Deployment Record

**Date:** $(date)
**Engine:** {client-name}.{module}
**SKU:** {sku}
**Namespace:** client-{client-slug}
**Image:** ghcr.io/egintegrations/{client-name}-{module}:main

## Deployed By
- Name: {your_name}
- Date: $(date)

## Configuration
- Database: {if applicable}
- External Services: {if applicable}
- Environment Variables: {list important ones}

## Monitoring
- Uptime Kuma: ✅
- Slack Alerts: ✅ (#alerts-warning, #alerts-critical)
- PostHog Analytics: {✅ or N/A}
- CrowdSec Security: {✅ or N/A}

## Verification
- Health Check: ✅
- Control Center: ✅
- Monitoring: ✅

## Notes
{any special notes}
EOF

2. Notify Team

Post in Slack with:

  • Engine name and ID
  • Deployment status
  • Monitoring links (Uptime Kuma, PostHog)
  • Support contact
  • Next steps

3. Set Up Alerts

Verify alerts are configured:

  • Uptime Kuma: Pod down, health check failure
  • PostHog: Usage analytics, error tracking
  • CrowdSec: Security threats, IP bans
  • Resource alerts: High CPU/memory via Kubernetes metrics

Rollback Procedure

If deployment fails:

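# Remove the release entirely (suits a failed first install, which has no earlier revision)
helm uninstall {client-name}-{module} -n client-{client-slug}

# Or roll back to the previous revision (suits a failed upgrade)
helm rollback {client-name}-{module} -n client-{client-slug}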

# Check status
helm history {client-name}-{module} -n client-{client-slug}

Post rollback notification in Slack #alerts-critical.
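
The choice between uninstalling and rolling back depends on whether an earlier revision exists. The Python sketch below encodes one possible rule: uninstall when the release has a single revision, otherwise roll back to the second-newest. The rule and the name recovery_command are illustrative assumptions, not established team policy. Revision numbers would come from the REVISION column of helm history.

```python
def recovery_command(release: str, namespace: str, revisions: list) -> list:
    """Pick uninstall for a first-time deploy, otherwise roll back one revision."""
    if len(revisions) <= 1:
        # Nothing earlier to return to, so remove the release.
        return ["helm", "uninstall", release, "-n", namespace]
    previous = sorted(revisions)[-2]  # second-newest revision number
    return ["helm", "rollback", release, str(previous), "-n", namespace]


print(recovery_command("acme-corp-api", "client-acme-corp", [1]))
print(recovery_command("acme-corp-api", "client-acme-corp", [1, 2, 3]))
```

Note that the second-newest revision may itself have failed. Check the STATUS column in helm history before rolling back.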


Troubleshooting

Pod Not Starting

See: runbooks/01-engine-down.md

Image Pull Failed

# Check image exists
docker pull ghcr.io/egintegrations/{client-name}-{module}:main

# Check image pull secrets
kubectl get secrets -n client-{client-slug}

# Create an image pull secret if missing (it uses the same GHCR credentials as Step 6)

Health Check Failing

# Check logs
kubectl logs -n client-{client-slug} deployment/{client-name}-{module}

# Common issues:
# - Database not accessible
# - Missing environment variables
# - Port mismatch
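
For the "missing environment variables" case, a startup check inside the engine can turn a vague health failure into a clear log line. The Python sketch below is one way to do that. The variable list is an assumption based on the env block in Step 3, and missing_env is a hypothetical helper that the templates may not include.

```python
import os

# Assumed from the Helm values in Step 3; add DATABASE_URL if the engine uses a database.
REQUIRED_VARS = ("ENGINE_ID", "ENGINE_SKU")


def missing_env(required=REQUIRED_VARS, environ=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```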

Next Steps

After successful deployment:

  1. Set up CI/CD for automatic deployments
  2. Configure ingress (if public-facing)
  3. Add custom domain
  4. Set up automated backups
  5. Create client-specific dashboard in PostHog
  6. Review CrowdSec security logs weekly

Last Updated: 2026-03-25
Version: 1.1