Support Ownership
This document defines who is responsible for supporting a product at each stage of its lifecycle, how issues are escalated, and what service levels apply. Clear support ownership prevents dropped issues and ensures every reported problem has a defined path to resolution.
Ownership by Lifecycle Stage
Development (EGI Owns)
Phases: Discovery, Design, Build, Test
During active development, EGI is solely responsible for all aspects of the product.
| Responsibility | Owner |
|---|---|
| Bug triage and resolution | EGI engineering |
| Environment management (dev, staging) | EGI engineering |
| Client-reported issues during UAT | EGI project lead |
| Infrastructure provisioning | EGI engineering |
Support channels: Internal Slack channels (#project-[name]) and client Slack Connect (#client-[name]).
There are no formal SLAs during development. Issues are prioritized within the sprint backlog according to severity and impact on the development timeline.
Launch and Stabilization (EGI Primary, Anchor Monitoring)
Phases: Launch, first 30 days post-launch
EGI remains the primary owner during launch and the 30-day stabilization period. Anchor MSP begins onboarding and shadowing during this time.
| Responsibility | Owner |
|---|---|
| Production incident response | EGI engineering (primary) |
| Monitoring and alerting | EGI + Anchor (shared) |
| Bug fixes and hotfixes | EGI engineering |
| Client communication for issues | EGI project lead |
| Runbook onboarding | Anchor operations |
| Shadow on-call | Anchor operations |
Key activities during stabilization:
- EGI handles all production incidents and bugs directly
- Anchor monitors dashboards and alerts alongside EGI to build familiarity
- Anchor reviews the runbook and asks clarifying questions
- EGI documents any new operational patterns discovered during stabilization
- Anchor begins handling routine operational tasks (backups, certificate checks) under EGI supervision
Post-Handoff (Anchor Operations, EGI Application-Level)
Phases: After formal handoff sign-off
Anchor MSP is the primary support owner. EGI provides application-level support for issues that require code changes or deep application knowledge.
| Responsibility | Owner |
|---|---|
| Infrastructure monitoring and incident response | Anchor operations |
| Routine maintenance (patching, updates, backups) | Anchor operations |
| First-line client support (if applicable) | Anchor operations |
| SLA management and uptime reporting | Anchor operations |
| Application bug fixes | EGI engineering (via escalation) |
| Feature enhancements and new development | EGI engineering (scoped as new work) |
| Performance optimization recommendations | Mast |
Escalation Paths
Anchor to EGI Escalation
When Anchor encounters an issue that goes beyond routine operations, it escalates to EGI using the following process:
Step 1: Classify the issue
| Classification | Description | Examples |
|---|---|---|
| Infrastructure | Server, network, hosting, DNS, certificates | Server down, DNS misconfiguration, SSL expiry |
| Application | Code bugs, feature regressions, data integrity | 500 errors, broken feature, incorrect data |
| Configuration | Environment variables, third-party service settings | API key expired, misconfigured webhook |
- Infrastructure and Configuration issues are Anchor's responsibility to resolve
- Application issues are escalated to EGI
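The routing rule above can be sketched as a small lookup. This is a minimal illustration, not an existing tool: the `Classification` enum and `resolving_team` function are hypothetical names introduced here.

```python
from enum import Enum


class Classification(Enum):
    """The three issue classifications from the table above."""
    INFRASTRUCTURE = "infrastructure"
    APPLICATION = "application"
    CONFIGURATION = "configuration"


def resolving_team(classification: Classification) -> str:
    """Return the team that resolves an issue of the given classification.

    Application issues are escalated to EGI; Infrastructure and
    Configuration issues stay with Anchor.
    """
    if classification is Classification.APPLICATION:
        return "EGI"
    return "Anchor"
```

For example, `resolving_team(Classification.CONFIGURATION)` returns `"Anchor"`, so an expired API key would not be escalated.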
Step 2: Post the escalation
Post to the #cross-egi-anchor Slack channel (or the project-specific cross-company channel) using this template:
**Escalation: [Project Name]**
- Severity: P0 / P1 / P2 / P3
- Impact: [Who is affected and how]
- Affected systems: [List services, endpoints, or features]
- Steps already taken: [What Anchor has tried]
- Logs/screenshots: [Attach relevant evidence]
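If escalations are ever posted by a script rather than by hand, the template could be rendered along these lines. This is a sketch under assumptions: the `Escalation` class and its field names are hypothetical, and posting to Slack is deliberately left out.

```python
from dataclasses import dataclass

VALID_SEVERITIES = ("P0", "P1", "P2", "P3")


@dataclass
class Escalation:
    """Hypothetical container for the fields of the escalation template."""
    project: str
    severity: str
    impact: str
    affected_systems: list[str]
    steps_taken: str
    evidence: str

    def render(self) -> str:
        """Render the escalation text in the template's field order."""
        if self.severity not in VALID_SEVERITIES:
            raise ValueError(f"severity must be one of {VALID_SEVERITIES}")
        return "\n".join([
            f"**Escalation: {self.project}**",
            f"- Severity: {self.severity}",
            f"- Impact: {self.impact}",
            f"- Affected systems: {', '.join(self.affected_systems)}",
            f"- Steps already taken: {self.steps_taken}",
            f"- Logs/screenshots: {self.evidence}",
        ])
```

Rejecting an unknown severity before rendering keeps malformed escalations out of the channel, where the acknowledgment targets depend on the severity being one of the four defined levels.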
Step 3: EGI acknowledges
| Severity | Acknowledgment Time | Resolution Target |
|---|---|---|
| P0 (Critical) | 30 minutes | 4 hours |
| P1 (High) | 2 hours | 8 business hours |
| P2 (Medium) | 4 hours | 2 business days |
| P3 (Low) | 1 business day | Next sprint |
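As a rough illustration of how the acknowledgment targets translate into deadlines, the sketch below computes when an EGI acknowledgment is due. It assumes the P0, P1, and P2 windows run on wall-clock time, which this document does not specify, and it leaves the P3 window (one business day) uncomputed because that requires a business calendar. `ACK_WINDOWS` and `ack_deadline` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Optional

# Acknowledgment windows from the table above.
# Assumption: these are wall-clock durations.
ACK_WINDOWS = {
    "P0": timedelta(minutes=30),
    "P1": timedelta(hours=2),
    "P2": timedelta(hours=4),
}


def ack_deadline(severity: str, posted_at: datetime) -> Optional[datetime]:
    """Return when EGI's acknowledgment is due, or None for P3.

    P3 is acknowledged within one business day, which needs a
    business calendar and is not modeled here.
    """
    if severity == "P3":
        return None
    # An unknown severity raises KeyError.
    return posted_at + ACK_WINDOWS[severity]
```

Under these assumptions, a P0 escalation posted at 09:00 is due for acknowledgment by 09:30 the same day.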
Step 4: Collaborate and resolve
EGI assigns an owner, collaborates with Anchor in the escalation thread, resolves the issue, and updates the runbook if the fix changes any operational procedures.
Internal EGI Escalation
For issues during development or stabilization where the assigned engineer needs help:
- Post in the #project-[name] channel with context and what has been tried
- Tag the project lead if the issue is time-sensitive or blocking
- If the issue requires cross-project knowledge, tag @egi-engineering
- For production incidents during stabilization, follow the incident response procedure below
Severity Definitions
| Severity | Definition | Examples |
|---|---|---|
| P0 -- Critical | The product is completely down or a critical function is broken for all users. Data loss may be occurring. | Full outage, database corruption, security breach |
| P1 -- High | A major feature is broken or severely degraded for a significant portion of users. No workaround exists. | Authentication failure, payment processing broken, data not saving |
| P2 -- Medium | A feature is partially broken or degraded. A workaround exists. | Report export failing for one format, slow page load, minor UI bug affecting usability |
| P3 -- Low | A minor or cosmetic issue that does not significantly impact functionality. | Typo, styling inconsistency, tooltip not appearing, non-critical log warning |
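The definitions above can be read as a decision order, checked from most to least severe. The sketch below is one simplified interpretation, not an official triage tool: the four boolean inputs are hypothetical and still rely on human judgment about what counts as a "major" feature or a "significant" portion of users.

```python
def classify_severity(
    full_outage_or_data_loss: bool,
    major_feature_broken: bool,
    feature_degraded: bool,
    workaround_exists: bool,
) -> str:
    """Map triage answers to a severity level, most severe first."""
    if full_outage_or_data_loss:
        return "P0"
    if major_feature_broken and not workaround_exists:
        return "P1"
    # A broken or degraded feature with a workaround is P2.
    if major_feature_broken or feature_degraded:
        return "P2"
    return "P3"
```

Checking in this order means an incident that matches several definitions is assigned the highest applicable severity.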
SLA Definitions
Production SLAs (Post-Handoff)
These SLAs apply to products that have completed the handoff to Anchor.
| Metric | Target |
|---|---|
| Uptime | 99.9% measured monthly (excludes scheduled maintenance windows) |
| P0 response time | Anchor acknowledges within 15 minutes; EGI escalation acknowledged within 30 minutes |
| P1 response time | Anchor acknowledges within 1 hour; EGI escalation acknowledged within 2 hours |
| Scheduled maintenance windows | Communicated to stakeholders at least 48 hours in advance |
| Backup frequency | Daily automated backups with 30-day retention |
| Backup restoration test | Quarterly |
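To make the 99.9% uptime target concrete, the sketch below converts it into a monthly downtime budget. It assumes that "excludes scheduled maintenance windows" means maintenance minutes are subtracted from the measured period before the percentage is computed; the function names are hypothetical.

```python
def downtime_budget_minutes(
    days_in_month: int,
    maintenance_minutes: float = 0.0,
    uptime_target: float = 0.999,
) -> float:
    """Unplanned downtime allowed in a month while still meeting the target."""
    measured = days_in_month * 24 * 60 - maintenance_minutes
    return measured * (1.0 - uptime_target)


def uptime_percent(
    days_in_month: int,
    unplanned_downtime_minutes: float,
    maintenance_minutes: float = 0.0,
) -> float:
    """Monthly uptime percentage, excluding scheduled maintenance."""
    measured = days_in_month * 24 * 60 - maintenance_minutes
    return 100.0 * (1.0 - unplanned_downtime_minutes / measured)
```

For a 30-day month with no scheduled maintenance, the measured period is 43,200 minutes, so the budget is about 43.2 minutes of unplanned downtime.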
Reporting
Anchor provides a monthly uptime and incident report for each product under their management. The report includes:
- Uptime percentage for the month
- List of incidents with severity, duration, root cause, and resolution
- SLA compliance status
- Upcoming maintenance or known risks
The report is shared with the EGI project lead and the client (if applicable) through SuiteDash.
On-Call
During Stabilization (EGI)
- EGI assigns a primary on-call engineer for each product during the 30-day stabilization period
- On-call rotates weekly among the project's engineering team
- The on-call engineer monitors alerts and responds to P0/P1 incidents outside business hours
- On-call expectations: respond to pages within 15 minutes, begin investigation within 30 minutes
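The weekly rotation during stabilization can be sketched as follows. This is illustrative only: it assumes the rotation starts on the first day of stabilization and cycles through the team in a fixed order, neither of which is specified above, and `on_call_engineer` is a hypothetical name.

```python
from datetime import date


def on_call_engineer(
    engineers: list[str], stabilization_start: date, today: date
) -> str:
    """Return the primary on-call engineer for a day within stabilization."""
    if not engineers:
        raise ValueError("at least one engineer is required")
    days_elapsed = (today - stabilization_start).days
    if not 0 <= days_elapsed < 30:
        raise ValueError("date is outside the 30-day stabilization period")
    # Rotate weekly, wrapping around when the team has fewer
    # members than there are weeks in the period.
    week_index = days_elapsed // 7
    return engineers[week_index % len(engineers)]
```

With a three-person team, the first engineer covers week one, the second week two, the third week three, and the rotation wraps back to the first engineer for week four.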
Post-Handoff (Anchor)
- Anchor maintains an on-call rotation for all managed products
- EGI engineers are not on call after handoff is complete, but they are available for escalation during business hours
- For P0 incidents that require EGI involvement after hours, Anchor contacts the EGI project lead via phone/SMS
Incident Response
Incident Lifecycle
- Detection -- Alert fires or issue is reported
- Acknowledgment -- On-call engineer acknowledges the alert and begins investigation
- Triage -- Classify severity, determine scope of impact, identify the owning team
- Resolution -- Implement a fix or workaround to restore service
- Communication -- Notify affected stakeholders (internal team, client)
- Post-mortem -- For P0 and P1 incidents, conduct a blameless post-mortem within 48 hours
Post-Mortem Template
Post-mortems are documented in the project repository under /docs/incidents/ and include:
- Incident summary: What happened, when, and how long it lasted
- Impact: Who was affected and how
- Root cause: Why it happened
- Timeline: Chronological sequence of events from detection to resolution
- What went well: Aspects of the response that worked
- What could be improved: Gaps in process, tooling, or communication
- Action items: Specific follow-up tasks with owners and deadlines
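A skeleton with these sections could be generated so that every post-mortem starts from the same structure. The sketch below is one possible approach: the Markdown heading layout and the `postmortem_skeleton` function are assumptions, and the file name under /docs/incidents/ is left to the project's own convention.

```python
# Section names and prompts taken from the template above.
SECTIONS = [
    ("Incident summary", "What happened, when, and how long it lasted"),
    ("Impact", "Who was affected and how"),
    ("Root cause", "Why it happened"),
    ("Timeline", "Chronological sequence of events from detection to resolution"),
    ("What went well", "Aspects of the response that worked"),
    ("What could be improved", "Gaps in process, tooling, or communication"),
    ("Action items", "Specific follow-up tasks with owners and deadlines"),
]


def postmortem_skeleton(title: str) -> str:
    """Return an empty post-mortem document with one heading per section."""
    lines = [f"# Post-mortem: {title}", ""]
    for heading, prompt in SECTIONS:
        # The prompt is placeholder text for the author to replace.
        lines += [f"## {heading}", "", f"_{prompt}_", ""]
    return "\n".join(lines)
```

Keeping the section list in one place means a change to the template only needs to be made once.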
Ownership Transition Summary
| Stage | Primary Owner | Support Channel | SLA Active |
|---|---|---|---|
| Development | EGI | #project-[name] | No |
| Launch | EGI | #project-[name], #ops-alerts | No (best-effort) |
| Stabilization (30 days) | EGI (Anchor shadow) | #project-[name], #ops-alerts | Informal |
| Post-handoff | Anchor (EGI escalation) | #cross-egi-anchor, #ops-alerts | Yes |