Support Ownership

This document defines who is responsible for supporting a product at each stage of its lifecycle, how issues are escalated, and what service levels apply. Clear support ownership prevents dropped issues and ensures every reported problem has a defined path to resolution.

Ownership by Lifecycle Stage

Development (EGI Owns)

Phases: Discovery, Design, Build, Test

During active development, EGI is solely responsible for all aspects of the product.

| Responsibility | Owner |
| --- | --- |
| Bug triage and resolution | EGI engineering |
| Environment management (dev, staging) | EGI engineering |
| Client-reported issues during UAT | EGI project lead |
| Infrastructure provisioning | EGI engineering |

Support channels: Internal Slack channels (#project-[name]) and client Slack Connect (#client-[name]).

There are no formal SLAs during development. Issues are prioritized within the sprint backlog according to severity and impact on the development timeline.

Launch and Stabilization (EGI Primary, Anchor Monitoring)

Phases: Launch, first 30 days post-launch

EGI remains the primary owner during launch and the 30-day stabilization period. Anchor MSP begins onboarding and shadowing during this time.

| Responsibility | Owner |
| --- | --- |
| Production incident response | EGI engineering (primary) |
| Monitoring and alerting | EGI + Anchor (shared) |
| Bug fixes and hotfixes | EGI engineering |
| Client communication for issues | EGI project lead |
| Runbook onboarding | Anchor operations |
| Shadow on-call | Anchor operations |

Key activities during stabilization:

  1. EGI handles all production incidents and bugs directly
  2. Anchor monitors dashboards and alerts alongside EGI to build familiarity
  3. Anchor reviews the runbook and asks clarifying questions
  4. EGI documents any new operational patterns discovered during stabilization
  5. Anchor begins handling routine operational tasks (backups, certificate checks) under EGI supervision

Post-Handoff (Anchor Operations, EGI Application-Level)

Phases: After formal handoff sign-off

Anchor MSP is the primary support owner. EGI provides application-level support for issues that require code changes or deep application knowledge.

| Responsibility | Owner |
| --- | --- |
| Infrastructure monitoring and incident response | Anchor operations |
| Routine maintenance (patching, updates, backups) | Anchor operations |
| First-line client support (if applicable) | Anchor operations |
| SLA management and uptime reporting | Anchor operations |
| Application bug fixes | EGI engineering (via escalation) |
| Feature enhancements and new development | EGI engineering (scoped as new work) |
| Performance optimization recommendations | Mast |

Escalation Paths

Anchor to EGI Escalation

When Anchor encounters an issue beyond routine operations, it escalates to EGI using the following process:

Step 1: Classify the issue

| Classification | Description | Examples |
| --- | --- | --- |
| Infrastructure | Server, network, hosting, DNS, certificates | Server down, DNS misconfiguration, SSL expiry |
| Application | Code bugs, feature regressions, data integrity | 500 errors, broken feature, incorrect data |
| Configuration | Environment variables, third-party service settings | API key expired, misconfigured webhook |

  • Infrastructure and Configuration issues are Anchor's responsibility to resolve
  • Application issues are escalated to EGI
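
The routing rule in Step 1 can be sketched as a small lookup. This is an illustrative sketch only: the names `Classification`, `resolving_team`, and `needs_escalation` are hypothetical and not part of any existing EGI or Anchor tooling.

```python
from enum import Enum


class Classification(Enum):
    INFRASTRUCTURE = "infrastructure"
    APPLICATION = "application"
    CONFIGURATION = "configuration"


# Routing rule from Step 1: only Application issues leave Anchor.
RESOLVING_TEAM = {
    Classification.INFRASTRUCTURE: "Anchor",
    Classification.APPLICATION: "EGI",
    Classification.CONFIGURATION: "Anchor",
}


def resolving_team(classification: Classification) -> str:
    """Return the team that resolves an issue of the given classification."""
    return RESOLVING_TEAM[classification]


def needs_escalation(classification: Classification) -> bool:
    """True when Anchor should post an escalation to EGI (Step 2)."""
    return resolving_team(classification) == "EGI"
```

For example, `needs_escalation(Classification.CONFIGURATION)` is false, so an expired API key stays with Anchor, while a data-integrity bug classified as `APPLICATION` moves on to Step 2.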

Step 2: Post the escalation

Post to the #cross-egi-anchor Slack channel (or the project-specific cross-company channel) using this template:

**Escalation: [Project Name]**
- Severity: P0 / P1 / P2 / P3
- Impact: [Who is affected and how]
- Affected systems: [List services, endpoints, or features]
- Steps already taken: [What Anchor has tried]
- Logs/screenshots: [Attach relevant evidence]
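
If escalations are ever posted by a script rather than by hand, the template above could be filled in along these lines. This is a minimal sketch: the `Escalation` class and its field names are hypothetical, and it only builds the message text; it does not call Slack.

```python
from dataclasses import dataclass, field

VALID_SEVERITIES = ("P0", "P1", "P2", "P3")


@dataclass
class Escalation:
    """Hypothetical container for the fields of the escalation template."""
    project: str
    severity: str
    impact: str
    affected_systems: list[str]
    steps_taken: list[str]
    evidence: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the escalation message in the template's layout."""
        if self.severity not in VALID_SEVERITIES:
            raise ValueError(f"Unknown severity: {self.severity!r}")
        return "\n".join([
            f"**Escalation: {self.project}**",
            f"- Severity: {self.severity}",
            f"- Impact: {self.impact}",
            f"- Affected systems: {', '.join(self.affected_systems)}",
            f"- Steps already taken: {'; '.join(self.steps_taken)}",
            f"- Logs/screenshots: {', '.join(self.evidence) or 'none attached'}",
        ])
```

Rejecting an unknown severity early keeps malformed escalations out of the channel, where the acknowledgment targets depend on the severity being one of P0 to P3.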

Step 3: EGI acknowledges

| Severity | Acknowledgment Time | Resolution Target |
| --- | --- | --- |
| P0 (Critical) | 30 minutes | 4 hours |
| P1 (High) | 2 hours | 8 business hours |
| P2 (Medium) | 4 hours | 2 business days |
| P3 (Low) | 1 business day | Next sprint |
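
As a rough illustration, the acknowledgment targets can be checked mechanically. The sketch below assumes the P0 to P2 targets are measured in elapsed clock time from the moment the escalation is posted, which this document does not state explicitly. P3 is left out because "1 business day" needs a business calendar. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Acknowledgment targets from the Step 3 table that are stated in clock time.
# P3 ("1 business day") is omitted because it needs a business calendar.
ACK_TARGETS = {
    "P0": timedelta(minutes=30),
    "P1": timedelta(hours=2),
    "P2": timedelta(hours=4),
}


def ack_deadline(severity: str, posted_at: datetime) -> Optional[datetime]:
    """Return the latest acceptable acknowledgment time, or None for P3."""
    target = ACK_TARGETS.get(severity)
    return posted_at + target if target is not None else None


def ack_met(severity: str, posted_at: datetime, acked_at: datetime) -> bool:
    """True when EGI acknowledged on or before the deadline."""
    deadline = ack_deadline(severity, posted_at)
    if deadline is None:
        raise ValueError(f"No clock-time acknowledgment target for {severity}")
    return acked_at <= deadline
```

Under these assumptions, a P0 escalation posted at 09:00 has an acknowledgment deadline of 09:30 the same day.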

Step 4: Collaborate and resolve

EGI assigns an owner, collaborates with Anchor in the escalation thread, resolves the issue, and updates the runbook if the fix changes any operational procedures.

Internal EGI Escalation

For issues during development or stabilization where the assigned engineer needs help:

  1. Post in the #project-[name] channel with context and what has been tried
  2. Tag the project lead if the issue is time-sensitive or blocking
  3. If the issue requires cross-project knowledge, tag @egi-engineering
  4. For production incidents during stabilization, follow the incident response procedure below

Severity Definitions

| Severity | Definition | Examples |
| --- | --- | --- |
| P0 -- Critical | The product is completely down or a critical function is broken for all users. Data loss may be occurring. | Full outage, database corruption, security breach |
| P1 -- High | A major feature is broken or severely degraded for a significant portion of users. No workaround exists. | Authentication failure, payment processing broken, data not saving |
| P2 -- Medium | A feature is partially broken or degraded. A workaround exists. | Report export failing for one format, slow page load, minor UI bug affecting usability |
| P3 -- Low | A minor or cosmetic issue that does not significantly impact functionality. | Typo, styling inconsistency, tooltip not appearing, non-critical log warning |
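
One way to apply these definitions consistently is to turn them into an ordered set of questions. The sketch below is an interpretation, not an official rule: the function name and its four yes/no inputs are hypothetical, and triage judgment still takes precedence over its suggestion.

```python
def suggest_severity(
    *,
    full_outage_or_data_loss: bool,
    major_feature_broken: bool,
    workaround_exists: bool,
    affects_functionality: bool,
) -> str:
    """Suggest a severity by checking the definitions from most to least severe."""
    # P0: product down, critical function broken for all users, or data loss.
    if full_outage_or_data_loss:
        return "P0"
    # P1: a major feature is broken and there is no workaround.
    if major_feature_broken and not workaround_exists:
        return "P1"
    # P2: something is broken or degraded, but users can work around it.
    if major_feature_broken or affects_functionality:
        return "P2"
    # P3: minor or cosmetic issues only.
    return "P3"
```

Checking from P0 downward means an issue that matches several definitions receives the most severe applicable level.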

SLA Definitions

Production SLAs (Post-Handoff)

These SLAs apply to products that have completed the handoff to Anchor.

| Metric | Target |
| --- | --- |
| Uptime | 99.9% measured monthly (excludes scheduled maintenance windows) |
| P0 response time | Anchor acknowledges within 15 minutes; EGI escalation acknowledged within 30 minutes |
| P1 response time | Anchor acknowledges within 1 hour; EGI escalation acknowledged within 2 hours |
| Scheduled maintenance windows | Communicated to stakeholders at least 48 hours in advance |
| Backup frequency | Daily automated backups with 30-day retention |
| Backup restoration test | Quarterly |
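
To make the uptime target concrete, the sketch below converts 99.9% into a monthly downtime budget and computes a month's uptime percentage. It assumes that scheduled maintenance minutes are removed from the measured period and that unplanned downtime does not overlap maintenance. This document does not spell out either detail, and the function names are hypothetical.

```python
import calendar


def minutes_in_month(year: int, month: int) -> int:
    """Total minutes in the given calendar month."""
    days = calendar.monthrange(year, month)[1]
    return days * 24 * 60


def downtime_budget_minutes(year: int, month: int,
                            target_percent: float = 99.9) -> float:
    """Unplanned downtime allowed in the month while still meeting the target."""
    return minutes_in_month(year, month) * (100.0 - target_percent) / 100.0


def uptime_percent(year: int, month: int, downtime_minutes: float,
                   maintenance_minutes: float = 0.0) -> float:
    """Uptime for the month, excluding scheduled maintenance from the period."""
    measured = minutes_in_month(year, month) - maintenance_minutes
    if measured <= 0:
        raise ValueError("Maintenance covers the whole month")
    return 100.0 * (measured - downtime_minutes) / measured
```

A 30-day month has 43,200 minutes, so a 99.9% target leaves a budget of about 43.2 minutes of unplanned downtime.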

Reporting

Anchor provides a monthly uptime and incident report for each product under their management. The report includes:

  • Uptime percentage for the month
  • List of incidents with severity, duration, root cause, and resolution
  • SLA compliance status
  • Upcoming maintenance or known risks

The report is shared with the EGI project lead and the client (if applicable) through SuiteDash.

On-Call

During Stabilization (EGI)

  • EGI assigns a primary on-call engineer for each product during the 30-day stabilization period
  • On-call rotates weekly among the project's engineering team
  • The on-call engineer monitors alerts and responds to P0/P1 incidents outside business hours
  • On-call expectations: respond to pages within 15 minutes, begin investigation within 30 minutes
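
The weekly rotation can be sketched as follows. This is an illustration under assumptions: weeks are counted from the first day of stabilization, the rotation follows the order of the list passed in, and the function and engineer names are hypothetical.

```python
from datetime import date

STABILIZATION_DAYS = 30  # length of the stabilization period


def on_call_engineer(day: date, stabilization_start: date,
                     engineers: list[str]) -> str:
    """Return the primary on-call engineer for a day in the stabilization period."""
    if not engineers:
        raise ValueError("At least one engineer is required")
    offset = (day - stabilization_start).days
    if not 0 <= offset < STABILIZATION_DAYS:
        raise ValueError("Day is outside the 30-day stabilization period")
    week = offset // 7  # rotation changes every seven days
    return engineers[week % len(engineers)]
```

With two engineers, the first covers days 0 to 6, the second covers days 7 to 13, and the rotation wraps around until day 29.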

Post-Handoff (Anchor)

  • Anchor maintains an on-call rotation for all managed products
  • EGI engineers are not on-call after handoff is complete, but are available for escalation during business hours
  • For P0 incidents that require EGI involvement after hours, Anchor contacts the EGI project lead via phone/SMS

Incident Response

Incident Lifecycle

  1. Detection -- Alert fires or issue is reported
  2. Acknowledgment -- On-call engineer acknowledges the alert and begins investigation
  3. Triage -- Classify severity, determine scope of impact, identify the owning team
  4. Resolution -- Implement a fix or workaround to restore service
  5. Communication -- Notify affected stakeholders (internal team, client)
  6. Post-mortem -- For P0 and P1 incidents, conduct a blameless post-mortem within 48 hours

Post-Mortem Template

Post-mortems are documented in the project repository under /docs/incidents/ and include:

  • Incident summary: What happened, when, and how long it lasted
  • Impact: Who was affected and how
  • Root cause: Why it happened
  • Timeline: Chronological sequence of events from detection to resolution
  • What went well: Aspects of the response that worked
  • What could be improved: Gaps in process, tooling, or communication
  • Action items: Specific follow-up tasks with owners and deadlines
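
A small helper can generate an empty post-mortem with these sections so that none is forgotten. This is a sketch only: the file-naming scheme (date plus a short slug, with a `.md` extension) and the function names are assumptions, and the path is taken relative to the repository root.

```python
from datetime import date
from pathlib import PurePosixPath

# Section headings and prompts taken from the list above.
SECTIONS = [
    ("Incident summary", "What happened, when, and how long it lasted"),
    ("Impact", "Who was affected and how"),
    ("Root cause", "Why it happened"),
    ("Timeline", "Chronological sequence of events from detection to resolution"),
    ("What went well", "Aspects of the response that worked"),
    ("What could be improved", "Gaps in process, tooling, or communication"),
    ("Action items", "Specific follow-up tasks with owners and deadlines"),
]


def postmortem_path(incident_date: date, slug: str) -> PurePosixPath:
    """Build a file path under docs/incidents/ (naming scheme is an assumption)."""
    return PurePosixPath("docs/incidents") / f"{incident_date.isoformat()}-{slug}.md"


def postmortem_skeleton(title: str) -> str:
    """Return an empty post-mortem document with every required section."""
    lines = [f"Post-mortem: {title}", ""]
    for heading, prompt in SECTIONS:
        lines += [heading, f"  {prompt}", ""]
    return "\n".join(lines)
```

The skeleton keeps each section's prompt directly under its heading, so the author can replace the prompt with the actual content.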

Ownership Transition Summary

| Stage | Primary Owner | Support Channel | SLA Active |
| --- | --- | --- | --- |
| Development | EGI | #project-[name] | No |
| Launch | EGI | #project-[name], #ops-alerts | No (best-effort) |
| Stabilization (30 days) | EGI (Anchor shadow) | #project-[name], #ops-alerts | Informal |
| Post-handoff | Anchor (EGI escalation) | #cross-egi-anchor, #ops-alerts | Yes |