Uptime Kuma Setup Checklist: [Service Name]

Template Instructions

Replace all text in [brackets] with your content. Delete this callout when done. This checklist walks through a complete monitoring setup for one service in Uptime Kuma.

Service Overview

Service Name: [Service Name]

Service Type: [Website, API, Database, etc.]

Environment: [Production, Staging, Development]

Criticality: [Low, Medium, High, Critical]

Owner: [Name]

Setup Date: [YYYY-MM-DD]

Prerequisites

  • Uptime Kuma instance accessible

  • Service information gathered

    • Service URL(s): [URLs]
    • Health check endpoints: [Endpoints]
    • Expected response codes: [200, etc.]
    • Authentication requirements: [If any]
  • Alert channels configured

    • Slack webhook: [Available]
    • Email: [Configured]
    • SMS/Phone: [If applicable]
    • Other: [If applicable]

Monitor Configuration

Basic Monitor Setup

  • Monitor created

    • Monitor name: [Descriptive name]
    • Monitor type: [HTTP(s), TCP, etc.]
    • URL: [Full URL with protocol]
    • Monitor created in Uptime Kuma: ✅
  • Monitor settings configured

    • Friendly name: [Service Name - Production]
    • Interval: [60 seconds recommended]
    • Retry interval: [30 seconds]
    • Retries: [1-2 recommended]
    • Tags: [production, web-app, critical]
  • HTTP settings (if HTTP monitor)

    • Method: [GET, POST, HEAD]
    • Expected status code: [200, 201, etc.]
    • Follow redirects: [Yes/No]
    • Max redirects: [5]
    • Accept any status code: [No recommended]
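
The interval, retry interval, and retries chosen above together bound how long an outage can go unnoticed, which is the figure recorded later as "Time to alert". The sketch below is a rough upper-bound estimate, not Uptime Kuma's actual scheduler: it assumes the outage starts right after a successful check, that every retry waits one full retry interval, and that each failing check may take up to the request timeout. The function name is hypothetical.

```python
def worst_case_detection_seconds(interval: int, retry_interval: int,
                                 retries: int, timeout: int = 0) -> int:
    """Estimate the longest delay between an outage starting and a DOWN alert."""
    if min(interval, retry_interval, retries, timeout) < 0:
        raise ValueError("all settings must be non-negative")
    # Up to one full interval passes before the first failing check runs.
    wait_for_first_check = interval
    # Each retry waits one retry interval before it runs.
    wait_for_retries = retries * retry_interval
    # The first check plus every retry can each take up to `timeout` to fail.
    time_spent_failing = (retries + 1) * timeout
    return wait_for_first_check + wait_for_retries + time_spent_failing


# Example with the values suggested in this checklist (timeout ignored).
print(worst_case_detection_seconds(interval=60, retry_interval=30, retries=2))
```

If the estimate is longer than the service's criticality allows, lowering the retries or the retry interval usually shortens it more cheaply than lowering the main interval.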

Advanced Configuration

  • Request settings

    • Headers added (if needed):
      User-Agent: Uptime-Kuma-Monitor
      Authorization: Bearer [token if needed]
    • Request body (if POST): [Body]
    • Request timeout: [30 seconds]
  • Response validation

    • Keyword check: [Keyword to find in response]
    • Inverse keyword: [No]
    • JSON query: [If checking API response]
    • Expected value: [Expected JSON value]
  • Certificate monitoring (if HTTPS)

    • Ignore TLS/SSL error: [No]
    • Certificate expiry notification: [Enabled]
    • Days before expiry alert: [14 days]
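
Before entering a keyword or JSON expectation into the monitor, it can help to try it against a saved response body. The sketch below is only an illustration of the two checks: the function names are hypothetical, and the dotted path (for example `checks.database`) is a simplified stand-in rather than the query syntax Uptime Kuma itself uses.

```python
import json
from typing import Any


def keyword_check(body: str, keyword: str, invert: bool = False) -> bool:
    """Pass when the keyword is present (or absent, if invert is True)."""
    found = keyword in body
    return not found if invert else found


def json_value_at(body: str, path: str) -> Any:
    """Follow a dotted path such as 'checks.database' through parsed JSON."""
    node = json.loads(body)
    for key in path.split("."):
        # List elements are addressed by integer index, objects by key.
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node


def json_check(body: str, path: str, expected: Any) -> bool:
    """Pass when the value at `path` equals `expected` (compared as strings)."""
    try:
        return str(json_value_at(body, path)) == str(expected)
    except (KeyError, IndexError, ValueError, TypeError):
        # Invalid JSON or a missing path counts as a failed check.
        return False


sample = '{"status": "ok", "checks": {"database": "up"}}'
print(keyword_check(sample, '"status": "ok"'))
print(json_check(sample, "checks.database", "up"))
```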

Authentication (if required)

  • Authentication configured
    • Auth type: [Basic, Bearer, etc.]
    • Credentials configured
    • Credentials tested
    • Credentials stored securely
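
When testing credentials outside Uptime Kuma, the `Authorization` header can be built by hand. This is a minimal sketch under stated assumptions: the helper names and the `MONITOR_API_TOKEN` environment variable are hypothetical, and reading the token from the environment is just one way to keep it out of scripts and documentation.

```python
import base64
import os
from typing import Mapping, Optional


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic header: base64 of 'username:password'."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def bearer_auth_header(env_var: str = "MONITOR_API_TOKEN",
                       environ: Optional[Mapping[str, str]] = None) -> dict:
    """Build a Bearer header from a token held in an environment variable."""
    env = os.environ if environ is None else environ
    token = env.get(env_var)
    if not token:
        # Fail loudly rather than sending an empty or placeholder token.
        raise RuntimeError(f"environment variable {env_var} is not set")
    return {"Authorization": f"Bearer {token}"}
```

The resulting dictionary can be passed as request headers by whatever HTTP client is used for the manual test.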

Health Check Endpoints

Best Practice

Use dedicated health check endpoints that verify critical dependencies.

Primary Health Check

  • Endpoint configured

    • URL: [/health, /api/health, etc.]
    • Method: [GET]
    • Expected status: [200]
    • Expected response: {"status": "ok"}
  • Health check verifies

    • Application is running
    • Database connectivity
    • Cache connectivity (if applicable)
    • External API connectivity (if critical)
    • File system access (if critical)
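
One way to make a health endpoint verify the dependencies listed above is to run a named check per dependency and fold the results into a single status. The sketch below is framework-agnostic and hypothetical: the function name, the `"degraded"` label, and the 503 status for failures are assumptions, and the individual checks (database ping, cache ping, and so on) are left to the service.

```python
from typing import Callable, Dict, Tuple


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run each dependency check and return (HTTP status code, response body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception as exc:
            # A check that raises is reported as an error, not a crash.
            results[name] = f"error: {exc}"
    healthy = all(value == "ok" for value in results.values())
    body = {"status": "ok" if healthy else "degraded", "checks": results}
    # A non-200 code lets a plain status-code monitor detect the failure.
    return (200 if healthy else 503), body


status, body = run_health_checks({
    "database": lambda: True,   # placeholder for a real connectivity check
    "cache": lambda: True,      # placeholder for a real connectivity check
})
print(status, body)
```

Returning a failing status code as well as a body means the monitor works with either an expected-status check or a keyword check on `"status": "ok"`.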

Additional Endpoints

  • Homepage/Landing page

    • URL: [/]
    • Verifies: [Frontend is accessible]
    • Monitor name: [Service - Homepage]
  • API endpoint (if applicable)

    • URL: [/api/status or /api/health]
    • Verifies: [API is responding]
    • Monitor name: [Service - API]
  • Admin panel (if applicable)

    • URL: [/admin or admin subdomain]
    • Verifies: [Admin interface accessible]
    • Monitor name: [Service - Admin]

Alert Configuration

Notification Channels

  • Slack integration

    • Webhook URL configured
    • Channel: [#alerts or #monitoring]
    • Test notification sent: ✅
    • Notification received: ✅
  • Email notifications

    • Email address(es): [emails]
    • Test email sent: ✅
    • Email received: ✅
  • SMS/Phone (if critical)

    • Phone numbers: [numbers]
    • Test SMS sent: ✅
    • SMS received: ✅
  • Additional channels

    • [Discord, Teams, etc. if used]
    • Configured: ✅
    • Tested: ✅

Alert Settings

  • Alert thresholds configured

    • Send notification on: [Down, Certificate expiry]
    • Uptime threshold: [98%]
    • Certificate expiry days: [14 days]
    • Down notification delay: [0-1 retries]
  • Alert recipients assigned

    • Primary on-call: [Name/channel]
    • Secondary on-call: [Name/channel]
    • Team channel: [#team-alerts]
    • Escalation defined: [Process]
  • Alert message customized

    • Include service name
    • Include environment
    • Include link to service
    • Include link to runbook (if exists)
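
As an illustration of a message that carries all four items above, the sketch below builds a payload in the `{"text": ...}` shape accepted by Slack incoming webhooks. The function name, field layout, and example URLs are hypothetical, and this is not Uptime Kuma's own notification template.

```python
from typing import Optional


def build_alert_message(service: str, environment: str, status: str,
                        service_url: str,
                        runbook_url: Optional[str] = None) -> dict:
    """Build a webhook payload naming the service, environment, and links."""
    lines = [
        f"[{environment.upper()}] {service} is {status.upper()}",
        f"Service: {service_url}",
    ]
    if runbook_url:
        # The runbook link is optional, matching "if exists" in the checklist.
        lines.append(f"Runbook: {runbook_url}")
    return {"text": "\n".join(lines)}


payload = build_alert_message(
    service="Checkout API",
    environment="production",
    status="down",
    service_url="https://service.example.com",
    runbook_url="https://wiki.example.com/runbooks/checkout",
)
print(payload["text"])
```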

Notification Rules

  • Notification frequency

    • Initial alert: [Immediate]
    • Reminder if still down: [Every 15 minutes]
    • Recovery notification: [Yes]
  • Maintenance mode

    • Pause notifications during maintenance: [Process defined]
    • Maintenance schedule: [If regular maintenance windows]
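
The notification rules above can be written down as a small decision function, which is useful when documenting the expected behaviour for the team. This sketch only makes the rules explicit; Uptime Kuma applies its own logic internally, and the function name, status strings, and 900-second default (15 minutes) are assumptions taken from the placeholders in this checklist.

```python
from typing import Optional


def notification_to_send(previous_status: str, current_status: str,
                         seconds_since_last_alert: float,
                         reminder_seconds: int = 900,
                         in_maintenance: bool = False) -> Optional[str]:
    """Return 'down', 'reminder', 'recovery', or None when nothing is sent."""
    if in_maintenance:
        # Notifications are paused during a maintenance window.
        return None
    if current_status == "down" and previous_status == "up":
        return "down"          # initial alert, sent immediately
    if current_status == "down" and previous_status == "down":
        if seconds_since_last_alert >= reminder_seconds:
            return "reminder"  # still down after the reminder period
        return None
    if current_status == "up" and previous_status == "down":
        return "recovery"
    return None


print(notification_to_send("up", "down", 0))
print(notification_to_send("down", "down", 900))
print(notification_to_send("down", "up", 120))
```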

Monitor Groups & Organization

Grouping

  • Service assigned to groups

    • Group 1: [Production Services]
    • Group 2: [Web Applications]
    • Group 3: [High Priority]
    • Custom group: [If applicable]
  • Tags applied

    • Environment: [production]
    • Type: [web, api, database]
    • Team: [team-name]
    • Criticality: [critical, high, medium, low]

Dashboard Organization

  • Monitor positioned on dashboard

    • In appropriate group
    • Priority order correct
    • Visible on status page (if public)
  • Status page configuration (if applicable)

    • Service visible on status page: [Yes/No]
    • Status page URL: [https://status.company.com]
    • Service description added
    • Service group assigned

Testing & Verification

Initial Testing

  • Monitor is active

    • Status: [Up/Down]
    • First check completed: ✅
    • Response time: [X ms]
    • No immediate issues: ✅
  • Simulate downtime

    • Service temporarily disabled/blocked
    • Alert triggered: ✅
    • Alert received in Slack: ✅
    • Alert received via email: ✅
    • Time to alert: [X seconds]
  • Verify recovery notification

    • Service re-enabled
    • Recovery detected: ✅
    • Recovery notification sent: ✅
    • Recovery notification received: ✅

Response Time Baseline

  • Baseline established

    • Average response time: [X ms]
    • 95th percentile: [X ms]
    • 99th percentile: [X ms]
    • Baseline documented
  • Performance thresholds set (if applicable)

    • Warning threshold: [X ms]
    • Critical threshold: [Y ms]
    • Alerts configured for slow responses
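
The baseline figures above can be computed from a list of response times exported or copied from the monitor. The following is a minimal sketch using the nearest-rank percentile definition; other percentile definitions interpolate and give slightly different values. The function names are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List


def percentile(samples: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range 1..100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def response_time_baseline(samples_ms: List[float]) -> Dict[str, float]:
    """Summarise response times (in milliseconds) for the baseline record."""
    return {
        "average_ms": round(mean(samples_ms), 1),
        "p95_ms": percentile(samples_ms, 95),
        "p99_ms": percentile(samples_ms, 99),
    }


print(response_time_baseline([120, 135, 128, 410, 131, 126, 140, 133, 129, 137]))
```

A warning threshold somewhat above the 95th percentile and a critical threshold above the 99th is a common starting point, to be tuned once real alert history exists.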

Documentation

Monitor Documentation

  • System profile updated

    • Monitoring section updated
    • Monitor IDs documented
    • Alert channels documented
    • Runbook linked
  • Runbook created/updated

    • What this monitor checks
    • What it means if it fails
    • Who to contact
    • Troubleshooting steps
    • Common causes
    • How to resolve
  • Alert response documented

    • Expected response time: [X minutes]
    • Escalation path: [Defined]
    • Recovery procedures: [Documented]

Team Communication

  • Team notified

    • Monitor setup communicated
    • Alert channels shared
    • Runbook link shared
    • Response expectations set
  • Training completed (if needed)

    • Team trained on alert response
    • Runbook reviewed
    • Escalation process understood

Multi-Environment Setup (if applicable)

Production

  • Production monitor configured
    • Monitor name: [Service - Production]
    • URL: [https://service.com]
    • Alerts to: [Primary channels]
    • Tags: [production, high-priority]

Staging

  • Staging monitor configured

Development

  • Development monitor configured (optional)
    • Monitor name: [Service - Development]
    • URL: [https://dev.service.com]
    • Alerts to: [Dev channel]
    • Tags: [development, low-priority]

Advanced Features

Dependency Monitoring

  • Dependencies identified

    • Database: [Monitor name]
    • Cache (Redis): [Monitor name]
    • External API: [Monitor name]
    • CDN: [Monitor name]
  • Dependency monitors created

    • Each dependency has monitor
    • Dependency relationships documented
    • Alert impact understood

Certificate Monitoring

  • SSL certificate monitoring enabled
    • Certificate expiry tracked
    • Alert before expiry: [14 days]
    • Renewal process documented
    • Certificate owner identified
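
To sanity-check the expiry alert window, the remaining lifetime of a certificate can be computed from its `notAfter` date. The sketch below uses `ssl.cert_time_to_seconds` from the Python standard library, which parses the date format found in certificates (for example `Jun 15 12:00:00 2030 GMT`). The function names and the sample date are hypothetical, and fetching the certificate itself is left out.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` (timezone-aware) and the certificate's notAfter."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after),
                                     tz=timezone.utc)
    return (expires - now).days


def needs_expiry_alert(not_after: str, now: datetime, alert_days: int = 14) -> bool:
    """True once the certificate is within the alert window."""
    return days_until_expiry(not_after, now) <= alert_days


# Hypothetical notAfter value; in practice it comes from the served certificate.
print(days_until_expiry("Jun 15 12:00:00 2030 GMT", datetime.now(timezone.utc)))
```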

Custom Scripts (if applicable)

  • Custom monitoring scripts
    • Script purpose: [Description]
    • Script location: [Path]
    • Script schedule: [Frequency]
    • Script output monitored

Maintenance & Review

Regular Checks

  • Weekly checks scheduled

    • Monitor status reviewed
    • Alert history reviewed
    • False positives investigated
    • Response times checked
  • Monthly review scheduled

    • Uptime percentage: [Target: 99.9%]
    • Average response time: [Target: < X ms]
    • Alert count: [Count]
    • Thresholds still appropriate: ✅/❌
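
For the monthly review it helps to translate the uptime target into a downtime budget and to compute the achieved percentage from check counts. This is a simple arithmetic sketch with hypothetical function names; it treats every check as equally weighted, which is an approximation when intervals vary.

```python
def allowed_downtime_minutes(target_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted by an uptime target over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - target_percent) / 100


def uptime_percent(total_checks: int, failed_checks: int) -> float:
    """Share of checks that succeeded, as a percentage."""
    if total_checks <= 0:
        raise ValueError("total_checks must be positive")
    return 100 * (total_checks - failed_checks) / total_checks


# A 99.9% target over 30 days allows roughly 43 minutes of downtime.
print(round(allowed_downtime_minutes(99.9), 1))
print(round(uptime_percent(total_checks=43200, failed_checks=12), 3))
```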

Continuous Improvement

  • Monitor optimization

    • Interval appropriate for service
    • Alert fatigue prevented
    • Thresholds tuned based on data
    • False positives minimized
  • Documentation maintained

    • Runbook kept current
    • Contact information current
    • Troubleshooting steps updated
    • Lessons learned added

Monitor Information Reference

Monitor Details

Field      | Value
-----------|---------------------
Monitor ID | [ID in Uptime Kuma]
Monitor Name | [Full name]
URL        | [Full URL]
Type       | [HTTP(s), TCP, etc.]
Interval   | [60 seconds]
Created    | [YYYY-MM-DD]
Created By | [Name]

Alert Channels

Channel | Destination | Purpose        | Active
--------|-------------|----------------|-------
Slack   | [#channel]  | Primary alerts |
Email   | [email]     | Secondary      |
SMS     | [phone]     | Critical only  |

Team & Ownership

Role              | Name   | Contact   | Responsibilities
------------------|--------|-----------|------------------
Service Owner     | [Name] | [Contact] | Overall ownership
Technical Lead    | [Name] | [Contact] | Technical issues
On-Call Primary   | [Name] | [Contact] | First responder
On-Call Secondary | [Name] | [Contact] | Escalation

Checklist Status

Setup Progress

  • Monitor configured in Uptime Kuma
  • Alerts configured and tested
  • Documentation created/updated
  • Team notified and trained
  • Baseline established
  • Thresholds set
  • Runbook created/linked
  • Regular review scheduled

Setup Status: [In Progress / Complete]

Completion Date: [YYYY-MM-DD]

Verified By: [Name]

Related Documents

  • [System Profile: [Service Name]]
  • [Runbook: [Service Name] Downtime]
  • [SOP: Uptime Kuma Administration]
  • [Incident Response Procedures]

Notes

[Any additional notes, observations, or special considerations for this monitor]


Checklist Version: 1.0

Last Updated: [YYYY-MM-DD]

Next Review: [YYYY-MM-DD]