Uptime Kuma Setup Checklist: [Service Name]

Template Instructions

Replace all text in [brackets] with your content. Delete this callout when done. This checklist ensures comprehensive monitoring setup in Uptime Kuma.

Service Overview

Service Name: [Service Name]

Service Type: [Website, API, Database, etc.]

Environment: [Production, Staging, Development]

Criticality: [Low, Medium, High, Critical]

Owner: [Name]

Setup Date: [YYYY-MM-DD]

Prerequisites

Uptime Kuma instance accessible
- URL: [https://uptime.company.com]
- Login credentials available
- Admin access verified
Service information gathered
- Service URL(s): [URLs]
- Health check endpoints: [Endpoints]
- Expected response codes: [200, etc.]
- Authentication requirements: [If any]
Alert channels configured
- Slack webhook: [Available]
- Email: [Configured]
- SMS/Phone: [If applicable]
- Other: [If applicable]

Monitor Configuration

Basic Monitor Setup

Monitor created
- Monitor name: [Descriptive name]
- Monitor type: [HTTP(s), TCP, etc.]
- URL: [Full URL with protocol]
- Monitor created in Uptime Kuma: ✅
Monitor settings configured
- Friendly name: [Service Name - Production]
- Interval: [60 seconds recommended]
- Retry interval: [30 seconds]
- Retries: [1-2 recommended]
- Tags: [production, web-app, critical]
HTTP settings (if HTTP monitor)
- Method: [GET, POST, HEAD]
- Expected status code: [200, 201, etc.]
- Follow redirects: [Yes/No]
- Max redirects: [5]
- Accept any status code: [No (recommended)]
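
The interval, retry interval, and retry count above together determine how quickly an outage is noticed. The sketch below works through that arithmetic, assuming the monitor is marked down only after every retry has failed and ignoring request timeouts. The function name is illustrative and not part of Uptime Kuma.

```python
from typing import Tuple


def detection_window(interval_s: int, retry_interval_s: int, retries: int) -> Tuple[int, int]:
    """Return (best_case, worst_case) seconds from outage start to a DOWN status.

    Assumption: one failed regular check, then `retries` failed retries spaced
    `retry_interval_s` apart, are needed before the monitor is marked down.
    """
    confirm = retries * retry_interval_s
    # Best case: the outage begins just before a regular check runs.
    # Worst case: the outage begins just after a successful check.
    return confirm, interval_s + confirm


best, worst = detection_window(interval_s=60, retry_interval_s=30, retries=2)
print(f"Outage detected after roughly {best} to {worst} seconds")
```

With the example values from this checklist (60-second interval, 30-second retry interval, 2 retries), the window is roughly 60 to 120 seconds. Compare this against the "Time to alert" measured during testing.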

Advanced Configuration

Request settings
- Headers added (if needed):
```
User-Agent: Uptime-Kuma-Monitor
Authorization: Bearer [token if needed]
```
- Request body (if POST): [Body]
- Request timeout: [30 seconds]
Response validation
- Keyword check: [Keyword to find in response]
- Inverse keyword: [No]
- JSON query: [If checking API response]
- Expected value: [Expected JSON value]
Certificate monitoring (if HTTPS)
- Ignore TLS/SSL error: [No]
- Certificate expiry notification: [Enabled]
- Days before expiry alert: [14 days]
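
Before entering the keyword and JSON settings above, it can help to try them against a saved response body. The following is a minimal sketch of both checks. The dotted-path lookup is a simplified stand-in for illustration only; Uptime Kuma evaluates its own query expressions, so treat these helper names and this path syntax as assumptions.

```python
import json
from typing import Any


def keyword_ok(body: str, keyword: str, invert: bool = False) -> bool:
    """Pass when the keyword is present, or absent if invert is True."""
    found = keyword in body
    return not found if invert else found


def json_value(body: str, path: str) -> Any:
    """Walk a dotted path such as 'checks.database' through parsed JSON."""
    node = json.loads(body)
    for key in path.split("."):
        node = node[key]
    return node


def json_ok(body: str, path: str, expected: Any) -> bool:
    """Pass when the value at `path` equals `expected`; any lookup error fails."""
    try:
        return json_value(body, path) == expected
    except (ValueError, KeyError, TypeError):
        return False


sample = '{"status": "ok", "checks": {"database": "up"}}'
print(keyword_ok(sample, '"status": "ok"'))       # keyword check
print(keyword_ok(sample, "error", invert=True))   # inverse keyword check
print(json_ok(sample, "checks.database", "up"))   # JSON query with expected value
```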

Authentication (if required)

Authentication configured
- Auth type: [Basic, Bearer, etc.]
- Credentials configured
- Credentials tested
- Credentials stored securely

Health Check Endpoints

Best Practice

Use dedicated health check endpoints that verify critical dependencies.

Primary Health Check

Endpoint configured
- URL: [/health, /api/health, etc.]
- Method: [GET]
- Expected status: [200]
- Expected response: [{"status": "ok"}]
Health check verifies
- Application is running
- Database connectivity
- Cache connectivity (if applicable)
- External API connectivity (if critical)
- File system access (if critical)
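
One way a health endpoint could aggregate the dependency checks listed above is sketched here. The check names and the `run_health_checks` helper are hypothetical. A real service would plug in its own database, cache, and API probes and expose the result through its web framework.

```python
import json
from typing import Callable, Dict, Tuple


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run each dependency check and return (HTTP status code, JSON body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception:
            # A check that raises (for example a refused connection) counts as failed.
            results[name] = "fail"
    healthy = all(value == "ok" for value in results.values())
    body = json.dumps({"status": "ok" if healthy else "fail", "checks": results})
    # 200 only when every dependency passed, so the monitor's status check catches failures.
    return (200 if healthy else 503), body


# Placeholder checks; replace the lambdas with real probes.
status_code, body = run_health_checks({
    "application": lambda: True,
    "database": lambda: True,
})
print(status_code, body)
```

Returning a non-200 status on any failed dependency lets the "Expected status" setting detect partial outages even without a keyword check.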

Additional Endpoints

Homepage/Landing page
- URL: [/]
- Verifies: [Frontend is accessible]
- Monitor name: [Service - Homepage]
API endpoint (if applicable)
- URL: [/api/status or /api/health]
- Verifies: [API is responding]
- Monitor name: [Service - API]
Admin panel (if applicable)
- URL: [/admin or admin subdomain]
- Verifies: [Admin interface accessible]
- Monitor name: [Service - Admin]

Alert Configuration

Notification Channels

Slack integration
- Webhook URL configured
- Channel: [#alerts or #monitoring]
- Test notification sent: ✅
- Notification received: ✅
Email notifications
- Email address(es): [emails]
- Test email sent: ✅
- Email received: ✅
SMS/Phone (if critical)
- Phone numbers: [numbers]
- Test SMS sent: ✅
- SMS received: ✅
Additional channels
- [Discord, Teams, etc. if used]
- Configured: ✅
- Tested: ✅
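
If a test notification does not arrive, it can be useful to check the Slack webhook independently of Uptime Kuma. The sketch below builds a plain incoming-webhook request with the standard library. The webhook URL shown is a placeholder and the helper names are illustrative.

```python
import json
import urllib.request


def build_slack_test_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying a minimal Slack incoming-webhook payload."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder URL; substitute the real webhook before calling send(request).
request = build_slack_test_request(
    "https://hooks.slack.example/services/PLACEHOLDER",
    "Test notification for [Service Name]",
)
```

A successful `send(request)` followed by a message in the channel suggests the webhook itself is fine and that any problem lies in the notification settings.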

Alert Settings

Alert thresholds configured
- Send notification on: [Down, Certificate expiry]
- Uptime threshold: [98%]
- Certificate expiry days: [14 days]
- Down notification delay: [0-1 retries]
Alert recipients assigned
- Primary on-call: [Name/channel]
- Secondary on-call: [Name/channel]
- Team channel: [#team-alerts]
- Escalation defined: [Process]
Alert message customized
- Include service name
- Include environment
- Include link to service
- Include link to runbook (if exists)
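
As an illustration of the message contents listed above, the sketch below assembles an alert text from the service name, environment, and links. It is plain string formatting, not Uptime Kuma's own template mechanism, and the function and parameter names are assumptions.

```python
from typing import Optional


def format_alert(service: str, environment: str, status: str,
                 service_url: str, runbook_url: Optional[str] = None) -> str:
    """Compose an alert message with service name, environment, and links."""
    lines = [
        f"[{environment.upper()}] {service} is {status}",
        f"Service: {service_url}",
    ]
    if runbook_url:
        # The runbook link is only included when one exists.
        lines.append(f"Runbook: {runbook_url}")
    return "\n".join(lines)


print(format_alert("Example Service", "production", "DOWN", "https://service.com"))
```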

Notification Rules

Notification frequency
- Initial alert: [Immediate]
- Reminder if still down: [Every 15 minutes]
- Recovery notification: [Yes]
Maintenance mode
- Pause notifications during maintenance: [Process defined]
- Maintenance schedule: [If regular maintenance windows]

Monitor Groups & Organization

Grouping

Service assigned to groups
- Group 1: [Production Services]
- Group 2: [Web Applications]
- Group 3: [High Priority]
- Custom group: [If applicable]
Tags applied
- Environment: [production]
- Type: [web, api, database]
- Team: [team-name]
- Criticality: [critical, high, medium, low]

Dashboard Organization

Monitor positioned on dashboard
- In appropriate group
- Priority order correct
- Visible on status page (if public)
Status page configuration (if applicable)
- Service visible on status page: [Yes/No]
- Status page URL: [https://status.company.com]
- Service description added
- Service group assigned

Testing & Verification

Initial Testing

Monitor is active
- Status: [Up/Down]
- First check completed: ✅
- Response time: [X ms]
- No immediate issues: ✅
Simulate downtime
- Service temporarily disabled/blocked
- Alert triggered: ✅
- Alert received in Slack: ✅
- Alert received via email: ✅
- Time to alert: [X seconds]
Verify recovery notification
- Service re-enabled
- Recovery detected: ✅
- Recovery notification sent: ✅
- Recovery notification received: ✅

Response Time Baseline

Baseline established
- Average response time: [X ms]
- 95th percentile: [X ms]
- 99th percentile: [X ms]
- Baseline documented
Performance thresholds set (if applicable)
- Warning threshold: [X ms]
- Critical threshold: [Y ms]
- Alerts configured for slow responses
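
The baseline figures above can be derived from a set of exported response times. The following is a minimal sketch using the nearest-rank percentile method. The sample values are made up for illustration, and the threshold classifier is an assumption about how warning and critical levels might be applied.

```python
import math
from statistics import mean
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def classify(response_ms: float, warning_ms: float, critical_ms: float) -> str:
    """Map a response time onto ok / warning / critical."""
    if response_ms >= critical_ms:
        return "critical"
    if response_ms >= warning_ms:
        return "warning"
    return "ok"


# Illustrative response times in milliseconds.
samples = [120, 135, 128, 142, 131, 450, 125, 138, 129, 133]
print(f"average: {mean(samples):.1f} ms")
print(f"p95: {percentile(samples, 95)} ms, p99: {percentile(samples, 99)} ms")
```

With only a handful of samples the 95th and 99th percentiles collapse onto the slowest request, so collect at least a few days of checks before setting thresholds.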

Documentation

Monitor Documentation

System profile updated
- Monitoring section updated
- Monitor IDs documented
- Alert channels documented
- Runbook linked
Runbook created/updated
- What this monitor checks
- What it means if it fails
- Who to contact
- Troubleshooting steps
- Common causes
- How to resolve
Alert response documented
- Expected response time: [X minutes]
- Escalation path: [Defined]
- Recovery procedures: [Documented]

Team Communication

Team notified
- Monitor setup communicated
- Alert channels shared
- Runbook link shared
- Response expectations set
Training completed (if needed)
- Team trained on alert response
- Runbook reviewed
- Escalation process understood

Multi-Environment Setup (if applicable)

Production

Production monitor configured
- Monitor name: [Service - Production]
- URL: [https://service.com]
- Alerts to: [Primary channels]
- Tags: [production, high-priority]

Staging

Staging monitor configured
- Monitor name: [Service - Staging]
- URL: [https://staging.service.com]
- Alerts to: [Dev channel]
- Tags: [staging, medium-priority]

Development

Development monitor configured (optional)
- Monitor name: [Service - Development]
- URL: [https://dev.service.com]
- Alerts to: [Dev channel]
- Tags: [development, low-priority]
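
Keeping the per-environment settings in one place helps the three monitors stay consistently named and tagged. The sketch below generates the definitions from a single table. The dictionary layout is hypothetical and is not an Uptime Kuma import format, and the URLs are the placeholders from this checklist.

```python
from typing import Dict, List

# Per-environment settings taken from the placeholders in this checklist.
ENVIRONMENTS: Dict[str, dict] = {
    "Production": {"url": "https://service.com", "alerts_to": "Primary channels",
                   "tags": ["production", "high-priority"]},
    "Staging": {"url": "https://staging.service.com", "alerts_to": "Dev channel",
                "tags": ["staging", "medium-priority"]},
    "Development": {"url": "https://dev.service.com", "alerts_to": "Dev channel",
                    "tags": ["development", "low-priority"]},
}


def monitor_definitions(service: str) -> List[dict]:
    """Build one monitor definition per environment using the 'Service - Environment' naming."""
    return [
        {"name": f"{service} - {environment}", **settings}
        for environment, settings in ENVIRONMENTS.items()
    ]


for definition in monitor_definitions("Service"):
    print(definition["name"], definition["url"], definition["tags"])
```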

Advanced Features

Dependency Monitoring

Dependencies identified
- Database: [Monitor name]
- Cache (Redis): [Monitor name]
- External API: [Monitor name]
- CDN: [Monitor name]
Dependency monitors created
- Each dependency has monitor
- Dependency relationships documented
- Alert impact understood

Certificate Monitoring

SSL certificate monitoring enabled
- Certificate expiry tracked
- Alert before expiry: [14 days]
- Renewal process documented
- Certificate owner identified
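
To cross-check the expiry alert independently, the remaining certificate lifetime can be computed with Python's standard `ssl` module. This is a sketch under the assumption that the service is reachable on port 443. The helper names are illustrative, and the 14-day threshold mirrors the example value in this checklist.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the certificate's notAfter string, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> int:
    """Whole days between `now` (epoch seconds, default: current time) and expiry."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return int((expires - current) // 86400)


def needs_alert(days_left: int, alert_days: int = 14) -> bool:
    """True once the certificate is within the alert window."""
    return days_left <= alert_days


# Example usage (performs a network call, so it is left commented out):
# days = days_until_expiry(fetch_not_after("service.com"))
# print(days, needs_alert(days))
```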

Custom Scripts (if applicable)

Custom monitoring scripts
- Script purpose: [Description]
- Script location: [Path]
- Script schedule: [Frequency]
- Script output monitored
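
One possible way to have script output monitored is for the script to report its result to a Push-type monitor in Uptime Kuma. This checklist does not say which mechanism is used, so treat the approach as an assumption. The sketch only builds the push URL. The token is a placeholder, and the URL shape should be confirmed against the one shown in your own Push monitor.

```python
from typing import Optional
from urllib.parse import urlencode


def build_push_url(base_url: str, push_token: str, ok: bool,
                   message: str, ping_ms: Optional[int] = None) -> str:
    """Build the URL a script would request to report its status."""
    params = {
        "status": "up" if ok else "down",
        "msg": message,
        # Optional duration of the job in milliseconds; empty when not measured.
        "ping": "" if ping_ms is None else str(ping_ms),
    }
    return f"{base_url.rstrip('/')}/api/push/{push_token}?{urlencode(params)}"


# Placeholder base URL and token.
print(build_push_url("https://uptime.company.com", "PUSH_TOKEN", True, "backup ok"))
```

The script would request this URL at the end of each run, for example with `urllib.request.urlopen`, so that a missing report within the expected schedule marks the monitor as down.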

Maintenance & Review

Regular Checks

Weekly checks scheduled
- Monitor status reviewed
- Alert history reviewed
- False positives investigated
- Response times checked
Monthly review scheduled
- Uptime percentage: [Target: 99.9%]
- Average response time: [Target: < X ms]
- Alert count: [Count]
- Thresholds still appropriate: ✅/❌
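
For the monthly review, it helps to translate the uptime target into a concrete downtime budget. The sketch below does the arithmetic. The function names are illustrative, and a 30-day month is assumed.

```python
def allowed_downtime_minutes(target_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` while still meeting the target."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - target_percent) / 100


def uptime_percent(total_checks: int, failed_checks: int) -> float:
    """Uptime as the share of checks that succeeded."""
    if total_checks <= 0:
        raise ValueError("total_checks must be positive")
    return 100 * (total_checks - failed_checks) / total_checks


budget = allowed_downtime_minutes(99.9, days=30)
print(f"99.9% over 30 days allows about {budget:.1f} minutes of downtime")
print(f"1 failed check out of 1000 gives {uptime_percent(1000, 1):.2f}% uptime")
```

A 99.9% target over 30 days leaves about 43.2 minutes of downtime, which is a useful figure to compare against the alert history.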

Continuous Improvement

Monitor optimization
- Interval appropriate for service
- Alert fatigue prevented
- Thresholds tuned based on data
- False positives minimized
Documentation maintained
- Runbook kept current
- Contact information current
- Troubleshooting steps updated
- Lessons learned added

Monitor Information Reference

Monitor Details

Field	Value
Monitor ID	[ID in Uptime Kuma]
Monitor Name	[Full name]
URL	[Full URL]
Type	[HTTP(s), TCP, etc.]
Interval	[60 seconds]
Created	[YYYY-MM-DD]
Created By	[Name]

Alert Channels

Channel	Destination	Purpose	Active
Slack	[#channel]	Primary alerts	✅
Email	[email]	Secondary	✅
SMS	[phone]	Critical only	✅

Team & Ownership

Role	Name	Contact	Responsibilities
Service Owner	[Name]	[Contact]	Overall ownership
Technical Lead	[Name]	[Contact]	Technical issues
On-Call Primary	[Name]	[Contact]	First responder
On-Call Secondary	[Name]	[Contact]	Escalation

Checklist Status

Setup Progress

Setup Status: [In Progress / Complete]

Completion Date: [YYYY-MM-DD]

Verified By: [Name]

Related Documentation

[System Profile: [Service Name]]
[Runbook: [Service Name] Downtime]
[SOP: Uptime Kuma Administration]
[Incident Response Procedures]

Notes

[Any additional notes, observations, or special considerations for this monitor]

Checklist Version: 1.0

Last Updated: [YYYY-MM-DD]

Next Review: [YYYY-MM-DD]
