Uptime Kuma Setup Checklist: [Service Name]
Replace all text in [brackets] with your content. Delete this callout when done.
Use this checklist to set up, test, and document monitoring for a service in Uptime Kuma.
Service Overview
Service Name: [Service Name]
Service Type: [Website, API, Database, etc.]
Environment: [Production, Staging, Development]
Criticality: [Low, Medium, High, Critical]
Owner: [Name]
Setup Date: [YYYY-MM-DD]
Prerequisites
-
Uptime Kuma instance accessible
- URL: [https://uptime.company.com]
- Login credentials available
- Admin access verified
-
Service information gathered
- Service URL(s): [URLs]
- Health check endpoints: [Endpoints]
- Expected response codes: [200, etc.]
- Authentication requirements: [If any]
-
Alert channels configured
- Slack webhook: [Available]
- Email: [Configured]
- SMS/Phone: [If applicable]
- Other: [If applicable]
Monitor Configuration
Basic Monitor Setup
-
Monitor created
- Monitor name: [Descriptive name]
- Monitor type: [HTTP(s), TCP, etc.]
- URL: [Full URL with protocol]
- Monitor created in Uptime Kuma: ✅
-
Monitor settings configured
- Friendly name: [Service Name - Production]
- Interval: [60 seconds recommended]
- Retry interval: [30 seconds]
- Retries: [1-2 recommended]
- Tags: [production, web-app, critical]
-
HTTP settings (if HTTP monitor)
- Method: [GET, POST, HEAD]
- Expected status code: [200, 201, etc.]
- Follow redirects: [Yes/No]
- Max redirects: [5]
- Accept any status code: [No (recommended)]
Advanced Configuration
-
Request settings
- Headers added (if needed):
User-Agent: Uptime-Kuma-Monitor
Authorization: Bearer [token if needed]
- Request body (if POST): [Body]
- Request timeout: [30 seconds]
-
Response validation
- Keyword check: [Keyword to find in response]
- Inverse keyword: [No]
- JSON query: [If checking API response]
- Expected value: [Expected JSON value]
-
Certificate monitoring (if HTTPS)
- Ignore TLS/SSL error: [No]
- Certificate expiry notification: [Enabled]
- Days before expiry alert: [14 days]
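The response validation items above (keyword, inverse keyword, JSON query with an expected value) can be prototyped locally before they are entered in the monitor. The sketch below is only an illustration of the intended logic: the function names are hypothetical, and the dotted path is a simplified stand-in for Uptime Kuma's own, richer JSON query syntax.

```python
import json
from typing import Any


def keyword_check(body: str, keyword: str, inverse: bool = False) -> bool:
    """Pass when the keyword is present, or when it is absent if inverse is True."""
    found = keyword in body
    return not found if inverse else found


def json_value_check(body: str, path: str, expected: Any) -> bool:
    """Walk a dotted path such as "db.status" and compare the result with expected."""
    try:
        value: Any = json.loads(body)
        for key in path.split("."):
            value = value[key]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON or a missing key counts as a failed check.
        return False
    return value == expected
```

Running both helpers against a saved copy of the real response is a quick way to confirm that the chosen keyword does not also appear in error pages.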
Authentication (if required)
- Authentication configured
- Auth type: [Basic, Bearer, etc.]
- Credentials configured
- Credentials tested
- Credentials stored securely
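When testing credentials outside Uptime Kuma, it can help to build the same `Authorization` header value by hand. The following is a minimal sketch, assuming Basic or Bearer authentication; the environment variable names `MONITOR_USER` and `MONITOR_PASSWORD` are hypothetical and only illustrate keeping secrets out of the checklist itself.

```python
import base64
import os
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def bearer_auth_header(token: str) -> str:
    """Return the value of a Bearer Authorization header."""
    return f"Bearer {token}"


def load_credentials() -> Tuple[str, str]:
    """Read credentials from (hypothetical) environment variables.

    Raises KeyError if either variable is unset, so a missing secret fails loudly.
    """
    return os.environ["MONITOR_USER"], os.environ["MONITOR_PASSWORD"]
```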
Health Check Endpoints
Use dedicated health check endpoints that verify critical dependencies.
Primary Health Check
-
Endpoint configured
- URL: [/health, /api/health, etc.]
- Method: [GET]
- Expected status: [200]
- Expected response:
{"status": "ok"}
-
Health check verifies
- Application is running
- Database connectivity
- Cache connectivity (if applicable)
- External API connectivity (if critical)
- File system access (if critical)
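One possible shape for a health check that verifies the dependencies listed above is sketched here. It is an assumption about how a service might be written, not something Uptime Kuma requires: `build_health_response` and the check names are hypothetical, and the extra `checks` field goes beyond the `{"status": "ok"}` body shown earlier.

```python
import json
from typing import Callable, Dict, Tuple


def build_health_response(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run each dependency check and return (HTTP status code, JSON body).

    A check that returns False or raises is treated as failed.
    """
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False

    healthy = all(results.values())
    body = {"status": "ok" if healthy else "error", "checks": results}
    # 200 when everything passes, 503 otherwise, so a status-code monitor
    # and a keyword monitor looking for "ok" agree with each other.
    return (200 if healthy else 503), json.dumps(body)
```

The function is independent of any web framework: the handler for `/health` would call it with the real database, cache, and file system probes and return the status code and body unchanged.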
Additional Endpoints
-
Homepage/Landing page
- URL: [/]
- Verifies: [Frontend is accessible]
- Monitor name: [Service - Homepage]
-
API endpoint (if applicable)
- URL: [/api/status or /api/health]
- Verifies: [API is responding]
- Monitor name: [Service - API]
-
Admin panel (if applicable)
- URL: [/admin or admin subdomain]
- Verifies: [Admin interface accessible]
- Monitor name: [Service - Admin]
Alert Configuration
Notification Channels
-
Slack integration
- Webhook URL configured
- Channel: [#alerts or #monitoring]
- Test notification sent: ✅
- Notification received: ✅
-
Email notifications
- Email address(es): [emails]
- Test email sent: ✅
- Email received: ✅
-
SMS/Phone (if critical)
- Phone numbers: [numbers]
- Test SMS sent: ✅
- SMS received: ✅
-
Additional channels
- [Discord, Teams, etc. if used]
- Configured: ✅
- Tested: ✅
Alert Settings
-
Alert thresholds configured
- Send notification on: [Down, Certificate expiry]
- Uptime threshold: [98%]
- Certificate expiry days: [14 days]
- Down notification delay: [0-1 retries]
-
Alert recipients assigned
- Primary on-call: [Name/channel]
- Secondary on-call: [Name/channel]
- Team channel: [#team-alerts]
- Escalation defined: [Process]
-
Alert message customized
- Include service name
- Include environment
- Include link to service
- Include link to runbook (if exists)
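As an illustration of the customized alert message described above, the sketch below assembles the four pieces (service name, environment, service link, optional runbook link) into one text. The function name and layout are assumptions; the wording actually sent depends on how the notification channel is configured.

```python
from typing import Optional


def format_alert(
    service: str,
    environment: str,
    status: str,
    service_url: str,
    runbook_url: Optional[str] = None,
) -> str:
    """Build a short alert text that names the service, environment, and links."""
    lines = [
        f"[{environment.upper()}] {service} is {status.upper()}",
        f"Service: {service_url}",
    ]
    if runbook_url:
        # The runbook line is only added when a runbook exists.
        lines.append(f"Runbook: {runbook_url}")
    return "\n".join(lines)
```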
Notification Rules
-
Notification frequency
- Initial alert: [Immediate]
- Reminder if still down: [Every 15 minutes]
- Recovery notification: [Yes]
-
Maintenance mode
- Pause notifications during maintenance: [Process defined]
- Maintenance schedule: [If regular maintenance windows]
Monitor Groups & Organization
Grouping
-
Service assigned to groups
- Group 1: [Production Services]
- Group 2: [Web Applications]
- Group 3: [High Priority]
- Custom group: [If applicable]
-
Tags applied
- Environment: [production]
- Type: [web, api, database]
- Team: [team-name]
- Criticality: [critical, high, medium, low]
Dashboard Organization
-
Monitor positioned on dashboard
- In appropriate group
- Priority order correct
- Visible on status page (if public)
-
Status page configuration (if applicable)
- Service visible on status page: [Yes/No]
- Status page URL: [https://status.company.com]
- Service description added
- Service group assigned
Testing & Verification
Initial Testing
-
Monitor is active
- Status: [Up/Down]
- First check completed: ✅
- Response time: [X ms]
- No immediate issues: ✅
-
Simulate downtime
- Service temporarily disabled/blocked
- Alert triggered: ✅
- Alert received in Slack: ✅
- Alert received via email: ✅
- Time to alert: [X seconds]
-
Verify recovery notification
- Service re-enabled
- Recovery detected: ✅
- Recovery notification sent: ✅
- Recovery notification received: ✅
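The measured time to alert can be compared with a rough upper bound derived from the interval, retries, and retry interval configured earlier. The estimate below is a simplification under stated assumptions: the outage starts just after a successful check, every retry fails, and request timeouts and notification delivery time are ignored.

```python
def worst_case_detection_seconds(interval: int, retries: int, retry_interval: int) -> int:
    """Approximate seconds between an outage starting and the monitor going down.

    One full interval may pass before the first failed check, and each
    configured retry then adds one retry interval.
    """
    if interval <= 0 or retries < 0 or retry_interval < 0:
        raise ValueError("interval must be positive; retries and retry_interval non-negative")
    return interval + retries * retry_interval


if __name__ == "__main__":
    # With the values suggested in this checklist: 60 s interval, 2 retries, 30 s retry interval.
    print(worst_case_detection_seconds(60, 2, 30))
```

If the time observed during the simulated downtime is much larger than this estimate, the request timeout or the notification channel is the likely place to look.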
Response Time Baseline
-
Baseline established
- Average response time: [X ms]
- 95th percentile: [X ms]
- 99th percentile: [X ms]
- Baseline documented
-
Performance thresholds set (if applicable)
- Warning threshold: [X ms]
- Critical threshold: [Y ms]
- Alerts configured for slow responses
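The baseline figures above can be computed from a list of response times exported from the monitor. The sketch below uses the nearest-rank percentile definition, which is one of several common definitions; the sample values are invented for illustration.

```python
import math
import statistics
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    rank = min(max(rank, 1), len(ordered))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Illustrative response times in milliseconds.
    times_ms = [120, 95, 110, 300, 105, 100, 98, 102, 250, 101]
    print("average:", statistics.mean(times_ms))
    print("p95:", percentile(times_ms, 95))
    print("p99:", percentile(times_ms, 99))
```

With only a handful of samples the 95th and 99th percentiles collapse to the maximum, so a baseline is more meaningful when taken over at least several days of checks.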
Documentation
Monitor Documentation
-
System profile updated
- Monitoring section updated
- Monitor IDs documented
- Alert channels documented
- Runbook linked
-
Runbook created/updated
- What this monitor checks
- What it means if it fails
- Who to contact
- Troubleshooting steps
- Common causes
- How to resolve
-
Alert response documented
- Expected response time: [X minutes]
- Escalation path: [Defined]
- Recovery procedures: [Documented]
Team Communication
-
Team notified
- Monitor setup communicated
- Alert channels shared
- Runbook link shared
- Response expectations set
-
Training completed (if needed)
- Team trained on alert response
- Runbook reviewed
- Escalation process understood
Multi-Environment Setup (if applicable)
Production
- Production monitor configured
- Monitor name: [Service - Production]
- URL: [https://service.com]
- Alerts to: [Primary channels]
- Tags: [production, high-priority]
Staging
- Staging monitor configured
- Monitor name: [Service - Staging]
- URL: [https://staging.service.com]
- Alerts to: [Dev channel]
- Tags: [staging, medium-priority]
Development
- Development monitor configured (optional)
- Monitor name: [Service - Development]
- URL: [https://dev.service.com]
- Alerts to: [Dev channel]
- Tags: [development, low-priority]
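To keep names, URLs, and tags consistent across environments, the three monitor definitions above can be generated from one template. This is a sketch of that idea only: `MonitorSpec` and `monitor_specs` are hypothetical names, the channel labels are placeholders, and the result still has to be entered into Uptime Kuma by whatever means the team uses.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MonitorSpec:
    name: str
    url: str
    alert_channel: str
    tags: Tuple[str, ...]


def monitor_specs(service: str, domain: str) -> List[MonitorSpec]:
    """Return one monitor definition per environment, following the naming used in this checklist."""
    return [
        MonitorSpec(f"{service} - Production", f"https://{domain}",
                    "primary", ("production", "high-priority")),
        MonitorSpec(f"{service} - Staging", f"https://staging.{domain}",
                    "dev", ("staging", "medium-priority")),
        MonitorSpec(f"{service} - Development", f"https://dev.{domain}",
                    "dev", ("development", "low-priority")),
    ]
```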
Advanced Features
Dependency Monitoring
-
Dependencies identified
- Database: [Monitor name]
- Cache (Redis): [Monitor name]
- External API: [Monitor name]
- CDN: [Monitor name]
-
Dependency monitors created
- Each dependency has monitor
- Dependency relationships documented
- Alert impact understood
Certificate Monitoring
- SSL certificate monitoring enabled
- Certificate expiry tracked
- Alert before expiry: [14 days]
- Renewal process documented
- Certificate owner identified
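The expiry alert configured above can be cross-checked independently of Uptime Kuma. The sketch below reads a certificate's `notAfter` field with Python's standard `ssl` module and compares the remaining days with the 14-day window; the function names are hypothetical, and `fetch_not_after` needs network access to the host being checked.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the certificate's notAfter string, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between now (timezone-aware) and the certificate's expiry."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def needs_renewal_alert(not_after: str, now: datetime, alert_days: int = 14) -> bool:
    """True when the certificate expires within the alert window."""
    return days_until_expiry(not_after, now) <= alert_days
```

A typical call would be `needs_renewal_alert(fetch_not_after("service.com"), datetime.now(timezone.utc))`, with the hostname replaced by the monitored service.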
Custom Scripts (if applicable)
- Custom monitoring scripts
- Script purpose: [Description]
- Script location: [Path]
- Script schedule: [Frequency]
- Script output monitored
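One way to have a custom script's output monitored is to let the script report to a Push-type monitor. The sketch below assumes such a monitor exists and that its push URL has the shape `/api/push/<token>` with `status`, `msg`, and `ping` query parameters; the exact URL should be copied from the monitor's settings page rather than trusted from this example, and the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def build_push_url(
    base_url: str,
    token: str,
    status: str = "up",
    msg: str = "OK",
    ping_ms: Optional[int] = None,
) -> str:
    """Build a heartbeat URL for a Push monitor (URL shape is an assumption)."""
    params = {
        "status": status,
        "msg": msg,
        "ping": "" if ping_ms is None else str(ping_ms),
    }
    return f"{base_url.rstrip('/')}/api/push/{token}?{urlencode(params)}"


def send_heartbeat(url: str, timeout: float = 10.0) -> int:
    """Send the heartbeat and return the HTTP status code of the response."""
    with urlopen(url, timeout=timeout) as response:
        return response.status
```

A script would call `send_heartbeat(build_push_url(...))` at the end of each successful run, so a missing heartbeat marks the script itself as down.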
Maintenance & Review
Regular Checks
-
Weekly checks scheduled
- Monitor status reviewed
- Alert history reviewed
- False positives investigated
- Response times checked
-
Monthly review scheduled
- Uptime percentage: [Target: 99.9%]
- Average response time: [Target: < X ms]
- Alert count: [Count]
- Thresholds still appropriate: ✅/❌
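For the monthly review it helps to translate an uptime target into a downtime budget. The arithmetic below is a small sketch with hypothetical function names; it assumes a 30-day month by default and evenly spaced checks.

```python
def allowed_downtime_minutes(target_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in the period while still meeting the target."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - target_percent) / 100


def measured_uptime_percent(total_checks: int, failed_checks: int) -> float:
    """Uptime as the share of successful checks."""
    if total_checks <= 0 or not 0 <= failed_checks <= total_checks:
        raise ValueError("need total_checks > 0 and 0 <= failed_checks <= total_checks")
    return 100 * (total_checks - failed_checks) / total_checks


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves roughly 43 minutes of downtime.
    print(round(allowed_downtime_minutes(99.9), 1))
```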
Continuous Improvement
-
Monitor optimization
- Interval appropriate for service
- Alert fatigue prevented
- Thresholds tuned based on data
- False positives minimized
-
Documentation maintained
- Runbook kept current
- Contact information current
- Troubleshooting steps updated
- Lessons learned added
Monitor Information Reference
Monitor Details
| Field | Value |
|---|---|
| Monitor ID | [ID in Uptime Kuma] |
| Monitor Name | [Full name] |
| URL | [Full URL] |
| Type | [HTTP(s), TCP, etc.] |
| Interval | [60 seconds] |
| Created | [YYYY-MM-DD] |
| Created By | [Name] |
Alert Channels
| Channel | Destination | Purpose | Active |
|---|---|---|---|
| Slack | [#channel] | Primary alerts | ✅ |
| Email | [email] | Secondary | ✅ |
| SMS | [phone] | Critical only | ✅ |
Team & Ownership
| Role | Name | Contact | Responsibilities |
|---|---|---|---|
| Service Owner | [Name] | [Contact] | Overall ownership |
| Technical Lead | [Name] | [Contact] | Technical issues |
| On-Call Primary | [Name] | [Contact] | First responder |
| On-Call Secondary | [Name] | [Contact] | Escalation |
Checklist Status
Setup Progress
- Monitor configured in Uptime Kuma
- Alerts configured and tested
- Documentation created/updated
- Team notified and trained
- Baseline established
- Thresholds set
- Runbook created/linked
- Regular review scheduled
Setup Status: [In Progress / Complete]
Completion Date: [YYYY-MM-DD]
Verified By: [Name]
Related Documentation
- [System Profile: [Service Name]]
- [Runbook: [Service Name] Downtime]
- [SOP: Uptime Kuma Administration]
- [Incident Response Procedures]
Notes
[Any additional notes, observations, or special considerations for this monitor]
Checklist Version: 1.0
Last Updated: [YYYY-MM-DD]
Next Review: [YYYY-MM-DD]