Incident Timeline & Analysis: Real-Time MTTR Tracking

Why incident clarity matters

When monitors go down, you don't just need alerts. You need context: what failed, when it escalated, how long it lasted, and what you can learn from it. Incident timelines give you that clarity.

The Problem

Most monitoring tools give you alerts when things break, but no context. You scramble to understand what's down, how long it's been down, and whether it's getting worse. You waste time hunting for details. You can't track how fast your team responds. You have no record of what caused past incidents.

Alerts without context create chaos. What you need is a timeline that shows the full story.

The Solution

PerkyDash automatically creates incidents when monitors fail. It tracks severity escalation in real time. It gives you a visual timeline showing exactly when the incident started, when it escalated, and when it was resolved. It calculates MTTR and MTTA so you can improve your response over time.

One dashboard. Complete visibility. Zero guesswork. You understand incidents in seconds.

What's in incident timeline & analysis

Four powerful features working together

Automatic Detection

Incidents are created as soon as a monitor fails. The dashboard refreshes every 30 seconds.

Severity Escalation

Incidents escalate from Info to Critical based on duration. Visual timeline shows progression.

Acknowledge & Resolve

Track team response. Document root causes. Add prevention notes for future reference.

MTTR Analytics

Mean Time to Resolve (MTTR) and Mean Time to Acknowledge (MTTA). Track how your response times change over time.

Automatic Incident Detection

Incidents appear the moment monitors fail

When a monitor transitions to DOWN or DEGRADED, PerkyDash automatically creates an incident. No manual intervention. The incident dashboard updates live every 30 seconds. You see a counter badge in the navbar showing open incidents. Toast notifications alert you instantly.

When the monitor recovers, the incident is automatically closed unless you've manually overridden it. You can filter incidents by timeframe: Today, Last 7 days, Last 30 days, or All time. Perfect for on-call workflows.

Workflow:

Monitor fails → Incident created → Live updates → Auto-resolved on recovery

Benefits:

Immediate visibility, zero manual work, perfect for fast response
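The workflow above can be sketched as a small state handler. This is a minimal illustration, not PerkyDash's actual implementation: the Incident class, the on_check_result function, and the manual_override flag are hypothetical names chosen for the example. Only the DOWN, DEGRADED, and UP states and the auto-close rule come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

# States that open an incident, as described in the text.
FAILING_STATES = {"DOWN", "DEGRADED"}


@dataclass
class Incident:
    monitor_id: str
    started_at: datetime
    resolved_at: Optional[datetime] = None
    manual_override: bool = False  # hypothetical flag: a user took over resolution


def on_check_result(
    open_incidents: Dict[str, Incident],
    monitor_id: str,
    state: str,
    now: datetime,
) -> Optional[Incident]:
    """Apply one check result. Return the incident that was opened or closed, if any."""
    incident = open_incidents.get(monitor_id)

    if state in FAILING_STATES:
        if incident is None:
            # First failing check: open a new incident.
            incident = Incident(monitor_id, started_at=now)
            open_incidents[monitor_id] = incident
            return incident
        return None  # Already tracked; nothing new to report.

    if state == "UP" and incident is not None and not incident.manual_override:
        # Recovery: close automatically unless a user has overridden it.
        incident.resolved_at = now
        del open_incidents[monitor_id]
        return incident

    return None


# Example: a monitor fails, then recovers two minutes later.
open_incidents: Dict[str, Incident] = {}
t0 = datetime(2024, 1, 1, 12, 0)
on_check_result(open_incidents, "api", "DOWN", t0)
closed = on_check_result(open_incidents, "api", "UP", t0 + timedelta(minutes=2))
print(closed.resolved_at - closed.started_at)  # 0:02:00
```

Keeping at most one open incident per monitor means repeated failing checks extend the existing incident instead of creating duplicates.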

Example incident list (auto-refresh: 30s, live):

Production API Down: started 2 minutes ago, Critical
Slow Response Time: started 8 minutes ago, Warning
Database Connection Restored: resolved 15 minutes ago, Resolved

Severity Progression Timeline

Info (0-5 minutes): Monitor just went down
Warning (5-15 minutes): Still failing, needs attention
Error (15-30 minutes): Extended outage, impacting users
Critical (30+ minutes): Major outage, immediate action required

Example countdown: Escalates to Critical in 2m 14s

Severity Escalation System

Watch incidents escalate in real time

If a monitor continues to fail, severity escalates automatically: Info (blue) → Warning (yellow) → Error (orange) → Critical (red). The escalation timeline is visualized in its own component showing timestamps of severity changes, animated badges for Critical incidents, and a countdown to the next escalation.

This helps builders understand how fast a problem is evolving. A 2-minute outage at Info severity is different from a 35-minute outage at Critical. The visual progression helps teams prioritize which incident needs attention first.

Clear differentiation:

Mild degradation vs critical outages instantly visible

Prioritization:

Know which fire to fight first based on severity and duration
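As a rough sketch of the duration-based escalation described above, the mapping from elapsed time to severity and the countdown to the next level could look like this. The function names are hypothetical, and the boundary handling is an assumption: the text gives overlapping ranges (0-5, 5-15), so this example treats each boundary as belonging to the higher level.

```python
from datetime import timedelta
from typing import Optional, Tuple

# (lower bound, label) pairs taken from the thresholds in the text.
# Assumption: exactly 5 minutes counts as Warning, exactly 15 as Error, and so on.
SEVERITY_LEVELS = [
    (timedelta(minutes=0), "Info"),
    (timedelta(minutes=5), "Warning"),
    (timedelta(minutes=15), "Error"),
    (timedelta(minutes=30), "Critical"),
]


def severity_for(elapsed: timedelta) -> str:
    """Return the severity label for an incident that has lasted `elapsed`."""
    label = SEVERITY_LEVELS[0][1]
    for lower_bound, name in SEVERITY_LEVELS:
        if elapsed >= lower_bound:
            label = name
    return label


def next_escalation(elapsed: timedelta) -> Optional[Tuple[str, timedelta]]:
    """Return (next severity, time remaining), or None once the incident is Critical."""
    for lower_bound, name in SEVERITY_LEVELS:
        if elapsed < lower_bound:
            return name, lower_bound - elapsed
    return None
```

For example, severity_for(timedelta(minutes=35)) gives "Critical", and an incident that has been open for 27 minutes 46 seconds has 2 minutes 14 seconds left before it escalates to Critical.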

Acknowledge & Resolve Workflows

Track response and document root causes

When an incident appears, you can acknowledge it to show your team you're working on it, and optionally add notes. The acknowledgment is visible in the detail panel, which helps with team coordination, and it is the event that Mean Time to Acknowledge (MTTA) is measured against.

When you resolve an incident, PerkyDash requires resolution notes (mandatory), a root cause category chosen from a dropdown, and optional prevention notes. This documentation improves MTTR accuracy and team accountability, and it creates a historical record for future analysis.

Acknowledge:

Mark as "being worked on" with notes, visible to team

Resolve:

Required notes, root cause category, optional prevention steps
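The resolve rules above (notes required, root cause category from a fixed list, prevention optional) amount to a small validation step. The sketch below is illustrative only: the Resolution class and validate_resolution function are hypothetical, and the category list is invented for the example apart from "Configuration Issue", which is the only category named on this page.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical category list. Only "Configuration Issue" appears in the text;
# the other entries are placeholders for illustration.
ROOT_CAUSE_CATEGORIES = {
    "Configuration Issue",
    "Code Bug",
    "Infrastructure",
    "Third-Party Service",
    "Other",
}


@dataclass
class Resolution:
    notes: str            # required
    root_cause: str       # must be one of ROOT_CAUSE_CATEGORIES
    prevention: str = ""  # optional


def validate_resolution(resolution: Resolution) -> List[str]:
    """Return a list of validation errors; an empty list means the form is valid."""
    errors = []
    if not resolution.notes.strip():
        errors.append("Resolution notes are required.")
    if resolution.root_cause not in ROOT_CAUSE_CATEGORIES:
        errors.append("Select a root cause category.")
    return errors
```

Returning every error at once, instead of stopping at the first, lets a form show all missing fields in a single pass.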

Acknowledge Incident

Notes (optional): Investigating database connection pool exhaustion...

Resolve Incident

Resolution Notes (required): Increased connection pool size from 10 to 25. Traffic spike caused exhaustion.
Root Cause: Configuration Issue
Prevention (optional): Set up auto-scaling for connection pool based on load.

Example stats card:

MTTR: 14m 32s (-18% vs last week)
MTTA: 3m 47s (-22% vs last week)
Open Incidents: 2 Warning, 1 Critical
Resolved (7d): +12%

Measure operational excellence

MTTR (Mean Time to Resolve) and MTTA (Mean Time to Acknowledge) are displayed in a dedicated stats card component. You see open incidents, resolved incidents, trend indicators compared to the previous period, and the top 5 most problematic monitors.

MTTR matters because it measures how quickly your team restores service, and tracking it over time reduces blind spots about recurring failures. For agencies managing multiple clients, MTTR data shows you're responsive and reliable. Track improvements week-over-week. Identify monitors that fail often and fix them permanently.

MTTR & MTTA tracking:

See how fast your team responds and resolves incidents

Trend visibility:

Week-over-week comparisons show improvement over time
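A trend indicator such as "-18% vs last week" is a percent change between two periods, and a "top 5" list is a ranking of monitors. The sketch below shows one way to compute both. The function names are hypothetical, and ranking monitors by incident count is an assumption, since the page does not say how "most problematic" is measured.

```python
from collections import Counter
from datetime import timedelta
from typing import Iterable, List, Optional, Tuple


def percent_change(current: timedelta, previous: timedelta) -> Optional[float]:
    """Percent change from the previous period; negative means faster response."""
    prev_s = previous.total_seconds()
    if prev_s == 0:
        return None  # No baseline to compare against.
    return 100.0 * (current.total_seconds() - prev_s) / prev_s


def top_monitors(incident_monitor_ids: Iterable[str], limit: int = 5) -> List[Tuple[str, int]]:
    """Rank monitors by number of incidents (assumed metric), highest first."""
    return Counter(incident_monitor_ids).most_common(limit)
```

For instance, an MTTR that drops from 10 minutes to 8 minutes gives percent_change(timedelta(minutes=8), timedelta(minutes=10)) of -20.0.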

User stories: who this helps

Different builders have different needs. Incident timelines help all of them.

As a maker

I want to see incidents in real time so I can fix issues before users notice. Automatic detection means I don't miss outages when I'm building.

As an agency

I want MTTR metrics so I can report reliability to clients. Showing we resolve incidents in under 15 minutes proves we're responsive.

As a SaaS founder

I want severity escalation so I can prioritize which outage needs attention first. A Critical incident at 35 minutes is more urgent than a Warning at 6 minutes.

As an on-call engineer

I want acknowledge workflows so my team knows I'm already working on it. Adding notes keeps everyone on the same page without Slack chaos.

As a small team lead

I want root cause documentation so we stop repeating the same mistakes. Prevention notes from past incidents help us build more reliable systems.

As a freelance developer

I want visual timelines so I can explain to clients exactly what happened and how long it took to fix. Clear incident history builds trust.

Frequently asked questions

Everything you need to know about incident timelines

How does PerkyDash detect incidents automatically?

Whenever a monitor transitions to DOWN or DEGRADED, PerkyDash automatically creates an incident. The system evaluates the monitor's state each time a check runs. If the monitor recovers and goes back to UP, the incident is automatically closed. You don't need to manually create or close incidents unless you want to override the default behavior.

What does severity escalation mean?

If a monitor stays down for an extended period, the incident severity escalates automatically. It goes from Info (0-5 minutes) to Warning (5-15 minutes) to Error (15-30 minutes) to Critical (30+ minutes). This helps you prioritize: a 2-minute blip is less urgent than a 40-minute outage. The timeline component shows the exact escalation progression with timestamps.

How is MTTR calculated?

MTTR (Mean Time to Resolve) is the average time between when an incident is created and when it's marked as resolved. PerkyDash calculates this across all resolved incidents in the selected time range (7 days, 30 days, etc.). MTTA (Mean Time to Acknowledge) is the average time between incident creation and when someone acknowledges it. Both metrics help you measure operational response speed.
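The two averages can be sketched as follows. This is an illustration under stated assumptions, not PerkyDash's code: the IncidentRecord class and the mttr and mtta functions are hypothetical, and an incident is counted toward the selected time range when it was created at or after window_start (the text does not say whether creation or resolution time decides this).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class IncidentRecord:
    created_at: datetime
    acknowledged_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None


def _mean(deltas: List[timedelta]) -> Optional[timedelta]:
    """Average a list of durations; None when there is nothing to average."""
    if not deltas:
        return None
    return sum(deltas, timedelta()) / len(deltas)


def mttr(incidents: List[IncidentRecord], window_start: datetime) -> Optional[timedelta]:
    """Mean time from creation to resolution over resolved incidents in the window."""
    return _mean([
        i.resolved_at - i.created_at
        for i in incidents
        if i.resolved_at is not None and i.created_at >= window_start
    ])


def mtta(incidents: List[IncidentRecord], window_start: datetime) -> Optional[timedelta]:
    """Mean time from creation to acknowledgment over acknowledged incidents in the window."""
    return _mean([
        i.acknowledged_at - i.created_at
        for i in incidents
        if i.acknowledged_at is not None and i.created_at >= window_start
    ])
```

Incidents that are still open, or were never acknowledged, are left out of the respective average so they don't distort it.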

Can I resolve an incident manually?

Yes. When you click "Resolve" on an incident, a modal appears requiring resolution notes (mandatory), a root cause category from a dropdown, and optional prevention notes. This documentation is stored with the incident for future reference. Manual resolution is useful when you have fixed the issue yourself and want to document the root cause, or when you want to override the auto-resolution behavior.

Does the timeline update in real time?

Yes. The incident dashboard refreshes every 30 seconds automatically. New incidents appear, severity changes update, and resolved incidents move to the resolved list. You see a live indicator in the UI. WebSocket connections push updates so you don't need to manually refresh. Toast notifications alert you when new incidents are created.

Can agencies use this for client reporting?

Absolutely. The MTTR and MTTA metrics are perfect for showing clients how fast you respond to issues. You can show week-over-week improvements in response time. The incident history with root cause documentation proves accountability. The "Top 5 Problematic Monitors" report helps identify recurring issues you've fixed. Incident data builds client trust.

What data does the incident detail panel show?

The detail panel shows the incident's current severity, the full escalation timeline (Info → Warning → Error → Critical with timestamps), error details from the monitor check, logs of all state changes, acknowledgment notes if acknowledged, resolution notes if resolved, and action buttons (Acknowledge, Resolve). It's a complete view of what happened and when.

Understand every outage. Fix faster. Learn from every incident.