Project Overview

Autonomous Incident Commander

A multi-agent SRE command center that detects incidents, builds causal graphs from telemetry, executes guarded runbooks, and drafts postmortems automatically. In staging chaos drills, it reduced mean time to resolution by 63% while keeping human approval in the loop for every high-risk action.
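
The "guarded runbook" idea can be illustrated with a minimal sketch. This is not the project's actual code: the names `Risk`, `Step`, and `run_guarded`, and the choice to halt the runbook when approval is denied, are all assumptions made for illustration. The sketch only shows the general pattern of running low-risk steps automatically while routing high-risk steps through a human approval callback.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass
class Step:
    """One runbook step (hypothetical structure, for illustration only)."""
    name: str
    risk: Risk
    action: Callable[[], str]


def run_guarded(steps: List[Step], approve: Callable[[Step], bool]) -> List[str]:
    """Run steps in order; high-risk steps run only if `approve` returns True.

    Assumption: a denied approval halts the rest of the runbook.
    """
    log: List[str] = []
    for step in steps:
        if step.risk is Risk.HIGH and not approve(step):
            log.append(f"{step.name}: halted (approval denied)")
            break
        log.append(f"{step.name}: {step.action()}")
    return log


# Example runbook with one low-risk and one high-risk step.
steps = [
    Step("collect_diagnostics", Risk.LOW, lambda: "ok"),
    Step("restart_deployment", Risk.HIGH, lambda: "restarted"),
]

denied = run_guarded(steps, approve=lambda step: False)
approved = run_guarded(steps, approve=lambda step: True)
```

In a real deployment the `approve` callback would presumably block on a human decision (for example, a chat prompt or a ticket) rather than return immediately.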

Python, LangChain, Prometheus, Kubernetes, Grafana
Private
GALLERY
Guarded runbook execution timeline
Live incident detection dashboard

Causal graph built from telemetry