An open-source Kubernetes RCA Operator correlates Kubernetes events, OpenTelemetry traces, and logs into durable
IncidentReport
records, enriches them with service topology, and serves a built-in dashboard for incident review.
Phase 2 connects Kubernetes-native signals with OpenTelemetry traces and logs, then persists the correlated evidence in IncidentReport CRs.
RCACorrelationRule supports event types, attribute predicates, trace matching, priorities, and templated incident summaries.IncidentReport now includes trace IDs, fired rule metadata, severity labels, evidence counts, and serialized graph data.Phase 2 expands RCA Operator with OpenTelemetry resources, trace-aware correlation, topology graphs, dashboard trace detail, and production-ready Helm profiles.
A controller-runtime operator with Kubernetes collectors, OTLP ingest, correlation rules, Jaeger trace enrichment, CR-backed incident state, and a dashboard for evidence review.
A phased roadmap that keeps Kubernetes CRDs as the source of truth while adding telemetry-driven incident context.
Use Helm for the current chart and Phase 2 values profiles. Minimal installs run the control plane; full installs include OpenTelemetry and Jaeger components.
RCA Operator is MIT licensed and actively developed. Every contribution — bug reports, docs, playbooks, or code — moves us closer to production-ready.
make test, and open a Pull Request.RCA Operator is free, open source, and designed for platform teams that want trace-aware RCA without leaving the Kubernetes control plane.