HolmesGPT is an AI agent for investigating problems in your cloud, finding the root cause, and suggesting remediations. It has dozens of built-in integrations for cloud providers, observability tools, and on-call systems.
HolmesGPT has been submitted to the CNCF as a sandbox project (view status). You can learn more about HolmesGPT's maintainers and adopters here.
How it Works | Installation | LLM Providers | YouTube Demo
HolmesGPT connects AI models with live observability data and organizational knowledge. It uses an agentic loop to analyze data from multiple sources and identify possible root causes.

HolmesGPT integrates with popular observability and cloud platforms. The following data sources ("toolsets") are built-in. Add your own.
| Data Source | Status | Notes |
|---|---|---|
| ArgoCD | ✅ | Get status, history, manifests and more for apps, projects and clusters |
| AWS RDS | ✅ | Fetch events, instances, slow query logs and more |
| Confluence | ✅ | Private runbooks and documentation |
| Coralogix Logs | ✅ | Retrieve logs for any resource |
| Datetime | ✅ | Date and time-related operations |
| Docker | ✅ | Get images, logs, events, history and more |
| GitHub | 🟡 Beta | Remediate alerts by opening pull requests with fixes |
| DataDog | 🟡 Beta | Fetch log data from Datadog |
| Grafana Loki | ✅ | Query logs for Kubernetes resources or any query |
| Grafana Tempo | ✅ | Fetch trace info, debug issues like high latency in applications |
| Helm | ✅ | Release status, chart metadata, and values |
| Internet | ✅ | Public runbooks, community docs, etc. |
| Kafka | ✅ | Fetch metadata, list consumers and topics, or find lagging consumer groups |
| Kubernetes | ✅ | Pod logs, K8s events, and resource status (kubectl describe) |
| New Relic | 🟡 Beta | Investigate alerts, query tracing data |
| OpenSearch | ✅ | Query health, shard, and settings-related info for one or more clusters |
| Prometheus | ✅ | Investigate alerts, query metrics and generate PromQL queries |
| RabbitMQ | ✅ | Info about partitions, memory/disk alerts, troubleshooting split-brain scenarios and more |
| Robusta | ✅ | Multi-cluster monitoring, historical change data, user-configured runbooks, PromQL graphs and more |
| Slab | ✅ | Team knowledge base and runbooks on demand |
HolmesGPT can fetch alerts/tickets to investigate from external systems, then write the analysis back to the source or Slack.
| Integration | Status | Notes |
|---|---|---|
| Slack | 🟡 Beta | Demo. Tag the HolmesGPT bot in any Slack message |
| Prometheus/AlertManager | ✅ | Robusta SaaS or HolmesGPT CLI |
| PagerDuty | ✅ | HolmesGPT CLI only |
| OpsGenie | ✅ | HolmesGPT CLI only |
| Jira | ✅ | HolmesGPT CLI only |
| GitHub | ✅ | HolmesGPT CLI only |

Read the installation documentation to learn how to install HolmesGPT.

Read the LLM Providers documentation to learn how to set up your LLM API key.
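For example, if you use OpenAI as the provider, exporting the standard `OPENAI_API_KEY` environment variable before running the CLI is typically enough; other providers use their own variables, as described in the LLM Providers documentation:

```bash
# Assumes OpenAI as the LLM provider; other providers use their own
# environment variables (see the LLM Providers documentation).
export OPENAI_API_KEY="sk-..."   # placeholder, use your real key
```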
- In the Robusta SaaS: Go to platform.robusta.dev and use Holmes from your browser
- With the HolmesGPT CLI: set up an LLM API key and ask Holmes a question:
holmes ask "what pods are unhealthy and why?"
You can also provide files as context:
holmes ask "summarize the key points in this document" -f ./mydocument.txt
You can also load the prompt from a file using the `--prompt-file` option:
```bash
holmes ask --prompt-file ~/long-prompt.txt
```
Enter interactive mode to ask follow-up questions:
```bash
holmes ask "what pods are unhealthy and why?" --interactive
# or
holmes ask "what pods are unhealthy and why?" -i
Also supported:
HolmesGPT CLI: investigate Prometheus alerts
Pull alerts from AlertManager and investigate them with HolmesGPT:
```bash
holmes investigate alertmanager --alertmanager-url http://localhost:9093

# if on macOS and using the Holmes Docker image:
# holmes investigate alertmanager --alertmanager-url http://docker.for.mac.localhost:9093
```
To investigate alerts in your browser, sign up for a free trial of Robusta SaaS.
Optional: if Prometheus is running inside Kubernetes, port-forward to AlertManager before running the command above:
```bash
kubectl port-forward alertmanager-robusta-kube-prometheus-st-alertmanager-0 9093:9093 &
```
HolmesGPT CLI: investigate PagerDuty and OpsGenie alerts
```bash
holmes investigate opsgenie --opsgenie-api-key <OPSGENIE_API_KEY>
holmes investigate pagerduty --pagerduty-api-key <PAGERDUTY_API_KEY>

# to write the analysis back to the incident as a comment
holmes investigate pagerduty --pagerduty-api-key <PAGERDUTY_API_KEY> --update
```
For more details, run `holmes investigate <source> --help`.
HolmesGPT can investigate many issues out of the box, with no customization or training. Optionally, you can extend Holmes to improve results:
Custom Data Sources: Add data sources (toolsets) to improve investigations
- If using Robusta SaaS: See here
- If using the CLI: Use the `-t` flag with custom toolset files, or add them to `~/.holmes/config.yaml` (see the sketch after this list)
Custom Runbooks: Give HolmesGPT instructions for known alerts:
- If using Robusta SaaS: Use the Robusta UI to add runbooks
- If using the CLI: Use the `-r` flag with custom runbook files, or add them to `~/.holmes/config.yaml`
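As a minimal sketch, assuming you keep your own toolset and runbook definitions in local YAML files (the file names below are hypothetical), the CLI flags mentioned above can be combined in a single invocation:

```bash
# Hypothetical file names; -t points at custom toolset files and -r at custom
# runbook files, as described in the list above.
holmes ask "why is the checkout service failing?" \
  -t ./my-toolsets.yaml \
  -r ./my-runbooks.yaml
```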
You can save common settings and API keys in a config file to avoid passing them on the CLI each time. Place the config file in `~/.holmes/config.yaml` or pass it using the `--config` flag.
You can view an example config file with all available settings here.
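Here is a minimal sketch of creating and using such a config file, assuming OpenAI as the provider; the key names shown are assumptions, so check the example config file linked above for the authoritative settings:

```bash
# Key names below are assumptions; consult the example config file for the real settings.
mkdir -p ~/.holmes
cat > ~/.holmes/config.yaml <<'EOF'
model: "gpt-4o"      # assumed key name for the LLM to use
api_key: "sk-..."    # assumed key name for your LLM API key
EOF

# Or keep the file elsewhere and pass it explicitly with --config:
holmes ask "what pods are unhealthy and why?" --config ./my-holmes-config.yaml
```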
By design, HolmesGPT has read-only access and respects RBAC permissions. It is safe to run in production environments.
We do not train HolmesGPT on your data. Data sent to Robusta SaaS is private to your account.
For extra privacy, bring an API key for your own AI model.
Because HolmesGPT depends on LLMs, it ships with a suite of pytest-based evaluations to ensure its prompts and default set of tools work as expected with LLMs.
- Introduction to HolmesGPT's evals.
- Write your own evals.
- Use Braintrust to view and analyze results (optional).
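As a rough sketch, the evals are ordinary pytest tests, so a local run looks roughly like the following; the test path and filter are assumptions, and the evals documentation has the exact invocation:

```bash
# The test path and -k filter below are assumptions; see the evals docs for the real layout.
poetry run pytest tests/llm/test_ask_holmes.py -k "easy"
```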
Distributed under the MIT License. See LICENSE.txt for more information.
If you have any questions, feel free to message us on robustacommunity.slack.com.
Please read our CONTRIBUTING.md for guidelines and instructions.
For help, contact us on Slack or ask DeepWiki AI your questions.