netdata
diff --git a/docs/category-overview-pages/machine-learning-and-assisted-troubleshooting.md b/docs/category-overview-pages/machine-learning-and-assisted-troubleshooting.md
Lines changed: 45 additions & 10 deletions
diff --git a/docs/metric-correlations.md b/docs/metric-correlations.md
Lines changed: 262 additions & 31 deletions
@@ -1,19 +1,54 @@
-# Machine Learning and Anomaly Detection
+# Netdata AI
 
-Netdata includes advanced Machine Learning capabilities to help you detect and resolve anomalies in your infrastructure before they escalate into critical issues. These features provide real-time insights and proactive monitoring to improve system reliability.
+Boost your monitoring and troubleshooting capabilities with Netdata's AI-powered features.
 
-## Key Features
+Netdata AI helps you **detect anomalies, understand metric relationships, and resolve issues quickly** with intelligent assistance, all designed to make your infrastructure management smarter, faster, and more reliable.
 
-### Anomaly Detection with K-Means Clustering
+## What Can Netdata AI Do For You?
 
-Netdata trains K-means clustering models to detect anomalies in your infrastructure. These models power the [Anomaly Advisor](/docs/dashboards-and-charts/anomaly-advisor-tab.md), which visually highlights anomalies on the dashboard, allowing you to quickly identify and investigate unexpected behavior.
+Netdata AI combines powerful machine learning capabilities with intuitive interfaces to help you:
 
-### Metric Correlations
+1. **Detect anomalies automatically** before they escalate into critical issues
+2. **Understand relationships** between metrics during troubleshooting
+3. **Get expert guidance** when resolving alerts and performance problems
 
-Netdata enables metric correlation analysis through the dashboard. This feature uses the [Two-sample Kolmogorov-Smirnov test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov%E2%80%93Smirnov_test) and volume heuristic measures to help you understand relationships between different metrics and identify potential causes of anomalies.
+## Machine Learning and Anomaly Detection
 
-### Netdata Assistant for Troubleshooting
+Our ML-powered anomaly detection works silently in the background, monitoring your metrics and identifying unusual patterns.
 
-The [Netdata Assistant](/docs/netdata-assistant.md) provides AI-driven assistance for troubleshooting alerts and anomalies. You can interact with it directly to get explanations, recommendations, and next steps based on detected anomalies and system behavior.
+| Feature                          | What It Does For You                                                         |
+|----------------------------------|------------------------------------------------------------------------------|
+| **Unsupervised Learning**        | Works automatically without requiring manual training or labeling of data    |
+| **Multiple Model Consensus**     | Reduces false positives by 99% by requiring agreement across multiple models |
+| **Real-time Anomaly Bits**       | Flags unusual metrics instantly, with zero storage overhead                  |
+| **Anomaly Rate Visualization**   | Highlights anomalous time periods in your dashboard for quick investigation  |
+| **Node-Level Anomaly Detection** | Identifies when your entire system is behaving unusually                     |
+| **Metric Correlations**          | Helps you find relationships between metrics to pinpoint root causes         |
 
-These Machine Learning features enhance observability and streamline incident response, helping you maintain system health with greater efficiency.
+Learn more in the [Machine Learning and Anomaly Detection](/src/ml/README.md) documentation.
+
+## Netdata Assistant
+
+When alerts trigger or anomalies emerge, Netdata Assistant serves as your AI-powered troubleshooting companion.
+
+| Feature                    | What It Does For You                                                  |
+|----------------------------|-----------------------------------------------------------------------|
+| **Alert Context**          | Explains what each alert means and why you should care about it       |
+| **Guided Troubleshooting** | Offers step-by-step instructions tailored to your specific situation  |
+| **Persistent Window**      | Follows you throughout your dashboards as you investigate issues      |
+| **Curated Resources**      | Provides links to relevant documentation to deepen your understanding |
+| **Time-Saving**            | Eliminates the need for searching documentation or online forums      |
+
+Learn more about [Netdata Assistant](/docs/netdata-assistant.md) and how it helps streamline your troubleshooting workflow.
+
+## Getting Started
+
+Netdata AI features are enabled by default with the standard installation. The machine learning capabilities require the `dbengine` database mode, which is the default setting.
+
+To start exploring:
+
+1. **Anomaly Detection**: Check the [Anomaly Advisor tab](/docs/dashboards-and-charts/anomaly-advisor-tab.md) to see detected anomalies
+2. **Metric Correlations**: Use the Metric Correlations button in the dashboard to analyze relationships between metrics
+3. **Netdata Assistant**: Click the Assistant button in the Alerts tab when troubleshooting alerts
+
+These AI features work seamlessly with Netdata's other capabilities, enhancing your overall monitoring and troubleshooting experience without requiring any AI expertise.
@@ -1,66 +1,297 @@
 # Metric Correlations
 
-The Metric Correlations feature helps you quickly identify metrics and charts relevant to a specific time window of interest, allowing for faster root cause analysis.
+The **Metric Correlations** feature helps you quickly identify metrics and charts relevant to a specific time window of interest, allowing for faster root cause analysis.
 
-By filtering the standard Netdata dashboard to display only the most relevant charts, Metric Correlations makes it easier to pinpoint anomalies and investigate issues.
+:::tip
 
-Since it leverages every available metric in your infrastructure with up to 1-second granularity, Metric Correlations provides highly accurate insights.
+By filtering your standard Netdata dashboard to **display only the most relevant charts**, Metric Correlations makes it easier for you to pinpoint anomalies and investigate issues.
+
+:::
+
+Since it leverages every available metric in your infrastructure with up to 1-second granularity, **Metric Correlations provides you with highly accurate insights**.
 
 ## Using Metric Correlations
 
 When viewing the [Metrics tab or a single-node dashboard](/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md), you'll find the **Metric Correlations** button in the top-right corner.
 
-To start:
+<details>
+<summary><strong>To start:</strong></summary><br/>
 
 1. Click **Metric Correlations**.
-2. [Highlight a selection of metrics](/docs/dashboards-and-charts/netdata-charts.md#highlight) on a single chart. The selected timeframe must be at least 15 seconds.
-3. The menu displays details about the selected area and reference baseline. Metric Correlations compares the highlighted window to a reference baseline, which is four times its length and precedes it immediately.
-4. Click **Find Correlations**. This button is only active if a valid timeframe is selected.
-5. The process evaluates all available metrics and returns a filtered Netdata dashboard showing only the most changed metrics between the baseline and the highlighted window.
+2. Highlight a selection of metrics on a single chart. **The selected timeframe must be at least 15 seconds**.
+3. The menu displays details about your selected area and reference baseline. Metric Correlations compares your highlighted window to a reference baseline, which is four times its length and precedes it immediately.
+4. Click **Find Correlations**.
+
+:::note
+
+This button is only active if you've selected a valid timeframe.
+
+:::
+
+5. **The process evaluates all your available metrics and returns a filtered Netdata dashboard** showing only the most changed metrics between the baseline and your highlighted window.
 6. If needed, select another window and press **Find Correlations** again to refine your analysis.
 
+</details>
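The baseline rule in step 3 can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic only, not Netdata's code: the function name `baseline_window`, the constant names, and the use of Unix timestamps in seconds are assumptions made for the example.

```python
MIN_HIGHLIGHT_SECONDS = 15  # minimum highlight length stated in step 2
BASELINE_MULTIPLIER = 4     # baseline is four times the highlight length


def baseline_window(highlight_start: int, highlight_end: int) -> tuple[int, int]:
    """Return (start, end) of the baseline that immediately precedes the highlight."""
    length = highlight_end - highlight_start
    if length < MIN_HIGHLIGHT_SECONDS:
        raise ValueError(
            f"highlight must be at least {MIN_HIGHLIGHT_SECONDS} seconds, got {length}"
        )
    # The baseline ends exactly where the highlight starts.
    return highlight_start - BASELINE_MULTIPLIER * length, highlight_start


# A 60-second highlight is compared against the 240 seconds before it.
print(baseline_window(1684852510, 1684852570))  # (1684852270, 1684852510)
```

A longer highlight therefore pulls in a proportionally longer baseline, which is worth keeping in mind when the period before an incident was itself unusual.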
+
+## Integration with Anomaly Detection
+
+You can combine Metric Correlations with Anomaly Detection for powerful troubleshooting:
+
+:::tip
+
+When you notice an anomaly in your system, use Metric Correlations with the **Anomaly Rate** data type to quickly identify which metrics are contributing to the anomalous behavior.
+
+:::
+
+### How to Use Together
+
+```mermaid
+flowchart TD
+    %% Node styling
+    classDef neutral fill:#f9f9f9,stroke:#000000,color:#000000,stroke-width:2px
+    classDef success fill:#4caf50,stroke:#000000,color:#000000,stroke-width:2px
+    classDef warning fill:#ffeb3b,stroke:#000000,color:#000000,stroke-width:2px
+    classDef danger fill:#f44336,stroke:#000000,color:#000000,stroke-width:2px
+    
+    A[Spot a spike in the<br/>node anomaly rate chart] --> B[Highlight that<br/>time period]
+    B --> C[Select Anomaly Rate<br/>as data type<br/>and Volume as method]
+    C --> D[Click Find Correlations]
+    D --> E[Review metrics with<br/>highest anomaly rates]
+    E --> F[Examine these metrics<br/>in detail to determine<br/>root cause]
+    
+    %% Apply styles
+    class A,B neutral
+    class C,D warning
+    class E success
+    class F danger
+```
+
+:::tip
+
+**This workflow helps you move from detecting** that *"something is wrong"* **to understanding** exactly which components are behaving abnormally, significantly reducing your troubleshooting time.
+
+:::
+
+## API Access
+
+You can access anomaly detection data and use it with metric correlations through Netdata's API:
+
+<details>
+<summary><strong>Querying Anomaly Bits</strong></summary><br/>
+
+To get the anomaly bits for any metric, add the `options=anomaly-bit` parameter to your API query:
+
+```
+https://your-netdata-node/api/v1/data?chart=system.cpu&dimensions=user&after=-60&options=anomaly-bit
+```
+
+Sample response:
+
+```json
+{
+  "labels": [
+    "time",
+    "user"
+  ],
+  "data": [
+    [
+      1684852570,
+      0
+    ],
+    [
+      1684852569,
+      0
+    ],
+    [
+      1684852568,
+      0
+    ],
+    [
+      1684852567,
+      0
+    ],
+    [
+      1684852566,
+      0
+    ],
+    [
+      1684852565,
+      0
+    ],
+    [
+      1684852564,
+      0
+    ],
+    [
+      1684852563,
+      0
+    ],
+    [
+      1684852562,
+      0
+    ],
+    [
+      1684852561,
+      0
+    ]
+  ]
+}
+```
+
+</details>
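As a rough sketch of how a client might work with this query, the Python below builds the same URL and picks out the anomalous timestamps from a response shaped like the sample. The helper names `anomaly_bit_url` and `anomalous_timestamps` are hypothetical, and treating any non-zero value as "anomalous" is an assumption of this example. No network request is made.

```python
from urllib.parse import urlencode


def anomaly_bit_url(host: str, chart: str, dimension: str, after: int) -> str:
    """Build the /api/v1/data query shown above for one dimension of one chart."""
    query = urlencode({
        "chart": chart,
        "dimensions": dimension,
        "after": after,
        "options": "anomaly-bit",
    })
    return f"https://{host}/api/v1/data?{query}"


def anomalous_timestamps(response: dict, dimension: str) -> list[int]:
    """Return timestamps whose value for `dimension` is non-zero (assumed anomalous)."""
    column = response["labels"].index(dimension)
    return [row[0] for row in response["data"] if row[column] != 0]


# First three rows of the sample response above: no anomalies were flagged.
sample = {
    "labels": ["time", "user"],
    "data": [[1684852570, 0], [1684852569, 0], [1684852568, 0]],
}
print(anomaly_bit_url("your-netdata-node", "system.cpu", "user", -60))
print(anomalous_timestamps(sample, "user"))  # []
```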
+
+<details>
+<summary><strong>Querying Anomaly Rates</strong></summary><br/>
+
+For anomaly rates over a time window, use the same parameter with aggregated data. In the query below, `points=10` over a 600-second window makes each returned value cover 60 seconds:
+
+```
+https://your-netdata-node/api/v1/data?chart=system.cpu&dimensions=user&after=-600&before=0&points=10&options=anomaly-bit
+```
+
+Sample response showing the percentage of time each metric was anomalous:
+
+```json
+{
+  "labels": [
+    "time",
+    "user"
+  ],
+  "data": [
+    [
+      1684852770,
+      0
+    ],
+    [
+      1684852710,
+      20
+    ],
+    [
+      1684852650,
+      0
+    ],
+    [
+      1684852590,
+      10
+    ],
+    [
+      1684852530,
+      0
+    ],
+    [
+      1684852470,
+      0
+    ],
+    [
+      1684852410,
+      30
+    ],
+    [
+      1684852350,
+      0
+    ],
+    [
+      1684852290,
+      0
+    ],
+    [
+      1684852230,
+      0
+    ]
+  ]
+}
+```
+
+</details>
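One way to use such a response is to summarize it before alerting on it. The sketch below computes the mean anomaly rate and lists the intervals at or above a cutoff, using the values from the sample response. The function name `summarize_anomaly_rates` and the default `threshold` of 10 are illustrative assumptions, not Netdata settings.

```python
# Values copied from the sample response above.
SAMPLE_RESPONSE = {
    "labels": ["time", "user"],
    "data": [
        [1684852770, 0], [1684852710, 20], [1684852650, 0], [1684852590, 10],
        [1684852530, 0], [1684852470, 0], [1684852410, 30], [1684852350, 0],
        [1684852290, 0], [1684852230, 0],
    ],
}


def summarize_anomaly_rates(response: dict, dimension: str, threshold: float = 10.0):
    """Return (mean anomaly rate, timestamps whose rate is >= threshold)."""
    column = response["labels"].index(dimension)
    rates = [row[column] for row in response["data"]]
    mean_rate = sum(rates) / len(rates) if rates else 0.0
    flagged = [row[0] for row in response["data"] if row[column] >= threshold]
    return mean_rate, flagged


mean_rate, flagged = summarize_anomaly_rates(SAMPLE_RESPONSE, "user")
print(mean_rate)  # 6.0
print(flagged)    # [1684852710, 1684852590, 1684852410]
```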
+
+:::tip
+
+You can programmatically access this data to build custom dashboards or alerts based on anomaly patterns in your infrastructure.
+
+:::
+
 ## Metric Correlations Options
 
-Metric Correlations offers adjustable parameters for deeper data exploration. Since different data types and incidents require different approaches, these settings allow for flexible analysis.
+Metric Correlations offers adjustable parameters for deeper data exploration. Since different data types and incidents require different approaches, **these settings allow for flexible analysis**.
 
-### Method
+<details>
+<summary><strong>Method</strong></summary><br/>
 
 Two algorithms are available for scoring metrics based on changes between the baseline and highlight windows:
 
-- **`KS2` (Kolmogorov-Smirnov Test)**: A statistical method comparing distributions between the highlighted and baseline windows to detect significant changes. [Implementation details](https://github.com/netdata/netdata/blob/d917f9831c0a1638ef4a56580f321eb6c9a88037/database/metric_correlations.c#L212).
-- **`Volume`**: A heuristic approach based on percentage change in averages, designed to handle edge cases. [Implementation details](https://github.com/netdata/netdata/blob/d917f9831c0a1638ef4a56580f321eb6c9a88037/database/metric_correlations.c#L516).
+* **`KS2` (Kolmogorov-Smirnov Test)**: A statistical method comparing distributions between the highlighted and baseline windows to detect significant changes. [Implementation details](https://github.com/netdata/netdata/blob/d917f9831c0a1638ef4a56580f321eb6c9a88037/database/metric_correlations.c#L212).
+* **`Volume`**: A heuristic approach based on percentage change in averages, designed to handle edge cases. [Implementation details](https://github.com/netdata/netdata/blob/d917f9831c0a1638ef4a56580f321eb6c9a88037/database/metric_correlations.c#L516).
+
+</details>
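To make the difference between the two scoring ideas concrete, here is a simplified Python sketch. It is not Netdata's implementation (see the linked C source for that): `ks_statistic` computes the textbook two-sample Kolmogorov-Smirnov statistic, and `volume_change` is a plain percentage change in averages. Both function names are assumptions of this example.

```python
import math
from bisect import bisect_right


def ks_statistic(baseline: list[float], highlight: list[float]) -> float:
    """Largest gap between the empirical CDFs of the two windows (0 = identical)."""
    a, b = sorted(baseline), sorted(highlight)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def volume_change(baseline: list[float], highlight: list[float]) -> float:
    """Percentage change of the highlight average relative to the baseline average."""
    base_avg = sum(baseline) / len(baseline)
    high_avg = sum(highlight) / len(highlight)
    if base_avg == 0:
        # A metric that was inactive and then became active has no finite ratio.
        return 0.0 if high_avg == 0 else math.inf
    return (high_avg - base_avg) / abs(base_avg) * 100


# Same average, wider spread: the average-based score sees nothing, KS does.
steady = [4, 5, 6, 5, 4, 6, 5, 5]
spread = [1, 9]
print(volume_change(steady, spread))  # 0.0
print(ks_statistic(steady, spread))   # 0.5
```

The last example shows why a distribution test can catch a change in variance that a comparison of averages misses.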
+
+<details>
+<summary><strong>Aggregation</strong></summary><br/>
 
-### Aggregation
+To accommodate different window lengths, Netdata aggregates your raw data as needed. The default aggregation method is `Average`, but you can also choose `Median`, `Min`, `Max`, or `Stddev`.
+
+</details>
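As an illustration of what choosing an aggregation method means, the sketch below collapses raw samples into fixed-size groups with each of the five methods. The `aggregate` helper and its `group_size` parameter are hypothetical, and using the population standard deviation for `Stddev` is an assumption of this example, not a statement about Netdata's behavior.

```python
import statistics

# Map the option names from the text to Python reducers.
AGGREGATORS = {
    "Average": statistics.fmean,
    "Median": statistics.median,
    "Min": min,
    "Max": max,
    "Stddev": statistics.pstdev,  # assumption: population standard deviation
}


def aggregate(values: list[float], group_size: int, method: str = "Average") -> list[float]:
    """Reduce each consecutive run of `group_size` samples to a single point."""
    reducer = AGGREGATORS[method]
    return [
        reducer(values[i:i + group_size])
        for i in range(0, len(values), group_size)
    ]


raw = [1, 2, 3, 4, 5, 6]
print(aggregate(raw, 3))         # [2.0, 5.0]
print(aggregate(raw, 3, "Max"))  # [3, 6]
```

`Max` or `Stddev` can preserve short spikes that `Average` smooths away, which matters when the highlighted window is much shorter than the baseline.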
 
-To accommodate different window lengths, Netdata aggregates raw data as needed. The default aggregation method is `Average`, but you can also choose `Median`, `Min`, `Max`, or `Stddev`.
+<details>
+<summary><strong>Data Type</strong></summary><br/>
 
-### Data Type
+Netdata assigns an [Anomaly Bit](https://github.com/netdata/netdata/tree/master/src/ml#anomaly-bit) to each of your metrics in real-time, flagging whether it deviates significantly from normal behavior. You can analyze either raw data or anomaly rates:
 
-Netdata assigns an [Anomaly Bit](https://github.com/netdata/netdata/tree/master/src/ml#anomaly-bit) to each metric in real-time, flagging whether it deviates significantly from normal behavior. You can analyze either raw data or anomaly rates:
+* **`Metrics`**: Runs Metric Correlations on your raw metric values.
+* **`Anomaly Rate`**: Runs Metric Correlations on anomaly rates for each of your metrics.
 
-- **`Metrics`**: Runs Metric Correlations on raw metric values.
-- **`Anomaly Rate`**: Runs Metric Correlations on anomaly rates for each metric.
+</details>
 
 ## Metric Correlations on the Agent
 
 Metric Correlations (MC) requests to Netdata Cloud are handled in two ways:
 
-1. If MC is enabled on any node, the request is routed to the highest-level node (a Parent node or the node itself).
-2. If MC is not enabled on any node, Netdata Cloud processes the request by collecting data from nodes and computing correlations on its backend.
+1. **If MC is enabled** on any of your nodes, the request is routed to the highest-level node (a Parent node or the node itself).
+2. **If MC is not enabled** on any of your nodes, Netdata Cloud processes the request by collecting data from your nodes and computing correlations on its backend.
+
+## Interpreting Combined Results
+
+When you use Metric Correlations together with Anomaly Detection, you'll want to understand how to interpret the results:
+
+:::tip
+
+**High anomaly rates combined with significant metric changes** often indicate genuine issues rather than false positives.
+
+:::
+
+Here's how to interpret different scenarios:
+
+| Anomaly Rate | Metric Correlation | Interpretation                                       |
+|--------------|--------------------|------------------------------------------------------|
+| High         | Strong             | Likely a significant issue affecting system behavior |
+| High         | Weak               | Possible edge case or intermittent issue             |
+| Low          | Strong             | Normal but significant change in system behavior     |
+| Low          | Weak               | Likely normal system operation                       |
+
+:::tip
+
+By examining both the anomaly rate and the correlation strength, you can prioritize your troubleshooting efforts more effectively.
+
+:::
 
 ## Usage Tips
 
-- When running Metric Correlations from the [Metrics tab](/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md) across multiple nodes, refine your results by grouping by node:
-    1. Run MC on all nodes if you're unsure which ones are relevant.
-    2. Group the most interesting charts by node to determine whether changes affect all nodes or just a subset.
-    3. If a subset of nodes stands out, filter for those nodes and rerun MC to get more precise results.
+:::tip
+
+When running Metric Correlations from the [Metrics tab](/docs/dashboards-and-charts/metrics-tab-and-single-node-tabs.md) across multiple nodes, refine your results by grouping by node:
+
+1. Run MC on all your nodes if you're unsure which ones are relevant.
+2. Group the most interesting charts by node to determine whether changes affect all your nodes or just a subset.
+3. If a subset of your nodes stands out, filter for those nodes and rerun MC to get more precise results.
+
+Choose the **`Volume`** algorithm for sparse metrics (e.g., request latency with few requests). Otherwise, use **`KS2`**.
+
+- **`KS2`** is ideal for detecting complex distribution changes in your metrics, such as shifts in variance.
+- **`Volume`** is better for detecting metrics that were inactive and then spiked (or vice versa).
+
+**Example:**
 
-- Choose the **`Volume`** algorithm for sparse metrics (e.g., request latency with few requests). Otherwise, use **`KS2`**.
-    - **`KS2`** is ideal for detecting complex distribution changes, such as shifts in variance.
-    - **`Volume`** is better for detecting metrics that were inactive and then spiked (or vice versa).
+- `Volume` can highlight network traffic suddenly turning on in your system.
+- `KS2` can detect entropy distribution changes in your data that `Volume` misses.
 
-  **Example:**
-    - `Volume` can highlight network traffic suddenly turning on.
-    - `KS2` can detect entropy distribution changes missed by `Volume`.
+Combine **`Volume`** and **`Anomaly Rate`** to identify the most anomalous metrics within your selected timeframe. Expand the anomaly rate chart to visualize results more clearly.
 
-- Combine **`Volume`** and **`Anomaly Rate`** to identify the most anomalous metrics within a timeframe. Expand the anomaly rate chart to visualize results more clearly.
+:::