Fix local storage pool disconnect issue #11200

Merged
1 commit merged on Jul 28, 2025

Conversation

@harikrishna-patnala (Contributor) commented Jul 15, 2025:

Description

This PR fixes issue #11104, which is a regression introduced by #10381. Prior to #10381, local storage connection management was skipped in StoragePoolMonitor; this PR restores that behaviour for local storage pools.
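
As a rough illustration of the idea only (not the actual diff, which touches two lines in StoragePoolMonitor per the coverage report below), the sketch shows a host-connect handler that skips connection management for host-scoped (local) pools, so the agent is never asked to tear down its own local libvirt pool on reconnect. All class, method, and field names here are hypothetical stand-ins for the CloudStack ones.

// Illustrative sketch only (hypothetical names, not the actual CloudStack patch):
// on host (re)connect, skip connection management for host-scoped (local) pools.
import java.util.List;

public class StoragePoolConnectSketch {

    enum ScopeType { HOST, CLUSTER, ZONE }

    static class StoragePool {
        long id;
        ScopeType scope;

        StoragePool(long id, ScopeType scope) {
            this.id = id;
            this.scope = scope;
        }

        boolean isLocal() {
            return scope == ScopeType.HOST;
        }
    }

    interface StorageManager {
        // Stand-in for the manager call that (re)connects a host to a shared pool.
        void connectHostToSharedPool(long hostId, long poolId);
    }

    private final StorageManager storageManager;

    StoragePoolConnectSketch(StorageManager storageManager) {
        this.storageManager = storageManager;
    }

    void processConnect(long hostId, List<StoragePool> pools) {
        for (StoragePool pool : pools) {
            if (pool.isLocal()) {
                // Local pools are managed on the host itself; handling them here
                // is what led to DeleteStoragePoolCommand being sent on reconnect.
                continue;
            }
            storageManager.connectHostToSharedPool(hostId, pool.id);
        }
    }
}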

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

How Has This Been Tested?

  1. Set up an environment with a KVM host and local storage.
  2. Restarted the KVM agent.
  3. Before this fix, the local storage was disconnected from the host after the agent restart, so it could not be used even though it still appeared in the UI. With the fix, the storage connection is preserved and the local storage can be used for volume provisioning.

How did you try to break this feature and the system with this change?

@harikrishna-patnala (Contributor, Author) commented:

@blueorangutan package

@blueorangutan commented:

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@harikrishna-patnala linked an issue Jul 15, 2025 that may be closed by this pull request

codecov bot commented Jul 15, 2025

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.99%. Comparing base (1a9efe8) to head (0c5100d).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
...com/cloud/storage/listener/StoragePoolMonitor.java 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11200      +/-   ##
============================================
- Coverage     17.00%   16.99%   -0.01%     
+ Complexity    14719    14715       -4     
============================================
  Files          5832     5832              
  Lines        517561   517562       +1     
  Branches      62982    62983       +1     
============================================
- Hits          87986    87976      -10     
- Misses       419641   419652      +11     
  Partials       9934     9934              
Flag Coverage Δ
uitests 3.82% <ø> (ø)
unittests 17.95% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@blueorangutan commented:

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14191

@sureshanaparti (Contributor) left a comment:

clgtm

@sureshanaparti (Contributor) commented:

@blueorangutan test

@blueorangutan commented:

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@DaanHoogland (Contributor) left a comment:

makes sense, clgtm.

@blueorangutan commented:

[SF] Trillian test result (tid-13863)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 55974 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr11200-t13863-kvm-ol8.zip
Smoke tests completed. 142 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@harikrishna-patnala force-pushed the localStorageDisconnectIssue branch from 26dcad9 to 0c5100d on July 28, 2025 10:32
@harikrishna-patnala (Contributor, Author) commented:

@blueorangutan package

1 similar comment
@sureshanaparti (Contributor) commented:

@blueorangutan package

@blueorangutan commented:

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan commented:

Packaging result [SF]: ✔️ el8 ✔️ el9 ✖️ debian ✔️ suse15. SL-JID 14418

@sureshanaparti (Contributor) commented:

Verified the fix with the steps below:

  • Enabled local storage in the zone (and restarted cloudstack-management).
  • Created a compute offering for local storage.
  • Deployed an instance on the local storage (using the above offering).
  • Restarted the agent on the KVM host, then stopped and started the instance deployed above.

Before FIX => Local pools removed after agent restart

MS:

2025-07-28 11:03:31,829 DEBUG [c.c.s.StorageManagerImpl] (AgentConnectTaskPool-5:[ctx-8e591dc2]) (logid:b972c864) Removing pool StoragePool {"id":3,"name":"pr10149-t13934-kvm-ol8-kvm1-local-b4c53d3b","poolType":"Filesystem","uuid":"b4c53d3b-1006-4b7f-9e14-fcdb28e37cfc"} from host Host {"id":1,"name":"pr10149-t13934-kvm-ol8-kvm1","type":"Routing","uuid":"80d82041-269f-4838-b98f-1e4a7d4ec9a5"}
2025-07-28 11:03:31,832 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentConnectTaskPool-5:[ctx-8e591dc2]) (logid:b972c864) Wait time setting on com.cloud.agent.api.DeleteStoragePoolCommand is 1800 seconds
...
2025-07-28 11:03:35,003 DEBUG [c.c.s.StorageManagerImpl] (AgentConnectTaskPool-6:[ctx-3f61b71e]) (logid:161e6c57) Removing pool StoragePool {"id":4,"name":"pr10149-t13934-kvm-ol8-kvm2-local-cc67e53b","poolType":"Filesystem","uuid":"cc67e53b-d383-47c4-8a6a-e95992f477fb"} from host Host {"id":2,"name":"pr10149-t13934-kvm-ol8-kvm2","type":"Routing","uuid":"9a3d8e48-777a-4ba8-8b3c-4036ffe6e9ea"}
2025-07-28 11:03:35,006 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentConnectTaskPool-6:[ctx-3f61b71e]) (logid:161e6c57) Wait time setting on com.cloud.agent.api.DeleteStoragePoolCommand is 1800 seconds
mysql> SELECT uuid, name, pool_type, scope, storage_provider_name, host_address, path, status FROM cloud.storage_pool WHERE removed IS NULL;
+--------------------------------------+--------------------------------------------+-------------------+---------+-----------------------+--------------+---------------------------------------------------------------------+--------+
| uuid                                 | name                                       | pool_type         | scope   | storage_provider_name | host_address | path                                                                | status |
+--------------------------------------+--------------------------------------------+-------------------+---------+-----------------------+--------------+---------------------------------------------------------------------+--------+
| 89d82eb8-a6f3-39d5-8316-b0c96974fe09 | pr10149-t13934-kvm-ol8-kvm-pri1            | NetworkFilesystem | CLUSTER | DefaultPrimary        | 10.0.32.4    | /acs/primary/pr10149-t13934-kvm-ol8/pr10149-t13934-kvm-ol8-kvm-pri1 | Up     |
| 5bd12560-426a-30cf-b5c8-dce10adf660f | pr10149-t13934-kvm-ol8-kvm-pri2            | NetworkFilesystem | CLUSTER | DefaultPrimary        | 10.0.32.4    | /acs/primary/pr10149-t13934-kvm-ol8/pr10149-t13934-kvm-ol8-kvm-pri2 | Up     |
| b4c53d3b-1006-4b7f-9e14-fcdb28e37cfc | pr10149-t13934-kvm-ol8-kvm1-local-b4c53d3b | Filesystem        | HOST    | DefaultPrimary        | 10.0.34.122  | /var/lib/libvirt/images                                             | Up     |
| cc67e53b-d383-47c4-8a6a-e95992f477fb | pr10149-t13934-kvm-ol8-kvm2-local-cc67e53b | Filesystem        | HOST    | DefaultPrimary        | 10.0.35.31   | /var/lib/libvirt/images                                             | Up     |
+--------------------------------------+--------------------------------------------+-------------------+---------+-----------------------+--------------+---------------------------------------------------------------------+--------+
4 rows in set (0.00 sec)

KVM Host 1:

2025-07-28 11:03:31,881 DEBUG [cloud.agent.Agent] (AgentRequest-Handler-2:[]) (logid:b972c864) Processing command: com.cloud.agent.api.DeleteStoragePoolCommand
2025-07-28 11:03:31,881 INFO  [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-2:[]) (logid:b972c864) Attempting to remove storage pool b4c53d3b-1006-4b7f-9e14-fcdb28e37cfc from libvirt
Before Agent Restart:
[root@pr10149-t13934-kvm-ol8-kvm1 ~]# virsh pool-list
 Name                                   State    Autostart
------------------------------------------------------------
 5bd12560-426a-30cf-b5c8-dce10adf660f   active   no
 89d82eb8-a6f3-39d5-8316-b0c96974fe09   active   no
 b4c53d3b-1006-4b7f-9e14-fcdb28e37cfc   active   no
 
 After Agent Restart:
 [root@pr10149-t13934-kvm-ol8-kvm1 ~]# virsh pool-list
 Name                                   State    Autostart
------------------------------------------------------------
 5bd12560-426a-30cf-b5c8-dce10adf660f   active   no
 89d82eb8-a6f3-39d5-8316-b0c96974fe09   active   no

KVM Host 2:

2025-07-28 11:03:35,056 DEBUG [cloud.agent.Agent] (AgentRequest-Handler-2:[]) (logid:161e6c57) Processing command: com.cloud.agent.api.DeleteStoragePoolCommand
2025-07-28 11:03:35,057 INFO  [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-2:[]) (logid:161e6c57) Attempting to remove storage pool cc67e53b-d383-47c4-8a6a-e95992f477fb from libvirt
...
2025-07-28 11:03:43,991 DEBUG [cloud.agent.Agent] (AgentRequest-Handler-5:[]) (logid:4e7c780c) Processing command: com.cloud.agent.api.GetStorageStatsCommand
2025-07-28 11:03:43,991 INFO  [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-5:[]) (logid:4e7c780c) Trying to fetch storage pool cc67e53b-d383-47c4-8a6a-e95992f477fb from libvirt
2025-07-28 11:03:43,991 DEBUG [kvm.resource.LibvirtConnection] (AgentRequest-Handler-5:[]) (logid:4e7c780c) Looking for libvirtd connection at: qemu:///system
2025-07-28 11:03:43,993 DEBUG [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-5:[]) (logid:4e7c780c) Could not find storage pool cc67e53b-d383-47c4-8a6a-e95992f477fb in libvirt
...
2025-07-28 11:07:52,327 INFO  [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-1:[]) (logid:6ad357e8) Trying to fetch storage pool cc67e53b-d383-47c4-8a6a-e95992f477fb from libvirt
2025-07-28 11:07:52,327 DEBUG [kvm.resource.LibvirtConnection] (AgentRequest-Handler-1:[]) (logid:6ad357e8) Looking for libvirtd connection at: qemu:///system
2025-07-28 11:07:52,328 DEBUG [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-1:[]) (logid:6ad357e8) Could not find storage pool cc67e53b-d383-47c4-8a6a-e95992f477fb in libvirt
2025-07-28 11:07:52,331 DEBUG [kvm.storage.KVMStoragePoolManager] (AgentRequest-Handler-1:[]) (logid:6ad357e8) Failed to find volume:8ae0dc00-3299-4e65-967a-f0e6fe97fa32 due to com.cloud.utils.exception.CloudRuntimeException: Could not fetch storage pool cc67e53b-d383-47c4-8a6a-e95992f477fb from libvirt due to org.libvirt.LibvirtException: Storage pool not found: no storage pool with matching uuid 'cc67e53b-d383-47c4-8a6a-e95992f477fb', retry:0
...
2025-07-28 11:12:16,505 INFO  [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-1:[]) (logid:6ad357e8) Trying to fetch storage pool cc67e53b-d383-47c4-8a6a-e95992f477fb from libvirt
2025-07-28 11:12:16,505 DEBUG [kvm.resource.LibvirtConnection] (AgentRequest-Handler-1:[]) (logid:6ad357e8) Looking for libvirtd connection at: qemu:///system
2025-07-28 11:12:16,506 DEBUG [kvm.storage.LibvirtStorageAdaptor] (AgentRequest-Handler-1:[]) (logid:6ad357e8) Could not find storage pool cc67e53b-d383-47c4-8a6a-e95992f477fb in libvirt
2025-07-28 11:12:16,507 DEBUG [kvm.storage.KVMStoragePoolManager] (AgentRequest-Handler-1:[]) (logid:6ad357e8) Failed to find volume:8ae0dc00-3299-4e65-967a-f0e6fe97fa32 due to com.cloud.utils.exception.CloudRuntimeException: Could not fetch storage pool cc67e53b-d383-47c4-8a6a-e95992f477fb from libvirt due to org.libvirt.LibvirtException: Storage pool not found: no storage pool with matching uuid 'cc67e53b-d383-47c4-8a6a-e95992f477fb', retry:88
Before Agent Restart:
[root@pr10149-t13934-kvm-ol8-kvm2 ~]# virsh pool-list
 Name                                   State    Autostart
------------------------------------------------------------
 5bd12560-426a-30cf-b5c8-dce10adf660f   active   no
 89d82eb8-a6f3-39d5-8316-b0c96974fe09   active   no
 cc67e53b-d383-47c4-8a6a-e95992f477fb   active   no
 
 After Agent Restart:
[root@pr10149-t13934-kvm-ol8-kvm2 ~]# virsh pool-list
 Name                                   State    Autostart
------------------------------------------------------------
 5bd12560-426a-30cf-b5c8-dce10adf660f   active   no
 89d82eb8-a6f3-39d5-8316-b0c96974fe09   active   no

After FIX => Local storage pools are not removed during agent restart

MS:

mysql> SELECT uuid, name, pool_type, scope, storage_provider_name, host_address, path, status FROM cloud.storage_pool WHERE removed IS NULL;
+--------------------------------------+--------------------------------------------+-------------------+---------+-----------------------+--------------+---------------------------------------------------------------------+--------+
| uuid                                 | name                                       | pool_type         | scope   | storage_provider_name | host_address | path                                                                | status |
+--------------------------------------+--------------------------------------------+-------------------+---------+-----------------------+--------------+---------------------------------------------------------------------+--------+
| 2aa7a529-b09e-305d-b345-c4bf3c561842 | pr11200-t13938-kvm-ol8-kvm-pri1            | NetworkFilesystem | CLUSTER | DefaultPrimary        | 10.0.32.4    | /acs/primary/pr11200-t13938-kvm-ol8/pr11200-t13938-kvm-ol8-kvm-pri1 | Up     |
| 618a1bab-95a5-348b-9133-c355693e01fd | pr11200-t13938-kvm-ol8-kvm-pri2            | NetworkFilesystem | CLUSTER | DefaultPrimary        | 10.0.32.4    | /acs/primary/pr11200-t13938-kvm-ol8/pr11200-t13938-kvm-ol8-kvm-pri2 | Up     |
| 37044676-7f87-4b35-b83f-8af70e8748c6 | pr11200-t13938-kvm-ol8-kvm2-local-37044676 | Filesystem        | HOST    | DefaultPrimary        | 10.0.34.180  | /var/lib/libvirt/images                                             | Up     |
| 11aea44c-8240-4067-9151-84861ce00d3d | pr11200-t13938-kvm-ol8-kvm1-local-11aea44c | Filesystem        | HOST    | DefaultPrimary        | 10.0.32.223  | /var/lib/libvirt/images                                             | Up     |
+--------------------------------------+--------------------------------------------+-------------------+---------+-----------------------+--------------+---------------------------------------------------------------------+--------+
4 rows in set (0.00 sec)

KVM Host 1:

Before Agent Restart:
[root@pr11200-t13938-kvm-ol8-kvm1 ~]#  virsh pool-list
 Name                                   State    Autostart
------------------------------------------------------------
 11aea44c-8240-4067-9151-84861ce00d3d   active   no
 2aa7a529-b09e-305d-b345-c4bf3c561842   active   no
 618a1bab-95a5-348b-9133-c355693e01fd   active   no

 After Agent Restart:
 [root@pr11200-t13938-kvm-ol8-kvm1 ~]#  virsh pool-list
 Name                                   State    Autostart
------------------------------------------------------------
 11aea44c-8240-4067-9151-84861ce00d3d   active   no
 2aa7a529-b09e-305d-b345-c4bf3c561842   active   no
 618a1bab-95a5-348b-9133-c355693e01fd   active   no

KVM Host 2:

Before Agent Restart:
[root@pr11200-t13938-kvm-ol8-kvm2 ~]# virsh pool-list
 Name                                   State    Autostart
------------------------------------------------------------
 2aa7a529-b09e-305d-b345-c4bf3c561842   active   no
 37044676-7f87-4b35-b83f-8af70e8748c6   active   no
 618a1bab-95a5-348b-9133-c355693e01fd   active   no

After Agent Restart:
[root@pr11200-t13938-kvm-ol8-kvm2 ~]# virsh pool-list
 Name                                   State    Autostart
------------------------------------------------------------
 2aa7a529-b09e-305d-b345-c4bf3c561842   active   no
 37044676-7f87-4b35-b83f-8af70e8748c6   active   no
 618a1bab-95a5-348b-9133-c355693e01fd   active   no

@sureshanaparti merged commit 9fee6da into apache:main on Jul 28, 2025
23 of 26 checks passed
The github-project-automation bot moved this from In Progress to Done in Apache CloudStack 4.21.0 on Jul 28, 2025
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

KVM local storage pool gets removed from hosts
5 participants