Fix restore from NAS backup when the data disk is older than the root disk. #11258


Merged — 2 commits merged into apache:4.20 on Jul 23, 2025

Conversation

@abh1sar (Collaborator) commented Jul 22, 2025

Description

This PR fixes #11257.

  • Libvirt backs up the volumes in the order of their deviceId.
  • However, during restore, volumes are processed based on their id in the volumes table.
  • If a data volume has a lower id in the volumes table than the root volume, the restore mistakenly treats the data disk as the root disk.
  • The restore then fails silently, without reporting any error.
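The ordering mismatch above can be sketched as follows. This is a minimal illustration, not CloudStack's actual code: `VolumeRecord`, its field names, and the sample ids are hypothetical stand-ins for rows in the volumes table. The point is that sorting by deviceId (root disk = deviceId 0) matches the order in which Libvirt wrote the backup files, whereas iterating in database-id order does not.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a row in the volumes table.
record VolumeRecord(long dbId, long deviceId, String name) {}

public class RestoreOrdering {
    // Libvirt writes backup files in deviceId order, so restore must consume
    // them in the same order. Sorting by deviceId instead of relying on
    // database-id order fixes the mismatch.
    public static List<VolumeRecord> restoreOrder(List<VolumeRecord> volumes) {
        return volumes.stream()
                .sorted(Comparator.comparingLong(VolumeRecord::deviceId))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Data disk created before the root disk: lower dbId, higher deviceId.
        List<VolumeRecord> fromDb = List.of(
                new VolumeRecord(41, 1, "DATA-41"),
                new VolumeRecord(42, 0, "ROOT-42"));
        // Iterating in table order would hand the data disk to the root slot;
        // sorting by deviceId puts the root disk first.
        System.out.println(restoreOrder(fromDb).get(0).name()); // ROOT-42
    }
}
```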

Another minor issue exists with Volume attach and detach operations on Instances that have an associated backup offering:

  • Attach and Detach operations use different logic when handling Instances with backup offerings.
  • Detach fails if the Instance was associated with a backup offering but has no backups yet.
  • In contrast, Attach succeeds under the same conditions.
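The inconsistency can be avoided by routing both operations through one shared check, as this sketch shows. All names here are hypothetical, not CloudStack's actual `VolumeApiServiceImpl` API; the idea is simply that attach and detach apply the same rule, so an Instance with a backup offering but no backups yet behaves identically in both paths.

```java
import java.util.Collections;
import java.util.List;

public class BackupAwareVolumeOps {
    // Shared validation used by both attach and detach (hypothetical rule:
    // block the operation only when backups actually exist).
    static void validateNoExistingBackups(List<String> backups) {
        if (!backups.isEmpty()) {
            throw new IllegalStateException(
                    "Volume attach/detach is not allowed while the Instance has backups");
        }
    }

    static String attach(List<String> backups) {
        validateNoExistingBackups(backups);
        return "attached";
    }

    static String detach(List<String> backups) {
        validateNoExistingBackups(backups); // same rule as attach, no special-casing
        return "detached";
    }

    public static void main(String[] args) {
        // Backup offering assigned, but no backups taken yet: both succeed.
        System.out.println(attach(Collections.emptyList()));  // attached
        System.out.println(detach(Collections.emptyList()));  // detached
    }
}
```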

Changes done

  1. Sort the volumes list by deviceId before restore, and also before saving the backedVolumes in the backup table.
  2. Return a proper error from LibvirtRestoreBackupCommand in case of any failure, instead of failing silently.
  3. Use consistent logic in the Attach and Detach Volume operations.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

  1. Verified with the test case mentioned in the issue; restore passed with the correct data.
  2. Repeated the test case from the issue with only the fix for returning a proper error from restore; restore failed with an error.
  3. Tested backup and restore with stopped and running Instances.
  4. Tested attach and detach volume on an Instance having a backup offering but no backups.
  5. Tested attach and detach volume on an Instance having backups.

How did you try to break this feature and the system with this change?


codecov bot commented Jul 22, 2025

Codecov Report

Attention: Patch coverage is 12.82051% with 34 lines in your changes missing coverage. Please review.

Project coverage is 16.15%. Comparing base (ba0204f) to head (d013c2f).
Report is 26 commits behind head on 4.20.

Files with missing lines Patch % Lines
...ce/wrapper/LibvirtRestoreBackupCommandWrapper.java 0.00% 28 Missing ⚠️
...rg/apache/cloudstack/backup/NASBackupProvider.java 0.00% 5 Missing ⚠️
...rg/apache/cloudstack/backup/BackupManagerImpl.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##               4.20   #11258    +/-   ##
==========================================
  Coverage     16.15%   16.15%            
+ Complexity    13277    13274     -3     
==========================================
  Files          5657     5656     -1     
  Lines        497939   497811   -128     
  Branches      60386    60372    -14     
==========================================
- Hits          80443    80436     -7     
+ Misses       408532   408423   -109     
+ Partials       8964     8952    -12     
Flag Coverage Δ
uitests 4.00% <ø> (+<0.01%) ⬆️
unittests 17.00% <12.82%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

@rajujith self-assigned this Jul 22, 2025
@sureshanaparti (Contributor):

@blueorangutan package

@blueorangutan:

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@sureshanaparti requested a review from Copilot July 22, 2025 10:43
@Copilot (Copilot AI) left a comment:

Pull Request Overview

This PR fixes a restore issue with NAS backups where the wrong disk could be identified as the root disk during restoration, causing silent failures. The fix ensures volumes are processed in the correct order by sorting by deviceId instead of relying on database ID order.

  • Sort volumes by deviceId during backup creation and restore operations to maintain correct disk ordering
  • Improve error handling in LibvirtRestoreBackupCommandWrapper to return proper errors instead of failing silently
  • Standardize backup validation logic between volume attach and detach operations

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:

  • BackupManagerImpl.java — Sort volumes by deviceId when creating volume info for backups
  • VolumeApiServiceImpl.java — Refactor backup validation logic and apply it consistently to both attach and detach
  • NASBackupProvider.java — Sort volumes by deviceId during backup and restore operations
  • LibvirtRestoreBackupCommandWrapper.java — Improve error handling to return proper error messages instead of failing silently
  • VolumeApiServiceImplTest.java — Update test method calls to reflect refactored validation method names

@blueorangutan:

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14294

@sureshanaparti (Contributor):

@blueorangutan test

@sureshanaparti (Contributor) left a comment:

clgtm

@sureshanaparti (Contributor):

@blueorangutan test

@blueorangutan:

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@DaanHoogland (Contributor) left a comment:

clgtm @abh1sar, but can you consider Copilot's comments?

@blueorangutan:

[SF] Trillian test result (tid-13843)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 53221 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr11258-t13843-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped test results are shown below: none.
@sureshanaparti (Contributor):

@blueorangutan package

@blueorangutan:

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@rajujith left a comment:

LGTM.

The reported issue is not present anymore. I followed the steps given in the issue that is fixed.

@rajujith removed their assignment Jul 23, 2025
@DaanHoogland merged commit 1b74c2d into apache:4.20 Jul 23, 2025
25 of 26 checks passed
@DaanHoogland deleted the restore-disks-order branch July 23, 2025 10:45
@blueorangutan:

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14308

Successfully merging this pull request may close these issues.

NAS backup provider: Restore fails if the data volume is created before the root volume.
5 participants