# anvil-manage-server-storage --server an-test-deploy1 --grow 30G --disk vda --confirm
Working with the server: [an-test-deploy1], UUID: [d5af3b99-8e57-418f-99d6-90f74372ff78]
- Target: [vda], boot: [01], path: [/dev/drbd/by-res/an-test-deploy1/0], Available space: [130.00 GiB]
- Preparing to grow the storage by: [30.00GiB]...
- Extending local LV: [/dev/anvil-test-vg/an-test-deploy1_0]...
Done!
- Extending peer: [an-a01n02:/dev/anvil-test-vg/an-test-deploy1_0], via: [10.201.10.2 (bcn1)]
Done!
- Extending backing devices complete. Now extending DRBD resource/volume...
Error!
[ Failed ] - When trying to grow the DRBD device: [an-test-deploy1/0]
[ Failed ] - using the command: [/usr/sbin/drbdadm resize an-test-deploy1/0]
[ Failed ] - The return code: [10] was received, expected '0'. Output, if any:
==========
print $output!#
==========
The extension of the resource is incomplete, manual intervention is required!!
[ Note ] - All backing devices have been grown. Manually resolving the drbd grow
[ Note ] - error should complete the drive expansion!
This issue is caused by the DRBD resource refusing a resize while one is already in flight. At this point we are leaking storage:
the LV has been resized, but DRBD will not see it or recognize it.
Storage is leaked any time a DRBD resize request fails; this is just one possible trigger.
For the grow operation specifically, either check the DRBD status BEFORE resizing the LV and exit 1 if a resync is in progress (avoiding the leak), or loop and wait for the first sync to complete before issuing the next resize.
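A minimal sketch of the first option, checking for an in-flight resync before touching the LV. The function names are hypothetical, and the `replication:SyncSource`/`replication:SyncTarget` markers are an assumption about DRBD 9's `drbdadm status` output (verify against your version); the status command is passed in as arguments so the check can be exercised without a live DRBD stack:

```shell
#!/bin/sh
# Sketch: bail out of a grow if a DRBD resync is still in flight, so the
# LV is never extended ahead of a resize that DRBD will refuse.
# In real use the status command would be drbdadm itself:
#   resync_in_flight an-test-deploy1 drbdadm status
resync_in_flight() {
    res="$1"; shift
    # Assumed DRBD 9 output: a syncing peer volume line contains
    # "replication:SyncSource" or "replication:SyncTarget".
    "$@" "$res" | grep -Eq 'replication:(SyncSource|SyncTarget)'
}

# Canned status capture standing in for `drbdadm status` (illustrative):
sample_status() {
    printf '%s role:Primary\n' "$1"
    printf '  an-a01n02 role:Secondary\n'
    printf '    volume:0 replication:SyncSource peer-disk:Inconsistent done:42.17\n'
}

if resync_in_flight an-test-deploy1 sample_status; then
    echo "resync in flight; refusing to grow"
else
    echo "safe to grow"
fi
```

Exiting non-zero at this point leaves both LVs and the DRBD device at their original, matching sizes, so nothing is leaked.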
Simple: the LV is resized, but the DRBD device is not. That means the VM doesn't see the storage, yet it is allocated in the LV/LVM. That storage is unavailable for anyone to use.
Ah, that is expected. There's a period of time where it's unavoidable that one node's LV is grown before the peer's, and DRBD can't be grown until both are. If I've started a grow operation, I don't want that space to be available to others to use. The scan-lvm scan agent should see the reduced free space in the VG and drop the available space in the associated storage group.
That is NOT the issue. The issue is that the LV is grown (correctly), the second DRBD resize fails, and nothing will ever trigger another DRBD resize to match the new LV size. Hence the space is lost.
This is not a super common situation, but regardless it needs to be handled properly, or storage is leaked during grow operations.
Create a server, then stop it to resize the root disk (this can happen on any disk; in my test I only had one disk).
Run for the first time:
anvil-manage-server-storage --server an-test-deploy1 --grow 5G --disk vda --confirm
....
Done!
Wait for the DRBD resync to complete <-- IMPORTANT. All good; you can issue again:
anvil-manage-server-storage --server an-test-deploy1 --grow 5G --disk vda --confirm
....
Done!
and it will work as expected.
Wait for the DRBD resync to complete <-- IMPORTANT. All good; you can issue:
anvil-manage-server-storage --server an-test-deploy1 --grow 30G --disk vda --confirm
...
Done!
and issue the same command IMMEDIATELY after: the second `drbdadm resize` fails with the error shown at the top of this issue, and the newly grown LV space is leaked.
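The second mitigation option can be sketched as a wait loop: after the backing LVs are grown, poll until no resync is reported before issuing the next `drbdadm resize`. As above, the function name is hypothetical and the sync markers are an assumption about DRBD 9's `drbdadm status` output; the status command is injectable so the loop can be tested without DRBD installed:

```shell
#!/bin/sh
# Sketch: wait for any in-flight resync to finish before resizing the
# DRBD device. In real use the status command would be drbdadm itself:
#   wait_for_resync an-test-deploy1 drbdadm status
#   drbdadm resize an-test-deploy1/0
wait_for_resync() {
    res="$1"; shift
    # Assumed DRBD 9 output: an active sync shows as
    # "replication:SyncSource" or "replication:SyncTarget".
    while "$@" "$res" | grep -Eq 'replication:(SyncSource|SyncTarget)'; do
        sleep 5   # poll interval; a production version also wants a timeout
    done
}
```

A production version should bound the loop with a timeout and surface an error rather than waiting forever, but even this naive loop avoids issuing a resize that DRBD is guaranteed to refuse.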