
feat(core/remio): add Remote I/O infrastructure #176

Open — wants to merge 7 commits into base: main
Conversation


@joaopeixoto13 joaopeixoto13 commented Sep 19, 2024

This PR introduces the foundational support for VirtIO by implementing a mechanism to forward VirtIO requests bidirectionally between the guest VM (or frontend VM) and its associated backend VM.

Validation

  • Platforms
    • Armv8-A AArch64
      • Xilinx Zynq UltraScale+ MPSoC ZCU102/4
      • NVIDIA Jetson TX2
      • QEMU virt
    • RISC-V RV64
      • QEMU virt
  • System configuration
    • Multiple VirtIO frontend drivers on multiple guest VMs, each running on multiple vCPUs
    • Multiple VirtIO backend devices on multiple backend VMs, each running on multiple vCPUs
  • Backend devices

@DavidMCerdeira DavidMCerdeira left a comment

Generic comment: the namespace for this module can be shortened. I suggest rmio, remio, rm_io, rem_io

Some of the comments may be a bit subjective. @josecm feel free to weigh in.

Resolved review threads: src/core/remote_io.c (×8), src/core/inc/remote_io.h (×2)
@joaopeixoto13 (Member Author)

This new version includes a refactor of the code following @DavidMCerdeira's suggestions.
The key changes are as follows:

  • Introduced helper functions as part of a new API, such as remote_io_create_request, remote_io_get_request, remote_io_push_request_event, remote_io_pop_request_event, etc.
  • Decoupled the large remote_io_hypercall function into smaller, modular functions, each corresponding to a specific hypercall event.
  • Added the REMOTE_IO_DEV_TYPE enum for better type safety and code clarity.
  • Introduced a new type, remote_io_id_t, to manage the Remote I/O ID more effectively.

This refactoring improves the structure and maintainability of the code.

@joaopeixoto13 (Member Author)

> Generic comment: the namespace for this module can be shortened. I suggest rmio, remio, rm_io, rem_io
> Some of the comments may be a bit subjective. @josecm feel free to weigh in.

@josecm What are your thoughts on this? From @DavidMCerdeira's list of suggestions, remio seems like the most appropriate name. However, I'm unsure if remote_io is still considered too lengthy. For this reason, I've retained the current name in the latest version.

Resolved review threads: src/core/inc/remote_io.h (×1), src/core/remote_io.c (×5)

josecm commented Sep 20, 2024

> @josecm What are your thoughts on this? From @DavidMCerdeira's list of suggestions, remio seems like the most appropriate name. However, I'm unsure if remote_io is still considered too lengthy. For this reason, I've retained the current name in the latest version.

My suggestion would be either remio or simply rio.

@joaopeixoto13 joaopeixoto13 changed the title from "feat(core/remote_io): add Remote I/O infrastructure" to "feat(core/remio): add Remote I/O infrastructure" on Sep 20, 2024
@joaopeixoto13 (Member Author)

> My suggestion would be either remio or simply rio.

Update: 01c2cbe

@DavidMCerdeira (Member)

> Update: 01c2cbe

The namespace change should be reflected in the function, variable names, enums, etc.

@joaopeixoto13 (Member Author)

> The namespace change should be reflected in the function, variable names, enums, etc.

Done 00b5acd

Resolved review threads: src/core/remio.c (×7), src/core/vm.c (×2)
@DavidMCerdeira DavidMCerdeira left a comment

I'd still like to test this before approving, but for now it looks good! 👍
You can squash the commits into a single one now

@joaopeixoto13 (Member Author)

> I'd still like to test this before approving, but for now it looks good! 👍 You can squash the commits into a single one now

Thanks, @DavidMCerdeira! Also, please take a look at this commit, where I temporarily resolved the issue by replacing the psci_power_down call with a WFI instruction. However, I believe we need to find a more appropriate long-term solution. I haven't reviewed the current guest reboot PR yet, but it may offer some insight or help in addressing this issue.

@DavidMCerdeira (Member)

> Thanks, @DavidMCerdeira! Also, please take a look at this commit, where I temporarily resolved the issue by replacing the psci_power_down call with a WFI instruction. However, I believe we need to find a more appropriate long-term solution.

I think for now it is fine. Actually, we are aware of at least two platforms that currently require this behavior to work with Bao. We should eventually allow platforms to have specific code to handle this sort of platform-specific quirk.

@joaopeixoto13 joaopeixoto13 force-pushed the feat/remote_io branch 2 times, most recently from 9c31e3e to 7c277bf on September 27, 2024 12:17
@josecm josecm left a comment

Despite the very (!) good work, I think we still need a bit of work on this PR. This is my first pass on it. I think we will need one or two more.

Resolved review threads: src/arch/armv8/armv8-a/cpu.c, src/arch/riscv/sbi.c, src/core/inc/remio.h, src/core/remio.c (×2)
src/core/remio.c (outdated):

    static struct remio_request_event* remio_create_event(void)
    {
        struct remio_request_event* event = objpool_alloc(&remio_request_event_pool);
Member

Since there is at most one event per vcpu at any given time, couldn't we have that event in the vcpu struct instead of allocating it "dynamically"?

Member Author

I don't think this approach is feasible. Even if we modify the remio_device struct to include pointers to the frontend and backend VMs (replacing the existing remio_device_config struct), a remote I/O device can run across multiple vCPUs. So, my question is: how would we retrieve the first event in this scenario? Given that the VM struct contains a pointer to its vCPUs, the only solution I’m aware of is iterating over all vCPUs to check for events. However, this approach would prioritize vCPUs with lower IDs over those with higher IDs. Is there a more efficient method that could avoid this bias?

Member

A given event/request is associated with a single vcpu that carries out the access. Nevertheless, that wouldn't be possible because the frontend vcpu is not mapped by the hypervisor in the backend cpu.

However, I'd suggest restructuring the remio_request pool in a single array with VCPU_NUM size.

Member Author

Done 3e9940e

src/core/remio.c (outdated):

    struct list remio_device_list;

    /** Array of Remote I/O requests */
    struct remio_request remio_requests[REMIO_CPU_NUM][REMIO_VCPU_NUM];
Member

This seems to be wasting memory since for a given CPU there is at most one vcpu.

Member

More, I don't really like the way we are allocating requests. I would suggest changing this structure. Since each request is associated with an event, we could merge the two concepts and only free the object from the pool once the request is completed.

Then instead of indexing requests with the cpu_id/vcpu_id combination, we would have a dynamically allocated id for events/requests that would be passed to the backend for later identification on hypercalls. Also associated with it, we would have a "backend owner VM id" that the hypercall handler would use to check that that event/request is indeed being handled by that VM.

Member Author

> This seems to be wasting memory since for a given CPU there is at most one vcpu.

Yes, you're right. However, I implemented it this way to allow for potential future enhancements in Bao, such as exploring more advanced configurations. These could include 1-to-many mappings between physical and virtual CPUs, or sharing the same vCPU across multiple VMs. In the former case, the vCPU ID would correspond to the actual vCPU, while in the latter, it would be associated with the VM ID.

Member Author

> More, I don't really like the way we are allocating requests. I would suggest changing this structure. Since each request is associated with an event, we could merge the two concepts and only free the object from the pool once the request is completed.

From my tests, I attempted to merge the two concepts, but based on my experience, this introduces higher costs in terms of both code complexity and performance. First, since the request event is associated with a specific remote I/O device, when the backend requests a new event, it doesn’t need to iterate through all vCPUs to retrieve it. Instead, it can simply pop the oldest element from the list.

Additionally, the issue of vCPU prioritization that I mentioned earlier still persists when merging the two concepts. Furthermore, as the list is already protected by a spinlock, there’s no need for additional synchronization primitives, which results in simpler code.
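
The spinlock-protected per-device list described above can be sketched as a simple FIFO: the frontend pushes request events, and the backend pops the oldest one. This is an illustrative stand-in, not Bao's actual API — the spinlock type, the stub lock functions, and the event layout are all assumptions.

```c
#include <stddef.h>

/* Stand-ins for the real spinlock primitives (illustrative only). */
typedef int spinlock_t;
static void spin_lock(spinlock_t* l)   { (void)l; }
static void spin_unlock(spinlock_t* l) { (void)l; }

struct event {
    unsigned long id;
    struct event* next;
};

struct event_list {
    spinlock_t lock;
    struct event* head; /* oldest event, popped first */
    struct event* tail; /* newest event, pushed last */
};

/* Frontend side: append a new request event to the tail. */
static void event_push(struct event_list* list, struct event* ev)
{
    spin_lock(&list->lock);
    ev->next = NULL;
    if (list->tail != NULL) {
        list->tail->next = ev;
    } else {
        list->head = ev;
    }
    list->tail = ev;
    spin_unlock(&list->lock);
}

/* Backend side: pop the oldest pending event, or NULL if none. */
static struct event* event_pop(struct event_list* list)
{
    spin_lock(&list->lock);
    struct event* ev = list->head;
    if (ev != NULL) {
        list->head = ev->next;
        if (list->head == NULL) {
            list->tail = NULL;
        }
    }
    spin_unlock(&list->lock);
    return ev;
}
```

Because the lock protects both ends of the list, no per-vCPU iteration or extra synchronization is needed, which matches the argument made above.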

Member Author

> Then instead of indexing requests with the cpu_id/vcpu_id combination we would have a dynamically allocated id for events/requests that would be passed to the backend for later identification on hypercalls.

But how do you locate the request? Are you suggesting iterating over all vCPUs associated with a frontend VM? The same issue I mentioned earlier would persist in this approach.

I do agree that we could dynamically create a unique identifier to replace the CPU and vCPU IDs. Perhaps we could apply a basic mask and concatenate the two numbers into a single unsigned long variable, generating a unique identifier that could then be used to index the request.
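
The mask-and-concatenate idea can be sketched as follows. The 16-bit split and the helper names are assumptions chosen for illustration, not Bao code:

```c
/* Pack cpu_id and vcpu_id into one unsigned long token (illustrative:
 * a real implementation would size the field from the platform's
 * maximum vcpu count). */
#define REMIO_VCPU_ID_BITS 16UL
#define REMIO_VCPU_ID_MASK ((1UL << REMIO_VCPU_ID_BITS) - 1UL)

static inline unsigned long remio_pack_id(unsigned long cpu_id, unsigned long vcpu_id)
{
    return (cpu_id << REMIO_VCPU_ID_BITS) | (vcpu_id & REMIO_VCPU_ID_MASK);
}

static inline unsigned long remio_id_cpu(unsigned long id)
{
    return id >> REMIO_VCPU_ID_BITS;
}

static inline unsigned long remio_id_vcpu(unsigned long id)
{
    return id & REMIO_VCPU_ID_MASK;
}
```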

> Also associated with it, we would have a "backend owner VM id" that the hypercall handler would use to check that that event/request is indeed being handled by that VM.

I think I may have misunderstood what you referred to as the "backend owner VM ID." However, the current implementation already verifies that the backend VM performing the hypercall is the owner of the remote I/O backend device. Since the event request list is associated with each remote I/O device, we ensure that only the corresponding backend interacts with the event request.

Member

The logic of handling the request queue would be exactly the same. The only thing that would fundamentally change in my suggestion would be the way you allocate events/requests and the way the backend identifies the specific event/request on an hypercall.

Member Author

Done 3e9940e

Member

I still think we don't need the event array in the device. We can directly access it via the objpool. We just need some check to make sure that the event belongs to the VM calling the ask hypercall and that that VM is the backend for the device associated with that event.

src/core/remio.c (outdated), comment on lines 608 to 610:

    unsigned long value = vcpu_readreg(cpu()->vcpu, HYPCALL_IN_ARG_REG(3));
    unsigned long cpu_id = vcpu_readreg(cpu()->vcpu, HYPCALL_IN_ARG_REG(4));
    unsigned long vcpu_id = vcpu_readreg(cpu()->vcpu, HYPCALL_IN_ARG_REG(5));
Member

I am thinking we should change the hypercall API to allow more arguments. What do you think @DavidMCerdeira?

Member Author

If you agree, I can add a commit to modify the hypercall function to retrieve additional arguments (e.g., three in this case).

Member

Ideally we would come up with a way of registering hypercall handlers with different numbers of arguments...

Member

I think adding an API for reading arguments is a good idea. This way each handler parses the arguments as it sees fit.
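
One possible shape for such an argument-reading API (and its return-value counterpart) is sketched below. The register file, the HYPCALL_*_ARG_REG mappings, and the hypercall_get_arg/hypercall_set_ret names are all hypothetical stand-ins so the sketch is self-contained; they are not Bao's actual implementation:

```c
#include <stddef.h>

/* Assumed mapping: arg n lives in GPR n+1 (x1..xn style). */
#define HYPCALL_IN_ARG_REG(n)  ((n) + 1)
#define HYPCALL_OUT_ARG_REG(n) ((n) + 1)

struct vcpu {
    unsigned long regs[32]; /* simplified general-purpose register file */
};

static unsigned long vcpu_readreg(struct vcpu* vcpu, size_t reg)
{
    return vcpu->regs[reg];
}

static void vcpu_writereg(struct vcpu* vcpu, size_t reg, unsigned long val)
{
    vcpu->regs[reg] = val;
}

/* Each handler pulls exactly the arguments it needs, by index. */
static unsigned long hypercall_get_arg(struct vcpu* vcpu, size_t n)
{
    return vcpu_readreg(vcpu, HYPCALL_IN_ARG_REG(n));
}

/* Analogous helper for setting return values. */
static void hypercall_set_ret(struct vcpu* vcpu, size_t n, unsigned long val)
{
    vcpu_writereg(vcpu, HYPCALL_OUT_ARG_REG(n), val);
}
```

With accessors like these, the dispatcher no longer has to read a fixed set of registers up front; handlers with different arities share one interface.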

Member

I like that idea too! Maybe also an analogous function for setting the return values?

Member

Yes I agree!

Member

@joaopeixoto13 did you address this?

Resolved review thread: src/core/remio.c
src/core/remio.c (outdated):

    switch (op) {
        case REMIO_HYP_WRITE:
        case REMIO_HYP_READ:
            if (!remio_handle_rw(value, cpu_id, vcpu_id)) {
Member

Isn't there a major security breach here? Any VM could issue a hypercall with one of these ops using any cpu_id/vcpu_id index, and consequently be able to alter the state of that event/request and, in turn, the response to the backend.

Member Author

This is not true. We first verify if the VM that performed the hypercall owns the remote I/O device using remio_find_vm_dev_by_id, and only after confirming ownership do we proceed to call the associated methods.

Member

But this is not checked on the remio_handle_rw call. The cpu_id/vcpu_id are used to access the request without any check.

Member Author

Done 3e9940e
After validating and extracting the associated Remote I/O device from the backend VM, we now pass the device pointer to remio_handle_rw. Since the requests are tied to the specific Remote I/O device, this ensures that only the correct device on the correct backend VM can modify its state.

Member

There is still an issue here. Calling remio_handle_rw will end up in calling remio_get_request that performs the access to the request array without checking if the id is valid!
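
The missing check can be sketched as follows: reject out-of-range ids (and ids whose slot is not backing a live request) before ever touching the request array. The array size, struct layout, and in_use flag are illustrative assumptions, not Bao's actual data structures:

```c
#include <stddef.h>

#define REMIO_NUM_REQUESTS 8

struct remio_request {
    int in_use;          /* non-zero while the request is live */
    unsigned long value;
};

static struct remio_request remio_requests[REMIO_NUM_REQUESTS];

/* Bounds-checked lookup: never index the array with an unvalidated id. */
static struct remio_request* remio_get_request(unsigned long id)
{
    if (id >= REMIO_NUM_REQUESTS) {
        return NULL; /* id out of range */
    }
    if (!remio_requests[id].in_use) {
        return NULL; /* slot not backing a live request */
    }
    return &remio_requests[id];
}
```

Callers such as a read/write handler would then bail out with an error when the lookup returns NULL instead of dereferencing an attacker-controlled index.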

Resolved review thread: src/arch/riscv/sbi.c


josecm commented Oct 29, 2024

@joaopeixoto13, to keep a clean history, try as much as possible to remove the "fix" commits and introduce the changes correctly in the first commit where the code is introduced.

@joaopeixoto13 (Member Author)

> @joaopeixoto13, to keep a clean history, try as much as possible to remove the "fix" commits and introduce the changes correctly in the first commit where the code is introduced.

Yes, I understand. I split these smaller fixes to make it easier for you to review the changes. At the end of the PR, I’ll rebase all the fix commits into the main "parent" commit that introduced the feature.

@joaopeixoto13 joaopeixoto13 force-pushed the feat/remote_io branch 2 times, most recently from 29988f8 to 26a8a46 on November 1, 2024 11:42
DavidMCerdeira previously approved these changes Nov 1, 2024

@DavidMCerdeira DavidMCerdeira left a comment

Only nitpicks
Otherwise looks good to me!

Resolved review threads: src/core/remio.c, src/core/inc/config_defs.h

josecm commented Nov 2, 2024

@joaopeixoto13 rebase on main and resolve conflicts please

This commit renames the first argument of the hypercall to make
the code independent of the specific hypercall type.

Signed-off-by: João Peixoto <[email protected]>
This commit introduces a separation of input and output conventions for
hypercalls, addressing architecture-specific differences between ARM
and RISC-V. On ARM, the x0 register is used to pass the extid, the fid,
and the return value, while additional parameters (hypercall arguments)
are passed via registers x1 to xn. On RISC-V, while the return value is
passed in a0, the extid and fid are handled by the a7 and a6 registers,
respectively. This means that hypercall input arguments on RISC-V
should be passed through registers a0 to a5, while output hypercall
arguments are passed via a2 and onward (because all registers except
a0 and a1 must be preserved across an SBI call by the callee).
To accommodate these differences and provide a uniform interface,
two macros were introduced to abstract the handling of input and output
hypercall arguments across architectures.

Signed-off-by: João Peixoto <[email protected]>
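
Under the convention this commit message describes, the two macros might look roughly like the sketch below. The register indices are inferred from the text (ARM inputs start at x1 because x0 carries extid/fid/return; RISC-V inputs start at a0 and outputs at a2); the exact names beyond HYPCALL_IN_ARG_REG/HYPCALL_OUT_ARG_REG and the concrete values are assumptions, not the commit's actual code:

```c
/* Map a logical hypercall argument index n to a GPR index per arch. */
#ifdef ARCH_ARMV8
#define HYPCALL_IN_ARG_REG(n)  ((n) + 1) /* x1, x2, ...      */
#define HYPCALL_OUT_ARG_REG(n) ((n) + 1) /* x1, x2, ...      */
#else /* RISC-V */
#define HYPCALL_IN_ARG_REG(n)  (n)       /* a0, a1, ... a5   */
#define HYPCALL_OUT_ARG_REG(n) ((n) + 2) /* a2, a3, ...      */
#endif
```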
Refactored the vCPU program counter increment by introducing the
architecture-agnostic method vcpu_writepc, aligning the implementation
with the approach used in the ARM architecture.

Signed-off-by: João Peixoto <[email protected]>
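
As a rough illustration of what an architecture-agnostic PC write enables (the struct layout, the vcpu_readpc/vcpu_advance_pc helpers, and the fixed 4-byte instruction size are assumptions for this sketch):

```c
struct vcpu {
    unsigned long pc; /* simplified stand-in for the arch register state */
};

static unsigned long vcpu_readpc(struct vcpu* vcpu)
{
    return vcpu->pc;
}

/* The architecture-agnostic write named in the commit message. */
static void vcpu_writepc(struct vcpu* vcpu, unsigned long pc)
{
    vcpu->pc = pc;
}

/* Common code can now skip the trapping instruction without knowing
 * the architecture (assuming fixed 4-byte encodings). */
static void vcpu_advance_pc(struct vcpu* vcpu)
{
    vcpu_writepc(vcpu, vcpu_readpc(vcpu) + 4);
}
```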
Implemented support for abort and exception handlers to suspend the
current vCPU based on the vCPU active flag.

Signed-off-by: João Peixoto <[email protected]>

joaopeixoto13 commented Nov 2, 2024

> @joaopeixoto13 rebase on main and resolve conflicts please

Done. If everything looks good, I can proceed to merge all the fix commits into the main parent commit that introduced the feature.


josecm commented Nov 2, 2024

> Done. If everything looks good, I can proceed to merge all the fix commits into the main parent commit that introduced the feature.

Please go ahead with that.

This commit introduces two new methods to the objpool API.
The first method, objpool_alloc_with_id, enables the allocation of an
object within a pool and returns the index where the object was allocated.
The second method, objpool_get_by_id, allows retrieving a previously
allocated object using the index of the pool where it was initially allocated.

Signed-off-by: João Peixoto <[email protected]>
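
The two methods can be sketched roughly as follows. Only the names objpool_alloc_with_id and objpool_get_by_id come from the commit message; the pool layout and everything else here is an illustrative assumption, not Bao's actual objpool:

```c
#include <stddef.h>

#define POOL_SIZE 4

struct objpool {
    int used[POOL_SIZE];           /* allocation bitmap (illustrative) */
    unsigned long objs[POOL_SIZE]; /* pooled objects */
};

/* Allocate a slot and report its index through *id; NULL if exhausted. */
static unsigned long* objpool_alloc_with_id(struct objpool* pool, size_t* id)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool->used[i]) {
            pool->used[i] = 1;
            *id = i;
            return &pool->objs[i];
        }
    }
    return NULL;
}

/* Retrieve a previously allocated object by index; NULL if the index is
 * out of range or the slot was never allocated. */
static unsigned long* objpool_get_by_id(struct objpool* pool, size_t id)
{
    if (id >= POOL_SIZE || !pool->used[id]) {
        return NULL;
    }
    return &pool->objs[id];
}
```

Returning the index at allocation time is what lets the Remote I/O code hand a compact id to the backend and validate it on the way back in, rather than passing raw pointers across the hypercall boundary.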
This commit introduces the foundational support for VirtIO by implementing
a mechanism to forward VirtIO requests bidirectionally between the
guest VM (or frontend VM) and its associated backend VM.

Signed-off-by: João Peixoto <[email protected]>

joaopeixoto13 commented Nov 2, 2024

> Please go ahead with that.

Done ✅
