
feat(core/remio): add Remote I/O infrastructure #176

Open — wants to merge 7 commits into base: main
Conversation


@joaopeixoto13 joaopeixoto13 commented Sep 19, 2024

This PR introduces the foundational support for VirtIO by implementing a mechanism to forward VirtIO requests bidirectionally between the guest VM (or frontend VM) and its associated backend VM.

Validation

  • Platforms
    • Armv8-A AArch64
      • Xilinx Zynq UltraScale+ MPSoC ZCU102/4
      • NVIDIA Jetson TX2
      • QEMU virt
    • RISC-V RV64
      • QEMU virt
  • System configuration
    • Multiple VirtIO frontend drivers on multiple guest VMs, each running on multiple vCPUs
    • Multiple VirtIO backend devices on multiple backend VMs, each running on multiple vCPUs
  • Backend devices

@DavidMCerdeira DavidMCerdeira left a comment

Generic comment: the namespace for this module can be shortened. I suggest rmio, remio, rm_io, rem_io

Some of the comments may be a bit subjective. @josecm feel free to weigh in.

Resolved review threads: src/core/remote_io.c (×8), src/core/inc/remote_io.h (×2)
@joaopeixoto13 (Member Author)

This new version includes a refactor of the code following @DavidMCerdeira's suggestions.
The key changes are as follows:

  • Introduced helper functions as part of a new API, such as remote_io_create_request, remote_io_get_request, remote_io_push_request_event, remote_io_pop_request_event, etc.
  • Decoupled the large remote_io_hypercall function into smaller, modular functions, each corresponding to a specific hypercall event.
  • Added the REMOTE_IO_DEV_TYPE enum for better type safety and code clarity.
  • Introduced a new type, remote_io_id_t, to manage the Remote I/O ID more effectively.

This refactoring improves the structure and maintainability of the code.

@joaopeixoto13 (Member Author)

> Generic comment: the namespace for this module can be shortened. I suggest rmio, remio, rm_io, rem_io
> Some of the comments may be a bit subjective. @josecm feel free to weigh in.

@josecm What are your thoughts on this? From @DavidMCerdeira's list of suggestions, remio seems like the most appropriate name. However, I'm unsure if remote_io is still considered too lengthy. For this reason, I've retained the current name in the latest version.

Resolved review threads: src/core/inc/remote_io.h (×1), src/core/remote_io.c (×5)

josecm commented Sep 20, 2024

> @josecm What are your thoughts on this? From @DavidMCerdeira's list of suggestions, remio seems like the most appropriate name. However, I'm unsure if remote_io is still considered too lengthy. For this reason, I've retained the current name in the latest version.

My suggestion would be either remio or simply rio.

@joaopeixoto13 joaopeixoto13 changed the title from "feat(core/remote_io): add Remote I/O infrastructure" to "feat(core/remio): add Remote I/O infrastructure" on Sep 20, 2024
@joaopeixoto13 (Member Author)

> My suggestion would be either remio or simply rio.

Update: 01c2cbe

@DavidMCerdeira (Member)

> Update: 01c2cbe

The namespace change should be reflected in the function, variable names, enums, etc.

@joaopeixoto13 (Member Author)

> The namespace change should be reflected in the function, variable names, enums, etc.

Done 00b5acd

Resolved review threads: src/core/remio.c (×7), src/core/vm.c (×2)
@DavidMCerdeira DavidMCerdeira left a comment

I'd still like to test this before approving, but for now it looks good! 👍
You can squash the commits into a single one now

@joaopeixoto13 (Member Author)

> I'd still like to test this before approving, but for now it looks good! 👍 You can squash the commits into a single one now

Thanks, @DavidMCerdeira! Also, please take a look at this commit, where I temporarily resolved the issue by replacing the psci_power_down call with a WFI instruction. However, I believe we need to find a more appropriate long-term solution. I haven't reviewed the current guest reboot PR yet, but it may offer some insight or help in addressing this issue.

@DavidMCerdeira (Member)

> Thanks, @DavidMCerdeira! Also, please take a look at this commit, where I temporarily resolved the issue by replacing the psci_power_down call with a WFI instruction. However, I believe we need to find a more appropriate long-term solution.

I think for now it is fine. Actually, we are aware of at least two platforms that currently require this behavior to work with Bao. We should eventually allow platforms to have specific code to handle this sort of platform-specific quirk.

@joaopeixoto13 joaopeixoto13 force-pushed the feat/remote_io branch 2 times, most recently from 9c31e3e to 7c277bf on September 27, 2024 12:17
@josecm josecm left a comment

Despite the very (!) good work, I think we still need a bit of work on this PR. This is my first pass on it. I think we will need one or two more.

Resolved review threads: src/arch/armv8/armv8-a/cpu.c, src/arch/riscv/sbi.c, src/core/inc/remio.h, src/core/remio.c (×2)
src/core/remio.c (outdated):

    static struct remio_request_event* remio_create_event(void)
    {
        struct remio_request_event* event = objpool_alloc(&remio_request_event_pool);
Member

Since there is at most one event per vcpu at any given time, couldn't we have that event in the vcpu struct instead of allocating it "dynamically"?

Member Author

I don't think this approach is feasible. Even if we modify the remio_device struct to include pointers to the frontend and backend VMs (replacing the existing remio_device_config struct), a remote I/O device can run across multiple vCPUs. So, my question is: how would we retrieve the first event in this scenario? Given that the VM struct contains a pointer to its vCPUs, the only solution I’m aware of is iterating over all vCPUs to check for events. However, this approach would prioritize vCPUs with lower IDs over those with higher IDs. Is there a more efficient method that could avoid this bias?

Member

A given event/request is associated with a single vcpu that carries out the access. Nevertheless, that wouldn't be possible because the frontend vcpu is not mapped by the hypervisor in the backend cpu.

However, I'd suggest restructuring the remio_request pool in a single array with VCPU_NUM size.

Member Author

Done 3e9940e

src/core/remio.c (outdated):

    struct list remio_device_list;

    /** Array of Remote I/O requests */
    struct remio_request remio_requests[REMIO_CPU_NUM][REMIO_VCPU_NUM];
Member

This seems to be wasting memory since for a given CPU there is at most one vcpu.

Member

More, I don't really like the way we are allocating requests. I would suggest changing this structure. Since each request is associated with an event, we could merge the two concepts and only free the object from the pool once the request is completed.

Then instead of indexing requests with the cpu_id/vcpu_id combination, we would have a dynamically allocated id for events/requests that would be passed to the backend for later identification on hypercalls. Also associated with it, we would have a "backend owner VM id" that the hypercall handler would use to check that that event/request is indeed being handled by that VM.

Member Author

> This seems to be wasting memory since for a given CPU there is at most one vcpu.

Yes, you're right. However, I implemented it this way to allow for potential future enhancements in Bao, such as exploring more advanced configurations. These could include 1-to-many mappings between physical and virtual CPUs, or sharing the same vCPU across multiple VMs. In the former case, the vCPU ID would correspond to the actual vCPU, while in the latter, it would be associated with the VM ID.

Member Author

> More, I don't really like the way we are allocating requests. I would suggest changing this structure. Since each request is associated with an event, we could merge the two concepts and only free the object from the pool once the request is completed.

From my tests, I attempted to merge the two concepts, but based on my experience, this introduces higher costs in terms of both code complexity and performance. First, since the request event is associated with a specific remote I/O device, when the backend requests a new event, it doesn’t need to iterate through all vCPUs to retrieve it. Instead, it can simply pop the oldest element from the list.

Additionally, the issue of vCPU prioritization that I mentioned earlier still persists when merging the two concepts. Furthermore, as the list is already protected by a spinlock, there’s no need for additional synchronization primitives, which results in simpler code.
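
The spinlock-protected per-device list described above can be sketched as a simple FIFO: the frontend pushes request events, and the backend pops the oldest one. This is an illustrative stand-in, not Bao's actual API — the spinlock type, the stub lock functions, and the event layout are all assumptions.

```c
#include <stddef.h>

/* Stand-ins for the real spinlock primitives (illustrative only). */
typedef int spinlock_t;
static void spin_lock(spinlock_t* l)   { (void)l; }
static void spin_unlock(spinlock_t* l) { (void)l; }

struct event {
    unsigned long id;
    struct event* next;
};

struct event_list {
    spinlock_t lock;
    struct event* head; /* oldest event, popped first */
    struct event* tail; /* newest event, pushed last */
};

/* Frontend side: append a new request event to the tail. */
static void event_push(struct event_list* list, struct event* ev)
{
    spin_lock(&list->lock);
    ev->next = NULL;
    if (list->tail != NULL) {
        list->tail->next = ev;
    } else {
        list->head = ev;
    }
    list->tail = ev;
    spin_unlock(&list->lock);
}

/* Backend side: pop the oldest pending event, or NULL if none. */
static struct event* event_pop(struct event_list* list)
{
    spin_lock(&list->lock);
    struct event* ev = list->head;
    if (ev != NULL) {
        list->head = ev->next;
        if (list->head == NULL) {
            list->tail = NULL;
        }
    }
    spin_unlock(&list->lock);
    return ev;
}
```

Because the lock protects both ends of the list, no per-vCPU iteration or extra synchronization is needed, which matches the argument made above.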

Member Author

> Then instead of indexing requests with the cpu_id/vcpu_id combination we would have a dynamically allocated id for events/requests that would be passed to the backend for later identification on hypercalls.

But how do you locate the request? Are you suggesting iterating over all vCPUs associated with a frontend VM? The same issue I mentioned earlier would persist in this approach.

I do agree that we could dynamically create a unique identifier to replace the CPU and vCPU IDs. Perhaps we could apply a basic mask and concatenate the two numbers into a single unsigned long variable, generating a unique identifier that could then be used to index the request.
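
The mask-and-concatenate idea can be sketched as follows. The 16-bit split and the helper names are assumptions chosen for illustration, not Bao code:

```c
/* Pack cpu_id and vcpu_id into one unsigned long token (illustrative:
 * a real implementation would size the field from the platform's
 * maximum vcpu count). */
#define REMIO_VCPU_ID_BITS 16UL
#define REMIO_VCPU_ID_MASK ((1UL << REMIO_VCPU_ID_BITS) - 1UL)

static inline unsigned long remio_pack_id(unsigned long cpu_id, unsigned long vcpu_id)
{
    return (cpu_id << REMIO_VCPU_ID_BITS) | (vcpu_id & REMIO_VCPU_ID_MASK);
}

static inline unsigned long remio_id_cpu(unsigned long id)
{
    return id >> REMIO_VCPU_ID_BITS;
}

static inline unsigned long remio_id_vcpu(unsigned long id)
{
    return id & REMIO_VCPU_ID_MASK;
}
```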

> Also associated with it, we would have a "backend owner VM id" that the hypercall handler would use to check that that event/request is indeed being handled by that VM.

I think I may have misunderstood what you referred to as the "backend owner VM ID." However, the current implementation already verifies that the backend VM performing the hypercall is the owner of the remote I/O backend device. Since the event request list is associated with each remote I/O device, we ensure that only the corresponding backend interacts with the event request.

Member

The logic of handling the request queue would be exactly the same. The only thing that would fundamentally change in my suggestion would be the way you allocate events/requests and the way the backend identifies the specific event/request on an hypercall.

Member Author

Done 3e9940e

Member

I still think we don't need the event array in the device. We can directly access it via the objpool. We just need some check to make sure that the event belongs to the VM calling the ask hypercall and that that VM is the backend for the device associated with that event.

src/core/remio.c (outdated), comment on lines 608 to 610:

    unsigned long value = vcpu_readreg(cpu()->vcpu, HYPCALL_IN_ARG_REG(3));
    unsigned long cpu_id = vcpu_readreg(cpu()->vcpu, HYPCALL_IN_ARG_REG(4));
    unsigned long vcpu_id = vcpu_readreg(cpu()->vcpu, HYPCALL_IN_ARG_REG(5));
Member

I am thinking we should change the hypercall API to allow more arguments. What do you think @DavidMCerdeira?

Member Author

If you agree, I can add a commit to modify the hypercall function to retrieve additional arguments (e.g., three in this case).

Member

Ideally we would come up with a way of registering hypercall handlers with different numbers of arguments...

Member

I think adding an API for reading arguments is a good idea. This way each handler parses the arguments as it sees fit.
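
One possible shape for such an argument-reading API (and its return-value counterpart) is sketched below. The register file, the HYPCALL_*_ARG_REG mappings, and the hypercall_get_arg/hypercall_set_ret names are all hypothetical stand-ins so the sketch is self-contained; they are not Bao's actual implementation:

```c
#include <stddef.h>

/* Assumed mapping: arg n lives in GPR n+1 (x1..xn style). */
#define HYPCALL_IN_ARG_REG(n)  ((n) + 1)
#define HYPCALL_OUT_ARG_REG(n) ((n) + 1)

struct vcpu {
    unsigned long regs[32]; /* simplified general-purpose register file */
};

static unsigned long vcpu_readreg(struct vcpu* vcpu, size_t reg)
{
    return vcpu->regs[reg];
}

static void vcpu_writereg(struct vcpu* vcpu, size_t reg, unsigned long val)
{
    vcpu->regs[reg] = val;
}

/* Each handler pulls exactly the arguments it needs, by index. */
static unsigned long hypercall_get_arg(struct vcpu* vcpu, size_t n)
{
    return vcpu_readreg(vcpu, HYPCALL_IN_ARG_REG(n));
}

/* Analogous helper for setting return values. */
static void hypercall_set_ret(struct vcpu* vcpu, size_t n, unsigned long val)
{
    vcpu_writereg(vcpu, HYPCALL_OUT_ARG_REG(n), val);
}
```

With accessors like these, the dispatcher no longer has to read a fixed set of registers up front; handlers with different arities share one interface.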

Member

I like that idea too! Maybe also an analogous function for setting the return values?

Member

Yes I agree!

Member

@joaopeixoto13 did you address this?

Resolved review thread: src/core/remio.c
src/core/remio.c (outdated):

    switch (op) {
        case REMIO_HYP_WRITE:
        case REMIO_HYP_READ:
            if (!remio_handle_rw(value, cpu_id, vcpu_id)) {
Member

Isn't there a major security breach here? Any VM could issue a hypercall with one of these ops using any cpu_id/vcpu_id index, and consequently be able to alter the state of that event/request and, in turn, the response to the backend.

Member Author

This is not true. We first verify if the VM that performed the hypercall owns the remote I/O device using remio_find_vm_dev_by_id, and only after confirming ownership do we proceed to call the associated methods.

Member

But this is not checked on the remio_handle_rw call. The cpu_id/vcpu_id are used to access the request without any check.

Member Author

Done 3e9940e
After validating and extracting the associated Remote I/O device from the backend VM, we now pass the device pointer to remio_handle_rw. Since the requests are tied to the specific Remote I/O device, this ensures that only the correct device on the correct backend VM can modify its state.

Member

There is still an issue here. Calling remio_handle_rw will end up in calling remio_get_request that performs the access to the request array without checking if the id is valid!
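
The missing check can be sketched as follows: reject out-of-range ids (and ids whose slot is not backing a live request) before ever touching the request array. The array size, struct layout, and in_use flag are illustrative assumptions, not Bao's actual data structures:

```c
#include <stddef.h>

#define REMIO_NUM_REQUESTS 8

struct remio_request {
    int in_use;          /* non-zero while the request is live */
    unsigned long value;
};

static struct remio_request remio_requests[REMIO_NUM_REQUESTS];

/* Bounds-checked lookup: never index the array with an unvalidated id. */
static struct remio_request* remio_get_request(unsigned long id)
{
    if (id >= REMIO_NUM_REQUESTS) {
        return NULL; /* id out of range */
    }
    if (!remio_requests[id].in_use) {
        return NULL; /* slot not backing a live request */
    }
    return &remio_requests[id];
}
```

Callers such as a read/write handler would then bail out with an error when the lookup returns NULL instead of dereferencing an attacker-controlled index.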

Resolved review thread: src/arch/riscv/sbi.c


josecm commented Oct 29, 2024

@joaopeixoto13, to keep a clean history, try as much as possible to remove the "fix" commits and introduce the changes correctly in the first commit where the code is introduced.

@joaopeixoto13 (Member Author)

> @joaopeixoto13, to keep a clean history, try as much as possible to remove the "fix" commits and introduce the changes correctly in the first commit where the code is introduced.

Yes, I understand. I split these smaller fixes to make it easier for you to review the changes. At the end of the PR, I’ll rebase all the fix commits into the main "parent" commit that introduced the feature.

@joaopeixoto13 joaopeixoto13 force-pushed the feat/remote_io branch 2 times, most recently from 29988f8 to 26a8a46 on November 1, 2024 11:42
DavidMCerdeira previously approved these changes Nov 1, 2024

@DavidMCerdeira DavidMCerdeira left a comment

Only nitpicks
Otherwise looks good to me!

Resolved review threads: src/core/remio.c, src/core/inc/config_defs.h

josecm commented Nov 2, 2024

@joaopeixoto13 rebase on main and resolve conflicts please

This commit renames the first argument of the hypercall to make
the code independent of the specific hypercall type.

Signed-off-by: João Peixoto <[email protected]>
This commit introduces a separation of input and output conventions for
hypercalls, addressing architecture-specific differences between ARM
and RISC-V. On ARM, the x0 register is used to pass the extid, the fid,
and the return value, while additional parameters (hypercall arguments)
are passed via registers x1 to xn. On RISC-V, while the return value is
passed in a0, the extid and fid are handled by the a7 and a6 registers,
respectively. This means that hypercall input arguments on RISC-V
should be passed through registers a0 to a5, while output hypercall
arguments are passed via a2 and onward (because all registers except
a0 and a1 must be preserved across an SBI call by the callee).
To accommodate these differences and provide a uniform interface,
two macros were introduced to abstract the handling of input and output
hypercall arguments across architectures.

Signed-off-by: João Peixoto <[email protected]>
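
Under the convention this commit message describes, the two macros might look roughly like the sketch below. The register indices are inferred from the text (ARM inputs start at x1 because x0 carries extid/fid/return; RISC-V inputs start at a0 and outputs at a2); the exact names beyond HYPCALL_IN_ARG_REG/HYPCALL_OUT_ARG_REG and the concrete values are assumptions, not the commit's actual code:

```c
/* Map a logical hypercall argument index n to a GPR index per arch. */
#ifdef ARCH_ARMV8
#define HYPCALL_IN_ARG_REG(n)  ((n) + 1) /* x1, x2, ...      */
#define HYPCALL_OUT_ARG_REG(n) ((n) + 1) /* x1, x2, ...      */
#else /* RISC-V */
#define HYPCALL_IN_ARG_REG(n)  (n)       /* a0, a1, ... a5   */
#define HYPCALL_OUT_ARG_REG(n) ((n) + 2) /* a2, a3, ...      */
#endif
```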
Refactored the vCPU program counter increment by introducing the
architecture-agnostic method vcpu_writepc, aligning the implementation
with the approach used in the ARM architecture.

Signed-off-by: João Peixoto <[email protected]>
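
As a rough illustration of what an architecture-agnostic PC write enables (the struct layout, the vcpu_readpc/vcpu_advance_pc helpers, and the fixed 4-byte instruction size are assumptions for this sketch):

```c
struct vcpu {
    unsigned long pc; /* simplified stand-in for the arch register state */
};

static unsigned long vcpu_readpc(struct vcpu* vcpu)
{
    return vcpu->pc;
}

/* The architecture-agnostic write named in the commit message. */
static void vcpu_writepc(struct vcpu* vcpu, unsigned long pc)
{
    vcpu->pc = pc;
}

/* Common code can now skip the trapping instruction without knowing
 * the architecture (assuming fixed 4-byte encodings). */
static void vcpu_advance_pc(struct vcpu* vcpu)
{
    vcpu_writepc(vcpu, vcpu_readpc(vcpu) + 4);
}
```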
Implemented support for abort and exception handlers to suspend the
current vCPU based on the vCPU active flag.

Signed-off-by: João Peixoto <[email protected]>

joaopeixoto13 commented Nov 2, 2024

> @joaopeixoto13 rebase on main and resolve conflicts please

Done. If everything looks good, I can proceed to merge all the fix commits into the main parent commit that introduced the feature.


josecm commented Nov 2, 2024

> Done. If everything looks good, I can proceed to merge all the fix commits into the main parent commit that introduced the feature.

Please go ahead with that.

This commit introduces two new methods to the objpool API.
The first method, objpool_alloc_with_id, enables the allocation of an
object within a pool and returns the index where the object was allocated.
The second method, objpool_get_by_id, allows retrieving a previously
allocated object using the index of the pool where it was initially allocated.

Signed-off-by: João Peixoto <[email protected]>
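
The two methods can be sketched roughly as follows. Only the names objpool_alloc_with_id and objpool_get_by_id come from the commit message; the pool layout and everything else here is an illustrative assumption, not Bao's actual objpool:

```c
#include <stddef.h>

#define POOL_SIZE 4

struct objpool {
    int used[POOL_SIZE];           /* allocation bitmap (illustrative) */
    unsigned long objs[POOL_SIZE]; /* pooled objects */
};

/* Allocate a slot and report its index through *id; NULL if exhausted. */
static unsigned long* objpool_alloc_with_id(struct objpool* pool, size_t* id)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool->used[i]) {
            pool->used[i] = 1;
            *id = i;
            return &pool->objs[i];
        }
    }
    return NULL;
}

/* Retrieve a previously allocated object by index; NULL if the index is
 * out of range or the slot was never allocated. */
static unsigned long* objpool_get_by_id(struct objpool* pool, size_t id)
{
    if (id >= POOL_SIZE || !pool->used[id]) {
        return NULL;
    }
    return &pool->objs[id];
}
```

Returning the index at allocation time is what lets the Remote I/O code hand a compact id to the backend and validate it on the way back in, rather than passing raw pointers across the hypercall boundary.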
This commit introduces the foundational support for VirtIO by implementing
a mechanism to forward VirtIO requests bidirectionally between the
guest VM (or frontend VM) and its associated backend VM.

Signed-off-by: João Peixoto <[email protected]>

joaopeixoto13 commented Nov 2, 2024

> Please go ahead with that.

Done ✅
