support for vIOMMU with mmap() #787
I've spent some more time looking into IOMMU/ATS/PRI/PASID and wrote some exploratory code. With this background, I think I have a better grasp now on what would be needed to support these features:
So much for what I've been able to identify. Given the absence of device models that make use of ATS, I don't plan to implement the above in the foreseeable future, but I wanted to dump it here in the hope that it'll be useful to whoever may pick this up in the future.
Thanks for the write-up. Just a couple of comments:
The server is required to handle this:
Unless I'm missing something, this looks quite straightforward. As for the rest, I'll try to extend this once I've fully understood it ;).
FWIW, I have uploaded my proof-of-concept code at https://github.com/mnissler-rivos/libvfio-user/tree/ats. It successfully handles PASID-enabled DMA reads/writes against pages that get dynamically requested. Tested with a standalone implementation of QEMU's edu device model (with some extensions to add PASID support) as the server, and a heavily hacked QEMU client based on Oracle's QEMU GitHub repo. Just sharing what I have in case it is useful: this code isn't suitable for merging, but is intended to supplement my wall of text above.
Took me a while to read this. One question: I'm not quite clear why we would/should implement something like PRI. Since we (the server) need to have mapped access to memory ahead of time anyway via DMA_MAP, why does it make sense for us to be demand-faulting in like this?
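For reference, a minimal sketch (with illustrative `ats_demo_*` names; the callback bodies are made up) of how a libvfio-user server learns about client DMA regions today, via the `vfu_setup_device_dma()` register/unregister callbacks:

```c
#include <stdio.h>
#include "libvfio-user.h"

/* Called when the client sends VFIO_USER_DMA_MAP: the server now knows
 * about this IOVA range and, if info->vaddr is non-NULL, can access it
 * directly through the shared mapping. */
static void ats_demo_dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
{
    printf("DMA_MAP iova=%p len=%zu %s\n",
           info->iova.iov_base, info->iova.iov_len,
           info->vaddr != NULL ? "(mappable)" : "(message-based only)");
}

/* Called on VFIO_USER_DMA_UNMAP: the server must stop using the range. */
static void ats_demo_dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
{
    printf("DMA_UNMAP iova=%p len=%zu\n",
           info->iova.iov_base, info->iova.iov_len);
}

/* Wire the callbacks up during device setup, before vfu_realize_ctx(). */
static int ats_demo_setup(vfu_ctx_t *vfu_ctx)
{
    return vfu_setup_device_dma(vfu_ctx, ats_demo_dma_register,
                                ats_demo_dma_unregister);
}
```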
You will probably know/understand some/most of the below, but let me start at the beginning to present a coherent line of thought:

The premise of placing IOMMUs in the DMA data path is that devices don't actually have access to the entire system memory; rather, the OS controls what they have access to by programming the (v)IOMMU. Even then, the device/VFIO-user server can still pretend that it has full access to its I/O virtual address space: the host can perform DMA_MAP/DMA_UNMAP operations as the guest kernel manages the IOVA space and reprograms the vIOMMU. As long as the device only touches accessible memory, everything will work fine (QEMU actually supports this to some extent for vhost already, IIRC).

The problematic point with the above approach is that we need to proactively set up IOMMU mappings for all memory that the device/VFIO-user server may potentially access. There are two issues with that:
Both cases are addressed by demand-paging as enabled by PRI.

So much for the problems PRI support can solve. Whether these problems are worth addressing in VFIO-user is a separate question, though. I suspect that right now, setups that employ a vIOMMU are pretty rare, and VFIO-user servers that expose devices which would actually benefit from demand-paging/PRI are largely non-existent. (As a side note, my own interest is motivated by the desire to model/mimic real hardware as closely as possible, but I appreciate that this isn't a primary goal for the VFIO-user protocol or the libvfio-user project.) The situation might change in the future, though: if/when more hardware starts adopting ATS/PRI and SVA becomes more prevalent, sooner or later folks will want to use this within their VMs, and thus PRI/ATS will become relevant for VFIO-user.

Given the above, IMO it's perfectly fair to keep this open as a future enhancement, to be picked up if/when a more substantial use case appears. That said, there is one angle that I think may warrant consideration now: PASID support in the protocol. DMA_MAP, DMA_UNMAP, DMA_READ, and DMA_WRITE operations must convey the PASID they're targeting, which will necessitate additional protocol fields. I understand the protocol is still unstable, so we might want to add PASID fields now to avoid going through an awkward protocol upgrade at a later point. Then again, I don't see PASID support in VFIO at this point, so there's also an argument for waiting for PASID support to appear there.
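To make the PASID point concrete, here is a purely hypothetical sketch of what a PASID-aware DMA_MAP payload might look like; the first five fields follow my reading of the current VFIO-user DMA_MAP message, while the flag bit and `pasid` field are invented for illustration and exist nowhere in the protocol today:

```c
#include <stdint.h>

/* Hypothetical, for illustration only: a PASID-aware variant of the
 * VFIO-user DMA_MAP payload. A new flag bit would signal that the
 * pasid field is valid, keeping old clients/servers compatible. */
struct vfio_user_dma_map_pasid {
    uint32_t argsz;
    uint32_t flags;    /* e.g. a new VFIO_USER_F_DMA_REGION_PASID bit */
    uint64_t offset;
    uint64_t addr;     /* IOVA within the given PASID address space */
    uint64_t size;
    uint32_t pasid;    /* PCIe PASIDs are at most 20 bits wide */
    uint32_t reserved;
};
```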
This is a worthy goal, and potentially especially relevant in cases where the server side is acting as a proxy for some kind of real hardware underneath. What's foxing me about PRI here is really the question of who is actually acting as the IOMMU, I think. This isn't arguing for or against actually having ATS etc. support, just about the nature of the implementation.
I would like to see that for sure.
Ah, thanks for drawing attention to this angle. I had actually thought about this, and remote IOMMU approaches are definitely possible. I previously ended up concluding that the IOMMU probably makes more sense on the QEMU side, but failed to mention any of this... Here are some thoughts:
This might be feasible in the long run, but it would certainly require more plumbing, as the kernel would somehow have to communicate page requests from the device to the server (as opposed to just inspecting the address space and deciding that there's nothing there): something like a page-fault handler for hardware faults.
libvfio-user doesn't work with a vIOMMU that remaps GPAs to IOVAs when the server wants to access guest RAM via mmap() (it works fine if guest RAM is accessed via vfu_sgl_read and vfu_sgl_write). From discussion with @jlevon and @mnissler-rivos on libvfio-user Slack (https://libvfio-user.slack.com/archives/C01AFGCSPTR/p1696411059172959):