
SMT pinning is broken/wrong #9

Open
gnif opened this issue May 21, 2022 · 8 comments


gnif commented May 21, 2022

Hi, I do not use your scripts, but we are seeing users in the Looking Glass Discord who are having latency-related issues due to how your script assigns CPUs to the VM.

The issue is that you are not replicating the host topology into the guest. If done properly, the guest can know that each extra vCPU is sharing a core, and even the L1/L2/L3 cache arrangement.

Here is how a guest sees a properly configured VM on an SMT host (using Coreinfo):
[screenshot: Coreinfo output from the guest]

With this in place, the guest scheduler can make wise decisions about where to run each thread. Obviously you need to pin each vCPU to the correct thread of each core to make this work well. If done correctly, your cache mapping will also align with the physical hardware; see below.

Here is my host topology (AMD EPYC 7343):
[screenshot: lstopo output for the host]
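For reference, on a Linux host each CPU's SMT sibling can be read from sysfs (shown here for CPU 8, whose sibling on this topology is CPU 24), and lscpu -e prints the full CPU/core/cache mapping:

cat /sys/devices/system/cpu/cpu8/topology/thread_siblings_list
8,24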

My guest is pinned to CPU cores 8-15, which means

vCPU  0 &  1 = CPU  8 & 24
vCPU  2 &  3 = CPU  9 & 25
vCPU  4 &  5 = CPU 10 & 26
vCPU  6 &  7 = CPU 11 & 27
vCPU  8 &  9 = CPU 12 & 28
vCPU 10 & 11 = CPU 13 & 29
vCPU 12 & 13 = CPU 14 & 30
vCPU 14 & 15 = CPU 15 & 31

When done correctly, you can see that my pinning aligns with the cache map and allows the guest to make proper use of SMT.
[screenshot: guest cache map aligned with the host]

Note: AMD processors require the QEMU CPU flag topoext for the guest to be able to use SMT.
Note 2: To get the cache to align, you also have to set the QEMU CPU flags l3-cache=on,host-cache-info=on.
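As a sketch, the relevant QEMU command-line fragment for the guest above would be roughly:

-cpu host,topoext=on,l3-cache=on,host-cache-info=on
-smp 16,sockets=1,cores=8,threads=2

On Proxmox, extra flags like these typically go on the args: line of /etc/pve/qemu-server/<vmid>.conf.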

@Onepamopa

How do you output the CPU-to-cache map?


gnif commented May 22, 2022

I used lstopo on Linux for this graphic, and on Windows, Coreinfo from Sysinternals:
https://docs.microsoft.com/en-us/sysinternals/downloads/coreinfo
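Usage sketch:

lstopo topology.png    (on the Linux host; lstopo is part of the hwloc package)
coreinfo -c -l         (in the Windows guest; dumps core and cache mappings)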

@Onepamopa

Btw, where do I set topoext & l3-cache=on,host-cache-info=on?


gnif commented May 22, 2022

Issues are to direct the author of this project to a problem with their software, not to provide you with support.


ayufan commented May 22, 2022

Thank you @gnif. This is known. However, as the docs say, you should only pass physical threads, not virtual ones: https://github.com/ayufan/pve-helpers#21-cpu_taskset. And depending on the CPU, the mapping is different.

Maybe the one thing missing is documenting how to deal with the L3, as when this was written there was no need to support a NUMA/many-complexes scenario.

Technically it is possible to replicate the full SMT topology, but at least I did not find it useful or necessary to do physical-to-virtual CPU pinning of everything. Doing that is theoretically possible, but only libvirt supports it well.


gnif commented May 22, 2022

@ayufan if I am understanding you correctly, you're saying to put two VMs on the same set of cores, but on separate threads? If so, this is a very, very bad idea: the VMs will stall each other and invalidate each other's caches.

According to your own documentation:

VM 1:
cpu_taskset 1-5

VM 2:
cpu_taskset 7-11

Based on that configuration, VM 1 would be on thread 1 of cores 1-5, and VM 2 would be on thread 2 of the same cores 1-5.

There is no such thing as a "virtual core" on the host system; both threads of a core are equal in every way. They are two identical pipelines running through shared hardware, which can cause them to stall each other. There is no "primary" thread, no "real" vs. "virtual" thread.

If the guest OS knows about the SMT model, the guest scheduler can ensure that high-priority threads, like those that service interrupts for GPUs, are put onto cores that can guarantee the best possible latency.

Note: I am not stating this because I think it's a problem; I am stating this because it is a problem. We have people coming into our Discord reporting issues with Looking Glass that are the result of very poor configurations produced by this script. Looking Glass relies on low-latency servicing of its threads and of the GPU's driver, as its goal is to be as low latency as possible.

but at least I did not find it useful

This is just it: you did not, due to your use case, but I am stating for a fact that it makes a huge difference under certain workloads. You need to fix your scripts for those using such workloads, or stop promoting them.


ayufan commented May 22, 2022

If so, this is a very, very bad idea: the VMs will stall each other and invalidate each other's caches.

You are fully correct; of course they will. I can imagine this being a problem in the case of Looking Glass, which effectively requires two systems to have low latency.

In my case, where I don't use Looking Glass and rather use a single VM at a time (but have all of them running), latency was not a problem, since the other VM is mostly idle.

How do you advise users to handle many VMs? Probably in this setup you expect VMs not to share physical cores, but rather to pass full SMT cores to them.

Anyway, I see this being a problem and am happy to document those caveats. Do you have a link that would be best for redirecting people?


gnif commented May 22, 2022

In my case, where I don't use Looking Glass and rather use a single VM at a time (but have all of them running), latency was not a problem, since the other VM is mostly idle.

In this case, I would suggest you halve the number of cores you give to your VMs and give them both threads of each core; you will see a general performance uplift due to better management of your hardware.
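As a sketch of what that looks like with these scripts, assuming a 12-core/24-thread host where thread siblings are N and N+12, and assuming cpu_taskset passes its argument through to taskset -c (which accepts comma-separated ranges):

VM 1:
cpu_taskset 1-4,13-16

VM 2:
cpu_taskset 5-8,17-20

Each VM then owns whole physical cores, so the two VMs can no longer stall each other or thrash each other's caches.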

Do you have a link where best to redirect people?

Not really, as we are just supporting people reporting issues with LG. Perhaps the VFIO Discord/subreddit?
