Report AVX512_BF16 support in CPUID features #10991

Open
jseba opened this issue Oct 4, 2024 · 2 comments
Labels
type: enhancement New feature or request

jseba (Contributor) commented Oct 4, 2024

Description

We've been doing some performance analysis and noticed that under gVisor, a PyTorch image conversion from RGB to YUV takes over 1s for a sample image, while on bare metal it takes less than 50ms. This runs entirely on the CPU; no CUDA is involved. I'm not very familiar with the CPU features PyTorch uses or how to profile it, but comparing the CPU flags reported in /proc/cpuinfo on the host with the ones gVisor reports, the most notable missing AVX512 flag is bfloat16 support, avx512_bf16.

From what I can tell from the Intel manual, this flag is bit 5 of EAX in the Structured Extended Feature Enumeration sub-leaf (EAX=07H, ECX=1), which Linux calls block 12. I don't see any support for this block in features_amd64.go, and I'm wondering whether there's already a tracking ticket for surfacing bf16 support in gVisor, or a reason it's not supported (beyond just "hasn't been done yet" 😄).
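For reference, here is a minimal sketch of decoding the relevant bits from a raw CPUID.(EAX=07H,ECX=1):EAX value. The bit positions (4 for AVX-VNNI, 5 for AVX512_BF16) follow the Intel SDM; the helper names and the sample `eax` value are hypothetical, chosen only for illustration. It also shows how the bit maps onto Linux's numbering for block 12 (`12*32 + bit`).

```go
package main

import "fmt"

// Bit positions in CPUID.(EAX=07H,ECX=1):EAX, per the Intel SDM.
const (
	bitAVXVNNI    = 4 // AVX-VNNI
	bitAVX512BF16 = 5 // AVX512_BF16
)

// linuxWord is the block (feature word) Linux uses for this register.
const linuxWord = 12

// hasBit reports whether the given bit is set in a CPUID output register.
// (Hypothetical helper for illustration.)
func hasBit(reg uint32, bit uint) bool {
	return reg&(1<<bit) != 0
}

// linuxFeature returns the Linux X86_FEATURE_* number for a bit in block 12.
func linuxFeature(bit uint) uint {
	return linuxWord*32 + bit
}

func main() {
	// Hypothetical EAX value with bits 4 and 5 set (0b110000).
	eax := uint32(0x30)
	fmt.Println("avx_vnni:", hasBit(eax, bitAVXVNNI))
	fmt.Println("avx512_bf16:", hasBit(eax, bitAVX512BF16))
	fmt.Println("X86_FEATURE_AVX512_BF16 =", linuxFeature(bitAVX512BF16)) // 389
}
```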

I can take a pass at implementing it, and would appreciate any advice on adding an entirely new block, especially since there seems to be a gap going from block 7 to block 12.
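On the numbering gap: assuming a gVisor `Feature` is encoded as `block*32 + bit` (the layout the diff below uses with `8*32 + iota`), the block is only an index into gVisor's own table, so it need not match Linux's block numbers. A minimal sketch with a simplified, hypothetical `Feature` type, not gVisor's actual code:

```go
package main

import "fmt"

// Feature is a simplified stand-in for gVisor's cpuid.Feature:
// the value encodes block*32 + bit (assumption for illustration).
type Feature int

// block returns which 32-bit register "block" the feature lives in.
func (f Feature) block() int { return int(f / 32) }

// bit returns the feature's mask within its register.
func (f Feature) bit() uint32 { return 1 << uint32(f%32) }

// Block 8: CPUID.(EAX=07H,ECX=1):EAX, mirroring the diff in this thread.
const (
	_ Feature = 8*32 + iota // bits 0-3 are not modeled in this sketch
	_
	_
	_
	featureAVXVNNI    // bit 4
	featureAVX512BF16 // bit 5
)

func main() {
	f := featureAVX512BF16
	fmt.Println(int(f), f.block(), f.bit()) // 261 8 32
}
```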

Is this feature related to a specific bug?

No response

Do you have a specific solution in mind?

No response

jseba added the type: enhancement New feature or request label Oct 4, 2024
avagin self-assigned this Oct 4, 2024
avagin (Collaborator) commented Oct 4, 2024

I think it should look like this:

diff --git a/pkg/cpuid/features_amd64.go b/pkg/cpuid/features_amd64.go
index 4831fda31..d93010242 100644
--- a/pkg/cpuid/features_amd64.go
+++ b/pkg/cpuid/features_amd64.go
@@ -135,6 +135,18 @@ func (f Feature) set(s ChangeableSet, on bool) {
 			out.Edx &^= f.bit()
 		}
 		s.Set(In{Eax: uint32(extendedFeatureInfo)}, out)
+	case 8:
+		in := In{
+			Eax: extendedFeatureInfoSubLeaf.eax(),
+			Ecx: extendedFeatureInfoSubLeaf.ecx(),
+		}
+		out := s.Query(in)
+		if on {
+			out.Eax |= f.bit()
+		} else {
+			out.Eax &^= f.bit()
+		}
+		s.Set(in, out)
 	}
 }
 
@@ -181,6 +193,9 @@ func (f Feature) check(fs FeatureSet) bool {
 	case 7:
 		_, _, _, dx := fs.query(extendedFeatureInfo)
 		return (dx & f.bit()) != 0
+	case 8:
+		ax, _, _, _ := fs.query(extendedFeatureInfoSubLeaf)
+		return (ax & f.bit()) != 0
 	default:
 		return false
 	}
@@ -437,6 +452,17 @@ const (
 	X86FeatureSPEC_CTRL_SSBD
 )
 
+// Block 8 constants are the extended features sub-leaf bits in
+// CPUID.(EAX=07H,ECX=1):EAX.
+const (
+	_ Feature = 8*32 + iota // eax bit 0 is reserved.
+	_                       // eax bit 1 is reserved.
+	_                       // eax bit 2 is reserved.
+	_                       // eax bit 3 is reserved.
+	X86FeatureAVX_VNNI
+	X86FeatureAVX512_BF16
+)
+
 // These are the extended floating point state features. They are used to
 // enumerate floating point features in XCR0, XSTATE_BV, etc.
 const (
diff --git a/pkg/cpuid/native_amd64.go b/pkg/cpuid/native_amd64.go
index ac2fcbbcc..19666471e 100644
--- a/pkg/cpuid/native_amd64.go
+++ b/pkg/cpuid/native_amd64.go
@@ -49,6 +49,7 @@ const (
 	monitorMwaitParams            cpuidFunction = 0x5               // Returns information about monitor/mwait instructions.
 	powerParams                   cpuidFunction = 0x6               // Returns information about power management and thermal sensors.
 	extendedFeatureInfo           cpuidFunction = 0x7               // Returns extended feature bits.
+	extendedFeatureInfoSubLeaf    cpuidFunction = 0x7 | (0x1 << 32) // Returns extended feature sub-leaf bits.
 	_                                                               // Function 0x8 is reserved.
 	intelDCAParams                cpuidFunction = 0x9               // Returns direct cache access information. Intel only.
 	intelPMCInfo                  cpuidFunction = 0xa               // Returns information about performance monitoring features. Intel only.
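The set/check changes in the diff boil down to plain bit operations on the EAX output of the (leaf 7, sub-leaf 1) query. A self-contained sketch, assuming a hypothetical in-memory table in place of gVisor's `ChangeableSet`/`FeatureSet` (the `in`, `out`, and `table` types are illustrative stand-ins) and the `eax | (ecx << 32)` packing the diff uses for `extendedFeatureInfoSubLeaf`:

```go
package main

import "fmt"

// cpuidFunction packs a leaf and sub-leaf as eax | (ecx << 32),
// matching the constant added to native_amd64.go in the diff.
type cpuidFunction uint64

func (f cpuidFunction) eax() uint32 { return uint32(f) }
func (f cpuidFunction) ecx() uint32 { return uint32(f >> 32) }

const extendedFeatureInfoSubLeaf cpuidFunction = 0x7 | (0x1 << 32)

// in and out are hypothetical stand-ins for cpuid.In / cpuid.Out.
type in struct{ Eax, Ecx uint32 }
type out struct{ Eax, Ebx, Ecx, Edx uint32 }

// table is a hypothetical in-memory CPUID table keyed by (leaf, sub-leaf).
type table map[in]out

func subLeafKey() in {
	return in{Eax: extendedFeatureInfoSubLeaf.eax(), Ecx: extendedFeatureInfoSubLeaf.ecx()}
}

// setEAXBit turns a feature bit on or off in the sub-leaf's EAX,
// using the same |= / &^= pattern as Feature.set in the diff.
func (t table) setEAXBit(mask uint32, on bool) {
	k := subLeafKey()
	o := t[k] // zero value if the sub-leaf is absent
	if on {
		o.Eax |= mask
	} else {
		o.Eax &^= mask
	}
	t[k] = o
}

// checkEAXBit mirrors the block 8 case of Feature.check.
func (t table) checkEAXBit(mask uint32) bool {
	return t[subLeafKey()].Eax&mask != 0
}

func main() {
	const bf16 = uint32(1) << 5 // AVX512_BF16 is bit 5 of EAX
	t := table{}
	t.setEAXBit(bf16, true)
	fmt.Println(t.checkEAXBit(bf16)) // true
	t.setEAXBit(bf16, false)
	fmt.Println(t.checkEAXBit(bf16)) // false
}
```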

avagin assigned jseba and unassigned avagin Oct 4, 2024
jseba (Contributor, Author) commented Oct 7, 2024

Awesome, I'll give that patch a test this week and try to get a PR together with some tests soon if that goes well 👍
