Report AVX512_BF16 support in CPUID features #10991

jseba · 2024-10-04T00:52:18Z

Description

We've been doing some performance analysis and have noticed that inside gVisor, a PyTorch image conversion from RGB to YUV takes over 1s for a sample image, while on bare metal it takes less than 50ms. This runs entirely on the CPU, no CUDA involved. I'm not super familiar with the CPU features that PyTorch uses or how to profile it, but comparing the CPU flags reported via /proc/cpuinfo on the host with what gVisor reports, the big missing AVX512 flag I'm seeing is the one for bfloat16 support, avx512_bf16.

From what I can tell reading the Intel manual, this flag is in the Structured Extended Feature Enumeration Sub-leaf (EAX=07H, ECX=1), which Linux calls block 12. I don't see any support for this block in features_amd64.go, and I'm wondering whether there's already a tracking ticket for surfacing bf16 support in gVisor or a reason it's not supported (beyond just "hasn't been done yet" 😄).
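
For reference, here is a minimal sketch of how that sub-leaf is decoded, assuming the EAX value has already been obtained from CPUID with EAX=07H and ECX=1. The bit positions (4 for AVX-VNNI, 5 for AVX512_BF16) follow the Intel manual; the helper name `hasBit` and the sample register value are made up for illustration and are not gVisor code.

```go
package main

import "fmt"

// Bit positions in CPUID.(EAX=07H,ECX=1):EAX, per the Intel SDM.
const (
	avxVNNIBit    = 4
	avx512BF16Bit = 5
)

// hasBit reports whether bit n is set in a CPUID output register.
// (Hypothetical helper, for illustration only.)
func hasBit(reg uint32, n uint) bool {
	return reg&(1<<n) != 0
}

func main() {
	// Made-up EAX value with bits 4 and 5 set. A real value would come
	// from executing CPUID with EAX=7, ECX=1 on the host.
	eax := uint32(0x30)
	fmt.Println("avx_vnni:", hasBit(eax, avxVNNIBit))
	fmt.Println("avx512_bf16:", hasBit(eax, avx512BF16Bit))
}
```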

I can take a pass at implementing it if there's any advice around adding entire new blocks, especially when it looks like there's a gap going from block 7 to block 12.

Is this feature related to a specific bug?

No response

Do you have a specific solution in mind?

No response


avagin · 2024-10-04T18:56:58Z

I think it should look like this:

diff --git a/pkg/cpuid/features_amd64.go b/pkg/cpuid/features_amd64.go
index 4831fda31..d93010242 100644
--- a/pkg/cpuid/features_amd64.go
+++ b/pkg/cpuid/features_amd64.go
@@ -135,6 +135,18 @@ func (f Feature) set(s ChangeableSet, on bool) {
 			out.Edx &^= f.bit()
 		}
 		s.Set(In{Eax: uint32(extendedFeatureInfo)}, out)
+	case 8:
+		in := In{
+			Eax: extendedFeatureInfoSubLeaf.eax(),
+			Ecx: extendedFeatureInfoSubLeaf.ecx(),
+		}
+		out := s.Query(in)
+		if on {
+			out.Eax |= f.bit()
+		} else {
+			out.Eax &^= f.bit()
+		}
+		s.Set(in, out)
 	}
 }
 
@@ -181,6 +193,8 @@ func (f Feature) check(fs FeatureSet) bool {
 	case 7:
 		_, _, _, dx := fs.query(extendedFeatureInfo)
 		return (dx & f.bit()) != 0
+	case 8: ax, _, _, _ := fs.query(extendedFeatureInfoSubLeaf)
+		return (ax & f.bit()) != 0
 	default:
 		return false
 	}
@@ -437,6 +451,17 @@ const (
 	X86FeatureSPEC_CTRL_SSBD
 )
 
+// Block 8 constants are the extended features sub-leaf bits in
+// CPUID.(EAX=07H,ECX=1):EAX.
+const (
+	_ Feature = 8*32 + iota // eax bit 0 is reserved.
+	_                       // eax bit 1 is reserved.
+	_                       // eax bit 2 is reserved.
+	_                       // eax bit 3 is reserved.
+	X86FeatureAVX_VNNI
+	X86FeatureAVX512_BF16
+)
+
 // These are the extended floating point state features. They are used to
 // enumerate floating point features in XCR0, XSTATE_BV, etc.
 const (
diff --git a/pkg/cpuid/native_amd64.go b/pkg/cpuid/native_amd64.go
index ac2fcbbcc..19666471e 100644
--- a/pkg/cpuid/native_amd64.go
+++ b/pkg/cpuid/native_amd64.go
@@ -49,6 +49,7 @@ const (
 	monitorMwaitParams            cpuidFunction = 0x5               // Returns information about monitor/mwait instructions.
 	powerParams                   cpuidFunction = 0x6               // Returns information about power management and thermal sensors.
 	extendedFeatureInfo           cpuidFunction = 0x7               // Returns extended feature bits.
+	extendedFeatureInfoSubLeaf    cpuidFunction = 0x7 | (0x1 << 32) // Returns extended feature sub-leaf bits.
 	_                                                               // Function 0x8 is reserved.
 	intelDCAParams                cpuidFunction = 0x9               // Returns direct cache access information. Intel only.
 	intelPMCInfo                  cpuidFunction = 0xa               // Returns information about performance monitoring features. Intel only.
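
The patch relies on `eax()` and `ecx()` accessors that it does not show, and on `cpuidFunction` being wide enough to hold `0x1 << 32`. The sketch below is one way this could fit together, not the actual gVisor code: it assumes `cpuidFunction` is a 64-bit type with the leaf in the low half and the sub-leaf in the high half, and that a `Feature` value encodes `block*32 + bit`, as the `8*32 + iota` constant suggests. The `block()` helper and the lowercase constant name are hypothetical.

```go
package main

import "fmt"

// cpuidFunction packs the CPUID leaf (EAX) into the low 32 bits and the
// sub-leaf (ECX) into the high 32 bits. The 64-bit width is an assumption:
// a 32-bit type cannot represent 0x1 << 32.
type cpuidFunction uint64

const (
	extendedFeatureInfo        cpuidFunction = 0x7
	extendedFeatureInfoSubLeaf cpuidFunction = 0x7 | (0x1 << 32)
)

// eax returns the leaf number to load into EAX.
func (f cpuidFunction) eax() uint32 { return uint32(f) }

// ecx returns the sub-leaf number to load into ECX.
func (f cpuidFunction) ecx() uint32 { return uint32(f >> 32) }

// Feature encodes a feature as block*32 + bit (assumed layout).
type Feature int

// Bits 0-3 of block 8 are skipped and AVX_VNNI is bit 4, so AVX512_BF16
// lands on bit 5, matching the constant block in the patch.
const x86FeatureAVX512BF16 Feature = 8*32 + 5

// block returns which register block the feature lives in.
func (f Feature) block() int { return int(f) / 32 }

// bit returns the mask for the feature within its 32-bit register.
func (f Feature) bit() uint32 { return uint32(1) << (uint(f) % 32) }

func main() {
	fmt.Printf("eax=%#x ecx=%#x\n",
		extendedFeatureInfoSubLeaf.eax(), extendedFeatureInfoSubLeaf.ecx())
	fmt.Printf("block=%d mask=%#x\n",
		x86FeatureAVX512BF16.block(), x86FeatureAVX512BF16.bit())
}
```

Packing both values into one constant keeps the existing `fs.query(...)` call sites unchanged, at the cost of widening the type; a small struct with separate `eax` and `ecx` fields would be an alternative.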

jseba · 2024-10-07T16:36:57Z

Awesome, I'll give that patch a test this week and try to get a PR together with some tests soon if that goes well 👍

jseba added the "type: enhancement" label Oct 4, 2024

avagin self-assigned this Oct 4, 2024

avagin assigned jseba and unassigned avagin Oct 4, 2024
