bf16 matmul's corresponding tensor.pack not properly optimized #320

Open
yifeizh2 opened this issue Sep 5, 2024 · 4 comments
Labels: performance Speedup expected

yifeizh2 (Contributor) commented Sep 5, 2024

Currently, the following two single-layer MLP configurations show worse performance compared with GC v1.

| dtype | batch size | hidden list | GC v1 | 8c55a05 (remove brgemm read lock) |
| --- | --- | --- | --- | --- |
| bf16 | 128 | 1024x1024 | 0.0286 | 0.0828 |
| bf16 | 128 | 1024x512 | 0.0204 | 0.0670 |

We performed a detailed breakdown as follows:

| 128x1024x1024 | GC v1 | 8c55a05 |
| --- | --- | --- |
| matmul only | 0.01766 | 0.01989 |
| tiled pack (or reorder) | 0.02634 | 0.04632 |
| total | 0.04418 | 0.077969 |

and

| 128x1024x512 | GC v1 | 8c55a05 |
| --- | --- | --- |
| matmul only | 0.01587 | 0.01591 |
| tiled pack (or reorder) | 0.01278 | 0.0398 |
| total | 0.02881 | 0.06917 |

Are there any further optimization opportunities for the VNNI pack?
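For context, the pack in question is the pre-packing of the bf16 weight into a blocked VNNI layout ahead of the matmul. Below is a minimal sketch of what such a pack looks like with upstream MLIR's tensor.pack, assuming a 1024x1024 bf16 weight, a 32x32 block size, and a VNNI factor of 2; the SSA names, init tensors, and tile sizes are illustrative, and the tiles the compiler actually chooses may differ.

```mlir
// Step 1 (illustrative tile sizes): block the KxN weight into 32x32 tiles,
// N-major outer order, giving [N/32, K/32, 32, 32].
%blocked = tensor.pack %weight outer_dims_perm = [1, 0]
    inner_dims_pos = [0, 1] inner_tiles = [32, 32] into %blocked_init
    : tensor<1024x1024xbf16> -> tensor<32x32x32x32xbf16>

// Step 2: pack the inner K dimension by the VNNI factor of 2,
// giving [N/32, K/32, 16, 32, 2], the layout a bf16 brgemm consumes.
%vnni = tensor.pack %blocked inner_dims_pos = [2] inner_tiles = [2]
    into %vnni_init
    : tensor<32x32x32x32xbf16> -> tensor<32x32x16x32x2xbf16>
```

The breakdown above suggests the regression is dominated by this data movement rather than by the matmul itself.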

@yifeizh2 yifeizh2 added the bug Something isn't working label Sep 5, 2024
BRUCE11111 (Contributor) commented Sep 5, 2024

VNNI reorder is on my to-do list. However, the current priority is to merge the physical-register pass and the corresponding vector-based op fusion under static shapes into master as soon as possible (within two weeks), then to support dynamic shapes for the sake of another issue, and only then to optimize specific ops such as the VNNI reorder at the instruction level. I can switch priorities if there is a more urgent need.

ZhennanQin (Contributor) commented:
I guess those VNNI reorders can be folded away if we have constant weight cache support? @niuxiaog Can you try to enable the weight cache for both bench-gc and the OV integration?
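Since the MLP weight is constant at inference time, the pack has no runtime inputs, so a constant weight cache could materialize the packed weight once and drop the pack from the per-call hot path. A hypothetical before/after sketch of the first blocking step (placeholder constant values and illustrative names; the same reasoning applies to the VNNI step):

```mlir
// Before: the pack of the constant weight re-runs on every call.
%w      = arith.constant dense<1.0> : tensor<1024x1024xbf16>   // placeholder value
%w_init = tensor.empty() : tensor<32x32x32x32xbf16>
%w_blk  = tensor.pack %w outer_dims_perm = [1, 0] inner_dims_pos = [0, 1]
    inner_tiles = [32, 32] into %w_init
    : tensor<1024x1024xbf16> -> tensor<32x32x32x32xbf16>

// After folding / weight caching: the packed weight is itself a cached
// constant, so no tensor.pack is left at runtime.
%w_packed = arith.constant dense<1.0> : tensor<32x32x32x32xbf16>  // placeholder value
```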

niuxiaog (Contributor) commented Sep 5, 2024

I'm working on enabling it with OV and may finish this week. For bench-gc, maybe next week.

@lmontigny lmontigny added this to the 0.1 CPU - Performance tuning milestone Sep 5, 2024
@yifeizh2 yifeizh2 added performance Speedup expected and removed bug Something isn't working labels Sep 10, 2024
lmontigny commented:
waiting for dynamic shape
