[FP16] OPTForCausalLM training on stock PyTorch: aten::lt takes longer on PVC-1100 than A100 * ratio #881

xiaowangintel · 2024-09-09T03:24:57Z

🐛 Describe the bug

For more details, please refer to https://jira.devtools.intel.com/browse/PYTORCHDGQ-5160?filter=-2.

Versions

pytorch commit: 03480213dea1f60f6d12e7348904d2f3ef7314d0
torch-xpu-ops commit: 718bc42c667539977e5eadb11ea4dec602544bf2
driver: hotfix_agama-ci-devel-881.19
pti: l_intel-pti-dev_p_0.9.0.38_offline.sh
basekit: l_BaseKit_p_2024.2.1.100_offline.sh

chuanqi129 added E2E performance labels Sep 10, 2024

chuanqi129 added this to the PT2.6 milestone Sep 10, 2024

xytintel added the loops_kernel Loops Kernel Backbone label Sep 10, 2024

fengyuan14 self-assigned this Sep 11, 2024
