
[E2E] Torchbench amp_bf16 training Super_SloMo accuracy failed #905

Open
mengfei25 opened this issue Sep 12, 2024 · 2 comments

mengfei25 (Contributor) commented Sep 12, 2024

🐛 Describe the bug

There appears to be a flaky issue with Super_SloMo: the accuracy check passes with the prebuilt WHL install but fails with the source build.
In the latest weekly run:
WHL Passed: https://github.com/intel/torch-xpu-ops/actions/runs/10742335908
Source build Failed: https://github.com/intel/torch-xpu-ops/actions/runs/10741560513

I also tested the WHL locally multiple times, and it passes only intermittently.
[screenshot: local WHL test results]
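
For context on why this kind of check can pass or fail intermittently: the accuracy harness compares an amp_bf16 training run against a higher-precision reference within a tolerance, so reduced-precision accumulation plus any nondeterministic kernels can push a borderline comparison across the threshold between runs. Below is a minimal, hypothetical sketch of such a check in plain PyTorch (CPU for portability; the model, metric, and 0.99 threshold are assumptions for illustration, not the actual Torchbench harness or the Super_SloMo workload):

```python
# Minimal sketch (NOT the Torchbench harness): compares gradients from an
# amp bf16 training step against an fp32 reference with a tolerance-style
# metric. When the metric lands near the threshold, run-to-run variation can
# make the check "randomly" pass or fail.
import contextlib
import torch
import torch.nn as nn

torch.manual_seed(0)

def train_step(model, x, y, autocast_dtype=None):
    """One forward/backward pass; returns the flattened gradients."""
    model.zero_grad(set_to_none=True)
    ctx = (torch.autocast("cpu", dtype=autocast_dtype)
           if autocast_dtype is not None else contextlib.nullcontext())
    with ctx:
        # Compute the loss in fp32 regardless of autocast to avoid dtype mismatch.
        loss = nn.functional.mse_loss(model(x).float(), y)
    loss.backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 8))
x, y = torch.randn(32, 64), torch.randn(32, 8)

ref = train_step(model, x, y)                                  # fp32 reference
amp = train_step(model, x, y, autocast_dtype=torch.bfloat16)   # amp bf16 run

# Cosine-similarity style comparison, loosely modeled on accuracy harnesses.
cos = nn.functional.cosine_similarity(ref, amp, dim=0).item()
print(f"cosine similarity of fp32 vs amp_bf16 grads: {cos:.6f}")
print("PASS" if cos > 0.99 else "FAIL")  # 0.99 is an assumed threshold
```

This toy example is deterministic, but on real hardware (here, XPU) nondeterministic reductions and kernel selection can move the measured error across a fixed tolerance from run to run, which would match the observed random pass/fail behavior.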

Versions

torch-xpu-ops: 1206590

@chuanqi129 chuanqi129 added the E2E label Sep 18, 2024
@chuanqi129 chuanqi129 added this to the PT2.6 milestone Sep 18, 2024
weishi-deng (Contributor) commented

This issue passed in the latest weekly test and in a local reproducer.

chuanqi129 (Contributor) commented

Hi @weishi-deng, this is a random failure; we may still need to figure out its root cause.
