[BUG] Tensor Parallel async_chunk=4 mismatch async_chunk=1 result when sequence length longer than 16K #174

Open
1 of 2 tasks
Achazwl opened this issue Nov 7, 2023 · 0 comments

Achazwl commented Nov 7, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Description of the Bug

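The tensor parallel (TP) linear layer with async_chunk=4 produces a result that does not match the async_chunk=1 result when the sequence length is longer than 16K. The two results match when the sequence length is <= 8K.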

Environment Information

- GCC version: 7.5.0
- Torch version: 1.13.1
- Linux system version: Ubuntu 18.04.6 LTS
- CUDA version: 11.6
- Torch's CUDA version (as per `torch.version.cuda`): 11.6

To Reproduce

Setting the CUDA_LAUNCH_BLOCKING environment variable fixes the mismatch.
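
A minimal sketch of applying that workaround from Python, for anyone trying to confirm it. The report does not state the value used; `1` is the usual setting and is an assumption here. The variable generally has to be in place before CUDA is initialized, so the sketch sets it in a child process's environment rather than in an already running interpreter. `run_with_blocking_launches` is a hypothetical helper, and the probe command only echoes the variable; a real check would launch the comparison script instead. That synchronous launches hide the mismatch would be consistent with a missing synchronization between the asynchronous chunks, though the report does not confirm this.

```python
import os
import subprocess
import sys


def run_with_blocking_launches(cmd):
    """Run cmd in a child process with synchronous CUDA kernel launches requested.

    Hypothetical helper: the value "1" is an assumption, the report only names
    the variable.
    """
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Stand-in for the real reproduction script: it only prints the variable
# so the child environment can be inspected.
probe = [sys.executable, "-c", "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"]
result = run_with_blocking_launches(probe)
```

Running the same comparison once through this helper and once without it shows whether the mismatch depends on launch ordering.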

Expected Behavior

The async_chunk=4 result should match the async_chunk=1 result at every sequence length.
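
The reason a match is expected can be sketched in plain Python: a linear layer acts on each sequence position independently, so splitting the sequence into chunks and concatenating the outputs should not change the result. This is an illustration of that property only, not the library's implementation; `linear`, `chunked_linear`, and `max_abs_diff` are hypothetical names, and the tiny shapes stand in for the long sequences in the report.

```python
def linear(rows, weight):
    """Apply y = x @ W^T row by row. rows: [seq][in], weight: [out][in]."""
    return [[sum(x * w for x, w in zip(row, wrow)) for wrow in weight] for row in rows]


def chunked_linear(rows, weight, num_chunks):
    """Split the sequence into num_chunks pieces, apply linear to each, and concatenate."""
    size = (len(rows) + num_chunks - 1) // num_chunks  # ceiling division
    out = []
    for start in range(0, len(rows), size):
        out.extend(linear(rows[start:start + size], weight))
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 2-D lists."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Small deterministic inputs: 16 positions, 4 input features, 3 output features.
rows = [[(i * 7 + j) % 5 - 2.0 for j in range(4)] for i in range(16)]
weight = [[(k + j) % 3 - 1.0 for j in range(4)] for k in range(3)]

diff = max_abs_diff(chunked_linear(rows, weight, 4), linear(rows, weight))
```

Here `diff` is zero because each row is computed the same way in both paths. A comparable check on the GPU outputs for async_chunk=4 and async_chunk=1 would quantify the mismatch described above.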

Screenshots

No response

Additional Information

No response

Confirmation

  • I have reviewed and verified all the information provided in this report.
@Achazwl Achazwl added the bug Something isn't working label Nov 7, 2023
@zkh2016 zkh2016 self-assigned this Jan 5, 2024