Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression caused by read lock in brgemm #323

Open
zhczhong opened this issue Sep 5, 2024 · 2 comments · May be fixed by #350
Open

Performance regression caused by read lock in brgemm #323

zhczhong opened this issue Sep 5, 2024 · 2 comments · May be fixed by #350
Assignees
Labels
CPU performance Speedup expected

Comments

@zhczhong
Copy link
Member

zhczhong commented Sep 5, 2024

According to the perf test from @yifeizh2, the read lock will bring 10-20% performance regression

  data type bs  hidden_list  without lock with lock  
bf16 128 16x512 0.021597 0.01967 109.79%
bf16 128 512x256 0.025172 0.029239 86.09%
bf16 128 256x128 0.015969 0.026034 61.34%
bf16 128 512x1024 0.046416 0.055459 83.69%
bf16 128 1024x1024 0.082655 0.089712 92.13%
bf16 128 1024x512 0.066204 0.066523 99.52%
bf16 128 512x256 0.025992 0.029516 88.06%
@Menooker
Copy link
Contributor

read_lock_guard_t g(g_brgemm_lock);

The reader-writer lock will result in serialization in memory even all users of the lock are readers. I think we might need to re-design this, to make it totally lock-free.

@crazydemo
Copy link
Contributor

read_lock_guard_t g(g_brgemm_lock);

The reader-writer lock will result in serialization in memory even all users of the lock are readers. I think we might need to re-design this, to make it totally lock-free.

It might be because the current BRGEMM data is not exclusive to individual threads, resulting in significant contention between threads.

@crazydemo crazydemo linked a pull request Sep 20, 2024 that will close this issue
@crazydemo crazydemo linked a pull request Sep 20, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CPU performance Speedup expected
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants