[Discussion] Maybe we should use dynamic shared memory by default #138
Comments
The benchmark results on a randomly selected benchmark set look acceptable to me. Input arguments: (1, 16384, 16384, 'float16', 'int4', 'float16', 'float16', 'nt', False, None, False, False, None), Static latency: 0.08703966666666665, Dynamic latency: 0.087381, Difference: -0.00034133333333334626. Let's do it.
This has been merged by PR #133
Previously, we observed that two schedules with the same hint but different shared memory scopes, `shared.dyn` and `shared`, exhibited different performance. Specifically, `shared.dyn` consistently underperformed compared to `shared`. As a result, our design has favored static shared memory. However, the fix introduced in this commit resolved the issue by eliminating 20% of the redundant sync primitives in `shared.dyn`. Consequently, their performance should now be comparable.

Given this improvement, I suggest we consider converting the shared memory to `shared.dyn` to explore more tile candidates. However, it's important to benchmark the results to ensure that this change does not negatively impact performance.
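One way to make the benchmark check concrete is a small comparison helper. This is only a sketch: the function name `compare_latency` and the 1% tolerance are assumptions for illustration, not part of the project. The two latency values are the ones reported in this thread, in whatever unit the benchmark script used.

```python
def compare_latency(static_latency, dynamic_latency, tolerance=0.01):
    """Compare static vs. dynamic shared memory latency.

    Returns (difference, relative_slowdown, acceptable), where
    difference is static minus dynamic (negative means dynamic is slower)
    and relative_slowdown is measured against the static latency.
    The default tolerance of 1% is an assumed threshold.
    """
    difference = static_latency - dynamic_latency
    relative_slowdown = (dynamic_latency - static_latency) / static_latency
    acceptable = relative_slowdown <= tolerance
    return difference, relative_slowdown, acceptable


# Latencies reported in this thread for the (1, 16384, 16384, ...) case.
static_latency = 0.08703966666666665
dynamic_latency = 0.087381

difference, slowdown, ok = compare_latency(static_latency, dynamic_latency)
print(f"difference: {difference:.6f}, slowdown: {slowdown:.2%}, acceptable: {ok}")
```

For the reported case, the dynamic variant is slower by roughly 0.4% of the static latency, which falls within the assumed tolerance.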