
Phi3 MoE cuda kernel #21819

Merged: wangyems merged 10 commits into main from wangye/phi3_moe on Aug 27, 2024

Conversation

wangyems
Contributor

Description

Motivation and Context

Review thread on docs/ContribOperators.md (outdated, resolved)
wangyems marked this pull request as draft on August 22, 2024 20:10
wangyems marked this pull request as ready for review on August 22, 2024 20:26
original gemm size causes out-of-SMEM for grouped gemm with Windows GPU pipeline
@tianleiwu
Contributor

tianleiwu commented Aug 23, 2024

Test failed on A10:
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1472103&view=logs&j=6df8fe70-7b8f-505a-8ef0-8bf93da2bac7&t=4f6ef737-111d-50d1-a46b-5f86d9a970bc&l=27022
ort_fastertransformer::generic_moe_gemm_kernelLauncher occupancy > 0 was false. GPU lacks the shared memory resources to run GroupedGEMM kernel

Maybe tune the GroupedGEMM parameters for different devices?
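To illustrate the suggestion above, here is a minimal host-side sketch of per-device tile selection. It is not the code from this PR: `TileConfig`, `SmemBytes`, and `PickConfigIndex` are hypothetical names, and the shared memory estimate (operand A and B tiles times the number of pipeline stages) is a common approximation for multistage GEMM kernels, not the exact accounting of the kernel named in the log. In a real build the budget would likely come from the device at runtime, for example via `cudaDeviceGetAttribute` with `cudaDevAttrMaxSharedMemoryPerBlockOptin`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tile configuration for a multistage GEMM kernel.
struct TileConfig {
  int tile_m;
  int tile_n;
  int tile_k;
  int stages;
};

// Approximate shared memory needed for the A and B operand tiles
// across all pipeline stages. Assumption: no extra epilogue storage.
inline std::size_t SmemBytes(const TileConfig& c, std::size_t elem_bytes) {
  const std::size_t a_tile = static_cast<std::size_t>(c.tile_m) * c.tile_k;
  const std::size_t b_tile = static_cast<std::size_t>(c.tile_n) * c.tile_k;
  return static_cast<std::size_t>(c.stages) * (a_tile + b_tile) * elem_bytes;
}

// Returns the index of the first candidate that fits in the budget,
// or -1 if none fits. Candidates are expected to be ordered from the
// most to the least shared-memory-hungry configuration.
inline int PickConfigIndex(const std::vector<TileConfig>& candidates,
                           std::size_t elem_bytes,
                           std::size_t smem_budget_bytes) {
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    if (SmemBytes(candidates[i], elem_bytes) <= smem_budget_bytes) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

With 2-byte elements, a 128x128x64 tile with 4 stages needs 128 KB under this estimate, while a 64x128x64 tile with 3 stages needs 72 KB. A device with a budget of roughly 99 KB per block would therefore fall back to the smaller tile, whereas one with roughly 163 KB could keep the larger one. Returning -1 gives the caller a place to report the "lacks the shared memory resources" condition before launching anything.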

wangyems merged commit 1d059b8 into main on Aug 27, 2024
93 checks passed
wangyems deleted the wangye/phi3_moe branch on August 27, 2024 16:21
prathikr pushed a commit that referenced this pull request Aug 27, 2024
Co-authored-by: Your Name <[email protected]>
2 participants