fix mha for the case that present kv is not consumed #21777

Closed
guschmue wants to merge 5 commits from gs/fix-unconsumed-mha-outputs

guschmue
Contributor

No description provided.

@guschmue guschmue added the ep:WebGPU ort-web webgpu provider label Aug 16, 2024
@guschmue guschmue marked this pull request as ready for review August 16, 2024 20:07
// since there is no buffer for it.
// We check by requesting the output and if not there we'll adjust context.outputCount
const presentKeyShape = [
parameters.batchSize,
Contributor

@tianleiwu tianleiwu Aug 16, 2024

This shape only works for MHA and GQA.
For the Attention op, output 1 has shape [2, B, N, T, H] instead of [B, N, T, H], because it concatenates present_key and present_value into a single present output.

I think we need extra code here, something like:

if (attention op) { // Can we get the operator name from context? Maybe we can use context.outputCount === 2, since MHA and GQA have 3 outputs if present_key is needed.
    // Insert 2 at the beginning of the present shape.
}
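
The shape adjustment suggested above could be sketched roughly as follows. This is only an illustration, not code from the PR: the `OpKind` type, the `PresentShapeParams` interface, and the `presentOutputShape` helper are all hypothetical names, and the sketch assumes the operator kind is known to the caller rather than inferred from `context.outputCount`.

```typescript
// Hypothetical helper for illustration; not part of onnxruntime.
type OpKind = 'Attention' | 'MultiHeadAttention' | 'GroupQueryAttention';

interface PresentShapeParams {
  batchSize: number;      // B
  numHeads: number;       // N
  sequenceLength: number; // T
  headSize: number;       // H
}

// Returns the shape of the "present" output for the given operator kind.
function presentOutputShape(op: OpKind, p: PresentShapeParams): number[] {
  const base = [p.batchSize, p.numHeads, p.sequenceLength, p.headSize];
  // Attention packs present_key and present_value into one tensor,
  // so its shape gets a leading dimension of 2.
  return op === 'Attention' ? [2, ...base] : base;
}

const params: PresentShapeParams = { batchSize: 1, numHeads: 8, sequenceLength: 16, headSize: 64 };
console.log(presentOutputShape('Attention', params));          // [2, 1, 8, 16, 64]
console.log(presentOutputShape('MultiHeadAttention', params)); // [1, 8, 16, 64]
```

Passing the operator kind explicitly avoids depending on the output count, which the comment above raises only as a possible heuristic.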

Contributor

@tianleiwu tianleiwu Aug 17, 2024

We also need to consider another special case for GQA, where past and present share buffers. In that case, the length is the max sequence length.
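
A rough sketch of that special case, under stated assumptions: the `GqaShapeParams` interface, its field names (including `pastPresentShareBuffer` and `maxSequenceLength`), and the `gqaPresentKeyShape` helper are hypothetical and chosen for illustration only. They are not taken from the PR.

```typescript
// Hypothetical parameter bag for illustration; field names are assumptions.
interface GqaShapeParams {
  batchSize: number;
  numHeads: number;
  headSize: number;
  totalSequenceLength: number;     // past + new tokens
  maxSequenceLength: number;       // capacity of the shared KV buffer
  pastPresentShareBuffer: boolean; // true when past and present alias one buffer
}

// Picks the sequence dimension of present_key for GQA:
// the full buffer capacity when past/present share a buffer,
// otherwise the total sequence length so far.
function gqaPresentKeyShape(p: GqaShapeParams): number[] {
  const length = p.pastPresentShareBuffer ? p.maxSequenceLength : p.totalSequenceLength;
  return [p.batchSize, p.numHeads, length, p.headSize];
}

const shared: GqaShapeParams = {
  batchSize: 1, numHeads: 4, headSize: 64,
  totalSequenceLength: 10, maxSequenceLength: 128,
  pastPresentShareBuffer: true,
};
console.log(gqaPresentKeyShape(shared));                                      // [1, 4, 128, 64]
console.log(gqaPresentKeyShape({ ...shared, pastPresentShareBuffer: false })); // [1, 4, 10, 64]
```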

@guschmue
Contributor Author

With #21782, this one is no longer needed.

@guschmue guschmue closed this Aug 19, 2024
@guschmue guschmue deleted the gs/fix-unconsumed-mha-outputs branch September 12, 2024 17:29