
[Feature] ISS-60: Implement Self Extend #431

Open · wants to merge 4 commits into dev from feature/ISS-60/implement-self-extend
Conversation

@jonpsy commented Oct 18, 2024

#60

  • Single Query case (MHA and GQA)
  • Batch query case (MHA and GQA)
  • Eval strategy defined
  • Test suite written
  • Main dev done


google-cla bot commented Oct 18, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.

@jonpsy (Author) commented Oct 18, 2024

@jan-wassenberg, I would need your code review (CR) here; it's in the alpha stage right now. Let's go back and forth on this. Thanks!

@jonpsy force-pushed the feature/ISS-60/implement-self-extend branch from 4bfe885 to 5a2a7ee on October 18, 2024 08:12
@jan-wassenberg (Member) commented

Ooh nice :) Please note that we can only take pull requests on the dev branch. That code has just changed to replace template arguments with a runtime argument. Would you mind updating/rebasing your code to that?
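
For context, here is a minimal generic sketch (not the actual gemma.cpp code; the names are made up) of the kind of change being described, where a compile-time template argument becomes a runtime argument:

#include <cstddef>

// Before: the dimension is baked in as a template argument, so every
// configuration instantiates a separate copy of the function.
template <size_t kQKVDim>
float DotOld(const float* a, const float* b) {
  float sum = 0.0f;
  for (size_t i = 0; i < kQKVDim; ++i) sum += a[i] * b[i];
  return sum;
}

// After: the same dimension is passed at runtime, so one compiled function
// serves every model configuration.
float DotNew(size_t qkv_dim, const float* a, const float* b) {
  float sum = 0.0f;
  for (size_t i = 0; i < qkv_dim; ++i) sum += a[i] * b[i];
  return sum;
}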

@jonpsy changed the base branch from main to dev on October 19, 2024 04:45
@jonpsy force-pushed the feature/ISS-60/implement-self-extend branch from 7bb4e0b to 8cf3966 on October 19, 2024 06:12
@jonpsy (Author) commented Oct 19, 2024

My bad, let me do the needful! Thanks for the pointer though.

@jonpsy (Author) commented Oct 19, 2024

Haha, I took so long to understand how the main branch worked, and now I have to redo it against the new base branch.

@jonpsy (Author) commented Oct 19, 2024

Note to self: I was able to compile the gemma dev branch by commenting out the tls.stats.Notify(stats); line in compress-inl.h, using clang version arm64-apple-darwin23.4.0.

I had to do this because the compiler has a strict check against passing non-trivial arguments to a variadic method. Maybe I could have disabled that diagnostic in clang with -Wno-non-pod-varargs, but it didn't work for me.

^^ This issue should be resolved now
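
For reference, a minimal standalone example (unrelated to the gemma.cpp sources; names are illustrative) of the clang diagnostic in question, which rejects passing a non-trivial object through a C-style variadic function:

#include <cstdarg>
#include <cstdio>
#include <string>

// A C-style variadic function, like printf.
void Log(const char* fmt, ...) {
  va_list args;
  va_start(args, fmt);
  std::vprintf(fmt, args);
  va_end(args);
}

int main() {
  std::string name = "gemma";
  // clang rejects the next line with -Wnon-pod-varargs:
  // "cannot pass object of non-trivial type 'std::string' through variadic function"
  // Log("%s\n", name);
  Log("%s\n", name.c_str());  // OK: const char* is trivially copyable
  return 0;
}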

@jan-wassenberg (Member) left a comment

Sorry about the code change. We are moving toward an "all in one file" model. Thanks for rebasing!

float* HWY_RESTRICT kv = kv_cache.kv_cache.get() + kv_offset;
const float* HWY_RESTRICT mha_kv =
    activations_.q.Batch(interleaved_idx) + head * q_stride_ +
    layer_config_.qkv_dim;

// When embedding the position, use the grouped key position.
if (self_extend && pos > ngb_size) {
  pos /= grp_size;
@jan-wassenberg (Member) commented:

This might be expensive. Would you like to try precomputing and passing around a const hwy::Divisor& like we do with div_seq_len, and see if that helps speed?
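
A minimal sketch of that suggestion, assuming hwy::Divisor exposes a Divide() method (the Highway divisor type used for div_seq_len) and a hypothetical precomputed div_grp_size; this is an illustration, not the PR's code:

#include <cstddef>
#include <cstdint>

#include "hwy/base.h"

// Hypothetical helper: remap an absolute position to its grouped position,
// reusing a divisor built once from grp_size instead of a per-token division.
size_t GroupedPos(size_t pos, size_t ngb_size, const hwy::Divisor& div_grp_size) {
  // Same condition as in the PR: only remap positions beyond the neighbor window.
  if (pos > ngb_size) {
    return div_grp_size.Divide(static_cast<uint32_t>(pos));
  }
  return pos;
}

// Built once, e.g. alongside div_seq_len:
//   const hwy::Divisor div_grp_size(static_cast<uint32_t>(grp_size));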

@@ -127,6 +127,10 @@ struct LayerConfig {
size_t conv1d_width = 0;
bool ff_biases = false;
bool softmax_attn_output_biases = false;
bool self_extend = false;
size_t ngb_size = 0;
@jan-wassenberg (Member) commented:

Is this the n-gram block size? Maybe expand it to block_size for more clarity? We can also move these three new fields into a section (just a newline before them) with a // Self-extension comment, as sketched below.
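
For illustration, a possible layout along those lines (grp_size is taken from the attention snippet above; the exact names and grouping are up to the author):

// Self-extension
bool self_extend = false;
size_t ngb_size = 0;  // could be renamed block_size for clarity
size_t grp_size = 0;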
