Implement attention sliding window for Mistral #341

Merged · 3 commits merged into main from jk-attention-sliding-window on Feb 23, 2024

Conversation

jonatanklosko (Member)

Attention sliding window means that each token "attends to" only the N prior and N later tokens.
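As a rough sketch of the idea (not Bumblebee's actual implementation; the `SlidingWindowMask` module and `build/2` function are made up for illustration), such a window can be expressed as a band mask over pairwise position distances using Nx:

```elixir
defmodule SlidingWindowMask do
  # Builds a {sequence_length, sequence_length} mask where position i
  # may attend to position j iff |i - j| <= window_size.
  def build(sequence_length, window_size) do
    positions = Nx.iota({sequence_length})

    # Pairwise distance |i - j| via broadcasting {n, 1} against {1, n}.
    distance =
      Nx.abs(Nx.subtract(Nx.new_axis(positions, 1), Nx.new_axis(positions, 0)))

    # 1 where the positions fall within the window, 0 elsewhere.
    Nx.less_equal(distance, window_size)
  end
end

SlidingWindowMask.build(5, 1)
#=> a u8 tensor with ones on the diagonal and one band on each side
```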

Comment on lines 35 to 37:

```elixir
# TODO test once we know the expected behaviour
# spec = Bumblebee.configure(spec, attention_window_size: 2)
spec = Bumblebee.configure(spec, attention_window_size: 1)
```
jonatanklosko (Member, Author):

There are two conflicting interpretations of the size. I am waiting for a resolution upstream (huggingface/transformers#29176).
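For illustration only (this is an assumption about the general shape of such an ambiguity, not a summary of the linked issue), two readings of the size would produce different masks under the hypothetical `SlidingWindowMask` sketch above:

```elixir
# Reading A: the size counts the neighbours on each side,
# so size 1 keeps the diagonal plus one band in each direction.
SlidingWindowMask.build(4, 1)

# Reading B: the size counts the window span itself,
# so size 1 would keep only the diagonal (|i - j| <= 0).
SlidingWindowMask.build(4, 0)
```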

@jonatanklosko merged commit a6019fd into main on Feb 23, 2024 (2 checks passed).
@jonatanklosko deleted the jk-attention-sliding-window branch on February 23, 2024 at 10:20.