Hello, I would like to know if you have published the code to project the pre-trained weights of the BERT model into Monarch matrices. I cannot locate the code for this (I have also looked in the fly repo).
I can see the projection functions here, but I am interested in knowing how you use them specifically for BERT (or other transformers for NLP) to go from pre-trained weights to Monarch matrices. Thank you very much.
Ah, we don't actually use those in our work - that file was just copy-pasted from the fly repo. In M2 we're training everything from scratch, since the gated convolutional layers are quite different in function from an attention layer. It would be interesting to figure out how to distill an attention layer into a gated convolution!
Thank you for your prompt response @DanFu09. Would you happen to have any pointers on how that was done in the fly work? I am already working with those projection functions from the fly repo, but I want to make sure I correctly reproduce the results.
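For anyone else landing on this thread: the projection of a dense matrix onto a Monarch matrix can be computed with independent per-block rank-1 SVDs. Below is a minimal NumPy sketch under one common parametrization, where an (m·m) × (m·m) Monarch matrix satisfies M[r1·m + r2, c1·m + c2] = L[r2][r1, c1] · R[c1][r2, c2] for block-diagonal factors L and R; under that parametrization each interleaved block of M is rank-1, so truncating each block's SVD to rank 1 gives the Frobenius-optimal projection. The function names and the exact index convention here are illustrative assumptions, not the fly repo's API — check `blockdiag_butterfly_project` in the fly repo for the authors' actual convention.

```python
import numpy as np

def monarch_project(M, m):
    """Project a dense (m*m x m*m) matrix onto the nearest Monarch matrix.

    Assumed parametrization (illustrative, may differ from the fly repo):
        M[r1*m + r2, c1*m + c2] = L[r2][r1, c1] * R[c1][r2, c2]
    Each interleaved block (rows r2::m, columns c1*m:(c1+1)*m) is then an
    outer product, i.e. rank-1, so the optimal projection is a per-block
    rank-1 SVD truncation.
    """
    n = m * m
    assert M.shape == (n, n)
    L = np.zeros((m, m, m))  # L[r2] is the r2-th block of the left factor
    R = np.zeros((m, m, m))  # R[c1] is the c1-th block of the right factor
    for r2 in range(m):
        for c1 in range(m):
            # Interleaved block: rows r1*m + r2 (r1 varying), cols c1*m + c2.
            B = M[r2::m, c1 * m:(c1 + 1) * m]
            U, S, Vt = np.linalg.svd(B)
            s = np.sqrt(S[0])  # split the top singular value across L and R
            L[r2][:, c1] = s * U[:, 0]
            R[c1][r2, :] = s * Vt[0, :]
    return L, R

def monarch_dense(L, R):
    """Rebuild the dense matrix from the Monarch factors (for checking)."""
    m = L.shape[0]
    n = m * m
    M = np.empty((n, n))
    for r2 in range(m):
        for c1 in range(m):
            M[r2::m, c1 * m:(c1 + 1) * m] = np.outer(L[r2][:, c1], R[c1][r2, :])
    return M
```

As a sanity check, projecting a matrix that is already Monarch (built from random L, R via `monarch_dense`) should reconstruct it exactly; projecting an arbitrary dense weight matrix (e.g. a BERT attention projection) gives the closest Monarch approximation in Frobenius norm under this parametrization.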