Hello, I would like to know if you have published the code to project the pre-trained weights of the BERT model into Monarch matrices. I cannot locate the code for this (I have also looked in the fly repo).
I can see the projection functions here, but I am interested in knowing how you use them specifically for BERT (or other transformers for NLP) to go from pre-trained weights to Monarch matrices. Thank you very much.
Ah, we don't actually use those in our work - that file was just copy-pasted from the fly repo. In M2 we're training everything from scratch, since the gated convolutional layers are quite different in function from an attention layer. It would be interesting to figure out how to distill an attention layer into a gated convolution!
Thank you for your prompt response @DanFu09. Would you happen to have any pointers on how that was done in the fly work? I am already working with those projection functions from the fly repo, but I want to make sure I correctly reproduce the results.
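For anyone else landing on this thread: the projection of a dense matrix onto a Monarch matrix can be computed with independent per-block rank-1 SVDs. Below is a minimal NumPy sketch under one common parametrization, where an (m·m) × (m·m) Monarch matrix satisfies M[r1·m + r2, c1·m + c2] = L[r2][r1, c1] · R[c1][r2, c2] for block-diagonal factors L and R; under that parametrization each interleaved block of M is rank-1, so truncating each block's SVD to rank 1 gives the Frobenius-optimal projection. The function names and the exact index convention here are illustrative assumptions, not the fly repo's API — check `blockdiag_butterfly_project` in the fly repo for the authors' actual convention.

```python
import numpy as np

def monarch_project(M, m):
    """Project a dense (m*m x m*m) matrix onto the nearest Monarch matrix.

    Assumed parametrization (illustrative, may differ from the fly repo):
        M[r1*m + r2, c1*m + c2] = L[r2][r1, c1] * R[c1][r2, c2]
    Each interleaved block (rows r2::m, columns c1*m:(c1+1)*m) is then an
    outer product, i.e. rank-1, so the optimal projection is a per-block
    rank-1 SVD truncation.
    """
    n = m * m
    assert M.shape == (n, n)
    L = np.zeros((m, m, m))  # L[r2] is the r2-th block of the left factor
    R = np.zeros((m, m, m))  # R[c1] is the c1-th block of the right factor
    for r2 in range(m):
        for c1 in range(m):
            # Interleaved block: rows r1*m + r2 (r1 varying), cols c1*m + c2.
            B = M[r2::m, c1 * m:(c1 + 1) * m]
            U, S, Vt = np.linalg.svd(B)
            s = np.sqrt(S[0])  # split the top singular value across L and R
            L[r2][:, c1] = s * U[:, 0]
            R[c1][r2, :] = s * Vt[0, :]
    return L, R

def monarch_dense(L, R):
    """Rebuild the dense matrix from the Monarch factors (for checking)."""
    m = L.shape[0]
    n = m * m
    M = np.empty((n, n))
    for r2 in range(m):
        for c1 in range(m):
            M[r2::m, c1 * m:(c1 + 1) * m] = np.outer(L[r2][:, c1], R[c1][r2, :])
    return M
```

As a sanity check, projecting a matrix that is already Monarch (built from random L, R via `monarch_dense`) should reconstruct it exactly; projecting an arbitrary dense weight matrix (e.g. a BERT attention projection) gives the closest Monarch approximation in Frobenius norm under this parametrization.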