Support newer versions of mistral (e.g. mistralai/Mistral-7B-Instruct-v0.2)? #41

Open
spring1915 opened this issue Jan 10, 2024 · 2 comments


@spring1915

No description provided.

@tomaarsen
Owner

Hello!

mistralai/Mistral-7B-Instruct-v0.2 should be supported in the same way that Mistral-7B-v0.1 is :)

Also, consider using the new Attention Sinks implementation that is now available in transformers directly: the SinkCache. See how to use it here:
https://colab.research.google.com/drive/1S0oIPaqxAVp0oWEwTadhZXDjhWiTyF12?usp=sharing

You should be able to replace HuggingFaceH4/zephyr-7b-beta with mistralai/Mistral-7B-Instruct-v0.2.

  • Tom Aarsen
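
For readers who want an intuition for what an attention-sink cache keeps around, here is a minimal pure-Python sketch of the eviction policy only. It is not the transformers implementation. The function name `sink_cache_positions` is hypothetical, and the parameter names `window_length` and `num_sink_tokens` are chosen to mirror the SinkCache arguments, under the assumption that `window_length` counts the sink tokens as part of the total cache size.

```python
from typing import List


def sink_cache_positions(seq_len: int, window_length: int, num_sink_tokens: int) -> List[int]:
    """Return the token positions an attention-sink cache would retain.

    Hypothetical helper for illustration: it keeps the first
    `num_sink_tokens` positions (the "sinks") plus the most recent
    positions, so that at most `window_length` positions remain in total.
    """
    if not 0 <= num_sink_tokens < window_length:
        raise ValueError("num_sink_tokens must be in [0, window_length)")

    # Nothing is evicted until the cache is full.
    if seq_len <= window_length:
        return list(range(seq_len))

    # Once full, keep the sinks and a sliding window of recent tokens.
    num_recent = window_length - num_sink_tokens
    sinks = list(range(num_sink_tokens))
    recent = list(range(seq_len - num_recent, seq_len))
    return sinks + recent


if __name__ == "__main__":
    # After 12 tokens with a budget of 8 and 2 sinks, the middle is dropped.
    print(sink_cache_positions(seq_len=12, window_length=8, num_sink_tokens=2))
```

The point of the sketch is that memory stays bounded by `window_length` however long generation runs, while the earliest tokens are never evicted.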

@spring1915
Author

spring1915 commented Jan 10, 2024

Great! Thanks @tomaarsen for sharing.

I have another question, since you're an expert in the field. I used the standard streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True) in model.generate() for llama2-7b in Colab. Streaming worked well, but it often stopped partway through generation or when I ran a second request. I also encountered this issue when running inference with the model on large AWS instances (ml.g5.48x) with DeepSpeed. Can you give me a hint about the possible causes? I googled but haven't found a satisfactory answer.


2 participants