I have another question, since you're an expert in the field. I used the standard streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True) with model.generate() for Llama-2-7B in Colab. Streaming worked well, but generation often stopped partway through, or stopped when I ran a second request. I also hit this issue when running inference on AWS large instances (ml.g5.48xlarge) with DeepSpeed. Can you give me a hint about the causes? I googled but haven't found a satisfactory answer.
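For reference, here is a minimal sketch of the streaming setup described above. The checkpoint name, prompt, and generation parameters are illustrative assumptions, not taken from the original setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Assumed checkpoint; the question only says "llama2-7b"
model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Streamer setup from the question: prints tokens as they are generated,
# skipping the prompt and special tokens
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "Explain what a text streamer does."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# If max_new_tokens is left unset, generate() falls back to a small default
# length, which can look like the stream stopping in the middle; setting it
# explicitly is one thing worth checking.
output_ids = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)
```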