Long Context Models - Possible to split the ctx memory across GPUs? #1639
Alumniminium asked this question in Q&A (unanswered)
Hey, what's the approach here? I just got a second RTX 3090 with 24 GB of VRAM so I could run a 7B model at 64k context, but I still get OOMs.
What's the proper way to invoke llama? I tried a tensor split at a 1,1 ratio, but I have no idea if that's right.
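For reference, here is roughly what an even two-GPU split looks like with llama.cpp's CLI. This is a sketch only: the model path and prompt are placeholders, and the flags assume a reasonably recent llama.cpp build.

```bash
# Sketch only: the model path is a placeholder, and the flags assume a
# recent llama.cpp build (llama-cli, formerly ./main).
#   -c 65536            request the full 64k context window
#   -ngl 99             offload all layers to the GPUs
#   --split-mode layer  assign whole layers (and their slice of the KV
#                       cache) to each GPU
#   --tensor-split 1,1  split the layers evenly across the two 3090s
./llama-cli \
  -m ./models/7b-64k.Q4_K_M.gguf \
  -c 65536 \
  -ngl 99 \
  --split-mode layer \
  --tensor-split 1,1 \
  -p "Hello"
```

For scale: assuming an fp16 KV cache and classic LLaMA-7B dimensions (32 layers, 32 attention heads, head dim 128), the cache alone costs 2 × 32 × 32 × 128 × 2 bytes ≈ 512 KiB per token, or about 32 GiB at a full 64k context. So the OOM is expected even when the weights fit comfortably on one 24 GB card; distributing layers, and with them the per-layer KV cache, across both GPUs is what makes the full context feasible.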