Long Context Models - Possible to split the ctx memory across GPUs? #1639
Alumniminium asked this question in Q&A (unanswered)
Hey, what's the approach here? I just got a second RTX 3090 with 24 GB of VRAM so I could run a 7B model at 64k context, but I still get OOMs.
What's the proper way to invoke llama? I tried a tensor split at a 1,1 ratio, but I have no idea if that's right.
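For reference, here is roughly what an even two-GPU split looks like with llama.cpp's CLI. This is a sketch only: the model path and prompt are placeholders, and the flags assume a reasonably recent llama.cpp build.

```bash
# Sketch only: the model path is a placeholder, and the flags assume a
# recent llama.cpp build (llama-cli, formerly ./main).
#   -c 65536            request the full 64k context window
#   -ngl 99             offload all layers to the GPUs
#   --split-mode layer  assign whole layers (and their slice of the KV
#                       cache) to each GPU
#   --tensor-split 1,1  split the layers evenly across the two 3090s
./llama-cli \
  -m ./models/7b-64k.Q4_K_M.gguf \
  -c 65536 \
  -ngl 99 \
  --split-mode layer \
  --tensor-split 1,1 \
  -p "Hello"
```

For scale: assuming an fp16 KV cache and classic LLaMA-7B dimensions (32 layers, 32 attention heads, head dim 128), the cache alone costs 2 × 32 × 32 × 128 × 2 bytes ≈ 512 KiB per token, or about 32 GiB at a full 64k context. So the OOM is expected even when the weights fit comfortably on one 24 GB card; distributing layers, and with them the per-layer KV cache, across both GPUs is what makes the full context feasible.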