Hi, I'm trying to fine-tune Whisper on multiple GPUs and I don't know what `RANK` to set.
I set `WORLD_SIZE` to the number of GPUs, `MASTER_ADDR` to `localhost`, and `MASTER_PORT` to an idle port.
When `WORLD_SIZE` is 2 or more and `RANK` is set to 0, training hangs, probably during the `torch.distributed.TCPStore()` setup.
Has anyone solved this problem?
Any hints would be appreciated.
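For what it's worth, a hang like this usually means every process was given the same `RANK`. Each process needs a unique rank in `0 .. WORLD_SIZE-1`; if all of them claim rank 0, the `TCPStore` on rank 0 waits forever for the missing peers to connect. A minimal sketch (assuming PyTorch is installed; the port number and `worker` function are just illustrative) that assigns distinct ranks via `torch.multiprocessing.spawn`:

```python
# Sketch: RANK must be unique per process (0 .. WORLD_SIZE-1).
# If every process sets RANK=0, rank 0's TCPStore blocks waiting
# for peers that never arrive, which looks like a training hang.
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank: int, world_size: int) -> None:
    # These two can be shared by all processes...
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"  # any idle port (illustrative choice)
    # ...but `rank` must differ per process. mp.spawn passes 0, 1, ... automatically.
    # "gloo" runs on CPU; use "nccl" for real multi-GPU training.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    t = torch.tensor([rank + 1.0])
    dist.all_reduce(t)  # sums across all ranks
    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2  # number of processes (one per GPU in real training)
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

Alternatively, launching the training script with `torchrun --nproc_per_node=<num_gpus> train.py` exports `RANK`, `LOCAL_RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT` for each process automatically, so you never set `RANK` by hand.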