Validation loss not computed #16

Open
nitinnairk opened this issue Aug 12, 2019 · 5 comments

Comments

@nitinnairk

Is there a reason why the validation loss is neither computed nor logged when the model is trained on more than one GPU?

@lopuhin
Owner

lopuhin commented Aug 12, 2019

IIRC I didn't manage to make it work for some reason, so I think I ended up running validation from a separate process. But I also didn't train long enough to overfit.

@nitinnairk
Author

Could you share that validation script?
I'm using this GPT model to train on a different language altogether, so having the validation loss would be of great help!

@lopuhin
Owner

lopuhin commented Aug 13, 2019

If you pass the --only-validate option, the validation loss will be computed. The only caveat is that you need to make sure you're not using multiple GPUs (e.g. limit the run to one GPU with the CUDA_VISIBLE_DEVICES=0 environment variable):

transformer-lm/lm/main.py

Lines 251 to 256 in fa3f529

if only_validate:
    if world_size != 1:
        print('multi-GPU validation is not supported yet')
        sys.exit(1)
    if is_main:
        print(f'Validation loss: {get_valid_loss():.4f}')
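Following this suggestion, the separate validation run can be scripted. Below is a minimal sketch, assuming you already launch training with some command line of your own. It copies that argv, appends --only-validate, and sets CUDA_VISIBLE_DEVICES so only one GPU is visible, which (per the comment above) keeps world_size at 1 so the guard in the excerpt does not exit. `single_gpu_validation_command` and `run_validation` are hypothetical helpers, not part of transformer-lm, and no other flags of main.py are assumed.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional, Sequence, Tuple


def single_gpu_validation_command(
    train_argv: Sequence[str],
    base_env: Optional[Mapping[str, str]] = None,
    gpu: str = "0",
) -> Tuple[List[str], Dict[str, str]]:
    """Build (argv, env) for a validation-only run pinned to one GPU.

    Hypothetical helper: `train_argv` is assumed to be the exact command
    line you already use for training; its arguments are passed through.
    """
    argv = list(train_argv)
    if "--only-validate" not in argv:
        # Ask main.py to compute the validation loss instead of training.
        argv.append("--only-validate")
    # Start from the given environment (or the current one) and expose a
    # single device, so the run should not take the multi-GPU exit path.
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = gpu
    return argv, env


def run_validation(train_argv: Sequence[str], gpu: str = "0") -> int:
    """Run validation in a separate process and return its exit code."""
    argv, env = single_gpu_validation_command(train_argv, gpu=gpu)
    return subprocess.run(argv, env=env, check=False).returncode
```

Building argv and env in a separate function lets you log or inspect the exact command before launching it, and running it as its own process (as the owner described) leaves any multi-GPU training job untouched.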

@nitinnairk
Author

Got it, thanks!
Should I close this issue, given that the underlying problem (computing validation loss on multiple GPUs) is still not solved?

@lopuhin
Owner

lopuhin commented Aug 14, 2019

Let's leave it open until it's supported. Thanks for the report; I hope this issue will be useful in the meantime.
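The thread doesn't say how multi-GPU validation would be implemented. One common approach, offered here only as an assumption and not as the project's plan, is for every rank to validate its own shard, sum the loss totals and token counts across ranks (in PyTorch, e.g. torch.distributed.all_reduce with ReduceOp.SUM on a tensor holding both numbers), and divide once at the end. Dividing the summed totals, rather than averaging per-rank means, gives the same token-weighted loss a single GPU would report even when shards differ in size. The sketch below simulates that reduction in plain Python so the arithmetic can be checked without GPUs; `RankStats`, `all_reduce_sum` and `combine_valid_losses` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RankStats:
    """Validation totals one rank would compute over its shard (hypothetical)."""
    loss_sum: float  # sum of per-token losses on this rank's shard
    n_tokens: int    # number of tokens those losses cover


def all_reduce_sum(stats: Iterable[RankStats]) -> RankStats:
    """Plain-Python stand-in for a SUM all-reduce over [loss_sum, n_tokens]."""
    total_loss, total_tokens = 0.0, 0
    for s in stats:
        total_loss += s.loss_sum
        total_tokens += s.n_tokens
    return RankStats(total_loss, total_tokens)


def combine_valid_losses(stats: List[RankStats]) -> float:
    """Token-weighted mean validation loss across all ranks."""
    total = all_reduce_sum(stats)
    if total.n_tokens == 0:
        raise ValueError("no validation tokens on any rank")
    return total.loss_sum / total.n_tokens
```

For example, with shards `RankStats(30.0, 10)` and `RankStats(4.0, 2)` the combined loss is 34 / 12 (about 2.83), whereas naively averaging the per-rank means (3.0 and 2.0) would give 2.5. After a real all-reduce every rank holds the same totals, so only the main rank would need to print the result.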
