Unexpected Suspension occurs while training #6

Open
cloud-tifa opened this issue Aug 6, 2020 · 0 comments

Hi Guillaume,

First of all, thanks for the enlightening work on PRIMAL.

I cloned the code and tried to train a new model using a *.py file converted from the *.ipynb notebook. Model inference works fine, so I moved on to training my own model after installing all dependencies and compiling cpp_mstar. Training starts and runs, but at some point it hangs: GPU and memory stay occupied while the CPU is idle. The training program reports no error or exception. This happens on almost every training run, after a random number of episodes.

What I have modified is:

import keras.backend.tensorflow_backend as KTF

# Allocate GPU memory on demand, capped at 80% of the device
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.8
sess = tf.Session(config=config)
KTF.set_session(sess)

# Reduced training settings
EXPERIENCE_BUFFER_SIZE = 64  # default is 128
NUM_META_AGENTS = 2  # default is 3

I have already created a conda environment for PRIMAL with cuda=10.0, cudnn=7.6.5 and tensorflow-gpu=1.14.
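
A silent hang like the one described above can often be localized by dumping the stack of every Python thread once no progress has been made for a while. The following is only a minimal sketch, assuming Python's standard-library faulthandler module; the names arm_watchdog, disarm_watchdog, run_episodes and STALL_TIMEOUT_S are hypothetical and not part of the PRIMAL code, and the timeout value is a guess.

```python
import faulthandler
import sys
import time

# Hypothetical value: the longest time a single episode should ever take.
STALL_TIMEOUT_S = 600


def arm_watchdog(timeout_s=STALL_TIMEOUT_S, file=None):
    """(Re)start a timer that dumps all thread tracebacks if it expires."""
    target = file if file is not None else sys.stderr
    # A new call replaces any previously scheduled dump.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=target)


def disarm_watchdog():
    """Cancel the pending traceback dump, if any."""
    faulthandler.cancel_dump_traceback_later()


def run_episodes(num_episodes, episode_fn, timeout_s=STALL_TIMEOUT_S, file=None):
    """Call episode_fn(i) for each episode, re-arming the watchdog each time.

    episode_fn stands in for whatever runs one training episode.
    """
    completed = 0
    try:
        for episode in range(num_episodes):
            arm_watchdog(timeout_s, file)
            episode_fn(episode)
            completed += 1
    finally:
        disarm_watchdog()
    return completed


if __name__ == "__main__":
    # Toy run: each "episode" just sleeps briefly, so nothing is dumped.
    run_episodes(3, lambda ep: time.sleep(0.01), timeout_s=5)
```

If an episode exceeds the timeout, the traceback of each Python thread is written to the chosen file (stderr by default) and the process keeps running. Only Python frames are shown, so a stall inside compiled code would appear as the Python line that called into it.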

It would be great if you could help me track down this issue.

Best Wishes,
Hongjun
