
Memory Leak in DDQN #22

Open
xiboli opened this issue Nov 7, 2023 · 2 comments

@xiboli

xiboli commented Nov 7, 2023

Thank you so much for implementing the Double DQN algorithm. However, when I run it, memory usage increases consistently during training. Do you have any idea where the memory leak could be?

https://github.com/ChuaCheowHuan/reinforcement_learning/blob/master/DQN_variants/DDQN/double_dqn_cartpole.py#L339
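
A simple way to confirm the growth is to log the process's resident memory once per episode. The snippet below is only a sketch that assumes the psutil package is installed; log_memory() is a helper introduced here for illustration, not part of the repository code:

    import os
    import psutil

    _process = psutil.Process(os.getpid())

    def log_memory(episode):
        # Print resident memory in MB; a steady climb across episodes
        # indicates a leak rather than normal replay-buffer growth.
        rss_mb = _process.memory_info().rss / (1024 ** 2)
        print(f"episode {episode}: RSS = {rss_mb:.1f} MB")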

@xiboli
Author

xiboli commented Nov 9, 2023

I have found that huber_loss combined with GradientDescentOptimizer causes the memory leak; when I changed to reduce_mean with RMSPropOptimizer, it disappeared. Can you explain why you used the Huber loss with the gradient descent optimizer? Thank you so much.

        with tf.variable_scope('loss'):
            # Mean squared error; the original used tf.losses.huber_loss(td_target, predicted_Q_val)
            self.loss = tf.reduce_mean(tf.squared_difference(td_target, predicted_Q_val))
        with tf.variable_scope('optimizer'):
            # RMSProp; the original used tf.train.GradientDescentOptimizer(self.learning_rate).minimize(self.loss)
            self.optimizer = tf.train.RMSPropOptimizer(self.learning_rate).minimize(self.loss)
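
One common cause of steadily growing memory in TF1 training loops, mentioned here only as an assumption and not verified against this repository, is that new ops get added to the graph inside the loop (for example, a loss or optimizer rebuilt every step). Finalizing the graph before training is a quick way to rule that out, as in this sketch:

    import tensorflow as tf  # TF1-style API, as used in the repository

    # Build the graph once, outside the training loop.
    x = tf.placeholder(tf.float32, shape=[None, 4], name="x")
    q_values = tf.layers.dense(x, 2, name="q_values")

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Freeze the graph: any op created after this point raises a
        # RuntimeError instead of silently growing the graph and memory.
        sess.graph.finalize()
        # ... run the training loop with sess.run(...) only ...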

@bcnichols

Thanks for your observation. I wasn't aware that the "leak" was associated with the Huber loss function and sadly don't know why that should be, but I will make a note to check it out once things here subside to a dull roar, so to speak.

Until we can evaluate the impact of a change of loss function, production code at the moment avoids batch inputs to model.fit(), instead fitting in a loop and saving/clearing/reloading the model periodically. This stopgap keeps memory (64 GB) from being completely consumed before convergence is reached.

If it's of any interest, the restart logic is triggered by the following call placed at a convenient spot in the model.fit() loop:

agent.save_restart(repeat, idx)

where "agent" is a class instance containing the model and its methods, as follows:

  def save_restart(self, repeat, idx):
    # Save the current model, clear the Keras session to release the
    # accumulated graph state, then reload the saved model and continue.
    self.last_model = f'{self.model_path}/{self.model_name}_{repeat}_{idx}'
    self.model.save(self.last_model)
    tf.keras.backend.clear_session()
    self.load_last_model()

  def load_last_model(self):
    # Reload the most recently saved model and recompile it.
    model = load_model(self.last_model, custom_objects=self.custom_objects, compile=False)
    model.compile(optimizer=self.optimizer, loss=self.loss())
    self.model = model  # reattach the reloaded model to the agent

  def save(self, repeat, idx):
    self.last_model = f'{self.model_path}/{self.model_name}_{repeat}_{idx}'
    self.model.save(self.last_model)
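
For context, a hypothetical outer loop would call save_restart() every few hundred fits; RESTART_EVERY, num_repeats, and batches below are illustrative placeholders, not names from the production code:

    RESTART_EVERY = 200

    for repeat in range(num_repeats):
        for idx, (x_batch, y_batch) in enumerate(batches):
            agent.model.fit(x_batch, y_batch, verbose=0)
            if idx and idx % RESTART_EVERY == 0:
                # Save, clear the Keras session, and reload to release
                # memory accumulated by repeated fit() calls.
                agent.save_restart(repeat, idx)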

