Checkpoint dirname argument can also be a callable #870

Open
wants to merge 1 commit into master
Conversation

BenjaminBossan
Collaborator

Solves #848

Description

For Checkpoint, as of now, dirname can only be a string. With this
update, it can also be a callable with no arguments that returns a string.

What this solves is that the directory that a model is saved in can now
contain a dynamic element. This way, if you run, e.g., grid search with
n_jobs>1 + checkpoint, each checkpoint instance can have its own
directory name (e.g. using a function that returns a random name), while
the files inside the directory still follow the same naming.
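
Any zero-argument callable that returns a string works for this. As a purely illustrative sketch (not part of this PR), such a callable could for instance use tempfile to produce a unique directory per instance:

import tempfile

def make_dirname():
    # every call creates and returns a fresh, unique directory
    return tempfile.mkdtemp(prefix='checkpoint-')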

Without such a possibility, if a user runs grid search with n_jobs>1 and
checkpoint with load_best=True, the loaded model would always be
whatever happens to be the latest one stored, which can result
in (silent) errors.

If anyone has a better idea how to solve the underlying problem, I'm
open to it.

Implementation

Since dirname is no longer necessarily known at __init__ time, I removed
the validation of the filenames from there. We still validate them inside
initialize, which is sufficient in my opinion.

In theory, we could call the dirname function inside __init__ to
validate it, and then call it again inside initialize to actually set
it, but I don't like that. The reason is that we would call a
function that is possibly non-deterministic or has side effects
twice, with unknown consequences. This should be avoided if possible.
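
To make the intended behavior concrete, here is a minimal sketch, not the actual skorch implementation, of how initialize could resolve a callable dirname exactly once (all names below are illustrative):

class CheckpointSketch:
    def __init__(self, dirname='exp', f_params='params.pt'):
        # dirname may be a string or a zero-argument callable returning a string;
        # no filename validation here, because dirname may not be known yet
        self.dirname = dirname
        self.f_params = f_params

    def initialize(self):
        # call the (possibly non-deterministic) callable exactly once
        dirname = self.dirname() if callable(self.dirname) else self.dirname
        self.dirname_ = dirname
        self._validate_filenames()
        return self

    def _validate_filenames(self):
        # stand-in for the real filename checks: the resolved dirname must be a string
        if not isinstance(self.dirname_, str):
            raise TypeError("dirname must resolve to a string")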

Example

Before, code like this would fail:

from sklearn.model_selection import GridSearchCV

from skorch import NeuralNetClassifier
from skorch.callbacks import Checkpoint

# ClassifierModule is a torch.nn.Module defined elsewhere; X, y are the training data.
params = {
    'module__num_units': [10, 20, 30],
}

net = NeuralNetClassifier(
    ClassifierModule,
    callbacks=[Checkpoint(f_history=None, load_best=True)]
)
gs = GridSearchCV(net, params, refit=False, cv=3, scoring='accuracy', n_jobs=3)
gs.fit(X, y)

# error message:
# RuntimeError: Error(s) in loading state_dict for ClassifierModule:
#	size mismatch for dense0.weight: copying a param with shape torch.Size([20, 20]) from checkpoint, the shape in current model is torch.Size([10, 20]).

With this feature, we can do:

import random

def make_dirname():
    # each call returns a different directory name, so parallel checkpoint
    # instances don't save to (and load from) the same directory
    return f"test-dir-{random.randint(0, 1_000_000)}"

net = NeuralNetClassifier(
    ClassifierModule,
    callbacks=[Checkpoint(f_history=None, load_best=True, dirname=make_dirname)]
)
gs = GridSearchCV(net, params, refit=False, cv=3, scoring='accuracy', n_jobs=3)
gs.fit(X, y)

and it works.
