Added option for continue training from checkpoint #49

Merged
merged 8 commits into from
Feb 6, 2024

Contributor

@PicoCentauri PicoCentauri commented Feb 5, 2024

@frostedoyster
Collaborator

Here is the finalized implementation of training continuation for the SOAP-BPNN.
Not only does this allow training to continue with the same dataset, it also allows "fine-tuning" a pre-trained model on a new dataset (this is tested and it works).
When continuing training with a new dataset, we add new capabilities, but we require the species to be a subset of the original species. There are a few ways to get around this in the future, and I will open an issue for it.
The composition weights are only recalculated for new targets (this should be a warning in the docs, but we can't add it until we have actual usage docs for the SOAP-BPNN). I'll open an issue for this as well.
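
The species constraint and the "only new targets" rule described above could be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the helper names `check_species_subset` and `find_new_targets`, and the use of plain lists of atomic numbers and target names, are assumptions made for the example.

```python
from typing import Iterable, List


def check_species_subset(old_species: Iterable[int], new_species: Iterable[int]) -> None:
    """Raise if the new dataset contains species the original model never saw."""
    unknown = sorted(set(new_species) - set(old_species))
    if unknown:
        raise ValueError(
            f"species {unknown} are not present in the original model; "
            "the new species must be a subset of the original species"
        )


def find_new_targets(old_targets: Iterable[str], new_targets: Iterable[str]) -> List[str]:
    """Return the targets that only appear in the new dataset, in input order."""
    known = set(old_targets)
    return [name for name in new_targets if name not in known]
```

Under these assumptions, only the names returned by `find_new_targets` would need freshly computed composition weights; targets the model already knows would keep their existing ones.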

Two considerations:

  • I'm using None as a string ("None") because that's what hydra is feeding me @PicoCentauri
  • I'm using type: ignore on the composition weights in the SOAP-BPNN because I think the linter is getting confused by the mechanics of register_buffer
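
For the first point, one possible way to keep the string "None" from leaking further into the code is to normalize it right where the option is read. This is only a sketch, assuming the option arrives either as a real `None`, as the string `"None"`, or as a path; `normalize_restart` is a hypothetical helper, not something from this PR.

```python
from typing import Optional


def normalize_restart(value: Optional[str]) -> Optional[str]:
    """Map a missing value or the literal string "None" to a real None."""
    if value is None or value == "None":
        return None
    return value
```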

@frostedoyster frostedoyster marked this pull request as ready for review February 6, 2024 13:13
    "--continue",
    dest="continue_from",
    type=str,
    required=False,
Contributor Author

Maybe adding default=None here would change it from a string.
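
For reference, a small standalone sketch of how `argparse` treats this option. The argument definition mirrors the lines under review; the parser itself, the `prog` name, and the help text are assumptions for the example. An optional argument that is not passed already defaults to `None` in `argparse`, so an explicit `default=None` mostly documents the intent.

```python
import argparse

parser = argparse.ArgumentParser(prog="train-sketch")
parser.add_argument(
    "--continue",
    dest="continue_from",  # "continue" is a Python keyword, so use another name
    type=str,
    required=False,
    default=None,  # explicit, although argparse already uses None here
    help="checkpoint file to continue training from",
)

# Parse once without the flag and once with it.
without_flag = parser.parse_args([])
with_flag = parser.parse_args(["--continue", "bpnn-model.pt"])

print(without_flag.continue_from)  # a real None, not the string "None"
print(with_flag.continue_from)
```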

Contributor Author

@PicoCentauri PicoCentauri left a comment

Seems to work, but the test is not actually using the continue flag.

Also, maybe add some more tests for the new, fairly complex functions you added.

Comment on lines +341 to +342
    """Add a new output to the model."""
    # add a new row to the composition weights tensor
Contributor Author

Shouldn't both be in the docstring?

Collaborator

Both?

    shutil.copy(RESOURCES_PATH / "bpnn-model.pt", "bpnn-model.pt")
    shutil.copy(RESOURCES_PATH / "options_continue.yaml", "options_continue.yaml")

    command = ["metatensor-models", "train", "options_continue.yaml"]
Contributor Author

But this is not using the continue flag?

Collaborator

Sorry, added it
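
A sketch of what the test command might look like once the flag is passed. The `--continue` option name comes from the argument definition in this PR; its position on the command line and the helper `build_train_command` are assumptions for illustration.

```python
from typing import List, Optional


def build_train_command(options: str, continue_from: Optional[str] = None) -> List[str]:
    """Build the CLI call, adding --continue only when a checkpoint is given."""
    command = ["metatensor-models", "train", options]
    if continue_from is not None:
        command += ["--continue", continue_from]
    return command


command = build_train_command("options_continue.yaml", continue_from="bpnn-model.pt")
# In a test, this list would then be handed to subprocess.check_call(command).
```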

    from metatensor.torch.atomistic import ModelCapabilities


    def merge_capabilities(
Contributor Author

Should there be a test for this function?

Collaborator

Yes, my bad
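
As a rough idea of what such a test could cover, here is a self-contained sketch. It does not use the real `ModelCapabilities` class or the real `merge_capabilities` signature (which is truncated above): `Capabilities` is a simplified stand-in with hypothetical fields, and `merge_capabilities_sketch` only illustrates the behaviour described in the PR description (new outputs are added, species must be a subset of the original ones).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Capabilities:
    """Simplified stand-in for ModelCapabilities (hypothetical fields)."""

    species: List[int]
    outputs: Dict[str, str] = field(default_factory=dict)  # output name -> unit


def merge_capabilities_sketch(old: Capabilities, new: Capabilities) -> Capabilities:
    """Keep all old outputs, add the new ones, and reject unknown species."""
    if not set(new.species) <= set(old.species):
        raise ValueError("new species must be a subset of the original species")
    merged_outputs = dict(old.outputs)
    for name, unit in new.outputs.items():
        merged_outputs.setdefault(name, unit)  # outputs already present are kept
    return Capabilities(list(old.species), merged_outputs)


def test_merge_adds_new_output():
    old = Capabilities([1, 6, 7, 8], {"energy": "eV"})
    new = Capabilities([1, 6], {"energy": "eV", "dipole": "D"})
    merged = merge_capabilities_sketch(old, new)
    assert merged.species == [1, 6, 7, 8]
    assert set(merged.outputs) == {"energy", "dipole"}


def test_merge_rejects_unknown_species():
    old = Capabilities([1, 6], {"energy": "eV"})
    new = Capabilities([1, 8], {"energy": "eV"})
    try:
        merge_capabilities_sketch(old, new)
    except ValueError:
        return
    raise AssertionError("expected a ValueError for species 8")
```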

Comment on lines 1 to 18
    architecture:
      name: soap_bpnn
      model:
        restart: bpnn-model.pt
      training:
        batch_size: 2
        num_epochs: 1

    # Section defining the parameters for structure and target data
    training_set:
      structures:
        read_from: "qm9_reduced_100.xyz"
      targets:
        energy:
          key: "U0"

    test_set: 0.1
    validation_set: 0.1
Contributor Author

I would rather overwrite the options in the test instead of adding more files. It basically works like this:

    from omegaconf import OmegaConf

    # load the shared options, change what the test needs, and save a local copy
    options = OmegaConf.load(RESOURCES_PATH / "options.yaml")
    options["foo"] = "bar"
    OmegaConf.save(config=options, f="options.yaml")

Collaborator

Ok!

@frostedoyster frostedoyster merged commit 89f5d36 into main Feb 6, 2024
7 of 8 checks passed
@frostedoyster frostedoyster deleted the restart branch February 6, 2024 15:15