How to use LAMMPS to train a larger system? #476

Open · stargolike opened this issue Jun 20, 2024 · Discussed in #475 · 6 comments

@stargolike

Discussed in #475

Originally posted by stargolike June 20, 2024
I trained on a system of 200 atoms and selected hidden_irreps: '64x0e + 64x1o'.
But when I use LAMMPS for MD simulation, I can only run a system of about 1k atoms. With a larger system I get an out-of-memory error such as:

RuntimeError: CUDA out of memory. Tried to allocate 2.36 GiB. GPU 0 has a total capacity of 47.45 GiB of which 1.43 GiB is free. Including non-PyTorch memory, this process has 46.01 GiB memory in use. Of the allocated memory 37.87 GiB is allocated by PyTorch, and 7.86 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

The GPU I am currently using is an RTX 8000, which is sufficient for training.
I would like to ask what this memory usage depends on, and how I can run MD on a larger system.
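For what it's worth, the error message itself suggests one mitigation: letting PyTorch's CUDA allocator use expandable segments to reduce fragmentation. A minimal sketch, assuming a Unix shell and a LAMMPS binary named lmp (both hypothetical names here); note this only helps when fragmentation, not total footprint, is the problem:

# Suggested by the error message: use expandable segments in PyTorch's
# CUDA caching allocator to mitigate fragmentation (not total memory use).
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
lmp -in in.lammps   # hypothetical binary and input-file names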

@stargolike (Author)

I also found that, for the same system and model, ASE can run the simulation while LAMMPS cannot.

@ilyes319 (Contributor) commented Jun 21, 2024

If you want to fit a larger number of atoms on a GPU, you should try to decrease your cutoff. What is your cutoff size?

@stargolike (Author) commented Jun 21, 2024

> If you want to fit a larger number of atoms on a GPU, you should try to decrease your cutoff. What is your cutoff size?

Dear ilyes, I am using the default cutoff. Here is my LAMMPS input file:

#------------------------------Basic settings--------------------------
units         metal
atom_style    atomic
atom_modify   map yes          # atom map is required by the MACE pair style
newton        on
read_data     1920atom_7.5m    # 1920-atom system
pair_style    mace
pair_coeff    * * MACE_model_run-123.model-lammps.pt H O Cl Zn

dump          1 all custom 100 toEquil.lammpstrj id type x y z vx vy vz
thermo        1
run           1000

I understand your meaning: I should change pair_style mace and add the cutoff there?
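As an aside: since the lammps-mace pair style works with LAMMPS's MPI domain decomposition, another way to fit a larger system is to spread it over several ranks/GPUs, so each rank holds only its local atoms. A hypothetical launch line; the exact flags depend on your build:

# Hypothetical: spread the system over 4 MPI ranks to cut per-GPU memory.
mpirun -np 4 lmp -in in.lammps
# (A GPU/Kokkos-enabled build may need extra flags, e.g. -k on g 4 -sf kk.)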

@ilyes319 (Contributor)

I meant during training, you should try to use a smaller cutoff.

@stargolike (Author)

> I meant during training, you should try to use a smaller cutoff.

Sorry, I don't understand. Here is my config, and I don't see any parameter about the cutoff in it:

name: MACE_model
config_type_weights: {"Default":1.0}
model: "MACE"
hidden_irreps: '64x0e + 64x1o'
r_max: 4.0
train_file: train.xyz
test_file: test.xyz
valid_file: val.xyz
batch_size: 10
energy_key: "energy"
forces_key: "forces"
ema: yes
ema_decay: 0.99 
amsgrad: yes
restart_latest: yes
max_num_epochs: 100
device: cuda 
loss: "huber"
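For the record, the cutoff is in this config: r_max: 4.0 is the MACE cutoff radius in angstroms, as confirmed in the next comment. An illustrative fragment, not a recommendation, showing the line one would change:

# Illustrative only: r_max is the MACE model cutoff (in angstroms).
# A smaller value means fewer neighbors per atom and less GPU memory.
r_max: 3.5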

@CheukHinHoJerry (Collaborator) commented Jul 12, 2024

What ilyes meant was that when you train your MACE model you should use a smaller cutoff. The r_max parameter in your config is the cutoff of the MACE model. That said, 4.0 Å is already quite small, so I am not sure you can decrease it further.
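A rough sanity check of why the cutoff matters so much for memory: at fixed density, the number of neighbors (graph edges) per atom grows with the cube of r_max. A back-of-the-envelope Python sketch; the density value is an illustrative assumption, not taken from this thread:

import math

density = 0.1  # atoms per cubic angstrom -- illustrative assumption
for r_max in (3.0, 4.0, 5.0, 6.0):
    # Neighbors within the cutoff sphere scale as r_max**3.
    neighbors = density * (4.0 / 3.0) * math.pi * r_max ** 3
    print(f"r_max = {r_max:.1f} A -> ~{neighbors:.0f} neighbors per atom")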
