Segmentation fault upon creating the Context when adding both RMSD biased force and Torch Force #87

Open
JustinAiras opened this issue Nov 3, 2022 · 11 comments
Labels: help wanted (Extra attention is needed)

Comments

@JustinAiras

JustinAiras commented Nov 3, 2022

I've been using OpenMM 7.7.0 and OpenMM-Torch 0.8 successfully to run a PyTorch model. However, when I add an RMSD biasing force to the system alongside the TorchForce, I get a segmentation fault upon creating the Context. The RMSD biasing force has also worked on its own without issue. My system setup is as follows:

# Import openmm libraries
from openmm.app import *
from openmm import *
from openmm.unit import *
from sys import stdout

# Import OpenMM-Torch
from openmmtorch import TorchForce

# Import torch_cluster (from PyTorch-Geometric)
from torch_cluster import radius_graph

# Import struct / force fields
pdb = PDBFile('struct.pdb')
ff = ForceField('amber14-all.xml')

# Build system
system = ff.createSystem(pdb.topology, nonbondedMethod=NoCutoff, constraints=HBonds)

# Initialize the TorchForce
ml_model = TorchForce('model.pt')
scaler = 1

# Create TorchForce as a CustomCVForce
U_ml = CustomCVForce('scaler*ml_model')

# Add parameters to the CustomCVForce
U_ml.addCollectiveVariable('ml_model', ml_model)
U_ml.addGlobalParameter('scaler', scaler)

# Add force to the system
system.addForce(U_ml)

# Loading reference positions for RMSD force
ref_coords = pdb.positions

# Get atom indices of backbone heavy atoms for RMSD calculation
backbone_names = {'CA', 'C', 'N', 'O'}
atom_idx = [atom.index for atom in pdb.topology.atoms() if atom.name in backbone_names]

# Set RMSD calculation / initialize k_rmsd / rmsd_0
rmsd = RMSDForce(ref_coords, atom_idx)
k_rmsd = 1000  # (kJ / mol / nm^2)
rmsd_0 = 0.2   # (nm)

# Create harmonic RMSD-biasing force as CustomCVForce 
U_rmsd = CustomCVForce('0.5*k_rmsd*(rmsd - rmsd_0)^2')

# Add parameters to the CustomCVForce
U_rmsd.addCollectiveVariable('rmsd', rmsd)
U_rmsd.addGlobalParameter('k_rmsd', k_rmsd)
U_rmsd.addGlobalParameter('rmsd_0', rmsd_0)

# Add force to the system
system.addForce(U_rmsd)

# Create the integrator / platform
integrator = LangevinMiddleIntegrator(340*kelvin, 1/picosecond, 0.0025*picoseconds)
platform = Platform.getPlatformByName('Reference')

# Build simulation
sim = Simulation(pdb.topology, system, integrator, platform)
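
For reference, the harmonic bias in the script above, 0.5*k_rmsd*(rmsd - rmsd_0)^2, can be evaluated by hand to sanity-check the parameters. The following is only an illustrative sketch in plain Python; the function names are hypothetical and are not part of OpenMM, and the defaults simply mirror the values used in the script.

```python
def harmonic_rmsd_bias(rmsd, k_rmsd=1000.0, rmsd_0=0.2):
    """Bias energy in kJ/mol for an RMSD given in nm.

    Mirrors the CustomCVForce expression 0.5*k_rmsd*(rmsd - rmsd_0)^2.
    The function name is hypothetical; it is not an OpenMM API.
    """
    return 0.5 * k_rmsd * (rmsd - rmsd_0) ** 2


def harmonic_rmsd_bias_slope(rmsd, k_rmsd=1000.0, rmsd_0=0.2):
    """Derivative of the bias with respect to the RMSD, in kJ/mol/nm."""
    return k_rmsd * (rmsd - rmsd_0)


if __name__ == "__main__":
    # The bias vanishes at the target RMSD and grows quadratically away from it.
    for rmsd in (0.1, 0.2, 0.3, 0.5):
        energy = harmonic_rmsd_bias(rmsd)
        slope = harmonic_rmsd_bias_slope(rmsd)
        print(f"rmsd={rmsd:.2f} nm  U={energy:.3f} kJ/mol  dU/drmsd={slope:.1f} kJ/mol/nm")
```

With these parameters, a structure 0.1 nm away from the target RMSD is penalized by roughly 5 kJ/mol.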

As stated above, building the Context with Simulation results in a segmentation fault. I've tried several variations of this setup, all of which lead to the same result:

  • Using OpenMM 8.0 Beta and OpenMM-Torch 1.0 Beta
  • Adding the TorchForce directly without using CustomCVForce: system.addForce(ml_model)
  • Adding the TorchForce and RMSD force as collective variables of a single CustomCVForce: U_rmsd_ml = CustomCVForce('scaler*ml_model + 0.5*k_rmsd*(rmsd - rmsd_0)^2')
  • Effectively turning off the TorchForce by setting scaler = 0
  • Building the Context without using Simulation: context = Context(system, integrator, platform)
  • Using the CPU platform
  • Switching the order in which I add the forces

All of these result in the same segmentation fault when the Context is built. Again, the model runs without issue when it is the only force added to the system, as does the RMSD-biasing force. Any help with this issue would be greatly appreciated!

The files struct.pdb and model.pt can be found in the following zipped folder: struct_model.zip

@raimis added the help wanted label Nov 3, 2022
@raimis
Contributor

raimis commented Nov 3, 2022

Could you share struct.pdb and a script to generate model.pt, so that it is possible to reproduce the issue?

@raimis
Contributor

raimis commented Nov 3, 2022

Also, could you add the imports to the script, so that it is possible to run it?

@JustinAiras
Author

I've edited my original post to include the imports and the files struct.pdb and model.pt.

@peastman
Member

peastman commented Nov 7, 2022

Your script runs fine for me using the latest code for OpenMM and for this plugin. I notice your model uses the torch_cluster package. How did you install it? Possibly it was compiled in a way that's incompatible with this plugin. Can you post the output of conda list?

Try running your script inside gdb. Let it run until it hits the segfault, then type bt to get a stack trace for where it happened and post it here.
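
As a lighter-weight complement to gdb, Python's standard faulthandler module can dump the Python-level traceback when the interpreter receives a fatal signal such as SIGSEGV. It does not show the native C++ frames that gdb's bt gives, but it does show which Python line triggered the crash. A minimal sketch follows; the helper name and the default log file name are hypothetical.

```python
import faulthandler


def enable_crash_traceback(log_path="crash_traceback.log"):
    """Write Python tracebacks to log_path if the process crashes.

    The helper name and default path are illustrative only. The returned
    file object must stay open for as long as faulthandler is enabled.
    """
    log_file = open(log_path, "w")
    # all_threads=True dumps the traceback of every Python thread on a
    # fatal signal (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL).
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Intended usage, placed at the top of the simulation script:
#   crash_log = enable_crash_traceback()
#   ... build the System and create the Context ...
```

The same effect is available without editing the script by running it as python -X faulthandler, which sends the traceback to stderr.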

@JustinAiras
Author

I installed torch_cluster into a clean conda environment with OpenMM 8.0 beta and OpenMM-Torch 1.0 beta as follows:

conda create -n torch_omm8b openmm openmm-torch -c "conda-forge/label/openmm_rc" -c "conda-forge/label/openmm-torch_rc"

conda install scipy
conda install mdtraj -c conda-forge

pip install torch-cluster -f https://data.pyg.org/whl/torch-1.11.0+cu112.html

The following text file contains the output from conda list:
conda_list_omm8b_env.txt

and the following text file contains the backtrace from running my script in gdb:
gdb_bt_omm8b_env.txt

@peastman
Member

peastman commented Nov 7, 2022

That build is likely incompatible with packages from conda-forge. Try installing it like this instead.

conda install -c conda-forge pytorch_cluster
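
One way to spot this kind of mix is to scan the conda list output for packages that were installed with pip, which conda reports with pypi in the channel column. The sketch below assumes the usual four-column layout (name, version, build, channel); the function name is hypothetical, and the sample text is a made-up excerpt for illustration, not the output attached to this issue.

```python
def pip_installed_packages(conda_list_text):
    """Return the names of packages whose channel column is 'pypi'."""
    names = []
    for line in conda_list_text.splitlines():
        fields = line.split()
        # Skip blank lines and the '#' header lines that conda prints.
        if not fields or fields[0].startswith("#"):
            continue
        # Columns: name, version, build, channel (the channel may be absent).
        if len(fields) >= 4 and fields[3] == "pypi":
            names.append(fields[0])
    return names


# Hypothetical excerpt; versions and build strings are placeholders.
EXAMPLE = """\
# packages in environment at /path/to/env:
#
# Name                    Version                   Build  Channel
openmm                    8.0.0beta        py310h0000000_0    conda-forge
torch-cluster             1.6.0                    pypi_0    pypi
"""

if __name__ == "__main__":
    print(pip_installed_packages(EXAMPLE))
```

Any compiled extension that shows up in this list next to a conda-forge PyTorch is a candidate for the kind of binary incompatibility described above.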

@raimis
Contributor

raimis commented Nov 9, 2022

I have created the environment:

conda env create mmh/openmm-8-beta-linux
conda activate openmm-8-beta-linux
conda install -c conda-forge pytorch_cluster

The script works without problems.

@JustinAiras try creating a new environment as shown above, using the latest conda (22.9.0).

@JustinAiras
Author

I've run the exact set of commands you've provided using conda 22.9.0, but after running from torch_cluster import radius_graph, I get the following error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/airasj/anaconda3/envs/openmm-8-beta-linux/lib/python3.10/site-packages/torch_cluster/__init__.py", line 18, in <module>
    torch.ops.load_library(spec.origin)
  File "/home/airasj/anaconda3/envs/openmm-8-beta-linux/lib/python3.10/site-packages/torch/_ops.py", line 220, in load_library
    ctypes.CDLL(path)
  File "/home/airasj/anaconda3/envs/openmm-8-beta-linux/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/airasj/anaconda3/envs/openmm-8-beta-linux/lib/python3.10/site-packages/torch_cluster/_grid_cuda.so: undefined symbol: _ZN3c106detail19maybe_wrap_dim_slowEllb
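
The missing symbol in the traceback above is a mangled C++ name. Decoding it shows which function the shared library expected to find. The sketch below is a deliberately minimal decoder that handles only simple nested names with builtin argument types, which is enough for this symbol; it is not a general demangler (a tool such as c++filt is the usual choice), and the function name is hypothetical.

```python
import re

# Itanium C++ ABI codes for a few builtin types; only a small subset.
BUILTIN_TYPES = {
    "v": "void", "b": "bool", "c": "char", "i": "int", "j": "unsigned int",
    "l": "long", "m": "unsigned long", "f": "float", "d": "double",
}


def demangle_simple(symbol):
    """Decode symbols of the form _ZN<len><name>...E<builtin codes>."""
    if not symbol.startswith("_ZN"):
        raise ValueError("only nested names (_ZN...) are handled")
    pos = 3
    parts = []
    # Each component is a decimal length followed by that many characters.
    while pos < len(symbol) and symbol[pos] != "E":
        match = re.match(r"\d+", symbol[pos:])
        if match is None:
            raise ValueError(f"unsupported encoding at offset {pos}")
        length = int(match.group())
        pos += match.end()
        parts.append(symbol[pos:pos + length])
        pos += length
    if pos >= len(symbol):
        raise ValueError("missing 'E' terminator")
    # Everything after 'E' is the argument list, one code per argument here.
    try:
        args = [BUILTIN_TYPES[code] for code in symbol[pos + 1:]]
    except KeyError as err:
        raise ValueError(f"unsupported type code {err}") from err
    return "::".join(parts) + "(" + ", ".join(args) + ")"


if __name__ == "__main__":
    print(demangle_simple("_ZN3c106detail19maybe_wrap_dim_slowEllb"))
    # prints: c10::detail::maybe_wrap_dim_slow(long, long, bool)
```

c10 is the namespace of PyTorch's core library, so an undefined symbol there is consistent with torch_cluster having been compiled against a different PyTorch build than the one installed in the environment.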

@raimis
Contributor

raimis commented Nov 10, 2022

@JustinAiras this might be a conda issue (#88 (comment)). Could you try to install with mamba?

@JustinAiras
Author

Thank you, installing with mamba solved my most immediate issue, and I can now run MD with a TorchForce and RMSD-biasing force without encountering a segmentation fault.

I installed mamba into the base environment of a clean miniconda install, and created a new environment as follows:

mamba create -n torch_omm8b openmm openmm-torch pytorch_cluster -c "conda-forge/label/openmm_rc" -c "conda-forge/label/openmm-torch_rc" -c conda-forge

Note that this also worked with a mambaforge installation, but differences in cluster permissions required me to use miniconda. Also note that pytorch_cluster needs to be installed at the same time as openmm-torch; otherwise I get the following error:

- nothing provides __cuda needed by pytorch-1.12.1-cuda102py310ha664643_201

For my purposes (I only need to use the CPU platform), installing with the above command resolves my issue. However, I still get issues if I try to use the CUDA platform. Upon building the simulation, I get the following error:

  File "/home/gridsan/jairas/work/small_prot_MD/chignolin/MD/torch_md/best_model/umbrella/rmsd_bias/GPU/torch_umb.py", line 79, in <module>
    sim = Simulation(pdb.topology, system, integrator, platform)
  File "/home/gridsan/jairas/miniconda3/envs/torch_omm8b/lib/python3.9/site-packages/openmm/app/simulation.py", line 101, in __init__
    self.context = mm.Context(self.system, self.integrator, platform)
  File "/home/gridsan/jairas/miniconda3/envs/torch_omm8b/lib/python3.9/site-packages/openmm/openmm.py", line 3530, in __init__
    _openmm.Context_swiginit(self, _openmm.new_Context(*args))
openmm.OpenMMException: Error loading CUDA module: CUDA_ERROR_UNSUPPORTED_PTX_VERSION (222)

Given the similarities between how CUDA is installed on the cluster I use and the setup discussed in issue #88 (comment), I suspect the solution to this problem might lie somewhere there.

@sef43

sef43 commented Nov 15, 2022

This sounds like an issue with the CUDA toolkit version; see this issue from OpenMM: 3585
You will need to find out which driver and CUDA version are installed on the cluster you are using, probably by running nvidia-smi on a compute node,
and then tell conda to install a compatible cudatoolkit,
e.g. mamba install -c conda-forge openmm cudatoolkit=10.X
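
The check described above can be sketched as follows: read the highest CUDA version the driver supports from the nvidia-smi header, then compare it with the cudatoolkit version in the environment. This is a rough rule of thumb (a toolkit newer than the driver supports is a common cause of CUDA_ERROR_UNSUPPORTED_PTX_VERSION), not an exhaustive compatibility test. The function names are hypothetical, and the header line used in the example is made up for illustration.

```python
import re


def driver_cuda_version(nvidia_smi_text):
    """Return (major, minor) from the 'CUDA Version: X.Y' field, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolkit_is_supported(toolkit_version, driver_version):
    """True if the toolkit version does not exceed what the driver reports."""
    toolkit = tuple(int(part) for part in toolkit_version.split(".")[:2])
    return toolkit <= driver_version


if __name__ == "__main__":
    # Hypothetical nvidia-smi header line; real output will differ.
    header = "| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |"
    supported = driver_cuda_version(header)
    for toolkit in ("10.2", "11.2", "11.7"):
        print(toolkit, toolkit_is_supported(toolkit, supported))
```

If the installed cudatoolkit is newer than the version the driver reports, pinning an older cudatoolkit at install time, as in the mamba command above, is the usual workaround.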
