Segmentation fault upon creating the Context when adding both RMSD biased force and Torch Force #87

JustinAiras · 2022-11-03T14:40:34Z

I've been using OpenMM 7.7.0 and OpenMM-Torch 0.8 successfully to run a PyTorch model. However, when I add an RMSD biasing force to the system alongside the TorchForce, I get a segmentation fault upon creating the Context. The RMSD biasing force has also worked on its own without issue. My system setup is as follows:

# Import openmm libraries
from openmm.app import *
from openmm import *
from openmm.unit import *
from sys import stdout

# Import OpenMM-Torch
from openmmtorch import TorchForce

# Import torch_cluster (from PyTorch-Geometric)
from torch_cluster import radius_graph

# Import struct / force fields
pdb = PDBFile('struct.pdb')
ff = ForceField('amber14-all.xml')

# Build system
system = ff.createSystem(pdb.topology, nonbondedMethod=NoCutoff, constraints=HBonds)

# Initialize the TorchForce
ml_model = TorchForce('model.pt')
scaler = 1

# Create TorchForce as a CustomCVForce
U_ml = CustomCVForce('scaler*ml_model')

# Add parameters to the CustomCVForce
U_ml.addCollectiveVariable('ml_model', ml_model)
U_ml.addGlobalParameter('scaler', scaler)

# Add force to the system
system.addForce(U_ml)

# Loading reference positions for RMSD force
ref_coords = pdb.positions

# Get atom indices of backbone heavy atoms (N, CA, C, O) for RMSD calculation
backbone_names = ('N', 'CA', 'C', 'O')
atom_idx = [idx for idx, atom in enumerate(pdb.topology.atoms()) if atom.name in backbone_names]

# Set RMSD calculation / initialize k_rmsd / rmsd_0
rmsd = RMSDForce(ref_coords, atom_idx)
k_rmsd = 1000  # (kJ / mol / nm^2)
rmsd_0 = 0.2   # (nm)

# Create harmonic RMSD-biasing force as CustomCVForce 
U_rmsd = CustomCVForce('0.5*k_rmsd*(rmsd - rmsd_0)^2')

# Add parameters to the CustomCVForce
U_rmsd.addCollectiveVariable('rmsd', rmsd)
U_rmsd.addGlobalParameter('k_rmsd', k_rmsd)
U_rmsd.addGlobalParameter('rmsd_0', rmsd_0)

# Add force to the system
system.addForce(U_rmsd)

# Create the integrator / platform
integrator = LangevinMiddleIntegrator(340*kelvin, 1/picosecond, 0.0025*picoseconds)
platform = Platform.getPlatformByName('Reference')

# Build simulation
sim = Simulation(pdb.topology, system, integrator, platform)

As stated above, building the Context with Simulation results in a segmentation fault. I've also tried the following variations, all of which lead to the same result:

Using OpenMM 8.0 Beta and OpenMM-Torch 1.0 Beta
Adding the TorchForce directly, without using CustomCVForce: system.addForce(ml_model)
Adding the TorchForce and RMSD force as collective variables of a single CustomCVForce: U_rmsd_ml = CustomCVForce('scaler*ml_model + 0.5*k_rmsd*(rmsd - rmsd_0)^2')
Effectively turning off the TorchForce by setting scaler = 0
Building the Context without using Simulation: context = Context(system, integrator, platform)
Using the CPU platform
Switching the order in which I add the forces

All of this results in the same segmentation fault when the Context is built. Again, the model will run without issue when added independently to the system, as will the RMSD-biasing force. Any help with this issue would be greatly appreciated!
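When comparing several variations like these, it can help to run each one in its own child interpreter so that a crash in one does not take down the whole test run. The following is only a minimal sketch using the Python standard library; run_isolated is a hypothetical helper name (not part of OpenMM or OpenMM-Torch), and the snippet passed to it stands in for one variation of the setup script.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=60.0):
    """Run a Python snippet in a child interpreter and summarize how it ended.

    run_isolated is a hypothetical helper written for illustration only.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was terminated
        # by the signal with that number (for example SIGSEGV).
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = f"signal {-proc.returncode}"
        return f"killed by {name}"
    return f"exited with code {proc.returncode}"


if __name__ == "__main__":
    # Replace the placeholder snippet with one variation of the setup script.
    print(run_isolated("print('placeholder for one variation')"))
```

A result of "killed by SIGSEGV" for one variation and "ok" for another narrows down which combination of forces or imports triggers the fault.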

The files struct.pdb and model.pt can be found in the following zipped folder: struct_model.zip


raimis · 2022-11-03T17:03:34Z

Could you share struct.pdb and a script to generate model.pt, so that it is possible to reproduce the issue?

raimis · 2022-11-03T17:13:30Z

Also, could you add the imports to the script, so that it is possible to run it?

JustinAiras · 2022-11-03T18:12:33Z

I've edited my original post to include the imports and the files struct.pdb and model.pt.

peastman · 2022-11-07T20:36:26Z

Your script runs fine for me using the latest code for OpenMM and for this plugin. I notice your model uses the torch_cluster package. How did you install it? Possibly it was compiled in a way that's incompatible with this plugin. Can you post the output of conda list?

Try running your script inside gdb. Let it run until it hits the segfault, then type bt to get a stack trace for where it happened and post it here.
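As a complement to the gdb backtrace (not a replacement for it), Python's standard faulthandler module can report which Python line was executing when the fault occurred. A minimal sketch, assuming the lines below are placed at the very top of the script; the log file name is a hypothetical choice.

```python
import faulthandler

# Hypothetical log file name; the file object must stay open for as long as
# the handler is installed, so it is kept in a module-level variable.
crash_log = open("segfault_traceback.log", "w")

# If the process receives a fatal signal such as SIGSEGV, dump the Python
# traceback of every thread to the log file before the process dies.
faulthandler.enable(file=crash_log, all_threads=True)

# ... build the System and create the Context below this line ...
```

The same handler can be switched on without editing the script by starting the interpreter with python -X faulthandler, in which case the traceback goes to stderr. This only shows the Python-level frames; the native frames inside the plugin still require gdb.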

JustinAiras · 2022-11-07T22:24:51Z

I installed torch_cluster into a clean conda environment with OpenMM 8.0 beta and OpenMM-Torch 1.0 beta as follows:

conda create -n torch_omm8b openmm openmm-torch -c "conda-forge/label/openmm_rc" -c "conda-forge/label/openmm-torch_rc"

conda install scipy
conda install mdtraj -c conda-forge

pip install torch-cluster -f https://data.pyg.org/whl/torch-1.11.0+cu112.html

The following text file contains the output from conda list:
conda_list_omm8b_env.txt

and the following text file contains the backtrace from running my script in gdb:
gdb_bt_omm8b_env.txt

peastman · 2022-11-07T22:32:41Z

That build is likely incompatible with packages from conda-forge. Try installing it like this instead.

conda install -c conda-forge pytorch_cluster
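To check whether the environment mixes builds from different sources, it may help to list the versions Python itself sees for the relevant packages. This is only a sketch using the standard library; installed_versions is a hypothetical helper, and the distribution names in PACKAGES are assumptions that may differ from the names under which conda or pip registered the packages.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent.

    installed_versions is a hypothetical helper written for illustration only.
    """
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed, or installed without Python package metadata.
            versions[name] = None
    return versions


# Assumed distribution names; adjust them to match the environment.
PACKAGES = ["torch", "torch-cluster", "openmm", "openmmtorch"]

if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not visible to importlib.metadata'}")
```

The output of conda list remains the more complete record, since it also shows the channel and build string of each package.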

raimis · 2022-11-09T11:06:59Z

I have created the environment:

conda env create mmh/openmm-8-beta-linux
conda activate openmm-8-beta-linux
conda install -c conda-forge pytorch_cluster

The script works without problems.

@JustinAiras try to create a new environment as indicated with the latest (22.9.0) conda.

JustinAiras · 2022-11-09T18:56:47Z

I've run the exact set of commands you've provided using conda 22.9.0, but after from torch_cluster import radius_graph I get the following error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/airasj/anaconda3/envs/openmm-8-beta-linux/lib/python3.10/site-packages/torch_cluster/__init__.py", line 18, in <module>
    torch.ops.load_library(spec.origin)
  File "/home/airasj/anaconda3/envs/openmm-8-beta-linux/lib/python3.10/site-packages/torch/_ops.py", line 220, in load_library
    ctypes.CDLL(path)
  File "/home/airasj/anaconda3/envs/openmm-8-beta-linux/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/airasj/anaconda3/envs/openmm-8-beta-linux/lib/python3.10/site-packages/torch_cluster/_grid_cuda.so: undefined symbol: _ZN3c106detail19maybe_wrap_dim_slowEllb

raimis · 2022-11-10T15:49:14Z

@JustinAiras this might be a conda issue (#88 (comment)). Could you try to install with mamba?

JustinAiras · 2022-11-14T16:51:47Z

Thank you, installing with mamba solved my most immediate issue, and I now can run MD with a TorchForce and RMSD-biasing force without encountering a segmentation fault.

I installed mamba into the base environment of a clean miniconda install, and created a new environment as follows:

mamba create -n torch_omm8b openmm openmm-torch pytorch_cluster -c "conda-forge/label/openmm_rc" -c "conda-forge/label/openmm-torch_rc" -c conda-forge

Note that this also worked with a mambaforge installation, but differences in cluster permissions required me to use miniconda. Also note that pytorch_cluster needs to be installed at the same time as openmm-torch; otherwise I get the following error:

- nothing provides __cuda needed by pytorch-1.12.1-cuda102py310ha664643_201

For my purposes (I only need to use the CPU platform), installing with the above command resolves my issue. However, I still get issues if I try to use the CUDA platform. Upon building the simulation, I get the following error:

  File "/home/gridsan/jairas/work/small_prot_MD/chignolin/MD/torch_md/best_model/umbrella/rmsd_bias/GPU/torch_umb.py", line 79, in <module>
    sim = Simulation(pdb.topology, system, integrator, platform)
  File "/home/gridsan/jairas/miniconda3/envs/torch_omm8b/lib/python3.9/site-packages/openmm/app/simulation.py", line 101, in __init__
    self.context = mm.Context(self.system, self.integrator, platform)
  File "/home/gridsan/jairas/miniconda3/envs/torch_omm8b/lib/python3.9/site-packages/openmm/openmm.py", line 3530, in __init__
    _openmm.Context_swiginit(self, _openmm.new_Context(*args))
openmm.OpenMMException: Error loading CUDA module: CUDA_ERROR_UNSUPPORTED_PTX_VERSION (222)

Given the similarities between how CUDA is installed on the cluster I use and the setup discussed in issue #88 (comment), I suspect the solution to this problem lies somewhere there.

sef43 · 2022-11-15T10:00:53Z

This sounds like an issue with the CUDA toolkit version; see this issue from OpenMM: 3585.
You will need to find out which driver and CUDA version are installed on the cluster you are using, probably by running nvidia-smi on a compute node, and then tell conda to install a compatible cudatoolkit,
e.g. mamba install -c conda-forge openmm cudatoolkit=10.X
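The CUDA version supported by the driver appears in the header of the nvidia-smi output as "CUDA Version: X.Y". A minimal sketch of reading it programmatically, assuming that header format; parse_cuda_version and driver_cuda_version are hypothetical helper names.

```python
import re
import shutil
import subprocess


def parse_cuda_version(nvidia_smi_output):
    """Return (major, minor) from the 'CUDA Version: X.Y' header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_cuda_version():
    """Run nvidia-smi if it is on PATH and parse its header; None otherwise."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    return parse_cuda_version(result.stdout)


if __name__ == "__main__":
    print(driver_cuda_version())
```

The cudatoolkit version requested from conda or mamba should then be no newer than the version reported here. On a cluster, this needs to run on a compute node with a GPU, since login nodes often have no driver installed.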

raimis added the "help wanted" label on Nov 3, 2022

FranklinHu1 mentioned this issue on Nov 7, 2022 in "Segmentation fault when creating simulation context with simple TorchForce force" #88 (closed)
