run_multimer_jobs issue #342

Closed
J-Held opened this issue May 20, 2024 · 5 comments

@J-Held

J-Held commented May 20, 2024

I am trying to run the run_multimer_jobs script on GPU using this command:

run_multimer_jobs.py
--mode=all_vs_all
--num_cycle=3
--num_predictions_per_model=1
--output_path=/storage/home/jbh249/scratch/output/models/
--data_dir=/storage/home/jbh249/scratch/alphaDatabase/ \
--protein_lists=/storage/home/jbh249/scratch/candidates.txt
--monomer_objects_dir=/storage/home/jbh249/scratch/output/features

The job terminates almost immediately with this error:

/storage/home/jbh249/micromamba/envs/AlphaPulldown/lib/python3.10/site-packages/Bio/Data/SCOPData.py:18: BiopythonDeprecationWarning: The 'Bio.Data.SCOPData' module will be deprecated in a future release of Biopython in favor of 'Bio.Data.PDBData.
warnings.warn(
2024-05-20 16:09:31.137214: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-05-20 16:09:35.260966: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
I0520 16:09:35.261087 23134891243328 utils.py:271] checking if output_dir exists /storage/home/jbh249/scratch/output/models/
Traceback (most recent call last):
  File "/storage/home/jbh249/micromamba/envs/AlphaPulldown/bin/run_multimer_jobs.py", line 462, in <module>
    app.run(main)
  File "/storage/home/jbh249/micromamba/envs/AlphaPulldown/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/storage/home/jbh249/micromamba/envs/AlphaPulldown/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/storage/home/jbh249/micromamba/envs/AlphaPulldown/bin/run_multimer_jobs.py", line 437, in main
    all_proteins = read_all_proteins(FLAGS.protein_lists[0])
TypeError: 'NoneType' object is not subscriptable

@Qrouger

Qrouger commented May 21, 2024

Hi @J-Held, the first part of your output says the GPU can't be used because of a problem with TensorRT. But the script doesn't crash because of that; it's most likely your command. Watch your backslashes. Personally, I prefer to write the command on a single line, with single spaces between flags, to avoid formatting errors. Like this:
run_multimer_jobs.py --mode=all_vs_all --num_cycle=3 --num_predictions_per_model=1 --output_path=/storage/home/jbh249/scratch/output/models/ --data_dir=/storage/home/jbh249/scratch/alphaDatabase/ --protein_lists=/storage/home/jbh249/scratch/candidates.txt --monomer_objects_dir=/storage/home/jbh249/scratch/output/features
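
If you prefer to keep it on multiple lines, the same command also works as long as every line except the last ends with a trailing backslash (a sketch using the same paths as above):

run_multimer_jobs.py \
  --mode=all_vs_all \
  --num_cycle=3 \
  --num_predictions_per_model=1 \
  --output_path=/storage/home/jbh249/scratch/output/models/ \
  --data_dir=/storage/home/jbh249/scratch/alphaDatabase/ \
  --protein_lists=/storage/home/jbh249/scratch/candidates.txt \
  --monomer_objects_dir=/storage/home/jbh249/scratch/output/features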

Quentin

@dingquanyu
Collaborator

Hi @J-Held

I agree with @Qrouger's suggestion. It's likely that your command is not formatted correctly, so --protein_lists was never parsed. Without a trailing backslash at the end of each line, whatever follows that line is not passed to the script at all.
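
As a small illustration of the shell's line-continuation behaviour (hypothetical echo commands, not AlphaPulldown itself):

echo one \
two          # the trailing backslash joins the lines, so this prints "one two"
echo one     # without the backslash the command ends here and prints just "one"
two          # ...and the shell then tries to run "two" as a separate command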

Yours
Dingquan

@J-Held
Author

J-Held commented May 21, 2024

Yes, that was it. Thank you @Qrouger and @dingquanyu!

Regarding the GPU, it looks like I'm getting many of the error messages brought up in #339, but the job appears to still be running. Is it just going to time out? Output log below:

I0521 10:54:40.655257 22582644975424 run_multimer_jobs.py:389] Modeling new interaction for /storage/home/jbh249/scratch/output/models/HrpN_and_WAK3
I0521 10:54:41.184001 22582644975424 xla_bridge.py:660] Unable to initialize backend 'cuda': Unable to load cuDNN. Is it installed?
I0521 10:54:41.203725 22582644975424 xla_bridge.py:660] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA
I0521 10:54:41.204897 22582644975424 xla_bridge.py:660] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
W0521 10:54:41.205006 22582644975424 xla_bridge.py:724] CUDA backend failed to initialize: Unable to load cuDNN. Is it installed? (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I0521 10:54:43.223712 22582644975424 utils.py:378] Model model_1_multimer_v3 is running 0 prediction with default MSA depth
I0521 10:54:44.160407 22582644975424 utils.py:378] Model model_2_multimer_v3 is running 0 prediction with default MSA depth
I0521 10:54:45.103848 22582644975424 utils.py:378] Model model_3_multimer_v3 is running 0 prediction with default MSA depth
I0521 10:54:46.035488 22582644975424 utils.py:378] Model model_4_multimer_v3 is running 0 prediction with default MSA depth
I0521 10:54:46.962665 22582644975424 utils.py:378] Model model_5_multimer_v3 is running 0 prediction with default MSA depth
I0521 10:54:46.962839 22582644975424 utils.py:384] Using random seed 1682205902281770834 for the data pipeline
I0521 10:54:47.012253 22582644975424 run_multimer_jobs.py:323] now running prediction on HrpN_and_WAK3
I0521 10:54:47.012355 22582644975424 run_multimer_jobs.py:324] output_path is /storage/home/jbh249/scratch/output/models/HrpN_and_WAK3
I0521 10:54:47.012434 22582644975424 predict_structure.py:125] Checking for existing results
I0521 10:54:47.012791 22582644975424 predict_structure.py:139] Running model model_1_multimer_v3_pred_0 on HrpN_and_WAK3
I0521 10:54:47.013137 22582644975424 model.py:165] Running predict with shape(feat) = {'aatype': (1144,), 'residue_index': (1144,), 'seq_length': (), 'msa': (2257, 1144), 'num_alignments': (), 'template_aatype': (4, 1144), 'template_all_atom_mask': (4, 1144, 37), 'template_all_atom_positions': (4, 1144, 37, 3), 'asym_id': (1144,), 'sym_id': (1144,), 'entity_id': (1144,), 'deletion_matrix': (2257, 1144), 'deletion_mean': (1144,), 'all_atom_mask': (1144, 37), 'all_atom_positions': (1144, 37, 3), 'assembly_num_chains': (), 'entity_mask': (1144,), 'num_templates': (), 'cluster_bias_mask': (2257,), 'bert_mask': (2257, 1144), 'seq_mask': (1144,), 'msa_mask': (2257, 1144)}

@Qrouger

Qrouger commented May 21, 2024

No, it will just run slowly on CPU.
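
One quick way to confirm which backend is actually in use (assuming the same AlphaPulldown environment is active) is to list the devices JAX can see:

python -c "import jax; print(jax.devices())"   # e.g. [CpuDevice(id=0)] if the CUDA backend failed to load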

Quentin.

@dingquanyu
Collaborator

Available platform names are: CUDA

Hi @J-Held

Glad it worked. These messages are not actually errors, just logs that reflect the status of your modelling job. Since Available platform names are: CUDA is printed out, it should be running successfully on your GPU. But I would still suggest running nvidia-smi just to double-check that the programme is actually consuming your GPU RAM.
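
For example, something along these lines keeps refreshing the GPU memory and utilisation readout while the job runs:

# print GPU memory use and utilisation every 5 seconds
nvidia-smi --query-gpu=memory.used,utilization.gpu --format=csv -l 5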

Yours
Dingquan
