run_multimer_jobs issue #342

Closed
J-Held opened this issue May 20, 2024 · 5 comments

@J-Held

J-Held commented May 20, 2024

I am trying to run the run_multimer_jobs script on GPU using this command:

run_multimer_jobs.py
--mode=all_vs_all
--num_cycle=3
--num_predictions_per_model=1
--output_path=/storage/home/jbh249/scratch/output/models/
--data_dir=/storage/home/jbh249/scratch/alphaDatabase/ \
--protein_lists=/storage/home/jbh249/scratch/candidates.txt
--monomer_objects_dir=/storage/home/jbh249/scratch/output/features

The job terminates almost immediately with this error:

/storage/home/jbh249/micromamba/envs/AlphaPulldown/lib/python3.10/site-packages/Bio/Data/SCOPData.py:18: BiopythonDeprecationWarning: The 'Bio.Data.SCOPData' module will be deprecated in a future release of Biopython in favor of 'Bio.Data.PDBData.
warnings.warn(
2024-05-20 16:09:31.137214: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-05-20 16:09:35.260966: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
I0520 16:09:35.261087 23134891243328 utils.py:271] checking if output_dir exists /storage/home/jbh249/scratch/output/models/
Traceback (most recent call last):
  File "/storage/home/jbh249/micromamba/envs/AlphaPulldown/bin/run_multimer_jobs.py", line 462, in <module>
    app.run(main)
  File "/storage/home/jbh249/micromamba/envs/AlphaPulldown/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/storage/home/jbh249/micromamba/envs/AlphaPulldown/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/storage/home/jbh249/micromamba/envs/AlphaPulldown/bin/run_multimer_jobs.py", line 437, in main
    all_proteins = read_all_proteins(FLAGS.protein_lists[0])
TypeError: 'NoneType' object is not subscriptable

@Qrouger

Qrouger commented May 21, 2024

Hi @J-Held, the first part of your output says the GPU can't be used because of a problem with TensorRT. But the script doesn't crash because of that; it's most likely your command. Watch your backslashes. Personally, I prefer to write the command on a single line, with single spaces between flags, to avoid formatting errors. Like this:
run_multimer_jobs.py --mode=all_vs_all --num_cycle=3 --num_predictions_per_model=1 --output_path=/storage/home/jbh249/scratch/output/models/ --data_dir=/storage/home/jbh249/scratch/alphaDatabase/ --protein_lists=/storage/home/jbh249/scratch/candidates.txt --monomer_objects_dir=/storage/home/jbh249/scratch/output/features
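
If you prefer to keep it on multiple lines, the same command also works as long as every line except the last ends with a trailing backslash (a sketch using the same paths as above):

run_multimer_jobs.py \
  --mode=all_vs_all \
  --num_cycle=3 \
  --num_predictions_per_model=1 \
  --output_path=/storage/home/jbh249/scratch/output/models/ \
  --data_dir=/storage/home/jbh249/scratch/alphaDatabase/ \
  --protein_lists=/storage/home/jbh249/scratch/candidates.txt \
  --monomer_objects_dir=/storage/home/jbh249/scratch/output/features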

Quentin

@dingquanyu
Collaborator

Hi @J-Held

I agree with @Qrouger's suggestion. It's likely that your command is not formatted correctly, so --protein_lists was never parsed. Without a trailing backslash at the end of each line, whatever follows that line is not passed to the script at all.
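
As a small illustration of the shell's line-continuation behaviour (hypothetical echo commands, not AlphaPulldown itself):

echo one \
two          # the trailing backslash joins the lines, so this prints "one two"
echo one     # without the backslash the command ends here and prints just "one"
two          # ...and the shell then tries to run "two" as a separate command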

Yours
Dingquan

@J-Held
Author

J-Held commented May 21, 2024

Yes, that was it. Thank you @Qrouger and @dingquanyu!

Regarding the GPU, it looks like I'm getting many of the error messages brought up in #339, but the job appears to still be running. Is it just going to time out? Output log below:

I0521 10:54:40.655257 22582644975424 run_multimer_jobs.py:389] Modeling new interaction for /storage/home/jbh249/scratch/output/models/HrpN_and_WAK3
I0521 10:54:41.184001 22582644975424 xla_bridge.py:660] Unable to initialize backend 'cuda': Unable to load cuDNN. Is it installed?
I0521 10:54:41.203725 22582644975424 xla_bridge.py:660] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA
I0521 10:54:41.204897 22582644975424 xla_bridge.py:660] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
W0521 10:54:41.205006 22582644975424 xla_bridge.py:724] CUDA backend failed to initialize: Unable to load cuDNN. Is it installed? (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I0521 10:54:43.223712 22582644975424 utils.py:378] Model model_1_multimer_v3 is running 0 prediction with default MSA depth
I0521 10:54:44.160407 22582644975424 utils.py:378] Model model_2_multimer_v3 is running 0 prediction with default MSA depth
I0521 10:54:45.103848 22582644975424 utils.py:378] Model model_3_multimer_v3 is running 0 prediction with default MSA depth
I0521 10:54:46.035488 22582644975424 utils.py:378] Model model_4_multimer_v3 is running 0 prediction with default MSA depth
I0521 10:54:46.962665 22582644975424 utils.py:378] Model model_5_multimer_v3 is running 0 prediction with default MSA depth
I0521 10:54:46.962839 22582644975424 utils.py:384] Using random seed 1682205902281770834 for the data pipeline
I0521 10:54:47.012253 22582644975424 run_multimer_jobs.py:323] now running prediction on HrpN_and_WAK3
I0521 10:54:47.012355 22582644975424 run_multimer_jobs.py:324] output_path is /storage/home/jbh249/scratch/output/models/HrpN_and_WAK3
I0521 10:54:47.012434 22582644975424 predict_structure.py:125] Checking for existing results
I0521 10:54:47.012791 22582644975424 predict_structure.py:139] Running model model_1_multimer_v3_pred_0 on HrpN_and_WAK3
I0521 10:54:47.013137 22582644975424 model.py:165] Running predict with shape(feat) = {'aatype': (1144,), 'residue_index': (1144,), 'seq_length': (), 'msa': (2257, 1144), 'num_alignments': (), 'template_aatype': (4, 1144), 'template_all_atom_mask': (4, 1144, 37), 'template_all_atom_positions': (4, 1144, 37, 3), 'asym_id': (1144,), 'sym_id': (1144,), 'entity_id': (1144,), 'deletion_matrix': (2257, 1144), 'deletion_mean': (1144,), 'all_atom_mask': (1144, 37), 'all_atom_positions': (1144, 37, 3), 'assembly_num_chains': (), 'entity_mask': (1144,), 'num_templates': (), 'cluster_bias_mask': (2257,), 'bert_mask': (2257, 1144), 'seq_mask': (1144,), 'msa_mask': (2257, 1144)}

@Qrouger

Qrouger commented May 21, 2024

No, it will just run slowly on CPU.
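
One quick way to confirm which backend is actually in use (assuming the same AlphaPulldown environment is active) is to list the devices JAX can see:

python -c "import jax; print(jax.devices())"   # e.g. [CpuDevice(id=0)] if the CUDA backend failed to load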

Quentin.

@dingquanyu
Collaborator

Available platform names are: CUDA

Hi @J-Held

Glad it worked. These messages are not actually errors, just logs that reflect the status of your modelling job. Since Available platform names are: CUDA is printed out, it should be running successfully on your GPU. But I would still suggest running nvidia-smi just to double-check that the programme is actually consuming your GPU RAM.
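
For example, something along these lines keeps refreshing the GPU memory and utilisation readout while the job runs:

# print GPU memory use and utilisation every 5 seconds
nvidia-smi --query-gpu=memory.used,utilization.gpu --format=csv -l 5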

Yours
Dingquan
