Installation error #18

Open
seanwangsalad opened this issue Mar 17, 2024 · 6 comments

Comments

@seanwangsalad

Ubuntu 22.04
CUDA 11.8
AF 2.3

Dear developers,
When running the following command, I get the following errors:
sudo singularity build tcrmodel2.sif singularity/tcrmodel2_singularity.def
Screenshot 2024-03-16 at 10 35 12 PM

Screenshot 2024-03-16 at 10 31 13 PM

However, it still builds the tcrmodel2.sif file.

But when I run:
bash run_tcrmodel2_singularity.sh
It returns the error:
Traceback (most recent call last):
  File "/opt/tcrmodel2/run_tcrmodel2.py", line 9, in <module>
    from absl import app, flags
ModuleNotFoundError: No module named 'absl'

I would really appreciate some help figuring out whether this is an installation error or a runtime error.
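
For anyone hitting the same traceback, one way to tell a build problem from a run problem is to check which modules the container's Python can actually find. The following is only a minimal sketch: `absl` comes from the traceback above, while the other names in the list and the helper name `missing_modules` are assumptions made for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # 'absl' is taken from the traceback; 'jax' and 'numpy' are assumed extras.
    required = ["absl", "jax", "numpy"]
    missing = missing_modules(required)
    if missing:
        print("missing modules:", ", ".join(missing))
    else:
        print("all listed modules were found")
```

If the script is saved under a hypothetical name such as check_modules.py, running it with `singularity exec tcrmodel2.sif python check_modules.py` would report what is missing inside the image. A missing `absl` there points to the build step rather than the run script.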

@rui-yin
Collaborator

rui-yin commented Mar 17, 2024

Hi Sean,

Indeed, from the output you shared, it looks like the Singularity container was not built properly, which is why you see the ModuleNotFoundError when running the bash script. The build error seems to come from conda 23.5.0 not being compatible with Python 3.12. However, in the tcrmodel2_singularity.def file, the Python version we specified is Python 3.10 (see this line).

Maybe you can double-check the Python version in the tcrmodel2_singularity.def file?

Best,
Rui

@seanwangsalad
Author

Hi Rui, should I specify 3.12? It is 3.10 in the .def file. The error seems to be coming from pin-1.

Thanks,
Sean

@rui-yin
Collaborator

rui-yin commented Mar 18, 2024

Hi Sean,

I think 3.10 is what you want, not 3.12, so what you have in the .def looks good. It's interesting that the version you specify (Python 3.10) is different from the one being installed.

Maybe try changing line 38 from:
conda install -qy conda==23.5.0 \

to:
conda install -qy conda==23.5.0 python=3.10 \

and everything else stays the same.

Let me know how it goes!

Best,
Rui
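
After rebuilding with the change above, it may help to confirm which Python the image actually ended up with. This is a small sketch, not part of TCRmodel2: the expected version 3.10 is taken from this thread, and the helper name `python_matches` is hypothetical.

```python
import sys

# Version the .def file is meant to install, per this thread.
EXPECTED = (3, 10)


def python_matches(expected=EXPECTED, actual=None):
    """Return True if the (major, minor) of `actual` equals `expected`.

    `actual` defaults to the running interpreter's version.
    """
    if actual is None:
        actual = sys.version_info
    return tuple(actual[:2]) == tuple(expected)


if __name__ == "__main__":
    found = "%d.%d" % (sys.version_info[0], sys.version_info[1])
    wanted = "%d.%d" % EXPECTED
    if python_matches():
        print("Python", found, "matches the expected", wanted)
    else:
        print("Python", found, "found, but", wanted, "was expected")
```

Run inside the container, a mismatch here would suggest that conda is still resolving a newer Python than the one pinned on the install line.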

@bsaleme

bsaleme commented May 2, 2024

Hello Rui, I have run into the same problem described above. I made the change you recommended (specifying the Python version after the conda version) and the image built successfully, but I then hit another issue: the "jax.extend" module is not found. I added a picture of the error below. Any recommendations?

Screen Shot 2024-05-02 at 2 59 48 PM

@bpierce12
Member

@bsaleme it looks like your issue is the same as what others have noted for ColabFold, and is related to the jax version:
YoshitakaMo/localcolabfold#212
Downgrading jaxlib to 0.4.23, as noted in that thread, will hopefully help with that problem.
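
To confirm that a downgrade actually took effect inside the image, the installed versions can be compared against the intended pins. The sketch below uses only the standard library; the jaxlib pin of 0.4.23 comes from the comment above, while the helper names `installed_version` and `check_pins` are made up for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each distribution name to a (wanted, found, matches) tuple."""
    result = {}
    for name, wanted in pins.items():
        found = installed_version(name)
        result[name] = (wanted, found, found == wanted)
    return result


if __name__ == "__main__":
    # Pin taken from this thread; add others as needed for your build.
    pins = {"jaxlib": "0.4.23"}
    for name, (wanted, found, ok) in check_pins(pins).items():
        status = "ok" if ok else "MISMATCH"
        print(name, "wanted", wanted, "found", found, status)
```

A result of `found None` means the package is not installed in the environment at all, which would be a different problem from a version mismatch.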

@andreas-wilm

Hi all,

I also ended up in dependency version hell while upgrading the other day. The real culprit is that the definition file always pulls the latest AlphaFold commit. It's best to pin a specific commit, tag, or release and then change library versions as needed. One (official) example is this GCP Dockerfile.

I ended up changing the TCRmodel2 definition file accordingly (conda 24.1.2, jax 0.4.13, CUDA 11.8, and AF commit 032e2f2 from Feb 2024). Happy to share the file or open a PR if that is of interest.

Andreas
