Can I use AlphaPullDown for 5K proteins? #406

Open
Rohit-Satyam opened this issue Oct 1, 2024 · 5 comments

Comments

@Rohit-Satyam

I was wondering whether AlphaPulldown will complain if given 5000 proteins for an all-vs-all PPI screen. Pardon my ignorance, but I couldn't find anywhere in the documentation how to skip structure prediction and instead use the structures already available in the AlphaFold Database to reduce the runtime.

@jkosinski
Collaborator

Hi Rohit-Satyam,

AP will not complain, but it will take a lot of time or resources. We have thought about using monomeric models to speed up the calculations, but we haven't tested that. In principle it should work by using the models as custom monomeric templates and reducing the number of recycles. @DimaMolod, would using custom monomeric templates work in all-against-all mode?

@DimaMolod
Collaborator

Hi, yes, using monomeric templates would work, but it will not speed up predictions, as searching for a template is rapid compared to the rest of the workflow. Reducing the number of recycles to 1 will help, but the quality of predictions will (expectedly) deteriorate. Please also note that 5000 proteins in all-vs-all mode will result in about 12.5 × 10^6 predictions, which is too much even for the most powerful HPCs in the world.
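
The figure of roughly 12.5 × 10^6 follows from counting unordered pairs. A minimal sketch of that arithmetic is below; the function name `all_vs_all_pairs` and the `include_homodimers` switch are illustrative assumptions, not part of AlphaPulldown.

```python
from math import comb


def all_vs_all_pairs(n_proteins: int, include_homodimers: bool = False) -> int:
    """Count unordered protein pairs in an all-vs-all screen.

    Hypothetical helper for illustration only; not an AlphaPulldown function.
    """
    # Each unordered pair of distinct proteins is counted once: n * (n - 1) / 2.
    pairs = comb(n_proteins, 2)
    if include_homodimers:
        # Optionally add one self-pair (A with A) per protein.
        pairs += n_proteins
    return pairs


print(all_vs_all_pairs(5000))                           # 12497500
print(all_vs_all_pairs(5000, include_homodimers=True))  # 12502500
```

Either way the count grows quadratically with the number of proteins, so halving the input list cuts the number of predictions to roughly a quarter.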

@jkosinski
Collaborator

There is a chance that the quality of the models would still be acceptable with 1 recycle, as the monomeric templates may speed up "convergence". But indeed, we haven't tested that yet; it's just speculation.

@jkosinski
Collaborator

> Hi, yes, using monomeric templates would work, but it will not speed up predictions, as searching for a template is rapid compared to the rest of the workflow. Reducing the number of recycles to 1 will help, but the quality of predictions will (expectedly) deteriorate. Please also note that 5000 proteins in all-vs-all mode will result in about 12.5 × 10^6 predictions, which is too much even for the most powerful HPCs in the world.

Doable in a week on this, if everyone else at xAI goes on holiday: https://www.reddit.com/r/artificial/comments/1f8iidw/musks_xai_supercomputer_goes_online_with_100000/

@Rohit-Satyam
Author

I get that it's quite ambitious. We don't have that many GPUs (600 are available, but shared), though we do have 1000 CPUs. I will find a way to reduce the list of proteins for the all-vs-all comparison.
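
For sizing a reduced list, one rough approach is to invert the pair count: given a GPU-hour budget and an assumed average cost per pair, find the largest number of proteins whose all-vs-all screen still fits. The sketch below is only an illustration. The helper name `max_proteins_for_budget` and the 0.5 GPU-hours per pair are made-up placeholders, not measured AlphaPulldown timings; real runtimes depend on sequence lengths, recycles, and hardware.

```python
from math import comb, isqrt


def max_proteins_for_budget(gpu_hours_budget: float, gpu_hours_per_pair: float) -> int:
    """Largest n such that n * (n - 1) / 2 pairs fit into the GPU-hour budget.

    Hypothetical helper; gpu_hours_per_pair is an assumed average, not a benchmark.
    """
    max_pairs = int(gpu_hours_budget / gpu_hours_per_pair)
    # Solve n * (n - 1) / 2 <= max_pairs for the largest integer n.
    return (1 + isqrt(1 + 8 * max_pairs)) // 2


# Assumed scenario: 600 GPUs fully available for one week (168 h),
# at a placeholder cost of 0.5 GPU-hours per predicted pair.
budget = 600 * 168
n = max_proteins_for_budget(budget, gpu_hours_per_pair=0.5)
print(n, comb(n, 2))  # 635 201295
```

Under these placeholder numbers the budget covers a few hundred proteins rather than 5000, which is why pre-filtering the candidate list matters far more than tuning individual predictions.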
