ESMFold database header issues #51

Open
valentynbez opened this issue Mar 25, 2024 · 1 comment
Labels
help wanted Extra attention is needed

Comments

valentynbez commented Mar 25, 2024

When I extract FASTA from highquality_clust30, I get the following headers:

>ESMFOLD V0 PREDICTION FOR MGYP000138429313
>ESMFOLD V0 PREDICTION FOR MGYP001595280761
...

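I use FoldComp for a downstream application. Under the usual FASTA convention, the sequence identifier is the first whitespace-delimited token after ">", so every sequence here gets the identifier ESMFOLD, which is not unique. The unique MGYP ID ends up in the description part of the header instead.
I could run sed on the output, but that solution feels hacky.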
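
For reference, the post-processing workaround mentioned above could be sketched in Python roughly as follows. This is only an illustration, not part of FoldComp: the header pattern is assumed from the two example headers in this issue, the function names are hypothetical, and the sequence in the demo is a made-up placeholder.

```python
import re

# Assumed title pattern, taken from the example headers shown in this issue.
TITLE_RE = re.compile(r"^>ESMFOLD V0 PREDICTION FOR (\S+)\s*$")


def rewrite_header(line: str) -> str:
    """Return '>ID' for a matching title line; return any other line unchanged."""
    match = TITLE_RE.match(line)
    return f">{match.group(1)}" if match else line


def rewrite_fasta(text: str) -> str:
    """Rewrite every matching header in a FASTA string; sequence lines are kept as-is."""
    return "\n".join(rewrite_header(line) for line in text.split("\n"))


if __name__ == "__main__":
    # Placeholder record: the residues below are invented for the demo.
    example = ">ESMFOLD V0 PREDICTION FOR MGYP000138429313\nMKT\n"
    print(rewrite_fasta(example), end="")
```

Headers that do not match the assumed pattern pass through untouched, so the sketch should be safe to run on output that already uses the ID as the header.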
The highquality_clust30.lookup looks appropriate:

0       MGYP002174220927        0
1       MGYP000064029927        0

Do you have recommendations on how to get proper FASTA headers?

Cheers
V

khb7840 (Member) commented Aug 8, 2024

Sorry for the late response. In 412c7a8 I changed the default so that sequence extraction uses the id/filename as the header, and I introduced a use-title flag for cases where the title is needed.
