Error compressing PDB #35

Open
valentynbez opened this issue May 29, 2023 · 2 comments
Labels
bug Something isn't working enhancement New feature or request

Comments


valentynbez commented May 29, 2023

Hello,

I am trying to compress PDB files and I keep getting the same error.
I tried changing all extensions from .ent to .pdb and rewriting the PDB files with ProDy, so that everything unnecessary is removed from the files themselves.

Compressing files in correct_pdb using 32 threads
Output directory: pdb_foldcomp
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
Aborted (core dumped)

If I try per-file compression, it only writes a single file and quits. It would also be nice to see which file is being processed, in case the error comes from the PDB contents.

Cheers,
V
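
Until the tool reports file names itself, one way to find the offending file is to run the compressor once per file in a separate process and record which invocations exit abnormally. The sketch below is only an illustration: `find_failing_files` is a hypothetical helper, and it assumes the command can be called as a prefix followed by a single input path.

```python
import subprocess
import sys


def find_failing_files(files, command_prefix):
    """Run command_prefix + [file] once per file and collect the failures.

    Returns a list of (file, return_code) pairs for every non-zero exit.
    On POSIX systems a process killed by a signal (for example an abort)
    has a negative return code.
    """
    failures = []
    for file in files:
        # Each file gets its own process, so one crash does not stop the loop.
        result = subprocess.run(
            [*command_prefix, str(file)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            print(f"FAILED ({result.returncode}): {file}", file=sys.stderr)
            failures.append((str(file), result.returncode))
    return failures
```

Assuming the `foldcomp` binary is on the PATH and accepts `foldcomp compress <pdb_file>`, a call such as `find_failing_files(sorted(Path("correct_pdb").glob("*.pdb")), ["foldcomp", "compress"])` (with `Path` imported from `pathlib`) would list the files that trigger the crash.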

Member

khb7840 commented May 31, 2023

Thanks for the feedback. I'll implement a verbosity option that logs the name of the file being processed when an error occurs.
Because foldcomp was initially designed to handle predicted structures without discontinuities, we haven't checked all the possible error cases that arise with real data. To track down the cause of the error, it would help if you could share the preprocessing script you use on the PDB files.
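
To see whether a given experimental structure has discontinuities of this kind, a quick check is to look for jumps in residue numbering within each chain. The following is a minimal sketch, not part of foldcomp: `find_numbering_gaps` is a hypothetical helper that reads the fixed-width ATOM records directly and ignores insertion codes. A jump in numbering does not always mean a physical chain break, so treat the result as a hint only.

```python
def find_numbering_gaps(pdb_text):
    """Return (chain_id, previous_resseq, next_resseq) for every jump in numbering.

    Only ATOM records are inspected. Columns follow the fixed-width PDB
    format: chain ID in column 22, residue number in columns 23-26.
    """
    gaps = []
    last_seen = {}  # chain ID -> last residue number seen in that chain
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]
        res_seq = int(line[22:26])
        previous = last_seen.get(chain_id)
        # Consecutive atoms of one residue share a number; a step above 1 is a gap.
        if previous is not None and res_seq - previous > 1:
            gaps.append((chain_id, previous, res_seq))
        last_seen[chain_id] = res_seq
    return gaps
```

Reading a file with `Path(...).read_text()` and passing the text to this function returns an empty list when the numbering of every chain is continuous.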

@khb7840 khb7840 added bug Something isn't working enhancement New feature or request labels May 31, 2023
Author

valentynbez commented May 31, 2023

Thanks for the answer. I would be really grateful for help, and I think having a foldcomp database of experimental structures is going to be awesome!
I tried several approaches; here is a snippet for my test data (https://www.rcsb.org/structure/7db5):

from pathlib import Path

from prody import parsePDBStream, writePDB

file = Path("databases/pdb_structures/7db5.pdb")
outfolder = Path(".")
filename = file.name
outfile = outfolder / filename

# Parse the full structure once to discover its chain identifiers.
with open(file) as f:
    pdb = parsePDBStream(f)

# Keep only the first chain of the PDB file.
first_chain = next(iter(pdb.iterChains())).getChid()
with open(file) as f:
    pdb = parsePDBStream(f, chain=first_chain)
writePDB(str(outfile), pdb)

# Read the written file back so its first line can be overwritten.
with open(outfile) as f:
    lines = f.readlines()

# Add a TITLE record in place of the leading REMARK.
lines[0] = "TITLE     " + filename.split(".")[0] + "\n"
with open(outfile, "w") as f:
    f.writelines(lines)
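
The thread does not establish what triggers the `map::at` error. One thing that could be checked on the file written by the script above is whether it still contains residue names outside the 20 standard amino acids, since experimental entries often include waters, ligands, or modified residues. This is a hypothesis to test, not a confirmed cause, and `count_nonstandard_residues` is a hypothetical helper written for illustration.

```python
from collections import Counter

# The 20 standard amino acids by three-letter code.
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def count_nonstandard_residues(pdb_text):
    """Count atoms per residue name that is not a standard amino acid.

    Both ATOM and HETATM records are inspected; the residue name sits in
    columns 18-20 of the fixed-width PDB format.
    """
    counts = Counter()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            res_name = line[17:20].strip()
            if res_name not in STANDARD_RESIDUES:
                counts[res_name] += 1
    return counts
```

An empty result means every atom record belongs to a standard residue, which would rule this hypothesis out for that file.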
