Unrecognized atom type #255

Open
CLG68 opened this issue Aug 4, 2024 · 9 comments

@CLG68

CLG68 commented Aug 4, 2024

Hi,

With some molecules I get (Unimol Docking V2):

/media/christian/VS1/VS/Results_Unimol/MC4R_protein/Poses/Sublibrary_05/CHEMBL-3740791-1.sdf-Cc1ccnc(N(CCC(=O)[O-])C(=O)c2ccc3c(c2)nc(CNc2ccc(C(N)=[NH2+])cc2F)n3C)c1-RMSD:173.775
[02:07:56] UFFTYPER: Unrecognized atom type: S_6+6 (0)
/media/christian/VS1/VS/Results_Unimol/MC4R_protein/Poses/Sublibrary_05/Enamine-Z3019139935-2.sdf-Cc1cc(N2CCC(O)(C[NH+]3CCOCC3)CC2)nc(N(C)c2ccccc2)[nH+]1-RMSD:171.117
3%|█▎ | 63/1959 [01:50<50:00, 1.58s/it][02:07:56] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:57] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:58] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:58] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:58] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
/media/christian/VS1/VS/Results_Unimol/MC4R_protein/Poses/Sublibrary_05/ChemDiv-V014-0652-1.sdf-CC(C)CCN(CC(=O)Nc1cc(C(C)(C)C)nn1-c1ccc(Cl)cc1)C(=O)C(C)(C)CCl-RMSD:173.7905
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)
3%|█▎ | 64/1959 [01:54<1:02:15, 1.97s/it][02:08:00] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:08:00] UFFTYPER: Unrecognized atom type: S_6+6 (0)
[02:08:00] UFFTYPER: Unrecognized atom type: S_6+6 (0)

This happens even with the latest version of RDKit.

@CLG68
Author

CLG68 commented Aug 5, 2024

Maybe it is related to rdkit/rdkit#6365, but I'm currently using the latest RDKit, so that should already have been fixed.
I also get:
UFFTYPER: Unrecognized atom type: S_5+6
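
To see which atom types are affected and how often, the warnings can be tallied from a captured log. The following is a minimal sketch, assuming the docking output (stdout and stderr) was redirected to a text file; the function name `tally_unrecognized_types` is hypothetical and not part of Unimol or RDKit.

```python
import re
from collections import Counter

# Matches RDKit warning lines such as
# "[02:07:56] UFFTYPER: Unrecognized atom type: S_6+6 (0)".
# re.findall scans the whole text, so warnings fused onto a
# progress-bar line are still counted.
PATTERN = re.compile(r"UFFTYPER: Unrecognized atom type: (\S+)")


def tally_unrecognized_types(log_text):
    """Count how often each UFF atom type label is reported as unrecognized."""
    return Counter(PATTERN.findall(log_text))


if __name__ == "__main__":
    sample = (
        "[02:07:56] UFFTYPER: Unrecognized atom type: S_6+6 (0)\n"
        "UFFTYPER: Unrecognized atom type: S_5+6\n"
    )
    for atom_type, count in tally_unrecognized_types(sample).items():
        print(atom_type, count)
```

Because the warnings are interleaved with the output of other ligands, this only counts atom types; it does not tell which ligand triggered a given warning.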

@CLG68
Author

CLG68 commented Aug 12, 2024

I screened 100000 structures from a focused library (from a Panther/ShaEP VS) with Unimol Docking V2. I had a hard time rescoring the results, as 650 poses either had "nan" as coordinates or were outside the binding pocket, so I made a script to clean the results before rescoring. Maybe this comes from the problem I reported (UFFTYPER: Unrecognized atom type: S_5+6)? Do you know how to correct this problem?

Thanks
Christian

@ZhouGengmo
Collaborator

[02:07:59] UFFTYPER: Unrecognized atom type: S_6+6 (0)

It looks like there is an issue with RDKit when loading the file. Could you provide a file that produces this error? We can test it further.

@CLG68
Author

CLG68 commented Aug 13, 2024

Thank you very much for helping with this.
I attached the target, the json file, the reference ligand used for generating the json file, as well as examples of structures giving me errors or problematic results. The source-structures are extracted from my library. The generated-poses are from Unimol Docking V2. The structures that give me a valence problem do not generate a binding pose. I had to create a script to clean the docking results, as the poses with no coordinates or outside the binding pocket were creating problems with scoring in the training with Brutenib... ShaEP was just thinking forever.

The library consists of the top 1% of scores from a Panther/ShaEP VS. My cleaning script flagged 670 poses out of around 100k (minus all the poses that were not generated because of the valence problem).

For RDKit, I tried the version suggested in your README file and also the latest version. Updating to the latest version did not solve the problem.

Best,
Christian
Unimol-Docking-V2_clg68.zip

@CLG68
Author

CLG68 commented Aug 19, 2024

Hi,
Was the zip archive OK, or would individual files be better?
Thank you,
Christian

@ZhouGengmo
Collaborator

Sorry for the delayed response.

Regarding the bug in RDKit, it seems that the bug mentioned in the original issue still exists. I am using an almost up-to-date version (2024.3.1, installed via pip), but when I run the example code from the issue:

from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("S(F)(F)(F)(F)F")
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, randomSeed=42)
conf = mol.GetConformer()
print(conf)

The output is:

<rdkit.Chem.rdchem.Conformer object at 0x7fc17b931b60>
[09:45:04] UFFTYPER: Unrecognized atom type: S_6+6 (0)

I also ran the example file you provided. The command I used is as follows:

python demo.py --mode single --conf-size 10 --cluster \
    --input-protein Unimol-Docking-V2_clg68/MC4R_protein.pdb \
    --input-ligand Unimol-Docking-V2_clg68/MC4R_ref-ligand.sdf \
    --input-docking-grid Unimol-Docking-V2_clg68/docking_grid.json \
    --output-ligand-name ligand_predict \
    --output-ligand-dir predict_sdf \
    --steric-clash-fix \
    --model-dir unimol_docking_v2_240517.pt

There was no Unrecognized atom type: S_6+6 (0) error, and the script ran as expected. Part of the output message is:

[09:55:28] Warning: molecule is tagged as 2D, but at least one Z coordinate is not zero. Marking the mol as 3D.
predict_sdf/ligand_predict.sdf-Cn1nnc(CC2(C3CCCCC3)CCN(C(=O)C(Cc3ccc(Cl)cc3)NC(=O)C3Cc4ccccc4CN3)CC2)n1-RMSD:4.5583

@CLG68
Author

CLG68 commented Aug 29, 2024

Thank you very much for running some tests with my files. Many docking poses are missing or rejected from the screen because of the "Unrecognized atom type" error, poses without coordinates, and molecules docked outside the binding pocket, so I'm really interested in resolving this problem. I'll try RDKit 2024.3.1 and investigate the "is tagged as 2D" message. Hopefully it will solve the "Unrecognized atom type: S_6+6 (0)" problem.

Best,
Christian

@yuanqm55

I encountered the same problem.

@CLG68
Author

CLG68 commented Sep 19, 2024

I still have to solve that one... I'll try the problematic files with different versions of RDKit and I'll let you know if one works better. If not, I could always try to sanitize the problematic files.

I created a bash script to identify and remove the problematic poses/files, post-screening. Just to give you an idea, for one of my screens:

Ligands_Focused-library: 61235 (input files)
...
missing: 0
nan: 35
no-coordinates: 17
outside_binding-site: 394
...
Poses available: 60789
Rejected files: 446

So if I compare the number of files generated during screening to the number of files screened, the number is the same (missing = 0). However, my script ends up removing 446 files. To select the files, I extract the 10th line of the sdf, which should contain the details of one atom. If this line contains "nan" instead of coordinates, the file is removed from the Poses folder. The same happens if this atom is outside the binding pocket (plus a small buffer) as defined in the json file, or if the coordinates make no sense, i.e. the molecule is nowhere near the receptor (no-coordinates).
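
For anyone who wants to reproduce this kind of cleanup, here is a rough Python sketch of the same idea. It is not the bash script described above. It makes several assumptions: the poses are V2000 SDF files, the grid json uses the keys `center_x`, `center_y`, `center_z`, `size_x`, `size_y`, `size_z`, and the ligand centroid is tested instead of the single atom on the 10th line. The function names, the status labels and the 2 Å default margin are all hypothetical.

```python
import json
import math


def load_grid(path):
    """Load the docking grid json (key names are an assumption, see above)."""
    with open(path) as handle:
        return json.load(handle)


def read_first_mol_coords(sdf_text):
    """Return (x, y, z) tuples for the first molecule of a V2000 SDF string."""
    lines = sdf_text.splitlines()
    # V2000 layout: 3 header lines, the counts line, then one line per atom.
    n_atoms = int(lines[3][:3])
    coords = []
    for line in lines[4:4 + n_atoms]:
        # Coordinates are fixed-width fields of 10 characters each.
        coords.append(tuple(float(line[i:i + 10]) for i in (0, 10, 20)))
    return coords


def classify_pose(sdf_text, grid, margin=2.0):
    """Label a pose as 'ok', 'nan', 'outside', 'no-coordinates' or 'unreadable'."""
    try:
        coords = read_first_mol_coords(sdf_text)
    except (ValueError, IndexError):
        return "unreadable"
    if not coords:
        return "no-coordinates"
    if any(not math.isfinite(v) for xyz in coords for v in xyz):
        return "nan"
    # Compare the ligand centroid with the grid box enlarged by the margin.
    centroid = [sum(axis) / len(coords) for axis in zip(*coords)]
    for value, axis in zip(centroid, "xyz"):
        half_width = grid[f"size_{axis}"] / 2 + margin
        if abs(value - grid[f"center_{axis}"]) > half_width:
            return "outside"
    return "ok"
```

A typical use would be to call `load_grid` once on the docking grid json, then call `classify_pose` on the text of each pose file and move every file whose label is not `ok` out of the Poses folder.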

@CLG68 CLG68 closed this as completed Sep 19, 2024
@CLG68 CLG68 reopened this Sep 19, 2024

3 participants