'seq' value is '[]', empty for the LBA dataset #62

Open
lzhangUT opened this issue Feb 10, 2023 · 0 comments

Hi,
I am working on the LBA dataset trying to reproduce your results.
I downloaded your LBA dataset in LMDB format. Downloading and loading the dataset both work fine, but the 'seq' value is an empty list ('[]') for every protein.

  1. Why is that?
  2. I tried to generate the sequences myself using your get_chain_sequences function from sequence.py in the protein folder:

def get_chain_sequences(df):
    """Return list of tuples of (id, sequence) for different chains of monomers in a given dataframe."""
    # Keep only CA of standard residues
    df = df[df['name'] == 'CA'].drop_duplicates()
    df = df[df['resname'].apply(lambda x: Poly.is_aa(x, standard=True))]
    df['resname'] = df['resname'].apply(Poly.three_to_one)
    chain_sequences = []
    for c, chain in df.groupby(['ensemble', 'subunit', 'structure', 'model', 'chain']):
        seq = ''.join(chain['resname'])
        chain_sequences.append((tuple([str(x) for x in c]), seq))
    return chain_sequences

It also returns an empty list of sequences, so I think there is a bug here.
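
For comparison, here is a minimal sketch of the same idea written in plain Python, so it can be run without Biopython to see whether any CA rows survive the filter. It is not the repository's code and it does not claim to identify the cause of the empty result. The helper name `chain_sequences_from_rows`, the whitespace and case normalisation, and the optional `residue` and `insertion_code` columns are assumptions made for illustration. The grouping columns are the ones used in the function above.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# Same grouping columns as in get_chain_sequences.
GROUP_COLS = ("ensemble", "subunit", "structure", "model", "chain")


def chain_sequences_from_rows(rows):
    """Hypothetical helper: return [(chain_id_tuple, sequence), ...].

    `rows` is an iterable of dicts, one per atom, for example the result
    of df.to_dict("records") on an atoms dataframe.
    """
    sequences = {}  # chain key -> list of one-letter codes, in input order
    seen = set()    # residues already counted, to skip duplicate CA rows
    for row in rows:
        # Assumption: tolerate padded or lower-case names such as ' CA '.
        name = str(row["name"]).strip().upper()
        resname = str(row["resname"]).strip().upper()
        if name != "CA" or resname not in THREE_TO_ONE:
            continue  # not an alpha carbon of a standard residue
        key = tuple(str(row[c]) for c in GROUP_COLS)
        if "residue" in row:
            # Assumption: 'residue' (plus optional 'insertion_code')
            # identifies a residue within its chain.
            marker = (key, row["residue"], row.get("insertion_code"))
            if marker in seen:
                continue
            seen.add(marker)
        sequences.setdefault(key, []).append(THREE_TO_ONE[resname])
    return [(key, "".join(codes)) for key, codes in sequences.items()]


if __name__ == "__main__":
    common = {"ensemble": "e", "subunit": "s", "structure": "x.pdb", "model": 1}
    atoms = [
        dict(common, chain="A", residue=1, resname="ALA", name=" CA "),
        dict(common, chain="A", residue=1, resname="ALA", name="N"),
        dict(common, chain="A", residue=2, resname="GLY", name="CA"),
        dict(common, chain="B", residue=1, resname="HOH", name="O"),
        dict(common, chain="B", residue=2, resname="LYS", name="CA"),
    ]
    print(chain_sequences_from_rows(atoms))
```

If this returns sequences for the same records where the original function returns nothing, the difference lies in the filtering or conversion steps rather than in the data itself.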

  3. I modified the function slightly, so I can now get the protein sequences. However, some proteins have multiple chains. How should multiple chains be processed for training, or which chain should be paired with the ligand SMILES?
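
To make the multi-chain question concrete, here is a small sketch of two simple options: joining all chains into one string, or keeping only the longest chain. Both helper names and the `":"` separator are hypothetical, and neither option is confirmed as what the original experiments did. The input is the list of (id, sequence) tuples that get_chain_sequences is documented to return.

```python
def join_chains(chain_sequences, sep=":"):
    """Hypothetical option 1: concatenate all non-empty chain sequences."""
    return sep.join(seq for _, seq in chain_sequences if seq)


def longest_chain(chain_sequences):
    """Hypothetical option 2: return the (id, sequence) pair with the longest sequence."""
    if not chain_sequences:
        return None
    return max(chain_sequences, key=lambda item: len(item[1]))


if __name__ == "__main__":
    chains = [
        (("e", "s", "x.pdb", "1", "A"), "AG"),
        (("e", "s", "x.pdb", "1", "B"), "K"),
    ]
    print(join_chains(chains))
    print(longest_chain(chains))
```

A third possibility would be to select the chain closest to the ligand, which needs the atomic coordinates and is not sketched here.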

Thanks for your help.
