README for training data? #1

Open
jaekor91 opened this issue Jul 14, 2020 · 2 comments

Comments


jaekor91 commented Jul 14, 2020

Hello, could you add a README to the data/training data folder describing what each file is for?

Could you add the original training data to this directory?

https://github.com/gifford-lab/antibody-2019/tree/master/data/training%20data/Hold%20out%20classification

Also, could you explain why there are "J" characters in the sequences in the following file (and in others)? Are they used as padding tokens for NNs? If so, is the padding added at random positions?

https://raw.githubusercontent.com/gifford-lab/antibody-2019/master/data/training%20data/Full%20regression/Lucentis_b/data.tsv

Lastly, could you share the entire phagemid template sequence used? I am interested in looking at the entire antibody sequence available in addition to using the flanking sequence.

Thank you!
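For anyone wanting to check the padding question locally, here is a minimal sketch that classifies where "J" occurs in each sequence. It assumes the sequences have already been read into a list of strings; the column layout of `data.tsv` is not documented in this thread, so loading is left out. The function names and the example strings are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def classify_token_position(seq, token="J"):
    """Return where `token` occurs in `seq`: none, leading, trailing, both_ends, or interior."""
    if token not in seq:
        return "none"
    # If the token survives stripping both ends, it sits between other residues.
    if token in seq.strip(token):
        return "interior"
    if seq.startswith(token) and seq.endswith(token):
        return "both_ends"
    return "trailing" if seq.endswith(token) else "leading"


def summarize_token_positions(sequences, token="J"):
    """Count how many sequences fall into each position class."""
    return Counter(classify_token_position(s, token) for s in sequences)


# Toy strings for illustration only (not real entries from the repository).
toy_sequences = ["ACDJJ", "JJACD", "ACJDE", "ACDE", "JACDJ"]
print(summarize_token_positions(toy_sequences))
```

If nearly all sequences land in `leading`, `trailing`, or `both_ends`, that would be consistent with "J" acting as padding at the ends; a large `interior` count would suggest something else is going on.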

@jaekor91

@igsaber -- Thank you for uploading the classification training data! I noticed that there are >60K sequences, rather than the 51,130 noted in the supplementary table. Could you explain the source of the discrepancy? Could you also share the observed counts for individual sequences in R2 and R3?


wjs20 commented Oct 13, 2020

Hi, I am also quite confused by the data. Could you upload a README, please?


2 participants