d_model parameter #39
This will probably help tackle overfitting (#34)
Cool, I'll try it. Could I run it on our cluster now? :D
Yes, of course, after you update the dependencies you should be able to submit a training job. Adding some more details on Slack.
Hi @williamstark01, the reason why I set d_model = 6 is that we use one-hot encoding for the DNA sequences, so every sequence has the shape [2000, 6]. Is there a way to change it to a larger value?
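To make the shape concrete, here is a minimal sketch of one-hot encoding a DNA sequence into a `[2000, 6]` array. The 6-symbol alphabet below (`ACGTN-`) is an assumption for illustration; the actual channel meanings in the project may differ.

```python
import numpy as np

# Hypothetical 6-symbol alphabet; the project's actual channels may differ
# (e.g. which symbols are used for unknown bases or padding).
ALPHABET = "ACGTN-"
CHAR_TO_INDEX = {char: i for i, char in enumerate(ALPHABET)}

def one_hot_encode(sequence, alphabet_size=len(ALPHABET)):
    """Encode a DNA sequence as a (len(sequence), alphabet_size) one-hot array."""
    encoding = np.zeros((len(sequence), alphabet_size), dtype=np.float32)
    for position, base in enumerate(sequence):
        encoding[position, CHAR_TO_INDEX[base]] = 1.0
    return encoding

encoded = one_hot_encode("ACGT" * 500)  # a 2000-base sequence
print(encoded.shape)  # (2000, 6)
```

With this representation, the channel dimension of the input is fixed at 6, which is why d_model ends up tied to the alphabet size.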
That's a good question. With label encoding, each single value is converted to a tensor of shape (1, embed_dim) (simply a vector of length embed_dim, a better name for d_model). I haven't used one-hot encoded DNA sequences with transformers before and I'm not sure how they can be converted to embeddings. Maybe label encoding is the best option, but it's probably worth researching a bit to see whether other similar projects are using another approach with one-hot encoding.
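A learnable embedding is just a lookup table of shape (vocabulary_size, embed_dim), and label encoding turns each base into an integer index into that table. A minimal numpy sketch (the values 6 and 32 are illustrative, not the project's settings):

```python
import numpy as np

rng = np.random.default_rng(0)

vocabulary_size = 6  # number of distinct tokens (bases); illustrative
embed_dim = 32       # d_model; illustrative value

# A learnable embedding is a lookup table of shape
# (vocabulary_size, embed_dim); here initialized randomly.
embedding_table = rng.normal(size=(vocabulary_size, embed_dim))

# Label encoding: each base becomes an integer index, e.g. "ACGT" -> [0, 1, 2, 3].
label_encoded = np.array([0, 1, 2, 3])

# Embedding lookup: the indices select rows of the table.
token_embeddings = embedding_table[label_encoded]
print(token_embeddings.shape)  # (4, 32)
```

In PyTorch this lookup is what `nn.Embedding(vocabulary_size, embed_dim)` does, with the table updated during training.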
Currently, I just put the one-hot encoded sequence into the transformer model:

```python
def forward(self, x):
    # generate token embeddings
    token_embeddings = self.token_embedding(x)
```

Should I add this?
I thought some more about this, and I'm not sure there is a single correct answer to how we should process the base characters. Using the bases as tokens and their one-hot encodings directly may work, but we lose the learnable embeddings, which could map the bases to a higher dimensional space representing meaningful features. Then again, since we have so few different tokens, this may not be consequential. For generating embeddings we also have the additional option of using n-grams as tokens instead of single bases. Maybe we can continue using one-hot encodings without embeddings for now, but at some point it's probably worth taking a look at similar projects to get additional insights on this: https://github.com/jerryji1993/DNABERT
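For reference, the n-gram idea amounts to tokenizing the sequence into overlapping k-mers, which is the scheme DNABERT uses (with k between 3 and 6). A minimal sketch:

```python
def kmer_tokens(sequence, k=3):
    """Split a DNA sequence into overlapping k-mers."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

print(kmer_tokens("ACGTAC", k=3))  # ['ACG', 'CGT', 'GTA', 'TAC']
```

This grows the vocabulary from 4 bases to 4^k k-mers, which makes learnable embeddings more meaningful at the cost of a larger embedding table.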
Ok, I will check it. |
I think that the d_model parameter (the embedding dimension) should take a significantly larger value than the one currently used. It is usually a multiple of num_heads, which usually takes the value 8, so maybe an initial value of 32 would make sense here? Or is there a specific reasoning behind using a smaller value for it?