
Laplace smoothing for EMA codebook update #272

Open
VaishnaviSPatil opened this issue Nov 15, 2023 · 0 comments

@VaishnaviSPatil

Hi,

I understand that, to calculate the normalized weights for the embeddings, we divide by the Laplace-smoothed cluster sizes, as seen in the code here.
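For reference, the smoothing I am referring to looks like this (my own notation, with $N_i$ the EMA cluster size for code $i$, $K$ the codebook size, and $\epsilon$ a small constant):

$$\hat{N}_i = \frac{N_i + \epsilon}{\sum_j N_j + K\epsilon} \cdot \sum_j N_j$$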

However, for embeddings whose cluster size is zero, the Laplace smoothing replaces that size with a very small value (on the order of epsilon). When these smoothed cluster sizes are used to normalize the update (by dividing the running ema_dw by the smoothed cluster size), the embeddings with zero cluster size are updated to very large values. These blown-up embeddings then have an even lower probability of being chosen in the future.
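To make the failure mode concrete, here is a minimal sketch of the update as I understand it (variable names and constants are illustrative, not necessarily the ones in this repo; I also assume ema_dw for the dead code is O(1), as it would be if it were initialized from the embedding itself rather than having decayed to zero):

```python
import torch

K, D, eps = 4, 2, 1e-5

# EMA statistics; code 3 has never been assigned any encoder output
ema_cluster_size = torch.tensor([10.0, 5.0, 3.0, 0.0])
ema_dw = torch.randn(K, D)  # running EMA of summed encoder outputs per code

# Laplace smoothing of the cluster sizes
n = ema_cluster_size.sum()
smoothed = (ema_cluster_size + eps) / (n + K * eps) * n

# Normalized codebook update: the dead code's smoothed count is ~eps,
# so dividing ema_dw by it produces an enormous embedding
embeddings = ema_dw / smoothed.unsqueeze(1)

print(smoothed)                # last entry is tiny (~1e-5)
print(embeddings.norm(dim=1))  # last row has a norm around 1e5
```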

Is my understanding of this issue correct, or am I missing something? If it is correct, is there a way to mitigate it and obtain a higher perplexity score?

Thanks!
