
About limitations on the task #2

Open
WuTianming opened this issue Aug 28, 2023 · 0 comments
Comments

@WuTianming
Hi, and thank you for your great work!

I was wondering whether the early-exit techniques introduced in the paper can be extended to language modeling, or whether they only apply to classification tasks. I see two main differences: (1) language modeling has a much larger answer space, with a vocabulary of tens of thousands of tokens, and (2) language models usually output a probability distribution that is then sampled from. Is the issue that the conservative predictions are not strong enough when facing such a large number of possible sampling outcomes?
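To make point (1) concrete, here is a minimal sketch of a max-softmax early-exit rule (this is just an illustration of the confidence-gap intuition, not the calibrated procedure from the paper; the logit values and threshold are made up). With the same top logit, a handful of competing labels can clear a confidence threshold, while tens of thousands of vocabulary entries dilute the top-1 probability far below it:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def should_exit(logits, threshold=0.9):
    """Exit at this layer if the top-1 probability clears the threshold."""
    return softmax(logits).max() >= threshold

# Classification head: 4 labels, one clearly favored logit.
small = np.array([4.0, 0.0, 0.0, 0.0])

# Language-model head: the same top logit, but ~50k competing tokens.
large = np.zeros(50_000)
large[0] = 4.0

print(softmax(small).max(), should_exit(small))  # high confidence, exits
print(softmax(large).max(), should_exit(large))  # mass spread over vocab, no exit
```

Here the 4-way head reaches top-1 probability of about 0.95 and exits, while the 50k-way head with an identical top logit stays near 0.001 and never triggers, which is the effect I suspect makes naive confidence-based exiting hard for language models.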

I see that you have a later work (CALM) that addresses language models by enforcing the early-exit objective during training, but I think the approach used in CATs is more desirable because it is distribution-free and model-agnostic.

Thank you for your time!
