Hi, and thank you for your great work!
I was wondering whether the early-exit techniques introduced in the paper can be extended to language modeling, or whether they apply only to classification tasks. As far as I can tell, the main differences are that (1) language modeling has a very large answer space, with a vocabulary of tens of thousands of tokens, and (2) language models output a probability distribution that is then sampled from. Is it perhaps that the conservative early-exit predictions are not confident enough when faced with such a large number of possible sampling outcomes?
I see that you have a later work (CALM) that addresses language models by enforcing the early-exit objective during training, but I think the approach used in CATs is more desirable because it is distribution-free and model-agnostic.
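For concreteness, here is a minimal sketch of the kind of per-layer confidence check I have in mind for autoregressive decoding. The function name, the top-1 probability rule, and the threshold are all my own hypothetical choices for illustration, not taken from either paper:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def should_exit_early(layer_logits, threshold=0.9):
    """Hypothetical confidence check: exit at this layer when the
    top-1 probability under the intermediate LM head dominates the
    (large) vocabulary distribution."""
    probs = softmax(layer_logits)
    return bool(probs.max() >= threshold)

# Toy example with a 50k-token vocabulary.
rng = np.random.default_rng(0)
logits = rng.normal(size=50_000)
logits[123] = 15.0  # one token clearly dominates
print(should_exit_early(logits))  # prints True
```

With near-uniform logits over 50k tokens the top-1 probability is tiny, so this check almost never fires, which is exactly the concern above: over such a large answer space, an intermediate layer may rarely be confident enough to trigger a sampling-compatible early exit.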
Thank you for your time!