Why were many models not evaluated on tasks like HellaSwag, OpenBookQA, MS MARCO, and summarization tasks like XSUM and CNN/Daily Mail? Is it because they are not suitable for these tasks?
Yes. Many scenarios in HELM Classic require log probabilities (logprobs) from the model, because they use an adapter that requires logprobs. Many recent model APIs do not provide logprobs, so those models could not be run on these scenarios.
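To illustrate why such an adapter cannot work without logprobs, here is a minimal sketch, assuming the adapter scores each multiple-choice candidate by the total log probability the model assigns to it and picks the highest. This is not HELM's actual code: `LogprobFn`, `pick_choice`, and `fake_logprobs` are hypothetical names, and the toy "model" is hard-coded.

```python
from typing import Callable, List

# Hypothetical interface: returns per-token log probabilities of
# `continuation` given `prompt`. APIs without logprobs cannot supply this.
LogprobFn = Callable[[str, str], List[float]]


def pick_choice(prompt: str, choices: List[str], logprobs: LogprobFn) -> int:
    """Return the index of the choice with the highest total log probability."""
    scores = [sum(logprobs(prompt, c)) for c in choices]
    return max(range(len(choices)), key=lambda i: scores[i])


def fake_logprobs(prompt: str, continuation: str) -> List[float]:
    # Toy stand-in for a model: every word costs -1.0, except "sandwich",
    # which this pretend model considers very likely (-0.1).
    return [-0.1 if w == "sandwich" else -1.0 for w in continuation.split()]


if __name__ == "__main__":
    prompt = "He was hungry, so he"
    choices = ["ate a sandwich", "ate a fence", "ate a cloud"]
    print(pick_choice(prompt, choices, fake_logprobs))
```

Running the script prints `0`. A model that only returns generated text gives the adapter nothing to sum, which is why it cannot be scored this way.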