Help me understand the HELM Classic Leaderboard's missing results #2994

PaulJoeMaliakel · 2024-09-16T12:25:58Z

Why were many models not evaluated on tasks like HellaSwag, OpenBookQA, and MS MARCO, or on summarization tasks like XSum and CNN/Daily Mail? Is it because these models are not suitable for these tasks?

yifanmai · 2024-09-19T03:11:38Z

Yes. Many scenarios on HELM Classic use an adapter that requires logprobs (token log probabilities) from the model. Many recent model APIs do not provide logprobs, so models served through those APIs could not be evaluated on these scenarios.
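
To illustrate why logprobs matter here, below is a minimal sketch of one common logprob-based approach: scoring each answer choice by the summed log probability the model assigns to its tokens and picking the highest. This is not HELM's actual adapter code; the function names `score_choice` and `pick_answer` and the input format are hypothetical, chosen only for illustration.

```python
from typing import Dict, Optional, Sequence


def score_choice(token_logprobs: Optional[Sequence[float]]) -> float:
    """Sum the per-token log probabilities of one candidate answer."""
    if token_logprobs is None:
        # An API that returns only text leaves nothing to score.
        raise ValueError("model API returned no logprobs; cannot score this choice")
    return sum(token_logprobs)


def pick_answer(choice_logprobs: Dict[str, Optional[Sequence[float]]]) -> str:
    """Return the choice whose tokens have the highest total log probability."""
    scores = {choice: score_choice(lps) for choice, lps in choice_logprobs.items()}
    return max(scores, key=scores.get)


# Hypothetical per-token logprobs for three answer choices.
example = {"A": [-0.2, -1.1], "B": [-0.1, -0.3], "C": [-2.0, -0.5]}
print(pick_answer(example))  # prints "B" (total -0.4 beats -1.3 and -2.5)
```

If the API returns no logprobs for a choice (modeled here as `None`), the scoring step fails, which is roughly why such models end up with missing results on these scenarios.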

yifanmai added the user question label Sep 19, 2024
