[wip] brier score #782

Closed
wants to merge 8 commits into from

Conversation

@bmosaicml (Contributor) commented Dec 6, 2023

Brier score seems to be of questionable usefulness. COPA results:

For each model, the first row below is its Brier score and the second is its accuracy. Both accuracy and Brier score go up with model size, which is not good, because a lower Brier score is better (a reference definition is sketched after the table).

| Benchmark | Metric      |    Value | Number few shot | Model                   |
|:----------|:------------|---------:|:----------------|:------------------------|
| copa      | Brier score | 0.337194 | 0-shot          | EleutherAI/gpt-neo-125m |
| copa      | Accuracy    | 0.63     | 0-shot          | EleutherAI/gpt-neo-125m |
| copa      | Brier score | 0.381412 | 0-shot          | EleutherAI/gpt-neo-1.3B |
| copa      | Accuracy    | 0.71     | 0-shot          | EleutherAI/gpt-neo-1.3B |
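
As a point of reference, here is a minimal sketch (not the implementation in this PR) of the multi-class Brier score, assuming hypothetical inputs `probs` (per-choice probabilities) and `targets` (index of the correct choice); 0 is a perfect prediction and larger is worse, which is why the upward trend above is suspicious.

```python
import torch
import torch.nn.functional as F

def multi_class_brier_score(probs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Mean squared error between predicted choice probabilities and one-hot labels.

    probs:   [batch, C] probabilities over the C answer choices (rows sum to 1).
    targets: [batch] index of the correct choice for each question.
    Returns a scalar in [0, 2]; 0.0 means perfectly confident, correct predictions.
    """
    one_hot = F.one_hot(targets, num_classes=probs.shape[-1]).to(probs.dtype)
    return ((probs - one_hot) ** 2).sum(dim=-1).mean()
```

For example, `multi_class_brier_score(torch.tensor([[0.9, 0.1]]), torch.tensor([0]))` returns 0.02 and shrinks toward 0 as the model becomes more confidently correct, so a better model should drive the score down, not up.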

@dakinggg marked this pull request as draft December 6, 2023 19:38
```python
# Select the logits at the continuation-token positions; the outputs are
# offset by one relative to the labels, hence `cont_idx - 1`.
cont_tok_logits = output_logits[batch_idx].index_select(dim=0, index=cont_idx - 1)
# labels have been shifted left by one index, so the cont_idx needs to be shifted as well.
cont_tok_targ = labels[batch_idx].index_select(dim=0, index=cont_idx - 1)
# For each continuation position, pick out the logit of its target token
# (the diagonal of the [position, target] selection) and average over positions.
mean_logit_of_targ_tok = cont_tok_logits.index_select(dim=1, index=cont_tok_targ).diagonal().mean()
```

@bmosaicml (Contributor, Author) commented:
This is the part I need some help double-checking. It computes the mean output logit of the target tokens; below that we compute exp(-mean_logit), and on line 45 we normalize across the C choices (a rough sketch of that step follows below).

@mansheej
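
A minimal sketch of the normalization step as described above, with made-up values and names (one `mean_logit` entry per answer choice) standing in for the PR's actual tensors:

```python
import torch

# Hypothetical per-choice values of mean_logit_of_targ_tok for one question
# with C = 3 answer choices.
mean_logits = torch.tensor([2.3, 1.1, 0.4])

# exp(-mean_logit) for each choice, as described in the comment above ...
unnormalized = torch.exp(-mean_logits)
# ... then normalize across the C choices so the per-choice scores sum to 1.
probs = unnormalized / unnormalized.sum()

# `probs` is what would be compared against a one-hot vector for the
# correct choice when computing the Brier score.
```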

@dakinggg (Collaborator) commented:

Feel free to reopen if you are still doing this

@dakinggg closed this May 16, 2024