[wip] brier score #782

Closed
wants to merge 8 commits into from

Conversation

@bmosaicml (Contributor) commented Dec 6, 2023

Brier score seems to be of questionable usefulness. COPA results:

For each model, the first row below is its Brier score and the second is its accuracy. Both accuracy and Brier score go up with model size, which is not good, because a lower Brier score is better (a reference definition is sketched after the table).

| Benchmark | Metric      |    Value | Number few shot | Model                   |
|:----------|:------------|---------:|:----------------|:------------------------|
| copa      | Brier score | 0.337194 | 0-shot          | EleutherAI/gpt-neo-125m |
| copa      | Accuracy    | 0.63     | 0-shot          | EleutherAI/gpt-neo-125m |
| copa      | Brier score | 0.381412 | 0-shot          | EleutherAI/gpt-neo-1.3B |
| copa      | Accuracy    | 0.71     | 0-shot          | EleutherAI/gpt-neo-1.3B |
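
As a point of reference, here is a minimal sketch (not the implementation in this PR) of the multi-class Brier score, assuming hypothetical inputs `probs` (per-choice probabilities) and `targets` (index of the correct choice); 0 is a perfect prediction and larger is worse, which is why the upward trend above is suspicious.

```python
import torch
import torch.nn.functional as F

def multi_class_brier_score(probs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Mean squared error between predicted choice probabilities and one-hot labels.

    probs:   [batch, C] probabilities over the C answer choices (rows sum to 1).
    targets: [batch] index of the correct choice for each question.
    Returns a scalar in [0, 2]; 0.0 means perfectly confident, correct predictions.
    """
    one_hot = F.one_hot(targets, num_classes=probs.shape[-1]).to(probs.dtype)
    return ((probs - one_hot) ** 2).sum(dim=-1).mean()
```

For example, `multi_class_brier_score(torch.tensor([[0.9, 0.1]]), torch.tensor([0]))` returns 0.02 and shrinks toward 0 as the model becomes more confidently correct, so a better model should drive the score down, not up.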

@dakinggg marked this pull request as draft December 6, 2023 19:38
```python
# Select the logits at the continuation-token positions; the outputs are
# offset by one relative to the labels, hence `cont_idx - 1`.
cont_tok_logits = output_logits[batch_idx].index_select(dim=0, index=cont_idx - 1)
# labels have been shifted left by one index, so the cont_idx needs to be shifted as well.
cont_tok_targ = labels[batch_idx].index_select(dim=0, index=cont_idx - 1)
# For each continuation position, pick out the logit of its target token
# (the diagonal of the [position, target] selection) and average over positions.
mean_logit_of_targ_tok = cont_tok_logits.index_select(dim=1, index=cont_tok_targ).diagonal().mean()
```

@bmosaicml (Contributor, Author) commented:
This is the part I need some help double-checking. It computes the mean output logit of the target tokens; below that we compute exp(-mean_logit), and on line 45 we normalize across the C choices (a rough sketch of that step follows below).

@mansheej
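
A minimal sketch of the normalization step as described above, with made-up values and names (one `mean_logit` entry per answer choice) standing in for the PR's actual tensors:

```python
import torch

# Hypothetical per-choice values of mean_logit_of_targ_tok for one question
# with C = 3 answer choices.
mean_logits = torch.tensor([2.3, 1.1, 0.4])

# exp(-mean_logit) for each choice, as described in the comment above ...
unnormalized = torch.exp(-mean_logits)
# ... then normalize across the C choices so the per-choice scores sum to 1.
probs = unnormalized / unnormalized.sum()

# `probs` is what would be compared against a one-hot vector for the
# correct choice when computing the Brier score.
```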

@dakinggg (Collaborator) commented:

Feel free to reopen if you are still doing this

@dakinggg closed this May 16, 2024