[WIP] Error logging callback #2533

bmosaicml · 2023-09-12T20:11:09Z

What does this PR do?

This PR adds a callback that logs in-context learning (ICL) outputs during evaluation. It modifies the custom metrics so that they keep track of incorrect model outputs. Each metric specifies the table schema used to log its cached responses and how to format those responses with the tokenizer.

The EvalOutputLogging callback then logs the cached results as a table after each evaluation.
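The division of labor described above can be illustrated with a small, dependency-free sketch. It is not the code from this PR and does not use Composer's API: `ToyTokenizer`, `ToyExactMatchMetric`, `ToyEvalOutputLogger`, and all of their methods and attributes are hypothetical names chosen for illustration. The assumed shape is that the metric caches incorrect outputs as token ids, owns the column schema, and decodes the cache with the tokenizer, while the logger only collects one table per metric at the end of evaluation.

```python
from typing import Dict, List, Sequence


class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer: maps token ids to words."""

    def __init__(self, vocab: Sequence[str]) -> None:
        self.vocab = list(vocab)

    def decode(self, token_ids: Sequence[int]) -> str:
        return " ".join(self.vocab[i] for i in token_ids)


class ToyExactMatchMetric:
    """Hypothetical metric that caches incorrect outputs as raw token ids."""

    # The metric owns the table schema for its cached responses.
    columns = ["prompt", "model_output", "expected"]

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0
        self.response_cache: List[List[List[int]]] = []

    def update(self, prompt_ids, output_ids, label_ids) -> None:
        self.total += 1
        if list(output_ids) == list(label_ids):
            self.correct += 1
        else:
            # Only incorrect outputs are cached for later logging.
            self.response_cache.append(
                [list(prompt_ids), list(output_ids), list(label_ids)]
            )

    def compute(self) -> float:
        return self.correct / self.total if self.total else 0.0

    def format_response_cache(self, tokenizer: ToyTokenizer) -> List[List[str]]:
        # The metric also decides how cached ids are turned into text.
        return [[tokenizer.decode(ids) for ids in row] for row in self.response_cache]


class ToyEvalOutputLogger:
    """Hypothetical logger: builds one table per metric after evaluation."""

    def __init__(self) -> None:
        self.tables: Dict[str, dict] = {}

    def eval_end(self, metrics: Dict[str, ToyExactMatchMetric], tokenizer: ToyTokenizer) -> None:
        for name, metric in metrics.items():
            rows = metric.format_response_cache(tokenizer)
            self.tables[name] = {"columns": list(metric.columns), "rows": rows}
            metric.response_cache.clear()  # start the next evaluation empty


tokenizer = ToyTokenizer(["what", "is", "2+2", "4", "5"])
metric = ToyExactMatchMetric()
metric.update([0, 1, 2], [3], [3])  # correct answer, not cached
metric.update([0, 1, 2], [4], [3])  # incorrect answer, cached
logger = ToyEvalOutputLogger()
logger.eval_end({"exact_match": metric}, tokenizer)
print(logger.tables["exact_match"])
```

Keeping the schema and the decoding on the metric means the logger needs no knowledge of what each metric stores; a real callback would hand the resulting columns and rows to whatever experiment logger is configured rather than keeping them in a dictionary.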

Design doc

What issue(s) does this change relate to?

Before submitting

Have you read the contributor guidelines?
Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
Did you update any related docs and document your change?
Did you update any related tests and add any new tests related to your change? (see testing)
Did you run the tests locally to make sure they pass?
Did you run pre-commit on your change? (see the pre-commit section of prerequisites)


@mvpatel2000

Add pytorch nightly and CUDA 12.1 support for composer docker images

What issue(s) does this change relate to?

Related to https://mosaicml.atlassian.net/browse/GRT-2305

Tests

docker image: mosaicml/ci-staging:72744756-794c-4390-94db-72c212dd5e00 (cuda 12.1, pytorch 2.1.0)

    mcli connect temp-test-ZAVxMh
    Python 3.10.12 (main, Jun 7 2023, 12:45:35) [GCC 9.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import torch
    >>> print(torch.version)
    <module 'torch.version' from '/usr/lib/python3/dist-packages/torch/version.py'>
    >>> print(torch.__version__)
    2.1.0.dev20230623+cu121
    >>> print(torch.version.cuda)
    12.1

Integration Test

@mvpatel2000 has validated that this trains on initial mpt-2 experiments and speeds up training by +7-8% from 0.25 MFU to 0.27 MFU

* fix autoresume with slashed directory
* Revert "fix autoresume with slashed directory" This reverts commit 3dfb5f5. revert
* fix
* fix precommit
* Update in_context_learning_evaluation.py
* Update in_context_learning_evaluation.py
* Update in_context_learning_evaluation.py
* add tests


maxisawesome · 2024-04-01T23:00:57Z

Successfully merged here

bmosaicml and others added 8 commits August 22, 2023 13:47

implement cot

55cf010

fix tests

91033fa

Merge branch 'dev' into add_cot_eval

d9ba6e2

debug print statement

3ed0ade

prelim commit

ec6fc17

fix max answer lengths for cot

a59b644

add output logger

97b1218

create eval output logger

7174e75

bmosaicml requested review from a team, eracah and dakinggg as code owners September 12, 2023 20:11

bmosaicml and others added 19 commits September 12, 2023 16:31

fix pyright; git push

fdbd53b

Merge branch 'dev' into error_logging_callback

909d07b

change dist reduce fx

9f4e3d2

Merge branch 'error_logging_callback' of github.com:bmosaicml/compose…

dce297c

…r into error_logging_callback

change dist reduce fx

ea4e7ee

fix pyright

5630c23

Merge branch 'dev' into error_logging_callback

30623f7

Add torch 2.1.0 args for github release-docker workflow

0c333b6

Log system metrics on each event (mosaicml#2412)

da4e19f

Signed-off-by: Prithvi Kannan
Co-authored-by: Evan Racah
Co-authored-by: eracah

Fix torch 2.1.0 docker tag (mosaicml#2472)

60d3dc6

Upstream Generate Callback (mosaicml#2449)

15385b2

Upstreams and generalizes the callback that logs generations to wandb from foundry to composer.

Upgrade torch nightly docker image for 0.18.3 NCCL version (mosaicml#…

ec59026

…2476) Upgrade torch docker nightly version to 08-23-23 so that we get nccl version 0.18.3 which was merged on 08-18-23.

Test pytorch 2.1.0 docker images on ci/cd (mosaicml#2469)

a5ec1ac

Test pytorch 2.1.0 docker images on ci/cd mosaicml#2469

Fix huggingface tokenizer loading for slow tokenizers (mosaicml#2483)

145aeb8

Deprecate Fused LayerNorm (mosaicml#2475)

816a61b

Will be removed in v0.18.

Transformers upgrade (mosaicml#2489)

de68763

Update RTD build config with build.os (mosaicml#2490)

c4488b5

* Update RTD build config with build.os
* Remove python.version

---------

Co-authored-by: Bandish Shah

bmosaicml and others added 26 commits December 20, 2023 15:28

Merge branch 'add_custom_stopping_criteria' into error_logging_callback

be32781

merge

e512a21

fix

a1af91a

finish

5f23b3e

finish

42fb431

fix bug

aa05076

Merge branch 'add_custom_stopping_criteria' into error_logging_callback

076731d

finish

89669c6

Merge branch 'error_logging_callback' of github.com:bmosaicml/compose…

95a7d28

…r into error_logging_callback

bug fix

dce4ef0

add keys

cb3c69d

Merge branch 'add_custom_stopping_criteria' into error_logging_callback

cea85d4

add correcT

7371e66

modify sync

c7f5198

diff split

786c64c

Merge branch 'add_custom_stopping_criteria' into error_logging_callback

559beee

fix typo

7f20954

Merge branch 'add_custom_stopping_criteria' into error_logging_callback

bd10cdd

edit condition

adf5bab

Merge branch 'dev' into error_logging_callback

4afd292

Merge branch 'dev' into error_logging_callback

b674b85

fix

059d071

fix

a674dfc

fix

e986ca3

fix

ef56d03

fix

936ebfc

This was referenced Feb 8, 2024

Output eval logging (batch level) #2977

Merged

Output eval logging batch mosaicml/llm-foundry#961

Merged

maxisawesome closed this Apr 1, 2024
