Output eval logging (batch level) (mosaicml#2977)

* prelim commit * fix max answer lengths for cot * add output logger * create eval output logger * fix pyright; git push * change dist reduce fx * change dist reduce fx * fix pyright * Add nightly docker image (mosaicml#2452) Add pytorch nightly and CUDA 12.1 support for composer docker images What issue(s) does this change relate to? Related to https://mosaicml.atlassian.net/browse/GRT-2305 Tests docker image: mosaicml/ci-staging:72744756-794c-4390-94db-72c212dd5e00 (cuda 12.1, pytorch 2.1.0) mcli connect temp-test-ZAVxMh Python 3.10.12 (main, Jun 7 2023, 12:45:35) [GCC 9.4.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import torch >>> print(torch.version) <module 'torch.version' from '/usr/lib/python3/dist-packages/torch/version.py'> >>> print(torch.__version__) 2.1.0.dev20230623+cu121 >>> print(torch.version.cuda) 12.1 Integration Test @mvpatel2000 has validated that this trains on initial mpt-2 experiments and speeds up training by +7-8% from 0.25 MFU to 0.27 MFU * Fix local eval (mosaicml#2465) * fix autoresume with slashed directory * Revert "fix autoresume with slashed directory" This reverts commit 3dfb5f5. revert * fix * fix precommit * Update in_context_learning_evaluation.py * Update in_context_learning_evaluation.py * Update in_context_learning_evaluation.py * add tests * Add torch 2.1.0 args for github release-docker workflow * Log system metrics on each event (mosaicml#2412) Signed-off-by: Prithvi Kannan <[email protected]> Co-authored-by: Evan Racah <[email protected]> Co-authored-by: eracah <[email protected]> * Fix torch 2.1.0 docker tag (mosaicml#2472) * Upstream Generate Callback (mosaicml#2449) Upstreams and generalizes the callback that logs generations to wandb from foundry to composer. * Upgrade torch nightly docker image for 0.18.3 NCCL version (mosaicml#2476) Upgrade torch docker nightly version to 08-23-23 so that we get nccl version 0.18.3 which was merged on 08-18-23. 
* Test pytorch 2.1.0 docker images on ci/cd (mosaicml#2469) Test pytorch 2.1.0 docker images on ci/cd mosaicml#2469 * Fix huggingface tokenizer loading for slow tokenizers (mosaicml#2483) * Deprecate Fused LayerNorm (mosaicml#2475) Will be removed in v0.18. * Transformers upgrade (mosaicml#2489) * Update RTD build config with build.os (mosaicml#2490) * Update RTD build config with build.os * Remove python.version --------- Co-authored-by: Bandish Shah <[email protected]> * Upgrade torch docker version and github workflow tests (mosaicml#2488) * upgrade node version (mosaicml#2492) # What does this PR do? Security vulnerability in `semver` seen due to node. This PR upgrades the node version to bump up semver from 7.5.1 to 7.5.2 # Tests Action Run: https://github.com/mosaicml/composer/actions/runs/6017539089 Correct version of semver seen after upgrade: ``` #14 [pytorch_stage 7/24] RUN npm list -g semver --depth=1 #14 2.223 /usr/lib #14 2.223 `-- [email protected] #14 2.223 `-- [email protected] #14 2.223 #14 DONE 2.4s ``` * Gating tying modules w/ FSDP for torch 2.0 (mosaicml#2467) * Gating tying modules w/ FSDP * Changing weight tying filtering to be less aggressive * precommit formatting * Removing min_params (mosaicml#2494) * Removing min_params * formatting? 
* removing overlap with another commit * Fix torchmetrics backwards compatibility issue (mosaicml#2468) * add fix * fix tests * qwf * dsfg * add key * remove short * add map test * remove comment * filter warning * simplify wrapping * checkdown * fix torchmetrics * 300 * fix tests * remove metric * cleanup * bug fixes * fix lint * fix lint * fix test * lint * remove cuda * fix tests * fix ignore * fix loading * fix test * save ckpt --------- Co-authored-by: Mihir Patel <[email protected]> Co-authored-by: Daniel King <[email protected]> Co-authored-by: Your Name <[email protected]> * Adding some fixes to FSDP tests (mosaicml#2495) * Adding some fixes to FSDP tests * Add filter warnings * fail count (mosaicml#2496) * Remove PR curve metrics from backward compatibility test and skip torch 1.13 (mosaicml#2497) * filter warning (mosaicml#2500) * bump version (mosaicml#2498) * Skip metrics in state dict (mosaicml#2501) * skip metrics in state dict * fix unit tests * Add peak memory stats (mosaicml#2504) * add peak memory stats * fix tests * fix sharded ckpt (mosaicml#2505) * Bump gitpython from 3.1.31 to 3.1.34 (mosaicml#2509) Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.31 to 3.1.34. - [Release notes](https://github.com/gitpython-developers/GitPython/releases) - [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES) - [Commits](gitpython-developers/GitPython@3.1.31...3.1.34) --- updated-dependencies: - dependency-name: gitpython dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Annotate `torch_prof_remote_file_name` as Optional (mosaicml#2512) The `torch_prof_remote_file_name` argument of `Profiler` is passed as the `remote_file_name` argument of `TorchProfiler`, which supports passing `None` to disable uploading trace files. 
Prior to this commit, passing `None` to `Profiler` to do this whilst using a static type checker led to a type error. * fix: when there is no train_metrics, do not checkpoint (mosaicml#2502) * Remove metric saving (mosaicml#2514) * no metric save * fix docs * checkdown * fix tests * filter warning * move to device * fix device gpu * Update composer/core/state.py Co-authored-by: Daniel King <[email protected]> --------- Co-authored-by: Daniel King <[email protected]> * Fix daily tests by removing gpu marker (mosaicml#2515) * Refactor mosaic_fsdp.py (mosaicml#2506) * Refactor mosaic_fsdp.py * Format file * Rename monkey patch function * Fix import path * Format files * Fix version * fix pr (mosaicml#2517) * Add custom sharding to ChunkShardingSpec (mosaicml#2507) * Refactor mosaic_fsdp.py * Format file * Rename monkey patch function * Fix import path * Format files * Fix version * Fix import path * Monkey patch ChunkShardingSpec to dynamically detect sharding dim * Format file * Add non divisible functionality to ChunkShardingSpec * Format file * Format file * Update nightly docker image to torch nightly 09-03-23 (mosaicml#2518) * Update pre-commit in setup.py (mosaicml#2522) * Add FSDP custom wrap with torch 2.1 (mosaicml#2460) * add torch2 * add code * tag more changes * Update composer/trainer/mosaic_fsdp.py Co-authored-by: Vitaliy Chiley <[email protected]> * monkeypatch init * raise pins * add print * more logs * change if statements * remove imports * remove imports * fix init * fix versioning * add hybrid shard * checkdown * revert hsdp * add peak memory stats * lint * imports * Update composer/trainer/mosaic_fsdp.py Co-authored-by: Daniel King <[email protected]> * fix wrap * fix gate * lint * test * change thresh * import typing * fix checks * nuke pyright * typo * Update composer/trainer/mosaic_fsdp.py Co-authored-by: Brian <[email protected]> * Update composer/trainer/mosaic_fsdp.py Co-authored-by: Brian <[email protected]> * Update 
composer/trainer/mosaic_fsdp_utils.py Co-authored-by: Brian <[email protected]> * resolve comments * add comments * add comments --------- Co-authored-by: Vitaliy Chiley <[email protected]> Co-authored-by: Daniel King <[email protected]> Co-authored-by: Brian <[email protected]> * Fix GCSObjectStore bug where hmac keys auth doesn't work (mosaicml#2519) * prelim commit * add output logger * create eval output logger * change dist reduce fx * Bump gitpython from 3.1.34 to 3.1.35 (mosaicml#2525) Bumps [gitpython](https://github.com/gitpython-developers/GitPython) from 3.1.34 to 3.1.35. - [Release notes](https://github.com/gitpython-developers/GitPython/releases) - [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES) - [Commits](gitpython-developers/GitPython@3.1.34...3.1.35) --- updated-dependencies: - dependency-name: gitpython dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump pytest from 7.4.0 to 7.4.2 (mosaicml#2523) Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.4.0 to 7.4.2. - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) - [Commits](pytest-dev/pytest@7.4.0...7.4.2) --- updated-dependencies: - dependency-name: pytest dependency-type: direct:development update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Upgrade to mlflow version 2.5.0 (mosaicml#2528) * disable cifar daily (mosaicml#2527) * mosaicml logger robustness improvements (mosaicml#2530) * Fix metrics keys sort in DecoupledAdamW for OptimizerMonitor FSDP metric agreggation (mosaicml#2531) * Fix github actions for GCS integration testing (mosaicml#2532) * fix github actions * make gpu test * change dist reduce fx * fix pyright * Fix GCS tests (mosaicml#2535) * add PR tests * fix test * remove pr daily * remove pr daily * finish error logging cb * fix * add import to init * add import to init * add import to init * add file writing * add file writing * add file writing * add file writing * add file writing * move tensors to cpu * remove tensors * remove tensors * remove tensors * add prompt to qa * add prompt to qa * add prompt to qa * add prompt to qa * add prompt to qa * add prompt to qa * add prompt to qa * add prompt to qa * add prompt to qa * add prompt to qa * add prompt to qa * add prompt to qa * add prompt to qa * add prompt to qa * try debugging dist sync issue * nit * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * debugging * fix syncing of non tensor state * added gpu test * fix error * finish testing callback * fix all errors * test commit * roll back test commit * remove ranks * re-tesT * add custome gen kwargs and stopping on eos token * modify test * modify test * finish * finish * finish * finish * finish pr * implement early stop * add tesT * merge * fix * finish * finish * fix bug * finish * bug fix * add keys * add correcT * modify sync * diff split * fix typo * edit condition * broken wip * design demonstration commit * simplify pr * further simplify * wip * add comments * add other icl metrics * wip * change dict method, add more stuff 
to logging * fix typos, change some comments * decode tensors, fix wrong dict key * fix mc * 1 to 0 lol * wip linting * adjust to step logging * adjust logging names * add mflow, rm batch keys * add comments, check for dict in huggingface model update_metric * add user specified logging * move metric_name duplication to update_metric * wip fix testing * fix input shape error * rm init * rm eval_after_all * step=None * step=state.timestamp.batch.value * update name to include step * linting, wip on test * fix test * pyright wip * add non-batch warning * pyright * debug * rm this commit that wasn't the right branch * log at the end of training * rm silly wandb table logging * add run_name * add docstring * add debug logging * more logging * rm info logging * improve comments * Update composer/callbacks/eval_output_logging_callback.py Co-authored-by: Evan Racah <[email protected]> * rm logging bool * fix logging for schema tasks * fix schema / mc tasks * yapf * rm reshape * fix tests * cleanup test * pyright * pyright * docstring * pyright * update tests * rm attention mask requirement * Update composer/metrics/nlp.py Co-authored-by: Mihir Patel <[email protected]> * Update composer/metrics/nlp.py Co-authored-by: Mihir Patel <[email protected]> * rm todo * lint * lint * lint * more lint --------- Signed-off-by: Prithvi Kannan <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Jeremy Dohmann <[email protected]> Co-authored-by: Jeremy D <[email protected]> Co-authored-by: Charles Tang <[email protected]> Co-authored-by: Rishab Parthasarathy <[email protected]> Co-authored-by: Prithvi Kannan <[email protected]> Co-authored-by: Evan Racah <[email protected]> Co-authored-by: eracah <[email protected]> Co-authored-by: Irene Dea <[email protected]> Co-authored-by: Daniel King <[email protected]> Co-authored-by: nik-mosaic <[email protected]> Co-authored-by: bandish-shah <[email protected]> Co-authored-by: Bandish Shah <[email protected]> 
Co-authored-by: bcui19 <[email protected]>
Co-authored-by: Mihir Patel <[email protected]>
Co-authored-by: Your Name <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Scott Stevenson <[email protected]>
Co-authored-by: furkanbiten <[email protected]>
Co-authored-by: Brian <[email protected]>
Co-authored-by: Vitaliy Chiley <[email protected]>
Co-authored-by: Nicholas Garcia <[email protected]>
Co-authored-by: Mikhail Kolesov <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Tessa Barton <[email protected]>
mvpatel2000 · Mar 9, 2024 · 594eaef
1 parent c5869d2
Showing 11 changed files with 533 additions and 11 deletions.
diff --git a/composer/callbacks/__init__.py b/composer/callbacks/__init__.py
@@ -9,6 +9,7 @@
 from composer.callbacks.activation_monitor import ActivationMonitor
 from composer.callbacks.checkpoint_saver import CheckpointSaver
 from composer.callbacks.early_stopper import EarlyStopper
+from composer.callbacks.eval_output_logging_callback import EvalOutputLogging
 from composer.callbacks.export_for_inference import ExportForInferenceCallback
 from composer.callbacks.free_outputs import FreeOutputs
 from composer.callbacks.generate import Generate
@@ -35,6 +36,7 @@
     'CheckpointSaver',
     'MLPerfCallback',
     'EarlyStopper',
+    'EvalOutputLogging',
     'ExportForInferenceCallback',
     'ThresholdStopper',
     'ImageVisualizer',

diff --git a/composer/callbacks/eval_output_logging_callback.py b/composer/callbacks/eval_output_logging_callback.py
@@ -0,0 +1,115 @@
+# Copyright 2022 MosaicML Composer authors
+# SPDX-License-Identifier: Apache-2.0
+
+"""Log model outputs and expected outputs during ICL evaluation."""
+
+import warnings
+from copy import deepcopy
+from typing import Any, Dict, List, Sequence, Union
+
+import torch
+
+from composer.core import Callback, State
+from composer.loggers import ConsoleLogger, Logger
+from composer.utils.dist import all_gather_object
+
+
+class EvalOutputLogging(Callback):
+    """Logs eval outputs for each sample of each ICL evaluation dataset.
+
+    ICL metrics are required to support caching the model's responses, including information on whether the model was correct.
+    Metrics are responsible for returning the results of individual datapoints in a dictionary of lists.
+    The callback will log the metric name, the depadded and detokenized input, any data stored in state.metric_outputs, and
+    any keys from the batch passed into `batch_keys_to_log`. Rows are collected after every eval batch and logged at eval end.
+    """
+
+    def __init__(self, log_tokens: bool = False, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.log_tokens = log_tokens
+        self.columns = None
+        self.name = None
+        self.rows = []
+
+    def eval_batch_end(self, state: State, logger: Logger) -> None:
+        if not isinstance(state.batch, Dict):
+            warnings.warn(
+                'EvalOutputLogging only supports batches that are dictionaries. '
+                f'Found batch of type {type(state.batch)}. '
+                'Not logging eval outputs.',
+            )
+            return
+
+        assert state.outputs is not None
+        assert state.metric_outputs is not None
+        logging_dict: Dict[str, Union[List[Any], torch.Tensor, Sequence[torch.Tensor]]] = deepcopy(state.metric_outputs)
+
+        # If batch mode is not generate, outputs will be logits
+        if state.batch['mode'] == 'generate':
+            # Outputs are already detokenized
+            logging_dict['outputs'] = state.outputs
+
+        input_ids = state.batch['input_ids']
+        logged_input = []
+        assert state.dataloader is not None
+
+        # Depad and decode input_ids
+        for input_list in input_ids.tolist():
+            dataset = state.dataloader.dataset  # pyright: ignore[reportGeneralTypeIssues]
+            depadded_input = [tok for tok in input_list if tok != dataset.pad_tok_id]
+            logged_input.append(dataset.tokenizer.decode(depadded_input))
+        logging_dict['input'] = logged_input
+
+        # Log token indices if toggled
+        if self.log_tokens:
+            logging_dict['input_tokens'] = input_ids.tolist()
+            if state.batch['mode'] != 'generate':
+                if isinstance(state.outputs, torch.Tensor):  # pyright
+                    logging_dict['label_tokens'] = state.outputs.tolist()
+
+        # Add run_name as a column
+        run_name_list = [state.run_name for _ in range(0, len(logging_dict['input']))]
+        logging_dict['run_name'] = run_name_list
+
+        # NOTE: This assumes _any_ tensor logged are tokens to be decoded.
+        #       This might not be true if, for example, logits are logged.
+
+        # Detokenize data in rows
+        for key, value in logging_dict.items():
+            # All types in list are the same
+            if isinstance(value[0], torch.Tensor):
+                logging_dict[key] = [
+                    state.dataloader.dataset.tokenizer.decode(t)  # pyright: ignore[reportGeneralTypeIssues]
+                    for t in value
+                ]
+            elif isinstance(value[0], list):
+                if isinstance(value[0][0], torch.Tensor):
+                    tokenizer = state.dataloader.dataset.tokenizer  # pyright: ignore[reportGeneralTypeIssues]
+                    logging_dict[key] = [[tokenizer.decode(choice) for choice in t] for t in value]
+
+        # Convert logging_dict from kv pairs of column name and column values to a list of rows
+        # Example:
+        # logging_dict = {"a": ["1a", "2a"], "b": ["1b", "2b"]}
+        # will become
+        # columns = ["a", "b"], rows = [["1a", "1b"], ["2a", "2b"]]
+        columns = list(logging_dict.keys())
+        rows = [list(item) for item in zip(*logging_dict.values())]
+
+        assert state.dataloader_label is not None
+        if not self.name:
+            # If only running eval, step will be 0
+            # If running training, step will be current training step
+            step = state.timestamp.batch.value
+            self.name = f'{state.dataloader_label}_step_{step}'
+            self.columns = columns
+        self.rows.extend(rows)
+
+    def eval_end(self, state: State, logger: Logger) -> None:
+        list_of_rows = all_gather_object(self.rows)
+        rows = [row for rows in list_of_rows for row in rows]
+        for dest_logger in logger.destinations:
+            if not isinstance(dest_logger, ConsoleLogger):
+                dest_logger.log_table(self.columns, rows, name=self.name, step=state.timestamp.batch.value)
+
+        self.rows = []
+        self.name = None
+        self.columns = None
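
The data reshaping done by the callback above can be sketched in isolation, without Composer or torch. This is a minimal illustration, not the callback itself: `ToyTokenizer`, `PAD_TOK_ID`, and the helper function names are hypothetical stand-ins for `dataset.tokenizer`, `dataset.pad_tok_id`, and the inline logic in `eval_batch_end` / `eval_end`, and the list passed to `flatten_gathered` only imitates what `all_gather_object` would return across ranks.

```python
from typing import Any, Dict, List, Tuple

PAD_TOK_ID = 0  # hypothetical pad id; the callback reads dataset.pad_tok_id


class ToyTokenizer:
    """Hypothetical stand-in for dataset.tokenizer: maps token ids to words."""

    vocab = {1: "the", 2: "cat", 3: "sat", 4: "dog"}

    def decode(self, ids: List[int]) -> str:
        return " ".join(self.vocab[i] for i in ids)


def depad_and_decode(input_ids: List[List[int]], tokenizer: ToyTokenizer, pad_id: int) -> List[str]:
    # Drop pad tokens from each sample, then decode the remainder to text.
    return [tokenizer.decode([tok for tok in row if tok != pad_id]) for row in input_ids]


def columns_to_rows(logging_dict: Dict[str, List[Any]]) -> Tuple[List[str], List[List[Any]]]:
    # Transpose {column name: column values} into (column names, list of rows).
    columns = list(logging_dict.keys())
    rows = [list(item) for item in zip(*logging_dict.values())]
    return columns, rows


def flatten_gathered(list_of_rows: List[List[List[Any]]]) -> List[List[Any]]:
    # Merge the per-rank row lists into one flat list of rows.
    return [row for rows in list_of_rows for row in rows]


input_ids = [[1, 2, 3, 0, 0], [1, 4, 0, 0, 0]]
logging_dict: Dict[str, List[Any]] = {"correct": [True, False]}
logging_dict["input"] = depad_and_decode(input_ids, ToyTokenizer(), PAD_TOK_ID)
logging_dict["run_name"] = ["demo-run"] * len(logging_dict["input"])

columns, rows = columns_to_rows(logging_dict)
gathered = flatten_gathered([rows, rows])  # pretend two ranks produced the same rows
```

One property worth keeping in mind: `zip` stops at the shortest column, so a metric that returns fewer entries than the batch size would silently drop rows rather than raise an error.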
diff --git a/composer/core/state.py b/composer/core/state.py
@@ -549,6 +549,8 @@ def __init__(
         self.eval_metric_values: Dict[str, float] = {}
         self.total_loss_dict: Dict[str, float] = {}
 
+        self.metric_outputs: Dict[str, Any] = {}
+
     def _dataset_of(self, dataloader: Optional[Union[Evaluator, DataSpec, DataLoader, Iterable]]) -> Optional[Dataset]:
         """Get the dataset contained by the given dataloader-like object.
 

diff --git a/composer/loggers/in_memory_logger.py b/composer/loggers/in_memory_logger.py
@@ -87,8 +87,11 @@ def log_table(
                 conda_package='pandas',
                 conda_channel='conda-forge',
             ) from e
-        table = pd.DataFrame.from_records(data=rows, columns=columns).to_json(orient='split', index=False)
-        assert isinstance(table, str)
+        table = pd.DataFrame.from_records(data=rows,
+                                          columns=columns).to_json(orient='split', index=False, force_ascii=False)
+        assert table is not None
+        # Merged assert is different
+        # assert isinstance(table, str)
         self.tables[name] = table
 
     def log_metrics(self, metrics: Dict[str, Any], step: Optional[int] = None) -> None:
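
The `force_ascii=False` change above matters once decoded model outputs contain non-ASCII text. The sketch below does not call pandas; it uses the standard library's `json.dumps` and its `ensure_ascii` flag as an analogy, on the assumption that the two options behave alike for this purpose (escaping non-ASCII characters as `\uXXXX` when enabled). The `table` dict is a hypothetical example shaped like a `split`-oriented table.

```python
import json

# Hypothetical table in a "split"-like shape: column names plus row data.
table = {"columns": ["input", "outputs"], "data": [["¿Cuál es la capital?", "Bogotá"]]}

# With ASCII forced, non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(table, ensure_ascii=True)

# Without it, the characters are kept as-is, which is easier to read in logs.
readable = json.dumps(table, ensure_ascii=False)

# Both strings decode back to the same data; only the serialized form differs.
round_trip_equal = json.loads(escaped) == json.loads(readable) == table
```

Either form parses back to identical data, so the change affects readability of the stored string rather than its content.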

diff --git a/composer/loggers/wandb_logger.py b/composer/loggers/wandb_logger.py
@@ -112,6 +112,8 @@ def __init__(
         self.run_dir: Optional[str] = None
         self.run_url: Optional[str] = None
 
+        self.table_dict = {}
+
     def _set_is_in_atexit(self):
         self._is_in_atexit = True
 
@@ -130,7 +132,7 @@ def log_table(
         if self._enabled:
             import wandb
             table = wandb.Table(columns=columns, rows=rows)
-            wandb.log({name: table}, step)
+            wandb.log({name: table}, step=step)
 
     def log_metrics(self, metrics: Dict[str, Any], step: Optional[int] = None) -> None:
         if self._enabled: