Releases: mosaicml/composer

v0.15.0

20 Jun 19:04
9983bcd

🚀 Composer v0.15.0

What's New

  1. Exact Eval (#2218)

    Composer now supports exact evaluation: results are identical regardless of the number of GPUs, because duplicated samples are removed from the dataloader.
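
To see why duplicated samples can skew a metric, consider a sampler that pads the dataset so every rank receives the same number of samples. The sketch below is a pure-Python simulation under that assumption, not Composer's implementation; `shard_with_padding`, `naive_mean`, and `exact_mean` are hypothetical helpers introduced only for illustration.

```python
def shard_with_padding(num_samples, world_size):
    """Assign sample indices to ranks round-robin, padding by wrapping
    around so every rank receives the same number of samples."""
    per_rank = -(-num_samples // world_size)  # ceiling division
    total = per_rank * world_size
    padded = [i % num_samples for i in range(total)]
    return [padded[rank::world_size] for rank in range(world_size)]


def naive_mean(values, shards):
    """Average over every index each rank saw, duplicates included."""
    seen = [values[i] for shard in shards for i in shard]
    return sum(seen) / len(seen)


def exact_mean(values, shards):
    """Average after dropping indices that were only added as padding."""
    unique = sorted({i for shard in shards for i in shard})
    return sum(values[i] for i in unique) / len(unique)


values = [1.0, 0.0, 0.0, 0.0, 0.0]  # e.g. per-sample correctness
for world_size in (1, 2, 4):
    shards = shard_with_padding(len(values), world_size)
    print(world_size, naive_mean(values, shards), exact_mean(values, shards))
```

In this toy setup the naive average changes with the number of ranks because sample 0 is counted more than once, while the de-duplicated average stays the same.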

  2. Monolithic Checkpoint Loading (#2288)

    When training large models, loading the model and optimizer on every rank can use up all the system memory. With FSDP, Composer can now load the model and optimizer on only rank 0 and broadcast it to all other ranks. To enable:

    from composer import Trainer
    
    # Construct Trainer
    trainer = Trainer(
       ...,
       fsdp_config={
          'load_monolith_rank0_only': True,
       },
    )
    
    # Train!
    trainer.fit()

    Also ensure that the model on rank 0 is initialized on CPU or GPU (as opposed to the meta device).

  3. Spin Dataloaders

    By default, Composer spins dataloaders back to the current timestamp to ensure deterministic resumption. However, dataloader spinning can be very slow, so Trainer now has a flag to disable it when determinism is not required. To disable spinning:

    from composer import Trainer
    
    # Construct Trainer
    trainer = Trainer(
       ...,
       spin_dataloaders=False,
    )
    
    # Train!
    trainer.fit()
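
As a rough illustration of what "spinning" means, the toy model below replays and discards the batches that were consumed before a checkpoint, so the resumed run sees the same next batch as an uninterrupted one. This is a simplified sketch, not Composer's code; `batches` and `resume` are hypothetical helpers, and the behavior shown for `spin=False` is an assumption of this toy model only.

```python
import random


def batches(seed, num_samples, batch_size):
    """Yield shuffled batches; the order depends only on the seed."""
    rng = random.Random(seed)
    order = list(range(num_samples))
    rng.shuffle(order)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]


def resume(seed, num_samples, batch_size, batches_seen, spin):
    """Return the first batch produced after resuming a run."""
    loader = batches(seed, num_samples, batch_size)
    if spin:
        # "Spin" forward: replay and discard batches consumed before the checkpoint.
        for _ in range(batches_seen):
            next(loader)
    return next(loader)


uninterrupted = list(batches(seed=0, num_samples=8, batch_size=2))
# With spinning, the resumed run picks up exactly where the original left off.
assert resume(0, 8, 2, batches_seen=2, spin=True) == uninterrupted[2]
# Without spinning, this toy loader restarts from the first batch of the epoch.
assert resume(0, 8, 2, batches_seen=2, spin=False) == uninterrupted[0]
```

The replay loop is where the cost comes from: every skipped batch still has to be produced before training can continue.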

Deprecations

  • HealthChecker is now deprecated and will be removed in v0.17.0

Bug Fixes

What's Changed

New Contributors


v0.14.1

05 May 05:46
7da93f8

Bug Fixes

Fixes a bug related to SentencePiece tokenizers and ICL eval.

What's Changed

Full Changelog: v0.14.0...v0.14.1

v0.14.0

03 May 15:42
5ba2a60

🚀 Composer v0.14.0

Composer v0.14.0 is released! Install via pip:

pip install composer==0.14.0

The legacy package name still works via pip:

pip install mosaicml==0.14.0

New Features

  1. 🆕 PyTorch 2.0 Support (#2172)

    We're thrilled to announce official support for PyTorch 2.0! All initial unit tests are passing, and we have run through our examples. We've also made some updates to start taking advantage of the new features.

    Initial support also includes:

    • Support for torch.compile

      Model         Dataset   Without compile (throughput/samples_per_sec)  With compile (throughput/samples_per_sec)  Performance %
      ResNet50      ImageNet  5557                                          7424                                       33.60%
      DeepLab V3    ADE20K    81.60                                         98.82                                      21.10%
      HF BERT       C4        3360                                          4259                                       26.75%
      HF Causal LM  C4        50.61                                         103.29                                     100.05%

      To start using it, add the compile_config argument to the Trainer:

        # To use default `torch.compile` config
        trainer = Trainer(
           ...,
           compile_config={},
        )
      
        # To use custom `torch.compile` config, provide an argument as a dictionary, for example:
        trainer = Trainer(
           ...,
           compile_config={'mode': 'reduce-overhead'},
        )
        

      The Trainer also supports pre-compiled models passed via the model argument. If the model has been pre-compiled, the compile_config argument is ignored.

      Note: We recommend baselining your model with and without torch.compile, as there are scenarios where enabling compile yields no throughput improvement and some where it causes a regression.
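
One way to baseline is to time the same training step with and without compilation and compare throughput. The sketch below is framework-agnostic and illustrative only: `samples_per_sec` and `improvement_pct` are hypothetical helpers, and `step_fn` stands in for whatever callable runs one training step in your setup. The relative-change formula reproduces the ResNet50 row of the table above.

```python
import time


def samples_per_sec(step_fn, batch_size, num_steps=20, warmup=3):
    """Time `step_fn` and return throughput, skipping warmup steps
    (the first compiled steps typically include compilation time)."""
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return batch_size * num_steps / elapsed


def improvement_pct(baseline, compiled):
    """Relative throughput change in percent; negative means a regression."""
    return (compiled - baseline) / baseline * 100


# ResNet50 row from the table above: 5557 -> 7424 samples/sec.
print(round(improvement_pct(5557, 7424), 2))  # 33.6
```

A negative value from `improvement_pct` would indicate that compilation slowed the model down, which is the regression case the note warns about.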

    • PyTorch 2.0 Docker Images

      We've added the following new official MosaicML Docker Images with PyTorch 2.0 support:

      Linux Distro  Flavor  PyTorch Version  CUDA Version         Python Version  Docker Tags
      Ubuntu 20.04  Base    2.0.0            11.7.1 (Infiniband)  3.10            mosaicml/pytorch:2.0.0_cu117-python3.10-ubuntu20.04
      Ubuntu 20.04  Base    2.0.0            11.7.1 (EFA)         3.10            mosaicml/pytorch:2.0.0_cu117-python3.10-ubuntu20.04-aws
      Ubuntu 20.04  Base    2.0.0            cpu                  3.10            mosaicml/pytorch:2.0.0_cpu-python3.10-ubuntu20.04
      Ubuntu 20.04  Vision  2.0.0            11.7.1 (Infiniband)  3.10            mosaicml/pytorch_vision:2.0.0_cu117-python3.10-ubuntu20.04
      Ubuntu 20.04  Vision  2.0.0            cpu                  3.10            mosaicml/pytorch_vision:2.0.0_cpu-python3.10-ubuntu20.04

  2. 🦾 New Callbacks

    • Activation monitor (#2066)

      Monitors activations in the network. Every interval batches, it attaches a forward hook and logs the max, average, L2 norm, and kurtosis of the input and output activations. To enable:

      from composer import Trainer
      from composer.callbacks import ActivationMonitor
      
      # Construct Trainer
      trainer = Trainer(
         ...,
         callbacks=[ActivationMonitor()],
      )
      
      # Train!
      trainer.fit()
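
For reference, the four statistics named above can be computed as follows. This is a plain-Python sketch using textbook definitions; the exact formulas Composer uses (for example, which kurtosis variant) are not stated in these notes, and `activation_stats` is a hypothetical helper.

```python
import math


def activation_stats(values):
    """Summarize a flat list of activation values."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    fourth_moment = sum((v - mean) ** 4 for v in values) / n
    return {
        "max": max(values),
        "average": mean,
        "l2_norm": math.sqrt(sum(v * v for v in values)),
        # Non-excess kurtosis (3.0 for a normal distribution); undefined if variance is 0.
        "kurtosis": fourth_moment / variance ** 2 if variance > 0 else float("nan"),
    }


stats = activation_stats([3.0, 4.0])
print(stats["l2_norm"])  # 5.0
```

A sudden jump in max or kurtosis between logging intervals is the kind of signal these statistics are typically used to spot.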
    • Slack Logger (#2133)

      You can now send custom training metrics to Slack! To enable:

      from composer import Trainer
      from composer.loggers import SlackLogger
      
      # Construct Trainer
      trainer = Trainer(
         ...,
         loggers=[
             SlackLogger(
                 log_interval="10ba", # or 1ep, 2ep 
                 include_keys=["algorithm_traces*", "loss*"],
                 formatter_func=(lambda data, **kwargs:
                    [
                        {
                            "type": "section", "text": {"type": "mrkdwn", "text": f"*{k}:* {v}"}
                        }
                        for k, v in data.items()
                    ])
             )
         ],
      )
      
      trainer.fit()

      Please see PR #2133 for additional details.
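
The formatter in the example above can also be written as a named function and exercised on its own. In the sketch below, `format_as_blocks` mirrors the lambda from the example, while `select_keys` is a hypothetical helper that assumes the include_keys patterns behave like shell-style globs; neither is part of Composer's API.

```python
from fnmatch import fnmatch


def select_keys(data, patterns):
    """Keep entries whose key matches any glob pattern (assumed semantics)."""
    return {k: v for k, v in data.items() if any(fnmatch(k, p) for p in patterns)}


def format_as_blocks(data, **kwargs):
    """Build one Slack "section" block per metric, mirroring the lambda above."""
    return [
        {"type": "section", "text": {"type": "mrkdwn", "text": f"*{k}:* {v}"}}
        for k, v in data.items()
    ]


metrics = {"loss/train/total": 0.25, "trainer/epoch": 1}
blocks = format_as_blocks(select_keys(metrics, ["algorithm_traces*", "loss*"]))
print(blocks)
```

Testing the formatter in isolation like this makes it easier to check the message layout before wiring it into a training run.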

API changes

  • The grad_accum argument has been removed from Trainer; users must now use device_train_microbatch_size instead (#2040)
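
When migrating, an equivalent microbatch size can be derived from the old setting. The sketch below assumes the global batch is split evenly across devices and that each device batch was previously divided into grad_accum chunks; `microbatch_size_from_grad_accum` is a hypothetical helper, not a Composer function.

```python
import math


def microbatch_size_from_grad_accum(global_batch_size, world_size, grad_accum):
    """Per-device microbatch size equivalent to an old `grad_accum` setting."""
    if global_batch_size % world_size != 0:
        raise ValueError("global batch size must divide evenly across devices")
    device_batch_size = global_batch_size // world_size
    # Split the device batch into `grad_accum` chunks; round up so that
    # no more than `grad_accum` microbatches are needed.
    return math.ceil(device_batch_size / grad_accum)


# Hypothetical run: global batch 2048 on 8 GPUs, previously grad_accum=4.
print(microbatch_size_from_grad_accum(2048, 8, 4))  # 64
```

The resulting value is what would be passed as device_train_microbatch_size in place of the removed argument.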

Deprecations

  • We no longer support PyTorch 1.11 and 1.12 due to security vulnerabilities. New features will not be tested against these versions.

Bug Fixes

  • Eval subset num batches bug fix (#2028)
  • Protect for missing slack_sdk import (#2031)
  • Adjust HuggingFaceModel token embedding resizing to only occur when necessary (#2027)
  • Update FSDP meta weight tying tests to include precision testing (#2050)
  • Backward Compat with Torchmetrics (#2046)
  • Busy wait for local rank 0 download to avoid timeout on large file download (#2054)
  • Fix OCIObjectStore save_overwrite=False bug (#2053)
  • Busy wait so that non local rank zeros don't timeout while local rank zero downloads a monolithic checkpoint (#2071)
  • Skip extra downloads when not using a format string (#2073)
  • fix name_or_path usage in HF save/load usage (#2075)
  • Fix EMA resumption issue with calling trainer.eval() before trainer.fit() (#2088)
  • Patch EMA with FSDP (#2091)
  • Updating gradient clipping to be torch 2.0 compatible (#2089)
  • Adding checks for weight tying s.t. we don't think None attributes are weight tied (#2103)
  • gate the extra forward call specifically for fsdp (#2102)
  • Allow user to set ONNX opset version when Exporting for Inference (#2101)
  • Runtime estimator (#2124)
  • Use state_dict Torchmetrics Serialization (#2116)
  • Fix filelock in checkpoint download (#2184)

What's Changed


v0.13.5

24 Apr 20:54

Full Changelog: v0.13.4...v0.13.5

  • Add support for EMA + FSDP

v0.13.4

05 Apr 02:55

Full Changelog: v0.13.3...v0.13.4

Bumps streaming version pin to <1.0

v0.13.3

04 Apr 20:35

🚀 Composer v0.13.3

Introducing the composer PyPi package!

Composer v0.13.3 is released!

Composer can also now be installed using the new composer PyPi package via pip:

pip install composer==0.13.3

The legacy package name still works via pip:

pip install mosaicml==0.13.3

Bug Fixes

What's Changed

Full Changelog: v0.13.2...v0.13.3

v0.13.2

31 Mar 23:45

🚀 Composer v0.13.2

Introducing the composer PyPi package!

Composer v0.13.2 is released!

Composer can also now be installed using the new composer PyPi package via pip:

pip install composer==0.13.2

The legacy package name still works via pip:

pip install mosaicml==0.13.2

Bug Fixes

  • test and fix composer package name usage in composer_collect_env (#2049)
  • Backward Compat with Torchmetrics by @mvpatel2000 (#2046)
  • Fix OCIObjectStore save_overwrite=False bug (#2053)
  • busy wait for the rank 0 download (#2071)
  • Skip extra downloads when not using a format string (#2073)

What's Changed

Full Changelog: v0.13.1...v0.13.2

v0.13.1

07 Mar 03:11

🚀 Composer v0.13.1

Introducing the composer PyPi package!

Composer v0.13.1 is released!

Composer can also now be installed using the new composer PyPi package via pip:

pip install composer==0.13.1

The legacy package name still works via pip:

pip install mosaicml==0.13.1

Note: The mosaicml==0.13.0 PyPi package was yanked due to some minor packaging issues discovered after release. The package was re-released as Composer v0.13.1, thus these release notes contain details for both v0.13.0 and v0.13.1.

New Features

  1. 🤙 New and Updated Callbacks

    • New HealthChecker Callback (#2002)

      The callback will log a warning if the GPUs on a given node appear to be in poor health (low utilization). The callback can also be configured to send a Slack message!

      from composer import Trainer
      from composer.callbacks import HealthChecker
      
      # Warn if GPU utilization difference drops below 10%
      health_checker = HealthChecker(
          threshold = 10
      )
      
      # Construct Trainer
      trainer = Trainer(
          ...,
          callbacks=health_checker,
      )
      
      # Train!
      trainer.fit()
    • Updated MemoryMonitor to use GigaBytes (GB) units (#1940)

    • New RuntimeEstimator Callback (#1991)

      Estimate the remaining runtime of your job! Approximates the time remaining by observing the throughput and comparing to the number of batches remaining.

      from composer import Trainer
      from composer.callbacks import RuntimeEstimator
      
      # Construct trainer with RuntimeEstimator callback
      trainer = Trainer(
          ...,
          callbacks=RuntimeEstimator(),
      )
      
      # Train!
      trainer.fit()
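
The underlying estimate is simple arithmetic: observe how long each batch has taken so far and multiply by the number of batches left. The sketch below illustrates that idea only; `estimate_remaining_seconds` is a hypothetical helper, and the callback's actual smoothing and bookkeeping may differ.

```python
def estimate_remaining_seconds(batches_done, elapsed_seconds, total_batches):
    """Estimate time remaining from observed throughput."""
    if batches_done == 0:
        return None  # no throughput observed yet
    seconds_per_batch = elapsed_seconds / batches_done
    return (total_batches - batches_done) * seconds_per_batch


# 250 of 1000 batches finished after 500 seconds, so 750 batches remain.
print(estimate_remaining_seconds(250, 500.0, 1000))  # 1500.0
```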
    • Updated SpeedMonitor throughput metrics (#1987)

      Expands throughput metrics to track relative to several different time units and per device:

      • throughput/batches_per_sec and throughput/device/batches_per_sec
      • throughput/tokens_per_sec and throughput/device/tokens_per_sec
      • throughput/flops_per_sec and throughput/device/flops_per_sec
      • throughput/device/samples_per_sec

      Also adds the throughput/device/mfu metric to compute per-device MFU. Enable the SpeedMonitor callback as usual to log these new metrics. Please see the SpeedMonitor documentation for more information.
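
To make the relationship between the overall and per-device metrics concrete, here is a small sketch. It assumes per-device throughput is the overall figure divided by the number of devices and that MFU is achieved FLOPs/sec on one device divided by that device's peak; the helper names and all numbers are hypothetical.

```python
def device_throughput(samples, elapsed_seconds, world_size):
    """Return overall and per-device samples/sec."""
    total = samples / elapsed_seconds
    return {
        "throughput/samples_per_sec": total,
        "throughput/device/samples_per_sec": total / world_size,
    }


def model_flops_utilization(device_flops_per_sec, device_peak_flops_per_sec):
    """MFU: achieved FLOPs/sec on one device as a fraction of its peak."""
    return device_flops_per_sec / device_peak_flops_per_sec


# Hypothetical numbers: 8 GPUs process 4096 samples in 2 seconds, and each GPU
# sustains 156e12 FLOPs/sec against an assumed peak of 312e12.
print(device_throughput(4096, 2.0, 8))
print(model_flops_utilization(156e12, 312e12))  # 0.5
```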

  2. ⣿ FSDP Sharded Checkpoints (#1902)

    Users can now specify the state_dict_type in the fsdp_config dictionary to enable sharded checkpoints. For example:

    from composer import Trainer
    
    fsdp_config = {
        'sharding_strategy': 'FULL_SHARD',
        'state_dict_type': 'local',
    }
    
    trainer = Trainer(
        ...,
        fsdp_config=fsdp_config,
        save_folder='checkpoints',
        save_filename='ba{batch}_rank{rank}.pt',
        save_interval='10ba',
    )

    Please see the PyTorch FSDP docs and Composer's Distributed Training notes for more information.
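
The {rank} placeholder in save_filename matters here because each rank writes its own shard. The snippet below uses plain `str.format` purely to illustrate how the template expands; Composer resolves these placeholders itself, and the batch and rank values are made up.

```python
save_filename = 'ba{batch}_rank{rank}.pt'

# With sharded state dicts each rank writes its own file, so the name needs
# a per-rank placeholder to keep the files from colliding.
for rank in range(4):
    print(save_filename.format(batch=10, rank=rank))
```

Without a per-rank component in the name, every rank would resolve to the same filename for a given batch.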

  3. 🤗 HuggingFace Improvements

    • Update HuggingFaceModel class to support encoder-decoder batches without decoder_input_ids (#1950)
    • Allow evaluation metrics to be passed to HuggingFaceModel directly (#1971)
    • Add a utility function to load a Composer checkpoint of a HuggingFaceModel and write out the expected config.json and pytorch_model.bin in the HuggingFace pretrained folder (#1974)
  4. 🛟 Nvidia H100 Alpha Support - Added amp_fp8 data type

    In preparation for H100's arrival, we've added the amp_fp8 precision type. Currently setting amp_fp8 specifies a new precision context using transformer_engine.pytorch.fp8_autocast. For more details, please see Nvidia's new Transformer Engine and the specific fp8 recipe we utilize.

    from composer import Trainer
    
    trainer = Trainer(
        ...,
        precision='amp_fp8',
    )

API changes

  • The torchmetrics package has been upgraded to 0.11.x.

    The torchmetrics.Accuracy metric now requires a task argument which can take on a value of binary, multiclass or multilabel. Please see Torchmetrics Accuracy docs for details.

    Additionally, since specifying task='multiclass' requires an additional num_classes field, we've updated ComposerClassifier to accept a num_classes argument. Please see PRs #2017 and #2025 for additional details.

  • Surgery algorithms used in functional form return a value of None (#1543)

Deprecations

  • Deprecate HFCrossEntropy and Perplexity (#1857)
  • Remove Jenkins CI (#1943, #1954)
  • Change Deprecation Warnings to Warnings for specifying ProgressBarLogger and ConsoleLogger to loggers (#1846)

Bug Fixes

  • Fixed an issue introduced in 0.12.1 where HuggingFaceModel crashes if config.return_dict = False (#1948)
  • Refactor EMA to improve memory efficiency (#1941)
  • Make wandb checkpoint logging compatible with wandb model registry (#1973)
  • Fix ICL race conditions (#1978)
  • Update epoch metric name to trainer/epoch (#1986)
  • reset scaler (#1999)
  • Bug/sync optimization logger across ranks (#1970)
  • Update Docker images to resolve vulnerability scan issues (#2007)
  • Fix eval duplicate logging issue (#2018)
  • extend test and patch bug (#2028)
  • Protect for missing slack_sdk import (#2031)

Known Issues

  • Docker Image Security Vulnerability
    • CVE-2022-45907: The mosaicml/pytorch:1.12.1*, mosaicml/pytorch:1.11.0*, mosaicml/pytorch_vision:1.12.1* and mosaicml/pytorch_vision:1.11.0* images are impacted and currently supported for legacy use cases. We recommend users upgrade to images with PyTorch >1.13. The affected images will be removed in the next Composer release.

What's Changed


v0.13.0

07 Mar 03:10
3618c63

This release has been yanked due to a minor packaging issue; please skip directly to Composer v0.13.1.

What's Changed

New Contributors

Full Changelog: v0.12.1...v0.13.0

v0.12.1

05 Feb 09:19

🚀 Composer v0.12.1

Composer v0.12.1 is released! Install via pip:

pip install --upgrade mosaicml==0.12.1

New Features

  1. 📚 In-Context Learning (#1876)

    With Composer and MosaicML Cloud you can now evaluate LLMs on in-context learning tasks (LAMBADA, HellaSwag, PIQA, and more) hundreds of times faster than other evaluation harnesses. Please see our "Blazingly Fast LLM Evaluation for In-Context Learning" blog post for more details!

  2. 💾 Added support for Coreweave Object Storage (#1915)

    Coreweave object store is compatible with boto3. Uploading objects to Coreweave object store is almost exactly like writing to S3, except that an endpoint_url must be set via the S3_ENDPOINT_URL environment variable. For example:

    import os
    os.environ['S3_ENDPOINT_URL'] = 'https://object.las1.coreweave.com'
    
    from composer.trainer import Trainer
    
    # Save checkpoints every epoch to s3://my_bucket/checkpoints
    trainer = Trainer(
        model=model,
        train_dataloader=train_dataloader,
        max_duration='10ep',
        save_folder='s3://my_bucket/checkpoints',
        save_interval='1ep',
        save_overwrite=True,
        save_filename='ep{epoch}.pt',
        save_num_checkpoints_to_keep=0,  # delete all checkpoints locally
    )
    
    trainer.fit()

    Please see our checkpointing documentation for more details.

  3. 🪵 Automatic logging of Trainer hparams (#1855)

    Hyperparameter arguments passed to the Trainer are now automatically logged. Simply set the Trainer argument auto_log_hparams=True.

Bug Fixes

  • Update Docker images to use ‘posix_prefix’ paths (#1854)
  • Disable new notebook in CI (#1875)
  • [Fix] Enable logging of metrics from Callbacks to ConsoleLogging (#1884)
  • Ensure loggers run init event before callbacks in Engine (#1890)
  • Raise an error in FSDP meta tensor initialization if there's no initialization functions, fix associated flaky FSDP test (#1905)
  • Add primitive list support (#1906)
  • Add logic for shifting labels before computing metrics (#1913)
  • Fix misspecified dependency (#1919)
  • pin setuptools in build requirements (#1926)
  • Pin pip<23 in Docker images (#1936)
  • Fix bug in trainer.eval and add test cases for test_console_logger (#1937)

What's Changed
