merge upstream #64

Merged
merged 117 commits into from
Apr 4, 2024

Commits on Aug 12, 2023

  1. Fixed final value of cosine decay lr (#1011) (see the schedule sketch after this list)

    * Fixed final value of cosine decay lr
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    kshitijkg and github-actions authored Aug 12, 2023
    eda5aae
  2. 01b5e22
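
Note: the fix above changes where the cosine schedule bottoms out. A minimal sketch of a cosine decay that ends exactly at its configured minimum; the names (max_lr, min_lr, warmup_iters, decay_iters) are illustrative, not taken from the NeoX source.

```python
import math

def cosine_lr(step, max_lr, min_lr, warmup_iters, decay_iters):
    """Cosine decay with linear warmup that lands exactly on min_lr."""
    if step < warmup_iters:
        return max_lr * step / warmup_iters  # linear warmup
    progress = min(1.0, (step - warmup_iters) / (decay_iters - warmup_iters))
    # cos(pi * progress) runs from 1 to -1, so the bracket runs 1 -> 0 and
    # the schedule reaches min_lr at decay_iters instead of decaying past it.
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```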

Commits on Aug 22, 2023

  1. Update Dockerfile (#1014)

    * Update Dockerfile
    
    * Update Dockerfile
    xu-song authored Aug 22, 2023
    d8bcd97

Commits on Aug 28, 2023

  1. README Update (#1017)

    * Update README.md (16 successive revisions)
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Aug 28, 2023
    43ea51c

Commits on Sep 13, 2023

  1. Bump transformers version and update enwik8 link (#1024)

    * Update transformers version
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Update the enwik8 URL to the one HF uses, the old one is down.
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Signed-off-by: Dashiell Stander <[email protected]>
    Co-authored-by: github-actions <[email protected]>
    dashstander and github-actions authored Sep 13, 2023
    2922bef

Commits on Sep 15, 2023

  1. 960ed3d
  2. Fix broken link (#1022)

    * Update README.md
    
    Fix broken link
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Sep 15, 2023
    97e376c
  3. 7821aa7
  4. cdc94ee
  5. 737c913
  6. c883e8c

Commits on Sep 18, 2023

  1. Fix bf16 for zero > 0 and pipeline parallelism > 0 (#1032) (see the config sketch after this list)

    * Fix bugs so we can use bf16 with zero > 0
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Typo
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Typo
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * With the DeepSpeed updates there may be no need to do grad_accum in fp32
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Add warning about necessity of fp32 grad_accum with bf16, pp>0, and zero1
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Signed-off-by: Dashiell Stander <[email protected]>
    Co-authored-by: github-actions <[email protected]>
    dashstander and github-actions authored Sep 18, 2023
    d9166bf
  2. Remove support for lazy dataset implementation (#1033)

    * Remove lazy dataset implementation option
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Signed-off-by: Dashiell Stander <[email protected]>
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Sep 18, 2023
    fcd5f92
  3. Fix SequentialWrapper Generation (pipe_parallel_size = 0) (#1031)

    * Fix SequentialGeneration
    
    * Fix SequentialGeneration
    xu-song authored Sep 18, 2023
    70af6e8
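
Note: a hedged sketch of the setting combination that #1032's warning concerns, bf16 together with ZeRO stage 1 and pipeline parallelism, where gradient accumulation should stay in fp32. Shown as a Python dict with NeoX-style key names; the real configs are YAML and the exact keys should be checked against the docs.

```python
# Illustrative NeoX-style settings for the case the commit warns about:
# with bf16, pipeline parallelism > 0, and ZeRO stage 1, keep gradient
# accumulation/reduction in fp32.
train_config = {
    "pipe_parallel_size": 2,
    "zero_optimization": {"stage": 1},
    "precision": "bfloat16",
    "fp32_allreduce": True,  # reduce gradients in fp32
}
```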

Commits on Sep 20, 2023

  1. 8903a96
  2. Fix register_buffer parameter (#1036)

    * Fix register_buffer parameter
    
    * Fix register_buffer parameter
    xu-song authored Sep 20, 2023
    0ce77ab
  3. Add flash 2.x message to README.md (#1037)

    * Add flash 2.x message to README.md
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Quentin-Anthony and github-actions authored Sep 20, 2023
    444c0ef

Commits on Sep 23, 2023

  1. Add s3 checkpoint syncing (#1010) (see the import sketch after this list)

    * add s3 checkpoint syncing
    
    * Update NeoXArgs docs automatically
    
    * remove CPCargo requirement
    
    * Update NeoXArgs docs automatically
    
    * Make s3 imports try-except and separate requirements to s3 file
    
    * Update NeoXArgs docs automatically
    
    * Announce feature
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Sep 23, 2023
    f9503b7
  2. 390d37c
  3. e431ff5
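
Note: the "try-except" import change in #1010 is the usual optional-dependency pattern; a minimal sketch, assuming boto3 as the client and a separate s3 requirements file (both assumptions, not read from the PR diff).

```python
# Guarded import: s3 syncing stays optional, so a missing client library
# only fails when the feature is actually requested.
try:
    import boto3  # hypothetical client, installed from a separate s3 requirements file
except ImportError:
    boto3 = None

def sync_checkpoint_to_s3(local_dir, s3_url):
    if boto3 is None:
        raise RuntimeError("s3 checkpoint syncing requested but no s3 client is installed")
    # ... upload the files under local_dir to s3_url ...
```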

Commits on Sep 25, 2023

  1. Remove the NeoX implementation of GPT2Tokenizer (#1042)

    * Try out just using the HF implementation
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Rely solely on HF tokenizer.
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Signed-off-by: Dashiell Stander <[email protected]>
    Co-authored-by: github-actions <[email protected]>
    dashstander and github-actions authored Sep 25, 2023
    2ab05be
  2. Pre-compute RoPE embeddings in fp32 (#1041)

    * Pre-commit
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Sequence dimension is 0
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Signed-off-by: Dashiell Stander <[email protected]>
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Sep 25, 2023
    3bfedf4
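
Note: a hedged sketch of what "pre-compute RoPE embeddings in fp32" means: build the cos/sin tables in float32 regardless of model dtype, and cast only at the point of use. Function and argument names are illustrative.

```python
import torch

def rope_tables(seq_len, dim, base=10000):
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    t = torch.arange(seq_len, dtype=torch.float32)  # sequence dimension is 0
    freqs = torch.einsum("i,j->ij", t, inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)
    return emb.cos(), emb.sin()  # cast to fp16/bf16 where they are applied
```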

Commits on Sep 27, 2023

  1. Patch LR Annealing Bug (#1046)

    * Ensure that LR annealing is correct even after loading from checkpoint. Patch from Eric Nguyen
    
    Co-authored-by: Eric Nguyen <[email protected]>
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Update NeoXArgs docs automatically
    
    * Test whether we need the whole patch
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Update NeoXArgs docs automatically
    
    * Turns out we do not need the entire patch, just one line
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Signed-off-by: Dashiell Stander <[email protected]>
    Co-authored-by: Eric Nguyen <[email protected]>
    Co-authored-by: github-actions <[email protected]>
    3 people authored Sep 27, 2023
    ba51ca0
  2. Improve FLOPS Calculation (#1044)

    * Use Megatron-DeepSpeed flops calculation
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Use Megatron-DeepSpeed flops calculation
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    * Direct comparison of FLOPS calculations
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Remove test logging
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    ---------
    
    Signed-off-by: Dashiell Stander <[email protected]>
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Sep 27, 2023
    5f36401
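
Note: for reference, the Megatron-DeepSpeed style estimate that #1044 adopts has roughly this shape; a hedged sketch with illustrative names (the exact expression should be checked in megatron/logging.py).

```python
def flops_per_iteration(batch_size, seq_len, num_layers, hidden_size, vocab_size,
                        ckpt_factor=4):
    # ckpt_factor: 4 with activation checkpointing (extra forward pass), else 3.
    # The bracket adds attention and logit costs on top of the dense
    # 24 * b * s * l * h^2 matmul term.
    return (24 * ckpt_factor * batch_size * seq_len * num_layers * hidden_size**2
            * (1.0 + seq_len / (6.0 * hidden_size)
               + vocab_size / (16.0 * num_layers * hidden_size)))
```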

Commits on Sep 28, 2023

  1. adding boilerplate coverity scan to submit to public analysis (#1047)

    * adding boilerplate coverity scan to submit to public analysis
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Sep 28, 2023
    5fa85ad

Commits on Sep 29, 2023

  1. Add section to the README detailing how to start distributed jobs (#1048)
    
    * Add documentation about kicking off distributed jobs
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Add documentation about kicking off distributed jobs
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Add documentation about kicking off distributed jobs
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Update NeoXArgs docs automatically
    
    * Added more info on run command modification and cleaned up a bit
    
    * slight cleanup
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Signed-off-by: Dashiell Stander <[email protected]>
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Sep 29, 2023
    f44db66
  2. Fix readme typos (#1049)

    * Fix readme typo
    
    * Update NeoXArgs docs automatically
    
    * More typos
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Quentin-Anthony and github-actions authored Sep 29, 2023
    2c60645
  3. b14d6f7
  4. Update CITATION.cff (#1053)

    * Update CITATION.cff
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Quentin-Anthony and github-actions authored Sep 29, 2023
    93cac79

Commits on Oct 1, 2023

  1. 7a8569f

Commits on Oct 2, 2023

  1. Organize the tools directory (#1055)

    * Re-organize the tools folder
    
    Co-authored-by: Stella Biderman <[email protected]>
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Add README.md files for each subdirectory.
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Update NeoXArgs docs automatically
    
    * Clarify the difference between HF scripts
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Update NeoXArgs docs automatically
    
    * Fix tools paths
    
    * Update NeoXArgs docs automatically
    
    * flesh out ckpts README
    
    * Update NeoXArgs docs automatically
    
    * Fix tools paths for megatron imports
    
    * Update NeoXArgs docs automatically
    
    * Delete tools/ckpts/merge_mp_partitions.py since it's based on a very old Megatron
    
    * Update NeoXArgs docs automatically
    
    * Add blurb to bash tools README
    
    * Update NeoXArgs docs automatically
    
    * Flesh out datasets README
    
    * Update NeoXArgs docs automatically
    
    * formatting
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Signed-off-by: Dashiell Stander <[email protected]>
    Co-authored-by: Stella Biderman <[email protected]>
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    4 people authored Oct 2, 2023
    3f43f07

Commits on Oct 4, 2023

  1. Add documentation about using labelled datasets (#1056)

    * Add documentation and an informative error
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Signed-off-by: Dashiell Stander <[email protected]>
    Co-authored-by: github-actions <[email protected]>
    dashstander and github-actions authored Oct 4, 2023
    f6ac04d

Commits on Oct 17, 2023

  1. LR scheduler fix no longer breaks inference (#1060)

    * Add lr_scheduler check for inference.
    
    Signed-off-by: Dashiell Stander <[email protected]>
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Signed-off-by: Dashiell Stander <[email protected]>
    Co-authored-by: github-actions <[email protected]>
    dashstander and github-actions authored Oct 17, 2023
    e001a04

Commits on Oct 20, 2023

  1. Lion Optimizer (#1062)

    * initial commit
    
    * test set, fixed readme and docstring
    
    * Refactor Lion implementation
    
    ---------
    
    Co-authored-by: kamathis4 <[email protected]>
    andylolu2 and adi-kmt authored Oct 20, 2023
    b02d989
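
Note: for context, the Lion update rule (sign of an interpolated momentum, with decoupled weight decay), as described in the Lion paper; a minimal single-tensor sketch, not the refactored implementation this PR merged.

```python
import torch

@torch.no_grad()
def lion_step(param, momentum, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    grad = param.grad
    # Update direction is the sign of a beta1-interpolation of momentum and grad.
    update = (beta1 * momentum + (1.0 - beta1) * grad).sign()
    param.add_(update + weight_decay * param, alpha=-lr)  # decoupled weight decay
    momentum.mul_(beta2).add_(grad, alpha=1.0 - beta2)    # EMA momentum update
```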

Commits on Oct 31, 2023

  1. fix lion optimizer documentation (#1067)

    * fix lion optimizer documentation
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    jahatef and github-actions authored Oct 31, 2023
    e277bc7
  2. Fix preprocess_data.py link (#1064)

    * Fix preprocess_data.py link
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Quentin-Anthony and github-actions authored Oct 31, 2023
    f574f22

Commits on Nov 1, 2023

  1. Edge-casing for multi-GPU HF-to-NeoX conversion (#1065)

    * edge-casing for multiGPU hf to sequential case
    
    * cleanup whitespace
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Nov 1, 2023
    fcc5af5
  2. 8c9fc00
  3. Pin version of lm_eval (#1070)

    * Pin lm_eval version
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    haileyschoelkopf and github-actions authored Nov 1, 2023
    a10f69c
  4. 41f019e

Commits on Nov 5, 2023

  1. Update README.md

    StellaAthena authored Nov 5, 2023
    90aa131

Commits on Nov 7, 2023

  1. When processing mlp.dense_4h_to_h.bias and attention.dense.bias, tp_ranks are not reflected, so incorrect results appear whenever tp_ranks is greater than 1.
    kyuheejang committed Nov 7, 2023
    04dc2ba
  2. Merge pull request #1072 from kyuheejang/Fixing-neox-to-huggingface

    Fixing convert neox to huggingface bug
    StellaAthena authored Nov 7, 2023
    f214358

Commits on Nov 8, 2023

  1. d8028f8

Commits on Nov 16, 2023

  1. Update neox_args.py (#1081)

    * Update neox_args.py
    
    These attention configuration options were missing from the docs. This will fix that.
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    jahatef and github-actions authored Nov 16, 2023
    10bf788

Commits on Nov 22, 2023

  1. Update README.md (#1082)

    * Update README.md
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    StellaAthena and github-actions authored Nov 22, 2023
    f48d3a6

Commits on Nov 30, 2023

  1. Update README.md

    StellaAthena authored Nov 30, 2023
    efea81f

Commits on Dec 4, 2023

  1. Extend ci suite (#1080)

    * Use `.yml` extensions in README to reflect extensions used in `configs/` folder
    
    * Rename `save_interval` -> `checkpoint_factor`
    
    * Mark expected failures in existing tests
    
    * Fix minor typos
    
    * Allow creation of checkpoint at iteration 0 when `do_train=False`
    
    Helpful for unit tests because it allows use of a randomly initialised model
    
    * Delete duplicated `test_fused_kernels.py`
    
    Primary version lives in `tests/model/test_fused_kernels.py`
    
    * Avoid initializing CUDA whenever `megatron` is imported
    
    Resolves `Cannot re-initialize CUDA in forked subprocess` error when running distributed unit tests
    
    * Extend suite of unit tests
    mkerin authored Dec 4, 2023
    3be59a4
  2. Patch coverity scan (#1090)

    * Update coverity_scan.yml (repeated CI iterations)
    
    * Update coverity_scan.yml
    
    update build command to avert empty cwd in build metrics
    
    * Update coverity_scan.yml
    
    * Update coverity_scan.yml
    
    adding verbose to debug curl
    
    * Update coverity_scan.yml
    
    debug print trace to examine build metrics xml
    
    * Update coverity_scan.yml (further iterations)
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Dec 4, 2023
    a2b2020

Commits on Dec 6, 2023

  1. Corrects FLOPs formula as per 1093 (#1094)

    * Update logging.py
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    StellaAthena and github-actions authored Dec 6, 2023
    050f560

Commits on Dec 19, 2023

  1. Update CODEOWNERS

    Remove myself as a code owner as I shouldn't be approving PRs.
    StellaAthena authored Dec 19, 2023
    f19b2ec

Commits on Dec 20, 2023

  1. Bump transformers from 4.30.2 to 4.36.0 in /requirements (#1097)

    Bumps [transformers](https://github.com/huggingface/transformers) from 4.30.2 to 4.36.0.
    - [Release notes](https://github.com/huggingface/transformers/releases)
    - [Commits](huggingface/transformers@v4.30.2...v4.36.0)
    
    ---
    updated-dependencies:
    - dependency-name: transformers
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Dec 20, 2023
    07166da
  2. Pins old DeeperSpeed until bug is fixed (#1095)

    * Pins old DeeperSpeed until bug is fixed
    
    There is a bug in upstream DeepSpeed detailed [here](microsoft/DeepSpeed#4781) that we didn't catch before synching with main. This pins the prior commit so the bug doesn't impact users.
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    StellaAthena and github-actions authored Dec 20, 2023
    9283eff

Commits on Dec 22, 2023

  1. Update README.md

    StellaAthena authored Dec 22, 2023
    9eef954
  2. Update README.md

    StellaAthena authored Dec 22, 2023
    a48e09e
  3. Update NeoXArgs docs automatically

    github-actions committed Dec 22, 2023
    613e5a6
  4. Update README.md

    StellaAthena authored Dec 22, 2023
    be7eeda
  5. Update README.md

    StellaAthena authored Dec 22, 2023
    2117afc
  6. Update NeoXArgs docs automatically

    github-actions committed Dec 22, 2023
    8dba5b6
  7. Add QK Normalization (#1100) (see the sketch after this list)

    * add qk normalization
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Dec 22, 2023
    f161245
  8. Update README.md

    StellaAthena authored Dec 22, 2023
    7fb3b3c
  9. Update README.md

    StellaAthena authored Dec 22, 2023
    a7509f0
  10. 8eaac4e
  11. Update NeoXArgs docs automatically

    github-actions committed Dec 22, 2023
    4d5a811
  12. 05cc29c
  13. e25446e
  14. Merge pull request #1102 from EleutherAI/StellaAthena-patch-4

    More readme updates
    StellaAthena authored Dec 22, 2023
    287f9f7
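
Note: a hedged sketch of what QK normalization (#1100) does: normalize the query and key projections per head before the attention product, which bounds the scale of the q.k logits. Module and attribute names are illustrative.

```python
import torch

class QKNorm(torch.nn.Module):
    def __init__(self, head_dim):
        super().__init__()
        self.q_norm = torch.nn.LayerNorm(head_dim)
        self.k_norm = torch.nn.LayerNorm(head_dim)

    def forward(self, q, k):
        # q, k: (..., head_dim). Normalizing both keeps q @ k^T from growing
        # with the scale of the residual stream.
        return self.q_norm(q), self.k_norm(k)
```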

Commits on Dec 23, 2023

  1. Lm eval 0.4.0 support (#1101)

    * add lm-eval v0.4.0
    
    * rename evaluate.py to avoid shadowing HF evaluate library
    
    * document new evaluate.py filename
    
    * Update NeoXArgs docs automatically
    
    * handle results format differently
    
    * Update NeoXArgs docs automatically
    
    * Update hanging evaluate.py scripts
    
    * Update NeoXArgs docs automatically
    
    * Add triviaqa to default eval_tasks
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Dec 23, 2023
    b27e409
  2. Update README.md

    StellaAthena authored Dec 23, 2023
    1148a0f

Commits on Dec 26, 2023

  1. Update neox_args.py (#1107)

    * Update neox_args.py
    
    Changed some default values to correspond to values that we generally recommend people use.
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    StellaAthena and github-actions authored Dec 26, 2023
    e5a7ea7

Commits on Jan 4, 2024

  1. Fix repo for CI (#1106)

    * Fix syntax errors
    
    * Make pre-commit fixes across repo
    
    * Ensure correct version of clang-format in CI
    
    ---------
    
    Co-authored-by: Yang Zhang <[email protected]>
    yang and yang authored Jan 4, 2024
    eca6b1a
  2. Fix install, Dockerfile, CI (#1104)

    * Add missing jinja2 dep
    
    Missing transitive dep of lm_eval
    
    * Fix Dockerfile
    
    Only devel has nvcc, needed to build packages
    
    And don't rebuild fused kernels if no relevant change
    
    * Ensure Dockerfile builds in CI
    
    Also ensures that install actually works
    
    ---------
    
    Co-authored-by: Yang Zhang <[email protected]>
    yang and yang authored Jan 4, 2024
    98716eb

Commits on Jan 5, 2024

  1. Fused Rotary Embeddings (fixed) (#1108)

    * Create fused_rotary_positional_embedding.cpp
    
    * Create fused_rotary_positional_embedding.h
    
    * Create fused_rotary_positional_embedding_cuda.cu
    
    * Update fused_rotary_positional_embedding.h
    
    Ports the fix from NVIDIA/apex#1750 into this branch.
    
    * Update neox_args.py
    
    * Update setup.py
    
    * Update initialize.py
    
    * Update setup.py
    
    * Update __init__.py
    
    * Update test_fused_kernels.py
    
    * Update setup.py
    
    * Create fused_rope.py
    
    * Update fused_rotary_positional_embedding.h
    
    * Update fused_rotary_positional_embedding.cpp
    
    * Update fused_rotary_positional_embedding.cpp
    
    * Update transformer.py
    
    * Update transformer.py
    
    Just checked and this should work for bf16. Or, at least, the reason I originally thought it wouldn't doesn't apply.
    
    * Update transformer.py
    
    * Create 125M_fused_rope.yml
    
    * Update 125M_fused_rope.yml
    
    * Update transformer.py
    
    Add `self.rope_fusion = neox_args.rope_fusion` so that `ParallelSelfAttention` knows if we're using rope fusion.
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    * Fix fused rope
    
    Just needed to bring in the latest headers/sources,
    and call into it the right way from transformers.py.
    
    * Add rope_fusion arg to all ymls
    
    ---------
    
    Co-authored-by: Stella Biderman <[email protected]>
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    Co-authored-by: Yang Zhang <[email protected]>
    5 people authored Jan 5, 2024
    77605ca
  2. Add pythia 14M and 31M configs (#1111)

    * Add pythia 14M config
    
    * Create 31M.yml
    segyges authored Jan 5, 2024
    f14782a

Commits on Jan 9, 2024

  1. Add docker compose and change containerized setup instructions to use it (#1113)
    
    * Add pythia 14M config
    
    * Create 31M.yml
    
    * Add docker compose, update readme docker instructions to utilize it
    
    * Add logging limits to docker-compose files
    
    * Change data mount from /gpt-neox/data to /data/
    
    This prevents possible errors if the user already has a /data/ directory in their /gpt-neox/ folder
    
    * Update README.md
    
    Formats the changed parts as proper code blocks
    
    * Make the docker-compose spinup tidier
    
    * Avoid config bloat by only providing the updated paths
    
    * Apply precommit
    
    ---------
    
    Co-authored-by: Quentin Anthony <[email protected]>
    segyges and Quentin-Anthony authored Jan 9, 2024
    e6e944a

Commits on Jan 11, 2024

  1. 92b1b6f

Commits on Jan 13, 2024

  1. Bump jinja2 from 3.1.2 to 3.1.3 in /requirements (#1120)

    Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
    - [Release notes](https://github.com/pallets/jinja/releases)
    - [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
    - [Commits](pallets/jinja@3.1.2...3.1.3)
    
    ---
    updated-dependencies:
    - dependency-name: jinja2
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Jan 13, 2024
    90f70ff

Commits on Jan 19, 2024

  1. Enable passing of --account to srun / SlurmLauncher (#1126)

    * add `account` to Deepspeed args
    
    * Add handling of `account` when `deepspeed_slurm` is set
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    haileyschoelkopf and github-actions authored Jan 19, 2024
    6399155
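
Note: illustratively, the new option forwards an accounting string to srun as --account when deepspeed_slurm is set; a hedged Python-dict rendering of the keys named in the commit messages (the value is hypothetical).

```python
launch_config = {
    "deepspeed_slurm": True,        # use the Slurm launcher path
    "account": "my-slurm-account",  # hypothetical; passed through to srun --account
}
```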

Commits on Jan 24, 2024

  1. update copyrights (#1128)

    * update copyrights
    
    * Update NeoXArgs docs automatically
    
    * nvidia copyright years
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    jahatef and github-actions authored Jan 24, 2024
    7a8fa2f

Commits on Jan 26, 2024

  1. fused layernorm (#1105)

    * Add simple util for CUDA timings
    
    * Add fused layernorm kernel from Megatron
    
    Closes #952
    
    * change default fused layernorm to false
    
    * Update test_setup.yml
    
    * Update test_train_base.yml
    
    ---------
    
    Co-authored-by: Yang Zhang <[email protected]>
    Co-authored-by: jahatef <[email protected]>
    Co-authored-by: Jacob Hatef <[email protected]>
    4 people authored Jan 26, 2024
    3d8fec0

Commits on Jan 29, 2024

  1. Contributing Guide (#1138)

    * contributing guide
    
    * Update NeoXArgs docs automatically
    
    * Update CONTRIBUTING.md
    
    * Update NeoXArgs docs automatically
    
    * Remove microsoft references and link on main readme
    
    * Update NeoXArgs docs automatically
    
    * pre-commit
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Jan 29, 2024
    e5602c3

Commits on Jan 30, 2024

  1. 1c133bf

Commits on Feb 1, 2024

  1. Update lm_eval v0.4 to PyPI dependencies (#1141)

    * Update requirements.txt
    
    * Update requirements.txt
    
    * Update NeoXArgs docs automatically
    
    * add note to neox_args.py
    
    * pre-commit
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Feb 1, 2024
    032ec8c

Commits on Feb 5, 2024

  1. Remove gas (beano) (#1144)

    * Remove 'gas' configuration variable
    
    * Remove gas from configs and config documentation
    
    * Update training.py
    segyges authored Feb 5, 2024
    91c44bc

Commits on Feb 8, 2024

  1. Improve Conversion Utilities (#1124)

    * draft: unify sequential + PPModule conversion scripts
    
    * Update NeoXArgs docs automatically
    
    * draft: pull out model param names / model definition
    
    * Update NeoXArgs docs automatically
    
    * tested: neox models with TP = 1, PipelineModule, work
    
    * Update NeoXArgs docs automatically
    
    * draft: Llama + GQA QKV resharding
    
    * Update NeoXArgs docs automatically
    
    * update Llama conversion script to support Mistral and GQA
    
    * Update NeoXArgs docs automatically
    
    * test Mistral-7B conversion
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    * push documentation on imports / Llama loading
    
    * push further readme updates (Mistral included)
    
    * Prevent conversions for unsupported features, disclaim in README
    
    * Update NeoXArgs docs automatically
    
    * revert PR#1072 RowParallel bias conversion error
    
    * remove sequential_to_hf and module_to_hf scripts, deprecated in favor of convert_neox_to_hf.py
    
    * Update NeoXArgs docs automatically
    
    * pre-commit
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Feb 8, 2024
    f7373f8

Commits on Feb 21, 2024

  1. Fixes distributed tests, and skips tests that are broken. (#1149)

    * Fixes distributed tests, and skips tests that are broken.
    
    * Update NeoXArgs docs automatically
    
    * improve pytest msgs and remove commented code
    
    * pre-commit
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Feb 21, 2024
    412cf6e
  2. Memory profiling (#1153)

    * Fixes distributed tests, and skips tests that are broken.
    
    * memory profiling for gpt-neox. Only works for pp=0, pp=1+ needs DS commits.
    
    * Update NeoXArgs docs automatically
    
    * adds memory profiling for pipeline parallel
    
    * Update NeoXArgs docs automatically
    
    * fix spacing
    
    * Update NeoXArgs docs automatically
    
    * fix spacing again
    
    * Update NeoXArgs docs automatically
    
    * get rid of unwanted changes
    
    * Update NeoXArgs docs automatically
    
    * get rid of file
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    * add nsight systems support
    
    * remove tests changes again
    
    * Update NeoXArgs docs automatically
    
    * add tests
    
    * Update NeoXArgs docs automatically
    
    * Update training.py
    
    * Update NeoXArgs docs automatically
    
    * Add assertion message
    
    * pre-commit
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Feb 21, 2024
    46d179c
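
Note: a hedged sketch of the kind of CUDA memory snapshotting such profiling builds on, using PyTorch's semi-private recorder rather than the exact hooks this PR adds behind its neox_args flags.

```python
import torch

torch.cuda.memory._record_memory_history(max_entries=100_000)  # start recording
# ... run a few training steps ...
torch.cuda.memory._dump_snapshot("mem_snapshot.pickle")  # inspect at pytorch.org/memory_viz
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```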

Commits on Feb 23, 2024

  1. add profiling to readme (#1154)

    * add profiling to readme
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    jahatef and github-actions authored Feb 23, 2024
    eee03b2
  2. Python version update (#1122)

    * Switch default command for docker image
    
    * Rename pythia paths docker file for clarity
    
    * Update docker build to use python 3.10
    
    * Update github workflows to use ubuntu 22.04 and python 3.10
    
    * Bump pytorch library patch versions
    
    * Add pytest-html for reasonably formatted test reports
    
    * Fix build after torch and cuda version bump
    
    * Fix apex install for newer version
    
    1) This, empirically, works, as tested by running the build and kicking off training.
    2) Apex documentation says it is incorrect syntax and deprecated.
    3) It takes so long to compile that it is probably, all by itself, something that needs fixing.
    4) I will probably pull the fused adamw out of apex.
    5) It has been building for twenty minutes so I am going to go do something else.
    
    * Fix pip version to ensure apex compilation remains good
    
    * Fix unit test for evaluate
    
    * Fix pip requirement
    
    Prevents possible build issues with apex especially across divergent pip versions
    
    * Update dockerfile to point to stripped-down apex repo
    
    * Revert "Update dockerfile to point to stripped-down apex repo"
    
    This reverts commit 40c7656.
    
    * Update apex version in dockerfile
    
    * Switch to downloading prebuilt apex wheel
    
    * Clean up docker copy commands
    
    * Have docker build conditionally get binaries or build apex
    
    * Apply precommit
    segyges authored Feb 23, 2024
    a7638a8
  3. Minor changes (#1125)

    * Switch default command for docker image
    
    * Rename pythia paths docker file for clarity
    
    * Fix unit test for evaluate
    
    * Update readme for testing to omit --forked argument
    
    * Add pytest-html to requirements-dev.txt
    
    * Revert "Update readme for testing to omit --forked argument"
    
    This reverts commit 19021fc.
    
    * Add data/ directory and .bin and .idx files in /tests/data to .gitignore
    
    This makes it so that git doesn't try to let you commit (or force you to stash) data files
    
    * Make .gitignore for data files slightly more elegant
    
    * Add utility script for doing token counts on processed datasets
    
    * Run precommit hook
    
    * Fix token count script, run precommit
    segyges authored Feb 23, 2024
    72d1803
  4. Draft PR Adding mistral 0.1 (#1131)

    * add support for flash attention 2
    
    * change cosine decay to chinchilla style
    
    * set default warmup to none so that warmup_iters can be set
    
    * fixed bug
    
    * fixed chinchilla lr
    
    * add s3 checkpoint syncing
    
    * rotary embedding in fp32
    
    * fix for seq_len < max_seq_len
    
    * some fixes, still not working
    
    * fix bugs; evaluate on step 0
    
    * first attempt at gqa
    
    * gqa works in kv_heads==query_heads case
    
    * gqa working
    
    * workaround for FSX quota
    
    * update with llemma
    
    * update with recent PR
    
    * README and requirements updated
    
    * Added Mistral config
    
    * Added sliding window through flash attention 2
    
    * Added sliding window
    
    * Mistral should likely use mp=2 like llama2
    
    * Update gitignore
    
    * Removed unused CPCargo import
    
    * Conversion script (WIP)
    
    * Fixed missing slurm environ vars
    
    * updated mistral config
    
    * updated job script
    
    * initial commit conversion mistral hf to sequential
    
    * Added stacking q, k, v appropriately for mp ranks
    
    * pp=0 support from end of 2023
    
    * Cleaning up config and removing Autoconfig in conversion script
    
    * Cleaned up conversion example script
    
    * cleanup: add back configs folder, discard Llemma readme
    
    * cleanup: remove llemma lr sched changes, re-add requirements/ folder
    
    * docs: add explanation of intermediate_size behavior
    
    * args: add argument checking for num_kv_heads, clean up usage syntax
    
    * args: prevent num KV heads < TP worldsize
    
    * readd triton flash attn func
    
    * cleanup: use tools/ dir from main
    
    * docs: re-add mistral , GQA as supported
    
    * cleanup: delete duplicate tools/ files
    
    * cleanup: use fp32 rope (non-fused) from main
    
    * cleanup: no longer block out GQA codepaths in conversion scripts
    
    * cleanup: gqa code a bit
    
    * add llama2, llemma configs
    
    * add non-flash GQA ; refactor modeling code
    
    * clean up mistral config for commit
    
    * further cleanup configs dir
    
    * remove slurm script from llemma
    
    * update seqlen params for codellama, llemma configs
    
    * add more comments to GQA code, and make reshapes more readable
    
    * make inv_freq non-persistent
    
    * actually, just ensure mistral has inv_freqs as a persistent buffer
    
    * non-flash GQA works, so ensure arguments.py permits it
    
    * no longer use our own copies of flash attention interface functions
    
    * remove unused mpu util fn
    
    * delete unused config file
    
    * fix diff on mpu/utils.py
    
    * remove slurm scripts that won't be in this PR
    
    * run pre-commit
    
    * update tests for conversion scripts
    
    * add flash version check for sliding window
    
    * pre-commit
    
    ---------
    
    Co-authored-by: zhangir-azerbayev <[email protected]>
    Co-authored-by: haileyschoelkopf <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    4 people authored Feb 23, 2024
    f36aed7

Commits on Feb 26, 2024

  1. [Bug?] Fix profiling argument names (#1155)

    * possibly fix profiling flag names
    
    * actually, profile_backward already exists
    
    * Update NeoXArgs docs automatically
    
    * neox_args.profile was also used some places, update that too
    
    * Update NeoXArgs docs automatically
    
    * profiling --> profile
    
    * Update NeoXArgs docs automatically
    
    * Revert neox_arguments.md changes
    
    * Update NeoXArgs docs automatically
    
    * Update gen_docs since __name__ only returns the Literal for string args with Python 3.10
    
    * Update NeoXArgs docs automatically
    
    * Another update to preserve non-literals
    
    * Update NeoXArgs docs automatically
    
    * add union
    
    * Update NeoXArgs docs automatically
    
    * pre-commit
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Feb 26, 2024
    9663802

Commits on Feb 29, 2024

  1. Update cpu_ci.yml (#1159)

    * Update cpu_ci.yml
    
    Update the workflow to point the CPU workflow at a self-hosted runner instead of GitHub-provided runners
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    jaimemcc-intel and github-actions authored Feb 29, 2024
    3c03fc7

Commits on Mar 2, 2024

  1. Improve argument validation for Flash-attn + SWA (#1162)

    * Improve argument validation for Flash-attn + SWA
    
    * Update NeoXArgs docs automatically
    
    * don't pass window_size if not necessary
    
    * Update NeoXArgs docs automatically
    
    * Update 7B.yml
    
    * Update NeoXArgs docs automatically
    
    * apply precommit
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    haileyschoelkopf and github-actions authored Mar 2, 2024
    19596b0
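
Note: part of the validation is to pass window_size only when a sliding window is configured, since older flash-attn builds do not accept the kwarg; a hedged sketch against flash-attn 2.3+'s public API.

```python
from flash_attn import flash_attn_func  # flash-attn >= 2.3 for window_size

def attention(q, k, v, sliding_window=None):
    kwargs = {}
    if sliding_window is not None:
        kwargs["window_size"] = (sliding_window, 0)  # (left, right) context
    # q, k, v: (batch, seqlen, heads, head_dim) half-precision CUDA tensors
    return flash_attn_func(q, k, v, causal=True, **kwargs)
```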

Commits on Mar 4, 2024

  1. Single node Pythia 14M training on ngc pytorch 24.02 container (#1170)

    * Pythia 14M training on ngc pytorch 24.02 container
    
    * pre-commit
    
    ---------
    
    Co-authored-by: Quentin Anthony <[email protected]>
    tf-nv and Quentin-Anthony authored Mar 4, 2024
    119950c
  2. Remove unnecessary fp32/bf16 conversion (#1169)

    * feat: remove unnecessary bf16 conversions since no collective op is performed
    
    * pre-commit
    
    ---------
    
    Co-authored-by: Quentin Anthony <[email protected]>
    DayOfThePenguin and Quentin-Anthony authored Mar 4, 2024
    7b8187a
  3. Ignore markdown for pre-commit (#1171)

    * ignore markdown for pre-commit
    
    * only ignore end of file and trailing whitespace
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Quentin-Anthony and github-actions authored Mar 4, 2024
    31cfe52
  4. Make rotary freqs buffer non-persistent (#1168) (see the sketch after this list)

    * make inv_freq non-persistent by default
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Mar 4, 2024
    e109bf5
  5. Support Lion with Zero Optimizer (#1166)

    * feat: deepspeed zero lion support
    
    * feat: bump DeeperSpeed version to one that includes DeepSpeed FusedLion
    
    * feat: bump DeeperSpeed version to include pipeline logging fix
    
    * pre-commit
    
    ---------
    
    Co-authored-by: Quentin Anthony <[email protected]>
    DayOfThePenguin and Quentin-Anthony authored Mar 4, 2024
    df8cf24
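
Note: what "non-persistent" means here, in one line of PyTorch: the buffer still follows the module across devices, but is omitted from state_dict, so checkpoints do not carry (or demand) the rotary table. Class name is illustrative.

```python
import torch

class RotaryEmbedding(torch.nn.Module):
    def __init__(self, dim, base=10000):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        # persistent=False: moves with .to()/.cuda() like any buffer, but is
        # not saved to, or required from, the checkpoint's state_dict.
        self.register_buffer("inv_freq", inv_freq, persistent=False)
```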

Commits on Mar 7, 2024

  1. Add MoE (#1129)

    * Add DeepSpeed MoE
    
    Thanks to dayofthepenguin for extensive testing
    
    Closes #479
    
    * Update NeoXArgs docs automatically
    
    * pre-commit
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: Yang Zhang <[email protected]>
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    4 people authored Mar 7, 2024
    86758c3
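
Note: a hedged sketch of wrapping an expert MLP in DeepSpeed's MoE layer; the argument values are illustrative and the actual gpt-neox wiring differs.

```python
import torch
from deepspeed.moe.layer import MoE  # DeepSpeed's mixture-of-experts wrapper

hidden = 1024
expert = torch.nn.Sequential(
    torch.nn.Linear(hidden, 4 * hidden),
    torch.nn.GELU(),
    torch.nn.Linear(4 * hidden, hidden),
)
# Top-1 gating over 8 experts; the layer's forward returns
# (output, aux_loss, expert_counts).
moe = MoE(hidden_size=hidden, expert=expert, num_experts=8, k=1)
```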

Commits on Mar 8, 2024

  1. remove best_download as dependency (#1179)

    * Update requirements.txt
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    haileyschoelkopf and github-actions authored Mar 8, 2024
    63b9fa1
  2. 90d4cb3
  3. clean up dockerfile: (#1175)

    - Eliminate already installed apt packages
    - The sparse attention requirement led to a triton downgrade
    - flash attn is already part of the ngc container (in another version
      that is compatible with TE)
    tf-nv authored Mar 8, 2024
    8c13642
  1. When using kv cache and flash attention in conjunction, it's crucial to set the causal parameter of flash_varlen_qkv_fn to False. Failing to do so will lead to inaccurate results. (#1178)
    chaochen99 authored Mar 8, 2024
    c1fa994
  5. Remove gas from Pythia configs (#1181)

    Fixes #1165
    
    Co-authored-by: Yang Zhang <[email protected]>
    yang and yang authored Mar 8, 2024
    1e7abe7
  6. Fix moe_loss in gpt_j_residual path (#1180)

    Fixes #1174
    
    Co-authored-by: Yang Zhang <[email protected]>
    yang and yang authored Mar 8, 2024
    82ddc66

Commits on Mar 10, 2024

  1. Add Mamba Architecture (#1157)

    * initial mamba support (no kernels, no parallelism)
    
    * Mamba runs! Also, add flags for sel. scan and conv1d fused kernels
    
    * Update NeoXArgs docs automatically
    
    * add mamba_inner_fn ; try really hard to make A_log and D no-WD and stored in fp32
    
    * cleanup print statements
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    * add draft conversion script (tested working TP=1)
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    * update parallelism checks for mamba; partition activations works
    
    * add mamba requirements
    
    * clean up and better comment mamba code
    
    * clean up and better comment mamba code
    
    * update arg validation in mamba
    
    * more cleanup
    
    * add flag for fp32 Alog/D, add init_methods support for mamba
    
    * Update NeoXArgs docs automatically
    
    * update conversion script name, add docstring
    
    * name conversion script
    
    * Update NeoXArgs docs automatically
    
    * add demo configs
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    * add arguments to control conv and (in,out)_proj biases in mamba separately
    
    * Update NeoXArgs docs automatically
    
    * make x_proj bias also controlled by flag
    
    * Update NeoXArgs docs automatically
    
    * pre-commit, add comments
    
    * Update NeoXArgs docs automatically
    
    * Add mamba import print
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Mar 10, 2024
    6809bbc
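
Note: for orientation, a hedged sketch of a standalone Mamba block via the mamba-ssm package (assumed to be among the added mamba requirements); gpt-neox's integrated layer, with its parallelism and init handling, differs.

```python
import torch
from mamba_ssm import Mamba  # assumes the mamba-ssm package

block = Mamba(d_model=512, d_state=16, d_conv=4, expand=2).cuda()
x = torch.randn(2, 128, 512, device="cuda")
y = block(x)  # (batch, seq, d_model) -> same shape; mixing via selective scan
```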

Commits on Mar 13, 2024

  1. Switch to using Cuda Flash Attn for Alibi (#1183)

    * add cuda support for flash attn w/ alibi, warn of deprecation of triton
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    haileyschoelkopf and github-actions authored Mar 13, 2024
    03186de

Commits on Mar 15, 2024

  1. Mamba + Tensor Parallel Support (#1184)

    * TP works!
    
    * merge TP mamba changes with most current MambaLayer
    
    * cleanup TP, confirmed working still
    
    * make shapes with TP>1 work with conversion
    
    * tested and PP works, so no need for assert blocking it in arguments
    
    * update comment
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Mar 15, 2024
    277141e

Commits on Mar 19, 2024

  1. [ZeRO-3] Partitioned init with deepspeed.zero.Init() (#1190)

    * added ds zero.Init() to get_model
    
    * Clean up conditional with block
    
    * pre-commit
    
    ---------
    
    Co-authored-by: Quentin Anthony <[email protected]>
    R0n12 and Quentin-Anthony authored Mar 19, 2024
    7267a74
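
Note: a hedged sketch of ZeRO-3 partitioned init: constructing the model inside deepspeed.zero.Init() shards parameters across ranks as modules are built, instead of materializing full weights on every rank. The Linear is a stand-in for the NeoX model build.

```python
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
}

# Parameters are partitioned at construction time; the follow-up commit
# (#1191) ensures NeoX passes its DeepSpeed config here explicitly.
with deepspeed.zero.Init(config_dict_or_path=ds_config):
    model = torch.nn.Linear(4096, 4096)
```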

Commits on Mar 26, 2024

  1. Small typo in the README

    edouardoyallon committed Mar 26, 2024
    e6b5261
  2. Merge pull request #1196 from edouardoyallon/typo_readme

    ENH Small typo in the README
    StellaAthena authored Mar 26, 2024
    4085302
  3. Added more papers

    StellaAthena authored Mar 26, 2024
    1960b66
  4. Update README.md

    StellaAthena authored Mar 26, 2024
    3616658

Commits on Apr 1, 2024

  1. making PR triggered CPU test for changes to megatron (#1195)

    * making PR triggered CPU test for changes to megatron
    
    * Update NeoXArgs docs automatically
    
    * pre-commit
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Apr 1, 2024
    977448e
  2. [AMD] Supporting fused kernels build using JIT (#1188)

    * initial JIT load functions
    
    * passing neox_arge to load() as optional for easy testing
    
    * modified headers for correct copyright statements
    R0n12 authored Apr 1, 2024
    51a7de9
  3. [ZeRO-3] Ensured passing neox deepspeed_config when using partitioned init (#1191)
    
    * added ds zero.Init() to get_model
    
    * Clean up conditional with block
    
    * pre-commit
    
    * ensured deepspeed configs are passed to init
    
    ---------
    
    Co-authored-by: Quentin Anthony <[email protected]>
    R0n12 and Quentin-Anthony authored Apr 1, 2024
    01657aa