
Merge pull request #14 from RWKV/PicoCreator-patch-1
Update README.md
PicoCreator authored Aug 22, 2023
2 parents 505622b + 1013082 commit ea1aa52
Showing 1 changed file with 4 additions and 2 deletions: README.md
@@ -127,9 +127,11 @@ You can find the training channel on our discord here: https://discord.com/chann

## Should I use the official RWKV-LM trainer or the infctx trainer?

- Generally, if you're training a foundation model from scratch - with a fixed context size, and you need the absolute highest throughput across multiple nodes (i.e. 10 nodes filled with A100 servers), the [official trainer](https://github.com/BlinkDL/RWKV-LM) should perform better.
+ Generally, if you're training a foundation model from scratch - with a fixed context size, and you need the absolute highest throughput across multiple nodes (i.e. 10 nodes filled with A100 servers), the [official trainer](https://github.com/BlinkDL/RWKV-LM) would perform much better (i.e. roughly 2x faster, depending on the settings).

- If you need deepspeed 3 support, or you deal with dynamic datasets, this trainer is much more flexible, for most nearly all other use cases.
+ If you need deepspeed 3 support, or you deal with dynamic datasets, this trainer is much more flexible for nearly all other use cases.

+ Over time, as we optimize the infctx trainer, the gap to the official trainer should shrink; however, this is not the highest priority (infctx working > absolute speed).

## Some long term architecture goals

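As background for the "deepspeed 3 support" mentioned in the diff: below is a minimal sketch of how DeepSpeed ZeRO stage 3 is typically enabled through PyTorch Lightning, which the infctx trainer builds on. This is not the repo's actual entry point, and `MyRWKVModule` / `MyDataModule` are hypothetical placeholders, not classes from this codebase.

```python
# Minimal sketch, assuming a stock PyTorch Lightning 2.x setup
# (not the infctx trainer's actual launch script).
import lightning as L

trainer = L.Trainer(
    accelerator="gpu",
    devices=8,                     # GPUs per node
    num_nodes=1,
    precision="bf16-mixed",
    strategy="deepspeed_stage_3",  # shards optimizer state, gradients, and parameters
)

# MyRWKVModule and MyDataModule are placeholders for illustration only:
# trainer.fit(MyRWKVModule(), datamodule=MyDataModule())
```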

