pulling main updates to rwkv-x-playground #89

Merged · 153 commits · Apr 10, 2024
Commits (153)
ff516a1
WIP baseline run
pic-o Jan 18, 2024
a4ab031
WIP torch compile test
PicoCreator Jan 19, 2024
2a015d0
Tweak torch compile to show a warning
PicoCreator Jan 19, 2024
a8585e1
wip runs
pic-o Jan 19, 2024
5c60ab9
wip train, exp sloss train
pic-o Jan 19, 2024
ee976dd
fixing the prefix masking setting
pic-o Jan 19, 2024
783f36e
docker build for updated cuda version
PicoCreator Jan 19, 2024
1a772b1
Merge branch 'rwkv-x-selective-loss-exp' of https://github.com/RWKV/R…
PicoCreator Jan 19, 2024
5288410
Merge pull request #59 from RWKV/rwkv-x-selective-loss-exp
PicoCreator Jan 19, 2024
b1ab77f
tweaking build rules
PicoCreator Jan 19, 2024
ccc2cb9
Merge pull request #60 from RWKV/rwkv-x-selective-loss-exp
PicoCreator Jan 19, 2024
a6eb626
building the cuda env
PicoCreator Jan 19, 2024
64cf80f
test building on x series
PicoCreator Jan 19, 2024
8dd3eac
trying to trigger GH actions
PicoCreator Jan 19, 2024
6aae950
Merge pull request #61 from RWKV/rwkv-x-selective-loss-exp
PicoCreator Jan 19, 2024
4546994
trying to trigger the GH docker action
PicoCreator Jan 19, 2024
42cf1af
still trying to fix the build
PicoCreator Jan 19, 2024
f088536
7B baseline
PicoCreator Jan 19, 2024
bbcb751
wip 7b baseline
PicoCreator Jan 19, 2024
c00a27e
sloss perf
PicoCreator Jan 19, 2024
9426405
7B baseline
PicoCreator Jan 20, 2024
38c30f6
Fixed DS 3?
pic-o Jan 20, 2024
e0aad53
deepspeed and multi gpu validation
pic-o Jan 20, 2024
3186764
Merge pull request #63 from RWKV/ds3-fix
PicoCreator Jan 20, 2024
1483339
7B benchmarking
pic-o Jan 21, 2024
c251e14
benchmark file tweak
pic-o Jan 21, 2024
01835f4
tweak
pic-o Jan 21, 2024
b237fb3
WIP 1B5 baseline
pic-o Jan 21, 2024
9438607
WIP 1B5 baseline and sloss runs
pic-o Jan 21, 2024
86b3aea
WIP experiments
pic-o Jan 21, 2024
a3a8f00
Fixing validation loss code
pic-o Jan 21, 2024
3516b74
1B5 - enwiki / sloss runs
pic-o Jan 21, 2024
5d0bcd8
100k run
pic-o Jan 21, 2024
463b5ee
fixing spikes in token/s tracking
pic-o Jan 22, 2024
5e9802f
reverted how KTokens are measure (the newer graph is wierder)
pic-o Jan 22, 2024
695e4c2
WIP 1B5 run
pic-o Jan 22, 2024
325c53a
wip iteration
pic-o Jan 23, 2024
ebab809
WIP returne
pic-o Jan 23, 2024
54c816a
1B5 memory finetune
PicoCreator Jan 23, 2024
dc99b9b
1.5b and 3b runs
PicoCreator Jan 23, 2024
d57efd2
3B & 7B runs
PicoCreator Jan 24, 2024
66e37ad
Merge pull request #65 from RWKV/rwkv-x-selective-loss-exp
PicoCreator Jan 24, 2024
20d0411
reverting lr calc
pic-o Jan 24, 2024
767f46e
example of S3 based dataset
pic-o Jan 24, 2024
425e8db
updating config example
pic-o Jan 24, 2024
47cc775
tweak example yaml
pic-o Jan 24, 2024
a1b1046
change default tokenizer to world
pic-o Jan 25, 2024
6e89d63
Bug fixes, and feature patches from @smerky
pic-o Jan 25, 2024
2db0b8d
Merge pull request #66 from RWKV/remote-dataset-support
PicoCreator Jan 25, 2024
0bea4d9
fixing total token counting, maybe?
pic-o Jan 25, 2024
f976b7e
WIP datapack implementaiton
pic-o Jan 28, 2024
5605acc
WIP datapack implementation
pic-o Jan 28, 2024
41710dc
Working datapack builder
pic-o Jan 28, 2024
53cf968
Working datapack packing
pic-o Jan 28, 2024
2164db0
Format tweaking
pic-o Jan 28, 2024
2ad2ef7
run tweak
pic-o Jan 28, 2024
6c35546
Merge pull request #68 from RWKV/rwkv-x-interweave-datapack
PicoCreator Jan 28, 2024
84fb00a
Merge pull request #69 from RWKV/rwkv-x-interweave-datapack
PicoCreator Jan 28, 2024
4ee43f2
Updating exampels to v5?
pic-o Jan 31, 2024
fa2c3eb
bug fixes
pic-o Jan 31, 2024
db37891
WIP Eagle Finetune Notebook
pic-o Jan 31, 2024
0530723
wip tune examples
PicoCreator Jan 31, 2024
815492b
chat format fix
PicoCreator Jan 31, 2024
f6f23e5
@smerkyg optimized timemix
PicoCreator Jan 31, 2024
4deb31a
Automatically disable CUDA, if its not detected
PicoCreator Jan 31, 2024
dd8998e
Fixing the batching dummy passes
PicoCreator Jan 31, 2024
75f2a4c
fixing dummay_batch_zero
PicoCreator Jan 31, 2024
52dbb5f
precision fixes
SmerkyG Jan 31, 2024
36a4cd4
Adding no cuda auto toggle, adding torch compile (again)
PicoCreator Jan 31, 2024
fce7dbc
Empty shape skip code
PicoCreator Jan 31, 2024
97c8df9
capybara chat example
PicoCreator Jan 31, 2024
ded473e
capybara
PicoCreator Jan 31, 2024
fd0342f
simplified padding
SmerkyG Jan 31, 2024
d4fa285
Merge pull request #71 from RWKV/rwkv-x-interweave-datapack
PicoCreator Feb 1, 2024
fe440bb
various finetune examples
PicoCreator Feb 2, 2024
a9a5912
multipack example tweak
PicoCreator Feb 2, 2024
c5b98f3
WIP Eagle-x
PicoCreator Feb 2, 2024
06e4280
use chunklen128 with float64 and tighter decay clamp for JIT training…
SmerkyG Feb 2, 2024
7444e63
Revert "use chunklen128 with float64 and tighter decay clamp for JIT …
PicoCreator Feb 2, 2024
90059d3
multipack
PicoCreator Feb 2, 2024
2c4fc86
Enofrcing NO_CUDA by default, and disabling torch compile by default
PicoCreator Feb 2, 2024
6d52048
tweaking defaults
PicoCreator Feb 2, 2024
a7b090d
Merge pull request #72 from RWKV/rwkv-x-eagle-notebooks
PicoCreator Feb 2, 2024
7f7f61f
typo in checkpoint size
PicoCreator Feb 2, 2024
786889d
updated examples
PicoCreator Feb 2, 2024
00274ed
Merge pull request #73 from RWKV/rwkv-x-eagle-notebooks
PicoCreator Feb 2, 2024
021424f
cuda 12-1 build
PicoCreator Feb 3, 2024
24cd174
updated github runner template
PicoCreator Feb 3, 2024
08d5296
Fixing cosign, for docker img
PicoCreator Feb 3, 2024
aca143b
bumping docker build process
PicoCreator Feb 3, 2024
34b3616
base model tweak
PicoCreator Feb 3, 2024
2e33f4b
docker build bump
PicoCreator Feb 3, 2024
d606d4a
fixing docker ref
PicoCreator Feb 3, 2024
4ddbf83
change logging format, to reduce confusion
pic-o Feb 3, 2024
b3a32c0
move to experiment folder
pic-o Feb 3, 2024
695545d
preparing experiment notebooks
pic-o Feb 3, 2024
d4e2d3e
Added notes on runnign the runner
pic-o Feb 3, 2024
fea8459
Merge remote-tracking branch 'origin/main' into rwkv-x-eagle-notebooks
pic-o Feb 3, 2024
d9a6211
include any option
pic-o Feb 3, 2024
3d5fc0f
wip benchmarks
PicoCreator Feb 4, 2024
9c7744c
tweak
PicoCreator Feb 4, 2024
59dc19e
Merge branch 'rwkv-x-eagle-notebooks' of https://github.com/RWKV/RWKV…
PicoCreator Feb 4, 2024
c42c7c0
Fixing the LR for batched
pic-o Feb 4, 2024
fe9bd2c
Merge branch 'rwkv-x-eagle-notebooks' of https://github.com/RWKV/RWKV…
pic-o Feb 4, 2024
475558d
fixing mask sum calc
pic-o Feb 4, 2024
0fd1387
wip calibration
PicoCreator Feb 4, 2024
a2443d7
enwiki 16k test
PicoCreator Feb 4, 2024
85ce0c1
Update notebook title in enwiki-16k-3e-5.ipynb
PicoCreator Feb 4, 2024
18f52d0
Fixing multipack
PicoCreator Feb 4, 2024
ade9b35
loss validation run
PicoCreator Feb 4, 2024
efced7f
WIP datapack fixing code
PicoCreator Feb 4, 2024
e5deb69
Update learning rate initialization and finalization values
PicoCreator Feb 4, 2024
5e1b715
WIP benchmarks
PicoCreator Feb 4, 2024
6d7ca1c
drop dataset_name and dataset_index
PicoCreator Feb 4, 2024
dd9c79d
Merge branch 'rwkv-x-eagle-notebooks' of https://github.com/RWKV/RWKV…
PicoCreator Feb 4, 2024
0ee162d
the datapack code
PicoCreator Feb 4, 2024
44fb0be
support custom dataset split
PicoCreator Feb 4, 2024
6f1e02e
config update
PicoCreator Feb 4, 2024
601986c
WIP tweaks
PicoCreator Feb 4, 2024
dfccbc7
wip MultiPack train
PicoCreator Feb 4, 2024
74deedb
tweaks
PicoCreator Feb 6, 2024
429c8c1
prototype train/test split swap - because sometimes you need that
pic-o Feb 6, 2024
0654172
Merge branch 'rwkv-x-eagle-notebooks' of https://github.com/RWKV/RWKV…
pic-o Feb 6, 2024
e02907a
fixing multi-gpu sync
pic-o Feb 6, 2024
6b04127
Merge pull request #75 from RWKV/rwkv-x-eagle-notebooks
PicoCreator Feb 6, 2024
bea9014
enforce DropLast for the distributed sampler to work around set issues
PicoCreator Feb 6, 2024
d9f8b8d
Merge pull request #76 from RWKV/rwkv-x-eagle-notebooks
PicoCreator Feb 6, 2024
727e080
Add notes to README.md and update config-example.yaml
PicoCreator Feb 7, 2024
5ffe8cb
Added nvidia-smi safety within the container, stops and restart on mi…
pic-o Feb 7, 2024
6e792a2
Merge branch 'rwkv-x-eagle-notebooks' of https://github.com/RWKV/RWKV…
pic-o Feb 7, 2024
ba60880
Merge pull request #77 from RWKV/rwkv-x-eagle-notebooks
PicoCreator Feb 8, 2024
7a92b7b
fixing finetune recommendations
pic-o Feb 8, 2024
b9f1f1a
WIP dataset behaviour tweak
PicoCreator Feb 9, 2024
fe5f734
enable no cuda by default
PicoCreator Feb 9, 2024
539e8ed
Merge branch 'rwkv-x-eagle-notebooks' of https://github.com/RWKV/RWKV…
PicoCreator Feb 9, 2024
143ff31
T2 chunk 1 build process
PicoCreator Feb 10, 2024
976e038
Multipack instruct tweak
PicoCreator Feb 10, 2024
efa7737
wip datastreaming test
PicoCreator Feb 10, 2024
6861e5e
notebook validation of the continue checkpoint
PicoCreator Feb 11, 2024
605c9a3
the crazy fix - hijack LR schedule code to fix dataset offset
PicoCreator Feb 11, 2024
da81f80
Merge pull request #78 from RWKV/rwkv-x-eagle-notebooks
PicoCreator Feb 11, 2024
73716ed
jit improvement, noncuda fixes
SmerkyG Feb 15, 2024
40c1fb3
Merge pull request #80 from RWKV/smerky_fixes_2024_02_15
PicoCreator Feb 15, 2024
ba2f463
initial rwkv6 and cuda state support for v5
SmerkyG Mar 14, 2024
9c3ab87
model init version support
SmerkyG Mar 14, 2024
7d443f5
v6 chanmix bugfix
SmerkyG Mar 14, 2024
d42fa37
improved v6 assertion to allow non-cuda inference
SmerkyG Mar 15, 2024
fcee20d
add older v6 cuda
SmerkyG Mar 15, 2024
31fe6b6
split v6 into separate directory
SmerkyG Mar 15, 2024
e672079
removed v6 chanmix from v5 dir
SmerkyG Mar 15, 2024
4d215ef
revert usage of v6state cuda for v5
SmerkyG Mar 15, 2024
2908b58
import bugfix, single step non-cuda bugfix
SmerkyG Mar 19, 2024
2528086
Merge pull request #88 from RWKV/rwkv-6-support
SmerkyG Apr 3, 2024
38 changes: 19 additions & 19 deletions .github/workflows/docker-build.yml
@@ -1,15 +1,15 @@
name: Docker Env Image (cuda-11-8)
name: Docker Env Image (cuda-12-1)

on:
push:
branches: [ "main" ]
branches: [ "main", "rwkv-x-*" ]
# Publish semver tags as releases.
tags: [ 'v*.*.*' ]
# Reduce build to only for the valid path
paths:
- docker/**
pull_request:
branches: [ "main" ]
branches: [ "main", "rwkv-x-*" ]
paths:
- docker/**

@@ -21,7 +21,7 @@ env:

jobs:
build_env:
name: Docker Env Image (cuda-11-8)
name: Docker Env Image (cuda-12-1)

runs-on: ubuntu-latest
permissions:
@@ -71,9 +71,9 @@ jobs:
# https://github.com/sigstore/cosign-installer
- name: Install cosign
if: github.event_name != 'pull_request'
uses: sigstore/cosign-installer@f3c664df7af409cb4873aa5068053ba9d61a57b6 #v2.6.0
with:
cosign-release: 'v1.11.0'
uses: sigstore/cosign-installer@v3.3.0
# with:
# cosign-release: 'v2.2.0'

# Workaround: https://github.com/docker/build-push-action/issues/461
- name: Setup Docker buildx
@@ -103,20 +103,20 @@

# Build and push Docker image with Buildx (don't push on PR)
# https://github.com/docker/build-push-action
- name: Build and push Docker image (env-cuda-11-8)
- name: Build and push Docker image (env-cuda-12-1)
id: build-and-push
uses: docker/build-push-action@v4
with:
context: "{{defaultContext}}:docker/env-cuda-11-8"
context: "{{defaultContext}}:docker/env-cuda-12-1"
push: ${{ github.event_name != 'pull_request' }} # Don't push on PR
tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME_LC }}:env-cuda-11-8
tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME_LC }}:env-cuda-12-1
# tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha,src=docker/env-cuda-11-8
cache-from: type=gha,src=docker/env-cuda-12-1
cache-to: type=gha,mode=max

build_runner:
name: Docker Env Image (github-worker-11-8)
name: Docker Env Image (github-worker-12-1)

needs: build_env
runs-on: ubuntu-latest
@@ -167,9 +167,9 @@ jobs:
# https://github.com/sigstore/cosign-installer
- name: Install cosign
if: github.event_name != 'pull_request'
uses: sigstore/cosign-installer@f3c664df7af409cb4873aa5068053ba9d61a57b6 #v2.6.0
with:
cosign-release: 'v1.11.0'
uses: sigstore/cosign-installer@v3.3.0
# with:
# cosign-release: 'v2.2.0'

# Workaround: https://github.com/docker/build-push-action/issues/461
- name: Setup Docker buildx
@@ -199,14 +199,14 @@

# Build and push Docker image with Buildx (don't push on PR)
# https://github.com/docker/build-push-action
- name: Build and push Docker image (github-worker-cuda-11-8)
- name: Build and push Docker image (github-worker-cuda-12-1)
id: build-and-push
uses: docker/build-push-action@v4
with:
context: "{{defaultContext}}:docker/github-worker-cuda-11-8"
context: "{{defaultContext}}:docker/github-worker-cuda-12-1"
push: ${{ github.event_name != 'pull_request' }} # Don't push on PR
tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME_LC }}:github-worker-cuda-11-8
tags: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME_LC }}:github-worker-cuda-12-1
# tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha,src=docker/github-worker-cuda-11-8
cache-from: type=gha,src=docker/github-worker-cuda-12-1
cache-to: type=gha,mode=max
1 change: 1 addition & 0 deletions .gitignore
@@ -150,6 +150,7 @@ dmypy.json
# and standard hidden files ignore. Including
# example files generated via notebook tutorials
.*
scratch/
model/
dataset/
datapath/
6 changes: 3 additions & 3 deletions README.md
@@ -42,9 +42,9 @@ conda update conda
conda create -n rwkv-infctx python=3.11 pip
conda activate rwkv-infctx

# Install pytorch (>=2.0.1)
conda install -y pytorch==2.0.1 torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
python -m pip install lightning==2.0.5 deepspeed==0.10.0
# Install pytorch (>=2.1.2)
conda install -y pytorch==2.1.2 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
python -m pip install lightning==2.1.3 deepspeed==0.12.6

# Currently for torch.compile + 3.11 to work, for some platform, you will need the nightly build
# if so you may need to try the following instead - this is considered highly "unstable"
50 changes: 46 additions & 4 deletions RWKV-v5/config-example.yaml
@@ -325,6 +325,17 @@ model:
# dim_att: null
# dim_ffn: null
data:
# Skip the datapath setup
#
# ignored if using the preload_datapath.py, useful for speeding up the trainer startup
# provided you have your datasets all properly preinitialized
# ---
# skip_datapath_setup: True

# Datapack config yaml to use instead, this overwrites all other settings below
# ---
# datapack_config_path: null

# dataset_path for the prebuilt dataset, using HF `load_from_disk()`
#
# Use this if you have built your own dataset and saved it with `save_to_disk()`
@@ -334,6 +345,23 @@
# If using relative path, this should be relative to the trainer script path
data_path: /path/to/store/your/data_path/

# Data path storage options, this is used to support cloud storage
# via the huggingface dataset API. See:
# https://huggingface.co/docs/datasets/v2.16.1/en/filesystems#amazon-s3
#
# Note: As of Jan 2023, these options have only been tested with AWS S3 and Backblaze. YMMV
# For S3 bucket support you will also need to install s3fs `python3 -m pip install s3fs`
#
# If you want to reduce the risk of accidental key/secret commits, you can use
# `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables instead
#
# For datapath, it should use the `s3://bucket-name/subpath` format
# ---
# data_path_storage_options:
# key: <example S3 key>
# secret: <example S3 secret>
# endpoint_url: <example S3 endpoint>

# Otherwise provide the source path, which is used as the huggingface dataset path
# this will be used to populate the dataset_path
#
@@ -349,6 +377,10 @@
# source: "teven/enwiki_00k" # Hugging face dataset
# source: text # Text mode, used with source_data_dir

# Dataset split to use from HF dataset
# ---
# source_dataset_split: train

# Additional source dataset params, used to grab subsets of the dataset
# ---
# source_dataset_params:
@@ -395,6 +427,7 @@ data:

# Custom text column to use, useful for dataset with alternative training columns labels
# This is checked before multi column merging, default is null (disabled)
# If set this takes priority
# eg: 'code'
# ---
# custom_text_key: 'code'
@@ -407,19 +440,18 @@ data:
# or throw an error if the default fallback is not found
#
# IMPORTANT NOTE: as newlines are commonly used for multi_column_suffix, etc.
# you should use single quotes to ensure such values don't get escaped.
# eg. multi_column_suffix: ['\n\n']
# you should use double quotes to ensure such values don't get escaped.
# eg. multi_column_suffix: ["\n\n"]
#
# See: https://github.com/RWKV/RWKV-infctx-trainer/issues/34
# Need to use " or the new lines won't be tokenized properly
# ---
# multi_column_keys: ["instruction", "input", "output"]
# multi_column_prefix: ["Instruction:\n", "Input:\n", "Output:\n"]
# multi_column_suffix: ["\n\n", "\n\n", "\n\n"]
# multi_column_suffix: ['', '', '']
# multi_column_train_mask: [true, false, true]
# multi_column_separator: "\n\n"
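The quoting caveat above is easy to verify: in YAML, single-quoted strings do not process escape sequences, so `'\n\n'` arrives as a literal backslash-n pair rather than real newlines, which then tokenize differently. A small sketch (assuming PyYAML is available; it is not part of the trainer itself):

```python
import yaml

# Single-quoted YAML scalar: escapes are NOT processed -> literal backslash + 'n'
single = yaml.safe_load("multi_column_suffix: ['\\n\\n']")
# Double-quoted YAML scalar: escapes ARE processed -> real newline characters
double = yaml.safe_load('multi_column_suffix: ["\\n\\n"]')

print(repr(single["multi_column_suffix"][0]))  # '\\n\\n' (four characters, no newlines)
print(repr(double["multi_column_suffix"][0]))  # '\n\n'   (two newline characters)
```

Only the double-quoted form produces the newline separators the multi-column merge expects.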


# Conversation merging process
# useful for merging full conversational datasets, into single documents
# default is off, (or set conversation_key to [])
@@ -504,6 +536,16 @@ data:
# this can be used together with sort_by_length, otherwise a shuffle will be done
packing_in_sequence: False

# ----------------------------
# Special use case flags
# ----------------------------

# Reverse the training dataset order before saving. This is useful for
# optimizing the dataset packing process when using packing_in_sequence
# and sort_by_length in descending order together
reverse_train_dataset_before_save: False
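A toy illustration of how this flag interacts with descending-length sorting: the packer walks the sorted dataset front-to-back, and reversing before saving flips the stored order so training consumes it shortest-first. The list below is illustrative only, not trainer code:

```python
# Illustrative only: descending-length sort plus a final reverse.
docs = ["aaaa", "bb", "ccc", "d"]

# sort_by_length in descending order: longest documents first, which lets
# the packer fill fixed-size context windows greedily.
by_len_desc = sorted(docs, key=len, reverse=True)

# reverse_train_dataset_before_save: flip the saved order, so the training
# run sees the packed data shortest-first instead of longest-first.
saved_order = list(reversed(by_len_desc))

print(by_len_desc)   # ['aaaa', 'ccc', 'bb', 'd']
print(saved_order)   # ['d', 'bb', 'ccc', 'aaaa']
```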


# Path to the current checkpoint to continue training from
# this should be the directory path, and ends with `.ckpt/`
ckpt_path: null