```
necktwi@CheapFellow:~/workspace/llm.c$ make train_gpt2fp32cu USE_CUDNN=1 CUDNN_FRONTEND_PATH="/home/necktwi/workspace/cudnn-frontend/include"
necktwi@CheapFellow:~/workspace/llm.c$ ./train_gpt2cu
Multi-GPU support is disabled. Using a single GPU.
+-----------------------+-----------------------------------------------------+
| Parameter             | Value                                               |
+-----------------------+-----------------------------------------------------+
| train data pattern    | dev/data/tinyshakespeare/tiny_shakespeare_train.bin |
| val data pattern      | dev/data/tinyshakespeare/tiny_shakespeare_val.bin   |
| output log dir        | NULL                                                |
| checkpoint_every      | 0                                                   |
| resume                | 0                                                   |
| micro batch size B    | 4                                                   |
| sequence length T     | 1024                                                |
| total batch size      | 4096                                                |
| LR scheduler          | cosine                                              |
| learning rate (LR)    | 3.000000e-04                                        |
| warmup iterations     | 0                                                   |
| final LR fraction     | 1.000000e+00                                        |
| weight decay          | 0.000000e+00                                        |
| skip update lossz     | 0.000000                                            |
| skip update gradz     | 0.000000                                            |
| max_steps             | -1                                                  |
| val_loss_every        | 20                                                  |
| val_max_steps         | 20                                                  |
| sample_every          | 20                                                  |
| genT                  | 64                                                  |
| overfit_single_batch  | 0                                                   |
| use_master_weights    | enabled                                             |
| gelu_fusion           | 0                                                   |
| recompute             | 1                                                   |
+-----------------------+-----------------------------------------------------+
| device                | NVIDIA GeForce RTX 2060                             |
| peak TFlops           | -1.0                                                |
| precision             | BF16                                                |
+-----------------------+-----------------------------------------------------+
| weight init method    | gpt2_124M_bf16.bin                                  |
| max_sequence_length T | 1024                                                |
| vocab_size V          | 50257                                               |
| padded_vocab_size Vp  | 50304                                               |
| num_layers L          | 12                                                  |
| num_heads NH          | 12                                                  |
| channels C            | 768                                                 |
| num_parameters        | 124475904                                           |
+-----------------------+-----------------------------------------------------+
| train_num_batches     | 74                                                  |
| val_num_batches       | 20                                                  |
+-----------------------+-----------------------------------------------------+
| run hellaswag         | no                                                  |
+-----------------------+-----------------------------------------------------+
| Zero Optimization is disabled                                               |
| num_processes         | 1                                                   |
| zero_stage            | 0                                                   |
+-----------------------+-----------------------------------------------------+
num_parameters: 124475904 => bytes: 248951808
allocated 237 MiB for model parameters
batch_size B=4 * seq_len T=1024 * num_processes=1 and total_batch_size=4096 => setting grad_accum_steps=1
allocating 237 MiB for parameter gradients
allocating 1326 MiB for activations
allocating 474 MiB for AdamW optimizer state m
allocating 474 MiB for AdamW optimizer state v
allocating 474 MiB for master copy of params
device memory usage: 3652 MiB / 5740 MiB
memory per sequence: 331 MiB -> estimated maximum batch size: 10
[CUDNN ERROR] at file llmc/cudnn_att.cpp:120: [cudnn_frontend] Error: No execution plans support the graph.
```
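A likely cause (an assumption based on the hardware listed above, not confirmed by the log): cuDNN's fused/flash attention engines generally require an Ampere-class GPU (compute capability 8.0 or newer), while the GeForce RTX 2060 is a Turing part (compute capability 7.5), so cudnn-frontend finds no execution plan that supports the attention graph. A minimal standalone sketch to verify the device's compute capability (hypothetical helper, not part of llm.c):

```c
// check_sm.cu -- hypothetical standalone check, compile with: nvcc check_sm.cu -o check_sm
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // query device 0
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    // Assumption: cuDNN fused attention needs SM 8.0+ (Ampere or newer).
    // Turing parts such as the RTX 2060 report 7.5 here.
    if (prop.major < 8) {
        printf("cuDNN fused attention is likely unsupported on this GPU.\n");
    }
    return 0;
}
```

If the device does report a compute capability below 8.0, rebuilding without `USE_CUDNN=1` should select llm.c's own (non-cuDNN) attention kernels instead.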