flux lora training does not start #1589

Open
nim00e opened this issue Sep 10, 2024 · 3 comments

nim00e commented Sep 10, 2024

```
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 3
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 3
num epochs / epoch数: 250
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 750
steps: 0%| | 0/750 [00:00<?, ?it/s]2024-09-10 15:23:27 INFO unet dtype: torch.float16, device: cpu train_network.py:1046
INFO text_encoder [0] dtype: torch.float16, device: cpu train_network.py:1052
INFO text_encoder [1] dtype: torch.float16, device: cpu train_network.py:1052

epoch 1/250
INFO epoch is incremented. current_epoch: 0, epoch: 1 train_util.py:668
Traceback (most recent call last):
File "/workspace/nima_workspace/kohya_ss/sd-scripts/flux_train_network.py", line 519, in
trainer.train(args)
File "/workspace/nima_workspace/kohya_ss/sd-scripts/train_network.py", line 1141, in train
noise_pred, target, timesteps, huber_c, weighting = self.get_noise_pred_and_target(
File "/workspace/nima_workspace/kohya_ss/sd-scripts/flux_train_network.py", line 380, in get_noise_pred_and_target
model_pred = unet(
File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/nima_workspace/kohya_ss/sd-scripts/library/flux_models.py", line 1008, in forward
img = self.img_in(img)
File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 117, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half
steps: 0%| | 0/750 [02:24<?, ?it/s]
Traceback (most recent call last):
File "/workspace/miniconda3/envs/kohya/bin/accelerate", line 8, in
sys.exit(main())
File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
simple_launcher(args)
File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/miniconda3/envs/kohya/bin/python', '/workspace/nima_workspace/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', '/workspace/nima_workspace/kohya_ss/experiments/test_lora/img_2/model/config_lora-20240910-152248.toml']' returned non-zero exit status 1.
```

The training does not start and exits with the above error.

I see that torch.cuda.is_available() returns True, so why is the device for the text encoder and unet cpu?
Any help is appreciated.
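
For what it's worth, the RuntimeError at the bottom of the traceback is just a dtype mismatch at a Linear layer: the input tensor reaches `img_in` in float32 while the model weights are float16. A minimal standalone sketch (not using the kohya-ss code, shapes made up) that reproduces the same message:

```python
import torch
import torch.nn as nn

layer = nn.Linear(8, 8).half()  # weights in float16, like "unet dtype: torch.float16" in the log
x = torch.randn(2, 8)           # input left in float32
layer(x)                        # RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half
```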

kohya-ss (Owner) commented

Your accelerate config seems to be set to CPU. Please run `accelerate config` again to use the GPU.
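
A quick way to check what Accelerate resolves to (just a sketch, assuming it is run inside the same kohya conda environment) is to instantiate an `Accelerator` and print its device:

```python
import torch
from accelerate import Accelerator

print(torch.cuda.is_available())  # reported as True in your case
acc = Accelerator()
print(acc.device)                 # should be cuda:0; "cpu" means the config still forces CPU
print(acc.mixed_precision)        # "no", "fp16" or "bf16", depending on the config
```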

nim00e (Author) commented Sep 12, 2024

I set the accelerate config to GPU and gave [all] for the GPU IDs, but I am still facing this issue.

kohya-ss (Owner) commented

```
steps: 0%| | 0/750 [00:00<?, ?it/s]2024-09-10 15:23:27 INFO unet dtype: torch.float16, device: cpu train_network.py:1046
```

This line shows the U-Net (DiT) is on the CPU. Did the output of this line change?
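
It may also be worth looking at the saved config directly (an assumption on my side, not something from the log): by default Accelerate writes it to `~/.cache/huggingface/accelerate/default_config.yaml`, so a quick check from Python is:

```python
from pathlib import Path

# Default location of the Accelerate config (may differ if HF_HOME is set)
cfg = Path.home() / ".cache" / "huggingface" / "accelerate" / "default_config.yaml"
print(cfg.read_text())  # look for "use_cpu: true" or a missing/empty gpu_ids entry
```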
