running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 3
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 3
num epochs / epoch数: 250
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 750
steps: 0%| | 0/750 [00:00<?, ?it/s]2024-09-10 15:23:27 INFO unet dtype: torch.float16, device: cpu train_network.py:1046
INFO text_encoder [0] dtype: torch.float16, device: cpu train_network.py:1052
INFO text_encoder [1] dtype: torch.float16, device: cpu train_network.py:1052
epoch 1/250
INFO epoch is incremented. current_epoch: 0, epoch: 1 train_util.py:668
Traceback (most recent call last):
  File "/workspace/nima_workspace/kohya_ss/sd-scripts/flux_train_network.py", line 519, in <module>
    trainer.train(args)
  File "/workspace/nima_workspace/kohya_ss/sd-scripts/train_network.py", line 1141, in train
    noise_pred, target, timesteps, huber_c, weighting = self.get_noise_pred_and_target(
  File "/workspace/nima_workspace/kohya_ss/sd-scripts/flux_train_network.py", line 380, in get_noise_pred_and_target
    model_pred = unet(
  File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/nima_workspace/kohya_ss/sd-scripts/library/flux_models.py", line 1008, in forward
    img = self.img_in(img)
  File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 117, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half
steps: 0%| | 0/750 [02:24<?, ?it/s]
Traceback (most recent call last):
  File "/workspace/miniconda3/envs/kohya/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "/workspace/miniconda3/envs/kohya/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/miniconda3/envs/kohya/bin/python', '/workspace/nima_workspace/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', '/workspace/nima_workspace/kohya_ss/experiments/test_lora/img_2/model/config_lora-20240910-152248.toml']' returned non-zero exit status 1.
Training never starts and exits with the error above.
I see that torch.cuda.is_available() returns True, so why is the device for the text encoders and the UNet cpu?
Any help is appreciated.
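For what it's worth, the RuntimeError itself can be reproduced in isolation: it fires whenever a float32 tensor is fed to a Linear layer whose weights are float16, which is what the traceback shows happening inside flux_models.py. This is only a sketch of the failure mode, not the actual kohya fix (in real training the dtypes are supposed to be reconciled by the mixed-precision setup, so the root cause is likely in the config rather than the model code):

```python
import torch

# Sketch of the failure: float16 weights, float32 input -> dtype mismatch
# in F.linear, the same error as in the traceback above.
layer = torch.nn.Linear(4, 4).half()   # weights: torch.float16
x = torch.randn(2, 4)                  # input:   torch.float32

try:
    layer(x)
    raised = False
except RuntimeError as e:
    raised = True
    print(e)

# Making the dtypes agree resolves it; here the layer is cast back to
# float32 for simplicity (in training you would instead cast the input,
# or rely on the accelerate/autocast mixed-precision machinery).
out = layer.float()(x)
print(raised, out.dtype)   # True torch.float32
```

The "Float and Half" pair in the message tells you which side is which: the input (mat1) arrived as float32 while the img_in weights (mat2) are float16, so something upstream is producing float32 latents for a half-precision model.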