Commit: Fix slow performance on 10 series Nvidia GPUs.
comfyanonymous committed Aug 21, 2024
1 parent 015f73d commit a60620d
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions comfy/model_management.py
@@ -668,6 +668,7 @@ def unet_manual_cast(weight_dtype, inference_device, supported_dtypes=[torch.flo
     if bf16_supported and weight_dtype == torch.bfloat16:
         return None
 
+    fp16_supported = should_use_fp16(inference_device, prioritize_performance=True)
     for dt in supported_dtypes:
         if dt == torch.float16 and fp16_supported:
             return torch.float16
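For context, a rough sketch of the selection logic this one-line change affects. This is simplified, with assumed parameter names (fp16_ok, bf16_ok); the real unet_manual_cast computes these from the device itself and handles more dtypes. The point of the commit is that fp16 support is re-evaluated with prioritize_performance=True, steering cards with slow fp16 compute (most 10-series Pascal GPUs) away from a float16 manual cast:

    import torch

    def unet_manual_cast_sketch(weight_dtype, fp16_ok, bf16_ok,
                                supported_dtypes=(torch.float16, torch.bfloat16, torch.float32)):
        # None means the weights can run in their stored dtype; otherwise the
        # returned dtype is what the weights get cast to at inference time.
        if fp16_ok and weight_dtype == torch.float16:
            return None
        if bf16_ok and weight_dtype == torch.bfloat16:
            return None
        for dt in supported_dtypes:
            if dt == torch.float16 and fp16_ok:
                return torch.float16
            if dt == torch.bfloat16 and bf16_ok:
                return torch.bfloat16
        return torch.float32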

8 comments on commit a60620d

@JorgeR81 commented on a60620d Aug 21, 2024

GTX 1070
32 GB RAM
Windows 10
pytorch version: 2.1.0+cu121

No difference for me with this commit:

  • Flux fp8 is still slower, and it uses mostly my CPU (see the sketch after this list).

  • Flux fp16 is still much faster, and it uses mostly my GPU.

  • Flux fp16 still has about the same speed as GGUF (which also uses mostly the GPU).
    All GGUF formats also have about the same speeds (including Q4 and Q8).
    There are diminishing returns as image size increases, especially from 512x768 to 1024x1024 (Diminishing returns with larger sizes city96/ComfyUI-GGUF#35).
    And all formats (fp16, Q4, Q8) have about the same speeds per image size.
    (So there must be a bottleneck somewhere?)
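(An aside on why fp8 can be slow here; this is an assumption, not something confirmed in the thread. Pascal cards have no native fp8 math kernels, so fp8-stored weights must be upcast to the compute dtype on every use, which is what the "manual cast" in the logs below refers to. A minimal sketch, requiring PyTorch >= 2.1 for the float8 dtype:)

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # float8_e4m3fn acts purely as a storage dtype in this sketch; with no
    # fp8 math kernels, every layer pays for an upcast before its matmul.
    weight_fp8 = torch.randn(64, 64, device=device).to(torch.float8_e4m3fn)
    x = torch.randn(1, 64, dtype=torch.float16, device=device)

    y = x @ weight_fp8.to(torch.float16).T  # the upcast repeats on every forward pass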

@brahianrosswill commented on a60620d Aug 21, 2024

> GTX 1070 32 GB RAM Windows 10 pytorch version: 2.1.0+cu121
>
> No difference for me with this commit:
>
> • Flux fp8 is still slower, and it uses mostly my CPU.
> • Flux fp16 is still much faster, and it uses mostly my GPU.
> • Flux fp16 still has about the same speed as GGUF (which also uses mostly the GPU). All GGUF formats also have about the same speeds (including Q4 and Q8). There are diminishing returns as image size increases, especially from 512x768 to 1024x1024 (Diminishing returns with larger sizes city96/ComfyUI-GGUF#35). And all formats (fp16, Q4, Q8) have about the same speeds per image size. (So there must be a bottleneck somewhere?)

Just don't be impatient, my guy. I think my speeds are OK, not amazing. However, I think lllyasviel from Forge is creating some secret sauce that nobody has figured out yet.

@comfyanonymous (Owner, Author) commented

Are you sure you actually updated? Can I see your log?

@JorgeR81 commented on a60620d Aug 22, 2024

I usually just do an "Update All" via the Manager.

Here is the log:

comfyui-logs-1724314579650.json

Console output:
C:\Cui\cu_121_2\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py --force-fp16 --windows-standalone-build
[START] Security scan
[DONE] Security scan
## ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2024-08-22 09:09:18.807357
** Platform: Windows
** Python version: 3.11.6 (tags/v3.11.6:8b6ee5b, Oct  2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)]
** Python executable: C:\Cui\cu_121_2\ComfyUI_windows_portable\python_embeded\python.exe
** ComfyUI Path: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI
** Log path: C:\Cui\cu_121_2\ComfyUI_windows_portable\comfyui.log

Prestartup times for custom nodes:
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Marigold
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\rgthree-comfy
   2.2 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Total VRAM 8192 MB, total RAM 32727 MB
pytorch version: 2.1.0+cu121
Forcing FP16.
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce GTX 1070 : cudaMallocAsync
Using pytorch cross attention
[Prompt Server] web root: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\web
Adding extra search path checkpoints d:/ComfyUI/models/checkpoints/
Adding extra search path clip d:/ComfyUI/models/clip/
Adding extra search path clip_vision d:/ComfyUI/models/clip_vision/
Adding extra search path configs d:/ComfyUI/models/configs/
Adding extra search path controlnet d:/ComfyUI/models/controlnet/
Adding extra search path embeddings d:/ComfyUI/models/embeddings/
Adding extra search path ipadapter d:/ComfyUI/models/ipadapter/
Adding extra search path loras d:/ComfyUI/models/loras/
Adding extra search path unet d:/ComfyUI/models/unet/
Adding extra search path upscale_models d:/ComfyUI/models/upscale_models/
Adding extra search path vae d:/ComfyUI/models/vae/
[ComfyUI-0️⃣ 2️⃣ 4️⃣ 6️⃣ ] Topological Execution is detected.
[ComfyUI-0️⃣ 2️⃣ 4️⃣ 6️⃣ ] Loaded all nodes and apis.
### Loading: ComfyUI-Impact-Pack (V7.1.1)
### Loading: ComfyUI-Impact-Pack (Subpack: V0.6)
[Impact Pack] Wildcards loading done.
### Loading: ComfyUI-Inspire-Pack (V0.86.1)
### Loading: ComfyUI-Manager (V2.50.1)
### ComfyUI Revision: 2594 [a60620dc] | Released on '2024-08-21'
C:\Cui\cu_121_2\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\transformers\transformer_2d.py:34: FutureWarning: `Transformer2DModelOutput` is deprecated and will be removed in version 1.0.0. Importing `Transformer2DModelOutput` from `diffusers.models.transformer_2d` is deprecated and this will be removed in a future version. Please use `from diffusers.models.modeling_outputs import Transformer2DModelOutput`, instead.
  deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)
Total VRAM 8192 MB, total RAM 32727 MB
pytorch version: 2.1.0+cu121
Forcing FP16.
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce GTX 1070 : cudaMallocAsync
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/alter-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/github-stats.json
[ReActor] - STATUS - Running v0.5.1-a6 in ComfyUI
C:\Cui\cu_121_2\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
Torch version: 2.1.0+cu121
[comfyui_controlnet_aux] | INFO -> Using ckpts path: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_controlnet_aux\ckpts
[comfyui_controlnet_aux] | INFO -> Using symlinks: False
[comfyui_controlnet_aux] | INFO -> Using ort providers: ['CUDAExecutionProvider', 'DirectMLExecutionProvider', 'OpenVINOExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider', 'CoreMLExecutionProvider']
DWPose: Onnxruntime with acceleration providers detected
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
Please 'pip install xformers'
Nvidia APEX normalization not installed, using PyTorch LayerNorm

[rgthree] Loaded 42 exciting nodes.

WAS Node Suite: BlenderNeko's Advanced CLIP Text Encode found, attempting to enable `CLIPTextEncode` support.
WAS Node Suite: `CLIPTextEncode (BlenderNeko Advanced + NSP)` node enabled under `WAS Suite/Conditioning` menu.
WAS Node Suite: OpenCV Python FFMPEG support is enabled
WAS Node Suite Warning: `ffmpeg_bin_path` is not set in `C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\was-node-suite-comfyui\was_suite_config.json` config file. Will attempt to use system ffmpeg binaries if available.
WAS Node Suite: Finished. Loaded 219 nodes successfully.

        "You have within you right now, everything you need to deal with whatever the world can throw at you." - Brian Tracy


Import times for custom nodes:
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_Noise
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\websocket_image_save.py
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_Cutoff
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\cg-use-everywhere
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_TiledKSampler
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ADV_CLIP_emb
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-AutomaticCFG
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_InstantID
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_IPAdapter_plus
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Custom-Scripts
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_UltimateSDUpscale
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-GGUF
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\rgthree-comfy
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-0246
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\aegisflow_utility_nodes
   0.1 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_essentials
   0.1 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Inspire-Pack
   0.1 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_controlnet_aux
   0.2 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_smZNodes
   0.3 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Marigold
   0.3 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_FaceAnalysis
   0.4 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\PuLID_ComfyUI
   0.5 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager
   0.8 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-reactor-node
   1.2 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Impact-Pack
   2.4 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\was-node-suite-comfyui

Starting server

To see the GUI go to: http://127.0.0.1:8188
FETCH DATA from: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json [DONE]
got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
model weight dtype torch.float16, manual cast: None
model_type FLUX
Requested to load FluxClipModel_
Loading 1 new model
loaded completely 0.0 4778.66552734375 True
Requested to load Flux
Loading 1 new model
loaded partially 5928.819987487793 5919.8790283203125 0
100%|██████████████████████████████████████████████████████████████████████████████████| 12/12 [02:58<00:00, 14.86s/it]
Requested to load AutoencodingEngine
Loading 1 new model
loaded completely 0.0 319.7467155456543 True
Prompt executed in 336.28 seconds

@JorgeR81 commented on a60620d Aug 22, 2024

The commit "fp16 is actually faster than fp32 on a GTX 1080" also made no difference.

So I ran update_comfyui_and_python_dependencies.bat.

This got me to pytorch 2.4!

I thought it would only take me to 2.3.1 (I was on 2.1.0).

Let's see how it goes.
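(A quick way to confirm which build actually got installed, for anyone checking the same thing; torch.version.cuda reports the CUDA toolkit the wheel was built against:)

    import torch

    print(torch.__version__)               # e.g. 2.4.0+cu121
    print(torch.version.cuda)              # e.g. 12.1
    print(torch.cuda.get_device_name(0))   # e.g. NVIDIA GeForce GTX 1070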


EDIT:

  • Flux fp16 speed seems to be about the same (about 27 s/it for a 1024x1024 image).

  • Flux fp8 is now slightly faster than fp16 (about 23 s/it).

  • Q4_K_S and Q8_0 seem to have similar speeds to FP8 and FP16, respectively.
    Their advantage, for me, is still lower RAM usage overall and staying below 32 GB of RAM while loading.

Here is the log now:

comfyui-logs-1724325836041.json

Console output:
C:\Cui\cu_121_2\ComfyUI_windows_portable>.\python_embeded\python.exe -s ComfyUI\main.py --force-fp16 --windows-standalone-build
[START] Security scan
[DONE] Security scan
## ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2024-08-22 11:33:22.848959
** Platform: Windows
** Python version: 3.11.6 (tags/v3.11.6:8b6ee5b, Oct  2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)]
** Python executable: C:\Cui\cu_121_2\ComfyUI_windows_portable\python_embeded\python.exe
** ComfyUI Path: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI
** Log path: C:\Cui\cu_121_2\ComfyUI_windows_portable\comfyui.log

Prestartup times for custom nodes:
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\rgthree-comfy
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Marigold
   1.9 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Total VRAM 8192 MB, total RAM 32727 MB
pytorch version: 2.4.0+cu121
Forcing FP16.
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce GTX 1070 : cudaMallocAsync
Using pytorch cross attention
[Prompt Server] web root: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\web
Adding extra search path checkpoints d:/ComfyUI/models/checkpoints/
Adding extra search path clip d:/ComfyUI/models/clip/
Adding extra search path clip_vision d:/ComfyUI/models/clip_vision/
Adding extra search path configs d:/ComfyUI/models/configs/
Adding extra search path controlnet d:/ComfyUI/models/controlnet/
Adding extra search path embeddings d:/ComfyUI/models/embeddings/
Adding extra search path ipadapter d:/ComfyUI/models/ipadapter/
Adding extra search path loras d:/ComfyUI/models/loras/
Adding extra search path unet d:/ComfyUI/models/unet/
Adding extra search path upscale_models d:/ComfyUI/models/upscale_models/
Adding extra search path vae d:/ComfyUI/models/vae/
C:\Cui\cu_121_2\ComfyUI_windows_portable\python_embeded\Lib\site-packages\kornia\feature\lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
[ComfyUI-0️⃣ 2️⃣ 4️⃣ 6️⃣ ] Topological Execution is detected.
[ComfyUI-0️⃣ 2️⃣ 4️⃣ 6️⃣ ] Loaded all nodes and apis.
### Loading: ComfyUI-Impact-Pack (V7.1.1)
### Loading: ComfyUI-Impact-Pack (Subpack: V0.6)
[Impact Pack] Wildcards loading done.
### Loading: ComfyUI-Inspire-Pack (V0.86.1)
### Loading: ComfyUI-Manager (V2.50.1)
### ComfyUI Revision: 2597 [dafbe321] | Released on '2024-08-21'
C:\Cui\cu_121_2\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\models\transformers\transformer_2d.py:34: FutureWarning: `Transformer2DModelOutput` is deprecated and will be removed in version 1.0.0. Importing `Transformer2DModelOutput` from `diffusers.models.transformer_2d` is deprecated and this will be removed in a future version. Please use `from diffusers.models.modeling_outputs import Transformer2DModelOutput`, instead.
  deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)
Total VRAM 8192 MB, total RAM 32727 MB
pytorch version: 2.4.0+cu121
Forcing FP16.
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce GTX 1070 : cudaMallocAsync
C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Marigold\nodes.py:44: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  empty_text_embed = torch.load(os.path.join(script_directory, "empty_text_embed.pt"), map_location="cpu")
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/alter-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/github-stats.json
[ReActor] - STATUS - Running v0.5.1-a6 in ComfyUI
Torch version: 2.4.0+cu121
[comfyui_controlnet_aux] | INFO -> Using ckpts path: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_controlnet_aux\ckpts
[comfyui_controlnet_aux] | INFO -> Using symlinks: False
[comfyui_controlnet_aux] | INFO -> Using ort providers: ['CUDAExecutionProvider', 'DirectMLExecutionProvider', 'OpenVINOExecutionProvider', 'ROCMExecutionProvider', 'CPUExecutionProvider', 'CoreMLExecutionProvider']
DWPose: Onnxruntime with acceleration providers detected
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
Please 'pip install xformers'
Nvidia APEX normalization not installed, using PyTorch LayerNorm

[rgthree] Loaded 42 exciting nodes.

WAS Node Suite: BlenderNeko's Advanced CLIP Text Encode found, attempting to enable `CLIPTextEncode` support.
WAS Node Suite: `CLIPTextEncode (BlenderNeko Advanced + NSP)` node enabled under `WAS Suite/Conditioning` menu.
WAS Node Suite: OpenCV Python FFMPEG support is enabled
WAS Node Suite Warning: `ffmpeg_bin_path` is not set in `C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\was-node-suite-comfyui\was_suite_config.json` config file. Will attempt to use system ffmpeg binaries if available.
WAS Node Suite: Finished. Loaded 219 nodes successfully.

        "Success is not just about making money. It's about making a difference." - Unknown


Import times for custom nodes:
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\websocket_image_save.py
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_Noise
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\cg-use-everywhere
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_ADV_CLIP_emb
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_Cutoff
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_TiledKSampler
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-AutomaticCFG
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_InstantID
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_IPAdapter_plus
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Custom-Scripts
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_UltimateSDUpscale
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-GGUF
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\rgthree-comfy
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_essentials
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\aegisflow_utility_nodes
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-0246
   0.0 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_controlnet_aux
   0.1 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Inspire-Pack
   0.2 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_smZNodes
   0.2 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Marigold
   0.3 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\PuLID_ComfyUI
   0.3 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_FaceAnalysis
   0.4 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager
   0.5 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui-reactor-node
   0.9 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Impact-Pack
   2.2 seconds: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\was-node-suite-comfyui

Starting server

To see the GUI go to: http://127.0.0.1:8188
FETCH DATA from: C:\Cui\cu_121_2\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json [DONE]
got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
model weight dtype torch.float8_e4m3fn, manual cast: torch.float16
model_type FLUX
Requested to load FluxClipModel_
Loading 1 new model
loaded completely 0.0 4778.66552734375 True
Requested to load Flux
Loading 1 new model
loaded partially 5740.115987487793 5732.692443847656 0
100%|██████████████████████████████████████████████████████████████████████████████████| 12/12 [04:42<00:00, 23.53s/it]
Requested to load AutoencodingEngine
Loading 1 new model
loaded completely 0.0 319.7467155456543 True
Prompt executed in 448.84 seconds
got prompt
Requested to load Flux
Loading 1 new model
got prompt
loaded partially 5738.115987487793 5732.692443847656 0
100%|██████████████████████████████████████████████████████████████████████████████████| 12/12 [04:30<00:00, 22.51s/it]
Requested to load AutoencodingEngine
Loading 1 new model
loaded completely 0.0 319.7467155456543 True
Prompt executed in 277.74 seconds
Prompt executed in 0.00 seconds
got prompt
got prompt
model weight dtype torch.bfloat16, manual cast: torch.float16
model_type FLUX
Requested to load Flux
Loading 1 new model
loaded partially 5738.115987487793 5721.8321533203125 0
100%|██████████████████████████████████████████████████████████████████████████████████| 12/12 [05:30<00:00, 27.55s/it]
Requested to load AutoencodingEngine
Loading 1 new model
loaded completely 0.0 319.7467155456543 True
Prompt executed in 481.10 seconds
Prompt executed in 0.00 seconds
got prompt

ggml_sd_loader:
 0                             471
 12                            304
 1                               5


model weight dtype torch.bfloat16, manual cast: torch.float16
model_type FLUX
Requested to load Flux
Loading 1 new model
loaded partially 5738.115987487793 5729.402587890625 0
100%|██████████████████████████████████████████████████████████████████████████████████| 12/12 [04:41<00:00, 23.44s/it]
Requested to load AutoencodingEngine
Loading 1 new model
loaded completely 0.0 319.7467155456543 True
Prompt executed in 308.26 seconds
got prompt

ggml_sd_loader:
 1                             476
 8                             304


model weight dtype torch.bfloat16, manual cast: torch.float16
model_type FLUX
Requested to load Flux
Loading 1 new model
loaded partially 5738.115987487793 5731.0665283203125 0
100%|██████████████████████████████████████████████████████████████████████████████████| 12/12 [05:24<00:00, 27.06s/it]
Requested to load AutoencodingEngine
Loading 1 new model
loaded completely 0.0 319.7467155456543 True
Prompt executed in 350.61 seconds

@JorgeR81 commented on a60620d Aug 22, 2024

I'm not sure if it's Torch 2.4 improving the FP8 speeds, or if a simple update_comfyui.bat would have fixed that for me.

This user is also on Windows 10 with CUDA 12.1, and that issue (FP8 being slower than FP16) was fixed for him a few days ago, when he upgraded from Torch 2.3.1 to Torch 2.4:

#4501 (comment)

@JorgeR81 commented on a60620d Aug 22, 2024

After trying some loras with Flux FP8, I suddenly noticed a massive slowdown after the first step (108 s/it).
Never had that before.

Memory usage is still the same, but GPU activity goes up to 100% on step 2.

Here is the log:
comfyui-logs-1724337255442.json


EDIT:

I tested some more.

This only seems to happen after using a very large lora (1.28 GB) I got on civitai:
https://civitai.com/models/641309/formcorrector-anatomic?modelVersionId=717317

I just realized that I had only tried this lora with Q4_0 (not FP8) before upgrading to torch 2.4.
Now it works fine with Q4_K_S and even Q8_0.
Speed is only slightly slower with the lora (25 s/it instead of 23 s/it).

So this may not be a torch 2.4 issue.


I can keep testing other stuff with pytorch 2.4, if it helps.

Or I can downgrade to another pytorch version, if you tell me the exact command (I have a portable install).

@JorgeR81 commented

Since I'm on Windows, I downgraded to pytorch 2.3.1, just to be on the safe side. 
 
...\python_embeded>python.exe -m pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121


  • Flux FP8 speed is about the same.
  • The lora issue (mentioned above) still exists in FP8.
