ZeroDivisionError in squeeze_excitation function due to SE_ratio being set to zero #224

ChinChangYang · 2024-02-12T16:19:46Z

Description
I encountered a ZeroDivisionError while running the net_to_model.py script in the lczero-training project. The error is raised in the squeeze_excitation function in tfprocess.py, at the line that asserts the number of channels is evenly divisible by self.SE_ratio. The modulo operation inside that assertion divides by zero, which indicates that self.SE_ratio is set to zero.

Steps to Reproduce

1. Clone the lczero-training repository with submodules.
2. Install the necessary Python packages: numpy, tensorflow, protobuf.
3. Download the network weights and configuration file listed under EDIT below.
4. Initialize and run the training setup as per the provided instructions.
5. Run the net_to_model.py script; the error occurs when the squeeze_excitation function is called.

Expected Behavior
The squeeze_excitation function should execute without errors, processing the input tensor by applying squeeze and excitation operations based on a non-zero SE_ratio.
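For readers unfamiliar with the operation, the following is a minimal NumPy sketch of a generic squeeze-and-excitation block. It is not the implementation in tfprocess.py (which may differ in layout and details); the function signature, weight names, and shapes are assumptions chosen for illustration. It shows why the ratio must be a positive divisor of the channel count: the hidden layer has `channels // ratio` units.

```python
import numpy as np


def squeeze_excitation(x, w1, b1, w2, b2):
    """Generic SE block on an array shaped (batch, channels, height, width).

    Illustrative only; not taken from lczero-training.
    """
    # Squeeze: global average pool over the spatial dimensions.
    squeezed = x.mean(axis=(2, 3))                      # (batch, channels)
    # Excitation: bottleneck dense layer with ReLU, then a sigmoid gate.
    hidden = np.maximum(squeezed @ w1 + b1, 0.0)        # (batch, channels // ratio)
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))    # (batch, channels)
    # Rescale each channel of the input by its gate value.
    return x * gate[:, :, None, None]


channels, ratio = 8, 4            # small assumed values for the demo
rng = np.random.default_rng(0)
x = rng.standard_normal((2, channels, 8, 8))
w1 = rng.standard_normal((channels, channels // ratio))
b1 = np.zeros(channels // ratio)
w2 = rng.standard_normal((channels // ratio, channels))
b2 = np.zeros(channels)

out = squeeze_excitation(x, w1, b1, w2, b2)
```

With a ratio of zero the bottleneck width `channels // ratio` is undefined, so no such block can be built.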

Actual Behavior
The execution fails with a ZeroDivisionError, indicating that self.SE_ratio is set to zero, which is not expected. The traceback points to the squeeze_excitation function in tfprocess.py.

Environment
https://colab.research.google.com/drive/1a3lkH1IUG-P_Y7scNjenmTmRdJ0RF_5R?usp=sharing

Additional Context
The error suggests a misconfiguration or an oversight in the initialization of the SE_ratio. This parameter is crucial for the squeeze-excitation operation, and it should be a positive integer that divides the number of channels without remainder. It's possible that this is either a code bug or a configuration issue.
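The constraint described above could be checked explicitly before the block is built. The sketch below is one way to do that; `check_se_ratio` is a hypothetical helper, not an existing function in lczero-training, and only the divisibility condition is taken from the assertion shown in the traceback.

```python
def check_se_ratio(channels: int, se_ratio: int) -> int:
    """Return the squeezed channel count, or raise ValueError with a clear message.

    Hypothetical helper for illustration; not part of tfprocess.py.
    """
    if se_ratio <= 0:
        # Catch the zero case before any division or modulo is attempted.
        raise ValueError(
            f"SE ratio must be a positive integer, got {se_ratio}; "
            "the network or configuration may describe a net without SE layers"
        )
    if channels % se_ratio != 0:
        raise ValueError(
            f"{channels} channels is not divisible by SE ratio {se_ratio}"
        )
    return channels // se_ratio
```

A check of this kind would turn the ZeroDivisionError into a message that names the offending parameter.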

Here's the relevant portion of the error message for quick reference:

Traceback (most recent call last):
  File "/content/lczero-training/tf/net_to_model.py", line 28, in <module>
    tfp.init_net()
  File "/content/lczero-training/tf/tfprocess.py", line 383, in init_net
    outputs = self.construct_net(input_var)
  File "/content/lczero-training/tf/tfprocess.py", line 1529, in construct_net
    flow = self.create_residual_body(inputs)
  File "/content/lczero-training/tf/tfprocess.py", line 1424, in create_residual_body
    flow = self.residual_block(flow,
  File "/content/lczero-training/tf/tfprocess.py", line 1248, in residual_block
    out2 = self.squeeze_excitation(self.batch_norm(conv2,
  File "/content/lczero-training/tf/tfprocess.py", line 1196, in squeeze_excitation
    assert channels % self.SE_ratio == 0
ZeroDivisionError: integer division or modulo by zero
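The traceback shows a ZeroDivisionError rather than an AssertionError because Python evaluates the modulo expression before the assert can test its result. A minimal standalone reproduction, assuming a channel count of 256 (inferred from the 256x20 configuration name, not confirmed from the code):

```python
channels = 256   # assumed value, based on the 256x20 configuration name
se_ratio = 0     # the value the traceback implies for self.SE_ratio

try:
    # Same expression as the failing line in squeeze_excitation.
    assert channels % se_ratio == 0
except ZeroDivisionError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)
```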

I would appreciate any insights into this issue or suggestions on how to properly configure the SE_ratio to avoid this error.

EDIT

Network: 11248.pb.gz
YAML configuration: 256x20.yaml
How to reproduce: lc0-net-to-model.ipynb


borg323 · 2024-02-12T16:36:36Z

This goes back to #58, when support for non-SE CNNs was dropped.

ChinChangYang mentioned this issue Feb 12, 2024

Implement Net-to-CoreML Conversion Script #222

Draft
