ZeroDivisionError in squeeze_excitation function due to SE_ratio being set to zero #224

Open
ChinChangYang opened this issue Feb 12, 2024 · 1 comment

ChinChangYang commented Feb 12, 2024

Description
I encountered a ZeroDivisionError while running the net_to_model.py script in the lczero-training project. The error is raised in the squeeze_excitation function in tfprocess.py, at the assertion that the number of channels is evenly divisible by self.SE_ratio. The modulo operation inside that assertion divides by zero, which indicates that self.SE_ratio is set to zero.

Steps to Reproduce

  1. Clone the lczero-training repository with submodules.
  2. Install necessary Python packages: numpy, tensorflow, protobuf.
  3. Download specific network weights and configuration files.
  4. Initialize and run the training setup as per the provided instructions.
  5. The error occurs during the execution of the net_to_model.py script, specifically when the squeeze_excitation function is called.

Expected Behavior
The squeeze_excitation function should execute without errors, processing the input tensor by applying squeeze and excitation operations based on a non-zero SE_ratio.
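For reference, here is a minimal NumPy sketch of a generic squeeze-and-excitation block, to show where the ratio enters. It follows the textbook formulation (global average pool, bottleneck layer, sigmoid gate) and is not the code in tfprocess.py, which may differ in detail; the function signature, weight names, and shapes are assumptions made for illustration.

```python
import numpy as np


def squeeze_excitation(x, w1, b1, w2, b2):
    """Generic SE block (illustrative only). x has shape (batch, channels, H, W)."""
    # Squeeze: average each channel over its spatial positions.
    pooled = x.mean(axis=(2, 3))                        # (batch, channels)
    # Bottleneck: reduce to channels // se_ratio units, then ReLU.
    hidden = np.maximum(pooled @ w1 + b1, 0.0)          # (batch, channels // se_ratio)
    # Excitation: expand back to one sigmoid gate per channel.
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))   # (batch, channels)
    # Rescale every channel of the input by its gate.
    return x * gates[:, :, None, None]


channels, se_ratio = 64, 8
hidden_size = channels // se_ratio   # this is the step that requires se_ratio > 0

rng = np.random.default_rng(0)
x = rng.standard_normal((2, channels, 8, 8))
w1 = rng.standard_normal((channels, hidden_size)) * 0.1
b1 = np.zeros(hidden_size)
w2 = rng.standard_normal((hidden_size, channels)) * 0.1
b2 = np.zeros(channels)

out = squeeze_excitation(x, w1, b1, w2, b2)
print(out.shape)  # (2, 64, 8, 8)
```

The bottleneck width is `channels // se_ratio`, so a ratio of zero leaves the block undefined rather than disabling it.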

Actual Behavior
The execution fails with a ZeroDivisionError, indicating that self.SE_ratio is set to zero, which is not expected. The traceback points to the squeeze_excitation function in tfprocess.py.

Environment
https://colab.research.google.com/drive/1a3lkH1IUG-P_Y7scNjenmTmRdJ0RF_5R?usp=sharing

Additional Context
The error suggests a misconfiguration or an oversight in the initialization of the SE_ratio. This parameter is crucial for the squeeze-excitation operation, and it should be a positive integer that divides the number of channels without remainder. It's possible that this is either a code bug or a configuration issue.
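The failure mode can be reproduced in isolation. The sketch below assumes only what the traceback shows, namely a bare `assert channels % self.SE_ratio == 0` check. The names `bare_assert` and `check_se_ratio` are hypothetical and not part of lczero-training; the second one illustrates what an explicit validation could look like.

```python
def bare_assert(channels, se_ratio):
    # Mirrors the check in the traceback: the modulo is evaluated first,
    # so a zero ratio raises ZeroDivisionError before the assert can fail.
    assert channels % se_ratio == 0


def check_se_ratio(channels, se_ratio):
    """Hypothetical helper: validate the ratio and return the squeezed width."""
    if not isinstance(se_ratio, int) or se_ratio <= 0:
        raise ValueError(f"SE_ratio must be a positive integer, got {se_ratio!r}")
    if channels % se_ratio != 0:
        raise ValueError(
            f"channels ({channels}) must be divisible by SE_ratio ({se_ratio})"
        )
    return channels // se_ratio


try:
    bare_assert(256, 0)
except ZeroDivisionError as exc:
    print("bare assert raised:", type(exc).__name__)

try:
    check_se_ratio(256, 0)
except ValueError as exc:
    print("explicit check raised:", exc)

print(check_se_ratio(256, 8))  # 32
```

An explicit check like this would report the misconfigured value directly instead of surfacing as an arithmetic error deep inside network construction.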

Here's the relevant portion of the error message for quick reference:

Traceback (most recent call last):
  File "/content/lczero-training/tf/net_to_model.py", line 28, in <module>
    tfp.init_net()
  File "/content/lczero-training/tf/tfprocess.py", line 383, in init_net
    outputs = self.construct_net(input_var)
  File "/content/lczero-training/tf/tfprocess.py", line 1529, in construct_net
    flow = self.create_residual_body(inputs)
  File "/content/lczero-training/tf/tfprocess.py", line 1424, in create_residual_body
    flow = self.residual_block(flow,
  File "/content/lczero-training/tf/tfprocess.py", line 1248, in residual_block
    out2 = self.squeeze_excitation(self.batch_norm(conv2,
  File "/content/lczero-training/tf/tfprocess.py", line 1196, in squeeze_excitation
    assert channels % self.SE_ratio == 0
ZeroDivisionError: integer division or modulo by zero

I would appreciate any insights into this issue or suggestions on how to properly configure the SE_ratio to avoid this error.

borg323 (Member) commented Feb 12, 2024

This goes back to #58, when support for non-SE CNNs was dropped.
