Fix major CI bugs #233

Merged
merged 9 commits into from
Oct 21, 2024

Conversation

danielholanda
Collaborator

@danielholanda danielholanda commented Oct 20, 2024

Description

This PR solves 1 major and 1 minor CI bug:

  • [BUG 1] TKML was testing the PyPI version rather than the current version (major)
  • [BUG 2] TKML timeout test fails on main (minor)

Bug 1

TKML was testing the PyPI version rather than the current version. This effectively made all of the plugin and turnkey tests false positives.

This was caused by hardcoding the supported version of TKML in the plugins' setup.py and installing the plugins after installing TKML itself, which overwrote the current version with the hardcoded one.

This issue has been solved by carefully changing the installation order and being more flexible about the versions of TKML that the plugins accept.
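One way to guard against this class of bug is to have CI assert that the package under test is actually imported from the checked-out source tree. The sketch below is not part of this PR; the function name `imported_from` and the idea of passing the checkout path are assumptions made for illustration, and the check only makes sense for editable installs, where the module file lives inside the repository.

```python
import importlib
from pathlib import Path


def imported_from(module_name: str, source_root: str) -> bool:
    """Return True if module_name resolves to a file under source_root.

    Hypothetical helper: a CI job could call this before running tests to
    confirm the local checkout, not a copy installed from PyPI, is in use.
    """
    module = importlib.import_module(module_name)
    module_file = getattr(module, "__file__", None)
    if module_file is None:
        # Built-in and namespace modules have no single file to compare.
        return False
    module_path = Path(module_file).resolve()
    root = Path(source_root).resolve()
    # The module counts as "local" if the root is one of its parent directories.
    return root in module_path.parents
```

A CI step could then fail fast if the check returns False for the package being tested, instead of silently running the suite against a released version.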

Bug 2

PyTorch export has improved significantly recently, causing our timeout test to behave as shown below:

Info: Running turnkey on ~\turnkeyml\src\turnkeyml\common\generated\test_corpus\extras\timeout.py

Building "timeout"
    + Discovering PyTorch models   
    x Exporting PyTorch to ONNX   
      Optimizing ONNX file   

timeout.py:
        model (executed 1x - 0.16s)
                Location:       C:\\Users\\danie\\turnkeyml\\src\\turnkeyml\\common\\generated\\test_corpus\\extras\\timeout.py, line 22
                Parameters:     500,001,000 (1.86 GB)
                Input Shape:    'x': (500000,)
                Build dir:      C:\Users\danie\turnkeyml\src\turnkeyml\common\generated\cli_cache_dir\timeout
                Status:         Error: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.
(...)
    _C._jit_pass_onnx_graph_shape_type_inference(
RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.

Instead of relying on an extremely long export, we simply make the model take a long time to be discovered by adding a wait using sleep. As a result, we can reliably test our timeout feature.
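The idea of testing a timeout against a deliberate sleep, rather than against genuinely slow work, can be sketched as follows. This is a minimal illustration using only the standard library, not turnkey's actual timeout mechanism or the contents of timeout.py; the names `times_out`, `SLOW_SCRIPT`, and `FAST_SCRIPT` are hypothetical.

```python
import subprocess
import sys

# Hypothetical stand-ins for a slow and a fast test script.
SLOW_SCRIPT = "import time; time.sleep(30)"
FAST_SCRIPT = "pass"


def times_out(script: str, timeout_s: float) -> bool:
    """Run a Python snippet in a child process.

    Returns True if the child exceeds timeout_s seconds, False if it
    finishes in time. subprocess.run kills the child when the timeout expires.
    """
    try:
        subprocess.run([sys.executable, "-c", script], timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        return True
    return False
```

Because the delay comes from `time.sleep` rather than from export speed, the outcome does not change when the underlying framework gets faster.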

@danielholanda danielholanda self-assigned this Oct 20, 2024
@danielholanda danielholanda marked this pull request as draft October 21, 2024 16:05
@danielholanda danielholanda changed the title Fix CI timeout issues Fix major CI bugs Oct 21, 2024
@danielholanda danielholanda marked this pull request as ready for review October 21, 2024 16:26
@danielholanda danielholanda added bug Something isn't working p0 Top priority labels Oct 21, 2024
Collaborator

@ramkrishna2910 ramkrishna2910 left a comment

Looks good!

@danielholanda danielholanda merged commit 3f97d74 into main Oct 21, 2024
11 checks passed
@danielholanda danielholanda deleted the dholanda/fix_ci_timeouts branch October 21, 2024 17:07