Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make more tests robust and independent #877

Merged
merged 20 commits into from
Sep 24, 2024
Merged

Conversation

glatterf42
Copy link
Member

@glatterf42 glatterf42 commented Sep 12, 2024

Closes #797.

This is a follow-up PR to #784 with the aim of bringing even more test stability into our CI suite.

For the moment (at least while it's free), this PR experiments with codecov's new flaky-test-detection feature.

How to review

  • Read the diff and note that the CI checks all pass.
  • Read the PR comments and ensure the reasoning makes sense.

PR checklist

  • Continuous integration checks all ✅
  • Update tests; coverage checks both ✅
  • [ ] Add, expand, or update documentation. Just CI.
  • Update release notes.

@glatterf42 glatterf42 added enh New features & functionality ci Continuous integration labels Sep 12, 2024
@glatterf42 glatterf42 self-assigned this Sep 12, 2024
Copy link

codecov bot commented Sep 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.5%. Comparing base (ef66e37) to head (f70e76a).
Report is 20 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##            main    #877     +/-   ##
=======================================
- Coverage   95.6%   95.5%   -0.1%     
=======================================
  Files         46      46             
  Lines       4344    4340      -4     
=======================================
- Hits        4154    4148      -6     
- Misses       190     192      +2     
Files with missing lines Coverage Δ
message_ix/cli.py 94.2% <100.0%> (+1.5%) ⬆️
message_ix/tests/conftest.py 100.0% <ø> (ø)
message_ix/tests/test_cli.py 90.6% <100.0%> (-9.4%) ⬇️
message_ix/tests/test_core.py 100.0% <100.0%> (ø)
...age_ix/tests/test_feature_bound_activity_shares.py 100.0% <100.0%> (ø)
message_ix/tests/test_feature_storage.py 100.0% <100.0%> (ø)
message_ix/tests/test_macro.py 100.0% <100.0%> (ø)
message_ix/tests/test_report.py 100.0% <100.0%> (ø)
message_ix/tests/test_util.py 100.0% <100.0%> (ø)
message_ix/tests/tools/test_add_year.py 95.6% <100.0%> (ø)
... and 1 more

@glatterf42
Copy link
Member Author

Most items from #797 seem addressed now. Mainly, I'm not yet sure why the tests in test_feature_bound_activity_shares are flaky. The only thing I can think of is that despite our previous efforts (identifying scenario names via request.node.name), some scenarios could still get mixed up so that bounds would be applied sort of randomly instead of as intended. So for now, I have doubled down on this approach with the help of pytest-xdist's recommended way of uniquely identifying each test run.
For now, I've only used it in test_feature_bound_activity_share, but if it works (and we don't intend to drop pytest-xdist), we could use this feature elsewhere, too, to replace request.node.name.
However, in general, I thought request.node.name was uniquely identifying scenarios already, so I'm not quite convinced this new approach works any better.

@glatterf42
Copy link
Member Author

I have to admit: while I did make changes to test_feature_bound_activity_shares, I was not sure they would address the flakiness seen in test_add_share_mode_lo or test_commodity_share_lo. However, I have re-run all tests five times now and have not observed a single flaky test, so maybe one of the changes I made did resolve that flakiness.

For future reference: I have tried replicating the numbers reported in #797 (comment): called make_dantzig locally, defined calc_share as it is in the test file and I find:

>>> 1.05 * calc_share(scen1)
0.56875
>>> calc_share(scen1)
0.5416666666666666

This seems odd since 0.56875 is not one of the reported values when flakiness occurs, seemingly suggesting either that the numbers are reported from a different scenario altogether or that the scenario in question had also been changed by another test (in the same module; since all of them use the module-scoped fixture test_mp). For example, maybe the test_add_share_mode_up frequently added a 0.95 multiplier to the scenario, resulting in the test_add_share_mode_lo scenario finding the "normal" 0.5416... value (due to its reverse-factor of 1 / 0.95 = 1.05).
In either of those cases, the changes I made (ensuring each clone gets its own unique scenario name) could well be the changes needed to eliminate the flakiness.

@glatterf42
Copy link
Member Author

Today, I'm seeing these flaky errors for the first time:

 __________________________________ test_dl[] ___________________________________
[gw1] linux -- Python 3.9.20 /opt/hostedtoolcache/Python/3.9.20/x64/bin/python

message_ix_cli = <bound method message_ix_cli.<locals>.Runner.invoke of <message_ix.tests.conftest.message_ix_cli.<locals>.Runner object at 0x7fc27797a2b0>>
opts = ''
tmp_path = PosixPath('/tmp/pytest-of-runner/pytest-0/popen-gw1/test_dl__0')

    @pytest.mark.parametrize(
        "opts",
        [
            "",
            "--branch=main",
            "--tag=v1.2.0",
            # Nonexistent tag
            pytest.param("--tag=v999", marks=pytest.mark.xfail(raises=AssertionError)),
        ],
    )
    def test_dl(message_ix_cli, opts, tmp_path):
        r = message_ix_cli("dl", opts, str(tmp_path))
    
>       assert r.exit_code == 0, (r.exception, r.output)
E       AssertionError: (URLError(TimeoutError(110, 'Connection timed out')), '')
E       assert 1 == 0
E        +  where 1 = <Result URLError(TimeoutError(110, 'Connection timed out'))>.exit_code

message_ix/tests/test_cli.py:66: AssertionError
____________________________ test_dl[--tag=v1.2.0] _____________________________
[gw0] linux -- Python 3.9.20 /opt/hostedtoolcache/Python/3.9.20/x64/bin/python

message_ix_cli = <bound method message_ix_cli.<locals>.Runner.invoke of <message_ix.tests.conftest.message_ix_cli.<locals>.Runner object at 0x7f3e3f6a8e20>>
opts = '--tag=v1.2.0'
tmp_path = PosixPath('/tmp/pytest-of-runner/pytest-0/popen-gw0/test_dl___tag_v1_2_0_0')

    @pytest.mark.parametrize(
        "opts",
        [
            "",
            "--branch=main",
            "--tag=v1.2.0",
            # Nonexistent tag
            pytest.param("--tag=v999", marks=pytest.mark.xfail(raises=AssertionError)),
        ],
    )
    def test_dl(message_ix_cli, opts, tmp_path):
        r = message_ix_cli("dl", opts, str(tmp_path))
    
>       assert r.exit_code == 0, (r.exception, r.output)
E       AssertionError: (URLError(TimeoutError(110, 'Connection timed out')), '')
E       assert 1 == 0
E        +  where 1 = <Result URLError(TimeoutError(110, 'Connection timed out'))>.exit_code

message_ix/tests/test_cli.py:66: AssertionError
 ____________________________ test_dl[--branch=main] ____________________________
[gw1] linux -- Python 3.9.20 /opt/hostedtoolcache/Python/3.9.20/x64/bin/python

message_ix_cli = <bound method message_ix_cli.<locals>.Runner.invoke of <message_ix.tests.conftest.message_ix_cli.<locals>.Runner object at 0x7fc27797a2b0>>
opts = '--branch=main'
tmp_path = PosixPath('/tmp/pytest-of-runner/pytest-0/popen-gw1/test_dl___branch_main_0')

    @pytest.mark.parametrize(
        "opts",
        [
            "",
            "--branch=main",
            "--tag=v1.2.0",
            # Nonexistent tag
            pytest.param("--tag=v999", marks=pytest.mark.xfail(raises=AssertionError)),
        ],
    )
    def test_dl(message_ix_cli, opts, tmp_path):
        r = message_ix_cli("dl", opts, str(tmp_path))
    
>       assert r.exit_code == 0, (r.exception, r.output)
E       AssertionError: (URLError(TimeoutError(110, 'Connection timed out')), 'Retrieving https://github.com/iiasa/message_ix/archive/main.zip
E         ')
E       assert 1 == 0
E        +  where 1 = <Result URLError(TimeoutError(110, 'Connection timed out'))>.exit_code

message_ix/tests/test_cli.py:66: AssertionError

Unfortunately, they don't provide too much information. If I'm not mistaken, the first two tests timed out retrieving the tags_info from GitHub, while the last test apparently timed out retrieving the data from GitHub. In both cases, I'm not sure what we can do to avoid the flakiness (other than re-running tests on fails): there's just one GitHub address we need to address and it's the same for all tests and we're retrieving all data to temporary directories (that should be unique for each test run), anyway. So this is mainly documenting for future reference.

Lastly, I noticed that we didn't test calling the CLI command dl with both --tag and --branch, so I added a test for that. Unfortunately, click handles the exception somewhat awkwardly, so that the test is for now XPassing: it is triggering the right click.UsageError, but this gets printed to stdout and raises a SystemExit, which doesn't make it through our message_ix_cli fixture: this still returns r, only now it has an .exit_code != 0. While this is not ideal, I didn't want to spend too much time on this right now.

@@ -98,7 +98,7 @@ jobs:
pytest message_ix \
-m "not nightly and not tutorial" \
-rA --verbose --color=yes --durations=20 \
--cov-report=xml \
--cov --junitxml=junit.xml \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, the --cov-report=xml file (coverage.xml) includes the line coverage, whereas the --junitxml file includes the results of tests.

I guess the latter is needed for Codecov to detect flaky tests. But doesn't this change also turn off generation of the former? How does Codecov then receive info on line coverage?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to drop this commit since flaky-test-detection is going to become a paid feature sooner or later, anyway. I mainly wanted to see if it could be useful for this PR, no need to keep these changes :)

Copy link
Member

@khaeru khaeru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough tracking and many fixes to flaky tests! The use of isolation across the test suite now exhibits a more consistent style that can be followed by others making additions to the code.

I put some notes inline, and a few small requested changes. With those, this will be good to merge.

message_ix/util/__init__.py Outdated Show resolved Hide resolved
@@ -46,28 +50,29 @@ def output(str):
else:
output = print

# Identify the source directory using importlib.resources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just recording for posterity that:

  • importlib.resources was used in Make tests robust & independent #784 because it's in the Python standard library, thus appears to be recommended/preferred over pathlib.Path(__file__).
  • However, there are strong dissenting views from Python users.
  • It was decided to switch to using pathlib.Path(__file__). This reverses changes made in the earlier PR.
  • We don't know if this has any impact on flakiness of tests.

message_ix/cli.py Outdated Show resolved Hide resolved
message_ix/tests/test_cli.py Outdated Show resolved Hide resolved
@glatterf42 glatterf42 merged commit f070ebe into main Sep 24, 2024
26 checks passed
@glatterf42 glatterf42 deleted the enh/ci-test-stability branch September 24, 2024 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Continuous integration enh New features & functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make more tests robust & independent
2 participants