Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid jenkinsapi trying to fetch all jobs. #1044

Closed
wants to merge 6 commits into from

Conversation

nuclearsandwich
Copy link
Contributor

There's a malinteraction with the jenkinsapi library and Python convention which has been causing extremely large HTTP requests whenever the truthiness of an instance of the Jenkins class is tested.

Jenkins defines a __len__ method which is used to determine truthiness. But this method triggers a fetch of every job on our Jenkins host which can take 2 - 5 minutes to return due to the sheer volume of jobs.

By avoiding this in favor of an explicit check of is False or is None as appropriate we can save taking a really big runtime hit and cut down on spurious failures due to timeouts.

By default the jenkinsapi Jenkins class will trigger an eager loading of all Jenkins jobs via the API. Our Jenkins server buckles under the weight of this and so we do not want to do this until we need to and ideally future changes will replace unfiltered polling completely. In order to prevent an initial poll when we first create the local Jenkins instance we also set the lazy kwarg.

There's a malinteraction with the jenkinsapi library and Python
convention which has been causing extremely large HTTP requests whenever
the truthiness of an instance of the Jenkins class is tested.

Jenkins defines a `__len__` method which is used to determine
truthiness. But this method triggers a fetch of every job on our Jenkins
host which can take 2 - 5 minutes to return due to the sheer volume of
jobs.

By avoiding this in favor of an explicit check of `is False` or `is
None` as appropriate we can save taking a really big runtime hit and cut
down on spurious failures due to timeouts.
By default the jenkinsapi Jenkins class will trigger an eager loading of
all Jenkins jobs via the API. Our Jenkins server buckles under the
weight of this and so we do not want to do this until we need to and
ideally future changes will replace unfiltered polling completely.
There's a malinteraction with the jenkinsapi library and Python
convention which has been causing extremely large HTTP requests whenever
the truthiness of an instance of the Jenkins class is tested.

Jenkins defines a `__len__` method which is used to determine
truthiness. But this method triggers a fetch of every job on our Jenkins
host which can take 2 - 5 minutes to return due to the sheer volume of
jobs.

By avoiding this in favor of an explicit check of `is False` or `is
None` as appropriate we can save taking a really big runtime hit and cut
down on spurious failures due to timeouts.
@nuclearsandwich
Copy link
Contributor Author

I think I either got all of the conditional logic correct or I reversed all of them.

@cottsay
Copy link
Member

cottsay commented Apr 23, 2024

Is there a reason some comparisons are with None and others with False?

Copy link
Member

@cottsay cottsay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ROS 1 Release CI failures are a legitimate regression and should be addressed.

@nuclearsandwich
Copy link
Contributor Author

Is there a reason some comparisons are with None and others with False?

Yeah, it's based on what the "fallback" value is in the surrounding code.
In some cases the jenkins / _cached_jenkins value is unset and so None is what we expect to find when not a Jenkins class instance but in the _job.py in particular there's an inline conditional based on whether the job is generating a groovy script or running on a workstation that determines whether it connects to a remote Jenkins or not and the fallback value used there is False rather than None.

I would be fine refactoring those to use None as well if preferred.

The ROS 1 Release CI failures are a legitimate regression and should be addressed.

Agreed. I will look into them.

The ros_buildfarm pipeline has branching behavior whenever being run on
a workstation, in which case it connects to Jenkins via HTTP, and in a
build farm job , in which case it emits Groovy scripts to make the
equivalent changes.

In order to avoid hitting heavyweight API endpoints for presence
queries, always check whether the jenkins API client is `None` rather
than different paths checking for for it to be False or None.

Previously these false-y values were interchangable but not in this
formulation.

Another alternative would be to make all checks:
`not isinstance(Jenkins)`
which would keep the versatility without trying to check the truthiness,
and thus length of the Jenkins object, avoiding the heavyweight API
call.
@nuclearsandwich
Copy link
Contributor Author

Well e935e5e now has a different breakage but at least its one I can reproduce locally unlike the last one which I just had to suss out.

…ot None."

Everything is more broken from this so work will be requried to find out
where this Jenkins object is coming from.

This reverts commit 31a6442.
@nuclearsandwich
Copy link
Contributor Author

#1045 avoids this same problem but without attempting to clean up the tri-state logic. I'm going to proceed with that one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants