
update_cp_org_libraries.py runs veryyyy slow #263

Open
evaherrada opened this issue Apr 4, 2022 · 13 comments · Fixed by #362
evaherrada commented Apr 4, 2022

Results of my initial tests of the functions that get run.

Total time per repo: 3.5s
validate_actions_state: 0.459673s
validate_contents: 0.433544s
validate_core_driver_page: 0.003178s
validate_default_branch: 0.000001s
validate_in_pypi: 0.040591s
validate_labels: 0.117371s
validate_passes_linting: 0.639910s
validate_readthedocs: 1.361857s
validate_release_state: 0.333909s
validate_repo_state: 0.134701s

So the real big one is validate_readthedocs. That one is definitely one of the more complicated ones but I think we could for sure simplify it.

validate_passes_linting also takes a while, but it actually clones the repository, so that makes sense.

@evaherrada evaherrada self-assigned this Apr 4, 2022

dhalbert commented Nov 9, 2023

We are now often seeing the job exceed the 6 hour job runtime limit. For example: https://github.com/adafruit/circuitpython-org/actions/runs/6809744073/job/18516615845.

There is some GitHub API rate limiting going on too (see in that job), but it would be really nice to make this run faster.

dhalbert commented:

Thinking about your local test in #361, are we doing the queries without credentials? If we did them with credentials, would we avoid the rate limit?


jepler commented Nov 26, 2023

I'm not sure about authorization. I filed #348 thinking that this process was using Classic PATs (https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-personal-access-token-classic) which are deprecated or discouraged; I forget the exact context that led to me filing that issue but it seems to be related to this note in the docs:

Note: Organization owners can restrict the access of personal access token (classic) to their organization. If you try to use a personal access token (classic) to access resources in an organization that has disabled personal access token (classic) access, your request will fail with a 403 response. Instead, you must use a GitHub App, OAuth app, or fine-grained personal access token.

Using PATs of some kind seems to be the Proper Way (TM) to do this, but "fine-grained" PATs are the preferred way now, if we don't want to deal with having a GitHub App: https://docs.github.com/en/actions/security-guides/automatic-token-authentication#granting-additional-permissions

I am pretty sure SOME kind of authentication/token is at play here, otherwise the unauthenticated limit is something like 60 API calls an hour, far too few.


jepler commented Nov 26, 2023

My local test was without authorization, but that was kinda-nice since it let me easily hit the timeout case. For the run in circuitpython-org that's troubling us, we appear to be using a token via ADABOT_GITHUB_USER and _ACCESS_TOKEN:

Thu, 09 Nov 2023 09:20:13 GMT     ADABOT_GITHUB_USER: ***
Thu, 09 Nov 2023 09:20:13 GMT     ADABOT_GITHUB_ACCESS_TOKEN: ***

since the value of repository secrets can't be inspected, I don't know what user & token is in play.

dhalbert commented:

I think these tokens are OK, because I changed them recently, carefully, and other Actions jobs depend on them.


jepler commented Nov 26, 2023

A successful run looks like this:

Sat, 25 Nov 2023 09:19:00 GMT Run Date: 25 November 2023, 09:19AM
Sat, 25 Nov 2023 09:19:00 GMT  - Report output will be saved to: /home/runner/work/circuitpython-org/circuitpython-org/bin/adabot/libraries.v2.json
Sat, 25 Nov 2023 10:02:45 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Sat, 25 Nov 2023 10:02:45 GMT Rate Limit will reset at: 2023-11-25 10:19:13
Sat, 25 Nov 2023 10:19:13 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Sat, 25 Nov 2023 10:20:59 GMT {
Sat, 25 Nov 2023 10:20:59 GMT   "updated_at": "2023-11-25T09:19:00Z",
[rest of json snipped]

so we have:

  • runs for ~40 minutes before hitting rate limit
  • sleeps for ~20 minutes after printing "rate limit reached" message once
  • encounters rate limit message again, but apparently doesn't sleep this time
  • finishes about 2 minutes later

A failed run:

Fri, 24 Nov 2023 09:20:27 GMT  - Report output will be saved to: /home/runner/work/circuitpython-org/circuitpython-org/bin/adabot/libraries.v2.json
Fri, 24 Nov 2023 10:06:30 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Fri, 24 Nov 2023 10:20:39 GMT Rate Limit will reset at: 2023-11-24 10:20:39
Fri, 24 Nov 2023 10:20:39 GMT GitHub API Rate Limit reached. Pausing until Rate Limit reset.
Fri, 24 Nov 2023 10:20:39 GMT Rate Limit will reset at: 2023-11-24 10:20:39
Fri, 24 Nov 2023 15:20:09 GMT Error: The operation was canceled.

here,

  • runs for about 45 minutes
  • sleeps for about 15 minutes
  • apparently goes to sleep again for a long time, or never hits rate limit again
  • gets timed out 5 hours later

Besides adding debugging, another idea is to prepend the command with `timeout` so that we can hopefully get a Python traceback from where the process is stuck, something like `timeout -s INT 18000 [adabot command]`. This would be a change in the circuitpython-org repo, not here, I think.


jepler commented Nov 26, 2023

Also, we could sleep until 60 seconds after we think the rate limit will reset, not 1 second after; since we hit the rate limit after 40-45 minutes and it resets after an hour, this doesn't really lower our throughput much.

dhalbert commented:

Can we insert delays between our requests? Would that throttle it enough to avoid the rate limit?

jepler added a commit to jepler/adabot that referenced this issue Nov 27, 2023

The "core rate limit reset" time is in UTC, while github actions
may be running in another timezone, such as PST. The below example
was run in my local timezone, CST:

```python
>>> import os, datetime
>>> import github as pygithub
>>> GH_INTERFACE = pygithub.Github(os.environ.get("ADABOT_GITHUB_ACCESS_TOKEN"))
>>> GH_INTERFACE.get_rate_limit().core.reset - datetime.datetime.now()
datetime.timedelta(seconds=24750, microseconds=356986)
>>> GH_INTERFACE.get_rate_limit().core.reset - datetime.datetime.utcnow()
datetime.timedelta(seconds=3147, microseconds=87271)
```

Use utcnow() instead of now() so that the sleep is for the correct
duration.

(utcnow is deprecated, but the correct alternative, `now(tz=datetime.timezone.utc)`
does not work: `GH_INTERFACE.get_rate_limit().core.reset` is a naive
timestamp and can't be compared with a tz-aware object)

jepler commented Nov 27, 2023

I think I found the real problem 🎉

@dhalbert dhalbert linked a pull request Nov 28, 2023 that will close this issue
dhalbert commented:

Fixed by #362.

dhalbert commented:

I saw another failed 6-hour run, despite circuitpython-org being updated to include #362. 😕
https://github.com/adafruit/circuitpython-org/actions/runs/7030445965
Reopening @jepler

@dhalbert dhalbert reopened this Nov 29, 2023

jepler commented Nov 30, 2023

oh drat


tekktrik commented Jan 9, 2024

I have thought about moving part of the CI into libraries themselves somehow. The basic idea is to have repos trigger when updated (or scheduled) to push a record of the information gathered either to a central repo (like a folder in this repository, for example) or possibly an S3 bucket. This has the advantage of spreading out or eliminating API calls, as well as distributing the burden over multiple CI runs across the libraries. This would allow this CI to just collect them all.
