Incomplete uploads database table after MetaCPAN outages #27

Open
eserte opened this issue Feb 21, 2024 · 11 comments

Comments

@eserte

eserte commented Feb 21, 2024

Looking at https://github.com/cpan-testers/cpantesters-backend/blob/master/lib/CPAN/Testers/Backend/FetchUploads.pm and at lines 126 to 135 of cpantesters-backend/Rexfile (commit e8803f4):

```perl
'fetch-uploads' => {
    user => 'cpantesters',
    minute => '*/10',
    hour => '*',
    day_of_month => '*',
    month => '*',
    day_of_week => '*',
    command => 'beam run metacpan fetch_uploads'
        . ' --since $( date --date="30 minutes ago" -Iseconds )'
        . ' >>$HOME/var/log/fetch_uploads.log 2>&1',
```
It seems that freshly uploaded CPAN releases go permanently missing from the database if the MetaCPAN API is down or unreachable for about 20 to 30 minutes. Unfortunately, this seems to happen now and then; see also metacpan/metacpan-web#2992 for a recent incident.
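To make the failure window concrete, here is a small Perl sketch (not code from cpantesters-backend; `runs_covering` is a hypothetical helper) that models which runs of the every-10-minutes job, each looking back 30 minutes, can see a given upload. It assumes `--since` is an inclusive lower bound and that MetaCPAN knows about an upload the moment it happens.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical model of the 'fetch-uploads' cron job: a run at minute $r
# fetches uploads with a timestamp in [$r - $lookback, $r]. Assumes
# --since is an inclusive lower bound and that MetaCPAN already knows
# about an upload at the moment it happens.
sub runs_covering {
    my ( $upload_min, $interval, $lookback ) = @_;
    my @runs;
    # First scheduled run at or after the upload.
    my $r = $interval * ceil( $upload_min / $interval );
    while ( $r - $lookback <= $upload_min ) {
        push @runs, $r;
        $r += $interval;
    }
    return @runs;
}

# Schedule from the Rexfile: every 10 minutes, 30-minute window.
my @runs = runs_covering( 5, 10, 30 );
print "upload at minute 5 is seen by runs at: @runs\n";

# If MetaCPAN is unreachable during all three runs, nothing fetches it.
my %failed  = map { $_ => 1 } 10, 20, 30;
my $fetched = grep { !$failed{$_} } @runs;
print $fetched ? "fetched\n" : "missed permanently\n";
```

With this schedule an upload is seen by only three (at an exact boundary, four) runs, so an outage of roughly 30 minutes is enough to lose it for good.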

This in turn means that test reports for these missing CPAN releases are permanently lost, even if they arrive after the MetaCPAN outage. A prominent example is Net-SSLeay 1.94, which was released about six weeks ago and still has no test reports listed; see http://matrix.cpantesters.org/?dist=Net-SSLeay (the website currently shows "NOTE: no report for latest version 1.94").

So what can be done? As a quick fix, I think it would be good to fill in the missing entries in the database. Running `beam run metacpan fetch_uploads` without the `--since` option would probably help. Maybe try increasing intervals first (I don't know how the MetaCPAN API behaves if everything is fetched without a filter). For the period starting last Friday, I would expect about 70-80 entries to be added.
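The step-by-step approach could be scripted roughly as follows. This is a minimal sketch: the hypothetical `backfill_commands` helper only builds the command lines, reusing the `--since $( date ... -Iseconds )` form from the Rexfile, so each window can be run and checked by hand before trying a larger one. The window sizes are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: build one-off backfill commands with increasingly
# long --since windows, reusing the command form from the Rexfile.
# Nothing is executed here; the commands are only printed.
sub backfill_commands {
    my @windows = @_;    # GNU date expressions, e.g. "7 days ago"
    return map {
        # \$( keeps the shell's command substitution literal in Perl.
        qq{beam run metacpan fetch_uploads --since \$( date --date="$_" -Iseconds )}
    } @windows;
}

my @cmds = backfill_commands( '1 day ago', '7 days ago', '30 days ago' );
print "$_\n" for @cmds;
```

Once the larger windows are known to behave, dropping `--since` altogether is the full rebuild.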

What to do as a long-term fix? I am not sure. Outages and network problems of all kinds can always happen. Maybe it would help to permanently increase the `--since` period (to one day? more?), but this would add more load on the MetaCPAN API and the local database. Maybe there could be an infrequently running "repair" cronjob which uses a longer `--since` period. Maybe monitoring could be better (currently, failures to connect to MetaCPAN do not seem to be logged at all).
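A "repair" job could be declared next to the existing entry. The sketch below is hypothetical: the job name `fetch-uploads-repair`, the daily 03:15 schedule, the 2-day window, and the `%cron_jobs` wrapper are assumptions; only the command form and the field names come from the Rexfile excerpt.

```perl
use strict;
use warnings;

# Hypothetical extra cron entry modelled on 'fetch-uploads'. Single
# quotes keep $( ... ) and $HOME literal so that the shell run by cron
# expands them, not Perl.
my %cron_jobs = (
    'fetch-uploads-repair' => {
        user         => 'cpantesters',
        minute       => '15',
        hour         => '3',    # once a day, off-peak (assumed)
        day_of_month => '*',
        month        => '*',
        day_of_week  => '*',
        command      => 'beam run metacpan fetch_uploads'
            . ' --since $( date --date="2 days ago" -Iseconds )'
            . ' >>$HOME/var/log/fetch_uploads.log 2>&1',
    },
);
```

Because its window overlaps what the 10-minute job already fetched, this only stays harmless if fetching an upload that is already stored does not add another row.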

It would also be nice if any affected reports already in the database could be repaired by reprocessing them once the uploads table is fixed.

FYI @jkeenan (James: this relates to the post "CPANtesters failing to report distribution name for Net-SSLeay" you wrote some weeks ago) and @andk.

@glasswalk3r

> it seems that freshly uploaded CPAN releases are missing permanently in the database if the MetaCPAN API is down or unreachable for about 20 to 30 minutes.

Is this not related to the report submission? If the API is down, there is no way the report could be submitted, right? If the tester has some means to keep the report locally (like using metabase-relayd), the report could be submitted again later.

Or is there a part of the flow that I'm not aware of?

@eserte
Author

eserte commented Feb 23, 2024

It seems that reports for any distribution which is not listed in the cpantesters database are just ignored. You can check http://metabase.cpantesters.org/tail/log.txt --- there are still about 100 out of 1000 lines which have just a [] where the distribution name should be. These reports are lost, and it does not help to wait and send later. Only inserting the missing distribution to the database would help.

@jkeenan
Contributor

jkeenan commented Mar 20, 2024

> It seems that reports for any distribution which is not listed in the cpantesters database are just ignored. You can check http://metabase.cpantesters.org/tail/log.txt --- there are still about 100 out of 1000 lines which have just a [] where the distribution name should be. These reports are lost, and it does not help to wait and send later. Only inserting the missing distribution to the database would help.

This problem persists. Today I installed perl-5.39.9 and tried to install ~500 CPAN modules against it. I can confirm that Net-SSLeay and MIME-tools are two distributions for which reports were generated but logged at http://metabase.cpantesters.org/tail/log.txt without their distribution names. We have no CPAN Testers data for the recent releases of these two distros. See: http://fast-matrix.cpantesters.org/?dist=MIME-tools and http://fast-matrix.cpantesters.org/?dist=Net-SSLeay.

@eserte
Author

eserte commented Mar 27, 2024

Any news on this? Is this a topic we can tackle at PTS 2024?

@preaction
Member

Yes, @eserte, your summary is correct: if the MetaCPAN API is down for three runs in a row within 30 minutes (the job runs once every 10 minutes), CPAN Testers will never get that data, and no reports can possibly be submitted for those distributions. And, also yes, I can manually run that job without the `--since` argument to rebuild that table from zero.

This shouldn't be possible, I think: it's not CPAN Testers' job to know what is uploaded. If we get a report for something, we should accept it, and we can later decide whether to display it (once we've verified the report is for a module uploaded to CPAN by an authorized account). So, instead of failing if an upload record isn't found, I think I'll just insert a provisional record into that table (which I thought was the current behavior, but clearly not...).
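The "insert a provisional record instead of failing" idea can be sketched in a few lines of Perl. This is not the backend's actual code: the in-memory `%uploads` hash stands in for the database table, and `find_or_create_upload`, `confirm_upload`, and the `provisional` field are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the uploads table, keyed by
# dist and version. Field names are illustrative only.
my %uploads;

# Return the upload record for an incoming report, creating a
# provisional one if MetaCPAN data has not arrived, instead of failing.
sub find_or_create_upload {
    my ( $dist, $version ) = @_;
    my $key = join "\0", $dist, $version;
    return $uploads{$key} //= {
        dist        => $dist,
        version     => $version,
        provisional => 1,    # not yet confirmed by fetch_uploads
    };
}

# Called when fetch_uploads later sees the real upload.
sub confirm_upload {
    my ( $dist, $version ) = @_;
    my $rec = find_or_create_upload( $dist, $version );
    $rec->{provisional} = 0;
    return $rec;
}
```

The report is kept either way; whether a provisional record is displayed can then be decided once the upload is confirmed.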

While investigating this, I have also found that the UNIQUE constraints I would've expected aren't there, so there are multiple records for every dist/version... I'll deal with that presently as well.
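A de-duplication pass of the kind described could look like the sketch below. It is hypothetical (the row layout with `id`, `dist`, and `version` fields is assumed, and the `Foo-Bar` row is made up): it keeps the oldest row, meaning the lowest id, for each dist/version pair and lists the ids that would be deleted.

```perl
use strict;
use warnings;

# Hypothetical de-duplication pass: keep the lowest id for each
# dist/version pair and return the ids of all other copies.
sub duplicate_ids {
    my @rows = @_;    # list of { id => ..., dist => ..., version => ... }
    my ( %keep, @delete );
    for my $row ( sort { $a->{id} <=> $b->{id} } @rows ) {
        my $key = join "\0", $row->{dist}, $row->{version};
        if ( exists $keep{$key} ) {
            push @delete, $row->{id};
        }
        else {
            $keep{$key} = $row->{id};
        }
    }
    return @delete;
}

my @rows = (
    { id => 1, dist => 'Net-SSLeay', version => '1.94' },
    { id => 2, dist => 'Foo-Bar',    version => '0.01' },
    { id => 3, dist => 'Net-SSLeay', version => '1.94' },
);
my @to_delete = duplicate_ids(@rows);
print "delete ids: @to_delete\n";
```

After such a cleanup, a UNIQUE index over the dist and version columns (whatever they are called in the real schema) lets the database reject new duplicates.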

@jkeenan
Contributor

jkeenan commented Apr 26, 2024

@preaction, thanks for your investigation. Look forward to the results.

@preaction
Member

preaction commented Apr 26, 2024

So... Holy mother-forking shirt-balls, there are reports in here that have been failing to process since 2019 (which, if I recall correctly, is when I added the code to pull this uploads data from MetaCPAN). My script to repair the uploads data and my other script to de-duplicate that same data are running now. Once they are complete, I can put the failed jobs back in the queue to run the processor again.

I'm still working on the fix to pre-populate the uploads data if it's missing, but that should be done before the summit is finished.

@preaction
Member

The 125,000 missing reports are back in the queue (but will probably take quite a while to chew through). I'm going to finish automated tests for the various fixes to the report processing... process, and then this shouldn't happen again (for this specific reason, at least).

@jkeenan
Contributor

jkeenan commented Apr 27, 2024

So far, so good. I have been able to run reports for Net-SSLeay and MIME-tools against the newly released perl-5.39.10 and have the results reported:

http://fast-matrix.cpantesters.org/?dist=Net-SSLeay;perl=5.39.10;reports=1

http://fast-matrix.cpantesters.org/?dist=MIME-tools;perl=5.39.10;reports=1

@preaction
Member

Excellent. I've re-prioritized the incoming job queue to put these older reports lower on the queue, so that once again, new reports are processed quickly (to get them on the regular matrix.cpantesters.org). I let a backlog accumulate, though, so we're still behind in processing and likely will be for a couple days.
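The thread does not say how the CPAN Testers job queue orders work, so the following is only an illustrative sketch of the re-prioritization idea. The hypothetical `next_job` picks the highest priority first and the oldest job id within a priority, so a backlog enqueued at a lower priority is overtaken by new reports.

```perl
use strict;
use warnings;

# Hypothetical ordering: higher priority first, then lowest job id
# (FIFO) within the same priority.
sub next_job {
    my @jobs = @_;
    my ($next) = sort {
        $b->{priority} <=> $a->{priority} || $a->{id} <=> $b->{id}
    } @jobs;
    return $next;
}

my @queue = (
    { id => 1, priority => -10, what => 'backlog report from 2019' },
    { id => 2, priority => -10, what => 'backlog report from 2023' },
    { id => 3, priority => 0,   what => 'new report' },
);
print next_job(@queue)->{what}, "\n";
```

The backlog still drains in its original order whenever no new reports are waiting.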

@eserte
Author

eserte commented May 19, 2024

Things look good now. I think this issue may be closed. What do you think, @preaction?
