Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't write GND-lastSuccessfulUpdate.txt and send alert mail if exceptions were caught #338

Closed
fsteeg opened this issue Dec 20, 2022 · 3 comments
Assignees

Comments

@fsteeg
Copy link
Member

fsteeg commented Dec 20, 2022

Since we don't abort our transformations on errors any more, but skip them instead, we have to make sure that the GND-lastSuccessfulUpdate.txt is written only if there were no exceptions, in order for the updates to be retried the next day (we noticed this due to a report about missing updates, seems it was a temporary 502 from the OAI-PMH proxy, worked when re-running the updates now). The existing retry-and-mail logic doesn't seem to work, probably due to the exception handling (not thrown, no retry, no mail?).

@fsteeg fsteeg self-assigned this Dec 20, 2022
@fsteeg fsteeg changed the title Don't write GND-lastSuccessfulUpdate.txt if exceptions were caught Don't write GND-lastSuccessfulUpdate.txt and send alert mail if exceptions were caught Dec 20, 2022
@fsteeg
Copy link
Member Author

fsteeg commented Dec 21, 2022

Since we don't abort our transformations on errors any more, but skip them instead [...]

It seems this is not actually related. I was thinking of ed7e70a, but the thing I saw in the log [1] seems unrelated, since it's happening before running the Metafacture workflow, in ConvertUpdates:

java.io.IOException: Server returned HTTP response code: 502 for URL: http://gnd-proxy.lobid.org/oai/repository?verb=ListRecords&resumptionToken=hhGHeNpNjMsJgDAQBStxOxD2JdkkexcLsANFwYAi-AHL14iIcxsYBj4A3hoBK4qmqs95ovbYx2VNexo2qgkqTLg79sZx5jG4GDRDxb_HO4zhC_NeIhQOQugRQ2e5tIZAYlkvk6YfqQ
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1944)
	at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1939)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1938)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1508)
	at sun.net.www.protocol.http.HttpURLConnection.access$200(HttpURLConnection.java:91)
	at sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1484)
	at sun.net.www.protocol.http.HttpURLConnection$9.run(HttpURLConnection.java:1482)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.security.AccessController.doPrivilegedWithCombiner(AccessController.java:782)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1481)
	at ORG.oclc.oai.harvester2.verb.HarvesterVerb.harvest(HarvesterVerb.java:249)
	at ORG.oclc.oai.harvester2.verb.HarvesterVerb.<init>(HarvesterVerb.java:183)
	at ORG.oclc.oai.harvester2.verb.ListRecords.<init>(ListRecords.java:67)
	at ORG.oclc.oai.harvester2.app.RawWrite.run(RawWrite.java:141)
	at apps.ConvertUpdates.process(ConvertUpdates.java:129)

Also, this happened for the 2022-12-15 update, so it's unrelated to the problem that triggered this issue (which happened on 2022-12-19). For this 2022-12-15 problem, there seems to be some issue with the dates used when retrying:

Calling OAI-PMH at http://gnd-proxy.lobid.org/oai/repository from 2022-12-16 until 2022-12-16
Found errors
[error: null]
Error record: <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2022-12-16T00:00:28Z</responseDate><request from="2022-12-16" metadataPrefix="RDFxml" set="authorities" until="2022-12-16" verb="ListRecords">https://services.dnb.de/oai/repository</request><error code="noRecordsMatch"/></OAI-PMH>
Tried 1 times to get the data, but it's empty.
Going to retry in 240 min
Calling OAI-PMH at http://gnd-proxy.lobid.org/oai/repository from 2022-12-15 until 2022-12-15

It seems like after an error when getting 2022-12-16, we're getting 2022-12-15 to recover. Looking into the issue on 2022-12-19, there also seems to be a problem related to the date, though different (calling from 2022-12-20 until 2022-12-19):

Calling OAI-PMH at http://gnd-proxy.lobid.org/oai/repository from 2022-12-20 until 2022-12-19
resumptionToken: 
Found errors
[error: null]
Error record: <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2022-12-21T00:00:56Z</responseDate><request from="2022-12-20" metadataPrefix="RDFxml" set="authorities" until="2022-12-19" verb="ListRecords">https://services.dnb.de/oai/repository</request><error code="noRecordsMatch">The value of argument until '2022-12-19T23:59:59Z' lies before argument from '2022-12-20T00:00:00Z'</error></OAI-PMH>
Calling OAI-PMH at http://gnd-proxy.lobid.org/oai/repository from 2022-12-21 until 2022-12-21

So I think these are the tasks here:

  • Manually restore missing data based on logged errors, e.g. http://test.lobid.org/gnd/search?q=describedBy.dateModified:2022-12-15
  • Understand why the retry-alert logic didn't catch the issues (the one on 2022-12-15 seems to have slipped because the retry did a different day, which then worked? The invalid arguments on 2022-12-19 seems to have triggered no retry at all)
  • Fix the actual date selection / GND-lastSuccessfulUpdate.txt issue, which all of this seems to be related to

[1] Looking at output of e.g. quaoar1:~/git/lobid-gnd$ tail -n 30000 logs/cron.sh.log | grep -B 50 -A 50 2022-12-19

@fsteeg
Copy link
Member Author

fsteeg commented Dec 21, 2022

Looks like these were the days with errors during updates since 2022-10-13 (the data state of the latest baseline, see https://data.dnb.de/opendata/, "Stand:"), collected by looking at the output of quaoar1:~/git/lobid-gnd$ tail -n 3000000 logs/cron.sh.log | grep "HTTP response code\|Calling OAI-PMH":

@fsteeg
Copy link
Member Author

fsteeg commented Dec 22, 2022

So it seems the interval setting for multi-day updates was broken since 6e554aa ("Tweak end date for updates to avoid redundant data", 2022-12-09), resulting in the end date not being incremented. Fixed that in 0d7602f.

It seems the result was that when a single day update failed for some other reason, the following multi-day update failed too, for this reason. After my recent manual updates, the GND-lastSuccessfulUpdate.txt was set to some date in the past (the end date of my last manual update), triggering this problem, resulting in missing updates again. I've deployed 0d7602f to quaoar1 and ran the multi-day update since 2022-12-20 to restore the data and test the fix, looks good:

The reason we did not get an alert for the second issue (the follow-up error, caused by the faulty end date) seems to be that we only retry and alert if we got no data at all, so if e.g. the first day of multi-day updates worked, but the second fails, we get no alert. Opened #339 to improve that.

Assigning @dr0i for review, won't open a PR, changes are only 07d7b3b & 0d7602f.

@fsteeg fsteeg assigned dr0i and unassigned fsteeg Dec 22, 2022
@dr0i dr0i closed this as completed in 42e695b Dec 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants