I'm having an issue where Wayback Machine links break crawling on completely unrelated pages
This page links to two Wayback Machine pages, this one and this one.
After crawling the page with those links, subsequent crawls of unrelated websites fail with an error message that refers to the two Wayback Machine links, even though the failing sites are not on the same domain. SOSSE also fails to cache them.
Below are some screenshots showing that the error is unrelated to the pages that failed to crawl.
It seems the crawler has reached a broken state because a previously crawled page had bogus links (most likely the Wayback Machine page, indeed).
As a workaround, you could probably recrawl the Wayback Machine pages using Python Requests instead of Chromium. As for the tilde.town pages, they can most likely be recrawled as is after restarting the crawler.
Otherwise, I'll have a look tonight to fix the root of the issue.