Should not visit pages that have already been visited #23
Comments
Also when I ran it with a memo, I got an error eventually
I think you should tune your OS, for example for Linux
As for the issue, I plan to introduce an option for
What is the root cause of this? It seems to me that when opening a TCP socket connection, Ferrum opens a file but never closes it. This shouldn't happen, since the number of pages being processed at once is at most the number of processors (unless overridden).
Ferrum opens only one connection per page and closes it once the page has been processed, releasing both the page and the connection. So most likely something is wrong with the crawler.
How can I make it not visit the same page multiple times?
How can I make it so that it doesn't visit any pages outside of the domain?
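For illustration, the two questions above (skipping pages that were already visited and staying inside one domain) can be handled with a small filter that is checked before a URL is queued. This is only a rough sketch using the Ruby standard library. `VisitFilter` and its methods are hypothetical names, not part of the crawler's or Ferrum's API, and it assumes that dropping the URL fragment is enough to treat two links as the same page.

```ruby
require "set"
require "uri"

# Hypothetical helper (not part of any crawler's API): remembers which URLs
# were already seen and only accepts URLs on a single allowed host.
class VisitFilter
  def initialize(allowed_host)
    @allowed_host = allowed_host.downcase
    @visited = Set.new
    # Pages may be processed by several threads at once, so guard the set.
    @lock = Mutex.new
  end

  # Returns true only the first time an in-domain URL is offered.
  def visit?(url)
    uri = URI.parse(url)
    # Relative or host-less URLs are rejected; resolve them before calling.
    return false unless uri.host && uri.host.downcase == @allowed_host

    # Assumption: "/a" and "/a#top" are the same page, so drop the fragment.
    uri.fragment = nil
    key = uri.to_s

    # Set#add? returns nil when the element is already present.
    @lock.synchronize { !@visited.add?(key).nil? }
  rescue URI::InvalidURIError
    false
  end
end

filter = VisitFilter.new("example.com")
filter.visit?("https://example.com/a")     # first time this page is seen
filter.visit?("https://example.com/a#top") # same page, different fragment
filter.visit?("https://other.org/")        # different host
```

Checking the filter before enqueuing a link, rather than after loading the page, also keeps the number of simultaneously open pages and connections down.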