Ignore SSL Certificate Errors? #4

Open
webitsanu opened this issue Jan 23, 2024 · 1 comment

@webitsanu

I was wondering if it might be possible to introduce a checkbox / option to ignore SSL certificate errors? I am trying to crawl a website that has a problem with its certificates, so the crawler is failing. It is nice to know the certificates are problematic, but it would be amazing if the crawl could continue.

Please find the errors below:

  File "/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1096, in _validate_conn
    conn.connect()
  File "/venv/lib/python3.11/site-packages/urllib3/connection.py", line 642, in connect
    sock_and_verified = _ssl_wrap_socket_and_match_hostname(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/urllib3/connection.py", line 782, in _ssl_wrap_socket_and_match_hostname
    ssl_sock = ssl_wrap_socket(
               ^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 470, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 514, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/ssl.py", line 1075, in _create
    self.do_handshake()
  File "/usr/lib/python3.11/ssl.py", line 1346, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)

I know Chrome has an --ignore-certificate-errors flag, but I think there is a bit more to it. It seems urllib3 and requests are also checking certificates.
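For illustration, here is a minimal sketch of what an "ignore certificate errors" switch could look like on the Python side, using only the standard library `ssl` module. The function name `make_ssl_context` and its `ignore_cert_errors` parameter are hypothetical and not part of this project; requests offers a similar per-call switch through its `verify=False` argument. Disabling verification removes protection against man-in-the-middle attacks, so it would presumably need to be opt-in.

```python
import ssl


def make_ssl_context(ignore_cert_errors: bool = False) -> ssl.SSLContext:
    """Build a client-side TLS context (hypothetical helper, for illustration).

    With ignore_cert_errors=False the context verifies the certificate chain
    and the hostname, which is the behaviour that raised
    CERTIFICATE_VERIFY_FAILED in the traceback above.
    """
    context = ssl.create_default_context()
    if ignore_cert_errors:
        # check_hostname must be turned off first; setting CERT_NONE while
        # it is still enabled raises ValueError.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context
```

A context built with `make_ssl_context(True)` can be passed to `urllib.request.urlopen(url, context=...)`, which would then complete the handshake even when the issuer certificate cannot be found.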

By the way, fantastic work on the app. It works brilliantly.

@biolds
Owner

biolds commented Jan 23, 2024

Thanks for the feature request! Indeed, it seems like a good idea to have that. I'll look into it.

You may be seeing a urllib traceback even though you are crawling with Chrome because urllib is used to check the robots.txt of the site. If that's the case, you could ignore the robots.txt by updating the domain settings in http://x.x.x.x/admin/se/domainsetting/
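As a rough sketch of the situation described above, the following shows how a robots.txt check done with the standard library could fail on a bad certificate, and one possible way to keep crawling when it does. This is not the project's actual code: `load_robots` is a hypothetical helper, and treating a failed fetch as "everything allowed" is an assumption made for the example.

```python
import ssl
import urllib.request
import urllib.robotparser
from typing import Optional


def load_robots(robots_url: str,
                context: Optional[ssl.SSLContext] = None,
                timeout: float = 5.0) -> urllib.robotparser.RobotFileParser:
    """Fetch and parse robots.txt (hypothetical helper, for illustration)."""
    parser = urllib.robotparser.RobotFileParser(robots_url)
    try:
        with urllib.request.urlopen(robots_url, context=context,
                                    timeout=timeout) as resp:
            text = resp.read().decode("utf-8", errors="replace")
        parser.parse(text.splitlines())
    except OSError:
        # Certificate failures, refused connections and HTTP errors all
        # derive from OSError. Assumption for this sketch: if robots.txt
        # cannot be fetched, allow every URL instead of aborting the crawl.
        parser.allow_all = True
    return parser
```

With the default `context=None`, a site with a broken certificate chain ends up in the `except` branch, and `parser.can_fetch("*", url)` then returns `True` for every URL.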
