purl2vcs: out of memory error #489

Open
JonoYang opened this issue Jul 2, 2024 · 0 comments

JonoYang commented Jul 2, 2024

The purldb webserver experienced an error that killed the gunicorn worker handling this request:

web-1  | [2024-07-02 21:31:42 +0000] [9] [CRITICAL] WORKER TIMEOUT (pid:10)
web-1  | [2024-07-02 21:31:42 +0000] [10] [ERROR] Error handling request /api/collect/index_packages/
web-1  | Traceback (most recent call last):
web-1  |   File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 135, in handle
web-1  |     self.handle_request(listener, req, client, addr)
web-1  |   File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/sync.py", line 178, in handle_request
web-1  |     respiter = self.wsgi(environ, resp.start_response)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/wsgi.py", line 124, in __call__
web-1  |     response = self.get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/base.py", line 140, in get_response
web-1  |     response = self._middleware_chain(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
web-1  |     response = response or self.get_response(request)
web-1  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
web-1  |     response = response or self.get_response(request)
web-1  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
web-1  |     response = response or self.get_response(request)
web-1  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
web-1  |     response = response or self.get_response(request)
web-1  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
web-1  |     response = response or self.get_response(request)
web-1  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
web-1  |     response = response or self.get_response(request)
web-1  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/utils/deprecation.py", line 134, in __call__
web-1  |     response = response or self.get_response(request)
web-1  |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
web-1  |     response = get_response(request)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/core/handlers/base.py", line 197, in _get_response
web-1  |     response = wrapped_callback(request, *callback_args, **callback_kwargs)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/contextlib.py", line 81, in inner
web-1  |     return func(*args, **kwds)
web-1  |            ^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/django/views/decorators/csrf.py", line 65, in _view_wrapper
web-1  |     return view_func(request, *args, **kwargs)
web-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/rest_framework/viewsets.py", line 124, in view
web-1  |     return self.dispatch(request, *args, **kwargs)
web-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 506, in dispatch
web-1  |     response = handler(request, *args, **kwargs)
web-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/app/packagedb/api.py", line 973, in index_packages
web-1  |     get_source_package_and_add_to_package_set(package)
web-1  |   File "/usr/local/lib/python3.11/site-packages/purl2vcs/find_source_repo.py", line 141, in get_source_package_and_add_to_package_set
web-1  |     source_purl = get_source_repo(package=package)
web-1  |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/purl2vcs/find_source_repo.py", line 198, in get_source_repo
web-1  |     repo_urls = list(get_repo_urls(package))
web-1  |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/purl2vcs/find_source_repo.py", line 225, in get_repo_urls
web-1  |     source_urls = get_source_urls_from_package_data_and_resources(
web-1  |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/purl2vcs/find_source_repo.py", line 244, in get_source_urls_from_package_data_and_resources
web-1  |     metadata_urls = list(get_urls_from_package_data(package))
web-1  |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/purl2vcs/find_source_repo.py", line 345, in get_urls_from_package_data
web-1  |     found_urls.extend(get_urls_from_text(text=homepage_text))
web-1  |   File "/usr/local/lib/python3.11/site-packages/purl2vcs/find_source_repo.py", line 36, in get_urls_from_text
web-1  |     for url in get_urls_from_location(location=lines)["urls"]:
web-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/scancode/api.py", line 134, in get_urls
web-1  |     for urls, line_num in found_urls:
web-1  |   File "/usr/local/lib/python3.11/site-packages/scancode/api.py", line 130, in <genexpr>
web-1  |     found_urls = ((u, ln) for (u, ln) in find_urls(location) if u)
web-1  |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 257, in find_urls
web-1  |     for _key, url, _line, line_number in matches:
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 78, in unique_filter
web-1  |     for key, match, line, line_number in matches:
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 576, in junk_urls_filter
web-1  |     for key, match, line, line_number in matches:
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 553, in junk_url_hosts_filter
web-1  |     for key, match, line, line_number in matches:
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 425, in canonical_url_cleaner
web-1  |     for key, match, line, line_number in matches:
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 108, in re_filt
web-1  |     for key, match, line, line_number in matches:
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 360, in user_pass_cleaning_filter
web-1  |     for key, match, line, line_number in matches:
web-1  |   File "/usr/local/lib/python3.11/site-packages/cluecode/finder.py", line 336, in scheme_adder
web-1  |     yield key, match, line, line_number
web-1  |   File "/usr/local/lib/python3.11/site-packages/gunicorn/workers/base.py", line 203, in handle_abort
web-1  |     sys.exit(1)
web-1  | SystemExit: 1
web-1  | [2024-07-02 21:31:42 +0000] [10] [INFO] Worker exiting (pid: 10)
web-1  | [2024-07-02 21:31:43 +0000] [9] [ERROR] Worker (pid:10) was sent SIGKILL! Perhaps out of memory?

My initial guess is that there may be a regex explosion (catastrophic backtracking) happening when parsing URLs out of the package's homepage text.
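If the regex guess holds, one possible mitigation is to bound the text before it reaches the URL finder. Below is a minimal sketch of that idea, not a patch for purl2vcs or cluecode: `bounded_lines`, `find_urls_bounded`, the two limits, and the simple URL pattern are all hypothetical names and values picked for illustration, and the pattern is deliberately not the one cluecode uses.

```python
import re

# Hypothetical limits, chosen only for illustration.
MAX_LINES = 1000
MAX_LINE_LENGTH = 2000

# Deliberately simple URL pattern with no nested quantifiers.
# This is an assumption for the sketch, NOT the pattern used by cluecode.
SIMPLE_URL = re.compile(r"https?://[^\s<>\"']+")


def bounded_lines(text, max_lines=MAX_LINES, max_line_length=MAX_LINE_LENGTH):
    """Inspect at most max_lines lines; yield those no longer than max_line_length."""
    for count, line in enumerate(text.splitlines()):
        if count >= max_lines:
            break
        if len(line) > max_line_length:
            # Very long lines are the usual trigger for runaway regex matching.
            continue
        yield line


def find_urls_bounded(text):
    """Return the URLs found in text after bounding the input size."""
    urls = []
    for line in bounded_lines(text):
        urls.extend(SIMPLE_URL.findall(line))
    return urls
```

Skipping oversized lines trades recall for a predictable upper bound on work per request. Whether that trade is acceptable here depends on what the offending homepage text actually looks like, which the log above does not show.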
