Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Python usage harness #368

Merged
merged 4 commits into from
Sep 6, 2024
Merged

Improve Python usage harness #368

merged 4 commits into from
Sep 6, 2024

Conversation

jribbens
Copy link
Contributor

@jribbens jribbens commented Sep 5, 2024

The current Python usage harness:

  • is very, very, very slow;
  • matches the patterns case-insensitively, which is not what the other languages' harnesses do, and is not how the patterns appear to be intended to be used;
  • is missing the matching_crawlers() function that the Go harness provides.

This patch fixes all these issues. In some simple tests, the updated is_crawler() function is well over a hundred times faster than the current version.

@monperrus
Copy link
Owner

Thanks a lot for the contribution. Would you also be able to contribute with a test case run in CI? That would be awesome!

@jribbens
Copy link
Contributor Author

jribbens commented Sep 5, 2024

Ok I've added some simple tests, updated pyproject.toml a bit (dev dependencies are now in it rather than a separate requirements.txt, and it defines a project homepage), and changed the CI workflow so it uses pip install -e .[dev] so that the test file can import crawleruseragents in order to test it.

@jribbens
Copy link
Contributor Author

jribbens commented Sep 5, 2024

If you're interested, I also reworked the Python harness to be completely type-annotated and other fixes so it can be fully type-checked and linted with 'mypy --strict', 'ruff check' and 'ruff format': jribbens/crawler-user-agents@master...jribbens:crawler-user-agents:full-typing

This involves considerably more changes to the code though, of course, and it means it's only compatible with Python 3.8 or later (not sure what version you were targeting previously).

@monperrus monperrus merged commit e7f72bc into monperrus:master Sep 6, 2024
1 check passed
@monperrus
Copy link
Owner

thanks a lot @jribbens

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants