Suggestion: test against false positives #350

Closed
starius opened this issue Apr 4, 2024 · 2 comments

starius commented Apr 4, 2024

Context

I use the package to distinguish crawlers from human users in an HTTP server. The goal is to prevent crawlers from "spoiling" one-time links shared in Discord and similar chat apps, which request every link posted in a chat to generate a preview. Because the link is one-time, the crawler's request consumes it, and the link no longer works when a human user opens it. I solved this by blocking crawler access to such links. For more details, see starius/pasta#8
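
A minimal sketch of the blocking logic described above, not the actual code from starius/pasta. The pattern list, the `is_crawler` helper and the `OneTimeLinks` class are all hypothetical stand-ins; a real server would load the patterns from the package.

```python
import re

# Hypothetical stand-ins for the package's crawler patterns (assumption).
CRAWLER_PATTERNS = [r"Discordbot", r"Googlebot\/", r"Slackbot"]
_CRAWLER_RE = [re.compile(p) for p in CRAWLER_PATTERNS]


def is_crawler(user_agent: str) -> bool:
    """Return True if the User Agent matches any crawler pattern."""
    return any(rx.search(user_agent) for rx in _CRAWLER_RE)


class OneTimeLinks:
    """In-memory store of one-time links (illustrative only)."""

    def __init__(self):
        self._secrets = {}

    def put(self, token: str, secret: str) -> None:
        self._secrets[token] = secret

    def open(self, token: str, user_agent: str):
        """Return (status, body). A crawler request never consumes the link."""
        if is_crawler(user_agent):
            return 403, "crawlers may not open one-time links"
        if token not in self._secrets:
            return 404, "link not found or already used"
        # First human request: hand out the secret and invalidate the link.
        return 200, self._secrets.pop(token)
```

With this shape, a preview request from a chat crawler gets a 403 and leaves the link intact, while the first human request receives the content and invalidates the link.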

Danger of false positives

If a legitimate browser sends a User Agent that accidentally matches one of the patterns, the user won't be able to access the link, because the site will treat the request as coming from a crawler.

I guess other uses of this package would also benefit from minimizing false positives.

Proposed solution

Let's add a CI test that runs the most common User Agents through the patterns and fails if any of them matches.
The list of User Agents can be loaded from here: https://github.com/microlinkhq/top-user-agents/tree/master/src
If somebody adds a pattern that matches any of them, it will be detected early and rejected.
Likewise, if a popular browser starts sending a User Agent that accidentally matches one of the patterns, the test will fail.
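
A rough sketch of what such a test could look like, assuming the patterns are regular expressions and the common User Agents are plain strings. The function names and the inline sample data are illustrative only; a real test would load both lists from their files instead.

```python
import re


def find_false_positives(patterns, browser_user_agents):
    """Return (user_agent, pattern) pairs where a browser UA matches a crawler pattern."""
    compiled = [(p, re.compile(p)) for p in patterns]
    hits = []
    for ua in browser_user_agents:
        for pattern, rx in compiled:
            if rx.search(ua):
                hits.append((ua, pattern))
    return hits


def test_no_false_positives():
    # Illustrative sample data (assumption), not the real lists.
    patterns = [r"Googlebot\/", r"Discordbot", r"bingbot"]
    browsers = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
        "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    ]
    # Any hit means a real browser would be treated as a crawler.
    assert find_false_positives(patterns, browsers) == []
```

Returning the offending pairs rather than a bare boolean means a failing run can show which pattern matched which browser, which should make an overly broad pattern easy to spot in review.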

monperrus (Owner) commented

Excellent idea! Looking forward to the PR.

starius added a commit to starius/crawler-user-agents that referenced this issue Apr 5, 2024
monperrus (Owner) commented

Closed by #348.
