Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

search, xkcd: built-in search plugin has external replacement #2613

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

dgw
Copy link
Member

@dgw dgw commented Aug 13, 2024

Description

The built-in search has been unreliable, often yielding no results because it probably gets blocked from scraping Bing or DuckDuckGo SERPs. (This mechanism isn't entirely clear, as it still does work from some machines—including places like Gitpod cloud development servers that presumably live in a higher-profile public cloud than any of the bots with problems.)

However, a near-scratch rewrite at sopel-irc/sopel-search using a library to access DDG seems to be significantly more reliable, and lets us drop xmltodict as a core dependency; search.py was the last place using it.

This patch also removes sopel.builtins.search.duck_search() as a fallback from the xkcd plugin, for better or worse putting all of its proverbial eggs in the searchxkcd.com basket. (I reworked the error-handling there a little, too.)

Checklist

  • I have read CONTRIBUTING.md
  • I can and do license this contribution under the EFLv2
  • No issues are reported by make qa (runs make lint and make test)
  • I have tested the functionality of the things this change touches

Notes

I see no problem with doing this in 8.1, but am open (as always) to other opinions 😸

@dgw dgw added Low Priority Housekeeping Code cleanup, removal of deprecated stuff, etc. labels Aug 13, 2024
@dgw dgw added this to the 8.1.0 milestone Aug 13, 2024
@dgw dgw changed the title search: remove built-in plugin with external replacement search, xkcd: built-in search plugin has external replacement Aug 13, 2024
@dgw dgw mentioned this pull request Aug 13, 2024
4 tasks
@dgw
Copy link
Member Author

dgw commented Aug 13, 2024

Staying in draft until type-checking errors are fixed, which should be accomplished by #2614.

@dgw dgw linked an issue Aug 13, 2024 that may be closed by this pull request
The built-in search plugin is pretty flaky, so it would be dubiously
useful at best even if searchxkcd stops working.

Plus, I'm working on replacing the built-in search plugin entirely with
a more reliable external plugin built on a real library (instead of
hacky regex-based web scraping).
Also lets us drop `xmltodict` as a core dependency; `search.py` was the
last place using it.
@dgw
Copy link
Member Author

dgw commented Aug 31, 2024

Rebased onto master after merging #2614. Finally ready to review 😁

@dgw dgw marked this pull request as ready for review August 31, 2024 20:55
@dgw dgw requested a review from a team August 31, 2024 20:55
Copy link
Contributor

@SnoopJ SnoopJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on mobile so my review is going to be lower quality, but this LGTM

"""Generic exception to raise if there was a problem contacting the API."""


class SearchFailedError(SearchXkcdError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: That seems a little specialized to me, I would probably have used the base exception to cover this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it needs a better name? The idea was to shadow requests.exceptions.ConnectionError with a plugin-specific error type—note where SearchFailedError is used below. It's an attempt to differentiate between errors of the "I talked to the API but couldn't understand what it said back" kind and more serious "the API might be gone???" issues.

I might look at this again when I go through to fix up things like your logging comment and reevaluate the exception hierarchy, even though you did say "LGTM".

LOGGER.debug("Data format from searchxkcd API could not be understood.")
return None
msg = "Data format from searchxkcd API could not be understood."
LOGGER.debug(msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I would have expected a warning here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fair I often err on the side of "don't spew stuff into logs if the user isn't actively debugging", but this one doesn't carry a big dump of stuff with it. What do you think of a warning with the user-friendly error message here and then a debug logging call with the API response?

return
except (ResponseFormatError, SearchFailedError, SearchXkcdError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Why catch the subclasses here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna say it's because I did this at some ungodly wee hour and forgot that they were subclasses despite just creating the new exception types mere moments earlier. Possibly also related to the fact that I started out creating an Enum type for the search function to return on errors, before deciding exceptions were better.

tl;dr: I dunno, weird brain stuff. 😹

sopel/builtins/xkcd.py Show resolved Hide resolved
Copy link
Member Author

@dgw dgw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love how SnoopJ makes an excuse (being on mobile) for the quality of his review, then makes more comments than most reviewers I work with. 😂

return
except (ResponseFormatError, SearchFailedError, SearchXkcdError):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna say it's because I did this at some ungodly wee hour and forgot that they were subclasses despite just creating the new exception types mere moments earlier. Possibly also related to the fact that I started out creating an Enum type for the search function to return on errors, before deciding exceptions were better.

tl;dr: I dunno, weird brain stuff. 😹

"""Generic exception to raise if there was a problem contacting the API."""


class SearchFailedError(SearchXkcdError):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it needs a better name? The idea was to shadow requests.exceptions.ConnectionError with a plugin-specific error type—note where SearchFailedError is used below. It's an attempt to differentiate between errors of the "I talked to the API but couldn't understand what it said back" kind and more serious "the API might be gone???" issues.

I might look at this again when I go through to fix up things like your logging comment and reevaluate the exception hierarchy, even though you did say "LGTM".

LOGGER.debug("Data format from searchxkcd API could not be understood.")
return None
msg = "Data format from searchxkcd API could not be understood."
LOGGER.debug(msg)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fair I often err on the side of "don't spew stuff into logs if the user isn't actively debugging", but this one doesn't carry a big dump of stuff with it. What do you think of a warning with the user-friendly error message here and then a debug logging call with the API response?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Housekeeping Code cleanup, removal of deprecated stuff, etc. Low Priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

search plugin should become external for 8.1
2 participants