Prevent wildcard expressions for stopwords in simple expressions #368

Merged
tisto merged 1 commit into main from ree-fix-stopwords-with-wildcards on Jan 25, 2024

reebalazs
Member

  • add stopwords to registry
  • support function for getting the stopwords
  • cache to optimize the parsing of stopwords.txt

This rewrites the (term AND term*) expression for stopwords by removing the wildcard clause. Such an expression would never match any document: Solr does not remove the wildcard term, but the stopword itself is missing from the index. The workaround has no side effects, as Solr would ignore the stopword anyway.
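A minimal sketch of the rewrite described above, under stated assumptions: `build_simple_expression` and the in-memory `stopwords` set are hypothetical names used for illustration only, not the functions added by this pull request, and the real expression template may carry field names and boosts that are omitted here.

```python
def build_simple_expression(term, stopwords):
    """Build the query clause for a single search term (hypothetical helper).

    For a regular term, the clause requires both the exact term and its
    wildcard form. For a stopword, only the bare term is emitted, so that
    Solr can discard it during analysis instead of being left with a
    wildcard clause that can never match.
    """
    if term in stopwords:
        # Stopword: no wildcard clause, Solr ignores the bare term anyway.
        return term
    return "({0} AND {0}*)".format(term)


stopwords = {"the", "and"}
clauses = [build_simple_expression(t, stopwords) for t in ["the", "plone"]]
```

The key design point is that the stopword itself is kept in the query rather than dropped by the client, which leaves the final decision about ignoring it to Solr.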

@reebalazs reebalazs force-pushed the ree-fix-stopwords-with-wildcards branch 3 times, most recently from 84bcb30 to 568da9a Compare December 28, 2023 15:46
@reebalazs reebalazs changed the title from "WIP Prevent wildcard expressions for stopwords in simple expressions" to "Prevent wildcard expressions for stopwords in simple expressions" Dec 28, 2023
- add stopwords to registry
- add stopwords_case_insensitive option
- support function for getting the stopwords
- cache to optimize the parsing of stopwords.txt

This rewrites the (term AND term*) expression for stopwords by removing
the wildcard clause. Such an expression would never match any document:
Solr does not remove the wildcard term, but the stopword itself is
missing from the index. The workaround has no side effects, as Solr
would ignore the stopword anyway.

Both case-sensitive and case-insensitive stopword processing are
supported. Which one applies depends on the Solr schema, so the
stopwords_case_insensitive option must be set to match it.
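The cached parsing and the case option from the commit message could look roughly like the sketch below. It is an illustration under assumptions, not the code merged here: `parse_stopwords` and `is_stopword` are hypothetical names, the stopwords.txt content is passed in as a string to keep the example self-contained, and the file is assumed to use the plain one-word-per-line format with `#` comment lines.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_stopwords(text, case_insensitive=False):
    """Parse stopwords.txt content into a frozenset (hypothetical helper).

    The result is cached per (text, case_insensitive) pair, so repeated
    queries do not re-parse the same file content.
    """
    words = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "#" comment lines.
        if not line or line.startswith("#"):
            continue
        words.add(line.lower() if case_insensitive else line)
    return frozenset(words)


def is_stopword(term, text, case_insensitive=False):
    """Check a term against the parsed stopwords, honoring the case option."""
    stopwords = parse_stopwords(text, case_insensitive)
    return (term.lower() if case_insensitive else term) in stopwords


STOPWORDS_TXT = "# sample stopwords\nthe\nAnd\n"
exact = is_stopword("The", STOPWORDS_TXT)
folded = is_stopword("The", STOPWORDS_TXT, case_insensitive=True)
```

With case-sensitive matching, "The" is not treated as a stopword because only "the" is listed; with the case-insensitive flag it is. The flag should mirror how the stop filter is configured in the Solr schema.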
@reebalazs reebalazs force-pushed the ree-fix-stopwords-with-wildcards branch from 568da9a to 097a3bf Compare December 28, 2023 16:12
@tisto tisto merged commit b6c013d into main Jan 25, 2024
14 checks passed
@tisto tisto deleted the ree-fix-stopwords-with-wildcards branch January 25, 2024 07:31
@tisto
Member

tisto commented Jan 31, 2024

@reebalazs released with c.solr 9.2.0.
