Strange effect of excluding stop words #101

Open
charlesreid1 opened this issue Aug 27, 2018 · 0 comments

charlesreid1 commented Aug 27, 2018

We are excluding stop words using a stop word filter right after we apply the stemming analyzer.

As a result, the items in the index no longer contain any stop words.
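A toy model of the indexing step described above may help. This is a minimal sketch, not the project's analyzer: the `STOP_WORDS` set and the `analyze` helper are hypothetical, and stemming is omitted so that only the stop-word removal is visible.

```python
import re
from typing import List

# Hypothetical stop-word list for illustration; the real filter's list may differ.
STOP_WORDS = frozenset({"a", "an", "and", "if", "in", "is", "of", "the", "to", "you"})


def analyze(text: str) -> List[str]:
    """Toy analyzer: lowercase, split on non-word characters, drop stop words.

    Stemming is deliberately left out to keep the example small.
    """
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


sentence = ("Question: If you want to know where someone is "
            "in the process of onboarding/whitelisting?")
indexed = analyze(sentence)
print(indexed)
# → ['question', 'want', 'know', 'where', 'someone', 'process', 'onboarding', 'whitelisting']
```

The word "is" never reaches the index, which is the root of the search problem below.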

That can be problematic, as illustrated by the following search:

  • We're trying to find the document "K6 Standardization and Whitelist".

  • This document contains the sentence "Question: If you want to know where someone is in the process of onboarding/whitelisting?" and we want to find it by searching for that phrase.

  • The first approach is to search for part of the phrase without quotes, e.g. where someone is. This search returns nothing. The reason: the stop word "is" was removed from that item in the search index, and an unquoted search for where someone is actually asks for documents that contain "where" and "someone" and "is". Since "is" never appears in the index, no document can match.

  • The second approach, which works, is to put the phrase in quotes ("where someone is"); that search returns the document as expected.
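The two searches above can be reproduced with a toy in-memory index. This is a sketch under stated assumptions, not the search library's actual query parser: `INDEX`, `and_search`, and `phrase_search` are hypothetical, unquoted terms are assumed to skip the stop-word filter, and quoted phrases are assumed to pass through the same analyzer as the documents (one plausible reason the quoted search works).

```python
import re
from typing import Dict, List

# Hypothetical stop-word list, same idea as the index-time filter.
STOP_WORDS = frozenset({"a", "an", "and", "if", "in", "is", "of", "the", "to", "you"})


def analyze(text: str) -> List[str]:
    """Toy analyzer: lowercase, tokenize, drop stop words (no stemming)."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


# Toy index: document title -> analyzed token sequence (stop words already gone).
INDEX: Dict[str, List[str]] = {
    "K6 Standardization and Whitelist": analyze(
        "Question: If you want to know where someone is "
        "in the process of onboarding/whitelisting?"
    ),
}


def and_search(query: str) -> List[str]:
    # Unquoted query: every raw term must be present; terms are NOT stop-filtered.
    terms = re.findall(r"\w+", query.lower())
    return [title for title, toks in INDEX.items() if all(t in toks for t in terms)]


def phrase_search(query: str) -> List[str]:
    # Quoted query (assumed behavior): analyze the phrase like the documents,
    # then look for the surviving tokens as one contiguous run.
    phrase = analyze(query)
    n = len(phrase)
    return [title for title, toks in INDEX.items()
            if any(toks[i:i + n] == phrase for i in range(len(toks) - n + 1))]


print(and_search("where someone is"))     # → []
print(phrase_search("where someone is"))  # → ['K6 Standardization and Whitelist']
```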

What we need to do: if we filter stop words out of the indexed contents, we also need to filter them out of search queries (unless the query is an exact phrase).
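One way to apply the proposed fix is to run unquoted queries through the same stop-word filter used at index time, while handing quoted phrases to the existing phrase path untouched. The sketch below reuses the hypothetical toy index from the previous examples; `search`, `analyze`, `INDEX`, and the simple "starts and ends with a double quote" phrase detection are all assumptions for illustration, not the project's code.

```python
import re
from typing import Dict, List

# Hypothetical stop-word list; must be the SAME list the indexer uses.
STOP_WORDS = frozenset({"a", "an", "and", "if", "in", "is", "of", "the", "to", "you"})


def analyze(text: str) -> List[str]:
    """Toy analyzer shared by indexing and querying (no stemming)."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


INDEX: Dict[str, List[str]] = {
    "K6 Standardization and Whitelist": analyze(
        "Question: If you want to know where someone is "
        "in the process of onboarding/whitelisting?"
    ),
}


def phrase_search(phrase_text: str) -> List[str]:
    # Stand-in for the existing phrase path, which the issue says already works.
    phrase = analyze(phrase_text)
    n = len(phrase)
    return [title for title, toks in INDEX.items()
            if any(toks[i:i + n] == phrase for i in range(len(toks) - n + 1))]


def search(query: str) -> List[str]:
    q = query.strip()
    # Exact phrase: leave it to the phrase path instead of our new filter.
    if len(q) >= 2 and q.startswith('"') and q.endswith('"'):
        return phrase_search(q[1:-1])
    # Unquoted: drop stop words from the query, matching what the index dropped.
    terms = analyze(q)
    if not terms:
        return []
    return [title for title, toks in INDEX.items() if all(t in toks for t in terms)]


print(search("where someone is"))    # → ['K6 Standardization and Whitelist']
print(search('"where someone is"'))  # → ['K6 Standardization and Whitelist']
```

The key design point is that index-time and query-time filtering share one stop-word list; if the two lists drift apart, the same mismatch reappears.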
