Allow leading spaces in the stopwords configuration (#373)
The stopwords.txt format does not allow leading spaces, but the registry
pads the value with them because of the way we define it in the XML.
reebalazs authored Feb 6, 2024
1 parent adf4fcc commit 9c9b251
Showing 3 changed files with 18 additions and 2 deletions.
2 changes: 1 addition & 1 deletion CHANGES.rst
@@ -4,7 +4,7 @@ Changelog
9.2.2 (unreleased)
------------------

-- Nothing changed yet.
+- Allow leading spaces in the stopwords configuration [reebalazs]


9.2.1 (2024-02-01)
2 changes: 1 addition & 1 deletion src/collective/solr/stopword.py
@@ -2,7 +2,7 @@

from collective.solr.utils import getConfig

-reLine = re.compile(r"^([A-Za-zÀ-ÖØ-öø-ÿ]*)")
+reLine = re.compile(r"^\s*([A-Za-zÀ-ÖØ-öø-ÿ]*)")

raw = None
raw_case_insensitive = None
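
The effect of the pattern change can be checked in isolation. The sketch below copies both patterns from this hunk; the names `OLD_LINE`, `NEW_LINE`, and `first_word` are hypothetical and exist only for illustration, and the four-space padding is an assumed example value.

```python
import re

# Patterns copied from the hunk above: before and after the change.
OLD_LINE = re.compile(r"^([A-Za-zÀ-ÖØ-öø-ÿ]*)")
NEW_LINE = re.compile(r"^\s*([A-Za-zÀ-ÖØ-öø-ÿ]*)")


def first_word(pattern, line):
    """Return the word captured at the start of ``line`` ("" if none).

    Both patterns can match the empty string, so ``match`` never
    returns None here.
    """
    return pattern.match(line).group(1)


padded = "    stopone"  # assumed padding, as a registry value might carry

old_result = first_word(OLD_LINE, padded)     # old pattern stops at the first space
new_result = first_word(NEW_LINE, padded)     # new pattern skips the padding
blank_result = first_word(NEW_LINE, "   \n")  # whitespace-only line captures nothing

print(repr(old_result), repr(new_result), repr(blank_result))
```

With the old pattern the capture group is empty for a padded line, so the word would be lost; with `\s*` in front, the padding is consumed before the capture group starts.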
16 changes: 16 additions & 0 deletions src/collective/solr/tests/test_stopwords.py
@@ -130,3 +130,19 @@ def testComments(self):
self.assertTrue(isStopWord("stopone", self.config))
self.assertTrue(isStopWord("stoptwo", self.config))
self.assertTrue(isStopWord("stopthree", self.config))

def testLeadingSpaces(self):
# stopwords.txt does not allow leading spaces, but the registry
# pads it up because of the way we define it in the xml.
self.config.stopwords = (
"""
stopone
stoptwo
"""
+ " \n"
)
self.assertFalse(isStopWord("", self.config))
self.assertFalse(isStopWord(" ", self.config))
self.assertFalse(isStopWord(" ", self.config))
self.assertTrue(isStopWord("stopone", self.config))
self.assertTrue(isStopWord("stoptwo", self.config))
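
The scenario this test covers can be reproduced without the package. The following is a rough sketch, not the actual `collective.solr` implementation: `parse_stopwords` is a hypothetical helper, and the amount of padding in `padded_config` is assumed, since this page does not preserve the original whitespace.

```python
import re

# New pattern from src/collective/solr/stopword.py in this commit.
LINE = re.compile(r"^\s*([A-Za-zÀ-ÖØ-öø-ÿ]*)")


def parse_stopwords(text):
    """Hypothetical helper: collect the leading word of every line.

    Lines that yield no word (blank or whitespace-only) are skipped,
    so they never turn into an empty stopword.
    """
    words = set()
    for line in text.splitlines():
        word = LINE.match(line).group(1)
        if word:
            words.add(word)
    return words


# Assumed example of a registry value padded by XML indentation.
padded_config = "\n    stopone\n    stoptwo\n" + "   \n"
stopwords = parse_stopwords(padded_config)

print(sorted(stopwords))
```

Under these assumptions, both padded words are recognised, while the empty string and whitespace-only strings are not treated as stopwords, which mirrors the assertions in `testLeadingSpaces`.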
