chore: Make PyStemmer optional #309

Closed
wants to merge 4 commits into from

generall
Member

No description provided.

@generall generall requested a review from joein July 23, 2024 21:02
@joein
Member

joein commented Jul 26, 2024

PyStemmer is a wrapper around a C library, and it makes the tokenizer considerably faster.

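I ran the following benchmark with and without it (snowballstemmer uses the PyStemmer C extension when it is installed and falls back to its pure-Python stemmer otherwise):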

import time
from snowballstemmer import stemmer

s = stemmer('english')
text = "This stem form is often a word itself, but this is not always the case as this is not a requirement for text search systems, which are the intended field of use. We also aim to conflate words with the same meaning, rather than all words with a common linguistic root (so awe and awful don't have the same stem), and over-stemming is more problematic than under-stemming so we tend not to stem in cases that are hard to resolve. If you want to always reduce words to a root form and/or get a root form which is itself a word then Snowball's stemming algorithms likely aren't the right answer."
words = text.split()

loops = 1000
a = time.perf_counter()
for _ in range(loops):
    for word in words:
        stemmed = s.stemWord(word)
print(time.perf_counter() - a)

With PyStemmer (seconds):
0.0221869999950286
Without PyStemmer (seconds):
2.5555163340177387

The difference is significant: the run without PyStemmer is roughly 115 times slower. Instead of dropping it, we can make it an optional dependency that users install with pip install fastembed[pystemmer].
According to user reports, the failure on Windows happens during the installation itself, not at the dependency-resolution stage.
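
A minimal sketch of how an optional stemmer dependency could be handled at import time. It is an illustration only, not code from this PR: the helper name `load_stemmer`, the backend labels, and the identity fallback are hypothetical. It assumes PyStemmer exposes the `Stemmer` module and that snowballstemmer provides the pure-Python fallback.

```python
import importlib


def load_stemmer(language="english"):
    """Return (stem_word, backend_name), preferring the C-backed PyStemmer.

    Hypothetical helper for illustration; not part of fastembed.
    """
    try:
        # PyStemmer installs a C extension module named "Stemmer".
        pystemmer = importlib.import_module("Stemmer")
        return pystemmer.Stemmer(language).stemWord, "pystemmer"
    except ImportError:
        pass

    try:
        # Pure-Python fallback: slower, but needs no compiler to install.
        snowball = importlib.import_module("snowballstemmer")
        return snowball.stemmer(language).stemWord, "snowballstemmer"
    except ImportError:
        # Assumption for this sketch: with no stemmer installed,
        # return words unchanged instead of failing at import time.
        return (lambda word: word), "none"


stem_word, backend = load_stemmer("english")
print(backend, [stem_word(w) for w in ["running", "stemming", "words"]])
```

Making the backend name explicit lets the caller log or warn when the slow path is in use. Note that `snowballstemmer.stemmer()` already picks PyStemmer when it is importable, so the first branch mainly serves to report which backend is active.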

@Anush008 Anush008 linked an issue Aug 7, 2024 that may be closed by this pull request
@Anush008 Anush008 changed the title from "remove PyStemmer and see what happens" to "chore: Make PyStemmer optional" Aug 7, 2024
@Anush008
Member

Anush008 commented Aug 7, 2024

@joein. Review please. Several people are running into this issue.

@bendominguez0111

Hello, any update on this? Our team is running into issues with PyStemmer, and we'd like it to be optional as well.

@satyaloka93

I would also like this dependency to be optional; I'm having issues building PyStemmer.

@sadaisystems

Same issue here. WSL2. Python 3.12.5. Unable to install the package.

@satyaloka93

I've been installing version 0.2.7 first, which has basically the same dependencies as the newest version minus PyStemmer. Then I install the new version with --no-deps to avoid that package. It's been working fine. Please remove that one requirement and make it optional!

@Anush008
Member

@Anush008 Anush008 closed this Oct 22, 2024

Successfully merging this pull request may close these issues.

Not able to install fastembed in windows machine.
6 participants