From 48e014b850aaf1edfa87cb861ba9c955d4501fc6 Mon Sep 17 00:00:00 2001
From: Olly Betts
Date: Thu, 19 Oct 2023 18:48:59 +1300
Subject: [PATCH] Add missing differences from Porter stemmer
---
algorithms/english/stemmer.tt | 36 ++++++++++++++++++++++++-----------
1 file changed, 25 insertions(+), 11 deletions(-)
diff --git a/algorithms/english/stemmer.tt b/algorithms/english/stemmer.tt
index 9eedab8..b50e69f 100644
--- a/algorithms/english/stemmer.tt
+++ b/algorithms/english/stemmer.tt
@@ -50,19 +50,33 @@ But it is hardly surprising that after twenty years of use of the Porter
stemmer, certain improvements did suggest themselves, and a new algorithm
for English is therefore offered here. (It could be called the ‘Porter2’
stemmer to distinguish it from the Porter stemmer, from which it derives.)
-The changes are not so very extensive: (1) terminating y is changed to
-i rather less often, (2) suffix us does not lose its s, (3) a
-few additional suffixes are included for removal, including (4) suffix
-ly. In addition, a small list of exceptional forms is included. In
-December 2001 there were two further adjustments: (5) Steps 5a and 5b
+The changes are not so very extensive:
+
+
+
+- [In C Porter stemmer but not in paper]
+Extra rule in Step 2: logi -> log
+- [In C Porter stemmer but not in paper]
+Step 2 rule: abli -> able replaced by bli -> ble
+- [In C Porter stemmer but not in paper]
+The algorithm leaves alone strings of length 2 (so
+as and is no longer lose s).
+- Terminating y is changed to i rather less often
+- Suffix us does not lose its s
+- A few additional suffixes are included for removal, including suffix
+ly
+- A small list of exceptional forms is included
+- [December 2001] Steps 5a and 5b
of the old Porter stemmer were combined into a single step. This means
-that undoubling final ll is not done with removal of final e. (6)
-In Step 3 ative is removed only when in region R2.
-(7)
-In July
-2005 a small adjustment was made (including a new step 0) to handle
+that undoubling final ll is not done with removal of final e.
+- [December 2001] In Step 3 ative is removed only when in region R2.
+- [May 2005] commun added to exceptional forms
+- [July 2005] A small adjustment was made (including a new step 0) to handle
apostrophe.
-
+- [January 2006] "Words" ied and ies now stem to ie rather than i.
+- [January 2006] The implementation was fixed to follow the algorithm as documented here and now always treats an initial y as a consonant.
+- [November 2006] arsen added to exceptional forms
+
To begin with, here is the basic algorithm without reference to the