diff --git a/test/apostrophes.html b/test/apostrophes.html
index 4957b88d..5ffa4816 100644
--- a/test/apostrophes.html
+++ b/test/apostrophes.html
@@ -77,6 +77,10 @@ Martin Porter, "The apostrophe character," http://snowball.tartarus.org/texts/apostrophe.html
This additional paragraph contains a couple of other test items for the tokenizer, including a word containing a combining diaeresis (U+0308, for issue #164). For this we have invented a word which must use a combining diacritic rather than a precomposed character because the precomposed character does not exist; this helps to avoid inadvertent composition. The word puc̈ist is made up for this purpose, along with the word puq̈uist.
+We also add here a test for the character U+A78F, the Latin Sinological Dot, which
+ is supposed to be a letter character, so should not cause a word to be broken during
+ tokenization. This is the teꞏst.
+
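
Reviewer note: the behaviour this fixture is meant to exercise can be checked outside the project with a minimal Python sketch. It is not the project's tokenizer; `is_word_char` and `tokenize` are hypothetical helpers written only for illustration, and the rule that letters, combining marks, and numbers are word-internal is an assumption about the intended behaviour.

```python
import unicodedata


def is_word_char(ch):
    # Assumption: letters (L*), combining marks (M*), and numbers (N*)
    # are word-internal; everything else ends the current token.
    return unicodedata.category(ch)[0] in ("L", "M", "N")


def tokenize(text):
    # Hypothetical tokenizer: collect maximal runs of word characters.
    tokens, current = [], []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


# U+A78F (LATIN LETTER SINOLOGICAL DOT) and U+0308 (COMBINING DIAERESIS)
# are written as escapes so the test characters are unambiguous.
print(unicodedata.category("\ua78f"))
print(tokenize("This is the te\ua78fst."))
print(tokenize("puc\u0308ist"))
```

If the Unicode data available to the interpreter classifies U+A78F as a letter, the second call should keep `te\ua78fst` as a single token, which is the outcome the new fixture paragraph expects from the tokenizer under test.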