Introduction
------------------------------
Specification:
Text characters are normalized with Python's unicodedata module.
Full-width digits and alphabetic characters are converted to their half-width equivalents.
Half-width Katakana is converted to its full-width equivalent.
As a result, all of the character variants above are treated as the same characters.
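
The normalization described above can be sketched as follows. This is a minimal illustration, not the product's actual code: it assumes the NFKC normalization form, which this README does not name, and ``normalize_text`` is a hypothetical helper name.

```python
import unicodedata


def normalize_text(text):
    """Fold width variants into one form.

    Assumption: NFKC is used here because it maps full-width
    alphanumerics to half-width and half-width Katakana to full-width.
    The README only says that unicodedata is used.
    """
    return unicodedata.normalize("NFKC", text)


fullwidth = "ＡＢＣ１２３"   # full-width letters and digits
halfwidth_kana = "ｶﾀｶﾅ"      # half-width Katakana

print(normalize_text(fullwidth))       # ABC123
print(normalize_text(halfwidth_kana))  # カタカナ
```

After this step a query typed with half-width Katakana matches text stored with full-width Katakana, and the same holds for full-width and half-width alphanumerics.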
Language Specifications:

- Chinese

  - No spaces between words.
  - Uses only Kanji (Chinese) characters.
  - Processed with a bigram (2-gram) model.

- Japanese

  - No spaces between words.
  - Combination of Kanji (Chinese), Katakana, and Hiragana characters.

- Korean

  - Words are separated by spaces, but a word may include a particle.
  - Combination of the Korean alphabet and Kanji (Chinese) characters.
  - Korean alphabet and Kanji (Chinese) characters are distinguished from each other and processed with a bigram (2-gram) model.

- Thai

  - No spaces between words.
  - This language is very difficult to process by computer.
  - Vowels and consonants are encoded as separate Unicode characters, which makes it difficult to recognize them as one word.
  - However, it may be possible to handle Thai characters with a bigram (2-gram) model.

- Other languages (including English)

  - Words are separated by spaces.
  - Each word is indexed.

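
The splitting rules listed above can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: the Unicode ranges are an approximation chosen for the example, ``bigrams`` and ``split_text`` are hypothetical names, and Thai is left out.

```python
import re

# Assumed ranges for this sketch only.
HANGUL = "\uac00-\ud7a3"                  # Hangul syllables
KANA_KANJI = "\u3040-\u30ff\u4e00-\u9fff"  # Hiragana, Katakana, CJK ideographs

# Three alternatives: a Hangul run, a Kana/Kanji run,
# or a run of any other word characters.
TOKEN = re.compile(
    "([{h}]+)|([{k}]+)|([^\\W{h}{k}]+)".format(h=HANGUL, k=KANA_KANJI)
)


def bigrams(run):
    """Return overlapping 2-character slices of a run."""
    if len(run) < 2:
        return [run]
    return [run[i:i + 2] for i in range(len(run) - 1)]


def split_text(text):
    """Index space-separated words whole and CJK runs as bigrams."""
    terms = []
    for match in TOKEN.finditer(text):
        hangul, kana_kanji, word = match.groups()
        if word is not None:
            terms.append(word)
        else:
            terms.extend(bigrams(hangul or kana_kanji))
    return terms


print(split_text("Plone 日本語検索"))
# ['Plone', '日本', '本語', '語検', '検索']
```

Because every adjacent pair of characters becomes a term, a query for 検索 can match inside 日本語検索 even though the text has no spaces to mark word boundaries.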
Notes:

- Source Code

  Since no documentation is available on how to develop a word splitter, we referred to the source code of other splitters. We still have a number of questions. If you have any more information, please feel free to let us know.

- Hotfix to Plone 3.0 source code

  Because the Plone 3.x catalog setting, catalog.xml, has no mechanism for overwriting an existing index, we developed a hotfix that adds an XML attribute. We took this approach because we believe the Plone 3 XML definition mechanism is simple and clear. We appreciate any comments.

Installation
-----------------
Use zc.buildout
===============
- Add ``Products.BigramSplitter`` to the list of eggs to install, e.g.::

    [buildout]
    ...
    eggs =
        ...
        Products.BigramSplitter

- Tell the plone.recipe.zope2instance recipe to install a ZCML slug::

    [instance]
    recipe = plone.recipe.zope2instance
    ...
    zcml =
        Products.BigramSplitter

- Re-run buildout, e.g. with::

    $ ./bin/buildout

- Restart Zope
- In the Plone settings, open Add-on Products and install the product with Quick Install.

Old Style
=========
- Untar the downloaded file, then copy it to the 'Products' directory of your Plone instance.
- Restart Zope
- In the Plone settings, open Add-on Products and install the product with Quick Install.

Required
--------
- Plone 3.0.x or higher

License
--------
- See docs/LICENSE.txt
Author
------
- CMScom http://www.cmscom.jp/
- Manabu Terada e-mail : [email protected]
- Mikio Hokari
- Naoki Nakanishi
- Naotaka Hotta
- Takashi Nagai