TranslitKit is a framework for Hebrew-English transliteration.
gem install translit_kit
# in your Gemfile
gem 'translit_kit'
Requires Ruby 2.2 or later.
Basic transliteration
require 'translit_kit'
word = HebrewWord.new "אַברָהָם"
word.transliterate(:single)
# => ["avrohom"]
# Shortcut
word.t(:single)
# => ["avrohom"]
Transliteration is powered by phoneme maps: files that map Hebrew phonemes (units of sound) to English characters (see below).
Three phoneme maps are provided: :long, :short, and :single.
You can easily add your own (see below).
word.t(:single)
# => ["avrohom"]
word.t(:short)
# => ["avroom", "avroam", "avroem", "avrohom", "avroham",
# "avrohem", "avraom", "avraam", "avraem", "avrahom",
# "avraham", "avrahem", "avreom", "avream", "avreem",
# "avrehom", "avreham", "avrehem" ]
word.t(:long)
# => ["avroom", "avrooom", "avroohm", ... ] # 5,997 more!
The default is :short:
word.t == word.t(:short)
# => true
To get the total permutation count for each map, call HebrewWord#inspect:
word.inspect
# => "אַברָהָם: Permutations: 1 single | 18 short | 6000 long"
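The counts above are consistent with a simple rule: if each phoneme offers several replacements, the number of permutations is the product of the option counts. The sketch below is only an illustration of that arithmetic. The segment lists are inferred from the :short output shown earlier, not read from the gem's map files, and the gem's internal algorithm is assumed rather than confirmed.

```ruby
# Illustrative segments only: inferred from the :short output above,
# not taken from TranslitKit's actual phoneme maps.
segments = [
  ["av"],
  ["r"],
  ["o", "a", "e"],
  ["", "h"],
  ["o", "a", "e"],
  ["m"]
]

# The permutation count is the product of the number of options per segment.
count = segments.map(&:size).reduce(1, :*)

# Array#product builds every combination; join turns each one into a word.
candidates = segments.first.product(*segments.drop(1)).map(&:join)

puts count             # 18
puts candidates.first  # avroom
puts candidates.last   # avrehem
```

This also suggests why :long produces thousands of results: every extra replacement for a phoneme multiplies the total.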
Phoneme Maps are simply JSON files, placed in the lib/phoneme_maps directory.
The file should map each String (a phoneme) to an Array of replacement characters:
{
"ב": ["v"],
"בּ": ["b", "bb"]
}
A phoneme can be a Hebrew character (א), a nekuda (ָ), or a character with modifiers, such as a dagesh (בּ). Keep in mind that many characters will be normalized (see below).
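Before installing a custom map, it can help to confirm that the file has the expected shape. The following is a minimal sketch using Ruby's standard JSON library. The helper name valid_phoneme_map? is hypothetical and is not part of TranslitKit.

```ruby
require 'json'

# Hypothetical helper (not part of TranslitKit): returns true when every key
# is a String and every value is an Array of Strings.
def valid_phoneme_map?(map)
  return false unless map.is_a?(Hash)

  map.all? do |phoneme, replacements|
    phoneme.is_a?(String) &&
      replacements.is_a?(Array) &&
      replacements.all? { |r| r.is_a?(String) }
  end
end

# Sample contents mirroring the example map above.
raw = '{ "ב": ["v"], "בּ": ["b", "bb"] }'
phoneme_map = JSON.parse(raw)

puts valid_phoneme_map?(phoneme_map)        # true
puts valid_phoneme_map?({ "x" => "y" })     # false: value is not an Array
```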
To install your custom map, place the file in lib/resources.
Your file will be available as the symbol :<filename>, without the .json extension.
Example: klingon.json becomes :klingon.
Now you can use it anywhere:
word.transliterate(:klingon)
# => (Results)
At present, results for custom maps are not displayed by HebrewWord#inspect.
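The naming rule for custom maps (filename without the .json extension, as a symbol) can be sketched with Ruby's File.basename. The helper map_name_for is hypothetical and only illustrates the convention. How TranslitKit derives the symbol internally is not shown here.

```ruby
# Hypothetical helper: derive the map symbol from a JSON file path.
def map_name_for(path)
  File.basename(path, ".json").to_sym
end

puts map_name_for("lib/resources/klingon.json").inspect  # :klingon
```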
TranslitKit is currently maintained by @AnalyzePlatypus.
Contributions welcome!
When a word is transliterated, it is pre-processed to normalize certain characters. Specifically:
- Whitespace is stripped
- The final letters [םןךףץ] are normalized to their standard forms
- CHATAF nekudos ['ֲ','ֳ','ֱ'] are normalized to their standard forms
- Full CHIRIK, TZEIREI, and CHOLOM nekudos have their letters removed
- DAGESH characters are removed from all but the characters [בוכפת]
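Most of the rules above can be approximated in a few lines of Ruby. This is a rough sketch under several assumptions, not TranslitKit's actual normalizer: the helper name normalize_hebrew is hypothetical, "standard form" for a chataf nekuda is assumed to mean the matching plain vowel, a dagesh is assumed to directly follow its letter, and the full CHIRIK, TZEIREI, and CHOLOM rule is left out because its exact behavior is not specified here.

```ruby
# Hypothetical helper; the gem's real pre-processing may differ.
def normalize_hebrew(text)
  finals    = "\u05DD\u05DF\u05DA\u05E3\u05E5"  # final mem, nun, kaf, pe, tsadi
  standards = "\u05DE\u05E0\u05DB\u05E4\u05E6"  # mem, nun, kaf, pe, tsadi
  chatafs   = "\u05B1\u05B2\u05B3"              # chataf segol, patach, kamatz
  plains    = "\u05B6\u05B7\u05B8"              # segol, patach, kamatz (assumed targets)

  # A dagesh (U+05BC) that does not directly follow bet, vav, kaf, pe, or tav.
  stray_dagesh = /(?<![\u05D1\u05D5\u05DB\u05E4\u05EA])\u05BC/

  text.strip
      .tr(finals, standards)
      .tr(chatafs, plains)
      .gsub(stray_dagesh, "")
end

# Final mem becomes a regular mem, and surrounding whitespace is dropped.
puts normalize_hebrew(" \u05E9\u05DC\u05D5\u05DD ") == "\u05E9\u05DC\u05D5\u05DE"
```

Final letters are converted before the dagesh check, so a dagesh on a final kaf or pe is kept along with the normalized letter.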