This package reads and parses the Unicode Character Database files.
Many of the properties are available in Python's unicodedata module or in third-party modules. When the data you need is not in any existing module, such as the Line_Break property or the Vertical_Orientation property, this package can read the data files at https://www.unicode.org/Public/UNIDATA/.
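For comparison, here is a minimal sketch of what the standard library's unicodedata module already provides. The `describe` helper is a hypothetical name used only for illustration. It shows properties such as General_Category and East_Asian_Width; Line_Break and Vertical_Orientation have no equivalent function in that module.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few properties that the standard library exposes directly."""
    return {
        # The second argument is the fallback for characters without a name.
        "name": unicodedata.name(ch, "<unnamed>"),
        "general_category": unicodedata.category(ch),
        "east_asian_width": unicodedata.east_asian_width(ch),
    }


info = describe("A")
print(info["general_category"])  # Lu
```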
This package can also generate JavaScript functions that can read the property values of the Unicode Character Database in browsers. Please see the JavaScript section below.
pip install unicodedata-reader
If you want to clone and install using poetry:
git clone https://github.com/kojiishi/unicodedata-reader
cd unicodedata-reader
poetry install
poetry shell
import unicodedata_reader
reader = unicodedata_reader.UnicodeDataReader.default
lb = reader.line_break()
print(lb[0x41])
The example above prints AL, the Line_Break property value for U+0041.
Please also see line_break_test.py for more usage examples.
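To illustrate the kind of data being read, the sketch below parses a few lines in the semicolon-separated layout used by files such as LineBreak.txt, where each entry is a code point or a `first..last` range, followed by the value and an optional `#` comment. This is not the package's implementation. `parse_ucd_lines` and `SAMPLE` are hypothetical names, and the sample lines are a hand-written excerpt in the style of the real file.

```python
from typing import Dict, Iterable

# Hand-written excerpt in the style of LineBreak.txt (assumption: not the full file).
SAMPLE = """\
# Sample entries
0020;SP # Zs SPACE
0041..005A;AL # Lu [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
"""


def parse_ucd_lines(lines: Iterable[str]) -> Dict[int, str]:
    """Map each code point to its property value (hypothetical helper)."""
    values: Dict[int, str] = {}
    for line in lines:
        # Drop the trailing comment and skip blank or comment-only lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        code_range, value = fields[0], fields[1]
        # A single code point has no "..", so `last` is empty.
        first, _, last = code_range.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for code_point in range(start, end + 1):
            values[code_point] = value
    return values


table = parse_ucd_lines(SAMPLE.splitlines())
print(table[0x41])  # AL
```

Expanding every range into a dictionary keeps the sketch short; for whole-database use, storing the ranges themselves would need far less memory.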
The UnicodeDataCompressor class in this package can generate JavaScript functions that read the property values of the Unicode Character Database in browsers.
The following examples are available in the "js" directory:
- GeneralCategory.js is a generated JavaScript file for the Unicode General_Category property.
- LineBreak.js is a generated JavaScript file for the Unicode Line_Break property.
- LineBreak.html is an example usage of LineBreak.js.
The following command generates a JavaScript file for the Line_Break property using js/template.js as the template file:
unicodedata-reader lb -t js/template.js