unicodedata-reader

This package reads and parses the Unicode Character Database files.

Many of the properties are available in Python's standard unicodedata module or in third-party modules. When the data you need is not in any existing module, such as the Line_Break property or the Vertical_Orientation property, this package can read the data files at https://www.unicode.org/Public/UNIDATA/.
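To make the data-file format concrete, here is a minimal sketch of a parser for the common layout of these files: a code point or a `first..last` range, a semicolon, the property value, and an optional `#` comment. It is only an illustration and not this package's implementation; the `parse_ucd_lines` helper and the three-line `SAMPLE` excerpt are hypothetical stand-ins for a real file such as LineBreak.txt.

```python
from typing import Dict, Iterable

# A tiny hand-written excerpt in the style of LineBreak.txt (assumption:
# not the real file, just three representative entries).
SAMPLE = """\
# code point or range ; value # comment
0028;OP # Ps       LEFT PARENTHESIS
0030..0039;NU # Nd  [10] DIGIT ZERO..DIGIT NINE
0041..005A;AL # Lu  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
"""


def parse_ucd_lines(lines: Iterable[str]) -> Dict[int, str]:
    """Map each listed code point to its property value (hypothetical helper)."""
    values: Dict[int, str] = {}
    for line in lines:
        # Drop the trailing comment, then skip blank and comment-only lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        code_range, value = fields[0], fields[1]
        # "0041..005A" is a range; "0028" is a single code point.
        first, _, last = code_range.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for code_point in range(start, end + 1):
            values[code_point] = value
    return values


table = parse_ucd_lines(SAMPLE.splitlines())
print(table[0x41])  # AL
```

Expanding every range into a dictionary keeps the sketch short. A real reader would more likely store the ranges themselves, because some ranges in the data files cover tens of thousands of code points.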

This package can also generate JavaScript functions that can read the property values of the Unicode Character Database in browsers. Please see the JavaScript section below.

Install

pip install unicodedata-reader

If you want to clone and install using poetry:

git clone https://github.com/kojiishi/unicodedata-reader
cd unicodedata-reader
poetry install
poetry shell

Python

import unicodedata_reader

reader = unicodedata_reader.UnicodeDataReader.default
lb = reader.line_break()
print(lb[0x41])

The example above prints AL, the Line_Break property value for U+0041. See line_break_test.py for more usage examples.
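Building on that lookup, the sketch below labels every character of a string with its Line_Break value. It relies only on the indexing by code point shown above. The `line_break_classes` helper is hypothetical, and a small hand-written dictionary stands in for the real `lb` object so that the snippet runs without the package or network access.

```python
from typing import List, Mapping, Tuple


def line_break_classes(text: str, lb: Mapping[int, str]) -> List[Tuple[str, str]]:
    """Pair each character with its Line_Break value (hypothetical helper)."""
    # Values are looked up by integer code point, as in lb[0x41].
    return [(ch, lb[ord(ch)]) for ch in text]


# With the package installed, `lb` would come from the example above:
#   import unicodedata_reader
#   lb = unicodedata_reader.UnicodeDataReader.default.line_break()
# Assumption: this stub dictionary only imitates that object for three code points.
stub_lb = {0x41: "AL", 0x28: "OP", 0x31: "NU"}

print(line_break_classes("A(1", stub_lb))
```

Passing the lookup object as a parameter keeps the helper independent of how the data was loaded, which also makes it easy to test with a stub.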

JavaScript

The UnicodeDataCompressor class in this package can generate JavaScript functions that can read the property values of the Unicode Character Database in browsers.

Examples are available in the "js" directory.

The following command generates a JavaScript file for the Line_Break property using js/template.js as the template file:

unicodedata-reader lb -t js/template.js