Use ink bounding box
This patch changes the algorithm to avoid adding spacing pairs if glyphs
will collide by computing glyph bounding boxes.

This change includes the following changes:
* The language is now optional, because the ink bounding box can discover
  the language conventions used by the fonts.
* By default, specifying the language disables the automatic detection
  by ink bounding box.
* The dedicated Noto CJK support is removed, because the automatic
  detection can cover the Noto CJK fonts. Now the default builder can
  generate the same fonts as the previous dedicated builder.
* A few glyph pairs change for Meiryo. All changes are diagnosed and
  verified. Updated reference files.
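
The collision check described above can be sketched as follows. This is a minimal illustration, not the code in this patch: `InkBounds`, `can_trim_right_half`, and `can_trim_left_half` are hypothetical names, and the sketch assumes that a spacing pair removes half of a full-width advance from one side of a glyph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InkBounds:
    """Horizontal ink extent of a glyph in font units (hypothetical type)."""
    x_min: int
    x_max: int


def can_trim_right_half(ink: InkBounds, advance: int) -> bool:
    """True if the ink stays inside the left half of the advance."""
    return ink.x_max <= advance / 2


def can_trim_left_half(ink: InkBounds, advance: int) -> bool:
    """True if the ink stays inside the right half of the advance."""
    return ink.x_min >= advance / 2


# Assumed sample values for a 1000-unit em, not measured from any font.
thin_closing = InkBounds(x_min=60, x_max=420)
thick_closing = InkBounds(x_min=60, x_max=620)
```

With these sample values, `can_trim_right_half(thin_closing, 1000)` holds, while `thick_closing` fails the check, so under this sketch its pairs would be skipped.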

API changes:
* `Config.use_ink_bounds` is added to enable/disable the automatic
  collision detection. The default value is `True`.
* `Config.skip_monospace_ascii` is added. This switch skips fonts whose
  ASCII characters are monospace. The default value is `True`.
* `Config.fullwidth_space` is added, separated from `cjk_middle` because
  its glyph bounding boxes are different.
* `Config.cjk_column_semicolon` was a typo, fixed to
  `cjk_colon_semicolon`.
* `Config.is_colon_semicolon_middle` was removed.
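
The interaction between `language` and `use_ink_bounds` can be sketched with a stand-in class. `SketchConfig` is hypothetical and models only these two fields; the real `Config` has more state and uses its own `clone()` method, for which `copy.copy` is substituted here. The `for_language` logic follows the `config.py` diff in this commit.

```python
import copy


class SketchConfig:
    """Hypothetical stand-in for Config; models two fields only."""

    def __init__(self):
        self.use_ink_bounds = True  # default per this commit
        self.language = None

    def for_language(self, language):
        """Return a copy with the language set.

        Setting a language turns `use_ink_bounds` off.
        """
        if language == self.language:
            return self
        clone = copy.copy(self)  # stand-in for the real clone()
        clone.language = language
        clone.use_ink_bounds = not language
        return clone


default = SketchConfig()
japanese = default.for_language("JAN")
```

Here `default` keeps the automatic detection enabled, while `japanese.use_ink_bounds` is `False`, matching the second bullet of the change list above.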
kojiishi committed Jul 18, 2021
1 parent b1d3b29 commit bf68ee4
Showing 22 changed files with 453 additions and 382 deletions.
119 changes: 71 additions & 48 deletions README.md
@@ -60,7 +60,7 @@ pip install east-asian-spacing
### Clone and Install

If you may need to diagnose fonts or the code,
installing using [poetry] is recommended:
cloning and installing using [poetry] is recommended:
```sh
git clone https://github.com/kojiishi/east_asian_spacing
cd east_asian_spacing
@@ -84,7 +84,7 @@ You can run [unit tests] to verify your installation if needed.
[pipenv]: https://github.com/pypa/pipenv
[poetry]: https://github.com/python-poetry/poetry

## Adding the feature to your fonts
## Adding the features to your fonts

### Usage

@@ -95,26 +95,32 @@ east-asian-spacing -o build input-font-file
```
The `--help` option shows the full list of options.

### Languages
### Supported Fonts

Because the glyph for a code point may vary by languages,
different tables are desired for different languages.
Following fonts are tested on each release:
* [Noto CJK]
* Meiryo

In many cases, when the font supports multiple East Asian languages,
this tool can detect the languages automatically.
But it shows an error when it failed to detect.
You need to specify the [OpenType language system tag] of the font in that case.
Several more fonts were tested during the development.
The [algorithm] is generic and is applicable to any fonts.
The [chws_tool] package extends this package and
covers fonts at [fonts.google.com].

The following example specifies that the font is a Japanese font.
```sh
east-asian-spacing --language=JAN input-font-file
```
The [test HTML] is a handy tool to check the fonts you build.
If you encounter any problems with your fonts,
please report to [issues].

[OpenType language system tag]: https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags
Please also see the [Advanced Topics] below
if you want to customize the default behaviors of this package.

[chws_tool]: https://github.com/googlefonts/chws_tool
[fonts.google.com]: https://fonts.google.com
[issues]: https://github.com/kojiishi/east_asian_spacing/issues
[Noto CJK]: https://www.google.com/get/noto/help/cjk/

### TrueType Collection (TTC)

When the `input-font-file` is a TrueType Collection (TTC),
When the input font file is a TrueType Collection (TTC),
this tool adds the feature to all fonts in the TTC by default.

If you want to add the feature to only some of fonts in the TTC,
@@ -125,7 +131,45 @@ but not to other fonts in the TTC.
east-asian-spacing --index=0,1 input-font-file.ttc
```

The language option applies to all fonts in the TTC by default.
## Advanced Topics
[Advanced Topics]: #advanced-topics

### Algorithm
[Algorithm]: #algorithm

This package determines the glyph pairs to adjust spacings
by a pre-defined set of Unicode code points.

Then for each pair, it computes if the spacings are applicable
by examining glyph outlines and computing ink bounding boxes of glyphs.
For example, when glyphs are very thick,
glyphs may not have enough internal spacings,
and applying the spacings may cause glyphs to collide.
This package automatically detects such cases and
stops applying spacings to such pairs.

This automatic behavior can be disabled
by specifying the [languages] below,
or by setting `Config.use_ink_bounds` to `False` in your Python program.

### Languages
[languages]: #languages

Because the glyph for a code point may vary by languages,
different tables are desired for different languages.
This package determines such differences by examining glyph outlines
as described in the [algorithm] section above.

Instead of using glyph outlines,
you can specify the [OpenType language system tag] of the font.
The following example specifies that the font is a Japanese font,
and disables the automatic determination by glyph outlines.
```sh
east-asian-spacing --language=JAN input-font-file
```

For TrueType Collections (TTC),
the language option applies to all fonts in the TTC by default.
When you want to specify different languages to each font in the TTC,
it accepts a comma-separated list.
The following example specifies
@@ -145,27 +189,7 @@ Other fonts in the TTC are not changed.
east-asian-spacing --index=2,3 --language=JAN,ZHS input-font-file.ttc
```

### Noto CJK

For [Noto CJK] fonts,
this tool has a built-in support
to determine the font indices and the languages automatically.

When the first argument is `noto`, it
a) computes the appropriate language for each font, and
b) skips `Mono` fonts,
both determined by the font name.
```sh
east-asian-spacing noto NotoSansCJK.ttc
```
You can also run it for a directory to find all font files recursively.
```sh
east-asian-spacing noto ~/googlefonts/noto-cjk
```

[Noto CJK]: https://www.google.com/get/noto/help/cjk/

## Advanced Topics
[OpenType language system tag]: https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags

### Character-Pairs

@@ -174,23 +198,19 @@ in cases such as when
your fonts may not have expected spacings for some characters.
Currently, this is possible only from Python programs.

The [chws_tool] is a project that
extends the [`Config` class] to provide default languages and
customized character-pairs for fonts at [fonts.google.com].

For a simpler example, please see the `test_config` function
For a simple example, please see the `test_config` function
in [`tests/config_test.py`].

[chws_tool]: https://github.com/googlefonts/chws_tool
[fonts.google.com]: https://fonts.google.com
The [chws_tool] project is another example
of how to customize this package.

[`Config` class]: east_asian_spacing/config.py
[`tests/config_test.py`]: tests/config_test.py

### HarfBuzz

This package uses the [HarfBuzz] shaping engine
using a Cython bindings [uharfbuzz].
by using a Cython bindings [uharfbuzz].

If you want to use a specific build of the [HarfBuzz],
this tool can invoke the external [hb-shape] command line tool instead
@@ -218,14 +238,17 @@ Instructions for other platforms may be available at
## Testing

### Test HTML
[test HTML]: #test-html

A [test HTML] is available
to check the behavior of fonts on browsers.

It can test fonts you built locally.
Download it to your local drive and
add your fonts to the "`fonts`" list
at the beginning of the `<script>` block.
1. Save the page to your local drive.
The HTML is a single file, saving the HTML file should work.
2. Add your font files to the "`fonts`" list
at the beginning of the `<script>` block.
3. Open it in your browser and choose your font.

[test HTML]: https://kojiishi.github.io/chws/test.html

1 change: 0 additions & 1 deletion east_asian_spacing/__init__.py
@@ -3,7 +3,6 @@
from east_asian_spacing.dump import *
from east_asian_spacing.font import *
from east_asian_spacing.log_utils import *
from east_asian_spacing.noto_cjk_builder import *
from east_asian_spacing.shaper import *
from east_asian_spacing.spacing import *
from east_asian_spacing.tester import *
5 changes: 0 additions & 5 deletions east_asian_spacing/__main__.py
@@ -13,11 +13,6 @@ def main():
        asyncio.run(east_asian_spacing.Dump.main())
        return

    if sub_command == 'noto':
        del args[1]
        asyncio.run(east_asian_spacing.NotoCJKBuilder.main())
        return

    asyncio.run(east_asian_spacing.Builder.main())


36 changes: 23 additions & 13 deletions east_asian_spacing/builder.py
@@ -62,6 +62,20 @@ def calc_output_path(input_path, output_path, stem_suffix=None):
        return (output_path.parent /
                f'{output_path.stem}{stem_suffix}{output_path.suffix}')

    async def _config_for_font(self, font):
        config = self.config.for_font(font)
        if config is None:
            logger.info('Skipped by config: "%s"', font)
            return None
        if (config.skip_monospace_ascii
                and await EastAsianSpacing.is_monospace_ascii(font)):
            logger.info('Skipped because monospace: "%s"', font)
            return None
        if EastAsianSpacing.font_has_feature(font):
            logger.info('Skipped because the features exist: "%s"', font)
            return None
        return config

    async def build(self):
        font = self.font
        config = self.config
@@ -70,17 +84,15 @@ async def build(self):
            return await self.build_collection()

        assert not font.is_collection
        config = self.config.for_font(font)
        config = await self._config_for_font(font)
        if config is None:
            logger.info('Skipping by config: "%s"', font)
            return
        if EastAsianSpacing.font_has_feature(font):
            return
        spacing = EastAsianSpacing()
        await spacing.add_glyphs(font, config)
        if not spacing.can_add_to_font:
            logger.info('Skipping due to no pairs: "%s"', font)
            return
        logger.info('Adding features to: "%s" %s', font, spacing)
        spacing.add_to_font(font)
        self._spacings.append(spacing)

@@ -91,13 +103,9 @@ async def build_collection(self):
        # font, make sure we add the same data so that the new GPOS is also shared.
        spacing_by_offset = {}
        for font in self.font.fonts_in_collection:
            config = self.config.for_font(font)
            config = await self._config_for_font(font)
            if config is None:
                logger.info('Skipping by config: "%s"', font)
                continue
            if EastAsianSpacing.font_has_feature(font):
                logger.info('Feature already exists: "%s"', font)
                return
            reader_offset = font.reader_offset("GPOS")
            # If the font does not have `GPOS`, `reader_offset` is `None`.
            # Create a shared `GPOS` for all fonts in the case. e.g., BIZ-UD.
@@ -119,11 +127,11 @@ async def build_collection(self):
        built_fonts = []
        for spacing, fonts in spacing_by_offset.values():
            if not spacing.can_add_to_font:
                logger.info('Skipping due to no pairs: "%s"',
                logger.info('Skipping due to no pairs: %s',
                            list(font.font_index for font in fonts))
                continue
            logger.info('Adding feature to: %s',
                        list(font.font_index for font in fonts))
            logger.info('Adding features to: %s %s',
                        list(font.font_index for font in fonts), spacing)
            for font in fonts:
                spacing.add_to_font(font)
            self._spacings.append(spacing)
@@ -230,6 +238,8 @@ async def main():
                            default=0)
        args = parser.parse_args()
        init_logging(args.verbose, main=logger)
        if args.glyph_out:
            args.glyph_out.mkdir(exist_ok=True, parents=True)
        if args.output:
            args.output.mkdir(exist_ok=True, parents=True)
        for input in Builder.expand_paths(args.inputs):
@@ -246,7 +256,7 @@ async def main():
            builder = Builder(font, config)
            await builder.build()
            if not builder.has_spacings:
                logger.warning('Skipped due to no changes: "%s"', input)
                logger.warning('Skipped saving due to no changes: "%s"', input)
                continue
            builder.save(args.output,
                         stem_suffix=args.suffix,
56 changes: 19 additions & 37 deletions east_asian_spacing/config.py
@@ -16,15 +16,20 @@ def __init__(self):
        }
        self.quotes_opening = {0x2018, 0x201C}
        self.quotes_closing = {0x2019, 0x201D}
        self.cjk_middle = {0x3000, 0x30FB}
        self.cjk_middle = {0x30FB}
        self.fullwidth_space = {0x3000}
        self.cjk_period_comma = {0x3001, 0x3002, 0xFF0C, 0xFF0E}
        self.cjk_column_semicolon = {0xFF1A, 0xFF1B}
        self.cjk_colon_semicolon = {0xFF1A, 0xFF1B}
        self.cjk_exclam_question = {0xFF01, 0xFF1F}

        # Skip adding the features to fonts with monospace ASCII
        # because they are generally for code.
        self.skip_monospace_ascii = True
        # Determines the applicability by computing ink bounds.
        self.use_ink_bounds = True
        # Specify which language behavior the font is.
        # Valid only when `use_ink_bounds` is `False`.
        self.language = None
        # These code points are on the left-half of the glyph spaces only in
        # ZHS fonts, though not all ZHS fonts follow the convention.
        # Setting to `True` or `False` disables the heuristic detection.
        self.is_colon_semicolon_middle = None

    default = None # This will be set later in this file.

@@ -39,8 +44,9 @@ def _sets(self):
        yield self.quotes_opening
        yield self.quotes_closing
        yield self.cjk_middle
        yield self.fullwidth_space
        yield self.cjk_period_comma
        yield self.cjk_column_semicolon
        yield self.cjk_colon_semicolon
        yield self.cjk_exclam_question

    def clear(self):
@@ -67,11 +73,14 @@ def for_font_name(self, name, is_vertical):
        return self

    def for_language(self, language):
        """Returns a copy with the specified language."""
        """Returns a copy with the specified language.
        This also sets `use_ink_bounds` to `False`."""
        if language == self.language:
            return self
        clone = self.clone()
        clone.language = language
        clone.use_ink_bounds = not language
        return clone

    def for_smoke_testing(self):
@@ -102,34 +111,7 @@ def _down_sample_to(input, max):
        return set(itertools.islice(input, 0, None, interval))


class DefaultConfig(Config):
    def for_font_name(self, name, is_vertical):
        if name.startswith("Meiryo"):
            config = self.for_language('JAN')
            if is_vertical:
                config = config.clone_if_is(self)
                config.change_quotes_closing_to_opening(0x2019)
                config.remove(0xFF0C, 0xFF0E)
            return config
        if name.startswith("Microsoft JhengHei"):
            config = self.for_language('ZHT')
            config = config.clone_if_is(self)
            config.remove(0xFF08, 0xFF09, 0xFF3B, 0xFF3D, 0xFF5B, 0xFF5D,
                          0xFF5F, 0xFF60)
            if is_vertical:
                config.change_quotes_closing_to_opening(0x2019, 0x201D)
            return config
        if name.startswith("Microsoft YaHei"):
            config = self.for_language('ZHS')
            if is_vertical:
                config = config.clone_if_is(self)
                config.remove(0x3001, 0x3002, 0x3018, 0x3019, 0x301A, 0x301B,
                              0xFF08, 0xFF09, 0xFF0C, 0xFF0E)
            return config
        return self


class CollectionConfig(DefaultConfig):
class CollectionConfig(Config):
    def __init__(self, font, languages=None, indices=None):
        assert font.is_collection
        super().__init__()
@@ -164,4 +146,4 @@ def _calc_indices_and_languages(num_fonts, indices, languages):
        return itertools.zip_longest(indices, ())


Config.default = DefaultConfig()
Config.default = Config()