Besides segmentation rules and patterns, SILE should likely implement per-language defaults for the left and right hyphenation minima:
In Babel, some languages are at (2, 2) (e.g. Finnish), most at (2, 3), some at (1, 1), etc.
Appropriate values could possibly be derived from Babel's ini files for all supported languages.
These values are sometimes dubious, though: for Georgian, the patterns were generated for (1, 2) according to their comments, but Babel uses (2, 2) regardless, while Typst (see below) seems to use (1, 2).
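As a rough illustration of what such per-language defaults could look like, here is a standalone Lua sketch. The table `default_minima` and the function `minima_for` are hypothetical names, not part of SILE's API, and the values are only the ones quoted in this issue (the English value being the one Babel uses for en-GB).

```lua
-- Hypothetical lookup table, NOT SILE's actual API.
-- Values are the ones quoted in this issue, taken from Babel.
local default_minima = {
  en = { leftmin = 2, rightmin = 3 }, -- Babel (en-GB)
  fi = { leftmin = 2, rightmin = 2 }, -- Babel (Finnish)
  ka = { leftmin = 2, rightmin = 2 }, -- Babel (Georgian); pattern comments suggest (1, 2)
}

-- Fallback mirrors the (2, 2) that the Liang hyphenator currently uses.
local fallback = { leftmin = 2, rightmin = 2 }

-- Return leftmin, rightmin for a language code, or the fallback if unknown.
local function minima_for(lang)
  local m = default_minima[lang] or fallback
  return m.leftmin, m.rightmin
end

print(minima_for("en"))
print(minima_for("xx")) -- unknown code, uses the fallback
```

Keeping the fallback at (2, 2) would leave languages without an entry behaving as they do today.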
Workaround
(Not a general solution)
\lua{
-- Run this after switching to English, i.e. once the "en" hyphenator has been instantiated
SILE._hyphenators['en'].rightmin = 3
}
Further thought
This was probably overlooked (due to other issues), but (language-specific / custom) left/right hyphen minima were actually mentioned in an existing issue (now closed): Justification for Indic scripts (Malayalam) #308, with rather extreme values in the LaTeX example (3, 5).
While at it, the current Knuth-Plass line breaker uses a single hyphenPenalty (probably as TeX does), but we could use variable penalties depending on the lengths of the initial and final segments. That is to say, rather than just lagging behind LaTeX (and/or TeX, as we currently do), we could actually improve on them.
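One possible shape for such a variable penalty, as a standalone Lua sketch: the function `hyphen_penalty`, its parameters, and all the numbers below are assumptions made up for illustration, not anything SILE or TeX implements. It adds a surcharge to the base penalty for each character by which the shorter segment falls below a "comfortable" length.

```lua
-- Hypothetical penalty function, NOT part of SILE's line breaker.
-- base:        the single hyphen penalty used today
-- left_len:    characters before the break
-- right_len:   characters after the break
-- comfortable: segment length at or above which no surcharge applies
-- surcharge:   extra penalty per missing character
local function hyphen_penalty(base, left_len, right_len, comfortable, surcharge)
  local shortest = math.min(left_len, right_len)
  local deficit = math.max(0, comfortable - shortest)
  return base + deficit * surcharge
end

-- Arbitrary illustrative numbers:
print(hyphen_penalty(50, 2, 6, 4, 25)) -- short initial segment, penalised more
print(hyphen_penalty(50, 5, 6, 4, 25)) -- both segments comfortable, base penalty only
```

With a scheme like this, the minima remain hard limits, while breaks just above them become merely less attractive rather than forbidden.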
Linking to #1994 and #1631 -- I do think this should be part of the same "language refactoring".
Perhaps we should have these in a dedicated "project"?
And yes, the language-code-related issues are so intertwined that they are hard to track and work on. It's hard to sit down and get my head around the problem, or to know when an individual issue is actionable. Grouping them all in a "project" sounds like a good idea.
Issue
The Knuth-Liang hyphenation always defaults to (2, 2) for left hyphen and right hyphen minima, upon initialization:
sile/core/hyphenator-liang.lua, line 105 in b2cc084
These are quite sane defaults for the algorithm... but most languages would beg to differ and use different values...
For instance, English would likely prefer (2, 3), as Babel (LaTeX) implements it:
https://github.com/latex3/babel/blob/d4d55826cd264220b7a8d92b453748564affea54/locale/en/babel-en-GB.ini#L152-L153
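To make the effect of the two minima concrete, here is a small standalone Lua sketch. The helper `filter_breaks` is hypothetical and is not how SILE's hyphenator is written, and the candidate break position is supplied by hand rather than computed from patterns. It simply drops candidate breaks that would leave fewer than `leftmin` characters before the hyphen or fewer than `rightmin` after it.

```lua
-- Hypothetical helper, NOT SILE's code.
-- A break position p means "break after the p-th character".
local function filter_breaks(word, breaks, leftmin, rightmin)
  local len = #word -- byte length: fine for ASCII, real code must count characters
  local kept = {}
  for _, p in ipairs(breaks) do
    if p >= leftmin and (len - p) >= rightmin then
      kept[#kept + 1] = p
    end
  end
  return kept
end

-- Hand-picked candidate: "calm-ly" (break after 4 characters, 2 remain).
print(#filter_breaks("calmly", { 4 }, 2, 2)) -- break allowed with (2, 2)
print(#filter_breaks("calmly", { 4 }, 2, 3)) -- break rejected with (2, 3)
```

The same word thus gains or loses a hyphenation point depending only on the minima, which is why a single (2, 2) default cannot suit every language.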
AFAIK, Typst (hypher) seems to implement these left/right minima per language (in one big file):
https://github.com/typst/hypher/blob/6b40344866f2d7b2e156db93e91cf105cb75f7a2/src/lang.rs#L201-L205C1.