BCP47<->babel/polyglossia interface #961
I'll raise with the team: I'd like to see this handled at the |
@josephwright - that would be ideal and if there is any way to help with this, I will make some time. The multiscript branch of biblatex currently uses |
On Wed, Jan 22, 2020 at 04:50:06AM -0800, Joseph Wright wrote:
I'll raise with the team: I'd like to see this handled at the `expl3` level, really
I won’t contradict you here :-)
Arthur
|
One of the purposes of the files under |
biber/biblatex 4.0 will use BCP47 tags to specify multiscript alternates of fields and will do a lot of automatic language switching for fields and parts of fields based on them. For this, we need a reliable mapping of BCP47 <-> babel/polyglossia language names. Ideally, babel and polyglossia would allow language selection via BCP47 tags (I know polyglossia is looking at this, but I'm not sure if this has been raised for babel?), as this would standardise language specification across packages. I realise that this is not trivial, so a mapping package would be the next best thing. This would in general help LaTeX integrate with other internationalisation systems, since BCP47 is an IETF standard. The tricky part is that babel tends to have variants of languages as separate language names, while polyglossia tends to use options on top of generic language names. We would need the mapping to be agnostic about babel/polyglossia and to ensure that a BCP47 tag maps to generally "the same" language in both (modulo the details of language support differences, naturally). |
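To make the babel/polyglossia asymmetry concrete, here is a minimal Python sketch of what one agnostic mapping table could look like. It is not taken from biber, babel or polyglossia: the table layout and function names are hypothetical, and the handful of entries (including the pairing of `ngerman` with `de-1996`) are illustrative assumptions, not a vetted list.

```python
from typing import NamedTuple, Tuple


class LatexLanguage(NamedTuple):
    """One BCP47 tag expressed in both packages' terms (hypothetical layout)."""
    babel: str                                        # babel: variant is its own name
    polyglossia: str                                  # polyglossia: generic language name
    polyglossia_options: Tuple[Tuple[str, str], ...]  # ... plus key=value options


# Illustrative entries only; a real table would come from maintained data.
BCP47_TO_LATEX = {
    "de-1996": LatexLanguage("ngerman", "german", (("spelling", "new"),)),
    "de-1901": LatexLanguage("german", "german", (("spelling", "old"),)),
    "en-GB": LatexLanguage("british", "english", (("variant", "british"),)),
    "en-US": LatexLanguage("american", "english", (("variant", "american"),)),
    "pt-BR": LatexLanguage("brazilian", "portuguese", (("variant", "brazilian"),)),
}


def babel_name(tag: str) -> str:
    """Return the babel language name for a BCP47 tag in the table."""
    return BCP47_TO_LATEX[tag].babel


def polyglossia_call(tag: str) -> str:
    """Return a polyglossia \\setotherlanguage call for a BCP47 tag in the table."""
    entry = BCP47_TO_LATEX[tag]
    opts = ",".join(f"{key}={value}" for key, value in entry.polyglossia_options)
    return f"\\setotherlanguage[{opts}]{{{entry.polyglossia}}}"
```

With a table like this, `babel_name("pt-BR")` and `polyglossia_call("pt-BR")` select "the same" language in either package from a single tag.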
Sure, although I had other priorities, and I left this possibility for a revamped set of selectors based more strictly on the concept of ‘locale’ (vs. the somewhat fuzzy concept of ‘language’), including extension subtags and private-use subtags. Now |
Yes, babel has really improved recently, which is very nice. If we could all agree on BCP47 locale tags, that would be a huge step forward in interoperability. |
If I have a list from bcp47 to language/variant pairs in |
@javier, the list is in the main Polyglossia track for the implementation
of BCP-47:
reutenauer/polyglossia#226
right at the top and on my first reply.
…On Wed, Jan 22, 2020 at 8:12 AM Javier Bezos ***@***.***> wrote:
If I have a list from bcp47 to language/variant pairs in polyglossia, I
could add this information to the ini files, to improve the
interoperability (and in addition improve the data for babel, because
I've seen some mistakes). Note the ini files are not meant exclusively
for babel, but also as a data repository available to other packages.
|
Ideally, babel and polyglossia would allow language selection via
BCP47 tags (I know polyglossia is looking at this
Yes, polyglossia 1.47 (to be released in the next few days) features this
already.
|
If I have a list from bcp47 to language/variant pairs in polyglossia
The list is here:
https://github.com/reutenauer/polyglossia/blob/master/tools/bcp47.py
and in the (master) manual, sec. 2.4:
https://github.com/reutenauer/polyglossia/blob/master/doc/polyglossia.tex
|
@plk: ... and for this, we need a reliable mapping of BCP47 <->
babel/polyglossia language names.
I hope this is needed only for backwards compatibility. BCP-47 says that
the IETF language tags should be used in the programming every time one
needs to refer to the language -- for example, "ngerman" should be replaced
by "de".
…On Wed, Jan 22, 2020 at 9:01 AM Jürgen Spitzmüller ***@***.***> wrote:
Am Mittwoch, den 22.01.2020, 08:12 -0800 schrieb Javier Bezos:
> If I have a list from bcp47 to language/variant pairs in polyglossia
The list is here:
https://github.com/reutenauer/polyglossia/blob/master/tools/bcp47.py
and in the (master) manual, sec. 2.4:
https://github.com/reutenauer/polyglossia/blob/master/doc/polyglossia.tex
|
@jspitz Thank you. I’ll try to have the data in the |
No, we don't replace the old interface. We add the possibility to use BCP-47 tags as an alternative. Personally, I much prefer human-readable language names. |
@jspitz I guess it depends where you are looking: internally, and at a code level, BCP 47 is really what should be passed. At a document level 'friendly' names are fine, but I'd hope that these can be converted to BCP 47 before being used. |
Arthur is still driving the boat. I'm just helping a bit in the engine room ;-) |
That's not planned, no (at least as far as my engagement with the matter is concerned) |
@jspitz I'm thinking |
@plpauloney
This is debatable in the context of LaTeX, because it's a ‘public’ markup language. It could be true for the internals, at the programming level, which are not exposed to the user (and which most LaTeX users don't know at all), but definitely if a user must select a language in an HTML form, they will see the name, even if internally it's mapped to the bcp47 code. Furthermore, names provide an additional abstraction layer which can be useful. |
I had to try it ;-). |
On Wed, Jan 22, 2020 at 09:41:59AM -0800, Javier Bezos wrote:
> Arthur is still driving the boat. I'm just helping a bit in the engine room ;-)
I had to try it ;-).
I’m all for making Babel and Polyglossia converge, as you should be
well aware.
|
On Wed, Jan 22, 2020 at 9:36 AM Jürgen Spitzmüller ***@***.***> wrote:
at a code level, BCP 47 is really what should be passed. At a document
level 'friendly' names are fine, but I'd hope that these can be converted
to BCP 47 before being used.
That's not planned, no (at least as far as my engagement with the matter
is concerned)
This is a pity, because that is not BCP-47 compliant.
|
On Wed, Jan 22, 2020 at 09:13:12AM -0800, Paulo Ney de Souza wrote:
I hope this is needed only for backwards compatibility. BCP-47 says that
the IETF language tags should be used in the programming every time one
needs to refer to the language
I never heard of anything like that. Do you have a reference?
Backward compatibility is of course essential in any case and we won’t
break the old interface arbitrarily.
for example, "ngerman" should be replaced
by "de".
You of course mean `[de-1901]`.
|
On Wed, Jan 22, 2020 at 9:38 AM Javier Bezos ***@***.***> wrote:
@plpauloney
BCP-47 says that the IETF language tags should be used in the programming
every time one needs to refer to the language -- for example, "ngerman"
should be replaced by "de".
This is debatable in the context of LaTeX, because it's a ‘public’ markup
language. It could be true for the internals, at the programming level,
which are not exposed to the user (and which most LaTeX users don't know at
all), but definitely if a user must select a language in an HTML form, they
will see the name, even if internally it's mapped to the bcp47 code.
Furthermore, names provide an additional abstraction layer which can be
useful.
@javier this is a misunderstanding about BCP-47. In fact, rather the
opposite of what you affirm is true. The choice of language in HTML should
be done in BCP-47 -- read here:
https://www.w3.org/International/articles/language-tags/
Paulo Ney
|
@plk Just a few questions. Which user interface do you have in mind? How will the tags be used? As alternative names, or by means of specific macros? |
It will be possible to have multiple
biblatex reads the Another aspect of this is that users specify languages in biblatex by babel/polyglossia names and biber needs to convert these (they appear in the So, there is a need to be able to go both ways (and these two directions need to cohere so a complete round trip results in the same tag/language name). |
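The round-trip requirement can be checked mechanically. The following Python sketch assumes the two directions are held as plain dictionaries; the function names and the sample entries are hypothetical and only illustrate the coherence check, not how biber actually stores its mappings.

```python
def invert(tag_to_name):
    """Build the name -> tag direction, refusing ambiguous reverse mappings."""
    name_to_tag = {}
    for tag, name in tag_to_name.items():
        if name in name_to_tag:
            raise ValueError(
                f"{name!r} maps back to both {name_to_tag[name]!r} and {tag!r}"
            )
        name_to_tag[name] = tag
    return name_to_tag


def round_trip_failures(tag_to_name, name_to_tag):
    """Return the tags for which tag -> name -> tag does not give the same tag."""
    return [
        tag
        for tag, name in tag_to_name.items()
        if name_to_tag.get(name) != tag
    ]


# Illustrative data only (assumed pairings, not an authoritative list).
TAG_TO_BABEL = {"de-1996": "ngerman", "de-1901": "german", "en-GB": "british"}
BABEL_TO_TAG = invert(TAG_TO_BABEL)
```

If the two directions are maintained separately instead of one being derived from the other, `round_trip_failures` is the check that would flag entries where they have drifted apart.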
@Arthur There are TWO main gains from introducing a standard like BCP-47 in
packages used by LaTeX.
First is the interoperability between packages -- and this has to do with
the internals and the use of BCP-47 in the guts of each package. There are
great benefits that can be derived by cross-feeding between Babel and
Polyglossia, but if they continue to name their files
german.ini <---> de-1901.ini
portugues.ini <----> pt-BR.ini
zh-cmn-Hans-CN.ini <--->
Chinese_Mandarin_Simplified_script_as_used_in_China.ini
there is very little that can be accomplished.
Literally, the Language Tag should be used wherever a Language Tag is needed.
For example, a Babel manual written in Serbian using Cyrillic script should
be appropriately named:
Babel-ver7.01-sr-Cyrl.pdf
the same manual in German should be:
Babel-ver7.01-de.pdf
(and that is an extreme example).
The other one is User Interface. Users should be able to enter something
as simple as
... the hero \text-ru{Светлана Евгеньевна Савицкая} known in Japan as
\text-ja{スベトラーナ・サビツカヤ} was the ...
or \text{ru}{...}, whatever interface is decided on, so that the text is
appropriately typeset and syllable-separated, and can then be extracted into
separate files by programs like "detex" so each one can be sent to spellers
like Aspell using the respective language.
Aspell already uses BCP-47, detex is being prepared and the missing part is
the source.
Paulo Ney
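As a rough illustration of the extraction step described above, this Python sketch groups marked-up spans by language tag so each group could be handed to a speller. It assumes the hypothetical `\text{<tag>}{...}` form mentioned in the message, which is not an existing babel or polyglossia macro, and it does not handle nested braces.

```python
import re
from collections import defaultdict

# Matches the hypothetical \text{<tag>}{<content>} form; content without braces only.
_SPAN_RE = re.compile(r"\\text\{([A-Za-z0-9-]+)\}\{([^{}]*)\}")


def spans_by_language(source):
    """Collect the text of each tagged span, grouped by its language tag."""
    grouped = defaultdict(list)
    for tag, text in _SPAN_RE.findall(source):
        grouped[tag].append(text)
    return dict(grouped)
```

Each list in the result could then be written to its own file and checked with the dictionary for that tag.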
|
@Arthur I think that "ngerman" refers to German since 2006 and not 1901 --
but I could be wrong, things are changing very fast with Babel. One more
reason to use the BCP-47 tags and not random naming of languages.
Paulo Ney
|
@jspitz I'm thinking expl3, of course
Yes I agree wrt expl3.
|
@reutenauer @pauloney (1) The current (2) The idea of identifying the version in the file name came to me when I was building the locale files, but I thought (and still think) that it could make things unnecessarily complicated. (3) The latest version of @plk Last but not least, there are no unique mappings from a set of rules to any identifier, either a language name or a bcp47 code. I think there are still many loose ends and I don't want to rush, but there is good news: I based my work for the locales on the CLDR, so it is close to what you want. |
@josephwright Was there any progress on the I'm asking because I want to look into selecting the correct |
@moewew Nothing yet on The string is BCP47 at least in principle, but at present I don't have code in place to split the language and locale. As there are only about 6 languages to cover and they have simple fixed strings, you can likely special-case them. |
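For reference, splitting a tag into its language and locale parts could look roughly like the Python sketch below. It covers only a simplified subset of BCP47 (language, one optional extended language subtag, script, region, variants) and ignores extension, private-use and grandfathered tags; the function name and the returned layout are hypothetical.

```python
import re

# Simplified subset of the BCP47 grammar; not a full validator.
_TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<extlang>[A-Za-z]{3}))?"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


def split_tag(tag):
    """Split a BCP47 tag into subtags, normalising case by convention."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a tag this sketch understands: {tag!r}")
    parts = match.groupdict()
    return {
        "language": parts["language"].lower(),
        "extlang": parts["extlang"].lower() if parts["extlang"] else None,
        "script": parts["script"].title() if parts["script"] else None,
        "region": parts["region"].upper() if parts["region"] else None,
        "variants": [v.lower() for v in parts["variants"].split("-") if v],
    }
```

For a handful of fixed strings, special-casing is simpler; a splitter like this only pays off once the set of tags is open-ended.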
Not quite -- I've opened an issue in babel. |
OK, for now I'll look into a manual mapping for the few cases we have. Long-term it would be great if we could have a translator between BCP, |
I agree that we really need a separate package for BCP47<->babel/polyglossia mapping. @bastien-roucaries was working on something, not |
While the BCP47 codes already work in polyglossia they do not in |
We need a way to go between BCP47 tags and babel/polyglossia language names/options in a reliable and cross-package way. The multiscript support in biblatex/biber 4.0 will rely on this heavily. Currently these mappings are hard-coded into the experimental biber/biblatex branches (with some user-facing support for redefining the BCP47->babel/polyglossia mapping).
Some data format that TeX and biber can parse and which is findable with kpsewhich would be ideal. Even a file of TeX data macros would be fine as biber could parse this. Even though multiscript support in biblatex/biber is currently experimental, the lack of such a package will be a blocker eventually.
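As a sketch of the "file of TeX data macros" option, the Python below parses such a file the way an external tool might. The macro name `\DeclareLanguageTagMapping`, its four-argument layout and the sample entries are all hypothetical; no such file exists yet. A real tool would first locate the file with `kpsewhich`, whereas this sketch reads an inline string.

```python
import re

# Hypothetical data file content: tag, babel name, polyglossia name, polyglossia options.
SAMPLE = r"""
% bcp47 tag, babel name, polyglossia name, polyglossia options
\DeclareLanguageTagMapping{de-1996}{ngerman}{german}{spelling=new}
\DeclareLanguageTagMapping{en-GB}{british}{english}{variant=british}
"""

# Four brace groups without nesting, following the hypothetical macro name.
_ENTRY_RE = re.compile(
    r"\\DeclareLanguageTagMapping"
    r"\{([^{}]*)\}\{([^{}]*)\}\{([^{}]*)\}\{([^{}]*)\}"
)


def parse_mappings(text):
    """Read every mapping macro in the text into a dict keyed by BCP47 tag."""
    table = {}
    for line in text.splitlines():
        line = line.split("%", 1)[0]  # drop TeX comments (escaped \% not handled)
        for tag, babel, polyglossia, options in _ENTRY_RE.findall(line):
            table[tag] = {
                "babel": babel,
                "polyglossia": polyglossia,
                "options": options,
            }
    return table
```

The appeal of this format is that the same file can be `\input` by TeX, with the macro defined to store the data, and scanned by a regex on the biber side.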
CC'ing @jspitz, @moewew, @reutenauer, @bastien-roucaries, @jbezos