Support Latin <-> Cyrillic transliteration and Latin digraphs for Serbian #483
Comments
Many things are possible if someone steps up and does the implementation. |
@yannis1962 has prepared map files based on my contribution here. We'll see what happens next... |
I have prepared map files for Latin->Cyrillic and Cyrillic->Latin in the case of Serbian. The only flaw I see is that when I have Љ Њ Џ as input, I can send them either to LJ NJ DŽ (uppercase) or to Lj Nj Dž (titlecase). I added a context rule so that Љ Њ Џ followed by a lowercase letter is always sent to titlecase, and otherwise to uppercase. I need confirmation by native speakers that this is a good choice. For example, what happens when somebody has a given name starting with Љ? When I transliterate the initial “Љ.” into Latin I will get “LJ.”, which is obviously bad, but is the correct way to write the initial in that case “Lj.” or rather “L.”? Should I perhaps implement another rule saying that when Љ is not preceded by a capital letter and is followed by a period, it should be titlecase? I need help from native speakers… I'm including the MAP and TEC files, as well as two test files with the UDHR in Serbian (converted from Latin to Cyrillic and from Cyrillic to Latin) in TeX and PDF format. You will need to use some other font if you run them (XeTeX only). |
I am not a native speaker/writer but it looks OK.
Here I can contribute with the explicit rule: Free translation: Latin digraphs used as starting letters in a sentence, a given name, or an abbreviation must be written as given in Table 8: Dž, Lj, Nj; but as DŽ, LJ, NJ in a fully uppercase context (for emphasis). I guess there are no changes in newer editions.
Definitely, let us wait until then... |
As I suspected. So that raises the question: how do I force the transcription into titlecase?
How about using a LaTeX macro \titlecase{Љ} to be sure you will get a titlecase, no matter what follows?
On 23 March 2021 at 14:45, Ivan Kokan ***@***.***> wrote:
>> The only flaw I see is that when I have Љ Њ Џ as input, I can send them either to LJ NJ DŽ (uppercase) or to Lj Nj Dž (titlecase). I added a context rule so that Љ Њ Џ followed by a lowercase letter is always sent to titlecase, and otherwise to uppercase. I need confirmation by native speakers that this is a good choice.
> I am not a native speaker/writer but it looks OK.
>> For example, what happens when somebody has a given name starting with Љ? When I transliterate the initial “Љ.” into Latin I will get “LJ.” which is obviously bad, but is the correct way to write the initial in that case “Lj.” or rather “L.”?
> Here I can contribute with the explicit rule:
> Правопис српскога језика, Матица српска, 1994 (second edition)
> https://sr.wikipedia.org/sr-el/%D0%9F%D1%80%D0%B0%D0%B2%D0%BE%D0%BF%D0%B8%D1%81_%D1%81%D1%80%D0%BF%D1%81%D0%BA%D0%BE%D0%B3%D0%B0_%D1%98%D0%B5%D0%B7%D0%B8%D0%BA%D0%B0
> https://gimnazijadg.files.wordpress.com/2012/03/pravopis-srpskoga-jezika.pdf
> https://user-images.githubusercontent.com/1058211/112155113-46692680-8be5-11eb-8955-86b5763c7f46.png
> Free translation: Latin digraphs used as starting letters in a sentence, a given name, or an abbreviation must be written as given in Table 8: Dž, Lj, Nj; but as DŽ, NJ, LJ in a fully uppercase context (for emphasis).
> I guess there are no changes in newer editions.
>> Maybe should I implement another rule saying that when Љ is not preceded by a capital letter and followed by a period, it should be titlecase? I need help from native speakers…
> Definitely, let us wait until then...
<http://www.imt-atlantique.fr/> Yannis HARALAMBOUS
|
"Smart ways": transliterate to titlecase if it is followed by something lowercase (starting a sentence) or a period (initials/abbreviations). This would obviously fail with a sentence having simply "Љ" as its first word. I think that macro is inevitable in any case, hence no "smart way" must be implemented. |
On 23 March 2021 at 14:59, Ivan Kokan ***@***.***> wrote:
"Smart ways": transliterate to titlecase if it is followed by something lowercase (starting a sentence) or a period (initials/abbreviations). This would obviously fail with a sentence having simply "Љ" as its first word.
What I have done is:
1) titlecase if followed by lowercase
2) uppercase if preceded by uppercase
3) titlecase if not (2) and followed by period
These three rules should cover most of the cases…
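The three rules above can be sketched in a few lines of Python. This is only an illustration of the decision order, not the actual MAP/TEC files; the function name and the table are hypothetical, and the fallback to uppercase when no rule applies is taken from the earlier description ("otherwise to uppercase").

```python
# Hypothetical sketch of the three context rules; not the real TECkit mapping.
# Each capital Cyrillic letter maps to (uppercase form, titlecase form).
DIGRAPHS = {"Љ": ("LJ", "Lj"), "Њ": ("NJ", "Nj"), "Џ": ("DŽ", "Dž")}


def capital_digraph(text: str, i: int) -> str:
    """Return the Latin form for the capital Љ/Њ/Џ found at text[i]."""
    upper, title = DIGRAPHS[text[i]]
    prev_ch = text[i - 1] if i > 0 else ""
    next_ch = text[i + 1] if i + 1 < len(text) else ""
    if next_ch.islower():      # rule 1: followed by lowercase
        return title
    if prev_ch.isupper():      # rule 2: preceded by uppercase
        return upper
    if next_ch == ".":         # rule 3: not (2) and followed by a period
        return title
    return upper               # no rule applies: fall back to uppercase
```

With this ordering, `capital_digraph("Љ. Петровић", 0)` gives `Lj`, while a sentence consisting only of "Љ" still falls through to `LJ`, which is the case a macro would have to cover.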
|
Here are the files with the three smart rules mentioned in the previous message |
I have been in contact with Uroš Stefanović (https://ctan.org/author/stefanovic) meanwhile. It seems we are getting somewhere with this implementation. Let me just summarize what we currently have:
TODO:
I guess that's all. |
As for LuaTeX: Look at how ArabLuaTeX does it. |
More specifically: https://tex.stackexchange.com/questions/285610/ |
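I have found two additional rules:
При размакнутом (спационираном) писању сва слова се једнако раздвајају (L j u b l j a n a а не Lj u b lj a n a).
Ако се натписи (нпр. MENJAČNICA) пишу одозго надоле, NJ, LJ односно DŽ не треба да остану састављени, него друго слово долази испод првог.
In English:
In spaced-out (letterspaced) writing, all letters are separated equally (L j u b l j a n a and not Lj u b lj a n a).
If inscriptions (e.g. MENJAČNICA) are written from top to bottom, NJ, LJ and DŽ should not stay joined; instead, the second letter goes below the first.
I would say that the first rule is feasible by providing optional arguments. The second rule is way off. |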
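To make the two rules concrete, here is a small Python sketch. It assumes the text may contain the single-character Unicode digraphs (U+01C4 to U+01CC) and expands them to two letters before spacing or stacking; the function names are hypothetical and this says nothing about how polyglossia would implement it.

```python
import unicodedata

# Single-character Unicode digraphs and their two-letter equivalents.
SINGLE_TO_PAIR = {
    "\u01C4": "D\u017D", "\u01C5": "D\u017E", "\u01C6": "d\u017E",  # DŽ Dž dž
    "\u01C7": "LJ", "\u01C8": "Lj", "\u01C9": "lj",
    "\u01CA": "NJ", "\u01CB": "Nj", "\u01CC": "nj",
}


def expand_digraphs(word: str) -> str:
    """Compose accents (NFC), then split single-character digraphs in two."""
    word = unicodedata.normalize("NFC", word)
    return "".join(SINGLE_TO_PAIR.get(ch, ch) for ch in word)


def letterspace(word: str, sep: str = " ") -> str:
    """Rule 1: every letter is separated equally, digraphs included."""
    return sep.join(expand_digraphs(word))


def stack_vertically(word: str) -> str:
    """Rule 2: one letter per line, the second digraph letter below the first."""
    return "\n".join(expand_digraphs(word))
```

For example, `letterspace("Ljubljana")` yields `L j u b l j a n a`, and the same result is obtained when the input uses the single-character forms of Lj and lj.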
Originally posted by @jspitz in #482 (comment)
I am not sure whether the comment above means that transliteration is being considered for polyglossia's future...
Here are the bidirectional Unicode mappings for Serbian to start with.
serbian_cyrillic-latin_transliteration.xlsx
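As an illustration of what such a mapping involves in the Latin to Cyrillic direction, here is a minimal Python sketch. It uses the standard 30-letter alphabet correspondence rather than the contents of the spreadsheet (which are not reproduced here), and the names are hypothetical. Digraphs are matched greedily, so words where n+j or d+ž are separate letters (e.g. "injekcija", "nadživeti") would be converted incorrectly and would need an exception mechanism.

```python
# Standard Serbian alphabet correspondence (assumption: not read from the .xlsx).
CYR = "АБВГДЂЕЖЗИЈКЛЉМНЊОПРСТЋУФХЦЧЏШ"
LAT = ["A", "B", "V", "G", "D", "Đ", "E", "Ž", "Z", "I", "J", "K", "L", "Lj",
       "M", "N", "Nj", "O", "P", "R", "S", "T", "Ć", "U", "F", "H", "C", "Č",
       "Dž", "Š"]

LAT_TO_CYR = {}
for cyr, lat in zip(CYR, LAT):
    LAT_TO_CYR[lat] = cyr                  # "A" -> А, "Lj" -> Љ (titlecase)
    LAT_TO_CYR[lat.upper()] = cyr          # "LJ" -> Љ (uppercase)
    LAT_TO_CYR[lat.lower()] = cyr.lower()  # "lj" -> љ


def latin_to_cyrillic(text: str) -> str:
    """Greedy conversion: try a two-letter digraph first, then one letter."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in LAT_TO_CYR:
            out.append(LAT_TO_CYR[pair])
            i += 2
        else:
            # Characters outside the alphabet pass through unchanged.
            out.append(LAT_TO_CYR.get(text[i], text[i]))
            i += 1
    return "".join(out)
```

For instance, `latin_to_cyrillic("Ljubljana")` returns `Љубљана`.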
Note:
[…] T2A) and keyboard layout, i.e. no checks nor fallbacks to separate characters must be implemented.
[…] gloss-serbian.ldf, good - nothing to take care of.
Some good examples to eventually test with:
[…] (disableligatures is false)
[…] (disableligatures is true)