Errors in unicode conversion #274

Closed
rbawden opened this issue Jan 9, 2021 · 1 comment

rbawden commented Jan 9, 2021

I have come across some problems in the conversion to Unicode. I am using the following code to parse a raw BibTeX string (hal_bibtex) with BibTexParser. Some of the more common escaped characters are converted correctly, but others come out wrong:

import bibtexparser
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import convert_to_unicode

# hal_bibtex holds the raw BibTeX string to parse
parser = BibTexParser()
parser.customization = convert_to_unicode  # convert LaTeX escapes to Unicode
info = bibtexparser.loads(hal_bibtex, parser=parser)

For example, I tried it on the following BibTeX file: https://hal.inria.fr/hal-01682188v1/bibtex, which contains a large number of accented characters. The result contains at least the following errors:

Bibtex:

... Katar{\'i}na ... Gr{\=u}z{\=i}tis ... Jel{\'i}nek ... Ljube{\v s}i{\'c} ... Mart{\'i}nez Alonso ... Ne{\v s}pore-B{\=e}rzkalne ... Samard{\v z}i{\'c} ... Saul{\=i}te, ...

Once converted to Unicode:

... Katar\ńa ... and Grūz\̄tis ... and Jel'ék ... and Ljubeši'cŃikola ... Mart'in ́Alonso, ... Nešpore-B=r̄zkalne ... Samardži'c ... Saul=iē, ...

Expected output:

... Katarína ... and Grūzītis ... and Jelínek ... and Ljubešić Nikola ... Martínez Alonso, ... Nešpore-Bērzkalne ... Samardžić ... Saulīte, ...
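
To make the expected mapping concrete, here is a minimal standalone sketch that does not use bibtexparser at all. It assumes the escapes appear in the standard LaTeX forms {\'i}, {\=u} and {\v s}, and it only covers the three accents seen above. The names COMBINING, ACCENT and latex_accents_to_unicode are made up for this illustration and are not part of any library.

```python
import re
import unicodedata

# Assumed subset of LaTeX accent commands, mapped to Unicode combining marks.
COMBINING = {
    "'": "\u0301",  # acute accent
    "=": "\u0304",  # macron
    "v": "\u030C",  # caron
}

# Matches a brace group holding one accent command and one ASCII letter,
# e.g. {\'i}, {\=u} or {\v s}. This is deliberately lenient: it does not
# check that a letter command such as \v is followed by a space.
ACCENT = re.compile(r"\{\\(['=v])\s*([A-Za-z])\}")


def latex_accents_to_unicode(text: str) -> str:
    """Replace the supported accent groups with precomposed characters."""

    def repl(match: re.Match) -> str:
        # Base letter first, then the combining mark that follows it.
        return match.group(2) + COMBINING[match.group(1)]

    # NFC normalization composes "letter + combining mark" into one code point.
    return unicodedata.normalize("NFC", ACCENT.sub(repl, text))


print(latex_accents_to_unicode(r"Ljube{\v s}i{\'c}"))  # Ljubešić
```

Putting the base letter before the combining mark and then normalizing to NFC is what yields single precomposed characters such as í and ē, rather than a stray accent attached to a neighbouring letter as in the converted output above.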

Thanks in advance for your help!

Collaborator

MiWeiss commented Jul 9, 2022

Duplicate of #272. There's also a workaround provided there.
