Failing in converting strings to unicode #272

Closed
Rmano opened this issue Nov 11, 2020 · 2 comments · Fixed by #373


Rmano commented Nov 11, 2020

Suppose you have this BibTeX file (call it fail.bib):

@article{a,
  author = {One Two and Three{\'\i}abc-Four{\'\i}def},
}

The example program:

#!/usr/bin/python3
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import convert_to_unicode

# Parse fail.bib, converting LaTeX escapes to Unicode while parsing.
parser = BibTexParser(common_strings=True, customization=convert_to_unicode)
with open("fail.bib") as bf:
    bib_database = parser.parse_file(bf)
print(bib_database.entries)

produces the following output:

[{'author': 'One Two and Threeı́abc-Four\\d́ef', 'ENTRYTYPE': 'article', 'ID': 'a'}]

which is evidently wrong. It seems that \i is converted to ı (dotless i) too early, and then \'ı creates havoc.
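
A minimal sketch of the suspected failure mode, using a two-entry toy table rather than bibtexparser's real one. The names TOY_TABLE and replace_in_order are hypothetical and exist only for illustration; the assumption is that patterns are substituted one after another in table order, so the short \i pattern can fire before the longer {\'\i} pattern is tried.

```python
# Toy substitution table (hypothetical, NOT bibtexparser's real table):
# (unicode, latex) pairs, applied in list order.
TOY_TABLE = [
    ("\u0131", r"\i"),        # dotless i
    ("\u00ed", r"{\'\i}"),    # i with acute accent
]


def replace_in_order(text, table):
    """Replace each LaTeX pattern with its Unicode value, in table order."""
    for uni, latex in table:
        text = text.replace(latex, uni)
    return text


result = replace_in_order(r"Three{\'\i}abc", TOY_TABLE)
# The short pattern \i is replaced first, so {\'\i} no longer matches
# and the accent command is left dangling in front of the dotless i.
print(repr(result))
```

In this sketch the accented character is never produced, because the input has already been altered by the time the longer pattern is tried.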

I am not sure what the solution could be, because I do not follow the code very well; it is too complex for my skill level, I fear.

Rmano (Author) commented Nov 11, 2020

It seems that adding the missing pattern and sorting the substitution list by pattern length (so that the longest match is substituted first) sort of works:

#!/usr/bin/python3
from bibtexparser import latexenc
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import convert_to_unicode

# Workaround: add the missing {\'\i} pattern, then sort the table so that
# the longest LaTeX patterns are substituted first.
patterns = (('í', r"{\'\i}"), *latexenc.unicode_to_crappy_latex1)
latexenc.unicode_to_crappy_latex1 = sorted(patterns, key=lambda x: len(x[1]), reverse=True)

parser = BibTexParser(common_strings=True, customization=convert_to_unicode)
with open("fail.bib") as bf:
    bib_database = parser.parse_file(bf)
print(bib_database.entries)

which outputs

[{'author': 'One Two and Threeíabc-Fourídef', 'ENTRYTYPE': 'article', 'ID': 'a'}]
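
The longest-first idea can also be shown in isolation, without bibtexparser. This is only a sketch under assumptions: TOY_TABLE and replace_longest_first are hypothetical names, and the two-entry table stands in for the library's much larger one.

```python
# Toy substitution table (hypothetical, NOT bibtexparser's real table).
TOY_TABLE = [
    ("\u0131", r"\i"),        # dotless i
    ("\u00ed", r"{\'\i}"),    # i with acute accent
]


def replace_longest_first(text, table):
    """Apply substitutions with the longest LaTeX pattern first."""
    ordered = sorted(table, key=lambda pair: len(pair[1]), reverse=True)
    for uni, latex in ordered:
        text = text.replace(latex, uni)
    return text


fixed = replace_longest_first(r"Three{\'\i}abc-Four{\'\i}def and \i", TOY_TABLE)
# {\'\i} is consumed before the bare \i pattern gets a chance to match
# inside it; a standalone \i is still converted afterwards.
print(fixed)
```

Sorting by pattern length is a heuristic: it helps when one pattern is a substring of another, as here, but it does not by itself guarantee correct results for every combination of LaTeX escapes.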

MiWeiss (Collaborator) commented Jul 10, 2022

Waiting for #264 before addressing this (might be fixed along the way)
