Failing in converting strings to unicode #272

Rmano · 2020-11-11T18:31:52Z

Suppose you have this bibtex file (call it fail.bib):

@article{a,
  author = {One Two and Three{\'\i}abc-Four{\'\i}def},
}

The example program:

#!/usr/bin/python3
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import convert_to_unicode

# Parse fail.bib, converting LaTeX escapes to Unicode in every entry.
parser = BibTexParser(common_strings=True, customization=convert_to_unicode)
with open("fail.bib") as bf:
    bib_database = parser.parse_file(bf)
print(bib_database.entries)

produces the following output:

[{'author': 'One Two and Threeı́abc-Four\\d́ef', 'ENTRYTYPE': 'article', 'ID': 'a'}]

which is evidently wrong. It seems that \i is converted to ı (dotless i) too early, and then \'ı creates havoc.
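
To make the suspected ordering problem concrete, here is a minimal sketch that does not use bibtexparser at all. The two-entry `RULES` table and the `substitute` helper are hypothetical stand-ins, not the library's real substitution table or code; they only show how replacing the short pattern `\i` first can destroy the longer pattern `{\'\i}` before it gets a chance to match.

```python
# Hypothetical substitution table (LaTeX sequence, Unicode character).
# This is NOT bibtexparser's real table, just two entries for illustration.
RULES = [
    (r"\i", "\u0131"),      # dotless i
    (r"{\'\i}", "\u00ed"),  # precomposed i with acute accent
]


def substitute(text, rules):
    """Apply every (latex, char) rule in order using plain string replacement."""
    for latex, char in rules:
        text = text.replace(latex, char)
    return text


source = r"Three{\'\i}abc"

# Short pattern first: \i is consumed, so {\'\i} can no longer match.
in_listed_order = substitute(source, RULES)

# Longest pattern first: {\'\i} is replaced as a whole.
longest_first = substitute(
    source, sorted(RULES, key=lambda rule: len(rule[0]), reverse=True)
)

print(repr(in_listed_order))
print(repr(longest_first))
```

In this toy setup the first call leaves `{\'ı}` behind (braces and accent command still in the string), while the second yields a clean `í`. Whether the library fails for exactly this reason is an assumption based on the output above.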

I am not sure what the solution could be, because I do not follow the code very well; it is rather too complex for my skill level, I fear.


Rmano · 2020-11-11T18:53:09Z

It seems that adding the pattern and sorting the substitution list (so that the longest match is substituted first) more or less works:

#!/usr/bin/python3
import bibtexparser.latexenc
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import convert_to_unicode

# Workaround: add the {\'\i} pattern, then sort the substitution list so
# that the longest LaTeX sequences are tried first.
bibtexparser.latexenc.unicode_to_crappy_latex1 = sorted(
    (('í', r"{\'\i}"), *bibtexparser.latexenc.unicode_to_crappy_latex1),
    key=lambda x: len(x[1]),
    reverse=True,
)

parser = BibTexParser(common_strings=True, customization=convert_to_unicode)
with open("fail.bib") as bf:
    bib_database = parser.parse_file(bf)
print(bib_database.entries)

which outputs

[{'author': 'One Two and Threeíabc-Fourídef', 'ENTRYTYPE': 'article', 'ID': 'a'}]
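
Since the broken and the fixed strings can look alike on screen, it may help to inspect the code points. The sketch below uses only the standard-library `unicodedata` module; the `describe` helper is a hypothetical name, and the `broken` string is an assumption about what the first output contains (a dotless i followed by a combining acute accent), written out with explicit escapes.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for the non-ASCII characters."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


# Assumed content of the broken output: dotless i + combining acute accent.
broken = "Three\u0131\u0301abc"
# Content of the fixed output: a single precomposed character.
fixed = "Three\u00edabc"

print(describe(broken))
print(describe(fixed))

# NFC normalization cannot repair the broken form, because the dotless i
# has no precomposed variant with an acute accent.
still_broken = unicodedata.normalize("NFC", broken) == broken
print(still_broken)
```

If the assumption about the broken string holds, this also suggests that the result cannot simply be cleaned up afterwards by normalizing; the substitution itself has to produce `í`.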

MiWeiss · 2022-07-10T13:12:49Z

Waiting for #264 before addressing this (might be fixed along the way)


This was referenced Jul 9, 2022

Errors in unicode conversion #274

Closed

bibtextparser does not properly handle escaped dollar signs in input file #264

Closed

MiWeiss added the on hold label Jul 10, 2022

MiWeiss added a commit that referenced this issue May 26, 2023

✅ Implement test case provided in #272

d40642d

MiWeiss added fixed in v2 and removed on hold labels May 26, 2023

MiWeiss mentioned this issue May 26, 2023

✅ Implement test case provided in #272 #373

Merged

MiWeiss closed this as completed in #373 May 26, 2023

MiWeiss added a commit that referenced this issue May 26, 2023

✅ Implement test case provided in #272 (#373)

b2a668c

MiWeiss added a commit that referenced this issue May 26, 2023

✅ Test empty string (#374)

3b0d923
