Unexpected concatenation of field tokens #396

zepinglee · 2023-09-15T15:22:01Z

Describe the bug

In the following example, the field value 10 # "~" # jan is expected to evaluate to 10~Jan. (the macro jan expands to "Jan."), but this library returns 10 # "~" instead.

BibTeX has three types of field tokens: a nonnegative number, a macro name (like jan), and a brace-balanced string delimited by either double quotes or braces. Tokens can be concatenated with the # character. Although the first type is called a "number", it behaves the same as a string: in a .bst style it supports string slicing, text length, and concatenation.
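To make the expected behaviour concrete, here is a minimal sketch of how such a field value could be split and expanded. It is not how bibtexparser is implemented; the helper names (split_concatenation, resolve_token, expand_field) are hypothetical, and the macro table is assumed to be a plain dict keyed by lowercased macro name.

```python
def split_concatenation(value: str) -> list[str]:
    """Split a raw field value on '#' characters outside quotes and braces."""
    tokens, current = [], []
    depth = 0          # current brace nesting depth
    in_quotes = False  # inside "..." (quotes only count at brace depth 0)
    for ch in value:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == '"' and depth == 0:
            in_quotes = not in_quotes
        if ch == "#" and depth == 0 and not in_quotes:
            # A top-level '#' ends the current token.
            tokens.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current).strip())
    return tokens


def resolve_token(token: str, macros: dict[str, str]) -> str:
    """Turn one token (string, number, or macro name) into its text."""
    if len(token) >= 2 and (token[0], token[-1]) in {('"', '"'), ("{", "}")}:
        return token[1:-1]        # delimited string: drop the delimiters
    if token.isdigit():
        return token              # a number behaves like a string
    return macros[token.lower()]  # macro name; KeyError if undefined


def expand_field(value: str, macros: dict[str, str]) -> str:
    """Expand every token and join the results."""
    return "".join(resolve_token(t, macros) for t in split_concatenation(value))


print(expand_field('10 # "~" # jan', {"jan": "Jan."}))  # 10~Jan.
```

The splitter tracks brace depth and quote state so that a # inside a delimited string is kept as literal text rather than treated as a concatenation operator.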

BTW, I've also made a bib2json.bst style that may help with testing. It reads .bib data and writes it as JSON (with some limitations) to the .bbl output.

Reproducing

Version: 2.0.0b2

Code:

import bibtexparser
bibtex_str = '''
@STRING{ jan = "Jan." }

@INBOOK{inbook-full,
   month = 10 # "~" # jan,
}
'''
library = bibtexparser.parse_string(bibtex_str)
month = library.entries[0].fields_dict['month'].value
print(repr(month))
assert month == "10~Jan."

Output:

'10 # "~"'


MiWeiss · 2023-09-15T16:35:57Z

Thanks a lot for the beautiful bug report. This will probably have to be addressed in two distinct PRs:

One PR to fix the splitter to contain the entire field, even if the field contains string concatenations.
One follow-up PR to adapt StringInterpolationMiddleware (and probably add a further middleware) to properly handle concatenation.

The first of these PRs is likely nontrivial.

P.S. I have not actually reproduced the issue, but given the nice issue description and the fact that token concatenation is not yet supported, I still added the reproduced label.

This allows the splitter to correctly handle #-based string concatenation. Note: This will still lead to downstream problems, as some of these concatenated fields will not have a recognized enclosing, and as string interpolation does not yet work with concatenated references. However, these cases did not work before either, and this PR does not (knowingly) introduce any regressions. These problems will be addressed in a subsequent PR. This is the first PR to address (but not yet close) #396.

kmccurley · 2024-06-21T04:06:57Z

Note that string concatenation can also be used inside @string, and I've seen this in cryptobib. An example is:

@string{asiacryptname =         "ASIACRYPT"}
@string{asiacrypt91name =       asiacryptname # "'91"}
@string{asiacrypt92name =       auscryptname # "'92"}
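A rough sketch of what resolving such chained @string definitions involves, assuming definitions are processed in file order so that each macro can refer to earlier ones. This is not bibtexparser's API: expand and resolve_string_definitions are hypothetical helpers, and the token pattern is deliberately simplified (no nested braces, no braces inside quoted strings).

```python
import re

# Simplified token pattern: a quoted string, a braced string, or a bare word.
# Assumption: no nested braces and no braces inside quoted strings.
TOKEN = re.compile(r'"[^"]*"|\{[^{}]*\}|[^\s#"{}]+')


def expand(value: str, macros: dict[str, str]) -> str:
    """Expand a '#'-concatenated value using the macros known so far."""
    parts = []
    for token in TOKEN.findall(value):
        if token[0] in '"{':
            parts.append(token[1:-1])            # delimited string
        elif token.isdigit():
            parts.append(token)                  # number, kept as text
        else:
            parts.append(macros[token.lower()])  # KeyError if undefined
    return "".join(parts)


def resolve_string_definitions(definitions: list[tuple[str, str]]) -> dict[str, str]:
    """Resolve @string definitions in order; later ones may use earlier ones."""
    macros: dict[str, str] = {}
    for name, raw_value in definitions:
        macros[name.lower()] = expand(raw_value, macros)
    return macros


definitions = [
    ("asiacryptname", '"ASIACRYPT"'),
    ("asiacrypt91name", 'asiacryptname # "\'91"'),
]
macros = resolve_string_definitions(definitions)
print(macros["asiacrypt91name"])  # ASIACRYPT'91
```

In this sketch a reference to a macro that has not been defined yet raises KeyError; a real implementation would have to decide whether to warn, skip, or leave the reference unresolved.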

MiWeiss added bug enhancement reproduced labels Sep 15, 2023

MiWeiss mentioned this issue Sep 18, 2023

✨ handle concatenated fields with inner quotes in splitter #398

Merged

MiWeiss added the v2 label Sep 20, 2023
