Unexpected concatenation of field tokens #396

Open
zepinglee opened this issue Sep 15, 2023 · 2 comments

@zepinglee
Contributor

Describe the bug

In the following example, the value of the field 10 # "~" # jan is expected to be 10~Jan., but this library outputs 10 # "~".

BibTeX has three types of field tokens: a nonnegative number, a macro name (like jan), and a brace-balanced string delimited by either double quotes or braces. Tokens can be concatenated with the # character. Although the first type is called a "number", it behaves the same as a string: string slicing, text length, and concatenation can all be applied to it in a .bst style.
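The token rules above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not this library's implementation: the names split_tokens and evaluate_field are hypothetical, macro names are assumed to be case-insensitive, and the input is assumed to be brace-balanced.

```python
def split_tokens(value: str) -> list[str]:
    """Split a raw field value on '#' characters that sit outside quotes and braces."""
    tokens, current = [], []
    depth = 0          # current brace nesting depth
    in_quotes = False  # inside a "..." string (quotes only count at brace depth 0)
    for ch in value:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == '"' and depth == 0:
            in_quotes = not in_quotes
        if ch == "#" and depth == 0 and not in_quotes:
            tokens.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current).strip())
    return tokens


def evaluate_field(value: str, macros: dict[str, str]) -> str:
    """Concatenate the tokens of a field value into one string."""
    parts = []
    for token in split_tokens(value):
        if token.isascii() and token.isdigit():
            # A "number" token is treated as a plain string.
            parts.append(token)
        elif len(token) >= 2 and (token[0], token[-1]) in {('"', '"'), ("{", "}")}:
            # Quoted or braced string: drop the outer delimiters.
            parts.append(token[1:-1])
        else:
            # Anything else is looked up as a macro name (assumed case-insensitive).
            parts.append(macros[token.lower()])
    return "".join(parts)


print(evaluate_field('10 # "~" # jan', {"jan": "Jan."}))
```

With the macro jan defined as "Jan.", the three tokens 10, "~", and jan join into the expected value 10~Jan.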

BTW, I've also made a bib2json.bst style that may help testing. It reads .bib data and writes JSON format (though with some limitations) to the .bbl output.

Reproducing

Version: 2.0.0b2

Code:

import bibtexparser
bibtex_str = '''
@STRING{ jan = "Jan." }

@INBOOK{inbook-full,
   month = 10 # "~" # jan,
}
'''
library = bibtexparser.parse_string(bibtex_str)
month = library.entries[0].fields_dict['month'].value
print(repr(month))
assert month == "10~Jan."

Output:

'10 # "~"'
@MiWeiss
Collaborator

MiWeiss commented Sep 15, 2023

Thanks a lot for the beautiful bug report. This will probably have to be addressed in two distinct PRs:

  • One PR to fix the splitter to contain the entire field, even if the field contains string concatenations.
  • One follow-up PR to adapt StringInterpolationMiddleware (and probably add a further middleware) to properly handle concatenation.

The first of these PRs is likely nontrivial.
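To illustrate what the first step involves, here is a rough sketch of how a splitter could find the end of a field value that contains # concatenations. It is not the code of this library's splitter: the function read_field_value is hypothetical, and it assumes the field ends at the first comma or closing brace that lies outside any quotes and nested braces.

```python
def read_field_value(text: str, start: int) -> tuple[str, int]:
    """Return (raw_value, end_index) for the field value beginning at `start`.

    Scanning stops at the first ',' or '}' found at brace depth 0 and
    outside a quoted string, so '#' concatenations stay inside the value.
    """
    depth = 0          # nesting depth of braces opened inside the value
    in_quotes = False  # inside a "..." string (quotes only count at depth 0)
    i = start
    while i < len(text):
        ch = text[i]
        if ch == '"' and depth == 0:
            in_quotes = not in_quotes
        elif ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0 and not in_quotes:
                break  # closing brace of the enclosing entry
            depth = max(depth - 1, 0)
        elif ch == "," and depth == 0 and not in_quotes:
            break  # separator before the next field
        i += 1
    return text[start:i].strip(), i


entry_body = 'month = 10 # "~" # jan,\n}'
value, end = read_field_value(entry_body, entry_body.index("=") + 1)
print(repr(value))
```

For the entry in the report, this keeps the whole expression 10 # "~" # jan as the raw field value, which a later step would then have to resolve.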

P.s. I have not actually reproduced the issue, but given the nice issue description and the fact that token concatenation is not yet supported, I still added the reproduced label.

MiWeiss added a commit that referenced this issue Sep 18, 2023
This allows the splitter to correctly handle #-based string concatenation.

Note: This will still lead to downstream problems, as some of these concatenated fields will not have a recognized enclosing, and as string interpolation does not yet work with concatenated references. However, these cases did not work before either, and this PR does not (knowingly) introduce any regressions. The problems mentioned here will be addressed in a subsequent PR.

This is the first PR to address (but not yet close) #396
@MiWeiss MiWeiss added the v2 label Sep 20, 2023
@kmccurley

Note that string concatenation can also be used inside @string, and I've seen this in cryptobib. An example is:

@string{asiacryptname =         "ASIACRYPT"}
@string{asiacrypt91name =       asiacryptname # "'91"}
@string{asiacrypt92name =       auscryptname # "'92"}
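
Handling this case means resolving @string definitions in file order, so that later definitions can refer to earlier ones. A minimal sketch of that idea follows. The names expand and resolve_strings are hypothetical and not part of this library, the regular expression ignores nested braces, and an undefined macro simply raises KeyError here.

```python
import re

# One token: a "quoted" string, a {braced} string without nesting,
# or a bare word (number or macro name). '#' and whitespace are skipped.
TOKEN = re.compile(r'"([^"]*)"|\{([^{}]*)\}|([^\s#"{}]+)')


def expand(value: str, macros: dict[str, str]) -> str:
    """Concatenate the tokens of `value`, looking up bare words in `macros`."""
    out = []
    for quoted, braced, bare in TOKEN.findall(value):
        if bare:
            # Numbers are kept as strings; other bare words are macro names.
            out.append(bare if bare.isdigit() else macros[bare.lower()])
        else:
            out.append(quoted or braced)
    return "".join(out)


def resolve_strings(definitions: list[tuple[str, str]]) -> dict[str, str]:
    """Resolve (name, raw_value) pairs in order, reusing earlier results."""
    macros: dict[str, str] = {}
    for name, raw in definitions:
        macros[name.lower()] = expand(raw, macros)
    return macros


definitions = [
    ("asiacryptname", '"ASIACRYPT"'),
    ("asiacrypt91name", 'asiacryptname # "\'91"'),
]
print(resolve_strings(definitions)["asiacrypt91name"])
```

Here asiacrypt91name resolves to ASIACRYPT'91 because asiacryptname is already known when the second definition is processed.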
