Incorrect headings generated for my heading-torture-test.md file #87

Open
eliminmax opened this issue Mar 19, 2024 · 4 comments

Thank you for making such a great plugin.

I created a markdown file (available here) designed to see how GitHub generates heading IDs in different cases, ranging from common ones (like headings containing non-[a-z] letters such as the German ß, Arabic ا, and Chinese 猫) to weird ones with numbers at the end of headings.

Several of the heading IDs this plugin generates when I run :GenTocGFM in that file differ from the ones generated by GitHub.

Most of the issues had to do with headings with numbers at the end, though the Arabic ا was incorrectly deleted, as was a trailing underscore.
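
For reference, the numbering behaviour implied by the expected IDs noted in the listing can be sketched in Python. This is only an illustration, not GitHub's or the plugin's actual code: `base_slug` and `unique_ids` are hypothetical names, and Python's Unicode-aware `\w` is assumed as a rough stand-in for GitHub's real table of allowed characters. The key idea is that a suffixed ID is itself recorded as used, so a later heading whose own text ends in a number cannot silently reuse it.

```python
import re


def base_slug(heading: str) -> str:
    """Simplified slug: lowercase, drop non-word characters, hyphenate spaces."""
    text = heading.strip().lower()
    # Assumption: \w approximates the characters GitHub keeps (letters such
    # as ß, ا and 猫, digits, underscores). The real rule set is far larger.
    text = re.sub(r"[^\w\- ]", "", text)
    return text.replace(" ", "-")


def unique_ids(headings):
    """Assign IDs in document order, retrying until each one is unused."""
    seen = {}  # ID already handed out -> number of suffixes derived from it
    ids = []
    for heading in headings:
        base = base_slug(heading)
        candidate = base
        while candidate in seen:
            # Bump the counter of the base slug and try the next suffix.
            seen[base] += 1
            candidate = f"{base}-{seen[base]}"
        # Record the final ID too, so "Foo 1" cannot collide with "foo-1".
        seen[candidate] = 0
        ids.append(candidate)
    return ids


if __name__ == "__main__":
    for generated in unique_ids([
        "Ending Number Trickery",
        "Ending Number Trickery",
        "Ending Number Trickery 1",
    ]):
        print(generated)
```

Under these assumptions the sketch reproduces the values marked "should be" in the listing, for example `ending-number-trickery-1-1` for the third heading above.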

Here is what this plugin generates for my test file, with notes where it got it wrong:
<!-- vim-markdown-toc GFM -->

* [test.md](#testmd)
* [Same Level Same Name](#same-level-same-name)
* [Same Level Same Name](#same-level-same-name-1)
* [Different Level Same Name](#different-level-same-name)
  * [Different Level Same Name](#different-level-same-name-1)
* [Same Name Differing Caps](#same-name-differing-caps)
* [SAME NAME DIFFERING CAPS](#same-name-differing-caps-1)
* [same name differing caps](#same-name-differing-caps-2)
* [Same Name(   )different-Non-»letter° chars](#same-name---different-non-letter-chars)
* [Same Name &^$ different Non letter chars](#same-name--different-non-letter-chars)
* [Same Name but One Has Code](#same-name-but-one-has-code)
* [Same Name `but` One `Has Code`](#same-name-but-one-has-code-1)
* [Ending Number Trickery](#ending-number-trickery)
* [Ending Number Trickery](#ending-number-trickery-1)
* [Ending Number Trickery 1](#ending-number-trickery-1) <!-- should be "ending-number-trickery-1-1" -->
* [Ending Number Trickery](#ending-number-trickery-2)
* [Ending Number Trickery 2](#ending-number-trickery-2) <!-- should be "ending-number-trickery-2-1" -->
* [Other Ending Number Trickery 1](#other-ending-number-trickery-1)
* [Other Ending Number Trickery](#other-ending-number-trickery)
* [Other Ending Number Trickery](#other-ending-number-trickery-1) <!-- should be "other-ending-number-trickery-2" -->
* [Final Ending Number Trickery](#final-ending-number-trickery)
* [Final Ending Number Trickery](#final-ending-number-trickery-1)
* [Final Ending Number Trickery 1](#final-ending-number-trickery-1) <!-- should be "final-ending-number-trickery-1-1" -->
* [Final Ending Number Trickery 1 1](#final-ending-number-trickery-1-1) <!-- should be "final-ending-number-trickery-1-1-1" -->
* [Final Ending Number Trickery 1 1](#final-ending-number-trickery-1-1-1) <!-- should be "final-ending-number-trickery-1-1-2" -->
* [Underscored_heading](#underscored_heading)
* [Multiple__underscores](#multiple__underscores)
* [\_Leading_underscore](#_leading_underscore)
* [Trailing_underscore\_](#trailing_underscore) <!-- should be "trailing_underscore_" -->
* [Heading with non-`[a-z]` letters like ß, ا, and 猫](#heading-with-non-a-z-letters-like-ß--and-猫) <!-- should be "heading-with-non-a-z-letters-like-ß-ا-and-猫" -->
* [Heading with a Chinese punctuation mark (specifically '】')](#heading-with-a-chinese-punctuation-mark-specifically-)

<!-- vim-markdown-toc -->
# test.md

## Same Level Same Name

## Same Level Same Name

## Different Level Same Name

### Different Level Same Name

## Same Name Differing Caps

## SAME NAME DIFFERING CAPS

## same name differing caps

##   Same Name(   )different-Non-»letter° chars

## Same Name &^$ different Non letter chars

## Same Name but One Has Code

## Same Name `but` One `Has Code`

## Ending Number Trickery

## Ending Number Trickery

## Ending Number Trickery 1

## Ending Number Trickery

## Ending Number Trickery 2

## Other Ending Number Trickery 1

## Other Ending Number Trickery

## Other Ending Number Trickery

## Final Ending Number Trickery 

## Final Ending Number Trickery 

## Final Ending Number Trickery 1 

## Final Ending Number Trickery 1 1

## Final Ending Number Trickery 1 1

## Underscored_heading

## Multiple__underscores

## \_Leading_underscore

## Trailing_underscore\_

## Heading with non-`[a-z]` letters like ß, ا, and 猫

## Heading with a Chinese punctuation mark (specifically '】')

mzlogin (Owner) commented Mar 19, 2024

Thanks for reporting. I may look at it tomorrow when I get some free time.

And if you can make a PR, feel free to submit it.

mzlogin (Owner) commented Mar 20, 2024

Your test cases are very useful. I'll try to fix the issues this weekend.

eliminmax (Author) commented

Thanks! I was writing an awk script to add heading IDs to the output of cmark-gfm, and I wanted to make sure I handled them correctly. It turns out the regexp needed to match all invalid characters is very complex: in the regex dialect used by GNU's awk implementation, it's nearly 10 thousand characters long. I found a GitHub repository that includes a computer-generated JavaScript regexp matching all invalid characters in heading names. Based on that, I wrote a Python script that generates a series of AWK gsub statements for my script, splitting the regexp into a bunch of smaller patterns. It requires the non-standard \uHH escape sequence added in the latest version of GNU awk, though, so it's not portable across awk versions, let alone to vim. In case my script is still helpful, I've uploaded it as a gist here.
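
The splitting idea described above might look roughly like the following Python sketch. It is not the script from the gist: `awk_escape`, `gsub_statements`, and `SAMPLE_RANGES` are hypothetical names, the sample ranges are illustrative rather than the real list of characters GitHub strips, and it assumes the target gawk accepts `\u` escapes inside bracket expressions.

```python
def awk_escape(cp: int) -> str:
    """Format one code point as a gawk-style Unicode escape."""
    return f"\\u{cp:04X}"


def gsub_statements(ranges, per_statement=2, target="id"):
    """Emit one gsub() call per chunk of ranges so no pattern grows huge."""
    lines = []
    for i in range(0, len(ranges), per_statement):
        chunk = ranges[i:i + per_statement]
        # Build a bracket expression from single code points and lo-hi ranges.
        char_class = "".join(
            awk_escape(lo) if lo == hi else f"{awk_escape(lo)}-{awk_escape(hi)}"
            for lo, hi in chunk
        )
        lines.append(f'gsub(/[{char_class}]/, "", {target})')
    return lines


# Hypothetical sample data: the degree sign, "»", and the CJK brackets
# U+3010..U+3011. This is NOT the full table of stripped characters.
SAMPLE_RANGES = [(0x00B0, 0x00B0), (0x00BB, 0x00BB), (0x3010, 0x3011)]

if __name__ == "__main__":
    for statement in gsub_statements(SAMPLE_RANGES):
        print(statement)
```

Keeping each generated pattern small trades one very long regexp for many short ones, at the cost of running `gsub` several times per heading.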

mzlogin added a commit that referenced this issue Mar 29, 2024

mzlogin (Owner) commented Apr 12, 2024

Please update the plugin to the newest version and try again. It should be able to handle your cases now. 🤝
