Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annotationless exact-printer #9385

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

BurningWitness
Copy link

@BurningWitness BurningWitness commented Oct 31, 2023

Potentially closes #7544.

Skeleton parser

Currently able to parse every single manifest stored on Hackage and to render it back into the exact same representation.

Implementation choices:

  • All whitespace is treated as spaces;

  • Data representation cannot be delayed in either dimension:

  • Curly bracket syntax is treated more leniently than in the lexer (see Meta: Exact-printer Mega-issue #7544 (comment) for examples, all of these are accepted here);

  • Trailing whitespace and newlines are preserved everywhere in the format due to curly bracket positions depending on them.

  • Fields are allowed to have names with spaces in them, due to the fact that certain extremely old packages allowed that (see e.g. build depends at the very end of smartworld-0.0.0.5). Patches are the current fix to this specific issue, field representation has been narrowed down to exclude spaces.

Further extensions

  • Every single currently used field format
  • Simple Layout encoding with automatic consistent offset increments
  • (?) Lexer compatibility: Layout -> [Field Position]

Discarded ideas


@BurningWitness
Copy link
Author

After thinking about it for a while, I do not think helper functions for updating the data structure make sense here. Every use-case has a completely different flow:

  • Parsing is collecting all the top-level fields and sections into categories (description, flags, executables, libraries, tests), then resolving categories in order.

  • Formatting is going depth-first and changing numbers around;

  • Altering build-depends (or any other field) is going breadth-first and finding the first instance of a specific element on each level, quitting early if possible.

On top of this the data structure itself is rather simple, so if someone needs to go from Contents to a Field, they can come up with a specialized function that does what they need.

@jappeace jappeace mentioned this pull request Nov 12, 2023
11 tasks
@BurningWitness
Copy link
Author

BurningWitness commented Jan 16, 2024

Note that while this is written using parsec, it should ideally use binary. The draft uses parsec simply because I didn't know binary was an option when I was writing it; additionally using binary would not be seamless, as it currently does not have getUTF8 and match/unsafeMatchLazyText primitives.


All of my work on this has stalled back in November.

The first step, which is the exact-printer of the format, works and alone qualifies for a PR. Some minor adjustments could be applied, such as converting to binary or parsing a stream of nodes instead of a list.

The second step would be decoupling every single field format, which turned out to be far more complex. Not only is the Field Syntax poorly documented, currently parsing couples format with implementation: for example Version parses version tags, logs a warning and discards the tags (see here). It is impossible to advance on anything here without a broad agreement on everything prior, and it would most probably fit better in a separate PR.

Consider this draft to be on an indefinite hiatus until feedback arrives.

@andreabedini
Copy link
Collaborator

@BurningWitness I just want to thank you for the update and commend you for having a go at this. I had been working on a different approach (that I have so far kept to myself). If I ever put something out, I will ping you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Meta: Exact-printer Mega-issue
2 participants