Default to literal style for multiline strings #822

Draft
wants to merge 1 commit into
base: main

perlpunk
Member

@perlpunk perlpunk commented Aug 10, 2024

Since many people complain that strings with line breaks default to the folded single-quoted style, I opened this PR for discussion.

It will now output strings in literal style if they have line breaks.
Exceptions:

  • trailing spaces
  • special characters
  • the string contains only line breaks (it then uses double quotes and \n)
  • the maximum line length would be greater than the allowed number of columns left
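
The rules above can be read as a single yes/no decision. The following is only a rough sketch of that reading, not the code in this PR: the function name `prefers_literal`, the `columns_left` parameter, and the interpretation of "special characters" as non-printable characters other than a line feed are all assumptions made for illustration.

```python
def prefers_literal(text: str, columns_left: int) -> bool:
    """Hypothetical predicate; NOT the emitter logic from this PR."""
    # The rule only concerns strings that contain line breaks.
    if "\n" not in text:
        return False
    # Exception: the string contains only line breaks.
    if text.strip("\n") == "":
        return False
    lines = text.split("\n")
    # Exception: trailing spaces on any line.
    if any(line.endswith(" ") for line in lines):
        return False
    # Exception: special characters (assumed here: non-printable ones).
    if any(ch != "\n" and not ch.isprintable() for ch in text):
        return False
    # Exception: the longest line does not fit into the remaining columns.
    if max(len(line) for line in lines) > columns_left:
        return False
    return True


for sample in ["def\n", "\nghi", "abc\n ", "\n\n", "\n \n"]:
    print(repr(sample), prefers_literal(sample, columns_left=80))
```

What the emitter falls back to when the answer is no (double quotes or the previous folded single quotes) is deliberately left out of the sketch.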

The test suite passes for me locally.

See the output for some examples before and after:

Code
import yaml

strings = [
"""
1234567890 1234567890 1234567890 1234567890
1234567890 1234567890 1234567890 1234567890 1234567890
""",
"""
1234567890 1234567890 1234567890
1234567890 1234567890 1234567890
1234567890 1234567890 1234567890
""",
]
strings2 = strings.copy()
strings3 = [
"abc\n ",
"def\n",
"\nghi",
"\njkl\n",
"\n\n",
"\n \n",
" \n \n",
]
data=[
[[[[[[[[[[[[[[[[[[[[[strings]]]]]]]]]]]]]]]]]]]]],
strings2, strings3
]

out = yaml.dump(data)
print(out)
Output in current main branch
- - - - - - - - - - - - - - - - - - - - - - - '

                                              1234567890 1234567890 1234567890 1234567890

                                              1234567890 1234567890 1234567890 1234567890
                                              1234567890

                                              '
                                            - '

                                              1234567890 1234567890 1234567890

                                              1234567890 1234567890 1234567890

                                              1234567890 1234567890 1234567890

                                              '
- - '

    1234567890 1234567890 1234567890 1234567890

    1234567890 1234567890 1234567890 1234567890 1234567890

    '
  - '

    1234567890 1234567890 1234567890

    1234567890 1234567890 1234567890

    1234567890 1234567890 1234567890

    '
- - "abc\n "
  - 'def

    '
  - '

    ghi'
  - '

    jkl

    '
  - '


    '
  - "\n \n"
  - " \n \n"
Output in PR
- - - - - - - - - - - - - - - - - - - - - - - '

                                              1234567890 1234567890 1234567890 1234567890

                                              1234567890 1234567890 1234567890 1234567890
                                              1234567890

                                              '
                                            - |2

                                              1234567890 1234567890 1234567890
                                              1234567890 1234567890 1234567890
                                              1234567890 1234567890 1234567890
- - |2

    1234567890 1234567890 1234567890 1234567890
    1234567890 1234567890 1234567890 1234567890 1234567890
  - |2

    1234567890 1234567890 1234567890
    1234567890 1234567890 1234567890
    1234567890 1234567890 1234567890
- - "abc\n "
  - |
    def
  - |2-

    ghi
  - |2

    jkl
  - "\n\n"
  - "\n \n"
  - " \n \n"

For strings with multiple line breaks it now behaves like libyaml.

There are still differences from libyaml though.

Current PR:

- "abc\n "
- |
  def
- |2-

  ghi
- |2

  jkl

libyaml:

- "abc\n "
- "def\n"
- "\nghi"
- "\njkl\n"

So if there is only one line break, libyaml uses double quotes.
If we want that too, I think we would have to add another attribute to the ScalarAnalysis class.

@perlpunk
Member Author

Feedback from @nitzmahone and @ingydotnet: This will break code that relies on the current output, so it can't be the default.
It should wait until we have the new API with better configuration options.

I'm not sure how detailed the emitter configuration should be. For really fine-grained formatting preferences, a formatter should be used.
So I would need a concrete suggestion here on how fine-grained the control over this behaviour should be.
As you can see from the PR, use_literal_for_multiline would not be enough to actually describe the change(s).
