Multiline strings are ugly after dumping #240

Open
neumond opened this issue Dec 27, 2018 · 16 comments
Comments

@neumond

neumond commented Dec 27, 2018

>>> lines = """
... a
... b
... c
... """
>>> lines
'\na\nb\nc\n'
>>> print(yaml.dump({'a': lines}, default_flow_style=False))
a: '

  a

  b

  c

  '

>>> # although roundtrip of dump-load is correct
>>> yaml.load(yaml.dump({'a': lines}))
{'a': '\na\nb\nc\n'}

This could output much shorter

a: |
  a
  b
  c
@ingydotnet
Member

Or shorter still:

a: "a\nb\nc\n"

Most dumpers have to guess how to dump a string. This is one of those cases where it's hard to guess.

I do agree the single quote style is ugly here.

If someone wants to create a PR for scalar style guessing (with lots of tests), we'd be happy to review and consider integrating it. Be aware that the same logic needs to be added to both pyyaml and libyaml.

@ysaakpr

ysaakpr commented Jun 18, 2019

def str_presenter(dumper, data):
    try:
        dlen = len(data.splitlines())
        if (dlen > 1):
            return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    except TypeError as ex:
        return dumper.represent_scalar('tag:yaml.org,2002:str', data)
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

I tried adding this using add_representer, but the results are not always consistent. Some strings get properly block-quoted, but others, even though they are multiline, still come out in the a: "a\nb\nc\n" style.

If anyone can give me a hint on where I have to make the changes, I can put some time into making the PR.

@ysaakpr

ysaakpr commented Oct 29, 2019

My issue was related to #121. It's painful to spend lots of time finding out what the reason is.

@schollii

schollii commented May 4, 2020

Nice. If there is a fundamental issue, we could check a library in another language that has YAML load/dump (like golang) and supports this. Or ruamel-yaml.

@melezhik

melezhik commented Dec 9, 2021

Actually, this issue is the only reason why I can't use pyyaml on my project. For me, changing from | multiline to line1 \n line2 format is not acceptable, as it breaks the readable representation for people using GitOps manifests in our repository, where helm values are passed as YAML multiline strings. I have to use ruamel.yaml now instead, which has other drawbacks for me ... sigh

@ingydotnet
Member

OK I've added this to https://github.com/yaml/pyyaml/projects/9

So we'll look at that for the next release, though I can't say when that will happen.

@schollii

Some suggestions:

Separate the concerns of representation from data; otherwise you will end up with a mess of code that is hard to maintain.

E.g. the | symbol is presentation: it is metadata about the layout of the data. Similarly, references are metadata. And I think most people would consider comments to be data in the context of YAML, but they can also be treated as metadata because they provide info about surrounding data or context.

So at load time the metadata must be loaded and stored, then used at output time.

And as mentioned, there may be third-party open source libs out there, e.g. in Java, Go or JavaScript (or even in Python), that have already solved this problem. This is one of the purposes of open source: sharing knowledge. There is no reason to reinvent the wheel here; use them as inspiration.
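One way to keep presentation separate from data on the dumping side, sketched here under assumptions: mark the strings that should use literal style with a small str subclass and register a representer for that type only. The names LiteralStr, literal_presenter and StyledDumper are hypothetical and not part of PyYAML; only add_representer and represent_scalar are library calls. This does not preserve the style of a loaded document; it only lets the caller choose the style per value.

```python
import yaml


class LiteralStr(str):
    """Hypothetical marker type: a str that asks to be dumped in '|' style."""


def literal_presenter(dumper, data):
    # Emit an ordinary YAML string, but request literal block style for it.
    return dumper.represent_scalar("tag:yaml.org,2002:str", str(data), style="|")


class StyledDumper(yaml.SafeDumper):
    """Private dumper subclass so the global dumpers stay untouched."""


StyledDumper.add_representer(LiteralStr, literal_presenter)

# Only the marked value requests block style; the plain str is left to the dumper.
doc = {"plain": "x\ny\n", "script": LiteralStr("echo one\necho two\n")}
text = yaml.dump(doc, Dumper=StyledDumper)
print(text)
```

The data itself stays a plain string after loading; the marker type only influences how it is written out.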

@melezhik

melezhik commented Dec 17, 2021

@schollii I am not sure you understand the issue. Let me repeat: the pyyaml parser changes the original YAML markup of its own accord when doing a dump. It shouldn't do this, or at least this behavior should be configurable.

Our programs do not depend on the presentation layer; the people who read and edit YAML files do.

@ingydotnet
Member

@melezhik right. Since we are probably adding a better config system in the next release, the rough plan is to add a config option for the preferred format of multiline strings. I also suspect it should be easy to configure this with a custom function that provides the preference.

@jgunstone

found this on StackOverflow as a quick fix for my requirement:

parsed = {'fdir_root': '/mnt/c/engDev/git_mf/ipyrun/tests/examples/line_graph_batch',
 'fpth_config': '/mnt/c/engDev/git_mf/ipyrun/tests/examples/line_graph_batch/config-shell_handler.json',
 'title': '# Plot Straight Lines\n### example RunApp',
 'configs': []}

import yaml
def str_presenter(dumper, data):
    """configures yaml for dumping multiline strings
    Ref: https://stackoverflow.com/questions/8640959/how-can-i-control-what-scalar-form-pyyaml-uses-for-my-data"""
    if len(data.splitlines()) > 1:  # check for multiline string
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

yaml.add_representer(str, str_presenter)
yaml.representer.SafeRepresenter.add_representer(str, str_presenter) # to use with safe_dump

s = yaml.dump(parsed, indent=2)  # , sort_keys=True)
print(s)

>>> configs: []
>>> fdir_root: /mnt/c/engDev/git_mf/ipyrun/tests/examples/line_graph_batch
>>> fpth_config: /mnt/c/engDev/git_mf/ipyrun/tests/examples/line_graph_batch/config-shell_handler.json
>>> title: |-
>>>   # Plot Straight Lines
>>>   ### example RunApp

@ingydotnet
Member

Also #121 (comment)

@cjw296

cjw296 commented Apr 12, 2022

Slight tweak, better handles strings ending in a newline and might be a bit faster:

def str_presenter(dumper, data):
    """configures yaml for dumping multiline strings
    Ref: https://stackoverflow.com/questions/8640959/how-can-i-control-what-scalar-form-pyyaml-uses-for-my-data"""
    if data.count('\n') > 0:  # check for multiline string
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)
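To make the difference between the two checks concrete, here is a small comparison in plain Python (no PyYAML needed). The helper names are made up for illustration. The two checks disagree on a single line that ends in a newline, which is the case the tweak is meant to handle.

```python
def is_multiline_splitlines(text):
    # Check from the earlier snippet: more than one line after splitting.
    return len(text.splitlines()) > 1


def is_multiline_count(text):
    # Check from the tweak: at least one newline character anywhere.
    return text.count("\n") > 0


# Print both results side by side for a few representative strings.
for sample in ["a", "a\n", "a\nb", "a\nb\n"]:
    print(repr(sample), is_multiline_splitlines(sample), is_multiline_count(sample))
```

Only "a\n" gets different answers: splitlines() returns a single element for it, while count('\n') finds the trailing newline.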

@Sir-Fancy

Sir-Fancy commented May 3, 2024

Sorry to necro this, but wanted to save others the headache. This solution only works if you do not have trailing spaces on any of your lines. If there is a trailing space somewhere, you'll see the original behavior of the string getting all messed up and "\n"s everywhere. To prevent this from accidentally occurring, you can strip them with this modification to @cjw296's code:

def str_presenter(dumper, data):
    if data.count('\n') > 0:
        data = "\n".join([line.rstrip() for line in data.splitlines()])  # Remove any trailing spaces, then put it back together again
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

This behavior took way longer than I'd care to admit to track down.
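A minimal way to reproduce the trailing-space behavior described above, as a sketch: it uses the unmodified presenter from the earlier comment and a private dumper subclass (BlockDumper is a name chosen here, not something from PyYAML) so that nothing is registered globally.

```python
import yaml


def str_presenter(dumper, data):
    # Same check as the earlier tweak: request '|' for anything with a newline.
    if data.count("\n") > 0:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)


class BlockDumper(yaml.SafeDumper):
    """Private dumper subclass so the global dumpers stay untouched."""


BlockDumper.add_representer(str, str_presenter)

clean = yaml.dump({"k": "a\nb\n"}, Dumper=BlockDumper)
trailing = yaml.dump({"k": "a \nb\n"}, Dumper=BlockDumper)  # space after "a"
print(clean)
print(trailing)
```

The first dump should come out as a block scalar, while the second should fall back to a quoted scalar with escaped newlines, even though '|' was requested for both. Both still round-trip to the original strings.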

@manugarri

Is there a plan to add this to the library, maybe as a style? It's a very common use case, and it's a shame we have to resort to these hacks to get proper YAML.

@nirs

nirs commented Aug 10, 2024

        data = "\n".join([line.rstrip() for line in data.splitlines()])  # Remove any trailing spaces, then put it back together again
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')

This converts "|" blocks to "|-". To preserve the style we can use:

        # Remove any trailing spaces messing up the output.
        block = "\n".join([line.rstrip() for line in data.splitlines()])
        if data.endswith("\n"):
            block += "\n"
        return dumper.represent_scalar("tag:yaml.org,2002:str", block, style="|")
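Putting the pieces from this thread together, a self-contained version might look like the sketch below. BlockDumper is a name chosen here for a private dumper subclass; the presenter body follows the snippets above. Note that stripping trailing whitespace changes the data, so the result no longer round-trips exactly for strings that had it.

```python
import yaml


def str_presenter(dumper, data):
    if "\n" in data:
        # Strip trailing whitespace from each line (this alters the data).
        block = "\n".join(line.rstrip() for line in data.splitlines())
        # splitlines() drops the final newline; restore it so "|" is not
        # turned into "|-".
        if data.endswith("\n"):
            block += "\n"
        return dumper.represent_scalar("tag:yaml.org,2002:str", block, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)


class BlockDumper(yaml.SafeDumper):
    """Private dumper subclass so the global dumpers stay untouched."""


BlockDumper.add_representer(str, str_presenter)

keep = yaml.dump({"k": "a \nb\n"}, Dumper=BlockDumper)  # ends with a newline
strip = yaml.dump({"k": "a \nb"}, Dumper=BlockDumper)   # no final newline
print(keep)
print(strip)
```

Pass Dumper=BlockDumper wherever block style is wanted; plain yaml.dump and yaml.safe_dump calls elsewhere keep their default behavior.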

@perlpunk
Member

Here's a PR for discussion:
