Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make 'Terminal.renderLazy' lazy #176

Merged
merged 5 commits into from
Jul 21, 2020
Merged

Make 'Terminal.renderLazy' lazy #176

merged 5 commits into from
Jul 21, 2020

Conversation

georgefst
Copy link
Contributor

Fixes #175.

This is the non-monadic implementation, as requested by @sjakobi, which means a bit of extra parameter passing, but no big difference.

Any suggestions on how to benchmark this?

@sjakobi
Copy link
Collaborator

sjakobi commented Jul 1, 2020

Any suggestions on how to benchmark this?

Good question! I guess a somewhat lasting solution would be to port the large-output benchmark from prettyprinter and add some annotations. No need to include all the different layouters though. Just using the fastest one should be enough. That's probably layoutCompact.

Does that sound reasonable?

@georgefst
Copy link
Contributor Author

Does that sound reasonable?

Certainly does. I probably won't get around to it until next week though.

@georgefst
Copy link
Contributor Author

Well... We actually seem to be looking at a more than 2x speedup.

Although, the benchmark data doesn't currently actually contain any annotations. I may add some and see what happens.

@sjakobi
Copy link
Collaborator

sjakobi commented Jul 12, 2020

Well... We actually seem to be looking at a more than 2x speedup.

Wow! Any idea why?

Although, the benchmark data doesn't currently actually contain any annotations. I may add some and see what happens.

Yes, please do!

@georgefst
Copy link
Contributor Author

Wow! Any idea why?

Not really. All I'd say is that the old implementation is in ST, seemingly in order to mirror the implementation of renderIO, which maybe introduces a significant overhead. Not sure as I've never used it, but it's certainly not the obvious approach.

@georgefst
Copy link
Contributor Author

I've played around with this a bit, and adding annotations only amplifies the difference. As does using renderStrict, for some reason, even though it's still just toStrict . renderLazy.

The question now is which benchmarks are worth keeping around. We've got a lot of choices:

  • renderStrict, renderLazy
  • comparison with ansi-wl-pprint
  • light annotations, heavy annotations, none

And using annotations also means using an algorithm other than layoutCompact - I'm not sure which would be best. (Side note - it's really not obvious from the docs that layoutCompact strips annotations.)

@sjakobi
Copy link
Collaborator

sjakobi commented Jul 13, 2020

I've played around with this a bit, and adding annotations only amplifies the difference. As does using renderStrict, for some reason, even though it's still just toStrict . renderLazy.

Interesting. Have you looked at the Core to get more insight into the reason for the performance difference?

The question now is which benchmarks are worth keeping around. We've got a lot of choices:

  • renderStrict, renderLazy

renderLazy should be sufficient IMHO

  • comparison with ansi-wl-pprint

I don't care too much about that, although it's a nice baseline.

It might be nice to include prettyprinter's renderLazy as another baseline though.

  • light annotations, heavy annotations, none

Heavy annotations seem most interesting to me.

And using annotations also means using an algorithm other than layoutCompact - I'm not sure which would be best. (Side note - it's really not obvious from the docs that layoutCompact strips annotations.)

Oops! I didn't know that. Let's use layoutPretty then.

@georgefst
Copy link
Contributor Author

I've updated the benchmark with layoutPretty defaultLayoutOptions, and lots of annotations, using Text.renderLazy as the baseline.

Have you looked at the Core to get more insight into the reason for the performance difference?

Nope. I don't have any experience with reading Core, and personally I'm happy to just defer to the magical black box that is GHC on this one.

@sjakobi
Copy link
Collaborator

sjakobi commented Jul 13, 2020

Nice work, @georgefst! Could you paste the benchmark results with the old and new implementations here?

Regarding correctness: I'm unsure how much we can rely on the few doctests in prettyprinter-ansi-terminal. Have you tested this code in other contexts? Would the pretty-simple tests be robust enough to detect any issues?

@georgefst
Copy link
Contributor Author

@sjakobi Well, there's:

  • the tests here
  • the tests in pretty-simple
  • all the examples I used while benchmarking
  • the fact that it was adapted quite directly from the old implementation anyway

So I'm reasonably confident.

@georgefst
Copy link
Contributor Author

Old implementation:

benchmarking prettyprinter-ansi-terminal ... took 151.4 s, total 56 iterations
benchmarked prettyprinter-ansi-terminal
time                 2.827 s    (2.543 s .. 3.045 s)
                     0.988 R²   (0.972 R² .. 0.998 R²)
mean                 2.663 s    (2.499 s .. 2.790 s)
std dev              281.3 ms   (205.9 ms .. 378.3 ms)
variance introduced by outliers: 29% (moderately inflated)

New version:

benchmarking prettyprinter-ansi-terminal ... took 65.21 s, total 56 iterations
benchmarked prettyprinter-ansi-terminal
time                 1.242 s    (1.028 s .. 1.558 s)
                     0.941 R²   (0.892 R² .. 0.991 R²)
mean                 1.202 s    (1.112 s .. 1.326 s)
std dev              182.8 ms   (114.6 ms .. 268.5 ms)
variance introduced by outliers: 48% (moderately inflated)

Copy link
Collaborator

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've stared at the Core a bit, but the differences are too large for a visual diff to be useful. Profiling might get us better insights into the performance differences, but I'm fine with just accepting the speedups. :)

Comment on lines 145 to 150
renderedProg = (renderLazy . layoutPretty defaultLayoutOptions { layoutPageWidth = Unbounded } . prettyProgram) prog
(progLines, progWidth) = let l = TL.lines renderedProg in (length l, maximum (map TL.length l))
putDoc ("Program size:" <+> pretty progLines <+> "lines, maximum width:" <+> pretty progWidth)

let render :: (SimpleDocStream AnsiStyle -> TL.Text) -> Program -> TL.Text
render r = r . layoutPretty defaultLayoutOptions . prettyProgram
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's better use the Unbounded PageWidth – that should be much cheaper.

Copy link
Collaborator

@sjakobi sjakobi Jul 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why benchmark the layouting? Better limit it to the rendering, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why benchmark the layouting? Better limit it to the rendering, no?

I think we'd an instance NFData (SimpleDocStream AnsiStyle), which we can't auto-derive since we can't provide an orphan Generic for AnsiStyle, as it's opaque.

Anyway, it's done this way in the prettyprinter benchmark this was based on, perhaps for the same reason.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought NFData was only needed for the output, which is Text?!

Adding NFData and/or Generic instances sounds like a good idea though. Feel free to make an issue!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I'm misunderstanding how it works, but I thought that to accurately benchmark the rendering, we'd first have to force the input to that stage, which would be SimpleDocStream.

@georgefst
Copy link
Contributor Author

@sjakobi The review comments I haven't replied to all relate to things that were copy-pasted from the benchmark from the core library. I'm happy to implement all your suggestions, unless you can see any reason not to.

Presumably any backwards compatibility issues would be caught by CI?

@georgefst
Copy link
Contributor Author

Presumably any backwards compatibility issues would be caught by CI?

By which I mean, specifically, compiling with old GHCs.

@sjakobi
Copy link
Collaborator

sjakobi commented Jul 14, 2020

Yeah, we build the benchmarks in a few CI jobs, and I don't really care about the compatibility of the benchmarks with older GHC versions. So if you could clean up these wibbles a bit, that would be appreciated.


@quchen Modulo the mentioned wibbles, this looks ready to go to me. Would you like to review, too?

@quchen
Copy link
Owner

quchen commented Jul 20, 2020

LGTM. The new code is much prettier and easier to understand than the ST-based version as well, and if the benchmark shows an improvement it’s a good change.

@sjakobi
Copy link
Collaborator

sjakobi commented Jul 20, 2020

What's the status now, @georgefst? Is this ready?! :)

Have the benchmark results changed?

@sjakobi
Copy link
Collaborator

sjakobi commented Jul 20, 2020

Oh, BTW, a question for a suspected native English speaker: Is "wibble" a proper word and appropriate where I used it here? I think I saw SPJ use it but I'm actually not sure whether I'm using it right. :)

@georgefst
Copy link
Contributor Author

What's the status now, @georgefst? Is this ready?! :)

Yes it's ready, unless we care about the one remaining review comment. I'd be happy to leave it as-is, seeing as it's no worse than the existing benchmark in the core library.

Have the benchmark results changed?

Slight speedup with -O2 - around 20% for the new implementation, about 5% with the old one.

@georgefst
Copy link
Contributor Author

georgefst commented Jul 20, 2020

wibble

I can't find any actual definition of it, but somehow I instantly knew what you meant. And I can't think of a good alternative, although "quibble" and "foible" are somewhat related. English is a silly language, and I've noticed SPJ is a fairly creative user of it.

Anyway, it amused me because it reminded me of a scene from a classic TV show, which is, as it happens, what almost all references to "wibble" on the web seem to refer to.

@sjakobi sjakobi merged commit 96ba52a into quchen:master Jul 21, 2020
@sjakobi
Copy link
Collaborator

sjakobi commented Jul 21, 2020

Thanks for all the work, @georgefst, and also the explanation for "wibble"! :)

@sjakobi
Copy link
Collaborator

sjakobi commented Jul 21, 2020

FYI regarding the release: #181

@georgefst georgefst deleted the renderLazy branch July 21, 2020 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Terminal.renderLazy isn't lazy
3 participants