Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of the diff algorithm #92

Open
guibou opened this issue Sep 5, 2024 · 8 comments
Open

Performance of the diff algorithm #92

guibou opened this issue Sep 5, 2024 · 8 comments

Comments

@guibou
Copy link
Contributor

guibou commented Sep 5, 2024

In multiple cases, we observe that sydtest is hanging for a long time (multiples minutes) when the diff (generated by a failing shouldBe or golden test) is large.

The following example:

import Test.Syd
import Control.Monad
import System.Random

n = 10

main = sydTest $ do
  it "test" $ do
    numbers <- replicateM n (randomIO @Double)
    numbers' <- replicateM n (randomIO @Double)
    numbers `shouldBe` numbers'

Build with ghc -O2 Main.hs, it fails and the runtime of the test suite (mostly spent in the diff computation) varies with n:

  • n = 10: 13ms
  • n = 100: 203ms (15x with n = 10)
  • n = 1000: 29s (142x with n = 100)
  • n = 10000: (killed after 10 minutes).

Note that the diff algorithm is generating an impressive number of added/removed chunks, see:

image

In our codebase, we mitigated the issue by pre simplifying the diff. Mostly for golden tests, all of our golden tests are done on json output, so we have a "post process" in the goldenTestCompare logic which recursively replaces "big diff" nodes in the json tree by a smaller diff (e.g. for a json array of numbers, we replace the array by a string containing some statistical values about the array). However, it does not always work and users are able to break tests in ways which lead to sydtest hanging. Our CI is also hanging a lot.

A few possible action points:

a) Add a timeout on the diff algorithm in sydtest. If the diff takes more than (e.g. 100ms), just stop computing the diff and display a message. User can still do --golden-reset and use local (and possibly more efficient) diff tools. This won't fix the shouldBe unfortunately.
b) Improve the diff algorithm. Most of the time, git diff is able to diff these files in a matter of millisec. Note that having the current diff algorithm takes great care of generating the "smallest" diff possible and this usually leads to a lot of diff chunks and I suspect that this is the reason why it is slow. Unfortunately, having so much diff chunks makes also the diff someone not readable. For example, for me, it is more readable to have something like common prefix {- 1.3456 -} {+ 2.3457 +} common suffix instead of common prefix {- 1 -} {+ 2 +}.345 {- 6 -} {+ 7 +} common suffix
c) Could the diff algorithm be parametrized at runtime in sydtest configuration record?

@NorfairKing
Copy link
Owner

NorfairKing commented Sep 5, 2024

@guibou It's a bit funny that you should run into this problem because I already implemented a faster diffing algorithm specifically for sydtest: fast-myers-diff.
It looks like that wasn't fast enough yet.

  1. I like the timeout idea. We could do that irrespective of what else we do.
  2. I'm open to improvements to the diffing algorithm. I also own that code so it'll be easy to get changes merged but it's the ugliest code I've ever written so good luck.
  3. I'm happy to support other diffing algorithms too, but I'm not sure how they would work differently. They'd have to work on something other than bare Strings I guess?

EDIT: A good starting point may be to check if hspecs diffing is faster on this input and why.

@guibou
Copy link
Contributor Author

guibou commented Sep 6, 2024

Using:

import Test.Hspec
import Control.Monad
import System.Random

n = 1000

main = hspec $ do
  it "test" $ do
    numbers <- replicateM n (randomIO @Double)
    numbers' <- replicateM n (randomIO @Double)
    numbers `shouldBe` numbers'
  • n = 10: 12ms (same as sydtest, not representative)
  • n = 100: 23ms (10x faster than sydtest)
  • n = 1000: 0.5s (50x faster than sydtest)
  • n = 10000: (2s, which is faster than "killed after 10 minutes")

What I can see, with n = 10:

image

Seems that there is way less "diff chunks". Each float number is considered as a unique diff chunk, just keeping the 0. as common prefix. This behavior appears also for n = 100, 1000. for n = 10000 it is funny because colouring seems to be disabled:

image

Their diff algo is implemented here: https://github.com/hspec/hspec/blob/9db7aacb85443fefdbcb078e09df3b7bbf380d2c/hspec-core/src/Test/Hspec/Core/Formatters/Diff.hs

And actually, I just found that: hspec/hspec#415, actually, hspec had the same problem with the Diff package, tried to optimise it, then the finally implemented a timeout and the option to use external diff command.

So to summerize:

  1. let's implement the timeout (I'll take care of that whenever I find some bandwith)
  2. let's try to find if reducing the size of chunks in the diff algorithm helps. Maybe we can also just steal the Diff code from hspec (however, I have doubts about the licences compatibilities?
  3. I suppose that an external diff setting working on ByteString (e.g. BS -> BS -> IO DiffStructure or BS -> BS -> IO BS if we don't want to post process the result) could be easy to integrate in sydtest haskell configuration. on hspec they just use a command line with placeholder for files to diff. If timeout is implemented, I don't see reason to bother about an external algo actually.

@guibou
Copy link
Contributor Author

guibou commented Sep 6, 2024

Now that I'm thinking a bit more, maybe the diff algo is not at fault (you already implemented something as supposed faster than the original diff package). Maybe the issue is with the formatter of the diff chunks. Need a benchmark before starting to work on the algo.

@guibou
Copy link
Contributor Author

guibou commented Sep 6, 2024

Fast and dirt profiling (using -O2 -p on the top level program here with n = 1000) gives:

COST CENTRE     MODULE     SRC                                 %time %alloc

getEditScriptBy Myers.Diff src/Myers/Diff.hs:(148,1)-(353,34)   99.8   99.8

guibou added a commit to guibou/sydtest that referenced this issue Sep 6, 2024
But this partially fix NorfairKing#92
by introducing a timeout of 2s on the diff computatation.

The 2s is hardcoded yada yada
@NorfairKing
Copy link
Owner

@guibou I know very little about performance but could it be that getEditScriptBy could be faster if it were inlined?

guibou added a commit to guibou/sydtest that referenced this issue Sep 8, 2024
Ensures that the diff computed when `shouldBe` test are failing is not
taking more than 2s, otherwise do not display a diff.

But this partially fix NorfairKing#92
in which we observed that diff can take an arbitrary amount of time, up
to a few minutes or hours.

This implementation uses `unsafePerformIO` for simplicity, but could be
fixed in a future commit.
@guibou
Copy link
Contributor Author

guibou commented Sep 9, 2024

@NorfairKing I had not looked at getEditScriptBy until now, and yes, it could be faster if inlined. However the use sites are in the same module (in our context, we call getTextDiff which calls getEditScriptBy which is in the same module, so chances are that GHC is already inlining it.

However, now that I'm having a look at getEditScriptBy, there may be some trivial optimisations:

  • It uses MVector of Int, using unboxed vector could dramatically improve the performances (because right now, each mutation operation requires allocating / boxing then later unboxing an Int.
  • There are a few utilities functions, such as forUntilJust and whileM for which it MUST be checked that they are correctly inlined (otherwise you pay a heavy cost for >>= and pure.
  • forUntilJust is not guaranteed to not allocate, I would highly suggest to rewrite it as a function iterating over the int.
  • There are a lot of STRef and this thing is actually a nightmare. Most of the time you do not want to use STRef (in short, you want to use them in concurrent algorithm, where you really need a ref which can be shared between multiples threads), but otherwise, you want to use recursion, because STRef s Int cannot be optimised away (Well, that's complicated) and instead of optimising your code to simple unboxed integer, you need to follow the STRef pointer, read the boxed integer, unbox it, do some operation with it, and reallocate a new boxed pointer.
  • Similar observation happen with computeWhileCond which uses STRef for the Double

So actually there are pletty of room for performance improvments. This being said, I still think the timeout is important and both tasks can be tackled separately.

@NorfairKing
Copy link
Owner

@NorfairKing I had not looked at getEditScriptBy until now, and yes, it could be faster if inlined. However the use sites are in the same module (in our context, we call getTextDiff which calls getEditScriptBy which is in the same module, so chances are that GHC is already inlining it.

Alright thanks for checking!

However, now that I'm having a look at getEditScriptBy, there may be some trivial optimisations:

* It uses `MVector` of `Int`, using unboxed vector could dramatically improve the performances (because right now, each mutation operation requires allocating / boxing then later unboxing an `Int`.

* There are a few utilities functions, such as `forUntilJust` and `whileM` for which it MUST be checked that they are correctly inlined (otherwise you pay a heavy cost for `>>=` and `pure`.

* `forUntilJust` is not guaranteed to not allocate, I would highly suggest to rewrite it as a function iterating over the int.

* There are a lot of `STRef` and this thing is actually a nightmare. Most of the time you do not want to use `STRef` (in short, you want to use them in concurrent algorithm, where you really need a ref which can be shared between multiples threads), but otherwise, you want to use recursion, because `STRef s Int` cannot be optimised away (Well, that's complicated) and instead of optimising your code to simple unboxed integer, you need to follow the STRef pointer, read the boxed integer, unbox it, do some operation with it, and reallocate a new boxed pointer.

* Similar observation happen with `computeWhileCond` which uses `STRef` for the `Double`

I just literally translated the algorithm for Python.
I'm not surprised (but still impressed) that you've found room for improvement.

So actually there are pletty of room for performance improvments. This being said, I still think the timeout is important and both tasks can be tackled separately.

Hard agree!

@guibou
Copy link
Contributor Author

guibou commented Sep 9, 2024

I'm not surprised (but still impressed) that you've found room for improvement.

I have a large mouth, as usual with performance, only the benchmark can talk. I've tried to replace the boxed vector by unboxed ones and the different STRef.

I've got a 2x improvment, which is great, but not perfect. I need to profile correctly to get something more valuable, but I need to add a few SCC in the code. I'll do that later.

guibou added a commit to guibou/sydtest that referenced this issue Sep 21, 2024
Ensures that the diff computed when `shouldBe` test are failing is not
taking more than 2s, otherwise do not display a diff.

This fix NorfairKing#92
in which we observed that diff can take an arbitrary amount of time, up
to a few minutes or hours.

This implementations changes the API of a few internal functions (see
the `Changelog` entries), but no change on the "common" API.

- Diff time is taken into account in the time of the test. Note that the
  diff can be computed multiples time during flaky detection.
- This also fix an issue discussed in
  NorfairKing#80, if the values being
  diffed contains lazy exception, it won't crash the test suite anymore,
  but instead, the exception will be raised in the test context.
guibou added a commit to guibou/sydtest that referenced this issue Sep 21, 2024
Ensures that the diff computed when `shouldBe` test are failing is not
taking more than 2s, otherwise do not display a diff.

This fix NorfairKing#92
in which we observed that diff can take an arbitrary amount of time, up
to a few minutes or hours.

This implementations changes the API of a few internal functions (see
the `Changelog` entries), but no change on the "common" API.

- Diff time is taken into account in the time of the test. Note that the
  diff can be computed multiples time during flaky detection.
- This also fix an issue discussed in
  NorfairKing#80, if the values being
  diffed contains lazy exception, it won't crash the test suite anymore,
  but instead, the exception will be raised in the test context.
guibou added a commit to guibou/sydtest that referenced this issue Sep 21, 2024
Ensures that the diff computed when `shouldBe` test are failing is not
taking more than 2s, otherwise do not display a diff.

This fix NorfairKing#92
in which we observed that diff can take an arbitrary amount of time, up
to a few minutes or hours.

This implementations changes the API of a few internal functions (see
the `Changelog` entries), but no change on the "common" API.

- Diff time is taken into account in the time of the test. Note that the
  diff can be computed multiples time during flaky detection.
- This also fix an issue discussed in
  NorfairKing#80, if the values being
  diffed contains lazy exception, it won't crash the test suite anymore,
  but instead, the exception will be raised in the test context.
guibou added a commit to guibou/sydtest that referenced this issue Sep 21, 2024
Ensures that the diff computed when `shouldBe` test are failing is not
taking more than 2s, otherwise do not display a diff.

This fix NorfairKing#92
in which we observed that diff can take an arbitrary amount of time, up
to a few minutes or hours.

This implementations changes the API of a few internal functions (see
the `Changelog` entries), but no change on the "common" API.

- Diff time is taken into account in the time of the test. Note that the
  diff can be computed multiples time during flaky detection.
- This also fix an issue discussed in
  NorfairKing#80, if the values being
  diffed contains lazy exception, it won't crash the test suite anymore,
  but instead, the exception will be raised in the test context.
NorfairKing pushed a commit that referenced this issue Sep 26, 2024
Ensures that the diff computed when `shouldBe` test are failing is not
taking more than 2s, otherwise do not display a diff.

This fix #92
in which we observed that diff can take an arbitrary amount of time, up
to a few minutes or hours.

This implementations changes the API of a few internal functions (see
the `Changelog` entries), but no change on the "common" API.

- Diff time is taken into account in the time of the test. Note that the
  diff can be computed multiples time during flaky detection.
- This also fix an issue discussed in
  #80, if the values being
  diffed contains lazy exception, it won't crash the test suite anymore,
  but instead, the exception will be raised in the test context.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants