-
-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance of the diff algorithm #92
Comments
@guibou It's a bit funny that you should run into this problem because I already implemented a faster diffing algorithm specifically for sydtest:
EDIT: A good starting point may be to check if |
Using:
What I can see, with Seems that there is way less "diff chunks". Each float number is considered as a unique diff chunk, just keeping the Their diff algo is implemented here: https://github.com/hspec/hspec/blob/9db7aacb85443fefdbcb078e09df3b7bbf380d2c/hspec-core/src/Test/Hspec/Core/Formatters/Diff.hs And actually, I just found that: hspec/hspec#415, actually, hspec had the same problem with the Diff package, tried to optimise it, then the finally implemented a timeout and the option to use external diff command. So to summerize:
|
Now that I'm thinking a bit more, maybe the diff algo is not at fault (you already implemented something as supposed faster than the original |
Fast and dirt profiling (using
|
But this partially fix NorfairKing#92 by introducing a timeout of 2s on the diff computatation. The 2s is hardcoded yada yada
@guibou I know very little about performance but could it be that |
Ensures that the diff computed when `shouldBe` test are failing is not taking more than 2s, otherwise do not display a diff. But this partially fix NorfairKing#92 in which we observed that diff can take an arbitrary amount of time, up to a few minutes or hours. This implementation uses `unsafePerformIO` for simplicity, but could be fixed in a future commit.
@NorfairKing I had not looked at However, now that I'm having a look at
So actually there are pletty of room for performance improvments. This being said, I still think the timeout is important and both tasks can be tackled separately. |
Alright thanks for checking!
I just literally translated the algorithm for Python.
Hard agree! |
I have a large mouth, as usual with performance, only the benchmark can talk. I've tried to replace the boxed vector by unboxed ones and the different STRef. I've got a 2x improvment, which is great, but not perfect. I need to profile correctly to get something more valuable, but I need to add a few SCC in the code. I'll do that later. |
Ensures that the diff computed when `shouldBe` test are failing is not taking more than 2s, otherwise do not display a diff. This fix NorfairKing#92 in which we observed that diff can take an arbitrary amount of time, up to a few minutes or hours. This implementations changes the API of a few internal functions (see the `Changelog` entries), but no change on the "common" API. - Diff time is taken into account in the time of the test. Note that the diff can be computed multiples time during flaky detection. - This also fix an issue discussed in NorfairKing#80, if the values being diffed contains lazy exception, it won't crash the test suite anymore, but instead, the exception will be raised in the test context.
Ensures that the diff computed when `shouldBe` test are failing is not taking more than 2s, otherwise do not display a diff. This fix NorfairKing#92 in which we observed that diff can take an arbitrary amount of time, up to a few minutes or hours. This implementations changes the API of a few internal functions (see the `Changelog` entries), but no change on the "common" API. - Diff time is taken into account in the time of the test. Note that the diff can be computed multiples time during flaky detection. - This also fix an issue discussed in NorfairKing#80, if the values being diffed contains lazy exception, it won't crash the test suite anymore, but instead, the exception will be raised in the test context.
Ensures that the diff computed when `shouldBe` test are failing is not taking more than 2s, otherwise do not display a diff. This fix NorfairKing#92 in which we observed that diff can take an arbitrary amount of time, up to a few minutes or hours. This implementations changes the API of a few internal functions (see the `Changelog` entries), but no change on the "common" API. - Diff time is taken into account in the time of the test. Note that the diff can be computed multiples time during flaky detection. - This also fix an issue discussed in NorfairKing#80, if the values being diffed contains lazy exception, it won't crash the test suite anymore, but instead, the exception will be raised in the test context.
Ensures that the diff computed when `shouldBe` test are failing is not taking more than 2s, otherwise do not display a diff. This fix NorfairKing#92 in which we observed that diff can take an arbitrary amount of time, up to a few minutes or hours. This implementations changes the API of a few internal functions (see the `Changelog` entries), but no change on the "common" API. - Diff time is taken into account in the time of the test. Note that the diff can be computed multiples time during flaky detection. - This also fix an issue discussed in NorfairKing#80, if the values being diffed contains lazy exception, it won't crash the test suite anymore, but instead, the exception will be raised in the test context.
Ensures that the diff computed when `shouldBe` test are failing is not taking more than 2s, otherwise do not display a diff. This fix #92 in which we observed that diff can take an arbitrary amount of time, up to a few minutes or hours. This implementations changes the API of a few internal functions (see the `Changelog` entries), but no change on the "common" API. - Diff time is taken into account in the time of the test. Note that the diff can be computed multiples time during flaky detection. - This also fix an issue discussed in #80, if the values being diffed contains lazy exception, it won't crash the test suite anymore, but instead, the exception will be raised in the test context.
In multiple cases, we observe that sydtest is hanging for a long time (multiples minutes) when the diff (generated by a failing
shouldBe
or golden test) is large.The following example:
Build with
ghc -O2 Main.hs
, it fails and the runtime of the test suite (mostly spent in the diff computation) varies withn
:n = 10
:13ms
n = 100
:203ms
(15x withn = 10
)n = 1000
:29s
(142x withn = 100
)n = 10000
: (killed after 10 minutes).Note that the diff algorithm is generating an impressive number of added/removed chunks, see:
In our codebase, we mitigated the issue by pre simplifying the diff. Mostly for golden tests, all of our golden tests are done on json output, so we have a "post process" in the
goldenTestCompare
logic which recursively replaces "big diff" nodes in the json tree by a smaller diff (e.g. for a json array of numbers, we replace the array by a string containing some statistical values about the array). However, it does not always work and users are able to break tests in ways which lead to sydtest hanging. Our CI is also hanging a lot.A few possible action points:
a) Add a timeout on the diff algorithm in sydtest. If the diff takes more than (e.g. 100ms), just stop computing the diff and display a message. User can still do
--golden-reset
and use local (and possibly more efficient) diff tools. This won't fix theshouldBe
unfortunately.b) Improve the diff algorithm. Most of the time,
git diff
is able to diff these files in a matter of millisec. Note that having the current diff algorithm takes great care of generating the "smallest" diff possible and this usually leads to a lot of diff chunks and I suspect that this is the reason why it is slow. Unfortunately, having so much diff chunks makes also the diff someone not readable. For example, for me, it is more readable to have something likecommon prefix {- 1.3456 -} {+ 2.3457 +} common suffix
instead ofcommon prefix {- 1 -} {+ 2 +}.345 {- 6 -} {+ 7 +} common suffix
c) Could the diff algorithm be parametrized at runtime in sydtest configuration record?
The text was updated successfully, but these errors were encountered: