Sydtest exception handling: crash / lock / ... #80

guibou · 2024-07-12T14:44:30Z

The following test suite:

  describe "behavior with exceptions" $ do
      beforeAll (fail "error in beforeAll") $ do
        it "should not kill the test suit" $ do
          1 `shouldBe` 1

Will crash the complete test suite:

sydtest-test: user error (error in beforeAll)

Replace beforeAll by before and you'll get:

Tests:

Test.Syd.AroundSpec
  behavior with exceptions
    before_
      ✗ should not kill the test suit                                      0.04 ms
        Retries: 3 (does not look flaky)
 
Failures:

    test/Test/Syd/AroundSpec.hs:139
  ✗ 1 Test.Syd.AroundSpec.behavior with exceptions.before_.should not kill the test suit
      Retries: 3 (does not look flaky)
      user error (error in beforeAll)
  
 
  Examples:                     3
  Passed:                       0
  Failed:                       1
  Sum of test runtimes:         0.00 seconds
  Test suite took:              0.00 seconds

Using the following changes:

diff --git a/sydtest/src/Test/Syd/Runner/Asynchronous.hs b/sydtest/src/Test/Syd/Runner/Asynchronous.hs
index 413c354..4a9c436 100644
--- a/sydtest/src/Test/Syd/Runner/Asynchronous.hs
+++ b/sydtest/src/Test/Syd/Runner/Asynchronous.hs
@@ -37,6 +37,7 @@ import Test.Syd.Runner.Single
 import Test.Syd.SpecDef
 import Test.Syd.SpecForest
 import Text.Colour
+import Debug.Trace (traceShow)
 
 runSpecForestAsynchronously :: Settings -> Word -> TestForest '[] () -> IO ResultForest
 runSpecForestAsynchronously settings nbThreads testForest = do
@@ -236,15 +237,21 @@ runner settings nbThreads failFastVar handleForest = do
           DefSetupNode func sdf -> do
             liftIO func
             goForest sdf
-          DefBeforeAllNode func sdf -> do
-            b <- liftIO func
+          DefBeforeAllNode func sdf -> traceShow "Asynchronous" $ do
+            bM <- liftIO $ try $ func
+            let b = case bM of 
+                       Right b -> b
+                       Left (e :: SomeException) -> error (show e)
             withReaderT
               (\e -> e {eExternalResources = HCons b (eExternalResources e)})
               (goForest sdf)
           DefBeforeAllWithNode func sdf -> do
             e <- ask
             let HCons x _ = eExternalResources e
-            b <- liftIO $ func x
+            bM <- liftIO $ try $ func x
+            let b = case bM of 
+                       Right b -> b
+                       Left (e :: SomeException) -> error (show e)
             liftIO $
               runReaderT
                 (goForest sdf)

The following test suites now pass:

  describe "behavior with exceptions" $ do
    describe "beforeAll" $ do
      beforeAll (fail "error in beforeAll") $ do
        it "should not kill the test suit" $ do
          1 `shouldBe` 1
      beforeAll (fail "error in beforeAll") $ do
        itWithOuter "should not kill the test suit" $ \outer ->
          outer `shouldBe` 1
        beforeAllWith (\x -> fail "error in beforeAllWith") $ do
          itWithOuter "should not kill the test suit" $ \outer ->
             outer `shouldBe` 1

With the following result:

Test.Syd.AroundSpec
  behavior with exceptions
    beforeAll
      ✓ should not kill the test suit                                      0.00 ms
      ✗ should not kill the test suit                                      0.02 ms
        Retries: 3 (does not look flaky)
      ✗ should not kill the test suit                                      0.00 ms
        Retries: 3 (does not look flaky)
 
Failures:

    test/Test/Syd/AroundSpec.hs:142
  ✗ 1 Test.Syd.AroundSpec.behavior with exceptions.beforeAll.should not kill the test suit
      Retries: 3 (does not look flaky)
      user error (error in beforeAll)
      CallStack (from HasCallStack):
        error, called at src/Test/Syd/Runner/Asynchronous.hs:244:53 in sydtest-0.15.1.1-inplace:Test.Syd.Runner.Asynchronous
  
    test/Test/Syd/AroundSpec.hs:145
  ✗ 2 Test.Syd.AroundSpec.behavior with exceptions.beforeAll.should not kill the test suit
      Retries: 3 (does not look flaky)
      user error (error in beforeAllWith)
      CallStack (from HasCallStack):
        error, called at src/Test/Syd/Runner/Asynchronous.hs:254:53 in sydtest-0.15.1.1-inplace:Test.Syd.Runner.Asynchronous
  
 
  Examples:                     7
  Passed:                       1
  Failed:                       2
  Sum of test runtimes:         0.00 seconds
  Test suite took:              0.00 seconds

A few notes:

My understanding of the behavior between before and beforeAll is that the code for before is executed during the it execution, when the code for beforeAll is executed during the tree walking.
I've found a few places where similar changes shoud have to be implemented:

	modified:   src/Test/Syd/Runner/Asynchronous.hs
	modified:   src/Test/Syd/Runner/Synchronous/Interleaved.hs
	modified:   src/Test/Syd/Runner/Synchronous/Separate.hs

The idea of the change is to sneak the exception lazyly in the ressource. Another option could be to, in case of exception, just walk the tree and report failures.
Async exception should be handled instead of SomeException for proper excption handling.

So my question is:

Are you interested by an MR doing this change on the three runners
Or is this completly stupid and I miss the obvious point.

Thank you.

The text was updated successfully, but these errors were encountered:

NorfairKing · 2024-07-12T14:52:39Z

Wow thank you for the detailed report!

So my question is:

Are you interested by an MR doing this change on the three runners

Or is this completly stupid and I miss the obvious point.

I think you got that exactly right.
I haven't used modifiers that crash much so I hadn't run into this.

Yes, PR please.

guibou · 2024-08-13T12:36:53Z

Additionally, I observed the following behaviors related to exceptions:

If a pure exception is hidden inside a value compared with shouldBe (or equivalent), it will crash the test report. (e.g. shouldBe (error "cheval") 10 fails during the test, and when sydtest displays the final report, it crashs when it tries to display the diff between 10 and the error "cheval". (Note: now that I'm reporting that, I'm having issues reproducing it, I'll try to find a proper repro).
sydtest seems to ignore most async exception. Especially, ctrl-C during a test suite is not an instant feedbacks and instead it can retry canceled tests and then, finally, stops.

E.g. the folliwng testsuite:

import Test.Syd
import Control.Concurrent
import Control.Exception

main = sydTest $ do
    it "is slow" $ do
      print "is slow"
      e <- try $ threadDelay 10_000_000
      print "end of delay"
      print (e :: Either SomeException ())
      case e of
        Left e -> throwIO e
        Right () -> pure ()
      pure ()

The test is supposed to wait for 10s. If C-c is sent once, it just restart the test:

./SydException +RTS -N         
Tests:"is slow"


^C"end of delay"
Left AsyncCancelled
"is slow"
"end of delay"
Right ()

However, if ctrl-c is sent twice, the runtime terminates the program without taking any exception handler into account:

./SydException +RTS -N
"is slow"
Tests:

^C"end of delay"
Left AsyncCancelled
"is slow"
^C

(That's a documented GHC rts behavior, but I'm unable to find the documentation).

I suppose that sydtest is catching ALL possible exception and retries. From my PoV, it should ignore the asyrc ones and terminate the test runner immediatly.

Note that it is surprising that if the test suite survives one retry, it does not display the final results:

./SydException +RTS -N         
"is slow"
Tests:

^C"end of delay"
Left AsyncCancelled
"is slow"
<end of command, no feedback>

NorfairKing · 2024-08-13T12:48:45Z

Wow cool finds! The retry logic was added after the async exception handling so I'm not surprised it doesn't work well together.

guibou · 2024-08-13T13:05:58Z

I found the problem with shouldBe.

import Test.Syd

data Cheval = Cheval String deriving (Eq)

instance Show Cheval where
  show x = error "NO"

main = sydTest $ do
  it "fails" $ do
    Cheval "troll" `shouldBe` Cheval "troll2"

Here, the shouldBe will fail because values are different, but the Show instance will raises an exception and crash the testsuite during the final report:

./SydException +RTS -N         
Tests:

✗ fails                                                                    0.13 ms
  Retries: 3 (does not look flaky)
 
Failures:

    SydException.hs:13
  ✗ 1 fails
      Retries: 3 (does not look flaky)
SydException: NO
CallStack (from HasCallStack):
  error, called at SydException.hs:10:12 in main:Main

(No final report).

guibou · 2024-08-13T13:09:34Z

So, my plans in a MR about exceptions will be to:

a) Add an exception handler when sydtest displays the result of the tests and uses values from the "user side", which may contain hidden exceptions
b) Add lazyness on before so exception will be hidden inside the resource and will be raised during usage in the different tests
b') Maybe investigate another implementation which just disable the complete subtree of test if a before fails. Maybe cleaner than storing exception in a pure value.
c) check the different exception handler (especially the ones which recover, such as try) for correct async behavior.

guibou · 2024-08-13T14:48:36Z

I've wrote the following test suite:

{-# LANGUAGE NumericUnderscores #-}
module Test.Syd.ExceptionSpec where

import Test.Syd
import Test.Syd.OptParse (defaultSettings)
import Data.Time.Clock
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async

runAndCheckExceptions spec = do
  resultForest <- sydTestResult defaultSettings spec
  pure resultForest

data Cheval = Cheval String deriving (Eq)

instance Show Cheval where
  show x = error "NO"

spec = describe "exception handling" $ do
  it "works with exception in before" $ do
    res <- runAndCheckExceptions $ describe "behavior with exceptions" $ do
      -- This error in before all should appears as an error in the final
      -- report and not a crash of the test suite.
      beforeAll (fail "error in before all") $ do
        it "should not kill the test suit" $ do
          1 `shouldBe` 1

    pure ()
  it "works with exception in shouldBe" $ do
    res <- runAndCheckExceptions $ do
      describe "behavior with exceptions" $ do
        it "should not kill the test suit" $ do
          -- This should be won't raise the exception during comparison
          -- (because `==` is fine). However, the `Show` instance for `Cheval`
          -- is broken.
          --
          -- sydtest will try to display the actual/expected diff using the
          -- show instance, and should not crash.
          Cheval "troll" `shouldBe` Cheval "troll2"

    pure ()

  it "stops immediatly with async" $ do
    -- Tests that when an async exception is sent to sydTest (for example,
    -- ctrl-c), it behaves as expected and terminates immediately and does not
    -- do anything surprising.
    startTime <- liftIO $ getCurrentTime

    -- Runs two threads, one will be done in 1s, and the other is a test suite
    -- with one test which should run for 10s.
    -- When the first threads terminate, it will throw AsyncCancel in the
    -- sydTest test suite, which should terminate asap.
    -- This will be checked using the timer t. That's fragile, but we know that:
    --
    -- t should be more than 1s (because of the first threadDelay)
    -- t should be no much more than 1s. Especially, it should not be 10s
    -- (waiting for the complete threadDelay in the test suite) or even more if
    -- the exception is completly ignored (or test is retried)
    race (threadDelay 1_000_000) $ do
      sydTest $ do
          it "is slow" $ do
            threadDelay 10_000_000

    endTime <- liftIO $ getCurrentTime

    liftIO $ print $ startTime `diffUTCTime` endTime
    pure ()

Which shows the problems and should hopefully be fine once the problems are fixed.

Note that I also observed that:

async exception could also results in sydtest-test: thread blocked indefinitely in an MVar operation errors. My understanding is as follows: when an async exception is raised in the main thread, it stops the concurrencly call in Test.Syd.Runner.Asychronous, actually, concurrently raises the exception in both sub threads: the test runner and the print loop. The printLoop will simply die, and the exception will be rethrown in withJobQueueWorker which will be rethrown in the N workers. However, a worker is possibly currently running runNow on the test, and the async exception is stored in the result, so the test is retried. If a second ctrl-c is raised, the RTS kills everything. If not, the test will eventually terminate and the main runner should also terminate (but way slower, because it does only have a few remaining runners ;), and the test status won't appear anymore.

However, I'm surprised, because tests are run in applyWrapper2, which uses catches with the following handlers:

exceptionHandlers :: [Handler (Either SomeException a)]
exceptionHandlers =
  [ -- Re-throw AsyncException, otherwise execution will not terminate on SIGINT (ctrl-c).
    Handler (\e -> throwIO (e :: AsyncException)),
    -- Catch all the rest
    Handler (\e -> return $ Left (e :: SomeException))
  ]

So as a first observation, it seems that the problem was already anticipated and taken care of. I need to read the precise spec of catches now to understand what is happening.

I observed segmentation fault when designing the second test in the list (the one where Show raises an exception).

guibou · 2024-08-13T15:20:36Z

@NorfairKing

AsyncException is a class of exception which does NOT contain all asynchroneous exception. We need to use SomeAsyncException (or a wrapper library, such as https://hackage.haskell.org/package/safe-exceptions, which takes care of that for us)

This fix is trivial, AsyncException -> SomeAsyncException. MR pending.

NorfairKing · 2024-08-13T22:02:23Z

@guibou I'm impressed that you're run into so many of these issues already. I'm open to any improvements, especially when you have such a nice test suite along with them.

Ensures that the diff computed when `shouldBe` test are failing is not taking more than 2s, otherwise do not display a diff. This fix NorfairKing#92 in which we observed that diff can take an arbitrary amount of time, up to a few minutes or hours. This implementations changes the API of a few internal functions (see the `Changelog` entries), but no change on the "common" API. - Diff time is taken into account in the time of the test. Note that the diff can be computed multiples time during flaky detection. - This also fix an issue discussed in NorfairKing#80, if the values being diffed contains lazy exception, it won't crash the test suite anymore, but instead, the exception will be raised in the test context.

This is related to NorfairKing#80 The current exception handler for test was incorrectly catching async exception (the handler for `AsyncException` was not enough, the "parent" of the async exception hierarchy is `SomeAsyncException`). This lead to surprising behaviors with ctrl-C. when receiving a `ctrl-c` (e.g. an interruption of the test-suite by the user), the test runner was cancelled, cancelling the print loop of the test and cancelling all workers. However, the exception was caught in workers as a "test-failure", leading to it being ignored and the different tests retrieved and meaning that the test runner survived the interruption. They will hence continue their life (e.g. purging the job queue). It was sometime leading to a lot of delay before ending the process, or even sometime other errors related to the different involved `MVar` (when no producer / consumer detection detects the error. Note that a second ctrl-c would cancels the program execution completely (Because that's how the RTS behaves). HOWEVER, the behavior is completely different in the repl, because the repl won't terminate itself on second ctrl-C. Hence the worker threads are leaked and continues running, and even survives the repl reloads. In the best case, that's a waste of resources and CPU time, but if you happen to rerun new sydtest, it can stacks and some tests may run concurrently with the "ghost" version of themself, which may or may not work ;)

This is related to #80 The current exception handler for test was incorrectly catching async exception (the handler for `AsyncException` was not enough, the "parent" of the async exception hierarchy is `SomeAsyncException`). This lead to surprising behaviors with ctrl-C. when receiving a `ctrl-c` (e.g. an interruption of the test-suite by the user), the test runner was cancelled, cancelling the print loop of the test and cancelling all workers. However, the exception was caught in workers as a "test-failure", leading to it being ignored and the different tests retrieved and meaning that the test runner survived the interruption. They will hence continue their life (e.g. purging the job queue). It was sometime leading to a lot of delay before ending the process, or even sometime other errors related to the different involved `MVar` (when no producer / consumer detection detects the error. Note that a second ctrl-c would cancels the program execution completely (Because that's how the RTS behaves). HOWEVER, the behavior is completely different in the repl, because the repl won't terminate itself on second ctrl-C. Hence the worker threads are leaked and continues running, and even survives the repl reloads. In the best case, that's a waste of resources and CPU time, but if you happen to rerun new sydtest, it can stacks and some tests may run concurrently with the "ghost" version of themself, which may or may not work ;)

Ensures that the diff computed when `shouldBe` test are failing is not taking more than 2s, otherwise do not display a diff. This fix #92 in which we observed that diff can take an arbitrary amount of time, up to a few minutes or hours. This implementations changes the API of a few internal functions (see the `Changelog` entries), but no change on the "common" API. - Diff time is taken into account in the time of the test. Note that the diff can be computed multiples time during flaky detection. - This also fix an issue discussed in #80, if the values being diffed contains lazy exception, it won't crash the test suite anymore, but instead, the exception will be raised in the test context.

guibou changed the title ~~Exception in beforeAll crashs the complete test suite~~ Sydtest exception handling: crash / lock / ... Aug 13, 2024

guibou mentioned this issue Aug 14, 2024

Multiples sydTest segfault in parallel #91

Open

guibou mentioned this issue Sep 21, 2024

Timeout of diff algrorithm #94

Closed

guibou mentioned this issue Sep 22, 2024

feat: do not catch async exception as test failure #96

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sydtest exception handling: crash / lock / ... #80

Sydtest exception handling: crash / lock / ... #80

guibou commented Jul 12, 2024

NorfairKing commented Jul 12, 2024

guibou commented Aug 13, 2024

NorfairKing commented Aug 13, 2024

guibou commented Aug 13, 2024

guibou commented Aug 13, 2024

guibou commented Aug 13, 2024

guibou commented Aug 13, 2024

NorfairKing commented Aug 13, 2024

Sydtest exception handling: crash / lock / ... #80

Sydtest exception handling: crash / lock / ... #80

Comments

guibou commented Jul 12, 2024

NorfairKing commented Jul 12, 2024

guibou commented Aug 13, 2024

NorfairKing commented Aug 13, 2024

guibou commented Aug 13, 2024

guibou commented Aug 13, 2024

guibou commented Aug 13, 2024

guibou commented Aug 13, 2024

NorfairKing commented Aug 13, 2024