Create a standard API / specification for testing workloads #13
I was considering sketching out something closer to the Simnet approach for this, where we'd generate actions that would be executed by the simulator. The actions could be API calls or things like network partitions, failures, timeouts, or delays. I think something like Blip implements something similar. What do you think?
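For concreteness, here is a minimal sketch of that interpreter-style shape, under the assumption that a test is a flat list of atomic actions replayed against the simulator; every name in it is hypothetical:

```rust
// Hypothetical sketch of the interpreter-style approach: the test is a
// generated list of atomic actions, executed one at a time.

#[derive(Debug, Clone)]
enum Action {
    Set { key: String, value: u64 },
    Get { key: String },
    Partition { from: usize, to: usize },
    Heal,
    Delay { millis: u64 },
}

trait Interpreter {
    /// Execute a single action against the simulated system.
    fn apply(&mut self, action: &Action);
    /// Assert invariants between actions.
    fn check(&self);
}

fn run<I: Interpreter>(interp: &mut I, actions: &[Action]) {
    for action in actions {
        interp.apply(action);
        interp.check();
    }
}
```

One nice property of this shape is that minimizing a failing test reduces to deleting actions from the list.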
I think the best answer I can give is "that might work". It's far enough away from FoundationDB's model that I can't comment strongly on whether it would work, but I'm also skeptical that it would work well when extended to a full system. I think I understand the appeal, because it also means that one could try to minimize test failures. The allocator fuzzing is a simple example, and I think the technique applies well there, specifically because it's easy to define an interpreter when each action is atomic and synchronous. For a full distributed system, it's not immediately clear to me how to extend that model to encompass asynchronous and concurrent operations, multi-stage work, and fault injection therein. If you do manage to work out a design that resolves these potential issues, you'll still need to find a good way to separately specify a set of work and verification actions that you can compose together in a test. To give things a name, I'll call the design of the allocator example the "interpreter-style".

**Asynchronous operations**

As a concrete example, let's use FoundationDB's atomic increment. There's no concurrency provided in this do-one-operation-and-wait-for-it-to-complete model. Hopefully, I would be able to verify that if multiple increments are performed concurrently, then the result is still correct.

**Multi-stage work**

The interpreter operation that would be defined for some tests would be monolithic. To use backup tests in FoundationDB as an example, the "start a backup" flow involves multiple serial transactions, with each transaction being its own meaty chunk of logic. Even operations that feel simple and atomic at first glance might not be so when one examines the details.

**Fault injection**

With the goal of also having faults driven from the interpreter, I think it's harder to describe when and where to inject a fault. Considering the case of a read above, how do I describe when to drop a packet? The amount of data sent and received by a single get request can vary wildly. To return to Millions of Tiny Databases, Physalia is described as being tested with fault injection explicitly interwoven in the execution of the consensus group. This doesn't seem to give me a large hint on how to apply it to higher levels of abstraction: being able to predict the work performed from a single stage of a consensus protocol on one node does seem easier than predicting the work done by one FoundationDB client for a read. Maybe there's a good answer that I'm missing, as there are some downsides to just blindly dropping N% of packets as well (N has to be kept very small if any component exists such that packet loss causes a large amount of work).
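For contrast, here is a minimal sketch of that blindly-drop-N%-of-packets style, where the fault decision lives inside the simulated network rather than in the action list. `SimNetwork`, `Packet`, and the RNG are all hypothetical scaffolding, not any particular framework's API:

```rust
// Hypothetical sketch: probabilistic packet loss decided inside the
// simulated network, rather than faults expressed as explicit actions.

/// Minimal xorshift RNG; seeding it per-test keeps runs reproducible.
struct DeterministicRng {
    state: u64, // must be seeded non-zero
}

impl DeterministicRng {
    fn next_f64(&mut self) -> f64 {
        self.state ^= self.state << 13;
        self.state ^= self.state >> 7;
        self.state ^= self.state << 17;
        (self.state >> 11) as f64 / (1u64 << 53) as f64
    }
}

struct Packet; // payload, source, destination elided

struct SimNetwork {
    rng: DeterministicRng,
    /// "N%": must be kept very small if a lost packet can trigger a
    /// large amount of recovery work somewhere in the system.
    drop_probability: f64,
}

impl SimNetwork {
    fn deliver(&mut self, packet: Packet) -> Option<Packet> {
        if self.rng.next_f64() < self.drop_probability {
            None // fault injected: the packet is silently lost
        } else {
            Some(packet)
        }
    }
}
```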
**Composition**

As an example, take FoundationDB's Cycle workload, which maintains a set of keys whose values link them together into a single cycle, and whose transactions perturb the cycle while preserving that invariant. It's important to note that this is a workload that will pass given any prefix of the commits done to the database.
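To make that invariant concrete, here's a simplified checker, assuming keys `0..n` whose values each name the next key in the ring (a simplification of the real workload):

```rust
use std::collections::HashMap;

// Sketch of a Cycle-style invariant check. Because every committed
// transaction preserves the invariant, this check passes after any
// prefix of commits.
fn check_cycle(db: &HashMap<u64, u64>, n: u64) -> bool {
    let mut seen = 0;
    let mut key = 0;
    loop {
        let Some(&next) = db.get(&key) else { return false; };
        seen += 1;
        key = next;
        if key == 0 {
            break; // back at the starting key
        }
        if seen > n {
            return false; // trapped in a sub-cycle that skips key 0
        }
    }
    seen == n // visited every key exactly once
}
```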
Individually, Cycle verifies properties about database consistency, and BackupToDBCorrectness verifies things about the Disaster Recovery API. If you run both of them at the same time, then you can verify that DR keeps data consistent. Modular specification testing means more coverage with less work.

Is there a point to having different tests? Why not just always run all verification work with all faults? Well, sometimes they conflict.
Unfortunately, some pairs of antagonistic workloads can't be combined (easily). This mix-and-match of correctness checking and antagonistic workloads lets one build a wide variety of simulation test configurations to uncover different sorts of bugs, and it's a thing I'd like to see kept in whatever the API is for performing test work in a simulation framework.
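Here's a rough sketch of what such a composable API could look like, loosely modeled on FoundationDB's workload interface; `SimDb` and every method name here are hypothetical placeholders rather than a concrete proposal:

```rust
// Hypothetical composable workload interface. A test is a set of
// workloads run together: some perform verifiable work (specifications),
// some inject antagonism, and orthogonal ones compose freely.

struct SimDb; // stand-in for a handle to the simulated cluster

trait Workload {
    /// Populate any initial state this workload needs.
    fn setup(&mut self, db: &mut SimDb);
    /// Drive the work: issue traffic, kill processes, clog networks, ...
    fn start(&mut self, db: &mut SimDb);
    /// Verify invariants after the work quiesces. Purely antagonistic
    /// workloads have nothing to verify and can simply return true.
    fn check(&self, db: &SimDb) -> bool;
}
```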
Once a simulation framework exists, the next question is what to do with it. FoundationDB coupled randomized fault injection with specification-based testing to obtain a high level of coverage.
Specifically, there is a workload interface through which "tester" actors run one or more workloads in parallel. Workloads are either antagonistic, e.g. kill N processes during the test, or a specification, e.g. back up and restore some data and assert that they're the same. Orthogonal workloads can be combined to exercise different failure scenarios or types of work in the same test.
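Using the hypothetical `Workload` trait sketched earlier, a "tester" might look roughly like this (phases run serially here for brevity, where a real simulator would interleave the `start` phases on its event loop):

```rust
// Sketch of a tester combining several workloads in one simulated test.
// Reuses the hypothetical `Workload` trait and `SimDb` from above.

fn run_test(db: &mut SimDb, workloads: &mut [Box<dyn Workload>]) -> bool {
    for w in workloads.iter_mut() {
        w.setup(db);
    }
    // A real simulator would run these concurrently, e.g. a Cycle-style
    // specification alongside a process-killing antagonist, so that the
    // correctness check runs under faults.
    for w in workloads.iter_mut() {
        w.start(db);
    }
    workloads.iter().all(|w| w.check(db))
}
```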
There is an alternative direction available, which is to allow an easy way to specify the exact fault injections that will happen in a test. Millions of Tiny Databases went in this direction instead, which allowed them to generate a comprehensive set of simulated tests, model-checking style.
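As a sketch of that style, here is a hypothetical harness that enumerates every (step, fault) combination up front, so that each schedule can be replayed as its own deterministic test; all of the types and names are assumptions:

```rust
// Hypothetical sketch of explicitly scheduled fault injection,
// model-checking style: enumerate every (step, fault) pair and run the
// simulation once per schedule, rather than sampling faults randomly.

#[derive(Clone, Debug)]
enum Fault {
    CrashNode(usize),
    DropMessage { from: usize, to: usize },
}

#[derive(Clone, Debug)]
struct FaultSchedule {
    /// Inject the fault just before this logical step of the protocol.
    at_step: usize,
    fault: Fault,
}

/// Generate the comprehensive set of single-fault schedules for a run
/// of `steps` logical steps across `nodes` nodes.
fn enumerate_schedules(nodes: usize, steps: usize) -> Vec<FaultSchedule> {
    let mut schedules = Vec::new();
    for at_step in 0..steps {
        for node in 0..nodes {
            schedules.push(FaultSchedule { at_step, fault: Fault::CrashNode(node) });
            for other in (0..nodes).filter(|&o| o != node) {
                schedules.push(FaultSchedule {
                    at_step,
                    fault: Fault::DropMessage { from: node, to: other },
                });
            }
        }
    }
    schedules
}
```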