Support BUGGIFY #12

thisismiller · 2020-05-15T23:53:14Z

Even given a full simulation framework with random fault injection, it's still difficult/unlikely to recreate dangerous situations in the system under test. For example, given random packet/network failures, what's the chance of sending a commit to an exactly minimal quorum of nodes? To help simulation find correctness issues, it's productive to write simulation-only code that helps create dangerous situations, and randomly run or skip over it in simulation. FoundationDB has the BUGGIFY set of macros for this. It's also used for randomizing flag/tuning settings, randomly restarting multi-stage workflows, or injecting random long delays in highly concurrent code.

Concretely:

if (BUGGIFY) {
  // send to minimal quorum
} else {
  // send to all
}

This will send to a minimal quorum only when running in simulation, and only a small percentage of the time.
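
A minimal Rust sketch of that behaviour, under stated assumptions: `Rng`, `Buggify`, and `commit_fanout` are hypothetical names made up for illustration (not FoundationDB's macros or any existing crate), the firing probability is simply passed in, and "minimal quorum" is assumed to mean a bare majority. The seeded PRNG is there so that a simulation run could be replayed from its seed.

```rust
/// Tiny deterministic PRNG (xorshift64*), so a run can be replayed from its seed.
pub struct Rng(u64);

impl Rng {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        Rng(seed.max(1))
    }

    /// Returns a float in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        // Keep the top 53 bits so the value fits an f64 mantissa exactly.
        let bits = self.0.wrapping_mul(0x2545_F491_4F6C_DD1D) >> 11;
        bits as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical BUGGIFY-like switch: never fires outside simulation,
/// and fires with `fire_probability` inside it.
pub struct Buggify {
    in_simulation: bool,
    fire_probability: f64,
    rng: Rng,
}

impl Buggify {
    pub fn new(in_simulation: bool, fire_probability: f64, seed: u64) -> Self {
        Buggify { in_simulation, fire_probability, rng: Rng::new(seed) }
    }

    pub fn fire(&mut self) -> bool {
        self.in_simulation && self.rng.next_f64() < self.fire_probability
    }
}

/// Number of replicas a commit is sent to, out of `nodes`.
pub fn commit_fanout(nodes: usize, buggify: &mut Buggify) -> usize {
    if buggify.fire() {
        nodes / 2 + 1 // assumed meaning of "minimal quorum": a bare majority
    } else {
        nodes // send to all
    }
}
```

With `in_simulation` set to `false`, `commit_fanout` always returns `nodes`, so the dangerous path cannot run in production.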


davidbarsky · 2020-05-16T01:35:55Z

Thanks for commenting on this! Can BUGGIFY be seen as a general purpose failpoint-like system (e.g., https://github.com/tikv/fail-rs)?

thisismiller · 2020-05-16T01:59:02Z

This sort of overlaps with #13 now, but failpoints seem like a great tool for unit testing failures. In the specification-based testing approach, you don't want to manually write every fail::cfg combination, because the point is to get simulation to fuzz that for you. I think BUGGIFY can be viewed as the tool with which to build the fuzzing version of failpoints. It's also possible that one could instead integrate with failpoints, so that instead of creating a failure scenario, the failpoints are triggered randomly (fail_with_probability?).

To extend and modify their example:

// BUGGIFY!() and sleep() are placeholders here, not existing APIs.
async fn do_fallible_work() {
    if BUGGIFY!() {
        panic!("read-dir");
    }
    let _dir: Vec<_> = std::fs::read_dir(".").unwrap().collect();
    if BUGGIFY!() {
        // Pretend that the snapshot of the filesystem state we received is potentially largely out of date with the current filesystem state.
        sleep(5).await;
    }
    // ... do some work on the directory ...
}

The first BUGGIFY!() is a pretty direct translation, which I'd hope the file/directory code would actually do for you instead. The second is more of the "actively try to encourage things that could happen, but would be rare".
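
One way the "fuzzing version of failpoints" could look, as a rough sketch rather than a proposal for the actual API: each named site is randomly switched on or off once per run, and an active site then fires with some probability, so the seed picks the failure combination instead of a hand-written configuration. `FuzzedFailpoints`, `should_fire`, and both probabilities are hypothetical, and `do_fallible_work` uses a stand-in list instead of touching the filesystem.

```rust
use std::collections::HashMap;

/// splitmix64 step mapped to a float in [0, 1); deterministic for a given state.
fn next_unit(state: &mut u64) -> f64 {
    *state = (*state).wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^= z >> 31;
    (z >> 11) as f64 / (1u64 << 53) as f64
}

/// Hypothetical registry of named failure sites for one simulation run.
pub struct FuzzedFailpoints {
    rng_state: u64,
    activate_probability: f64,
    fire_probability: f64,
    active: HashMap<&'static str, bool>,
}

impl FuzzedFailpoints {
    pub fn new(seed: u64, activate_probability: f64, fire_probability: f64) -> Self {
        FuzzedFailpoints {
            rng_state: seed,
            activate_probability,
            fire_probability,
            active: HashMap::new(),
        }
    }

    /// The first call for a site decides whether it is active for this run;
    /// active sites then fire with `fire_probability` on every call.
    pub fn should_fire(&mut self, site: &'static str) -> bool {
        let activate_probability = self.activate_probability;
        let rng_state = &mut self.rng_state;
        let active = *self
            .active
            .entry(site)
            .or_insert_with(|| next_unit(rng_state) < activate_probability);
        active && next_unit(&mut self.rng_state) < self.fire_probability
    }
}

/// Returns an error instead of panicking so callers can observe the injection.
pub fn do_fallible_work(fp: &mut FuzzedFailpoints) -> Result<usize, &'static str> {
    if fp.should_fire("read-dir") {
        return Err("injected read-dir failure");
    }
    let entries = vec!["a", "b"]; // stand-in for a real directory listing
    Ok(entries.len())
}
```

Running the same test body under many seeds then covers different subsets of sites without any per-test failpoint configuration.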

LucioFranco · 2020-05-16T17:14:52Z

if BUGGIFY!() {
    // Pretend that the snapshot of the filesystem state we received is potentially largely out of date with the current filesystem state.
    sleep(5).await;
}

I wonder if we could just introduce stuff like this as part of the simulation executor? Basically, treat the read_dir().await as a yield point and queue the task to be run later, to simulate a longer read_dir etc.? We could fuzz this of course.
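
A very rough sketch of that executor idea, assuming a virtual clock and tasks modelled as counters rather than real futures: every time a task reaches a yield point, it is re-queued with a random extra delay drawn from a seeded PRNG. `run_simulation` and its parameters are hypothetical and not part of any existing executor.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// splitmix64 step; deterministic for a given state.
fn next_u64(state: &mut u64) -> u64 {
    *state = (*state).wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Runs `tasks` tasks that each pass through `yield_points` yield points.
/// Returns the (virtual_time, task_id) pairs in the order tasks were polled.
/// `max_delay` is assumed to be smaller than `u64::MAX`.
pub fn run_simulation(
    seed: u64,
    tasks: usize,
    yield_points: usize,
    max_delay: u64,
) -> Vec<(u64, usize)> {
    let mut rng = seed;
    let mut seq = 0u64; // tie-breaker: equal wake times run in FIFO order
    let mut remaining = vec![yield_points; tasks];
    let mut queue = BinaryHeap::new();
    for id in 0..tasks {
        queue.push(Reverse((0u64, seq, id)));
        seq += 1;
    }

    let mut trace = Vec::new();
    // Always poll the task with the earliest wake time (min-heap via Reverse).
    while let Some(Reverse((wake_at, _, id))) = queue.pop() {
        let now = wake_at; // the virtual clock jumps straight to the wake time
        trace.push((now, id));
        if remaining[id] > 0 {
            // The task hit a yield point: fuzz how long the awaited call "takes".
            remaining[id] -= 1;
            let delay = next_u64(&mut rng) % (max_delay + 1);
            queue.push(Reverse((now + delay, seq, id)));
            seq += 1;
        }
    }
    trace
}
```

The same seed reproduces the same interleaving, while different seeds reorder the tasks, which is the part that would get fuzzed.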

mcches mentioned this issue Jan 12, 2023

Explore state exploration in turmoil tokio-rs/turmoil#75

Closed
