This repository has been archived by the owner on Aug 12, 2022. It is now read-only.

Support BUGGIFY #12

Open
thisismiller opened this issue May 15, 2020 · 3 comments

Comments

@thisismiller

Even with a full simulation framework and random fault injection, it's still difficult, and often unlikely, to recreate dangerous situations in the system under test. For example, given random packet/network failures, what's the chance of sending a commit to an exactly minimal quorum of nodes? To help simulation find correctness issues, it's productive to write simulation-only code that helps create dangerous situations, and to randomly run or skip over it in simulation. FoundationDB has the BUGGIFY set of macros for this. They are also used for randomizing flag/tuning settings, randomly restarting multi-stage workflows, or injecting random long delays in highly concurrent code.

Concretely:

if (BUGGIFY) {
  // send to minimal quorum
} else {
  // send to all
}

This will send to a minimal quorum only when running in simulation, and only a small percentage of the time.
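To make the gating explicit, here is a minimal Rust sketch of the idea, not FoundationDB's implementation. The names `SimRng`, `Buggify`, and `commit_targets` are hypothetical, the 5% fire probability used in the usage note is an arbitrary placeholder, and "minimal quorum" is assumed to mean a simple majority. A small seeded PRNG is used so that a simulation run can be replayed from its seed.

```rust
/// Tiny deterministic PRNG (SplitMix64) so a run can be replayed from its seed.
struct SimRng {
    state: u64,
}

impl SimRng {
    fn new(seed: u64) -> Self {
        SimRng { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical stand-in for BUGGIFY: never fires outside simulation,
/// and fires with `fire_probability` inside it.
struct Buggify {
    in_simulation: bool,
    fire_probability: f64,
    rng: SimRng,
}

impl Buggify {
    fn simulation(seed: u64, fire_probability: f64) -> Self {
        Buggify { in_simulation: true, fire_probability, rng: SimRng::new(seed) }
    }

    fn production() -> Self {
        Buggify { in_simulation: false, fire_probability: 0.0, rng: SimRng::new(0) }
    }

    fn fire(&mut self) -> bool {
        self.in_simulation && self.rng.next_f64() < self.fire_probability
    }
}

/// Returns the indices of the nodes a commit is sent to.
/// Assumption: a minimal quorum is a simple majority of `node_count`.
fn commit_targets(buggify: &mut Buggify, node_count: usize) -> Vec<usize> {
    if buggify.fire() {
        // Dangerous-but-legal case: send to a minimal quorum only.
        (0..node_count / 2 + 1).collect()
    } else {
        // Normal case: send to all nodes.
        (0..node_count).collect()
    }
}
```

With `Buggify::production()`, `commit_targets` always returns every node. With `Buggify::simulation(seed, 0.05)`, a small fraction of calls return only the majority subset, and the same seed reproduces the same sequence of choices.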

@davidbarsky
Member

Thanks for commenting on this! Can BUGGIFY be seen as a general purpose failpoint-like system (e.g., https://github.com/tikv/fail-rs)?

@thisismiller
Author

This sort of overlaps with #13 now, but failpoints seem like a great tool for unit testing failures. In the specification-based testing approach, you don't want to manually write every fail::cfg combination, because the point is to get simulation to fuzz that for you. I think BUGGIFY can be viewed as the tool with which to build the fuzzing version of failpoints. It's also possible that one could instead integrate with failpoints, so that instead of creating a failure scenario, the failpoints are triggered randomly (fail_with_probability?).

To extend and modify their example:

async fn do_fallible_work() {
    if BUGGIFY!() {
        panic!("read-dir");
    }
    let _dir: Vec<_> = std::fs::read_dir(".").unwrap().collect();
    if BUGGIFY!() {
        // Pretend that the snapshot of the filesystem state we received is
        // potentially largely out of date with the current filesystem state.
        sleep(5).await;
    }
    // ... do some work on the directory ...
}

The first BUGGIFY!() is a pretty direct translation, which I'd hope the file/directory code would actually do for you instead. The second is more of the "actively try to encourage things that could happen, but would be rare".
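As a rough sketch of how such a macro could be built, the version below gates each call site separately: a site is switched on or off once per simulation run, and an active site then fires with some probability. Everything here is an assumption made for illustration: the lowercase `buggify!` macro, `enter_simulation`, `leave_simulation`, `WorkError`, and both 25% probabilities are hypothetical, and the directory read is replaced by a fixed list so the example has no filesystem dependency. The injected failure is returned as an error rather than a panic so it is easy to observe.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Per-run simulation state. `None` in `SIM` means "not in simulation".
struct SimState {
    rng_state: u64,
    /// Whether each call site (file, line) is active for this run.
    sites: HashMap<(&'static str, u32), bool>,
}

thread_local! {
    static SIM: RefCell<Option<SimState>> = RefCell::new(None);
}

// Placeholder probabilities chosen for illustration only.
const SITE_ENABLE_PROBABILITY: f64 = 0.25;
const FIRE_PROBABILITY: f64 = 0.25;

/// SplitMix64 step mapped to a uniform float in [0, 1).
fn next_f64(state: &mut u64) -> f64 {
    *state = state.wrapping_add(0x9E3779B97F4A7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
    ((z ^ (z >> 31)) >> 11) as f64 / (1u64 << 53) as f64
}

fn enter_simulation(seed: u64) {
    SIM.with(|s| *s.borrow_mut() = Some(SimState { rng_state: seed, sites: HashMap::new() }));
}

fn leave_simulation() {
    SIM.with(|s| *s.borrow_mut() = None);
}

fn buggify_at(file: &'static str, line: u32) -> bool {
    SIM.with(|s| {
        let mut guard = s.borrow_mut();
        let sim = match guard.as_mut() {
            Some(sim) => sim,
            None => return false, // never fires outside simulation
        };
        // Decide once per run whether this call site is active.
        let known = sim.sites.get(&(file, line)).copied();
        let enabled = match known {
            Some(enabled) => enabled,
            None => {
                let enabled = next_f64(&mut sim.rng_state) < SITE_ENABLE_PROBABILITY;
                sim.sites.insert((file, line), enabled);
                enabled
            }
        };
        enabled && next_f64(&mut sim.rng_state) < FIRE_PROBABILITY
    })
}

/// Each textual use of `buggify!()` is its own site, keyed by file and line.
macro_rules! buggify {
    () => {
        buggify_at(file!(), line!())
    };
}

#[derive(Debug, PartialEq)]
enum WorkError {
    InjectedReadDirFailure,
}

fn do_fallible_work() -> Result<usize, WorkError> {
    if buggify!() {
        return Err(WorkError::InjectedReadDirFailure);
    }
    // Stand-in for the directory listing in the example above.
    let entries = vec!["a", "b", "c"];
    Ok(entries.len())
}
```

Calling `enter_simulation(seed)` before a run and `leave_simulation()` after it is enough for the same seed to replay the same injected failures, which is the property that lets simulation fuzz the failure combinations instead of enumerating them by hand.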

@LucioFranco
Member

if BUGGIFY!() {
    // Pretend that the snapshot of the filesystem state we received is potentially largely out of date with the current filesystem state.
    sleep(5).await;
}

I wonder if we could just introduce stuff like this as part of the simulation executor? Basically, treat the read_dir().await as a yield point and queue the task to be run later, to simulate a longer read_dir etc.? We could fuzz this of course.
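One way to picture this is a toy executor that owns a virtual clock and, whenever a task reaches a yield point, requeues it after a random delay. The sketch below is only an illustration under simplifying assumptions: tasks are plain closures rather than futures, and `SimExecutor`, `Step`, `task_with_yields`, and `simulate` are hypothetical names, not part of any existing executor.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// What a task reports each time it is polled.
enum Step {
    Yield, // reached a yield point, e.g. an `.await` on I/O
    Done,
}

type Task = Box<dyn FnMut() -> Step>;

struct SimExecutor {
    rng_state: u64,
    now: u64, // virtual time, advanced only by the executor
    max_extra_delay: u64,
    next_seq: u64, // tie-breaker so ordering stays deterministic
    ready: BinaryHeap<Reverse<(u64, u64, usize)>>, // (wake time, seq, task id)
    tasks: Vec<Task>,
}

impl SimExecutor {
    fn new(seed: u64, max_extra_delay: u64) -> Self {
        SimExecutor {
            rng_state: seed,
            now: 0,
            max_extra_delay,
            next_seq: 0,
            ready: BinaryHeap::new(),
            tasks: Vec::new(),
        }
    }

    /// SplitMix64 step.
    fn next_u64(&mut self) -> u64 {
        self.rng_state = self.rng_state.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.rng_state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }

    fn schedule(&mut self, id: usize, delay: u64) {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.ready.push(Reverse((self.now + delay, seq, id)));
    }

    fn spawn(&mut self, task: Task) -> usize {
        let id = self.tasks.len();
        self.tasks.push(task);
        self.schedule(id, 0);
        id
    }

    /// Runs every task to completion and returns (finish time, task id)
    /// in completion order.
    fn run(&mut self) -> Vec<(u64, usize)> {
        let mut finished = Vec::new();
        while let Some(Reverse((wake, _, id))) = self.ready.pop() {
            self.now = wake;
            let task = &mut self.tasks[id];
            let step = task();
            match step {
                Step::Done => finished.push((self.now, id)),
                Step::Yield => {
                    // Fuzzed latency: requeue after 1..=max_extra_delay+1 ticks.
                    let delay = 1 + self.next_u64() % (self.max_extra_delay + 1);
                    self.schedule(id, delay);
                }
            }
        }
        finished
    }
}

/// A task that hits `yields` yield points before finishing.
fn task_with_yields(yields: u32) -> Task {
    let mut remaining = yields;
    Box::new(move || {
        if remaining == 0 {
            Step::Done
        } else {
            remaining -= 1;
            Step::Yield
        }
    })
}

fn simulate(seed: u64) -> Vec<(u64, usize)> {
    let mut exec = SimExecutor::new(seed, 5);
    for _ in 0..3 {
        exec.spawn(task_with_yields(2));
    }
    exec.run()
}
```

Different seeds produce different interleavings and completion times, while a given seed always replays the same schedule. Delays injected this way need no change to the code under test, whereas an explicit BUGGIFY block can still express scenario-specific behavior the executor cannot know about.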
