`Generator::seed` in `Generator::__destruct` may break determinism #870

themasch · 2024-03-13T11:55:51Z

Summary

TL;DR:

We observed that, when re-creating new Faker\Generators between test cases, a "random" PHP garbage collector run happening on object allocation triggers Faker\Generator::__destruct which changes the mt_rand seed, breaking further fixture generation in a non-deterministic way.

Context

I am not sure if I would consider this an actual bug of Faker. It's more that the specific use case makes it break in some (rare) conditions. We do not use Faker itself directly, but through the symfony integration of Alice.

We use it to generate test data for phpunit test. Alice populates our test database, using faker to fill in fields that need data, but not predefined specific values. Works great for snapshot test and thelike.

The Problem

With one test, we ran into an issue where the test would sometimes, on some systems (mainly CI) fail due to Faker generating different data for that run. Not always the "same different" data, not always the exact same test case, and not on every run.

Setup

The way its setup looks like this:

For each test case, we "boot" a fresh Symfony stack, recreating all services. In the test configuration, Faker\Generator is a service in that stack, used by Alice to populate the test data.

So on each test, we create all these instances, including a Faker\Generator, which also gets a predefined seed,
use Alice to read our fixture definitions and create objects from it, using Faker to populate properties,
start a fresh db transaction, insert that data into our test db, run tests against that DB, and compare the result with snapshots.
Then we rollback the whole transaction, teardown the symfony stack, and start the next test with a fresh stack, (that calls seed again) and a fresh clean DB.

probable root cause

After some debugging we found that sometimes Faker\Generator::__destruct is called while Alice creates the test data, on random object creations. I think that this is the PHP GC kicking in, claiming the Faker\Generator instance used in the previous test run.

This would also fit on the behaviour seen in #272

It also explains why we only see these problems in some cases, only on some hosts, and never when just running one single test alone.

Building a minimal test case I could reproduce that problem, and commenting out the seed() call in the __destruct also "fixed" the issue.

Conclusion

Our current workaround is to call gc_collect_cycles in the phpunit setUp method, before starting the new Symfony stack. This, as expected triggers Faker\Generator::__destruct and thereby seed before the new instance is created and sets the expected seed.

From our observation, that seems to have fixed our breaking tests.

But I am not sure if that really what we want to do here, and in all tests that use Alice/Faker.
The reason these exact test cases failed, and all others using the same mechanism did not, I can only guess.
Most likely it is just the amount of data. We usually try to limit test data to a minimum, so do not created tens or hundreds of entities with it.
For the new one, we required more data, and therefor I figure we just pushed it "over the edge" where this test case would have a considerably higher chance of triggering a GC while building fixtures than every other test.

So for me, there are a few different ways this can be resolved:

either add gc_collect_cycles to every test to be run before building the new stack, or even add it to the kernel itself.
or use Faker differently, preventing multiple instances of if in the first place. Not trivial with the way symfony works in the test cases, and not sure if I like all implications of that.
do not seed on __destruct. Breaking change, not sure about all implications of that, seems like rats nest of problems.
?? maybe we are holding something very wrong and all of this is just me being stupid? Always an option!

I file the issue here, and not in on of the upstream projects, because I have to start somewhere and Faker seemed like the correct place.

Versions

	Version
PHP	>8.2.9 (also noticed with 8.2.16 and 8.3.2)
`fakerphp/faker`	1.23.1

Self-enclosed code snippet for reproduction

<?php

declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

$pn1 = null;
$pn2 = null;

for ($i = 0; $i < 3; $i++) {
    $faker = Faker\Factory::create('en_US');
    $faker->seed(1234567);

    $n1 = $faker->numberBetween(1000, 10000);
    if ($pn1 !== null && $pn1 !== $n1) {
        echo sprintf('DELTA in N1: previous: %d, this time: %d, iteration: %d', $pn1, $n1, $i) . PHP_EOL;
    }
    $pn1 = $n1;

    gc_collect_cycles(); // force GC run

    $n2 = $faker->numberBetween(1000, 10000);
    if ($pn2 !== null && $pn2 !== $n2) {
        echo sprintf('DELTA in N2: previous: %d, this time: %d, iteration: %d', $pn2, $n2, $i) . PHP_EOL;
    }
    $pn2 = $n2;

    var_dump([$n1, $n2]);
}

I use gc_collect_cycles to urge PHP to collect the dangling Generator instance here. In reality, this happens "somewhere" down the line as more and more new object are allocated.

Expected output (with PHP 8.3 and thus MT_RAND_MT19937)

array(2) {
  [0]=>
  int(2429)
  [1]=>
  int(5714)
}
array(2) {
  [0]=>
  int(2429)
  [1]=>
  int(5714)
}
array(2) {
  [0]=>
  int(2429)
  [1]=>
  int(5714)
}

Actual output

array(2) {
  [0]=>
  int(2429)
  [1]=>
  int(5714)
}
DELTA in N2: previous: 5714, this time: 3052, iteration: 1
array(2) {
  [0]=>
  int(2429)
  [1]=>
  int(3052)
}
DELTA in N2: previous: 3052, this time: 2717, iteration: 2
array(2) {
  [0]=>
  int(2429)
  [1]=>
  int(2717)
}

The text was updated successfully, but these errors were encountered:

coajaxial · 2024-03-19T13:30:32Z

I have the exact same problem when using zenstruck/foundry! When creating around 50+ Fixtures using a createMany() call, the seeding started to break. It took me a while to find out it was the __destruct call from some previous Generator instances, that get created by either booting the kernel more then one time in a single test, but also because Foundry creates some Generator instances on the fly (I didn't investigate why). Thanks for the detailed bug report and I hope this will get fixed soon :)

themasch linked a pull request Mar 18, 2024 that will close this issue

only set a new random seed if _the last_ Generator gets destruct'ed #871

Open

7 tasks

jdecool mentioned this issue Jun 20, 2024

Is it maintained ? #887

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`Generator::seed` in `Generator::__destruct` may break determinism #870

`Generator::seed` in `Generator::__destruct` may break determinism #870

themasch commented Mar 13, 2024

coajaxial commented Mar 19, 2024

Generator::seed in Generator::__destruct may break determinism #870

Generator::seed in Generator::__destruct may break determinism #870

Comments

themasch commented Mar 13, 2024

Summary

TL;DR:

Context

The Problem

Setup

probable root cause

Conclusion

Versions

Self-enclosed code snippet for reproduction

Expected output (with PHP 8.3 and thus MT_RAND_MT19937)

Actual output

coajaxial commented Mar 19, 2024

`Generator::seed` in `Generator::__destruct` may break determinism #870

`Generator::seed` in `Generator::__destruct` may break determinism #870