Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GUID scrubber not handling GUIDs between letters and/or numbers #1316

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Crown0815
Copy link

I was using AutoFixture which produces strings consisting of "constantPrefixPlusGUID".
I was surprised that the GUID scrubber did not scrub those GUIDs and found that the algorithm to detect the GUIDs could be improved.

Please let me know if any other changes need to be made.

@SimonCropp
Copy link
Member

i understand u r generating random data. but do you actually have any real scenario where a guid would appear in text without some kind of delineation?

@Crown0815
Copy link
Author

Crown0815 commented Oct 10, 2024

@SimonCropp, our scenario is to verify the structure of classes/records which are used in persistence. For this purpose we automatically collect all of these classes and generate a random instance using AutoFixture. Our classes/records contain a substantial amount of strings. AutoFixture will set those to "propertyNameGUID" (this is simply how AutoFixture does it).

For me this is a very "real" scenario.

Did I miss something about the specification of the scrubber? I assumed, it is supposed to detect GUIDs at any place in a string? Are you worried about false positives?

@SimonCropp
Copy link
Member

i understand your use of autofixture. but does that format of a guid ever appear in your production data. if not then testing, and scrubbing, for that scenario is not a valid test

@Crown0815
Copy link
Author

Crown0815 commented Oct 12, 2024

TLDR; No, the format of unseparated GUIDs does not appear in our production data. But the test I am working on is still a valid test.

—-

The purpose of the test is to document/verify the structure of a generic example (I.e. with randomized strings, and other randomized values) of each of our persistable classes/records as it is written by a 3rd party library. For this we use our full persistence system and then read the raw entries from the database. These read entries are then given to Verifier.Verify.

The most efficient way to generate the generic examples is the usage of AutoFixture. AutoFixture does not add any separator, but it otherwise produces perfect test cases, because it takes property names into account. For example we have Dictionary<string, string>, for which keys have to be unique but deterministic. Using AutoFixture plus the (corrected) GUID-Scrubber is perfect for this purpose.

Alternatively, we could of course scan the strings beforehand for GUIDs and replace them ourselves, but I thought that is exactly what the scrubber is for.

I am probably missing something, but why does it matter what symbols the GUIDs are contained in? The identification of the GUIDs is perfectly deterministic due to the identification by the ‚-‚. There is virtually no chance of a false positive and the synergy with AutoFixture allows for amazing usage of AutoFixture and Verfier.Verfiy in Architecture tests (like ours).

If we would create some data which would use a letter to separate the GUIDs, would that be a more valid usecase?

Again, I am sure I am missing something, so I am curious to know: in which situation would one have a GUIDs within a string framed with letters/numbers and not want it scrubbed ? Especially since the GUID scrubber is opt in.

To be honest, I was very confused when the scrubber did not detect the GUIDs in our example data. That is why two of us went looking and thought we discovered a bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants