Feature srandom - A seeded random number #2252

Closed
wants to merge 1 commit into from


holroy
Contributor

@holroy holroy commented Feb 28, 2024

A function generating seeded random numbers, allowing lists to be sorted in a random but predictable fashion.

This function lets a query return random items: sort on the random number, then use LIMIT to get the number of random results you want. Because of the seed, which could be a date like YYYY-MM-DD, the random result stays consistent until the source list or the date actually changes.

A simple example query to get three random links:

```dataview
LIST FLATTEN srandom(dateformat(date(today), "yyyy-MM-dd")) as randomValue
SORT randomValue
LIMIT 3
```

I originally hoped to put the expression directly into SORT ... but it requires a field, so it seems I have to use the intermediate FLATTEN ... as randomValue to turn it into that field. But it does work consistently.

@blacksmithgu
Owner

I like this idea but I worry that slight changes to the execution order of function calls (such as more calls from more data, or maybe a patch in dataview shifting around execution order when executing queries) will make this frustrating to use.

As an alternative, maybe just a random hash function? One that takes hash(<seed>, <value>) and produces a pseudo-random value for it. To use it, you would probably provide file names and dates, like hash(date(today), file.name) to get a consistent value you could then sort on.
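
A minimal sketch of that hash-and-sort idea in plain JavaScript, outside of Dataview. The hash here is 32-bit FNV-1a, used only as a small stand-in; `seededHash` and `pickRandom` are hypothetical names, not Dataview functions, and this is not the implementation that was eventually merged.

```js
// FNV-1a (32-bit): a small, well-known string hash used as a stand-in here.
// It hashes UTF-16 code units, which matches byte-wise FNV-1a for ASCII input.
function fnv1a(text) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime, exact 32-bit multiply
  }
  return h >>> 0; // force unsigned 32-bit
}

// Hypothetical helper: maps (seed, value) to a float in [0, 1).
function seededHash(seed, value) {
  return fnv1a(`${seed}\u0000${value}`) / 2 ** 32;
}

// Hypothetical helper: sort by hash (name as tie-break), then take `count`.
function pickRandom(names, seed, count) {
  return [...names]
    .sort((a, b) =>
      seededHash(seed, a) - seededHash(seed, b) || (a < b ? -1 : a > b ? 1 : 0))
    .slice(0, count);
}

const notes = ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"];
const picks = pickRandom(notes, "2024-02-28", 3);
console.log(picks);
```

Each value depends only on the seed and the name, so adding or reordering rows does not change the value of any other row; only the seed (for example, the date) does.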

@GottZ
Contributor

GottZ commented Mar 17, 2024

hm. I consider these algos to be too costly for generating pseudo randomness.
Especially since we are talking about JavaScript here.
cyrb etc. should only be used for seed derivation, as it's quite slow, while something like the following or sfc32 can then take over.

just for reference, I've done some tech demos before that are basically based on this prandom gen concept:

```js
const generator = (seed) => {
  const func = () => {
    seed *= 1103515245 + 12345;
    seed--;
    return (seed %= Number.MAX_SAFE_INTEGER) / Number.MAX_SAFE_INTEGER;
  };
  func();
  return func;
}

const rand = generator(1337);
console.log(rand(), rand(), rand()); // 0.37070590492482164 0.860457525371235 0.2702827315720024
```

> I like this idea but I worry that slight changes to the execution order of function calls (such as more calls from more data, or maybe a patch in dataview shifting around execution order when executing queries) will make this frustrating to use.

@blacksmithgu well.. that's literally the functionality pseudo randomness guarantees. it's a feature.
one can obviously branch a new generator off of one's output, to create a separate path in parallel.
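
A rough sketch of what branching could look like. The generator is a mulberry32-style one, swapped in for the demo generator above because its output is a scrambled copy of its state; `makeGenerator` and `branch` are hypothetical names, and none of this is code from the PR.

```js
// mulberry32-style generator: 32-bit state, scrambled 32-bit output.
// This is a stand-in for illustration, not the generator from this PR.
function makeGenerator(seed) {
  let state = seed | 0;
  return () => {
    state = (state + 0x6d2b79f5) | 0;
    let t = Math.imul(state ^ (state >>> 15), state | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 2 ** 32; // float in [0, 1)
  };
}

// Hypothetical helper: seed a child generator from one parent output.
function branch(parent) {
  return makeGenerator(Math.floor(parent() * 2 ** 32));
}

const rand = makeGenerator(1337);
const rowOrder = branch(rand); // separate stream with its own state
console.log(rand(), rowOrder(), rowOrder());
```

Once branched, the child keeps its own state, so extra calls on the parent no longer shift the values the child produces.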

@holroy
Contributor Author

holroy commented Mar 17, 2024

I'm not quite sure I understand your concern regarding the execution order due to this being a random function. But then again, the random number generated for a given row will change if the execution order of the rows changes.

On the other hand, using just a hash function like the cyrb128 which I've already implemented could simplify the logic so I wouldn't need to tap into the evaluation context and cache the generator function.

Should we close this PR, and create another for that hash implementation and documentation of it?

@GottZ
Contributor

GottZ commented Mar 17, 2024

hm. technically @blacksmithgu has a point tho.

list(srandom("2024-02-28"), srandom("2024-02-28"), srandom("2024-02-28")) would indeed output different values on each call.

but.. since we are talking about dql here, how else could it be implemented then?
people obviously don't want to type boilerplate in short queries, just to have a reference to a prand instance like in my example.
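
To make the distinction concrete, here is a sketch under one assumption: the evaluation context caches a single generator per seed for each query run. `makeLcg` and `runQuery` are hypothetical names, the numeric seed stands in for the date string, and this is one reading of the behaviour rather than the PR's code.

```js
// Exact 32-bit LCG using the constants quoted earlier in the thread.
function makeLcg(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1103515245) + 12345) >>> 0;
    return state / 2 ** 32; // float in [0, 1)
  };
}

// Hypothetical evaluation context: one cached generator per seed per run.
function runQuery(seed, calls) {
  const cache = new Map();
  const srandom = (s) => {
    if (!cache.has(s)) cache.set(s, makeLcg(s));
    return cache.get(s)();
  };
  return Array.from({ length: calls }, () => srandom(seed));
}

const first = runQuery(20240228, 3);
const second = runQuery(20240228, 3);
console.log(first); // three different values within one run
console.log(JSON.stringify(first) === JSON.stringify(second)); // true
```

Within one run each call advances the shared state, so the three values differ; across runs the sequence repeats, which would explain tests that keep seeing the same output.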

@holroy
Contributor Author

holroy commented Mar 17, 2024

> list(srandom("2024-02-28"), srandom("2024-02-28"), srandom("2024-02-28")) would indeed output different values on each call.

How did you get that to return different values? In all of my tests it consistently returned the same values over and over again, and even the unit tests have a very similar setup.

@holroy
Contributor Author

holroy commented Mar 17, 2024

I'm going to build another PR using just a hash function, and it's going to be a slightly simpler version of cyrb128. And even though we don't need a seeded random number and can use a random hash instead, I still believe we need to use some unique text, not a number, as the base of the hash.

I set up https://jsperf.app/bowuze to compare some variations of a hash function. Feel free to extend it with other variants, but as it currently stands, cyrb53 seems to be the fastest variant.

If (or when) the other PR is created and no one opposes it, this PR can be closed and the other one integrated into master. :-D

@GottZ
Contributor

GottZ commented Mar 17, 2024

> How did you get that to return different values? In all of my tests, and even the unit tests has a very similar setup where it consistently returned the same values over and over again.

na don't worry about it. I wasn't talking about true randomness. I was just referring to the way this keeps track of nonces

@blacksmithgu
Owner

I've merged the related hash() function - I'm not caught up on the discussion here, but are we good to close this one out?

@holroy
Contributor Author

holroy commented Mar 21, 2024

The linked PR has been pushed into master, with similar, but simpler, logic, so we can then close down this PR.

@holroy holroy closed this Mar 21, 2024
@holroy holroy deleted the feature-seeded-random branch March 21, 2024 21:27