Implement Asimov dataset creation for unbinned models #576

ikrommyd · 2024-07-24T21:23:30Z

There is also a way to create Asimov datasets from unbinned models either by:

Making a binning on the spot
Generating it from many weighted unbinned events.
Check the combine docs for more info: https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest/part3/runningthetool/#toy-data-generation
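
To make the first option concrete, here is a minimal sketch of a binned Asimov dataset built "on the spot" from an unbinned model. It is not zfit or combine code: the Gaussian model, the helper names `gauss_cdf` and `binned_asimov`, and the choice of equal-width bins are all assumptions made for illustration. Each bin gets the expected (non-fluctuated) count, i.e. the expected yield times the model's integral over that bin.

```python
import math


def gauss_cdf(x, mu, sigma):
    """CDF of a normal distribution, expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def binned_asimov(n_expected, mu, sigma, lower, upper, n_bins):
    """Expected counts per bin for a Gaussian truncated to [lower, upper].

    Hypothetical helper: equal-width bins are an illustrative choice only.
    """
    width = (upper - lower) / n_bins
    edges = [lower + i * width for i in range(n_bins + 1)]
    # Normalise to the fit range so the counts sum to n_expected.
    norm = gauss_cdf(upper, mu, sigma) - gauss_cdf(lower, mu, sigma)
    counts = [
        n_expected * (gauss_cdf(hi, mu, sigma) - gauss_cdf(lo, mu, sigma)) / norm
        for lo, hi in zip(edges[:-1], edges[1:])
    ]
    return edges, counts


edges, counts = binned_asimov(
    n_expected=1000.0, mu=0.0, sigma=1.0, lower=-5.0, upper=5.0, n_bins=50
)
```

The counts are generally non-integer, which is expected for an Asimov dataset: they are the model's expectation, not a fluctuated toy.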

jonas-eschle · 2024-07-25T16:33:43Z

That's an interesting issue, because hepstats currently creates a binned Asimov dataset even for unbinned data, which is not optimal and should not happen. One possibility would be to use a very high-statistics dataset instead.

The question is a bit conceptual: what is needed?
Creating a binned dataset is probably already possible with to_binned and to_binneddata (I think, right?)

What are the weighted unbinned events, and why are they weighted? I'm not sure where the weights come from.

It also reminds me of another discussion about the "best binning", as we're doing a lot of unbinned fits in LHCb that could, in principle, be binned. So implementing something like this https://arxiv.org/abs/2210.02848 could be useful.

I guess this is already possible to do today; hepstats should have an automatic binning, or zfit itself.
The caveat is that it isn't as easily accessible to the user as it maybe should be. It is accessible, so the question is more how to communicate this to the user?

ikrommyd · 2024-07-26T15:12:59Z

I was just looking at the zfit code (not hepstats). Cool, so maybe a shortcut for to_binned -> to_binneddata would be nice and easy to have for unbinned models. For the weighted unbinned events, look at the "Pseudo-Asimov" dataset in the combine docs I linked above.
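
As a rough illustration of where the weights could come from, here is a sketch of the weighted unbinned idea: generate many more events than the expected yield, then give each event the same weight so that the weights sum to the expected yield. This is only my reading of the "Pseudo-Asimov" idea, not combine's actual implementation; the Gaussian model, the function name `pseudo_asimov`, and the uniform weight `n_expected / n_generated` are assumptions.

```python
import random


def pseudo_asimov(n_expected, n_generated, mu, sigma, seed=0):
    """Draw n_generated events and weight them to sum to n_expected.

    Hypothetical helper: uniform weights are an assumption of this sketch.
    """
    rng = random.Random(seed)
    events = [rng.gauss(mu, sigma) for _ in range(n_generated)]
    # Oversampling shrinks the statistical fluctuations of the sample, while
    # the per-event weight keeps the total yield at the expected value.
    weight = n_expected / n_generated
    weights = [weight] * n_generated
    return events, weights


events, weights = pseudo_asimov(
    n_expected=1000.0, n_generated=200_000, mu=0.0, sigma=1.0
)
total = sum(weights)
weighted_mean = sum(w * x for w, x in zip(weights, events)) / total
```

The larger `n_generated` is relative to `n_expected`, the closer the weighted sample gets to the model expectation, at the cost of evaluating the likelihood on more events.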

When it comes to visibility: if I search the code for the word "Asimov", I find to_binneddata. However, if I hadn't searched like that, I would have expected something like model.create_asimov, or even something on the sampler. Since we do sampler = model.create_sampler() to generate toys, the sampler could have a method besides resample, called make_asimov or something like that.

jonas-eschle · 2024-08-02T14:41:21Z

> I would expect something like model.create_asimov

I think this is a crucial difference between having a nicely named API and having good enough docs: the problem with adding this is that expectations may differ. Should it be binned or unbinned? What is more crucial, I think, is to have a place where this is explained.

How did you come across "asimov", just to collect a bit of data?

And agreed, the to_binneddata shortcut wouldn't hurt!

ikrommyd added the discussion For discussions on concepts and ideas label Jul 24, 2024

ikrommyd changed the title ~~Implement Asimov dataset creation for unbinned models as well.~~ Implement Asimov dataset creation for unbinned models Jul 24, 2024
