Implement Asimov dataset creation for unbinned models #576

Open
ikrommyd opened this issue Jul 24, 2024 · 3 comments
Labels
discussion For discussions on concepts and ideas

Comments

@ikrommyd
Contributor

ikrommyd commented Jul 24, 2024

There is also a way to create Asimov datasets from unbinned models, either by:

  1. Making a binning on the spot.
  2. Generating it from many weighted unbinned events.

Check the combine docs for more info: https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest/part3/runningthetool/#toy-data-generation
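To make option 1 concrete, here is a minimal sketch of "binning on the spot" for a one-dimensional Gaussian model. It uses only the Python standard library and is not zfit, hepstats, or combine code; the helper names `gauss_cdf` and `binned_asimov` are made up for illustration. The idea is that each bin of the Asimov dataset holds the exact expected (generally non-integer) yield, i.e. the total expected yield times the integral of the normalized pdf over that bin.

```python
import math


def gauss_cdf(x, mu, sigma):
    """CDF of a normal distribution, using only the standard library."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def binned_asimov(cdf, lower, upper, nbins, n_expected):
    """Expected counts per bin for a pdf restricted to [lower, upper].

    Hypothetical helper: `cdf` is any cumulative distribution function,
    `n_expected` is the total expected yield inside the range.
    """
    width = (upper - lower) / nbins
    edges = [lower + i * width for i in range(nbins + 1)]
    norm = cdf(upper) - cdf(lower)  # normalize the pdf to the fit range
    counts = [
        n_expected * (cdf(hi) - cdf(lo)) / norm
        for lo, hi in zip(edges[:-1], edges[1:])
    ]
    return edges, counts


edges, counts = binned_asimov(
    lambda x: gauss_cdf(x, 0.0, 1.0), lower=-5.0, upper=5.0, nbins=50, n_expected=1000.0
)
```

The counts carry no statistical fluctuation by construction, which is what distinguishes an Asimov dataset from a toy. The choice of binning is still left to the user here.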
@ikrommyd ikrommyd added the discussion For discussions on concepts and ideas label Jul 24, 2024
@ikrommyd ikrommyd changed the title from "Implement Asimov dataset creation for unbinned models as well." to "Implement Asimov dataset creation for unbinned models" Jul 24, 2024
@jonas-eschle
Contributor

That's an interesting issue, because hepstats currently creates a binned Asimov dataset even for unbinned data, which is not optimal and should not happen. A very high-statistics unbinned dataset could be a possibility instead.

The question is a bit conceptual: what is needed?
Creating a binned dataset is probably already possible with to_binned and to_binneddata (I think, right?)

What are the weighted unbinned events, and why weighted? I'm not sure where the weights are coming from.

And it reminds me of another discussion about the "best binning", as we're doing a lot of unbinned fits in LHCb that could, in principle, be binned. So implementing something like this https://arxiv.org/abs/2210.02848 could be useful.

I guess these things are already possible to do today; hepstats, or zfit itself, should have an automatic binning.
Beyond that, it isn't as easily accessible to the user as it maybe should be. It is possible, but the question is more how to communicate this to the user.

@ikrommyd
Contributor Author

ikrommyd commented Jul 26, 2024

I was just looking at the zfit code (not hepstats). Cool, so maybe a shortcut for to_binned -> to_binneddata would be nice and easy to have for unbinned models. For the weighted unbinned events, look at the "Pseudo-Asimov" dataset in the combine docs I linked above.
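As a rough illustration of where the weights could come from, here is one possible reading of "many weighted unbinned events"; it is an assumption, not a description of how combine actually implements its Pseudo-Asimov dataset. Draw far more events than the expected yield and give each one the weight `n_expected / n_events`, so the weights sum to the expected yield while the sampling fluctuations shrink as `n_events` grows. The function name `pseudo_asimov` and its parameters are hypothetical.

```python
import random


def pseudo_asimov(sample_one, n_expected, n_events, seed=0):
    """Return (events, weights) for a weighted, high-statistics unbinned dataset.

    Hypothetical helper: `sample_one(rng)` draws a single event from the model.
    """
    rng = random.Random(seed)
    events = [sample_one(rng) for _ in range(n_events)]
    weight = n_expected / n_events  # every event carries the same small weight
    return events, [weight] * n_events


events, weights = pseudo_asimov(
    lambda rng: rng.gauss(0.0, 1.0), n_expected=1000.0, n_events=200_000
)

total = sum(weights)
mean = sum(w * x for w, x in zip(weights, events)) / total
variance = sum(w * (x - mean) ** 2 for w, x in zip(weights, events)) / total
```

With 200,000 events the weighted mean and variance land close to the model values, but the dataset is only approximately Asimov: some residual fluctuation always remains.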

When it comes to visibility, if I search the code for the word "Asimov", I would find to_binneddata. However, if I didn't search like that, I would expect something like model.create_asimov, or even something that is part of the sampler. Since we do sampler = model.create_sampler() to generate toys, we could have a sampler method other than resample, called make_asimov or something like that.
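A toy sketch of what such a sampler interface might look like, purely to make the proposal concrete. `ToyGaussSampler` is a hypothetical stand-in and not zfit's sampler class; only the method names `resample` and `make_asimov` come from the discussion above. For `make_asimov` this sketch swaps in a deterministic variant: it places equally weighted events at evenly spaced quantiles of the pdf, which is one of several possible definitions of an unbinned Asimov-like dataset.

```python
import random
from statistics import NormalDist


class ToyGaussSampler:
    """Hypothetical sampler for a Gaussian model (not the zfit API)."""

    def __init__(self, mu, sigma, n_expected, seed=0):
        self._dist = NormalDist(mu, sigma)
        self._n_expected = n_expected
        self._rng = random.Random(seed)

    def resample(self):
        """Draw a fresh, fluctuating toy dataset."""
        n = round(self._n_expected)
        return [self._rng.gauss(self._dist.mean, self._dist.stdev) for _ in range(n)]

    def make_asimov(self, n_events=1000):
        """Return deterministic (events, weights) placed at pdf quantiles."""
        xs = [self._dist.inv_cdf((i + 0.5) / n_events) for i in range(n_events)]
        weight = self._n_expected / n_events  # weights sum to the expected yield
        return xs, [weight] * n_events


sampler = ToyGaussSampler(mu=0.0, sigma=1.0, n_expected=500.0)
toy = sampler.resample()
asimov_x, asimov_w = sampler.make_asimov(n_events=1000)
```

Calling `make_asimov` twice returns the same dataset, whereas each `resample` call gives a different toy. Whether the method should return binned or unbinned data is exactly the open design question in this thread.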

@jonas-eschle
Contributor

I would expect something like model.create_asimov

I think this is a crucial difference between having a nicely named API and good enough docs: the problem with adding this is that the expectations may differ. Should it be binned or unbinned? What is more crucial, I think, is to have a place where this is explained.

How did you come across "asimov", just to collect a bit of data?

And agreed, having to_binneddata as a shortcut wouldn't harm!

2 participants