Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added HPD calculation to ensemble #1431

Open
wants to merge 13 commits into
base: develop
Choose a base branch
from

Conversation

shoepfl
Copy link
Contributor

@shoepfl shoepfl commented Jul 17, 2024

The ensemble from_sample method now has an additional rel_cutoff argument. rel_cutoff allows the posterior to be cut to the alpha % highest density (here, non-normalized posterior probability).

I checked the code with my workflow, and it worked. Anyway, I would be happy to receive suggestions for improvement.

Adresses #1117

The ensemble from_sample method has now an additional rel_cutoff argument. rel_cutoff allwos to cut the posterior to the alpha % highest density (here non-normalized posterior probability).
@codecov-commenter
Copy link

codecov-commenter commented Jul 17, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 95.23810% with 1 line in your changes missing coverage. Please review.

Project coverage is 83.45%. Comparing base (fd652e8) to head (16c1c25).

Files with missing lines Patch % Lines
pypesto/ensemble/ensemble.py 95.23% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1431      +/-   ##
===========================================
+ Coverage    83.43%   83.45%   +0.02%     
===========================================
  Files          160      160              
  Lines        13473    13494      +21     
===========================================
+ Hits         11241    11262      +21     
  Misses        2232     2232              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

fixes print statement that was only used for debugging
@dilpath
Copy link
Member

dilpath commented Jul 17, 2024

@shoepfl Thanks very much!

@PaulJonasJost any thoughts on how to consolidate the two different cutoff methods? With this PR, using rel_cutoff with Ensemble.from_optimization_* will give chi2-based cutoffs, but Ensemble.from_sample will give hpd-based cutoffs. Ideally, the user would be able to choose. Should we create some Cutoff class that can be used like so?

class Cutoff:
    ...
    def __call__(self, xs, fs): # or rather pass a list of vectors and their fvals?
        return self.apply_cutoff(xs=xs, fs=fs)
    @abstractmethod
    def apply_cutoff(self, xs=xs, fs=fs):
        ...  # specific rel/chi2/hpd cutoff method here
class RelCutoff(Cutoff): ...
class Chi2Cutoff(Cutoff): ...
class HpdCutoff(Cutoff): ...

cutoff = Chi2Cutoff(percentile=...)

ensemble = Ensemble.from_sample(result=result, cutoff=cutoff)

and inside Ensemble.from_sample:

def from_sample(..., cutoff: Cutoff):
    ...
    x_vectors = cutoff(xs=result.sample_result.trace_x[0, burn_in:, :], fs=result.sample_result.trace_fval[0, burn_in:])
    ...

We could also integrate things like remove_burn_in and chain_slice (see Ensemble.from_sample) into Cutoff.

Copy link
Collaborator

@PaulJonasJost PaulJonasJost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the addition. I think a test should be added as well, if you want we can help with that.

pypesto/ensemble/ensemble.py Outdated Show resolved Hide resolved
pass
else:
x_vectors = calculate_hpd(result=result, burn_in=burn_in, ci_level=rel_cutoff)

if chain_slice is not None:
x_vectors = x_vectors[chain_slice]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general remark, but it is not quite clear whether this is the desired order, it might be that we want to first slice and then calculate hpd. But that is only a minor thing here i guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, I would rather calculate the HPD first and then take a slice, as the HPD should become more precise that way; computationally, it's not an issue. But if there are other reasons to change the order, that would also be fine with me.

pypesto/ensemble/ensemble.py Outdated Show resolved Hide resolved
pypesto/ensemble/ensemble.py Outdated Show resolved Hide resolved
@PaulJonasJost
Copy link
Collaborator

@shoepfl Thanks very much!

@PaulJonasJost any thoughts on how to consolidate the two different cutoff methods? With this PR, using rel_cutoff with Ensemble.from_optimization_* will give chi2-based cutoffs, but Ensemble.from_sample will give hpd-based cutoffs. Ideally, the user would be able to choose. Should we create some Cutoff class that can be used like so?

class Cutoff:
    ...
    def __call__(self, xs, fs): # or rather pass a list of vectors and their fvals?
        return self.apply_cutoff(xs=xs, fs=fs)
    @abstractmethod
    def apply_cutoff(self, xs=xs, fs=fs):
        ...  # specific rel/chi2/hpd cutoff method here
class RelCutoff(Cutoff): ...
class Chi2Cutoff(Cutoff): ...
class HpdCutoff(Cutoff): ...

cutoff = Chi2Cutoff(percentile=...)

ensemble = Ensemble.from_sample(result=result, cutoff=cutoff)

and inside Ensemble.from_sample:

def from_sample(..., cutoff: Cutoff):
    ...
    x_vectors = cutoff(xs=result.sample_result.trace_x[0, burn_in:, :], fs=result.sample_result.trace_fval[0, burn_in:])
    ...

We could also integrate things like remove_burn_in and chain_slice (see Ensemble.from_sample) into Cutoff.

I think as it is currently used rel_cutoff here might be renamed. If i understand your suggestion correctly, the main advantage would be a reduction in direct parameters when calling Ensemble.from_...with a potential easier overview? Removing burn in and chain slice are quite sample specific, i am not entirely sure whether from optimization could straightforward be used as equivalents. But the idea itself sounds good.

shoepfl and others added 2 commits July 23, 2024 16:24
Adds default value for burn_in

Co-authored-by: Paul Jonas Jost <[email protected]>
Fixes wrong default value for the ci_level

Co-authored-by: Paul Jonas Jost <[email protected]>
@shoepfl
Copy link
Contributor Author

shoepfl commented Jul 23, 2024

@PaulJonasJost, a test sounds good. Indeed, it would be nice if you could assist here, as there may already be similar test cases in the toolbox that we could add to.

@PaulJonasJost
Copy link
Collaborator

@PaulJonasJost, a test sounds good. Indeed, it would be nice if you could assist here, as there may already be similar test cases in the toolbox that we could add to.

I will look into it and write a test if you are fine with me editing the branch 👍🏼

@shoepfl
Copy link
Contributor Author

shoepfl commented Jul 25, 2024

yes of course, and if I can help you can just assign me some tasks

@PaulJonasJost
Copy link
Collaborator

added the test and renamed. Now i guess someone else has to review (@dilpath)?

Copy link
Member

@dilpath dilpath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good 👍

Independently of a re-write into classes like HpdCutoff as discussed above, I think it would already be valuable to change calculate_hpd to instead take parameter vectors, and their objective function values (i.e., objects like result.sample_result.trace_x[0] and result.sample_result.trace_fval instead of result). Currently, calculate_hpd handles burn-in and is only applicable to result.sample_result objects, which seems suboptimal, since the logic should apply to any collection of vectors and their functional values.

Anyway, fine to merge as-is, since these Ensemble.from_... methods need some work anyway.

Comment on lines 607 to 608
else:
burn_in = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually just write burn_in = 0 above the if statement.

Comment on lines 610 to 613
# added cutoff
if f_quantile is None:
pass
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# added cutoff
if f_quantile is None:
pass
else:
# added cutoff
if f_quantile is not None:

pypesto/ensemble/ensemble.py Outdated Show resolved Hide resolved
pypesto/ensemble/ensemble.py Outdated Show resolved Hide resolved
pypesto/ensemble/ensemble.py Outdated Show resolved Hide resolved
pypesto/ensemble/ensemble.py Outdated Show resolved Hide resolved
pypesto/ensemble/ensemble.py Outdated Show resolved Hide resolved
pypesto/ensemble/ensemble.py Outdated Show resolved Hide resolved
pypesto/ensemble/ensemble.py Outdated Show resolved Hide resolved
pypesto/ensemble/ensemble.py Outdated Show resolved Hide resolved
pypesto/ensemble/ensemble.py Outdated Show resolved Hide resolved
pypesto/ensemble/ensemble.py Outdated Show resolved Hide resolved
pypesto/ensemble/ensemble.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants