PC prior distribution for Student T dof #252
base: main
Conversation
def tri_gamma_approx(x):
it is already implemented
This approximation will be much more performant
I saw you added trigamma recently, I'll give that a try. I used this approx because at the time the gradient wasn't implemented yet, whereas the gradient for the approx is easy. Wasn't concerned with performance at the time, but will take another look.
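For reference, a hypothetical sketch of the kind of closed-form approximation being discussed (the code below is illustrative, not necessarily the exact tri_gamma_approx in this PR): the standard asymptotic series for the trigamma function, whose gradient is elementary to write down by hand.

```python
import numpy as np
from scipy.special import polygamma


def tri_gamma_approx(x):
    # Standard asymptotic series for the trigamma function psi_1(x);
    # every term is elementary, so the derivative is easy to write by hand.
    return (
        1 / x
        + 1 / (2 * x**2)
        + 1 / (6 * x**3)
        - 1 / (30 * x**5)
        + 1 / (42 * x**7)
    )


x = np.array([3.0, 10.0, 100.0])
print(tri_gamma_approx(x))  # series approximation
print(polygamma(1, x))      # exact trigamma, for comparison
```

For moderate-to-large x the two agree to several decimal places, which is the regime that matters for a degrees-of-freedom parameter.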
@@ -216,3 +226,62 @@ def moment(rv, size, mu, sigma, xi):
    if not rv_size_is_none(size):
        mode = pt.full(size, mode)
    return mode


class PCPriorStudentT_dof_RV(RandomVariable):
needs a docstring
Usually we don't document the RV, but the Distribution class, which doesn't have a docstring either
NU_MIN = 2.0 + 1e-6
nu = np.concatenate((np.linspace(NU_MIN, 2.4, 2000), np.linspace(2.4 + 1e-4, 4000, 10000)))
return UnivariateSpline(
    studentt_kld_distance(nu).eval()[::-1],
Having an eval is a bit dangerous. If it comes up from an RV you're going to get a random value. The safe thing to do is to constant_fold and raise if it can't be done.
Or create a PyTensor Op that wraps UnivariateSpline.
don't we have such an op?
@ricardoV94 It only comes from nu, which is passed in above as a fixed numpy array, so I think eval is safe here (unless I'm missing your point). I'm using this to get a spline approximation to the inverse of this function, which is what the [::-1] bit at the end of the inputs is about.

Thanks @ferrine, will look into that. I remember needing to use UnivariateSpline this way because I needed this particular behavior ("if ext=3 of ‘const’, return the boundary value.") as nu goes to infinity.
Ah I missed the inputs were constant, nvm on my end
If it's always a known constant could you use .data instead of .eval()?
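For anyone skimming this thread, a minimal standalone sketch of the pattern under discussion: fit a spline to the *inverse* of a monotonically decreasing function by swapping the axes (the [::-1] reversal keeps the spline's x-values increasing), with ext=3 ("const") so queries outside the fitted range return the boundary value. The 1/nu stand-in below is only illustrative, not the actual studentt_kld_distance.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# A decreasing stand-in for studentt_kld_distance(nu): distance shrinks to 0
# as nu grows toward the normal limit.
nu = np.linspace(2.0 + 1e-6, 4000, 2000)
dist = 1.0 / nu

# Inverse map dist -> nu.  UnivariateSpline needs increasing x, hence [::-1];
# ext=3 ("const") clamps to the boundary value instead of extrapolating.
inv = UnivariateSpline(dist[::-1], nu[::-1], k=3, s=0, ext=3)

print(inv(1.0 / 10.0))  # ~10: recovers nu from its distance
print(inv(0.0))         # clamped to the boundary (~4000), the nu -> inf behavior
```

The ext=3 clamp is what gives the "return the boundary value as nu goes to infinity" behavior mentioned above.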
@classmethod
def get_lam(cls, alpha=None, U=None, lam=None):
    if (alpha is not None) and (U is not None):
        return -np.log(alpha) / studentt_kld_distance(U)
Suggested change:
-        return -np.log(alpha) / studentt_kld_distance(U)
+        return -pt.log(alpha) / studentt_kld_distance(U)
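For context, this line matches the usual PC-prior construction (Simpson et al.): an exponential with rate λ is placed on the distance scale d(ν) = studentt_kld_distance(ν), and the user-specified tail probability pins down λ. A sketch of the algebra, assuming d(ν) decreases toward 0 as ν → ∞ (which the [::-1] reversal above suggests):

$$
P(\nu < U) \;=\; P\bigl(d(\nu) > d(U)\bigr) \;=\; e^{-\lambda\, d(U)} \;=\; \alpha
\quad\Longrightarrow\quad
\lambda \;=\; -\frac{\log \alpha}{d(U)}
$$

which is exactly the -log(alpha) / studentt_kld_distance(U) computed in get_lam.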
To update on where this is at: tests were passing except on Windows and I couldn't figure out why. Not sure what here would depend on Windows, so I may need to dig into that in a VM or something to figure it out. Definitely some room to improve how some of this is structured, but it was working well enough that I was comfortable using it.
@bwengals the biggest difference with Windows generally is that it sometimes defaults to int32 dtypes where Linux defaults to int64, IIRC. Maybe something in the np.linspace overflows on Windows? (I didn't look at which test was even failing, so apologies if this is completely off target.)
Ah I bet that's it! The test produces inf and it shouldn't. Well, thanks for giving me a starting place.
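A small illustration of the dtype difference being described; it assumes NumPy < 2.0, where the default integer dtype is platform-dependent.

```python
import numpy as np

# On NumPy < 2.0 the default integer dtype follows the platform C long:
# int32 on Windows, int64 on Linux.  The same integer arithmetic can then
# silently wrap around on Windows, which is one way an identical test can
# produce different (e.g. inf/nan) results there.
x = np.array([100_000])  # int32 on Windows, int64 on Linux (NumPy < 2.0)
y = x * x                # 10**10 overflows int32 and wraps to a wrong value
print(y.dtype, y)
```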
@bwengals interested in updating this one?
Moving this over from here.
What is this PR about?
Adding a penalized complexity (PC) prior for the Student-T degrees of freedom parameter. Useful in models where the likelihood is normal but you need some robustness, so you switch to a Student-T likelihood. It's already implemented in INLA.
The reason this is useful for modeling is that you can "robustify" your Gaussian likelihood by making it a Student-T in a more principled way. You can use this prior to express, in a meaningful way, "I think there is a 50% (or 20%, or whatever) chance that the degrees of freedom is over 30 (~normal likelihood)". If you instead use a Gamma(2, 0.1), or worse, fix the degrees of freedom to some value, you risk watering down the information coming from the data via the likelihood.
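To make that concrete, a hypothetical usage sketch; the class name, import path, and (alpha, U) parameterization below are my assumptions based on the diff above, not the final API.

```python
import numpy as np
import pymc as pm

# Hypothetical import path and class name, inferred from PCPriorStudentT_dof_RV above.
from pymc_experimental.distributions import PCPriorStudentT_dof

data = np.random.normal(size=200)

with pm.Model():
    # Hypothetical parameterization: a tail-probability statement about the
    # degrees of freedom relative to a threshold, e.g. "a 20% chance that
    # nu is beyond 30, i.e. effectively a normal likelihood".
    nu = PCPriorStudentT_dof("nu", alpha=0.2, U=30)
    pm.StudentT("y", nu=nu, mu=0.0, sigma=1.0, observed=data)
```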
Outstanding issues

Should this subclass BoundedContinuous instead of PositiveContinuous? Went down that rabbit hole some already, might need some help here. Seems to work fine now for normal use, but I haven't tested too many weird edge cases.