-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix unsupported dtype from PyMC sampler warnings with ClickHouseBackend
#75
Conversation
5080fcd
to
5ead2f6
Compare
5ead2f6
to
16400d9
Compare
Codecov ReportBase: 95.46% // Head: 95.65% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #75 +/- ##
==========================================
+ Coverage 95.46% 95.65% +0.19%
==========================================
Files 9 9
Lines 617 644 +27
==========================================
+ Hits 589 616 +27
Misses 28 28
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
7be3a9f
to
13ec55e
Compare
In earlier versions of ClickHouse, the `UInt8` type had to be used, but now this is causing trouble with NumPy bool arrays.
13ec55e
to
eb639ca
Compare
@lhelleckes @qacwnfq this was a somewhat tricky PR. If you have a chance to review that's cool, but don't feel obliged to do so. Knowing that you two are short on time and not familiar with the codebase, I challenged myself to do the changes such that each commit passes the tests. The The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@@ -142,7 +142,9 @@ def setup( | |||
name, | |||
str(self.var_dtypes[name]), | |||
list(self.var_shapes[name]), | |||
dims=list(self.model.RV_dims[name]) if name in self.model.RV_dims else [], | |||
dims=list(self.model.named_vars_to_dims[name]) | |||
if name in self.model.named_vars_to_dims |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems a little unorthodox to use a ternary operator to compute the default arg. Couldn't it be just None and if it is None then it is computed? Just a nitpick though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I did this because it's in a list comprehension, and there's no way to conditionally NOT pass the kwarg
# This 👇 is needed until PyMC provides shapes ahead of time. | ||
undefined_ndim=True, | ||
) | ||
if statname == "warning": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meaning that if we would want to record warnings with hopsy we need to give the stat the name "warning"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for now this is only for stats named "warning", which also get special treatment by PyMC.
One could extend this to more arbitrary objects, but generally I think we should try to stick to standard data types and avoid objects, so I didn't want to add more flexibility than needed.
tune=3, | ||
draws=5, | ||
tune=30, | ||
draws=50, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the rng set somewhere to guarantee a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
within these steps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, but with this few steps the NUTS is pretty much guaranteed to emit warnings.
And even if it occasionally does not, that should not break things.
This MR fixes some incompatibilities with
NUTS
sampling on recent PyMC versions.Background
Since PyMC v4.3.0 or more specifically since pymc-devs/pymc#6192, the
SamplerWarning
s emitted by some step methods --- most importantlyNUTS
--- are no longer collected on the PyMCBaseTrace
object, but rather as any other sampler stat.This created an incompatibility with McBackend, where the
ClickHouseBackend
could not store theseobject
-dtyped things into the database.Solution
Instead of filtering/ignoring the
"warning"
stat, I went all the way to actually support storing it by pickling it, base64-encoding the bytes and storing the base64 encoded data as an ASCII string.For this, the following changes were needed:
str
dtypes in theNumPyBackend
andClickHouseBackend
.mcbackend.adapters.pymc.TraceBackend
to encode/decode the"warning"
stat such that themcbackend.Chain
only seesstr
data coming in/out.NUTS
to provoke needing to storeSamplerWarning
.Further changes along the way:
"localhost"
, viaCLICKHOUSE_HOST
,CLICKHOUSE_PORT
andCLICKHOUSE_PASS
environment variables.ClickHouseChain
batch inserting assumes that all calls to.append()
pass similar dicts as the first call #74 was added.bool
data as the ClickHouseBool
type, which was not supported with earlier versions.