-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Notebook about the effect of distance metric and dimension of summary statistic on computational cost of ABC model #8
base: master
Are you sure you want to change the base?
Conversation
Overall, the assignment is well-done. The notebook is pretty easy to follow. There are a few details to note though:
|
The assignment was easy to follow and understand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great and important topic, but the notebook needs more work. Some comments:
- Remove the checkpoint file
- Excellent to have theoretical background with references
- Maybe comment what "optimal choice of tolerance" means, especially since typically unknown
- "This example follows same conventions that are used in ELFI tutorial." More specific, please
- What is setting the tolerance to 0.5 based on?
- Note
elfi.examples.gauss.gauss_nd_mean
- Perhaps the best way to pass constant arguments to a simulator is using
functools.partial
(i.e.simulator_with_const = partial(simulator, cov=cov)
), other options includelambda
or just giving a constant node as a parent - "# NoteToSelf: Statement above is not true, see bug 268" ? You should set ELFI to use the global seed if you want it to use it
- "# NoteToSelf: y is not a numpy array" ? It should be
- "# NoteToSelf: if batch_size = 1 elfi uses different dimensions." No, it's the same. In ELFI
axis=0
is always assumed the dimension of thebatch_size
. On the other hand, (in the default case) the output from your Simulator is directly the input to your Summary. So if your simulator does not have a dimension corresponding tobatch_size
, then it doesn't appear to the Summary either. In this case, if you did usebatch_size
, the output from your simulator would have shape(batch_size, 30, 6)
. - Very nice usage of OutputPool!
- It would be informative to see output from
elfi.draw
- From a Bayesian perspective it is a bit confusing to name the parameters
priors
. What is the posterior mean of prior? Preferably use something like:mu1
,mu2
etc. - Would be good to see how the inference succeeds in cases with fewer summaries
- Note that Euclidean distance sums over squares, i.e. the more dimensions you have, the higher the distance typically is. Hence the validity of comparison using a fixed threshold is questionable
- The statistical power of the different summaries in the 6d Gaussian case is very different, and I think the validity of the comparison should be discussed further. It is unfortunately not easy to come up with a meaningful case to try this with!
- You should say
elfi.new_model()
before starting building the graph for MA2 to clear the previous model from memory - The MA2 case works very well with a larger
batch_size
, and is much faster - Using only 100 samples from rejection sampling (or SMC) is quite few and prone to much stochasticity
- Comparison of results from inference seems to be based on posterior means only. Would be informative to see marginal distributions as well
- Using the same threshold for Euclidean and Manhattan distance is unfounded
- Autocorrelation is just normalized autocovariance, and for MA2 the difference is small, possibly insignificant as a summary statistic. In any case it's hardly 'silly'.
- Please explain how ELFI was unstable? Issues with the
multiprocessing
module are not really issues with ELFI - Neither of the mentioned bugs are actually bugs in ELFI (poor documentation is another thing, apologies for that)
We will try to improve the documentation to avoid the mentioned confusions. Thanks for bringing these misunderstandings to our knowledge!
@cagatayyildiz Great suggestion with the vectorized multivariate_normal. It seems that the code in |
Summary:
Notebook about the effect of distance metric and dimension of summary statistic on computational cost of ABC model. Notebook is related to course CS-E4070 in Aalto University.
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):
Ossi Galkin
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: