Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get OptimizationManopt ready for registration #763

Merged
merged 18 commits into from
Jun 4, 2024
Merged

Conversation

Vaibhavdixit02
Copy link
Member

Checklist

  • Appropriate tests were added
  • Any code changes were done in a way that does not break public API
  • All documentation related to code changes were updated
  • The new code follows the
    contributor guidelines, in particular the SciML Style Guide and
    COLPRAC.
  • Any new documentation only uses public API

Additional context

Add any other context about the problem here.

Copy link
Member Author

@Vaibhavdixit02 Vaibhavdixit02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mateuszbaran can you review the hessian wrraper here, it's incorrect right now the issue has to do with the representation of the riemannian hessian which seems to be a vector and not matrix, Ronny did comment on this a bit in another issue but I think I might not have understood it well?

Also for the sgd test I am not sure how to handle the gradient, maybe it can't be supported right now?

I have not added any of the proximal based methods that can be added when I add support for proximal oracles in general in Optimization and shouldn't be a blocker for this right now

@mateuszbaran
Copy link

Hi! I will check your PR soon. A few quick comments:

@mateuszbaran can you review the hessian wrraper here, it's incorrect right now the issue has to do with the representation of the riemannian hessian which seems to be a vector and not matrix, Ronny did comment on this a bit in another issue but I think I might not have understood it well?

Manopt currently doesn't support actual full Hessians. riemannian_Hessian corresponds to Hessian-vector products. Actually even the theory of converting Euclidean Hessians to Riemannian ones isn't particularly well developed.

Also for the sgd test I am not sure how to handle the gradient, maybe it can't be supported right now?

SGD currently has a somewhat weird interface (you need to provide a list of functions that are called in some order) but with JuliaManifolds/Manopt.jl#386 (or soon after) we should have a nicer interface with batch number passed as an index. I'd suggest waiting a bit instead of trying the current interface.

@kellertuer
Copy link

For the Hessian, in Euclidean sense we do not return the matrix $H$ but always need a direction $d$ and compute $Hd$.

So instead of providing a point (we want the Hessian at) and returning a matrix,
we take a point and. a direction (tangent vector) and the result $Hd$ is actually also a tangent vector we return.
That especially avoids having to work in charts / bases of tangent spaces (which the matrix form would require).

The theory for (Euclidean) Hessian to Riemannian Hessian conversion is quite developed, but there is basically a chain rule involved where the Christoffel symbol appear, which makes that a bit tricky to implement every now and then.

@Vaibhavdixit02
Copy link
Member Author

Oops I thought I replied here earlier, sorry. Thanks for the clarifications both, it's very helpful!

Based on these suggestions I am leaning towards not supporting SGD (along with the proximal methods) for now. I will try to get the hessian methods working now that's clear, hopefully, we can wrap this first version up pretty soon.

@kellertuer
Copy link

I think that is a good choice. SGD might get a bit of a rework after the PR that is currently being finished.
Proximal Methods should stay as they are, but I do see they are not so super common in general.

@mateuszbaran
Copy link

The theory for (Euclidean) Hessian to Riemannian Hessian conversion is quite developed, but there is basically a chain rule involved where the Christoffel symbol appear, which makes that a bit tricky to implement every now and then.

Yes, that is known, I forgot to mention that I mean the chart-free and coordinate-free variant where you start from the Hessian matrix in the embedding (flattened to one index). It doesn't look hopelessly impossible for symmetric spaces so maybe we will have that at some point?

@kellertuer
Copy link

I am not so sure, for now I can only do that for every manifold individually, using “Christoffelistical tools” and hoping the result can be expressed chart-free.

@kellertuer
Copy link

I tried to understand the code a bit for the last hour while trying to help with the questions raised.
I am not so able to understand what this is doing. Am I just not clever enough or is the code intended to be “just a hack” that is understood by exactly the person that wrote it?

From the current errors a conclusion might also be that Manopt.jl – though started about 7 years ago, but many developed by me with the help of a few students – is maybe not mature enough if it errors so much?

@Vaibhavdixit02
Copy link
Member Author

What are the hacks that you keep seeing, I don't think I am doing anything here that should be referred to as that but if you have some specific parts you don't find rigorous please suggest them. I want to think this has been going pretty well and your comments have been respected and addressed so it's confusing why you have that view.

From the current errors a conclusion might also be that Manopt.jl – though started about 7 years ago, but many developed by me with the help of a few students – is maybe not mature enough if it errors so much?

No piece of software exists without bugs, 1000 people teams of exceptional engineers ship things like Microsoft teams. I didn't mean anything negative when I mentioned "possible" bugs above since I am not 100% sure where or why that's happening yet, and it seemed you and/or @mateuszbaran would have more insight about it.

@kellertuer
Copy link

What are the hacks that you keep seeing, I don't think I am doing anything here that should be referred to as that but if you have some specific parts you don't find rigorous please suggest them.

As I just wrote in a comment – for now I do not understand much of this (or the last PRs) code. But whenever I spent half an hour or 45 minutes like figuring out that one line with the hasfield hack – it is all more like hacked.
Finding all of them – well I do have a usual job as well, so I can not read all your hurried PRs – sorry.

The same for the insight on bugs. I can check that, but probably not before end of June. So then the question is – can this wait so long or whether this has to be merged in 2-3 days and is in a hurry? Last PR I had that feeling, all has to be rushed.
This would not help reducing bugs.

@kellertuer
Copy link

Oh – and to clarify – I also did not understand it as negative. Just that if Manopt is too buggy, it should maybe not published as an interface here. Even more if no one finds time to fix them.
I am happy if people would use Manopt, but the current PR here shows a bit more that one person alone – even now 1.5 since Mateusz joined quite a bit – is too less coverage for a package to be maintained. So this is not due to your feedback being negative, it is not. Just whether something like Manopt.jl would be worth being added.

@Vaibhavdixit02 Vaibhavdixit02 merged commit 22bf121 into master Jun 4, 2024
22 of 42 checks passed
@Vaibhavdixit02 Vaibhavdixit02 deleted the manopt branch June 4, 2024 00:37
@kellertuer
Copy link

This endet up being quite a rush again. Most tests fail still, but it was merged nevertheless.

I‘ll wait for a more stable version before I add it to Manopt then.

@Vaibhavdixit02
Copy link
Member Author

Those are Downgrade tests - the failure is completely unrelated to any work here. All tests that were expected to pass here and the example in the docs are all being run on CI and passing, but I'll leave the decision to you.

@kellertuer
Copy link

We discussed 2 days ago that I feel this is all in a rush.
I do not yet follow most of the code as we discussed here, nor can I reproduce the bugs you claim.
Hence I lack a bit of confidence here. That is nothing against your work, just my trust to it.

@ChrisRackauckas
Copy link
Member

@Vaibhavdixit02
Copy link
Member Author

Screenshot 2024-06-04 at 08 01 35

@kellertuer
Copy link

That for me just illustrates that at least these are failing.

However, I do not want to intervene with the common practices in SciML or this package; my approach is different, but the only conclusion I draw from this I already mentioned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants