# Choosing model serialization format(s) for cross-framework support (like HuggingFace) #1907
Comments
On popular (well, one person) demand, I did register ONNXNaiveNASflux, which has a couple of ONNX -> Flux layer translations that I'd be happy to donate to some other package. What is not Flux native is the computation graph format, which instead comes from NaiveNASlib. I might be burned by my pre-JuMP attempts to align parameter shapes by traversing the graph, but I think that a general ONNX-graph to Chain translation function might be hard to get right, though I haven't given it much thought. The "impedance mismatch", as @ToucheSir called it, is a thing in the sense that there are a lot of valid ONNX operators which can't be turned into Flux layers (which is why I think the ONNX.jl approach is the right way if one has the ambition to cover the spec).
Looking at initializers of
Also, semi-automatic code generation might be a good approach. For example, given a set of structured names like

Once I finish the current task in Yota, I'm going to allocate one or two months purely for ONNX.jl, and now I know what my next target model to support will be :)
Hmm, looking at the distributions of “likes” in this thread makes me believe I have some grave misunderstanding of the topic, but I’ll make one more attempt. Sorry if this is just noise.
Why would you need to rely on guesses/heuristics based on names? Can't you just look at the OP-type and verify that everything which the corresponding Flux layer struct needs as parameters is present, either as an initializer or as a (possibly propagated) constant? This (minus constant propagation) is pretty much what

I also don't see the connection between this and the part about the
Here I assume that we don't have a Flux model to map the ONNX graph to, but only the graph itself, and we need to infer the corresponding Flux layers purely from the initializers. I agree with the argument about

But why not from the op types? For example, optype=Conv means we can use
For simple ops like convolution, this works, and it's exactly what ONNX.jl does on dev (except with NNlib instead of Flux). But the example above features MHA layers, which have no ONNX op type. We'd rather map the full MHA layer to something from NeuralAttentionlib for performance. We also can't directly infer that a subgraph is MHA from its op types alone, because it just looks like batched matmul + normalization. But from the structured names, we can make a pretty good guess. So, of course, we want to make the 1-1 mapping when possible, but some cases may require guesswork.
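The name-based heuristic described above could be sketched roughly as follows. This is purely illustrative (the parameter names are hypothetical HuggingFace-style names, and real detection logic would be more involved): group initializer names by the prefix before a query/key/value projection, and guess MHA where all three projections appear.

```python
import re

def guess_mha_blocks(names):
    """Guess which name prefixes correspond to MHA blocks.

    Groups parameter names by the prefix preceding a
    query/key/value projection; a prefix with all three
    projections present is a likely MHA block.
    """
    blocks = {}
    for name in names:
        m = re.match(r"(.*)\.(query|key|value)\.weight$", name)
        if m:
            blocks.setdefault(m.group(1), set()).add(m.group(2))
    return [prefix for prefix, projs in blocks.items()
            if projs == {"query", "key", "value"}]
```

For example, given names like `encoder.layer.0.attention.query.weight`, `...key.weight`, and `...value.weight`, the prefix `encoder.layer.0.attention` would be flagged as a candidate MHA block, while a lone `ffn.dense.weight` would not.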
Ah, now it is clear to me. I read
Not noise, the whole reason for the issue is to discuss 😀 |
I guess this can be closed now that we recommend serializing the output of
(continuing Slack discussion)
### Julia to other frameworks
Since we are moving away from

```julia
@save "weight.bson" params(model)
```

in #1875, we should probably think about a recommended object to serialize that will also be friendly with other frameworks. State dicts (i.e. PyTorch) use the keys to encode some structural information, like `model["encoder/weight"]`. The closest match to that in Flux right now would be using Functors to turn the model into a nested named tuple. This can be saved using your favorite Julia serializer, and also loaded into a Flux model with types via #1875. For interfacing with external sources like HuggingFace, a thin translation between `nt[:encoder][:weight]` and `dict["encoder/weight"]` is doable.

It would be good to get a collection of other commonly used storage formats outside Julia/Flux to make sure we choose something that has the widest compatibility.
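The thin translation between nested structures and `/`-joined state-dict keys could be sketched like this (a minimal illustration using plain dicts to stand in for the nested named tuple; function names are hypothetical):

```python
def flatten_state(nested, prefix=""):
    """Flatten a nested dict into PyTorch-style '/'-joined keys."""
    flat = {}
    for key, val in nested.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(val, dict):
            flat.update(flatten_state(val, path))
        else:
            flat[path] = val
    return flat

def unflatten_state(flat):
    """Rebuild the nested structure from '/'-joined keys."""
    nested = {}
    for path, val in flat.items():
        parts = path.split("/")
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = val
    return nested
```

The two functions are inverses, so a round trip between the nested form and the flat state-dict form is lossless as long as keys contain no `/`.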
### Other frameworks to Julia
In terms of Flux users being able to use models saved from other frameworks (e.g. downloaded from HuggingFace), ONNX is probably the ideal format here:
- Simply loading and running an ONNX model is doable with ONNXRuntime.jl.
- Translating an ONNX graph to Julia functions from e.g. NNlib is semi-doable on ONNX.jl#master, but not yet stable/released.
- Translating an ONNX graph to Flux layer types does not exist, except maybe in one-off examples (at least for layers like SkipConnection; a sequence of Convs is of course easy).
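The "graph to functions" level of support above amounts to an op-type dispatch table: walk the graph in topological order and map each node's op type to a callable. A toy sketch (illustrative only; the op set and node representation are simplified stand-ins, and a real translator must cover the ONNX spec):

```python
# Toy op implementations on plain lists (stand-ins for NNlib kernels).
def op_relu(x):
    return [max(v, 0.0) for v in x]

def op_add(a, b):
    return [x + y for x, y in zip(a, b)]

# Dispatch table from ONNX op_type string to a callable.
OPS = {"Relu": op_relu, "Add": op_add}

def run_graph(nodes, inputs):
    """Evaluate a graph given as (op_type, input_names, output_name)
    triples in topological order, starting from named inputs."""
    env = dict(inputs)
    for op_type, in_names, out_name in nodes:
        env[out_name] = OPS[op_type](*(env[n] for n in in_names))
    return env
```

For example, `run_graph([("Relu", ["x"], "h"), ("Add", ["h", "b"], "y")], {"x": [-1.0, 2.0], "b": [1.0, 1.0]})` evaluates the two-node graph and leaves the result under `"y"`.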
cc @dfdx @DrChainsaw