Implementing `dlopen` in the component model #401
Comments
Thank you for this writeup! ❤️ This all looks great and makes sense to me, with one exception: I think it might make sense to start with something a bit more conservative instead of the That way, hosts that don't want to expose compilation abilities don't have to, and we could additionally provide a separate interface for doing actual compilation for environments where that makes sense. That is of course very similar to your I'm thinking about something along these lines:

```wit
package wasi:module-loader;

interface loader {
  enum error { /* ... */ }
  load: func(id: string) -> result<module, error>;
  // Optionally, we could add a way to get a list of known modules:
  available-modules: func() -> list<string>;
}
```
That's what the preopen API that Alex sketched is for. FWIW I would bikeshed the name and suggest "precompiles" or something along those lines.
They can return an error instead of compiling anything, but we could always layer the
Also, listing all pre-compiled modules might not be something we want to expose, since the pre-compiled modules could come from the network in a FaaS platform, and then we would have TOCTOU bugs. Better to just try to get it if we have it, and otherwise fall back to the backup plan (either find the module on disk and compile it, or propagate an error).
The key thing is that JIT-compilation is a different, much more powerful capability, which I think should be explicitly targeted via its own interface that can be statically checked for, instead of dynamically returning an error. Content that can't be run in an environment that doesn't support this should be rejected pre-deployment, ideally.
That's a very good point, yes. I'd be happy with just having the "have id, want module" interface and nothing else :)
Ideally wasi-libc would have an optional import on "compile these bytes", but in lieu of that my thinking is that hosts would, by default, deny "compile these bytes" and you'd be able to opt-in on some hosts (e.g. the
What I'm trying to convey is that I think these should be different interfaces, so they can be included in different worlds. Worlds that don't include the "compile some bytes for me" interface would be applicable much more broadly. So far at least we've treated packages, and certainly interfaces, as an all-or-nothing thing instead of saying that it's okay to implement only parts of an interface and omit certain functions. Regarding wasi-libc integration: would we even have the "compile some bytes for me" interface integrated into libc? What would that look like? It seems like loading a library based on an ID would map much more readily to dlopen?
That makes sense yeah, and I tried to sketch above separate interfaces as well. My point is that wasi-libc would want the "compile the bytes" interface by default because that's what native platforms expect (e.g. Python). That interface would be allowed to fail, though, and until we have optional imports I think that's the best we can do for wasi-libc. For wasi-libc specifically a native-like experience would be a
To add on to Alex's response here:
I feel like I'm missing something, because this still doesn't make sense to me. Isn't the much more equivalent-to-native thing a I.e., wouldn't we want to leave it up to the host to decide how to go from the file path (aka, opaque ID) to a loadable module? In the wasmtime case, we'd look for a All of this is effectively the preopens thing, and I guess all I'm saying is we should start out having only that, but not call it preopens or anything :)
Till and I talked a bit more about this over video and the general conclusions we reached were:
Personally I think that'd be reasonable since there's not a huge use case right now for "generate wasm in content and then compile it", and that can always be satisfied with a filesystem too.
Perhaps it could be
The thing that seems very important to me is that the primitive used by libc should not require being able to acquire the wasm bytes to get a module ref.
To expand on my reasoning here: For environments that want to/can only handle precompiled binaries, we really don't want to require the I guess I would still somewhat prefer not to call this
I liked the ability to JIT dotnet IL to a wasm stream or bytes. Are you saying that I need to store those bytes to the FS first? I'm not 100% sure, but I think that Chrome already stores precompiled wasm. Maybe they calculate a hash?
I agree that JIT compilation is important, but it won't be supported in all environments, so I think it should not be part of the default way to support What I'm imagining is that we'd have a separate WASI interface, potentially in a separate package, that'd allow you to get a That way, environments that can't support actual JIT compilation can support
Would this allow me to get native guest bindings to the dynamically loaded module? If so, how does this interface allow me to specify the expected shape of the interface in the loaded module?
At the BA summit this past weekend I discussed with a few folks what it might look like to implement `dlopen` from C in the component model. What follows is a rough sketch of how this might be possible, intended to capture the conversations that happened. At this time I don't believe anyone's lined up to work on this, but nevertheless I wanted to capture the context we discussed and what might be necessary. This is a rough shape of a solution and will need more work to get standardized and implemented.

The general idea is that we'd like to explore adding component model intrinsics which support the ability to load an arbitrary wasm module at runtime, open it, and start executing it. This is what `dlopen` does on native platforms and is useful for a variety of use cases. Perhaps chief among them is that existing language ecosystems expect this to work, so getting them to work requires an implementation of `dlopen`.

The other general idea is that we'd like to standardize as-general-as-possible intrinsics and building blocks as necessary. Emscripten for example has a model of dynamic linking today but we don't want to bake that exactly as-is into the component model. Instead it should be possible to build various other forms of dynamic linking, if necessary, on top of component model intrinsics. The north star for now is the Emscripten-style dynamic linking since that's what tooling supports, but it's hoped that implementation support can still be generalized.
Component Model Changes
Supporting a full-fledged `dlopen` will require changes to the component model today.

Component Model: New Types
A new built-in resource type will be added to the component model, a "moduleref". For example in the component model you'll be able to do:
A `module` here is a resource definition of a new type that the host understands. This is similar to declaring and importing a resource except that it's provided by the host and is the same across all components. This resource type can have `own` and `borrow` handles like other resources in the component model.

This new type would additionally be added to WIT, too.
Component Model: New WASI APIs
With this new type available in the component model the thinking is that new WASI APIs would be added for acquiring modules. This enables hosts to implement a variety of methods of identifying and loading modules. Furthermore by being WASI APIs it enables virtualizing these implementations as necessary too. Currently the rough idea is:
Here a host can provide the ability to compile arbitrary wasm bytes. These bytes might be loaded through the filesystem, for example, or through other means. Hosts should be able to return "not supported" for `compile`, or this would also be a great use case for optional imports.

Hosts can also provide a set of preopened modules (perhaps with a better name). These represent ahead-of-time compiled modules, for example, and might be more suitable in contexts where fully dynamic runtime compilation is not allowed.
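
To make the split between the two host capabilities concrete, here is a minimal Python model of a host that always offers precompiled modules and only optionally offers compilation. Everything in it (`Host`, `Module`, `CompileNotSupported`, `locate_module`, the `read_bytes` callback) is a hypothetical name invented for illustration; the lookup order of precompiled modules first and `compile` second is one plausible reading of this proposal, not a settled design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


class CompileNotSupported(Exception):
    """Raised by a host that refuses to compile arbitrary wasm bytes."""


@dataclass(frozen=True)
class Module:
    """Stand-in for an opaque, host-owned `module` handle."""
    label: str


@dataclass
class Host:
    # Ahead-of-time compiled modules, keyed by an opaque name.
    precompiled: Dict[str, Module] = field(default_factory=dict)
    # Whether this host allows runtime compilation at all.
    allow_compile: bool = False

    def preopens_get(self, name: str) -> Optional[Module]:
        # Returns None when the host has no precompiled module of that name.
        return self.precompiled.get(name)

    def compile(self, wasm_bytes: bytes) -> Module:
        if not self.allow_compile:
            raise CompileNotSupported("runtime compilation is disabled")
        return Module(label=f"jit:{len(wasm_bytes)} bytes")


def locate_module(host: Host, name: str,
                  read_bytes: Callable[[str], bytes]) -> Module:
    """Prefer a precompiled module; otherwise try to compile the bytes."""
    module = host.preopens_get(name)
    if module is not None:
        return module
    # Assumption: the bytes come from somewhere like the filesystem.
    return host.compile(read_bytes(name))
```

A host built as `Host(precompiled={...})` serves known names and raises `CompileNotSupported` for everything else, while `Host(allow_compile=True)` behaves more like a native platform.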
When implementing `dlopen` it's expected that `wasi-libc` would locate the module-to-instantiate by doing something like:

- First try the `preopens/get` method. Use that if present.
- Otherwise fall back to `compile`. If that fails, then return an error.

At this point `dlopen` has a handle to a module to instantiate, so the next bit is instantiating it.

Component Model: New Intrinsics
Instantiation is sketched here as entirely outside the realm of WIT. Everything that follows is purely a component model intrinsic (similar to `resource.drop`) and can be synthesized in any component.

First up are intrinsics to perform runtime inspection of a `module`. Everything here is listed as-if it had mostly-WIT types but each intrinsic here is actually producing a core module.

- `module.imports_len : func(m: borrow<module>) -> u32` - returns the number of imports a module has
- `module.import_{module,name}_len : func(m: borrow<module>, import: u32) -> u32` - returns the byte length of the import name (utf-8 encoded)
- `module.import_{module,name} $memory : func(m: borrow<module>, import: u32, ptr: i32)` - fills in `ptr` in linear memory with the contents of the nth import name.

Note that at this time type-reflection of modules isn't supported. It's expected that can be added later if needed, but it's hopefully not needed yet. (TODO: maybe these should just be component-model WIT types?)
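
The inspection intrinsics above follow a "ask for the length, then ask the host to fill a buffer" pattern. Below is a small Python model of that pattern, with a `bytearray` standing in for linear memory. `ToyModule`, the flattened function names, and the `scratch` parameter are all made up for this sketch; real intrinsics would be core wasm functions, not Python callables.

```python
from typing import List, Tuple


class ToyModule:
    """Stand-in for a host-owned `module` handle (hypothetical)."""

    def __init__(self, imports: List[Tuple[str, str]]):
        self.imports = imports


def _write(memory: bytearray, ptr: int, data: bytes) -> None:
    # Linear memory never grows implicitly, so check bounds before copying.
    if ptr < 0 or ptr + len(data) > len(memory):
        raise IndexError("write outside linear memory")
    memory[ptr:ptr + len(data)] = data


# Toy versions of the intrinsics listed above.
def imports_len(m: ToyModule) -> int:
    return len(m.imports)


def import_module_len(m: ToyModule, i: int) -> int:
    return len(m.imports[i][0].encode("utf-8"))


def import_name_len(m: ToyModule, i: int) -> int:
    return len(m.imports[i][1].encode("utf-8"))


def import_module(memory: bytearray, m: ToyModule, i: int, ptr: int) -> None:
    _write(memory, ptr, m.imports[i][0].encode("utf-8"))


def import_name(memory: bytearray, m: ToyModule, i: int, ptr: int) -> None:
    _write(memory, ptr, m.imports[i][1].encode("utf-8"))


def read_imports(memory: bytearray, m: ToyModule,
                 scratch: int) -> List[Tuple[str, str]]:
    """Enumerate (module, name) pairs through a scratch area in memory."""
    result = []
    for i in range(imports_len(m)):
        mod_len = import_module_len(m, i)
        import_module(memory, m, i, scratch)
        mod = bytes(memory[scratch:scratch + mod_len]).decode("utf-8")

        name_len = import_name_len(m, i)
        import_name(memory, m, i, scratch)
        name = bytes(memory[scratch:scratch + name_len]).decode("utf-8")

        result.append((mod, name))
    return result
```

Because the lengths are byte lengths of the utf-8 encoding, the caller sizes its buffer from the `_len` call rather than from a character count.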
Next there will additionally be an API to read custom sections of modules, for example `dylink.0` in the Emscripten-based ABI:

- `module.custom_section_size : func(m: borrow<module>, name: string) -> option<u32>` - returns the byte length of the custom section `name`, or `none` if it's not present.
- `module.custom_section_read $memory : func(m: borrow<module>, dst: i32, len: i32, src: i32)` - reads a custom section into linear memory with a memcpy-style API.

(TODO: like above, maybe this is better modeled with component model types? Also needs to handle the possibility of repeated custom sections too)
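
One thing the memcpy-style `dst`/`len`/`src` shape allows is reading a large custom section through a small fixed buffer. The Python sketch below models that. Note that it passes the section `name` to the read call, which is an assumption added here because the listed signature doesn't say how the section is selected. `SectionedModule` and `read_custom_section` are likewise hypothetical names.

```python
from typing import Dict, Optional


class SectionedModule:
    """Stand-in for a `module` handle carrying custom sections (hypothetical)."""

    def __init__(self, sections: Dict[str, bytes]):
        self.sections = sections


def custom_section_size(m: SectionedModule, name: str) -> Optional[int]:
    data = m.sections.get(name)
    return None if data is None else len(data)


def custom_section_read(memory: bytearray, m: SectionedModule, name: str,
                        dst: int, length: int, src: int) -> None:
    """Copy `length` bytes starting at offset `src` of the section to `dst`."""
    data = m.sections[name]
    if src < 0 or src + length > len(data):
        raise IndexError("read outside custom section")
    if dst < 0 or dst + length > len(memory):
        raise IndexError("write outside linear memory")
    memory[dst:dst + length] = data[src:src + length]


def read_custom_section(memory: bytearray, m: SectionedModule, name: str,
                        scratch: int, scratch_len: int) -> Optional[bytes]:
    """Read a whole section in chunks no larger than the scratch buffer."""
    if scratch_len <= 0:
        raise ValueError("scratch buffer must be non-empty")
    size = custom_section_size(m, name)
    if size is None:
        return None
    out = bytearray()
    offset = 0
    while offset < size:
        chunk = min(scratch_len, size - offset)
        custom_section_read(memory, m, name, scratch, chunk, offset)
        out += memory[scratch:scratch + chunk]
        offset += chunk
    return bytes(out)
```

This model assumes a section name is unique within a module, so it does not address the repeated-sections question raised in the TODO.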
Next there needs to be the ability to build up the set of imports that will be used to instantiate a module. This is done with an "imports builder" type which acts like a resource but doesn't actually have any definition in WIT or the component model itself (at least not at this time):

- `imports_builder.new : func() -> IB` - creates a new blank imports builder
- `imports_builder.drop : func(IB)` - destroys a builder (TODO: maybe `resource.drop`?)
- `imports_builder.bind_{memory,global,table,func} $index : func(borrow<IB>, string, string)` - binds the statically provided item to the names provided. This is used, for example, to provide a module's own memory to the import list
- `imports_builder.new_global_i32 : func(borrow<IB>, string, string, i32)` - creates a brand new wasm global (mutable? new parameter?) with the provided initial value. (It's assumed this is needed for the Emscripten ABI.)
- `imports_builder.bind_funcref : func(borrow<IB>, string, string, funcref)` - binds the provided function to the specified import name. This is used to provide a module's own functions to imports.

It's hoped that with all of the above it's possible to implement basically everything in `dlopen` from the Emscripten dynamic linking ABI. All of this culminates in a single intrinsic:

- `imports_builder.instantiate : func(borrow<module>, borrow<IB>) -> result<instance, string>`

where this final `instantiate` intrinsic is used to perform instantiation itself (TODO: return type here needs some work).

There will also need to be an API or two to look up globals/functions on the returned `instance`.

Integration with `wasi-libc`
It's hoped that all of the above will be implementation details of `dlopen` in `wasi-libc`. It's not expected that applications will necessarily be manipulating the intrinsics themselves. All the details of the Emscripten dynamic linking ABI, for example, would be encoded in `wasi-libc` in terms of matching names, providing imports, manipulating memories and globals, etc.

This is very much a work-in-progress design. Even just writing this up I feel like we may want to shift more things into WIT or similar or have WIT-defined builtins rather than so many intrinsics. Furthermore there are a lot of details here to prove out, and we also need to ensure that there's enough functionality to fully implement Emscripten's dynamic linking ABI.
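
For illustration only, the flow that `wasi-libc` would drive (inspect the imports, bind each one in an imports builder, then instantiate) can be modeled in a few lines of Python. `ImportsBuilder`, `Instance`, `instantiate`, and `dlopen_sketch` are hypothetical stand-ins, and resolving imports purely by field name against the main module's exports is a simplification invented for this sketch, not the Emscripten ABI's actual matching rules.

```python
from typing import Dict, List, Optional, Tuple, Union

ImportKey = Tuple[str, str]  # (import module, import name)


class ImportsBuilder:
    """Toy stand-in for the `IB` handle."""

    def __init__(self) -> None:
        self.bound: Dict[ImportKey, object] = {}

    def bind(self, module: str, name: str, item: object) -> None:
        self.bound[(module, name)] = item


class Instance:
    """Toy stand-in for the result of a successful instantiation."""

    def __init__(self, resolved: Dict[ImportKey, object]) -> None:
        self.resolved = resolved


def instantiate(required: List[ImportKey],
                builder: ImportsBuilder) -> Union[Instance, str]:
    """Model of `result<instance, string>`: an Instance or an error string."""
    missing = [key for key in required if key not in builder.bound]
    if missing:
        return "missing imports: " + ", ".join(f"{m}.{n}" for m, n in missing)
    return Instance({key: builder.bound[key] for key in required})


def dlopen_sketch(required: List[ImportKey], main_exports: Dict[str, object]
                  ) -> Tuple[Optional[Instance], Optional[str]]:
    """Bind what the main module can provide, then try to instantiate."""
    builder = ImportsBuilder()
    for module, name in required:
        # Assumption: imports are satisfied by name from the main module.
        if name in main_exports:
            builder.bind(module, name, main_exports[name])
    result = instantiate(required, builder)
    if isinstance(result, str):
        return None, result  # roughly what dlerror() would report
    return result, None
```

The point of the model is only that applications see a `dlopen`-shaped call while the builder and instantiation steps stay internal to libc.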
cc @dicej, @fitzgen, @sunfishcode