Implementing dlopen in the component model #401

Open
alexcrichton opened this issue Sep 30, 2024 · 16 comments

@alexcrichton
Collaborator

At the BA summit this past weekend I discussed with a few folks what it might look like to implement dlopen from C in the component model. What follows is a rough sketch of how this might be possible, intended to capture the conversations that happened. At this time I don't believe anyone's lined up to work on this, but nevertheless I wanted to capture the context we discussed and what might be necessary. This is a rough shape of a solution and will need more work to get standardized and implemented.

The general idea is that we'd like to explore adding component model intrinsics which support the ability to load an arbitrary wasm module at runtime, open it, and start executing it. This is what dlopen does on native platforms and is useful for a variety of use cases. Perhaps the chief one, though, is that existing language ecosystems expect this to work, so supporting them requires an implementation of dlopen.

The other general idea is that we'd like to standardize as-general-as-possible intrinsics and building blocks as necessary. Emscripten for example has a model of dynamic linking today but we don't want to bake that exactly as-is into the component model. Instead it should be possible to build various other forms of dynamic linking, if necessary, on top of component model intrinsics. The north star for now is the Emscripten-style dynamic linking since that's what tooling supports, but it's hoped that implementation support can still be generalized.

Component Model Changes

Supporting a full-fledged dlopen will require changes to the component model today.

Component Model: New Types

A new built-in resource type will be added to the component model, a "moduleref". For example in the component model you'll be able to do:

(component 
  (type $moduleref module)
  (import "x" (func (result (own $moduleref))))
)

A module here is a resource definition of a new type that the host understands. This is similar to declaring and importing a resource except that it's provided by the host and is the same across all components. This resource type can have own and borrow handles like other resources in the component model.

This new type would be added to WIT as well.

Component Model: New WASI APIs

With this new type available in the component model the thinking is that new WASI APIs would be added for acquiring modules. This enables hosts to implement a variety of methods of identifying and loading modules. Furthermore by being WASI APIs it enables virtualizing these implementations as necessary too. Currently the rough idea is:

package wasi:compile;

interface compile {
    enum error { /* ... */ }

    // bikeshed this name, `wasi:compile/compile/compile` is a lot
    compile: func(wasm: list<u8>) -> result<module, error>;
}

interface preopens {
    get: func(name: string) -> option<module>;
}

Here a host can provide the ability to compile arbitrary wasm bytes. These bytes might be loaded through the filesystem, for example, or through other means. Hosts should be able to return "not supported" for compile or this would also be a great use case for optional imports.

Hosts can also provide a set of preopened modules (perhaps with a better name). These represent ahead-of-time compiled modules, for example, and might be more suitable in contexts where fully dynamic runtime compilation is not allowed.

When implementing dlopen it's expected that wasi-libc would locate the module-to-instantiate by doing something like:

  • First look up the module name with the preopens/get method. Use that if present.
  • Otherwise interpret the module name and try to find a file on the filesystem.
  • If found, compile it with compile. If that fails, then return an error.

At this point dlopen has a handle to a module to instantiate, so the next bit is instantiating it.
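
The lookup order above can be modeled in a few lines. This is only a toy sketch in Python, not wasi-libc code: `Module`, `CompileError`, `locate_module`, `demo_compile`, and the dictionary standing in for the filesystem are all hypothetical names invented for illustration, and the "compile" step just checks the wasm magic bytes.

```python
from typing import Callable, Dict, Optional


class Module:
    """Stand-in for the proposed host-provided `module` resource."""

    def __init__(self, origin: str) -> None:
        self.origin = origin  # records how the handle was obtained


class CompileError(Exception):
    """Stand-in for the `error` enum returned by `compile`."""


def locate_module(
    name: str,
    preopens_get: Callable[[str], Optional[Module]],
    files: Dict[str, bytes],
    compile_bytes: Callable[[bytes], Module],
) -> Module:
    # Step 1: ask the host for a preopened module under this name.
    module = preopens_get(name)
    if module is not None:
        return module
    # Step 2: otherwise treat the name as a filesystem path.
    wasm = files.get(name)
    if wasm is None:
        raise FileNotFoundError(name)
    # Step 3: compile the bytes; a CompileError propagates to the caller.
    return compile_bytes(wasm)


def demo_compile(wasm: bytes) -> Module:
    # Toy validity check: wasm binaries start with the bytes "\0asm".
    if not wasm.startswith(b"\x00asm"):
        raise CompileError("not a wasm binary")
    return Module("compiled")


preopened = {"libfoo.so": Module("preopen")}
files = {"libbar.so": b"\x00asm\x01\x00\x00\x00", "junk.so": b"nope"}

print(locate_module("libfoo.so", preopened.get, files, demo_compile).origin)
print(locate_module("libbar.so", preopened.get, files, demo_compile).origin)
```

In this model a preopened module always wins over a file of the same name, and a failed compile surfaces as the dlopen error.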

Component Model: New Intrinsics

Instantiation is sketched here as entirely outside the realm of WIT. Everything that follows is purely a component model intrinsic (similar to resource.drop) and can be synthesized in any component.

First up are intrinsics to perform runtime inspection of a module. Everything here is listed as-if it had mostly-WIT types but each intrinsic here is actually producing a core module.

  • module.imports_len : func(m: borrow<module>) -> u32 - returns the number of imports a module has
  • module.import_{module,name}_len : func(m: borrow<module>, import: u32) -> u32 - returns the byte length of the import name (utf-8 encoded)
  • module.import_{module,name} $memory : func(m: borrow<module>, import: u32, ptr: i32) - fills in ptr in linear memory with the contents of the nth import name.

Note that at this time type-reflection of modules isn't supported. It's expected that can be added later if needed, but it's hopefully not needed yet. (TODO: maybe these should just be component-model WIT types?)
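
The length-then-fill pattern used by these inspection intrinsics might look like the following toy Python model. Everything here is an assumption made for illustration: a `bytearray` stands in for linear memory, the `Module` class only records import pairs, and only the `name` half of the `{module,name}` pair is modeled (the `module` half would work the same way).

```python
from typing import List, Tuple


class Module:
    """Toy module that only records its (module, name) import pairs."""

    def __init__(self, imports: List[Tuple[str, str]]) -> None:
        self.imports = imports


memory = bytearray(64)  # stand-in for the caller's linear memory


def imports_len(m: Module) -> int:
    return len(m.imports)


def import_name_len(m: Module, index: int) -> int:
    # Byte length of the UTF-8 encoded import name.
    return len(m.imports[index][1].encode("utf-8"))


def import_name(m: Module, index: int, ptr: int) -> None:
    # Copies the UTF-8 bytes of the import name into memory at `ptr`.
    data = m.imports[index][1].encode("utf-8")
    memory[ptr:ptr + len(data)] = data


def read_import_names(m: Module, scratch: int = 0) -> List[str]:
    names = []
    for i in range(imports_len(m)):
        size = import_name_len(m, i)  # first call: how many bytes are needed?
        import_name(m, i, scratch)    # second call: fill the buffer
        names.append(bytes(memory[scratch:scratch + size]).decode("utf-8"))
    return names


lib = Module([("env", "memory"), ("env", "__stack_pointer")])
print(read_import_names(lib))
```

The caller has to query the length first because the fill call takes no length argument, so the guest must know how much of the buffer to read back.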

Next there will additionally be an API to read custom sections of modules, for example dylink.0 in the Emscripten-based ABI:

  • module.custom_section_size : func(m: borrow<module>, name: string) -> option<u32> - returns the byte length of the custom section with the given name, or none if it's not present.
  • module.custom_section_read $memory : func(m: borrow<module>, dst: i32, len: i32, src: i32) - reads a custom section into linear memory with a memcpy-style API.

(TODO: like above, maybe this is better modeled with component model types? Also needs to handle the possibility of repeated custom sections too)
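
A memcpy-style read lets the guest pull a custom section through a small fixed buffer. The Python below is a toy model under stated assumptions: the sketched `custom_section_read` signature does not say how the section is selected, so this model adds a `name` parameter, and the `Module` class, the `memory` buffer, and the chunk size are all invented for illustration.

```python
from typing import Dict, Optional

memory = bytearray(32)  # stand-in for the caller's linear memory


class Module:
    """Toy module that only records its custom sections by name."""

    def __init__(self, custom_sections: Dict[str, bytes]) -> None:
        self.custom_sections = custom_sections


def custom_section_size(m: Module, name: str) -> Optional[int]:
    section = m.custom_sections.get(name)
    return None if section is None else len(section)


def custom_section_read(m: Module, name: str, dst: int, length: int, src: int) -> None:
    # Copies `length` bytes starting at offset `src` of the section to `dst`.
    # Assumption: `name` selects the section; the sketch above omits this.
    section = m.custom_sections[name]
    if src + length > len(section):
        raise IndexError("read past end of custom section")
    memory[dst:dst + length] = section[src:src + length]


def read_section(m: Module, name: str, chunk: int = 8) -> Optional[bytes]:
    size = custom_section_size(m, name)
    if size is None:
        return None
    out = bytearray()
    for offset in range(0, size, chunk):
        n = min(chunk, size - offset)
        custom_section_read(m, name, 0, n, offset)  # reuse the same buffer
        out += memory[:n]
    return bytes(out)


lib = Module({"dylink.0": bytes(range(20))})
print(read_section(lib, "dylink.0") == bytes(range(20)))
print(read_section(lib, "missing"))
```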

Next there needs to be the ability to build up the set of imports that will be used to instantiate a module. This is done with an "imports builder" type which acts like a resource but doesn't actually have any definition in WIT or the component model itself (at least not at this time).

  • imports_builder.new : func() -> IB - create a new blank imports builder
  • imports_builder.drop : func(IB) - destroys a builder (TODO: maybe resource.drop?)
  • imports_builder.bind_{memory,global,table,func} $index : func(borrow<IB>, string, string) - binds the statically provided item to the names provided. This is used, for example, to provide a module's own memory to the import list
  • imports_builder.new_global_i32 : func(borrow<IB>, string, string, i32) - creates a brand new wasm global (mutable? new parameter?) with the provided initial value. (this is assumed to be needed for the Emscripten ABI)
  • imports_builder.bind_funcref : func(borrow<IB>, string, string, funcref) - binds the provided function to the specified import name. This is used to provide a module's own functions to imports.

It's hoped that with all of the above it's possible to implement basically everything in dlopen from the Emscripten dynamic linking ABI. With all of this it culminates in a single intrinsic:

  • imports_builder.instantiate : func(borrow<module>, borrow<IB>) -> result<instance, string>

where this final instantiate intrinsic is used to perform instantiation itself (TODO: return type here needs some work).

There will also need to be an API or two to look up globals/functions on the returned instance.
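
To make the builder-then-instantiate flow concrete, here is a toy Python model. It is a sketch under assumptions, not the proposed intrinsics: a single `bind` method stands in for the `bind_{memory,global,table,func}` family, the `result<instance, string>` return is modeled as an `("ok", ...)` or `("err", ...)` tuple, and the import names used in the demo are illustrative.

```python
from typing import Any, Dict, List, Tuple

ImportKey = Tuple[str, str]  # (module name, field name)


class Module:
    """Toy module that only records the imports it requires."""

    def __init__(self, imports: List[ImportKey]) -> None:
        self.imports = imports


class ImportsBuilder:
    """Stand-in for the sketched imports builder."""

    def __init__(self) -> None:
        self.bound: Dict[ImportKey, Any] = {}

    def bind(self, module: str, name: str, item: Any) -> None:
        # Models bind_{memory,global,table,func} and bind_funcref.
        self.bound[(module, name)] = item

    def new_global_i32(self, module: str, name: str, value: int) -> None:
        # Models creating a fresh i32 global with an initial value.
        self.bound[(module, name)] = {"kind": "global", "type": "i32", "value": value}


def instantiate(m: Module, ib: ImportsBuilder) -> Tuple[str, Any]:
    missing = [key for key in m.imports if key not in ib.bound]
    if missing:
        return ("err", f"unresolved imports: {missing}")
    # The "instance" here is just the resolved import map.
    return ("ok", {key: ib.bound[key] for key in m.imports})


lib = Module([("env", "memory"), ("env", "__memory_base")])
ib = ImportsBuilder()
ib.bind("env", "memory", bytearray(16))   # share the caller's own memory
print(instantiate(lib, ib)[0])            # one import is still unbound
ib.new_global_i32("env", "__memory_base", 1024)
print(instantiate(lib, ib)[0])
```

In this model instantiation fails until every import is bound, which is the point where wasi-libc would report a dlopen error.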

Integration with wasi-libc

It's hoped that all of the above will be implementation details of dlopen in wasi-libc. It's not expected that applications will necessarily be manipulating the intrinsics themselves. All the details of the Emscripten dynamic linking ABI, for example, would be encoded in wasi-libc in terms of matching names, providing imports, manipulating memories and globals, etc.


This is very much a work-in-progress design. Even just writing this up I feel like we may want to shift more things into WIT or similar or have WIT-defined builtins rather than so many intrinsics. Furthermore there are a lot of details here to prove out, and we also need to ensure that there's enough functionality to fully implement Emscripten's dynamic linking ABI.

cc @dicej, @fitzgen, @sunfishcode

@tschneidereit
Member

tschneidereit commented Sep 30, 2024

Thank you for this writeup! ❤️

This all looks great and makes sense to me, with one exception: I think it might make sense to start with something a bit more conservative instead of the compile functionality. I was thinking about something that's actually if anything closer to dlopen: the ability to say "give me a module based on this (to me) opaque identifier."

That way, hosts that don't want to expose compilation abilities can do so, and we could additionally provide a separate interface for doing actual compilation for environments where that makes sense.

That is of course very similar to your preopens interface, but I think it's the better primitive to provide for now.

I'm thinking about something along these lines:

package wasi:module-loader;

interface loader {
    enum error { /* ... */ }

    load: func(id: string) -> result<module, error>;

    // Optionally, we could add a way to get a list of known modules:
    available-modules: func() -> list<string>;
}

@fitzgen
Collaborator

fitzgen commented Sep 30, 2024

the ability to say "give me a module based on this (to me) opaque identifier."

That's what the preopen API that Alex sketched is for.

FWIW I would bikeshed the name and suggest "precompiles" or something along those lines.

hosts that don't want to expose compilation abilities can do so

They can return an error instead of compiling anything, but we could always layer the compile interface on top to extend this into a new world too.

@fitzgen
Collaborator

fitzgen commented Sep 30, 2024

Also, listing all pre-compiled modules might not be something we want to expose, since the pre-compiled modules could come from the network in a FaaS platform, and then we would have TOCTOU bugs. Better to just try and get it if we have it, otherwise fall back to the back up plan (either find the module on disk and compile it, or propagate an error).
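
The TOCTOU concern can be illustrated with a toy Python model. All names here (`Host`, `available_modules`, `evict`, `load_or_fallback`) are hypothetical and exist only for this sketch: a listing taken earlier can go stale, while a direct "try to get it, otherwise fall back" call has no window between check and use.

```python
from typing import Callable, Dict, List, Optional


class Host:
    """Toy host whose precompiled modules may disappear at any time."""

    def __init__(self, modules: Dict[str, str]) -> None:
        self._modules = dict(modules)

    def available_modules(self) -> List[str]:
        return sorted(self._modules)

    def get(self, name: str) -> Optional[str]:
        return self._modules.get(name)

    def evict(self, name: str) -> None:
        self._modules.pop(name, None)


def load_or_fallback(host: Host, name: str, fallback: Callable[[str], str]) -> str:
    # Single lookup: use the precompiled module if present, else fall back.
    module = host.get(name)
    return module if module is not None else fallback(name)


host = Host({"libfoo": "precompiled libfoo"})
snapshot = host.available_modules()  # time of check
host.evict("libfoo")                 # e.g. a network-backed entry goes away
stale = "libfoo" in snapshot and host.get("libfoo") is None  # time of use
result = load_or_fallback(host, "libfoo", lambda n: f"compiled {n} from disk")
print(stale, result)
```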

@tschneidereit
Member

They can return an error instead of compiling anything, but we could always layer the compile interface on top to extend this into a new world too.

The key thing is that JIT-compilation is a different, much more powerful capability, which I think should be explicitly targeted via its own interface that can be statically checked for, instead of dynamically returning an error. Content that can't be run in an environment that doesn't support this should be rejected pre-deployment, ideally.

Also, listing all pre-compiled modules might not be something we want to expose, since the pre-compiled modules could come from the network in a FaaS platform, and then we would have TOCTOU bugs. Better to just try and get it if we have it, otherwise fall back to the back up plan (either find the module on disk and compile it, or propagate an error).

That's a very good point, yes.

I'd be happy with just having the "have id, want module" interface and nothing else :)

@alexcrichton
Collaborator Author

Ideally wasi-libc would have an optional import on "compile these bytes", but in lieu of that my thinking is that hosts would, by default, deny "compile these bytes" and you'd be able to opt in on some hosts (e.g. the wasmtime CLI). It's possible to make it so wasi-libc doesn't, by default, statically pull in the "compile these bytes" function, but that would mean there would have to be a wasi-libc-specific API for "enable that", which would be another portability hazard.

@tschneidereit
Member

What I'm trying to convey is that I think these should be different interfaces, so they can be included in different worlds. Worlds that don't include the "compile some bytes for me" interface would be applicable much more broadly. So far at least we've treated packages, and certainly interfaces as an all-or-nothing thing instead of saying that it's okay to implement only parts of an interface and omit certain functions.

Regarding wasi-libc integration: would we even have the "compile some bytes for me" interface integrated into libc? What would that look like? It seems like loading a library based on an ID would map much more readily to dlopen?

@alexcrichton
Collaborator Author

That makes sense yeah, and I tried to sketch above separate interfaces as well. My point is that wasi-libc would want the "compile the bytes" interface by default because that's what native platforms expect (e.g. Python). That interface would be allowed to fail, though, and until we have optional imports I think that's the best we can do for wasi-libc.

For wasi-libc specifically a native-like experience would be a dlopen function that interprets the input string as a file path. It would probe for the file, read the contents, and then pass the result to a "compile the bytes" function from the host. If that failed then the dlopen call would fail, but that's how I'm imagining it'd be integrated into wasi-libc.

@fitzgen
Collaborator

fitzgen commented Oct 1, 2024

For wasi-libc specifically a native-like experience would be a dlopen function that interprets the input string as a file path. It would probe for the file, read the contents, and then pass the result to a "compile the bytes" function from the host. If that failed then the dlopen call would fail, but that's how I'm imagining it'd be integrated into wasi-libc.

To add on to Alex's response here: wasi-libc can't probe for (for example) Wasmtime's .cwasm files (which contain the native code for a compiled .wasm, for those that aren't familiar) because those are a Wasmtime-internal detail (and the precompiles/preopens interface is intended to satisfy that no-compilation use case, which could use things like .cwasms under the hood). At the portable Wasm/WASI/CM standards level, all we can work with for this fully dynamic case are Wasm modules.

@tschneidereit
Member

For wasi-libc specifically a native-like experience would be a dlopen function that interprets the input string as a file path. It would probe for the file, read the contents, and then pass the result to a "compile the bytes" function from the host. If that failed then the dlopen call would fail, but that's how I'm imagining it'd be integrated into wasi-libc.

I feel like I'm missing something, because this still doesn't make sense to me. Isn't the much more equivalent-to-native thing a dlopen that, as you say, interprets the input string as a file path, but then takes that to be the thing to load directly, instead of the source of something to load?

I.e., wouldn't we want to leave it up to the host to decide how to go from the file path (aka, opaque ID) to a loadable module? In the wasmtime case, we'd look for a .cwasm file, whereas in JCO for example, we'd look for a .wasm file, but then compile it behind the scenes using WebAssembly.compileStreaming(). And in any case, all libc would ever see is a module reference to instantiate.

All of this is effectively the preopens thing, and I guess all I'm saying is we should start out having only that, but not call it preopens or anything :)

@alexcrichton
Collaborator Author

Till and I talked a bit more about this over video and the general conclusions we reached were:

  • I forgot to mention that dlopen in wasi-libc would pass through the string to the "get the preexisting module" first
  • Till made a case that the "compile these bytes" API should look like compile: func(borrow<descriptor>) -> result<module>. Which is to say it takes a reference to an open file (or something like that) rather than the bytes themselves.

Personally I think that'd be reasonable since there's not a huge use case right now for "generate wasm in content and then compile it", and that can always be satisfied with a filesystem too.

@sunfishcode
Member

Perhaps it could be compile: func(input-stream) -> result<module>, as you can get an input-stream from a descriptor using read-via-stream, and that would free it from being tied to wasi-filesystem.

@tschneidereit
Member

The thing that seems very important to me is that the primitive used by libc should not require being able to acquire the wasm bytes to get a module ref. compile: func(borrow<descriptor>) -> result<module> seems borderline, but still just okay to me in that regard, because we can validly make read operations on the descriptor fail. compile: func(input-stream) -> result<module> seems like a step too far, otoh: once you have an input-stream, you really should be able to actually read from it.

@tschneidereit
Member

To expand on my reasoning here: For environments that want to/can only handle precompiled binaries, we really don't want to require the .wasm file to be available, and we certainly don't want them to need to do something like "detect if this byte stream came from a .wasm file and then swap in a precompiled version of that .wasm file instead". If we operate on the descriptor level and disallow reading from the descriptor, then it seems reasonable to me to hand out the descriptor, but attach internal state that indicates that this really is a façade and can only be used as input for a module loading operation and nothing else.

I guess I would still somewhat prefer not to call this compile, and instead something like load-module, but that seems less important to me.
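
The façade-descriptor idea can be modeled roughly as follows. This is a toy Python sketch with hypothetical names (`Descriptor`, `PrecompiledOnlyHost`, `can_read`), not a wasi-filesystem API: the host hands out a descriptor whose reads fail, yet the same descriptor is still accepted as input to module loading.

```python
from typing import Dict, Optional


class Descriptor:
    """Toy file descriptor; `contents=None` marks a load-only facade."""

    def __init__(self, path: str, contents: Optional[bytes]) -> None:
        self.path = path
        self._contents = contents

    def read(self) -> bytes:
        if self._contents is None:
            raise PermissionError(f"{self.path} can only be used to load a module")
        return self._contents


class PrecompiledOnlyHost:
    """Hypothetical host that ships precompiled artifacts and no .wasm files."""

    def __init__(self, precompiled: Dict[str, str]) -> None:
        self._precompiled = precompiled

    def open(self, path: str) -> Descriptor:
        if path not in self._precompiled:
            raise FileNotFoundError(path)
        return Descriptor(path, contents=None)  # facade: no bytes behind it

    def load_module(self, desc: Descriptor) -> str:
        # The host never reads wasm bytes; it maps the descriptor to an artifact.
        return self._precompiled[desc.path]


def can_read(desc: Descriptor) -> bool:
    try:
        desc.read()
        return True
    except PermissionError:
        return False


host = PrecompiledOnlyHost({"lib.wasm": "native code for lib"})
desc = host.open("lib.wasm")
print(can_read(desc))          # reading the facade is refused
print(host.load_module(desc))  # loading through it still works
```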

@pavelsavara

pavelsavara commented Oct 3, 2024

Personally I think that'd be reasonable since there's not a huge use case right now for "generate wasm in content and then compile it", and that can always be satisfied with a filesystem too.

I liked the ability to JIT dotnet IL to a wasm stream or bytes. Are you saying that I need to store those bytes to the FS first?

I'm not 100% sure, but I think that Chrome already stores precompiled wasm. Maybe they calculate a hash?

@tschneidereit
Member

I agree that JIT compilation is important—it'll not be supported in all environments though, so I think it should not be part of the default way to support dlopen.

What I'm imagining is that we'd have a separate WASI interface, potentially in a separate package, that'd allow you to get a descriptor from a list of bytes (and/or a stream), which could then be used with the interface proposed here.

That way, environments that can't support actual JIT compilation can support dlopen, but not expose this JIT-supporting interface.

@James-Mart

Would this allow me to get native guest bindings to the dynamically loaded module? If so, how does this interface allow me to specify the expected shape of the interface in the loaded module?

6 participants