polling on mix of libc-posix socket/file descriptors with non-libc WASI pollable handles #542

pavelsavara · 2024-09-25T10:47:25Z

In context of dotnet sockets I would like to be able to wasi:poll_poll on mix of handles

which were created by consuming other WASI APIs, like wasi:http
and file handles obtained by using libc sockets APIs

The motivation to do it on single "system" call is to

be able to make progress on events from HTTP while socket is not ready and vice versa
- if we called each select/poll or poll_poll separately, we would get blocked only on part of pollables
to be able to block (stop spinning) if there are no I/O events (and no CPU bound jobs)

At the moment, we can hack it via using libc internal descriptor_table_get_ref and it's data structures to obtain underlying pollables.
@dicej made similar hack for mio
But it's fragile and bad practice.

I can see few possible solutions

expose libc function accepting both lists of resources
- void wasi_poll_plus_fd(poll_list_borrow_pollable_t *in, list_pollfd_t *in, wasip2_list_u32_t *ret)
- this is with wit-bindgen generated C structures on APIs and with list of pollfd struct
we can try simpler C types, something like
- void wasi_poll_plus_fd(int *pollables, int pollables_count, struct pollfd *fds, int fds_count, int **ret, int *ret_count)
or we can expose just the mapping from fds to list of pollables
- void socket_fds_to_wasi_pollables(struct pollfd *fds, int fds_count, int **pollables, int pollables_count)
- this would have to be called before each call to wasi:poll_poll
create libc file descriptor by registering external pollable
- this would have to be un-registered later
- calling socket oriented poll() API on any pollable (clock) feels confusing to me

Relevant code, possibly to be refactored and reused for above.

wasi-libc/libc-bottom-half/sources/poll-wasip2.c

Lines 27 to 151 in 7d4d3b8

    
           	switch (entry->tag) { 
        
           	case DESCRIPTOR_TABLE_ENTRY_TCP_SOCKET: { 
        
           		tcp_socket_t *socket = &(entry->tcp_socket); 
        
           		switch (socket->state.tag) { 
        
           		case TCP_SOCKET_STATE_CONNECTING: 
        
           		case TCP_SOCKET_STATE_LISTENING: { 
        
           			if ((pollfd->events & 
        
           			     (POLLRDNORM | POLLWRNORM)) != 0) { 
        
           				states[state_index++] = (state_t){ 
        
           					.pollable = 
        
           						socket->socket_pollable, 
        
           					.pollfd = pollfd, 
        
           					.entry = entry, 
        
           					.events = pollfd->events 
        
           				}; 
        
           			} 
        
           			break; 
        
           		} 
        
           		case TCP_SOCKET_STATE_CONNECTED: { 
        
           			if ((pollfd->events & POLLRDNORM) != 
        
           			    0) { 
        
           				states[state_index++] = (state_t){ 
        
           					.pollable = 
        
           						socket->state 
        
           							.connected 
        
           							.input_pollable, 
        
           					.pollfd = pollfd, 
        
           					.entry = entry, 
        
           					.events = POLLRDNORM 
        
           				}; 
        
           			} 
        
           			if ((pollfd->events & POLLWRNORM) != 
        
           			    0) { 
        
           				states[state_index++] = (state_t){ 
        
           					.pollable = 
        
           						socket->state 
        
           							.connected 
        
           							.output_pollable, 
        
           					.pollfd = pollfd, 
        
           					.entry = entry, 
        
           					.events = POLLWRNORM 
        
           				}; 
        
           			} 
        
           			break; 
        
           		} 
        
           		case TCP_SOCKET_STATE_CONNECT_FAILED: { 
        
           			if (pollfd->revents == 0) { 
        
           				++event_count; 
        
           			} 
        
           			pollfd->revents |= pollfd->events; 
        
           			break; 
        
           		} 
        
           		default: 
        
           			errno = ENOTSUP; 
        
           			return -1; 
        
           		} 
        
           		break; 
        
           	} 
        
           	case DESCRIPTOR_TABLE_ENTRY_UDP_SOCKET: { 
        
           		udp_socket_t *socket = &(entry->udp_socket); 
        
           		switch (socket->state.tag) { 
        
           		case UDP_SOCKET_STATE_UNBOUND: 
        
           		case UDP_SOCKET_STATE_BOUND_NOSTREAMS: { 
        
           			if (pollfd->revents == 0) { 
        
           				++event_count; 
        
           			} 
        
           			pollfd->revents |= pollfd->events; 
        
           			break; 
        
           		} 
        
           		case UDP_SOCKET_STATE_BOUND_STREAMING: 
        
           		case UDP_SOCKET_STATE_CONNECTED: { 
        
           			udp_socket_streams_t *streams; 
        
           			if (socket->state.tag == 
        
           			    UDP_SOCKET_STATE_BOUND_STREAMING) { 
        
           				streams = &( 
        
           					socket->state 
        
           						.bound_streaming 
        
           						.streams); 
        
           			} else { 
        
           				streams = &( 
        
           					socket->state.connected 
        
           						.streams); 
        
           			} 
        
           			if ((pollfd->events & POLLRDNORM) != 
        
           			    0) { 
        
           				states[state_index++] = (state_t){ 
        
           					.pollable = 
        
           						streams->incoming_pollable, 
        
           					.pollfd = pollfd, 
        
           					.entry = entry, 
        
           					.events = POLLRDNORM 
        
           				}; 
        
           			} 
        
           			if ((pollfd->events & POLLWRNORM) != 
        
           			    0) { 
        
           				states[state_index++] = (state_t){ 
        
           					.pollable = 
        
           						streams->outgoing_pollable, 
        
           					.pollfd = pollfd, 
        
           					.entry = entry, 
        
           					.events = POLLWRNORM 
        
           				}; 
        
           			} 
        
           			break; 
        
           		} 
        
           		default: 
        
           			errno = ENOTSUP; 
        
           			return -1; 
        
           		} 
        
           		break; 
        
           	} 
        
           	default: 
        
           		errno = ENOTSUP; 
        
           		return -1; 
        
           	} 
        
           } else { 
        
           	abort(); 
        
           }

cc @badeend

The text was updated successfully, but these errors were encountered:

badeend · 2024-09-25T19:35:19Z

Out of the four options, 1 and 2 seem the best to me.

or we can expose just the mapping from fds to list of pollables

Note that one libc socket can map to anywhere from 0 to 3 pollable resources depending on its state. With the proposed function signature the user can't know which pollable relates to which aspect of the socket.

calling socket oriented poll() API on any pollable (clock) feels confusing to me

POSIX poll is for any async I/O and not specific to sockets, so there wouldn't be any confusion for me. Regardless, I'd still go for option 1 or 2 as those accomplish the desired goal in a single call.

badeend · 2024-10-02T10:28:44Z

After discussing with @sunfishcode we came up with another alternative solution:
Rather than attempting to integrate external resources into the libc/POSIX polling mechanism, we could expose the libc-managed WASI resources for external use:

int32_t get_wasip2_input_stream(int fd); // wasi:io/[email protected]
int32_t get_wasip2_output_stream(int fd); // wasi:io/[email protected]
int32_t get_wasip2_filesystem_descriptor(int fd); // wasi:filesystem/[email protected]
int32_t get_wasip2_tcp_socket(int fd); // wasi:sockets/[email protected]
int32_t get_wasip2_udp_socket(int fd); // wasi:sockets/[email protected]
// etc..

The consumer is in charge of getting the pollables from them.

These come with some implicit assumptions that should be documented:

These should be considered specific to Preview2. They will not work on preview1 and may not work on preview3, depending on how the future pans out.
Any one of them may fail (with a negative value?). E.g.
- Calling get_wasip2_filesystem_descriptor on stdout will always fail.
- Calling get_wasip2_output_stream on a socket may fail if it isn't connected yet.
The resources remain "owned" by wasi-libc:
- The consumer should not drop these resources.
- Calling POSIX close invalidates any of these resources.
- Child resources (e.g. Pollables) derived from these resources must be dropped before calling POSIX close. Failing to do so may trap.

pavelsavara · 2024-10-02T11:46:59Z

If WASPp3 will call back on promises instead of central polling. And if that lands in few months, I'm happy to wait for that and drop this issue.

I'm not 100% clear about WASIp3 callbacks from streams. I'm worried about performance.
(I admit I didn't study WIT draft yet)

Maybe WASI callback would be cheaper than unix syscall ? (I understand that epoll is optimization of that)
Are we going to marshal and allocate new Promise for each event ?
Would there be any throttling (Nagle's algorithm) ? Or it could happen that stream sender (component) would trigger callback in receiver (component) for each byte if they push byte by byte ?

pavelsavara · 2024-10-02T11:49:20Z

Also the way how we hide those promises/streams and callback behind wasi-libc matters.

dicej · 2024-10-02T14:05:31Z

Maybe WASI callback would be cheaper than unix syscall ? (I understand that epoll is optimization of that)

If the read or write call can complete immediately (because there are bytes available or the host is ready to receive bytes, respectively), then there's no need for the host to allocate a task or call a callback. So we only pay the cost of allocating a task and calling the callback when we're I/O limited anyway. Also, e.g. Wasmtime is optimized to make these guest<->host calls as cheap as possible.

As usual, we'll want to benchmark and optimize as necessary if we do discover bottlenecks.

Are we going to marshal and allocate new Promise for each event ?

We will create a new task each time an async host function is unable to complete immediately, yes.

Would there be any throttling (Nagle's algorithm) ? Or it could happen that stream sender (component) would trigger callback in receiver (component) for each byte if they push byte by byte ?

I guess that depends on whether the host implementation of TCP streams buffers output (either in user space or at the kernel level).

badeend · 2024-10-02T14:18:19Z

We will create a new task each time an async host function is unable to complete immediately, yes.

With the exception of streams, right? I assume they will reuse the same Task for all reads/writes?

dicej · 2024-10-02T14:24:55Z

We will create a new task each time an async host function is unable to complete immediately, yes.

With the exception of streams, right? I assume they will reuse the same Task for all reads/writes?

Luke has not yet written down a spec for streams AFAIK, so I made one up for my prototype implementation, and that creates a new Task as necessary for each read or write. I'm trying to remember what Luke was planning for the final spec; perhaps it just uses the stream ID for callback notifications.

pavelsavara · 2024-10-02T14:25:42Z

With the exception of streams, right? I assume they will reuse the same Task for all reads/writes?

dotnet Task is single resolve only.

dicej · 2024-10-02T14:26:57Z

dotnet Task is single resolve only.

Sure, but p3 tasks might not map one-to-one with .NET Tasks

badeend · 2024-10-02T14:28:53Z

@dicej Okay! It doesn't really matter for this issue. Was just curious.

lukewagner · 2024-10-02T16:09:44Z

With the exception of streams, right? I assume they will reuse the same Task for all reads/writes?

Right, the current proposal (which I'm trying to finish up and post) has:

stream.read $t: [readable-stream-index:i32 ptr:i32 n:i32] -> [result:i32]

where the result bits may indicate that stream.read blocked and that the caller needs to task.wait for the read to complete, where task.wait will return a "stream read" event-code along with the readable-stream-index and how much (<= n) was written into ptr. And after that, another stream.read can be started with the same readable-stream-index, and so on. However, bindgen should be quite able to bind each stream.read invocation to a single-shot task/future/promise that is resolved when the corresponding "stream read" event is returned by task.wait.

This implementation currently uses a mix of POSIX-style APIs (provided by `wasi-libc` via the `libc` crate) and WASIp2-native APIs (provided by the `wasi` crate). Alternatively, we could implement `Selector` using only POSIX APIs, e.g. `poll(2)`. However, that would add an extra layer of abstraction to support and debug, as well as make it impossible to support polling `wasi:io/poll/pollable` objects which cannot be represented as POSIX file descriptors (e.g. timer events, DNS queries, HTTP requests, etc.). Another approach would be to use _only_ the WASIp2 APIs and bypass `wasi-libc` entirely. However, that would break interoperability with both Rust `std` and e.g. C libraries which expect to work with file descriptors. Since `wasi-libc` does not yet provide a public API for converting between file descriptors and WASIp2 resource handles, we currently use a non-public API (see the `netc` module below) to do so. Once WebAssembly/wasi-libc#542 is addressed, we'll be able to switch to a public API. I've tested this end-to-end using https://github.com/dicej/wasi-sockets-tests, which includes smoke tests for `mio`, `tokio`, `tokio-postgres`, etc. Signed-off-by: Joel Dice <[email protected]>

pavelsavara mentioned this issue Sep 25, 2024

[WASI] sockets dotnet/runtime#106977

Merged

1 task

dicej mentioned this issue Oct 9, 2024

add wasm32-wasip2 support tokio-rs/mio#1836

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

polling on mix of libc-posix socket/file descriptors with non-libc WASI pollable handles #542

polling on mix of libc-posix socket/file descriptors with non-libc WASI pollable handles #542

pavelsavara commented Sep 25, 2024 •

edited

Loading

badeend commented Sep 25, 2024

badeend commented Oct 2, 2024

pavelsavara commented Oct 2, 2024 •

edited

Loading

pavelsavara commented Oct 2, 2024

dicej commented Oct 2, 2024 •

edited

Loading

badeend commented Oct 2, 2024

dicej commented Oct 2, 2024 •

edited

Loading

pavelsavara commented Oct 2, 2024

dicej commented Oct 2, 2024

badeend commented Oct 2, 2024

lukewagner commented Oct 2, 2024

polling on mix of libc-posix socket/file descriptors with non-libc WASI pollable handles #542

polling on mix of libc-posix socket/file descriptors with non-libc WASI pollable handles #542

Comments

pavelsavara commented Sep 25, 2024 • edited Loading

badeend commented Sep 25, 2024

badeend commented Oct 2, 2024

pavelsavara commented Oct 2, 2024 • edited Loading

pavelsavara commented Oct 2, 2024

dicej commented Oct 2, 2024 • edited Loading

badeend commented Oct 2, 2024

dicej commented Oct 2, 2024 • edited Loading

pavelsavara commented Oct 2, 2024

dicej commented Oct 2, 2024

badeend commented Oct 2, 2024

lukewagner commented Oct 2, 2024

pavelsavara commented Sep 25, 2024 •

edited

Loading

pavelsavara commented Oct 2, 2024 •

edited

Loading

dicej commented Oct 2, 2024 •

edited

Loading

dicej commented Oct 2, 2024 •

edited

Loading