Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDP socket control #10

Open
l0kod opened this issue Jan 18, 2024 · 10 comments
Open

UDP socket control #10

l0kod opened this issue Jan 18, 2024 · 10 comments
Assignees
Labels
enhancement New feature or request

Comments

@l0kod
Copy link
Member

l0kod commented Jan 18, 2024

We can now control TCP actions (bind(2) and connect(2)), and it would be useful to have a similar semantic for UDP. It's a bit tricky because of the datagram nature of UDP though.

However, it should not be an issue to check sendto(2) and recvfrom(2) for UDP because performant-sensitive applications should use connect/bind and send/recv instead, and we can check connect/bind without checking send/recv. Indeed, bind and connect can be used to configure an UDP socket, even if it is not a connected protocol. This approach enables to limit the number of kernel checks and copies.

We first need to check real use cases to validate these assumptions...

See https://lore.kernel.org/all/[email protected]/

Cc @BoardzMaster

@l0kod l0kod added the enhancement New feature or request label Jan 18, 2024
@sm1ling-knight
Copy link

sm1ling-knight commented Apr 1, 2024

I found out that send is used in linux kernel implementation of AFS when server sends some UDP messages (e.g. ACK) in response to a client's request. I'm not sure if there could be any overhead due to landlock LSM, but this could be an example of latency-sensitive app, that doesn't use pre-connected send calls.

In general, server can be used to respond promptly to requests from different clients and it would be unreasonable to establish pseudo-connection for every client. What do you think about this case?

@l0kod
Copy link
Member Author

l0kod commented Apr 2, 2024

I found out that send is used in linux kernel implementation of AFS when server sends some UDP messages (e.g. ACK) in response to a client's request. I'm not sure if there could be any overhead due to landlock LSM, but this could be an example of latency-sensitive app, that doesn't use pre-connected send calls.

Do you mean sendto? send should be OK. If sendto is used, the question is: could the related code be rewritten to use send instead (which may improve performance for the app)?

In general, server can be used to respond promptly to requests from different clients and it would be unreasonable to establish pseudo-connection for every client. What do you think about this case?

If a request is received by a server, wouldn't it be possible to just call send on the related socket?

@sm1ling-knight
Copy link

I found out that send is used in linux kernel implementation of AFS when server sends some UDP messages (e.g. ACK) in response to a client's request. I'm not sure if there could be any overhead due to landlock LSM, but this could be an example of latency-sensitive app, that doesn't use pre-connected send calls.

Do you mean sendto? send should be OK.

They use udp_sendmsg() under the hood which is equivalent to sendto in our case (since they both specify destination address for each call).

If sendto is used, the question is: could the related code be rewritten to use send instead (which may improve performance for the app)?

Yes, it can be rewritten using pre-connected mechanism, but i don't know how this will affect performance. In some cases, server could send only 1-2 ACK packets to the client, so it may be unreasonable to call connect for every client.

In general, server can be used to respond promptly to requests from different clients and it would be unreasonable to establish pseudo-connection for every client. What do you think about this case?

If a request is received by a server, wouldn't it be possible to just call send on the related socket?

Server can use single socket both for receiving and responding to messages. So, it must specify the destination address for each client via connect or sendto/sendmsg calls.

@sm1ling-knight
Copy link

Btw we may have some issues with UDP multicast sockets. From man connect(2):

If the socket sockfd is of type SOCK_DGRAM, then addr is the
address to which datagrams are sent by default, and the only
address from which datagrams are received.

This means that restricted sandbox can only use sendto calls for multicast sockets. What do you think?

@l0kod
Copy link
Member Author

l0kod commented Apr 5, 2024

Btw we may have some issues with UDP multicast sockets. From man connect(2):

If the socket sockfd is of type SOCK_DGRAM, then addr is the
address to which datagrams are sent by default, and the only
address from which datagrams are received.

This means that restricted sandbox can only use sendto calls for multicast sockets. What do you think?

Yes, we'll need to support sendto and recvfrom. Anyway, I think the performance impact might be OK because UDP sandboxing will be opt-in (as other firewalls), and performance-sensitive services may not want to use this Landlock feature. After some benchmarks (#24), we'll need to think about performance improvements. #1 should help, but we may want to look at LRU, bloom filters, and other algorithms (taking into account #16).

@mtth-bfft
Copy link

Hi Mickaël,

I've read the linked discussion threads, and I can try working on a first shot at the problem if that's still ok with your roadmap, e.g. considering #16 ?
My current understanding is roughly:

  • add a couple new LANDLOCK_ACCESS_NET_RECV_UDP and LANDLOCK_ACCESS_NET_SEND_UDP handled accesses based on struct landlock_net_port_attr
  • update the socket_connect hook accordingly
  • add socket_sendmsg and socket_recvmsg hooks to filter sendto, sendmsg, sendmmsg, recvfrom, recvmsg, recvmmsg whilst letting calls with no address specified (e.g. from send, recv) through

About performance: I haven't had time to look at AFS, but more generally if anyone has potential applications (clients or servers) that could have issues with such a patch, I'd be interested to benchmark it.
As a side note, for the usecase where an app needs to send/recv messages to too many hosts to be able to maintain a socket opened to each one, and has to send enough messages to each that LSM checks start having an impact, recvmmsg and sendmmsg both cache LSM verdicts and look like potential options.

I'll read up on general kernel dev in parallel, worst that can happen is I realize how much complexity is hidden within a few weeks and come back empty handed.

Cheers :)

@l0kod
Copy link
Member Author

l0kod commented Jun 24, 2024

Hi Mickaël,

Hi Matthieu!

I've read the linked discussion threads, and I can try working on a first shot at the problem if that's still ok with your roadmap, e.g. considering #16 ?

Sure! #16 is not a blocker.

My current understanding is roughly:

  • add a couple new LANDLOCK_ACCESS_NET_RECV_UDP and LANDLOCK_ACCESS_NET_SEND_UDP handled accesses based on struct landlock_net_port_attr
  • update the socket_connect hook accordingly
  • add socket_sendmsg and socket_recvmsg hooks to filter sendto, sendmsg, sendmmsg, recvfrom, recvmsg, recvmmsg whilst letting calls with no address specified (e.g. from send, recv) through

Yes, that's the idea.

About performance: I haven't had time to look at AFS, but more generally if anyone has potential applications (clients or servers) that could have issues with such a patch, I'd be interested to benchmark it. As a side note, for the usecase where an app needs to send/recv messages to too many hosts to be able to maintain a socket opened to each one, and has to send enough messages to each that LSM checks start having an impact, recvmmsg and sendmmsg both cache LSM verdicts and look like potential options.

We could indeed rely on caching, but let's ignore this optimization for now. 😉

About the performance impact, I think we should start with a first implementation and do some benchmarks to see the impact. I think it should be OK for most use cases, and controlling UDP (like other Landlock restrictions) will always be optional anyway. @sm1ling-knight is working on #24, which should help benchmark worse cases.

A simple optimization for pseudo-connected UDP sockets would be to tag the sockets when calling bind() and connect(), and check this tag on the sendmsg() and recvmsg() LSM hooks. That would not work for pure sendmsg() and recvmsg() use cases, but that should not be any worse than other access controls. 🤷

This makes me think about more appropriate access rights: LANDLOCK_ACCESS_NET_BIND_UDP (explicit call to bind()) and LANDLOCK_ACCESS_NET_SENDTO_UDP (controlling both connect() and sendto() calls).

I'll read up on general kernel dev in parallel, worst that can happen is I realize how much complexity is hidden within a few weeks and come back empty handed.

I think it should be a good first contribution. We now have all the mechanic in place. The main new think that we may need would be a generic socket blob (i.e. there is no field dedicated to sk_security in lsm_blob_sizes for now) to be able to use Landlock with other LSMs (and enable each of them to tag the same socket), but you should start without implementing this part for now.

Cheers :)

Thinking more about this new access rights, here are more thoughts:

With TCP (i.e. connected protocol), we can deny arbitrary peer from sending data to a sandboxed process if we forbid LANDLOCK_ACCESS_NET_BIND and (upcoming) LANDLOCK_ACCESS_NET_LISTEN_TCP. (In practice this can be bypassed if a (remote) peer can spoof the connection, but in this case we have a more serious issue and sandboxing cannot help.)

With UDP (and other unconnected protocol), the local port can be set with a bind() call and the destination with either sendto/sendmsg() or connect() calls. However, without explicit binding, I guess the socket will automatically be binded to a port defined by /proc/sys/net/ipv4/ip_local_port_range when not explicitly calling bind(). For instance, I think the following scenario works (but it should be tested):

  • s = socket(UDP)
  • sendto(s, port-2000) // auto-binding to port 33000 (according to /proc/sys/net/ipv4/ip_local_port_range)
  • recvfrom(s) // get data from anyone sending to port 33000

If this is correct, we should think about a complementary access right to control automatic binding with sendto().

What do you think?

Cc @sm1ling-knight @gnoack

@mtth-bfft
Copy link

mtth-bfft commented Aug 4, 2024

Hi Mickaël,

Just a follow-up as I now have a few bits of answers and I see time flew since my last message.

With UDP (and other unconnected protocol), the local port can be set with a bind() call and the destination with either sendto/sendmsg() or connect() calls. However, without explicit binding, I guess the socket will automatically be binded to a port defined by /proc/sys/net/ipv4/ip_local_port_range when not explicitly calling bind().
[...]
If this is correct, we should think about a complementary access right to control automatic binding with sendto().

An implicit call equivalent to bind(0) is made when sending a first datagram, or when using connect(). I tested it, and it does indeed allow the process to receive datagrams, without any explicit bind. For example, the exact call chain can be inet_sendmsg() -> inet_send_prepare() -> inet_autobind() -> udp_v4_get_port() which assigns a port. There is no LSM hook in any of this code, so unless we add one, Landlock won't prevent any process with >= 1 UDP access right from binding to any port in /proc/sys/net/ipv4/ip_local_port_range. Since already bound ports cannot be assigned as ephemeral ports (also tested), I guess the impact is being able to snoop on traffic not directed to the host (e.g. replies for another host on a hub network, or replies in multicast/broadcast). Not great, not terrible®.

This makes me think about more appropriate access rights: LANDLOCK_ACCESS_NET_BIND_UDP (explicit call to bind()) and LANDLOCK_ACCESS_NET_SENDTO_UDP (controlling both connect() and sendto() calls).

While working on a first patch for this model, I found two usecases in which the results are sub-optimal:

  • I need to bind() to set the source port of the packets I will send (e.g. because my protocol requires it, like multicast DNS). This means I need LANDLOCK_ACCESS_NET_BIND_UDP for that port. But now I've given the program the right to receive traffic on that port, too, even if it only transmits;
  • I need to connect() to set the IP address I want to receive traffic from (e.g. a client's IP address because I create client-specific sockets over a main unconnected one). This means I need LANDLOCK_ACCESS_NET_SENDTO_UDP for that port, but now I've given the right to send traffic, even if it only receives.

Since bind() can be needed to send traffic, and connect() can be needed to receive traffic, I guess the only way to treat these two cases correctly is to have security blobs in sockets and tag them like you said. That's for another day, first I'll try to finish a V0 patch.

@l0kod
Copy link
Member Author

l0kod commented Aug 7, 2024

Thanks for the heads up.

  • I need to bind() to set the source port of the packets I will send (e.g. because my protocol requires it, like multicast DNS). This means I need LANDLOCK_ACCESS_NET_BIND_UDP for that port. But now I've given the program the right to receive traffic on that port, too, even if it only transmits;

Good point. LANDLOCK_ACCESS_NET_RECVFROM_UDP might be more appropriate in this case. LANDLOCK_ACCESS_NET_BIND_UDP could still be useful for explicit binding (i.e. not ephemeral ports) to control available host-side ports and avoid service impersonation.

  • I need to connect() to set the IP address I want to receive traffic from (e.g. a client's IP address because I create client-specific sockets over a main unconnected one). This means I need LANDLOCK_ACCESS_NET_SENDTO_UDP for that port, but now I've given the right to send traffic, even if it only receives.

OK, so similarly we could also have a LANDLOCK_ACCESS_NET_CONNECT_UDP to only configure the socket with an explicit call to connect(2).

These two use cases are good examples and should be documented in a dedicated patch.

Since bind() can be needed to send traffic, and connect() can be needed to receive traffic, I guess the only way to treat these two cases correctly is to have security blobs in sockets and tag them like you said. That's for another day, first I'll try to finish a V0 patch.

I think with these four new UDP access rights we should be fine. The documentation should highlight that connect and bind for UDP is only about socket configuration, not sending nor receiving. What do you think?

Socket tagging would be an optimization for when send(2) or recv(2) is used on a configured socket instead of sendto(2) or recvfrom(2) (or similar unconnected syscalls). But yes, no need to add that for the first patch series.

About tagging:

The main new think that we may need would be a generic socket blob (i.e. there is no field dedicated to sk_security in lsm_blob_sizes for now) to be able to use Landlock with other LSMs (and enable each of them to tag the same socket), but you should start without implementing this part for now.

The infrastructure management of the sock security landed in LSM next, so it should be part of Linux 5.12 and we can use this blob in Landlock. I guess it will not be needed though, nor landlock_file(sock->file)->allowed_access.

l0kod pushed a commit that referenced this issue Sep 9, 2024
[ Upstream commit a699781 ]

A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <[email protected]>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
@l0kod
Copy link
Member Author

l0kod commented Sep 29, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: In review
Development

No branches or pull requests

3 participants