PVF: Move landlock out of thread into process; add landlock exceptions #7580
base: master
Conversation
- [x] Add check for whether `--socket-path` or `--cache-path` are `None` (both are required)
- [x] Fix some compile errors
- [x] Update some docs
- [x] Update landlock tests
```rust
if !artifact_path.starts_with(cache_path) {
	return Err(io::Error::new(io::ErrorKind::Other, format!("received an artifact path {artifact_path:?} that does not belong to expected artifact dir {cache_path:?}")))
}
```
TODO: I wonder if an `io::Error` is the most appropriate error here. To change it we would have to change the error type of this function (it would need a new error kind), which would be a bit annoying. But since this would be a logic bug on our side, it can be an `assert!` instead.
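A minimal sketch of that alternative (hypothetical helper, not the PR's actual code): treat an artifact path outside the cache directory as an internal logic bug and panic via `assert!` instead of returning an `io::Error`:

```rust
use std::path::Path;

// Hypothetical sketch: an artifact path outside the cache dir would be a
// logic bug on our side, so fail loudly with assert! rather than
// returning an io::Error to the caller.
fn check_artifact_path(artifact_path: &Path, cache_path: &Path) {
    assert!(
        artifact_path.starts_with(cache_path),
        "received an artifact path {:?} that does not belong to expected artifact dir {:?}",
        artifact_path,
        cache_path,
    );
}
```

The trade-off is the one noted above: no new error kind is needed, but a violation aborts instead of propagating an error.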
This PR adds a landlock exception for the PVF artifacts cache directory. We have to do it for the whole directory because, at the time of process startup, we of course can't know yet which artifacts are going to come in. Everything outside this directory is still totally restricted for the process.

This has one issue: execution jobs, which need read permissions on the cache directory, can now read other artifacts in the directory. As different validators may have different dir contents, this can be a source of randomness for attackers. It makes it possible to attack the chain by crafting a PVF+candidate which results in arbitrary code execution which:

- reads the contents of the cache directory,
- uses it to seed some randomness, and
- causes the execution job to vote for or against the candidate with 50% chance,

thus stalling the chain, which assumes that at least 66% of validators are trustworthy and not compromised.

We could spawn a brand new process for each job, but that seems like a lot of overhead and I'm not sure it's feasible. cc @s0me0ne-unkn0wn

But there are so many sources of randomness that I have been thinking to abandon this goal. In the end we would need virtualization, and that is not a free win but comes with challenges. Perhaps it is better to rely on governance to deal with attacks on the chain. cc @eskimor, I forget if we discussed this already or not.

At any rate, securing validators themselves is a priority right now, so we need this landlock fix.
Spawning a new process is not that much overhead. Spawning a new process for each and every HTTP request, as some webservers do, is quite bad, but if the useful work is in the hundreds of milliseconds to seconds, it should be negligible.
@eskimor according to our metrics, execution jobs are on the order of 10ms, so spawning a process may not be so negligible. I also remember that when @s0me0ne-unkn0wn was redesigning the execution job queue, one of the goals was to reuse the same worker process as much as possible for jobs. I don't remember the details at this point, but @s0me0ne-unkn0wn can surely provide more insight into how feasible a new process for each job is.
Spawning a new process on Unix is a fork of the calling process. But there are two variables here, one new and one old.

The old one is the parachain logic. If it's purely computational, it's likely that the overhead of spawning a new process will take more time than the execution itself. But if the parachain does some I/O ops, calling host functions and reading and writing storage, it may turn the other way.

The new variable is the separate binary workers. Their memory footprint is small, and that may reduce the overhead significantly; we might see that it's negligible now. But again, those are empirical estimations. It's easy to check: just build a version with the execution worker gracefully exiting after every execution, burn it in on Versi, and compare the results.
Well, I've probably screwed up the explanation somewhat. We still need to fork the main process (not the whole process indeed, only a calling thread) to spawn a new process from an external binary. But prior to worker separation, we needed that forked process to execute the whole node binary.
Cool, I've raised an issue here: paritytech/polkadot-sdk#584. It's a follow-up because it doesn't block this PR, and it would likely come with changes to the execution queue logic, so it should be a separate PR. Thanks @s0me0ne-unkn0wn!
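Until such a burn-in happens, the overhead question can at least be roughed out locally. A sketch (not part of the PR; assumes a Unix system with `true` on the PATH) that times trivial process spawns for comparison against the ~10ms job figure mentioned above:

```rust
use std::process::Command;
use std::time::Instant;

// Rough local version of the suggested empirical check: measure the
// average wall-clock cost of spawning a trivial child process and
// waiting for it to exit.
fn spawn_overhead_ms(iterations: u32) -> f64 {
    let start = Instant::now();
    for _ in 0..iterations {
        Command::new("true").status().expect("failed to spawn `true`");
    }
    start.elapsed().as_secs_f64() * 1000.0 / iterations as f64
}
```

This measures only process creation and teardown, not the worker's startup logic, so it gives a lower bound on the per-job overhead.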
Allow an exception for reading from the artifact cache, but disallow listing the directory contents. Since we prepend artifact names with a random hash, this means attackers can't discover artifacts apart from the current job.
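As a standard-library-only illustration of the unguessable-name idea (the PR's actual naming scheme may differ; `RandomState` here stands in for its random hash), names seeded from OS entropy cannot be guessed by an attacker who is unable to list the directory:

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

// Sketch: prefix the artifact file name with a value derived from
// OS entropy, so names can't be discovered without directory listing.
// (Illustrative only, not the PR's implementation.)
fn random_artifact_name(code_hash: &str) -> String {
    // RandomState seeds itself from OS entropy at construction.
    let mut hasher = RandomState::new().build_hasher();
    hasher.write(code_hash.as_bytes());
    format!("artifact-{:016x}-{}", hasher.finish(), code_hash)
}
```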
We already checked whether landlock is enabled in the host. We can therefore only throw an error here if landlock is enabled and expected to work. Otherwise we shouldn't even log here, as errors are already logged in the host, and it would just be noise here.
This is an attempt at an improved chroot jail that doesn't require root, but still allows us to use sockets and artifacts from the host.
The CI pipeline was cancelled due to the failure of one of the required jobs.
```rust
if libc::unshare(libc::CLONE_NEWUSER) < 0 {
	return Err("unshare user namespace")
}
if libc::unshare(libc::CLONE_NEWNS) < 0 {
	return Err("unshare mount namespace")
}
```
- These are flags, so this can be done in a single `unshare` call.
- Since this is also switching to a new user namespace, this function should be renamed, as it's not strictly only changing the root.
- On some distros this could actually fail (due to user namespaces being disabled). In this case we'd probably want to abort execution unless a flag is passed on the command-line?
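To illustrate the first point: the `CLONE_*` constants are bit flags, so both namespaces can be unshared with one `unshare` call. The constant values are reproduced literally here (from `<linux/sched.h>`) so the sketch doesn't depend on the `libc` crate; in the real code they come from `libc::CLONE_NEWUSER` and `libc::CLONE_NEWNS`:

```rust
// Flag values as defined in <linux/sched.h>.
const CLONE_NEWUSER: i32 = 0x1000_0000;
const CLONE_NEWNS: i32 = 0x0002_0000;

// Because these are bit flags, they can be OR-ed together and passed
// in a single call: `libc::unshare(CLONE_NEWUSER | CLONE_NEWNS)`.
fn combined_unshare_flags() -> i32 {
    CLONE_NEWUSER | CLONE_NEWNS
}
```

A single call also avoids ending up in a state where only the user namespace was unshared because the second call failed.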
```rust
let cache_path_c = CString::new(cache_path.as_os_str().as_bytes()).unwrap();
let root_absolute_c = CString::new("/").unwrap();
// Append a random string to prevent races and to avoid dealing with the dir already existing.
let oldroot_relative_c = CString::new(format!("{}/oldroot-{}", cache_path_str, s)).unwrap();
```
Use `join` or `push` instead of manually merging these with `/`.
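The suggested form, as a small sketch (hypothetical helper name; `s` is the random suffix from the original code):

```rust
use std::path::{Path, PathBuf};

// Sketch of the reviewer's suggestion: let Path::join handle the
// separator instead of formatting with '/' by hand.
fn oldroot_path(cache_path: &Path, s: &str) -> PathBuf {
    cache_path.join(format!("oldroot-{}", s))
}
```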
```rust
let mut buf = Vec::with_capacity(RANDOM_LEN);
buf.extend(rand::thread_rng().sample_iter(&Alphanumeric).take(RANDOM_LEN));
```
You should be able to do a `.collect` here directly into the buffer.
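That pattern in isolation, with a plain standard-library iterator standing in for the `rand` sampler: `take(n)` plus `.collect()` builds the `Vec` directly, making the `with_capacity`/`extend` pair unnecessary:

```rust
// Sketch: collect() sizes and fills the Vec in one step; Vec's
// FromIterator uses the iterator's size hint to pre-allocate.
fn take_n<I: Iterator<Item = u8>>(iter: I, n: usize) -> Vec<u8> {
    iter.take(n).collect()
}
```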
```rust
if libc::mkdir(oldroot_relative_c.as_ptr(), 0755) < 0 {
	return Err("mkdir oldroot")
}
if libc::syscall(
	libc::SYS_pivot_root,
	cache_path_c.as_ptr(),
	oldroot_relative_c.as_ptr(),
) < 0
{
	return Err("pivot_root")
}
```
If you call the `pivot_root` syscall with both paths being `"."` (after changing the working directory into the new root), then this `mkdir` should be unnecessary.
Currently we apply sandboxing per-thread, when it should be per-process. This shouldn't be a big change, we just need sandboxing exceptions for the artifacts/cache directories.
Without this, the sandboxing we have with landlock is not really secure.
TODO

- We can just pass around a cloned config instead of all the values separately. Link
- Use `assert!` instead of `io::Error`. Link
- `--artifact-dir` should be passed through the handshake instead of through a CLI param. Link

Related
Closes paritytech/polkadot-sdk#600