Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LKRG immunity module & kernel vaccination #150

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

milabs
Copy link

@milabs milabs commented Jan 20, 2022

Description

The idea behind the immunity module is the following: make it difficult for the attacker to identify target system's kernel version so it would be hard or even not possible to find proper offsets and even determine the vulnerability presence.

There are 2 things implemented in the commit:

  • /boot, /lib/modules folders access restriction (root only access allowed)
  • /proc/cmdline data forged

Few more things to do:

  • forge utsname()
  • restrict access to kernel sources directory, if any
  • something else?

How Has This Been Tested?

Tested on Ubuntu 20.04 (5.11.0-25-generic)

@solardiz
Copy link
Contributor

Thank you, @milabs! This is a welcome contribution.

A related task in my LKRG notes was summarized as "protect /proc/kallsyms, /boot/System.map*". This PR takes care of /boot/System.map* and more, but doesn't yet take care of /proc/kallsyms... which may be already partially taken care of by upstream's kernel.kptr_restrict = 2 on kernels recent enough to have that (I think some of the older kernels we support in LKRG lack it?)

Just to give some context, IIRC Adam and I debated briefly via private e-mail about whether restrictions on filesystem permissions should be in LKRG itself or maybe in its startup script. The latter would be less invasive, but then the restrictions wouldn't be put in place when LKRG is loaded by different means, and wouldn't be consistently undone on LKRG unload. Another question that came up was about /proc and it being one vs. multiple instances (shared vs. per-mount permissions), which IIRC changed with some kernel hardening changes proposed a few years ago (did they get in?) - /proc entry permissions used to be shared between its multiple mounts, but there was intent to make them per-mount. Per-mount would be a problem for the script approach, as we'd leave additional /proc mounts unprotected then.

Also in my notes is this subtly related task: "hide=1 should hide sysctl's from non-root" (no, I don't suggest adding that to this same PR - it should be separate).

I guess @a13xp0p0v might also have comments on this.

I have mixed feelings about the vaccine/immunization naming used here. On one hand, it's fine pun on the current covid situation. On the other hand, LKRG should outlive covid, so the pun would be obsolete (hopefully soon). Perhaps more importantly, LKRG's other features and LKRG as a whole could also fit this naming - so it could be illogical to use it for this specific feature only. Anyway, the separation into modules might be reconsidered or even gone in our planned code cleanups - by the way, something we'd appreciate all contributors' input on.

An issue to fix, CentOS 7:

  CC [M]  /__w/lkrg/lkrg/src/modules/immunity/p_immunity.o
/__w/lkrg/lkrg/src/modules/immunity/p_immunity.c: In function 'p_vaccinate':
/__w/lkrg/lkrg/src/modules/immunity/p_immunity.c:68:3: error: implicit declaration of function 'proc_create_single' [-Werror=implicit-function-declaration]
   proc_create_single("cmdline", 0, NULL, p_cmdline_proc_show);
   ^
cc1: some warnings being treated as errors
make[2]: *** [/__w/lkrg/lkrg/src/modules/immunity/p_immunity.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [_module_/__w/lkrg/lkrg] Error 2
make[1]: Leaving directory `/usr/src/kernels/3.10.0-1160.53.1.el7.x86_64'

@Adam-pi3
Copy link
Collaborator

@solardiz maybe it's a good time to have a list what else do we want to protect. Additionally, maybe we want to make this feature configurable where user can add specific file / directory which wants to protect (e.g., log file directory)?

@wladmis
Copy link
Contributor

wladmis commented Jan 20, 2022

Isn't this directory access restriction overmuch? This should be done by proper file permissions, should LKRG try to solve all issues and possible misconfigurations (and I think that permissive access to sensitive directories is a misconfiguration)?

@solardiz
Copy link
Contributor

maybe it's a good time to have a list what else do we want to protect.

Yes, but not necessarily before merging this PR.

Additionally, maybe we want to make this feature configurable where user can add specific file / directory which wants to protect (e.g., log file directory)?

No, I think that's a different feature. At least as currently implemented in this PR, the protection here is only against non-root. If a sysadmin wants to explicitly protect e.g. a log file directory from non-root, they can just chmod it.

@solardiz
Copy link
Contributor

@wladmis I think it's reasonable for LKRG itself to restrict access to the most obvious and generic places that would provide information on attacking the kernel and bypassing LKRG. We can't expect most systems/sysadmins to manually harden permissions in addition to loading LKRG. Conversely, we can expect canned exploits to automatically look into those obvious and generic places to bypass LKRG.

@milabs
Copy link
Author

milabs commented Jan 24, 2022

Hey, thanks for replying. I should have started with the description, so let me clarify the idea.

Most of kernel exploits start with information gathering phase which consists of steps to get target system kernel version which is used to align various offsets (for example, to build a ROP chain). Here is what comes to my mind when it comes to target's information collection:

  • /boot folder examination (might have kernel versions + system.map's)
  • /proc/cmdline examination (might have kernel version)
  • /proc/kallsyms examination (might have kernel symbols addresses)
  • /proc/kmsg & /dev/kmsg examination (access to kernel log, might have kernel version & addresses)
  • /etc/lsb_release or similar examination (might have information about distro & version)
  • uname(2) examination (kernel version)

So, I thought that it might be easy yet extremely useful in terms of exploitation prevention to alter system's behavior to protect listed (and similar) information and/or even fake it. For example, altering uname(2) allows to change minor kernel version which has no sense but usually hardcoded in exploit's code (5.11.0-25-generic ...).

UPD: Few words about naming of such a feature as "immunity". For me altering kernel's behavior in a way that it becomes resistant to exploits is a kind of vaccination resulting in kernel's immunity.

@wladmis
Copy link
Contributor

wladmis commented Jan 24, 2022

Hey, thanks for replying.

Hi!

I should have started with the description, so let me clarify the idea.
Most of kernel exploits start with information gathering phase which consists of steps to get target system kernel version which is used to align various offsets (for example, to build a ROP chain). Here is what comes to my mind when it comes to target's information collection:

* `/boot` folder examination (might have kernel versions + system.map's)

* `/proc/cmdline` examination (might have kernel version)

* `/proc/kallsyms` examination (might have kernel symbols addresses)

* `/proc/kmsg` & `/dev/kmsg` examination (access to kernel log, might have kernel version & addresses)

* `/etc/lsb_release` or similar examination (might have information about distro & version)

* `uname(2)` examination (kernel version)

So, I thought that it might be easy yet extremely useful in terms of exploitation prevention to alter system's behavior to protect listed (and similar) information and/or even fake it. For example, altering uname(2) allows to change minor kernel version which has no sense but usually hardcoded in exploit's code (5.11.0-25-generic ...).

UPD: Few words about naming of such a feature as "immunity". For me altering kernel's behavior in a way that it becomes resistant to exploits is a kind of vaccination resulting in kernel's immunity.

What do you think about filtering of reading of /etc/passwd and /etc/group files? For example, when some process read them, it read only those lines that match this process credentials. Otherwise an attacker can inspect what logins are members of privileged groups. The main issue of this proposal is possible complexity of its implementation.

@milabs
Copy link
Author

milabs commented Jan 24, 2022

The main issue of this proposal is possible complexity of its implementation.

@wladmis Seriously? It takes 1% of LKRG complexity to implement that and it gives 99% of LKRG's protection level agains known exploits and probably most of yet unknown (except things like dirtycow, though LKRG doesn't fit it either).

@wladmis
Copy link
Contributor

wladmis commented Jan 24, 2022

The main issue of this proposal is possible complexity of its implementation.

@wladmis Seriously? It takes 1% of LKRG complexity to implement that and it gives 99% of LKRG's protection level agains known exploits and probably most of yet unknown (except things like dirtycow, though LKRG doesn't fit it either).

Sorry, that notice about complexity was about my proposal of reading filtering, not about your. Sorry for the ambiguous reply.

@milabs
Copy link
Author

milabs commented Jan 24, 2022

To be clear: that notice about complexity was about my proposal of reading filtering, not about your.

@wsandin Got it, sorry.

@solardiz
Copy link
Contributor

Adam and I briefly discussed this PR yesterday. While we like it in principle, we're unsure about merging it as-is because:

  1. We might want to first agree on the naming and sysctl, see below.
  2. The RHEL7 build needs to be fixed - otherwise we'd break it by merging this. (The mainline failure seen here is unrelated and we've already fixed it, so would be gone on merge.)

naming of such a feature as "immunity". For me altering kernel's behavior in a way that it becomes resistant to exploits is a kind of vaccination resulting in kernel's immunity.

This makes sense, but most LKRG features contribute to that vaccination and (partial) immunity, so it's weird to single out this one feature for that naming. Also, to someone who doesn't yet know what this feature does and how it differs from other LKRG features, the "vaccination" setting name would not tell anything additional.

I think this new feature is most closely related to LKRG's "hide", so maybe we can enhance "hide" to cover it or maybe we can come up with consistent new naming for both. Suggestions are welcome.

@milabs
Copy link
Author

milabs commented Jan 27, 2022

@solardiz I'm going to make other implementation with more features added so closing this PR which acted as initial discussion.

@milabs milabs closed this Jan 27, 2022
@solardiz
Copy link
Contributor

@milabs OK, and looking forward to that, although it'd be easier for us to review and merge a PR that doesn't yet have "more features added" (they can be added with separate PRs later). This now-closed PR of yours was very close to what's needed from a first PR adding this set of features - like I wrote, only the naming and portability issues needed to be addressed before merging.

@milabs
Copy link
Author

milabs commented Jan 28, 2022

@solardiz @Adam-pi3 @wladmis

Please, review feature prototype based on KHOOK:

https://github.com/milabs/khook/tree/dev-lkrg

@milabs milabs reopened this Jan 28, 2022
@solardiz
Copy link
Contributor

@milabs Thank you! Since this is a prototype, I assume it's for us to discuss its feature set rather than code, so here are some observations (for others reading this who might not have looked at the code yet) and thoughts. Please correct me if I got anything wrong, which I very well may have.

  1. Access to /dev/kmsg is blocked for non-root. The kernel's Documentation/ABI/testing/dev-kmsg says:
Users:          dmesg(1), userspace kernel log consumers

Using strace dmesg on an older system, I see it only uses syslog(2), not /dev/kmsg (which apparently didn't exist until 2012, per the documentation above). On a newer system, it goes through:

open("/dev/kmsg", O_RDONLY|O_NONBLOCK)  = -1 EPERM (Operation not permitted)
syslog(SYSLOG_ACTION_SIZE_BUFFER, NULL, 0) = -1 EPERM (Operation not permitted)
syslog(SYSLOG_ACTION_READ_ALL, 0x6135ec035de0, 16392) = -1 EPERM (Operation not permitted)
  1. The list of restrict_inodes looks about right to me - it's several /proc entries, /boot, and /lib/modules. They're restricted by forcing them to be kept in dcache and forcing the 077 mode bits off, and:

  2. Somehow both restrict_devs (currently only /dev/kmsg) and the restrict_inodes are also checked on inode_permission. I guess for restrict_inodes that's redundant, isn't it, or can the patched cached entries somehow be bypassed? While we hook that function in LKRG anyway, adding more checks to it would have additional performance impact.

  3. Moreover, chmod of the restrict_inodes is disallowed in hooked chmod_common. With this, it becomes not only hardening of defaults, but also policy enforcement against sysadmin or a distro package, etc., relaxing the settings. This feels excessive, but maybe is a valid setting to have.

  4. syslog(2) (and thus dmesg) is disallowed for non-root.

  5. On systems I've checked, /proc/kmsg is mode 400 as-is, so what are we restricting? Maybe the distinction between host vs. container root (make it unavailable to container root)? Also, is this perhaps why we're checking on inode_permission, not merely rely on the hardened modes? (That would answer my question in 3.)

  6. For utsname(), the kernel build time is patched to be the module's build time instead, and the kernel version's "sublevel" (third) component is patched to be a random number from 0 to 255. This is a reasonable proposal, but I think we should also allow the sysadmin to provide an arbitrary kernel version string, similar to OpenVZ's kernel.virt_osrelease sysctl, which I used to set to indicate kernel ABI version to libc and nothing else - so it can be even less specific than the real first two kernel version number components. For the default hardening, the proposed timestamp replacement and sublevel randomization makes sense.

  7. There's the // TODO: http://linuxmafia.com/faq/Admin/release-files.html comment, which basically asks: should we also restrict access to files with distro version info, like /etc/*-release? I'm not sure. Probing for e.g. glibc (package) version can be even more revealing and specific, and that wouldn't be covered by restrictions of access to those files. However, what would typical exploits probe?

Overall, this prototype is helpful in bringing up so many little questions and design decisions for us to make. It's quite concentrated in terms of those per line of code. Good stuff.

I might also make some nitpicks on code in GitHub inline comments over the referenced link. Those are not essential to the discussion here.

@solardiz
Copy link
Contributor

Also, we need to discuss the bigger picture to decide on most suitable naming and ways to control these and related features. Here are some related categories of potential features, off the top of my head:

  1. LKRG's existing "hide" - make the LKRG module invisible (but easily detectable indirectly, most obviously via sysctl).
  2. Hiding LKRG's sysctl's (which implies making it "impossible" for a sysadmin to alter them? of course, other than by hacks like having another LKM patch their values).
  3. Hiding kernel info (the set of features proposed via the prototype above).
  4. Hiding non-kernel-focused userland info (the TODO comment above, etc.) - but maybe we should not?
  5. Making it "impossible" to unhide kernel info (by chmod or by changing the sysctl, although the latter only matters when LKRG sysctl's are not hidden? so does not need to be a separate feature?)
  6. Making it impossible to unload LKRG.
  7. Making it impossible to load other kernel modules (lkrg.block_modules, kernel.modules_disabled) or patch the kernel (other than via a kernel or previously-loaded module vulnerability).

Maybe these are separate, or a bitmask in one or several settings, or protection levels in several settings (like two - one for LKRG itself, the other for the kernel).

@solardiz
Copy link
Contributor

several settings (like two - one for LKRG itself, the other for the kernel).

Or the grouping could be different: one setting for protection set (what to hide from non-root) and the other for policy (against root, so whether to disallow unhiding/chmod, unloading, loading/patching).

@Adam-pi3
Copy link
Collaborator

I would also consider some of the /sys interfaces (e.g., /sys/module) but probably more...

@solardiz
Copy link
Contributor

We can definitely consider more, but I think our priority is to get some basic functionality into a form that can be merged without breaking things (e.g., not even RHEL7) and preferably without us ending up introducing incompatible changes to the new sysadmin knobs soon thereafter. We can then add more with separate PRs.

@solardiz
Copy link
Contributor

BTW, the breakage of RHEL7 here was in code forging /proc/cmdline contents. The new prototype doesn't have that code, instead restricting access. I guess the assumption was that some non-privileged programs (perhaps via libraries?) read /proc/cmdline and would break if they can't read it? If so, what are those, or was this assumption found to be, uh, unfounded?

@milabs
Copy link
Author

milabs commented Jan 29, 2022

@solardiz I've found recently that hooking security_syslog allows to restrict dmesg effectively without need to alter /dev/kmsg and /proc/kmsg access rights. So we could stick to that approach only. As for /proc/cmdline and /proc/version, I don't know is it safe to return -EPERM preventing access to those files or forging (filtering) their content required. Any applications you know which require /proc/{cmdline,version} to work?

@solardiz
Copy link
Contributor

@milabs As I recall, glibc cares (or at least used to care) about kernel version, but I don't know by what means it obtains that - we need to check. Some system info is available to glibc via AUXV at program startup, but apparently not kernel version.

@milabs
Copy link
Author

milabs commented Jan 30, 2022

@solardiz @Adam-pi3 @wladmis

Have a look at v2 version of the feature prototype:
https://github.com/milabs/khook/commits/dev-lkrg (3e6987)

Implemented:

  • dmesg restriction (security_syslog hook)
  • files/directories restriction (inode_permission, chmod_common hooks)
  • seq-files forging for /proc/{cmdline,version} cases (single_open hook)
  • utsname forging (see the note below)

Note on utsname. I've found that utsname forging is not implemented correctly as it changes values globally so root will get the same changed value as non-root user. And here is a question of mine how do you think it's better to implement it.

@Adam-pi3
Copy link
Collaborator

Thanks for your work @milabs. Some of my notes:

  1. Your previous PR was compatible with LKRG, this one is using KHOOK as hooking layer, which can't be merged. Would you be able to use LKRG compatible hooking mechanism?
  2. utsname is problematic but in theory you can use similar trick with checking UID as you do in other places. It can create a weird race when UID 0 and != 0 is trying to read the same structure and hooking function changes it twice which can create interesting results. In practice, it should be a rare situation.
  • One of the solution could be to hook syscall which return the value instead of overwriting the structure itself.
  • Another one would be to 'lie' to root as well.
  1. Current logic of hooking utsname does not include the logic of supporting namespaces. We might need to have it as well. @solardiz what do you think?
  • If yes, our namespace monitoring logic might need to be taken into account (dunno, need to check it, maybe it won't)
  1. As a sysctl name, as @solardiz mentioned, it might be a sub-option for p_hide or something new compatible with LKRG like p_block_info (block information disclosure). What do you think @milabs and @solardiz ?
  2. Thanks for your work :)
  • Adam

@milabs
Copy link
Author

milabs commented Jan 31, 2022

@solardiz @Adam-pi3 @wladmis

Have a look at v3 version of the feature prototype:
https://github.com/milabs/khook/commits/dev-lkrg (d62f7c)

Implemented:

  • utsname forging (via __do_sys_{uname,newuname} hooks)

Looks for me as a semi-final prototype version. Going to start working on making the change LKRG.

@solardiz
Copy link
Contributor

solardiz commented Feb 9, 2022

I'm sorry I didn't review/comment for a while. I just took a close enough look to comment. As I understand, the utsname forging issues discussed previously are gone in v3: the code no longer modifies the shared structure, but instead reimplements the syscalls calling the underlying utsname() function and making and modifying a copy of what it returns, and that copy is local to the non-root case. That's great.

I think there's still a race condition, though: if the syscall is made by 2+ non-root tasks concurrently, then unforged or partially-forged data may leak: when one invocation proceeds to copy_to_user, another may still be at memcpy(&tmp, utsname(), sizeof(tmp)); and would temporarily unforge the data. This can probably be fixed by moving up_read(kinfo.sem); to right after copy_to_user.

@Adam-pi3 Regarding supporting namespaces, I assume that was about CLONE_NEWUTS - if so, my understanding is that now the underlying utsname() will take care of that, and we'll forge on top of that. I think that's enough, at least initially. And it's another reason to keep the semaphore until after copy_to_user: the original data we're forging might differ in concurrent calls.

@milabs You hook two of the *uname syscalls, but I think on archs that have two there's also a third one: olduname. (I think on arm* there's just one?) Also, you hook the underlying __do_sys_ functions, but I think they're very likely to be optimized out and inlined (so you might not be able to look them up, or the hooks might not be in all the right places). I think you'll need to be hooking P_GET_SYSCALL_NAME(uname), etc.

Regarding changes in v2, I wonder if using private_data for old function pointer is (future-)safe. Do we risk clashing with the kernel's own use of this later, and potentially in a way where we'd have a vulnerability (would pass control over some pointer that isn't what we wrote)? If so, any ideas that would avoid this risk?

For the knobs, while we could logically make hide a bitmask and fit this in there, it has a drawback that changing just one of these sub-settings would then be tricky (would require a read-modify-write e.g. by a script using sysctl, which is also racy). So maybe the new setting for hiding (and forging?) kernel details should be hide_kernel? Or maybe forge? So hide would be for LKRG itself, and forge would be for LKRG hiding and forging kernel details. Then the "sub-module" name in the LKRG source tree could also be forge. Sounds good to you, @milabs?

Thank you!

@solardiz
Copy link
Contributor

I think there's still a race condition, though

No, I'm wrong. Somehow I thought tmp was static, but that struct is actually local on the stack. Then we're fine.

@Adam-pi3
Copy link
Collaborator

@solardiz @milabs when we deploy direct hook on the syscall, we must take into account potential COMPAT and X32 support. Each syscall is potential 3 hooks, 3 syscalls is 9 hooks. That's one of the reason that I tried to avoid direct syscall hooks. However, sometimes that's the only viable option.

About the name, what about forge_info instead of forge ?

@solardiz
Copy link
Contributor

About the name, what about forge_info instead of forge ?

If we go for more than one word, then I'd prefer hide_kernel.

We could then also have hide_distro for /etc/*-release, etc. if we choose to have that functionality at all. I'm unconvinced we'd want to have that, but I thought I'd mention how it'd fit in the naming anyway.

@sempervictus
Copy link

sempervictus commented Feb 17, 2022

Thanks @milabs - this is cool stuff, looking forward to more standoff in public builds.

FS restrictions are kind of fraught in that its very hard to predict what actually depends on the real data being obfuscated to work properly and whether that something is even in the userspace intended to run atop the kernel or some container of another userspace hatefully thrown into the mix to make the kernel's life that much worse.
One of the things grsec adds atop pax is this class of protections - restrictions for various contexts to access parts of the VFS in order to prevent infoleaks or worse yet actual RW access. Back in the day, we used to turn most of these off due to the myriad of weird problems when building software (build systems like to have this data for build targeting), running containers, etc. These days, OSS has cleaned up the edge cases a lot and its much easier to enable them across the range of build targets, though we still turn them off for certain buildbots and such as they can still break things.
Something to consider with forgery and obfuscation/access restriction is the built-in use case: how will these things play at early-init when the code isn't a module at all, but built directly into the binary? What depends on that data being accurate during init both in the kernel and userspace, and how much surface does red team get to get at the real data before obfuscation if there's a need to delay the effect of this work for built-in? I tend to be the crash-test dummy for that situation, so bringing it up in the PR comments 😄
Do we have any sort of systems test harnesses which can spin up complicated host-level workloads for compatibility testing? I'm a fan of what this does, just would like to be fairly certain that its safe to enable on production systems running the kitchen sink of workloads.

EDIT: we sometimes use RSBAC to provide the policy-based MAC functionality when a target environment can't use real RBAC (therefore in environments where LKRG plays), so it might be worth looking into compatibility with them as well since hardened kernels are often built out of various pieces, and that's another good public source of defensive functionality.

@milabs
Copy link
Author

milabs commented Feb 29, 2024

It's been a while since the idea was suggested and recently I had some time to implement what was discussed more or less nicely. Please, guys have a look to the module I've posted:

https://github.com/milabs/kiddy

@solardiz @Adam-pi3 @wladmis @sempervictus

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants