Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kernelCTF: Add CVE-2024-39503 LTS/COS #125

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
311 changes: 311 additions & 0 deletions pocs/linux/kernelctf/CVE-2024-39503_lts_cos/docs/exploit.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,311 @@
# CVE-2024-39503

Exploit Documentation for CVE-2024-39503 against lts-6.6.30 / cos-109-17800.218.20 instance.

## Stage 1: Triggering the vulnerability

As described in the vulnerability documentation, we are targeting a race condition in the ip set
subsystem. A successful trigger would result in a user-after-free on a `struct ip_set` in
`kmalloc-192`.
```c
/* A generic IP set */
struct ip_set {
/* For call_cru in destroy */
struct rcu_head rcu;
/* The name of the set */
char name[IPSET_MAXNAMELEN];
/* Lock protecting the set data */
spinlock_t lock;
/* References to the set */
u32 ref;
/* References to the set for netlink events like dump,
* ref can be swapped out by ip_set_swap
*/
u32 ref_netlink;
/* The core set type */
struct ip_set_type *type;
/* The type variant doing the real job */
const struct ip_set_type_variant *variant;
/* The actual INET family of the set */
u8 family;
/* The type revision */
u8 revision;
/* Extensions */
u8 extensions; // [0.1]
/* Create flags */
u8 flags;
/* Default timeout value, if enabled */
u32 timeout;
/* Number of elements (vs timeout) */
u32 elements;
/* Size of the dynamic extensions (vs timeout) */
size_t ext_size;
/* Element data size */
size_t dsize;
/* Offsets to extensions in elements */
size_t offset[IPSET_EXT_ID_MAX]; // [0.2]
/* The type specific data */
void *data; // [0.3]
};
```

A successful trigger could result from a scenario which looks like this:
```
CPU 0 CPU 1
// cleanup_net()
synchronize_rcu(); ...

GC runs, list_set_del [1.1]

ip_set_net_exit [1.2]
< GC is cleaned up >
ip_set_destroy_set [1.3]
< set is free now > ...

[ spray window ]

< rcu clean up runs >
__list_set_del_rcu [1.4]
==> use-after-free
```

The general setup for this will be seperated into three processes:
- main: this is the root process which will spawn the spray process and repeat on failure
- spray: spawns the bug trigger process and will perform the heap spray
- bug: sets up the bug trigger in its own namespace which will exit when the process exits
and thus performs one try at hitting the race.

Because our bug requires interaction with multiple namespaces such "complex" process
structure is sadly required.

Let's look at each process in more detail.
The main process is not really important for now, its main purpose is to provide a
retry loop.

The spray process is arguably the most important one.
It runs once for each try of hitting the race.
In the initial stage it will do the following things in order:
1. Prepare the bug trigger process in a new usernamespace
2. Prepare spraying primitives and other post-trigger required setup
3. Signal the bug trigger process to perform one try
4. Wait for the bug trigger process to exit
5. Perform the heap spray and check for success.

By timing the delay between 4. and 5. in a "good" way, the heap spray will
run concurrently to the namespace cleanup triggered by the bug process.
Special care is taken to assign the CPU cores in order to ensure that the spray
runs on the same core as the trigger. Additionally the cleanup has to run on
another core so that they can run truly concurrently.

If the bug was triggered successfully _and_ the spray successfully reclaimed one of
the freed sets in time, the `__list_set_del_rcu` cleanup path in [1.4] will
use our sprayed payload and we proceed to the next stage.

The bug process will try to prepare good conditions for a positive race outcome.
Specifically it will do the following:
Prepare 10 list sets (which introduce the vulnerability), each with a garbage
collector that runs after a 1 second timeout (+- some jiffies).
To each of those sets we add the same one element with a short timeout.
(We choose `bitmap:port` as the element set, for no specific reason)
We do not send this payload straight away, rather pack it into one large netlink
message which will be send all at once to increase control over the timings.
At this point we wait for the signal to trigger the bug.

With the signal ready, we setup a timer in our process which triggers after a certain
timeout close to 1 second to match the garbage collector.
With the timer setup, we send the full netlink payload actually creating all the
sets and their elements.
We then wait for the timer to expire and exit the process as it happens.

This way, we force the namespace cleanup to run approximately at the same time
as the garbage collector will run.
The larger number of sets increases our likelyhood of hitting the race for one of
them.

### Stage 1 Payload Considerations

Stage 1 is basically a one-shot scenario: We only have a brief time window where
we can reclaim the freed object with a payload in `kmalloc-192`.
Therefore some special considerations are required for the payload.

Luckily, the RCU callback proves to be very helpful:
```c
static void
__list_set_del_rcu(struct rcu_head * rcu)
{
struct set_elem *e = container_of(rcu, struct set_elem, rcu);
struct ip_set *set = e->set; // [2.1]

ip_set_ext_destroy(set, e); // [2.2]
kfree(e);
}

#define ext_comment(e, s) \
((struct ip_set_comment *)(((void *)(e)) + (s)->offset[IPSET_EXT_ID_COMMENT]))

static inline void
ip_set_ext_destroy(struct ip_set *set, void *data)
{
/* Check that the extension is enabled for the set and
* call it's destroy function for its extension part in data.
*/
if (SET_WITH_COMMENT(set)) {
struct ip_set_comment *c = ext_comment(data, set); // [2.3]

ip_set_extensions[IPSET_EXT_ID_COMMENT].destroy(set, c);
}
}
```

Note that we are spraying a fake set, specifically our payload will correspond
to the set pointer fetched at [2.1].
Following the call chain [2.2] to `ip_set_ext_destroy` we can modify the set
to contain a comment extension ([0.1]) which will result in the "comment" being
freed. For the `list:set` type, extensions live on the element itself
(i.e. `struct set_elem`) and are referred to by an offset value ([0.2]) which is
stored in the owning set (i.e. our payload). Therefor we can set arbitrary
offsets here and essentially cause an arbitrary free.
To better understand the primitive have a closer look at the comment destroy function:
```c
struct ip_set_comment_rcu {
struct rcu_head rcu;
char str[];
};

struct ip_set_comment {
struct ip_set_comment_rcu __rcu *c;
};

static void
ip_set_comment_free(struct ip_set *set, void *ptr)
{
struct ip_set_comment *comment = ptr;
struct ip_set_comment_rcu *c;

c = rcu_dereference_protected(comment->c, 1); // [2.4]
if (unlikely(!c))
return;
set->ext_size -= sizeof(*c) + strlen(c->str) + 1; // [2.5]
kfree_rcu(c, rcu); // [2.6]
rcu_assign_pointer(comment->c, NULL);
}
```

It will read the actual comment pointer from offset we specified ([2.4]) and,
given that it is not `NULL`, run a `kfree_rcu` on it ([2.6]).
This means, by choosing the offset in such a way that it adds to a location with
a useful pointer value, we can free that (possibly arbitrary) object.

The simplest victim object to choose for this is the `*set` itself, as it is
already present on the `struct set_elem` object at offset 32:
```c
/* Member elements */
struct set_elem {
struct rcu_head rcu;
struct list_head list;
struct ip_set *set; /* Sigh, in order to cleanup reference */
ip_set_id_t id;
} __aligned(__alignof__(u64));
```

Remember that this set pointer is a pointer to the fake object which we sprayed.
This means we can convert our first stage racy use-after-free into a (possibly)
more stable one.

Considering all of this, the most versatile payload to perform the stage 1 spray,
seems to be the well known `struct user_key_payload`:

```c
struct user_key_payload {
struct rcu_head rcu; /* RCU destructor */
unsigned short datalen; /* length of this data */
char data[] __aligned(__alignof__(u64)); /* actual data */
};
```

(It even has a proper RCU head at the correct offset)
To summarize, we spray a `struct user_key_payload` which "looks like" a set with
a comment extension. This extension points to the `*set` member of `struct set_elem`
which in turn points back to the sprayed payload.

Since the set is modified when the comment is actually deleted ([2.5]), we can
easily detect whether the race was successful by reading back the key.
When this is the case, we continue to stage 2, with a reasonably stable
use-after-free on our key payload.

## Stage 2: Use-After-Free on Key Payload

To leverage the use-after-free I chose to simply re-claim the freed key object
with another `struct ip_set` object.
Specifically we will choose a `bitmap:port` set for this.
There are many good reasons for this:
- An ip set has many pointers as members. Since we control a key object we can leak a lot of data.
Specifically this allows us to bypass KASLR via the `type` member.
- It has (indirect) function pointer members, making it a prime candidate for RIP control
- *But most importantly*, by slightly corrupting the original set, we can construct a very simple arbitrary memory write primitive that is much more useful than any RIP control primitive in the first place.

To better understand the arbitrary write primitive let's have a closer look at
the bitmap ip set type:
```c
/* Type structure */
struct bitmap_port {
unsigned long *members; /* the set members */
u16 first_port; /* host byte order, included in range */
u16 last_port; /* host byte order, included in range */
u32 elements; /* number of max elements in the set */
size_t memsize; /* members size */
struct timer_list gc; /* garbage collection */
struct ip_set *set; /* attached to this ip_set */
unsigned char extensions[] /* data extensions */
__aligned(__alignof__(u64));
};
```

The general setup for a set consists of the generic `struct ip_set` structure
that contains type specific function templates and a `data` member ([0.3]).
For the `bitmap:port` type, this data member points to a `struct bitmap_port`
structure.
The elements are, as the name suggests, a simple bitmap in the `members` member.
Since elements are merely bits (contrary to the `list:set`) extensions are
directly stored on the type structure (see `extensions` member).
When an element is added to the set, the corresponding bit is set and the
extensions are stored at the given index.
The index for the (port) bitmap is determined by `(port to insert) - first_port`.

Knowing this, we construct our primitive like so:
1) Create a `bitmap:port` with a 16 byte extension that we can fully control
2) Add a single element to the bitmap as the first member. This allows us to fake
another `struct bitmap_port` header (specifically the `members`, `first_port`
and `last_port` fields) at `offsetof(struct bitmap_port, extensions) == 72`
3) Using our UaF, read the original `struct ip_set` leaking the `data` member
4) Again using our UaF, write back the `struct ip_set`, modifying the `data` member
by adding the offset (i.e. `72`)

Now we have an bit-level arbitrary read/write primitive through set element
add/remove operations.
(As a side note, an even better choice for this would be something like
`bitmap:ip` since it would allow a broader range compared to the limited `u16`
port type)

Additionally, step 3) contains an implicit oracle to whether we reclaimed the
key object successfully. The `set->name` member overlaps with the `key.len` member.
By making this "length" longer than the original key, we can observe failure
and deduce a successful spray.
Same thing applies to step 4). Since the set name is modified on success, we can
observe the set not being found when triggering any operations if the following
spray failed.

With the primitive in place we only need a target to overwrite.
We will use the `core_pattern`, setting it to `|/proc/%P/exe`.
A following segmentation fault in our exploit process will then invoke our exploit
again as the core dump handler which is a straight way out of the jail and to root.

## Reliability

The exploit is relatively stable. By default, there is no "comment extension"
(see stage 1), this means if our spray did not succeed we are unlikely to corrupt
anything through the RCU cleanup down the way.
Still, we are targeting a race condition which has its quirks. Specifically
success chances degrade over time as we are trashing the heap more and more.
In my local experiments the exploit was successful ~70-80% though this may vary
depending on the underlying CPU speed, noise, etc.
Loading
Loading