DWARFless Debugging #176
Comments
Update today:
|
Updated today:
Some of the new things in there include:
|
Updated today: A tentative roadmap based on discussions with Omar. Link to the first (small) pull request of several in the series. |
Some more technical notes from the discussion so I remember them when I work on the relevant parts:
|
Note to self: update CTF branch for 30ecdd9 |
The
With these changes, on one benchmark I have (which reads 100k elements from the dentry hash table), the CTF implementation now outperforms the DWARF:
|
I've updated the
As it is now, drgn reads ORC only from the ELF files it has already located. If there are no ELF debuginfo files, there's no DWARF. A consequence of this is that the unwinder is tightly coupled with the
So my changes add a binary tree mapping address ranges directly to ORC information, and this can be used regardless of what debuginfo is loaded. I'd expect that in the future, as this goes upstream, there will be a better solution (e.g. the module API refactor) which eliminates the need for this. |
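To illustrate the idea of mapping address ranges directly to unwinder data, here is a minimal sketch. It is not the code from the branch: it swaps the binary tree for a sorted list searched with `bisect`, and the class name `AddressRangeMap` and the string payloads are hypothetical stand-ins for whatever per-range ORC information is actually stored.

```python
import bisect


class AddressRangeMap:
    """Map non-overlapping [start, end) address ranges to arbitrary payloads."""

    def __init__(self):
        self._starts = []   # sorted range start addresses
        self._entries = []  # (start, end, payload), parallel to _starts

    def insert(self, start, end, payload):
        if start >= end:
            raise ValueError("empty or inverted range")
        i = bisect.bisect_left(self._starts, start)
        # Reject overlap with the neighbouring range on either side.
        if i > 0 and self._entries[i - 1][1] > start:
            raise ValueError("overlaps previous range")
        if i < len(self._entries) and self._entries[i][0] < end:
            raise ValueError("overlaps next range")
        self._starts.insert(i, start)
        self._entries.insert(i, (start, end, payload))

    def lookup(self, address):
        # Find the last range starting at or before the address.
        i = bisect.bisect_right(self._starts, address) - 1
        if i >= 0:
            _start, end, payload = self._entries[i]
            if address < end:
                return payload
        return None


# Hypothetical usage: one entry per loaded text range.
ranges = AddressRangeMap()
ranges.insert(0xFFFFFFFF81000000, 0xFFFFFFFF82000000, "vmlinux ORC tables")
print(ranges.lookup(0xFFFFFFFF81234567))
```

The point of the structure is that a program counter can be resolved to its unwind data without first asking which debuginfo file covers it.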
Hi @brenns10, first of all I want to say I really appreciate your work on this set of features. I am working on a project that will use drgn as an automatic incident response tool on production servers and these features will be very useful for me. Specifically, I'm very interested in BTF support as most servers I will be encountering do not come with CTF. Can you give a quick update on the status of BTF support? Will it be available anytime soon? Is there any way I can help with it? Thanks a lot! |
Hi @oshaked1 thanks, I'm glad this feature is interesting to you! In terms of drgn development progress, BTF and CTF are both type formats, and so both of them require using kallsyms as a symbol table. Basically, steps 1-3 (as well as 5) on my roadmap are shared for both CTF and BTF -- the only real difference is which type format gets used. So once step 3 is done (#388), a usable BTF implementation could be written & merged independently of CTF. My very old BTF branch here will be a good place to start. The major snag is that currently, the Linux kernel does not include BTF for kernel variables, just functions. So, while the BTF implementation could be updated and prepared for submission, it would not be useful until we get the Linux kernel (and its BTF generation program, For my development plans, I'm prioritizing the CTF implementation first, but I do plan to work on BTF after that. If you or somebody else wanted to take up that work in order to see it merged faster, I'd be happy to help with design & review. |
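For readers unfamiliar with what a kallsyms-backed symbol table provides, the sketch below parses the text format exposed by `/proc/kallsyms` (address, one-letter type, name, optional module in brackets). This is only an illustration of the information involved; it is not how the branches discussed here obtain symbols, and the names `KallsymsEntry` and `parse_kallsyms` are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class KallsymsEntry(NamedTuple):
    address: int
    kind: str            # one-letter symbol type, e.g. "T", "t", "D"
    name: str
    module: Optional[str]


def parse_kallsyms(lines: Iterable[str]) -> Dict[str, KallsymsEntry]:
    """Build a name -> entry table from /proc/kallsyms-style text lines."""
    table: Dict[str, KallsymsEntry] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        address = int(fields[0], 16)
        kind, name = fields[1], fields[2]
        module = fields[3].strip("[]") if len(fields) > 3 else None
        # Keep the first occurrence when a name appears more than once.
        table.setdefault(name, KallsymsEntry(address, kind, name, module))
    return table


sample = [
    "ffffffff81000000 T _text",
    "ffffffff82a1b2c0 D slab_caches",
    "ffffffffc0123000 t ext4_init_fs\t[ext4]",
]
symbols = parse_kallsyms(sample)
print(hex(symbols["slab_caches"].address))
```

Note that a table like this gives names and addresses only. It carries no type information, which is why a separate type format (CTF or BTF) is still needed.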
Hi @brenns10, thanks for the reply and sorry for the delay. For my use case, I can live without having types for kernel variables, as long as I can specify them manually. Would it be useful for the project if I were to contribute a solution that doesn't address the variable type issue? It would still be quite useful for use cases where the types of the variables being accessed are known, e.g. scripts. I could also leave my branch public so it can be used or built upon when the Linux kernel tooling is ready. Regarding implementation - I saw you mentioned switching to using libbtf. Did you mean libbpf? Also, how up to date is the code in your BTF branch as far as integration with the drgn framework goes? Have there been significant changes to the type finder system? |
The BTF type finder itself in drgn would look pretty much the same, regardless of whether Linux actually contained the types for the variables or not. So from that perspective, it would be useful! But without the Linux support, the code in drgn would not be terribly useful. Even though BTF currently contains many types, it only contains those referenced by the functions and percpu variables it covers. While this is a lot of types, it's not all types, so you wouldn't always be able to write scripts even if you specified the type for a variable. Regardless, it would be helpful to have the BTF branch updated since we would love to be able to use BTF someday, and it's a step in the right direction.
Yes, I meant libbpf, sorry! I saw that there are some BTF-related functions in it, and I haven't evaluated whether it is possible to use it for BPF. Honestly, I think I could get the BTF branch up and running on a much more recent version of drgn over the course of a couple hours. There's not a ton of drgn-specific stuff that has changed; instead, it's mainly that my kallsyms support has improved a lot. It's perhaps a bit silly of me to suggest you update it, given that I have all the context, and I haven't done much in terms of documenting things! I will try to take the time to take a crack at it tomorrow, and report back on whether it was as straightforward as I hope it is. |
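As a rough idea of what hand-coding BTF support involves, here is a small sketch that decodes the fixed BTF header (`struct btf_header` in the kernel's BTF documentation) and slices out the type and string sections. It assumes little-endian data, and the helper names `parse_btf_header` and `btf_sections` are hypothetical; it is not taken from the branch.

```python
import struct
from typing import NamedTuple, Tuple

BTF_MAGIC = 0xEB9F
# Little-endian layout of struct btf_header: u16, u8, u8, then five u32 fields.
_HEADER = struct.Struct("<HBBIIIII")


class BtfHeader(NamedTuple):
    magic: int
    version: int
    flags: int
    hdr_len: int
    type_off: int
    type_len: int
    str_off: int
    str_len: int


def parse_btf_header(data: bytes) -> BtfHeader:
    if len(data) < _HEADER.size:
        raise ValueError("buffer too small for a BTF header")
    header = BtfHeader(*_HEADER.unpack_from(data))
    if header.magic != BTF_MAGIC:
        raise ValueError("bad BTF magic (or big-endian data)")
    return header


def btf_sections(data: bytes, header: BtfHeader) -> Tuple[bytes, bytes]:
    """Return (type_section, string_section); offsets are relative to the header end."""
    base = header.hdr_len
    types = data[base + header.type_off : base + header.type_off + header.type_len]
    strings = data[base + header.str_off : base + header.str_off + header.str_len]
    return types, strings
```

On a kernel built with BTF, the raw data is typically readable from `/sys/kernel/btf/vmlinux`, so `parse_btf_header(open("/sys/kernel/btf/vmlinux", "rb").read())` is one way to try this out.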
Ok, I pushed a new branch
It's based directly on top of the CTF branch, so it can make use of the existing support for ORC & kallsyms. I'd recommend reading the commit messages of the two BTF-related commits at the top of this branch. I tested it on a 5.15-based vmcore I had lying around, but you can also run it on the latest vmtest kernels. Here's an example:
Like I said, there's no general mapping from variable name to type, which means that for most variables, you'd need to explicitly specify the type. I added a
>>> prog["slab_caches"]
Traceback (most recent call last):
File "<console>", line 1, in <module>
KeyError: 'slab_caches'
>>> Object(prog, "struct list_head", address=prog.symbol("slab_caches").address)
(struct list_head){
.next = (struct list_head *)0xffffa273022b8168,
.prev = (struct list_head *)0xffffa27301042068,
}
>>> from drgn.helpers.linux.btf import var
>>> var(prog, "slab_caches", "struct list_head")
(struct list_head){
.next = (struct list_head *)0xffffa273022b8168,
.prev = (struct list_head *)0xffffa27301042068,
}
I did add a few core variable types into a special "hardcoded" object finder, because there are some kernel variables that drgn tries to access internally, and having the types handy prevents them from failing. But there's not really any guarantee that all the types you'd like to refer to will be present, unfortunately. However, it's enough to get drgn's built-in thread API to work, and thanks to the already present ORC plumbing, the stack tracing even works!
>>> for thread in prog.threads():
...     print(thread.object.comm.string_().decode())
...     print(thread.stack_trace())
...
init
#0 __schedule+0x4d0/0x512
#1 schedule+0x2a/0x41
#2 do_wait+0xcb/0xf5
#3 kernel_wait4+0xd8/0x131
#4 __do_sys_wait4+0x49/0x9e
#5 do_syscall_64+0x82/0xe0
#6 entry_SYSCALL_64_after_hwframe+0x76/0x7e
#7 0x7ff936820a7a
kthreadd
#0 __schedule+0x4d0/0x512
#1 schedule+0x2a/0x41
#2 kthreadd+0x72/0x11f
#3 ret_from_fork+0x20/0x35
#4 ret_from_fork_asm+0x1a/0x30
pool_workqueue_
#0 __schedule+0x4d0/0x512
#1 schedule+0x2a/0x41
#2 kthread_worker_fn+0x154/0x1b8
#3 kthread+0xe0/0xeb
#4 ret_from_fork+0x20/0x35
#5 ret_from_fork_asm+0x1a/0x30
kworker/R-rcu_g
#0 __schedule+0x4d0/0x512
#1 schedule+0x2a/0x41
#2 rescuer_thread+0x224/0x23e
#3 kthread+0xe0/0xeb
#4 ret_from_fork+0x20/0x35
#5 ret_from_fork_asm+0x1a/0x30
... |
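The session above builds an object by pairing a symbol address with a type name the caller supplies. A minimal sketch of that pattern follows. It is not the helper from the branch: the name `typed_var` and the `object_factory` parameter are hypothetical, and the only drgn calls assumed are `prog.symbol(name).address` and `Object(prog, type, address=...)`, both of which appear in the session.

```python
def typed_var(prog, name, type_name, object_factory=None):
    """Create an object for symbol `name`, using a caller-supplied type name.

    `object_factory` exists only so the sketch can be exercised without
    drgn installed; by default drgn.Object is used.
    """
    if object_factory is None:
        from drgn import Object as object_factory  # requires drgn

    # Symbol lookup needs only a symbol table (e.g. kallsyms), not types.
    address = prog.symbol(name).address
    # The type name is resolved by whatever type finder is registered.
    return object_factory(prog, type_name, address=address)
```

Inside a drgn session this would be called as `typed_var(prog, "slab_caches", "struct list_head")`. A `LookupError` from the symbol lookup is deliberately left to propagate so that a misspelled name fails loudly.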
Looks great! Thanks so much for your effort, definitely would have taken me much longer to figure this out. The
I get the same error when using the built wheel outside the build container as well. Before that I also got the following error:
Which I fixed by moving the variable declarations to the beginning of the function. |
You can fix the undefined Unfortunately libctf doesn't have pkg-config scripts, so every distro / system has slightly different linker flags, and it's a bit difficult to get it to work generally. Better to turn it off if you're just using BTF. |
Force-pushed
edit: and now I've fixed the test failures |
It works now, thanks! |
Excellent! Your feedback has already been helpful, but I'd appreciate any more you have as you use it :) |
Hi just reporting a tiny issue - the
Everything works fine when ignoring the exception. |
Great timing, you're in luck. That restriction is being removed in the upstreaming, #388. I fixed it Friday and it will be in the next version of the PR. I will rebase the CTF and BTF patch sets with those updates for the new revision as well. |
Awesome, I finally got around to integrating your BTF patch into my project so I may have some additional feedback in the upcoming days. I found myself adding quite a few hardcoded types that are required by existing helpers so I will share them with you when I'm done. |
That's great, I'm so glad you're finding this useful! I'll be glad to incorporate any of the hard-coded types into the BTF branch. I'm still aiming to get the variable types included in the kernel BTF but improving the usability as it is now is a great temporary measure. |
Here are the hardcoded types I added that are needed for some existing helpers, with some logic for types that depend on specific configurations:
drgn.helpers.linux.btf.HARDCODED_TYPES["slab_kset"] = "struct kset *"
drgn.helpers.linux.btf.HARDCODED_TYPES["slab_caches"] = "struct list_head"
drgn.helpers.linux.btf.HARDCODED_TYPES["min_low_pfn"] = "unsigned long"
drgn.helpers.linux.btf.HARDCODED_TYPES["max_pfn"] = "unsigned long"
drgn.helpers.linux.btf.HARDCODED_TYPES["saved_command_line"] = "char *"
drgn.helpers.linux.btf.HARDCODED_TYPES["net_namespace_list"] = "struct list_head"
try:
    # We have CONFIG_SPARSEMEM_EXTREME, mem_section type is `struct mem_section **`
    self.prog.function('sparse_index_alloc')
    drgn.helpers.linux.btf.HARDCODED_TYPES["mem_section"] = "struct mem_section **"
except LookupError:
    # We don't have CONFIG_SPARSEMEM_EXTREME, mem_section type is `struct mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]`.
    # We ignore the sizes as they are calculated in `linux_kernel_get_vmemmap_address`.
    # TODO: make sure this works, this isn't tested.
    drgn.helpers.linux.btf.HARDCODED_TYPES["mem_section"] = "struct mem_section[0][0]"
try:
    # Kernel >= 6.9, vmap_nodes is used
    self.prog.symbol("vmap_nodes")
    drgn.helpers.linux.btf.HARDCODED_TYPES["vmap_nodes"] = "struct vmap_node *"
except LookupError:
    # Kernel < 6.9, vmap_area_list is used
    drgn.helpers.linux.btf.HARDCODED_TYPES["vmap_area_list"] = "struct list_head"
Incorporating variable types in the kernel BTF is a great idea, but it would still be valuable to have the essential hardcoded types in drgn for backward compatibility. |
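If the list of configuration-dependent types keeps growing, the try/except pattern could be made table-driven. The sketch below is one possible shape, not code from the branch or from my project: `build_type_table`, `FIXED` and `ALTERNATIVES` are hypothetical names, and it assumes that probing with `prog.symbol()` alone is sufficient (the snippet above uses `prog.function()` for one of the probes).

```python
def symbol_exists(prog, name):
    """Return True if the program's symbol table knows `name`."""
    try:
        prog.symbol(name)
    except LookupError:
        return False
    return True


def build_type_table(prog, fixed, alternatives):
    """Combine unconditional types with ones chosen by probing for a symbol.

    `alternatives` is a list of (probe_symbol, if_present, if_absent) tuples,
    where the last two items are dicts mapping variable name to type name.
    """
    table = dict(fixed)
    for probe, if_present, if_absent in alternatives:
        table.update(if_present if symbol_exists(prog, probe) else if_absent)
    return table


FIXED = {
    "slab_caches": "struct list_head",
    "max_pfn": "unsigned long",
}
ALTERNATIVES = [
    # CONFIG_SPARSEMEM_EXTREME present or not
    ("sparse_index_alloc",
     {"mem_section": "struct mem_section **"},
     {"mem_section": "struct mem_section[0][0]"}),
    # Kernel >= 6.9 uses vmap_nodes, older kernels use vmap_area_list
    ("vmap_nodes",
     {"vmap_nodes": "struct vmap_node *"},
     {"vmap_area_list": "struct list_head"}),
]
```

The result of `build_type_table(prog, FIXED, ALTERNATIVES)` could then be merged into `HARDCODED_TYPES` in one step, which keeps the kernel-version knowledge in data rather than control flow.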
I will say that the hardcoded types may have some issues when we finally go to review & merge the BTF portion of the branch. Not sure that they will stand up to code review! Worst case I'm sure we could put them into the contrib directory, but I just wanted to let you know that, while I'm happy to add these into the branch, they may not make the final cut. |
If it's a matter of just calling a function from the contrib directory then that's completely fine :) |
Sent a new patch series upstream for dwarves/pahole to add global variables to BTF generation. This does interestingly expose a gap in the BTF support I've implemented so far: BTF contains a |
My patches enabling global variables have landed in the With that, I've updated the
So this is a pretty cool milestone! |
Last updated: 2024-10-11
This issue tracks support for non-DWARF sources of debugging information: specifically for the Linux kernel, but hopefully including userspace as we go. I'm editing this initial issue comment as the project takes shape, so hopefully this can provide at-a-glance status information.
Overview
This issue tracks the use of non-DWARF formats for debuginfo. Namely: CTF for types, ORC (without a debuginfo file) for stack unwinding, BTF for types as well, and kallsyms for symbols.
Objectives
The currently agreed upon end goal here is to get a pluggable symbol finder and vmlinux kallsyms implementation merged and available by default for Drgn. These are useful in and of themselves:
Program.read* functionality would be available).
Once these are available, we will get a basic CTF implementation for Linux kernel merged. The basic ground rules here are that it will be disabled at compile-time in the PyPI wheel distributions, and won't muddle into the internals of Drgn. Essentially, there should be some build-system related changes, and a file named ctf.c, and maybe some python wrappers, and that's it.
Here are some non-goals at the moment. They may be revisited in time.
-gctf has a .ctf section. Compare that to the kernel implementations, which now just create a vmlinux.ctfa (ctfa = CTF Archive) file. As of now, the CTF implementation does support very simple userspace cores in order to add simple unit tests. However, proper support will need better integration with the drgn debug info system, so it won't be officially supported for the initial step.
Roadmap
VMCOREINFO to the Linux special objects. This is not controversial, it's just a nice piece of information to expose to Python helpers. The current design of my helpers does need this, but even if it didn't, it would be a useful piece.
ctf branch).
ctf branch).
Current branches
These are links to branches that contain my current work, and they do roughly correspond to different points in the roadmap above. They are stacked, each one building on the prior one. They are subject to being rebased and force pushed at any time.
symbol_finder - this branch adds the pluggable symbol finder API.
kallsyms_finder - this branch adds the kallsyms implementation
ctf - this branch adds the CTF implementation. It also has the necessary plumbing to use ORC for unwinding, without needing to read it from ELF debuginfo files.
btf_2024 - this branch adds a small BTF implementation. Support for global variables (which I recently added to BTF via dwarves) is in progress, see the code for details.
If you take the latest branch (ctf) on an Oracle Linux 9 machine using UEK, then you should be able to build it and install it against a local kernel without installing any debuginfo packages!
Future Work
libbpf in the future, rather than hand-coding the format support.
Old Branches & Work
I have created a few prototype branches on older drgn versions. Only the ones mentioned above are actively maintained and developed. The ones below are older and no longer maintained. For the most part, the commits in these branches were used as the basis for more recent branches, so it's not like the work is lost. The below list is from oldest to newest.
btf_debuginfo - this was my original attempt at alternative debuginfo formats. It didn't have any symbol table support. I submitted Add BTF type finder #162 for this, and closed it as I worked on newer code.
kallsyms_plus_btf - builds on the above with kallsyms support
kallsyms_vmlinux - this contains the kallsyms implementation, without any BTF code. This went into its own pull request (Implement kallsyms for vmlinux only #177) which I am about to close as well since the branch is outdated.
kallsyms_ctf - this is the old kallsyms implementation, with a preliminary, very hacky CTF implementation.