gdbstub protocol proposal: linux.vmcoreinfo query packet

Omar Sandoval edited this page Oct 10, 2024 · 7 revisions

Rationale

When debugging remote Linux kernel targets, it is useful to know several pieces of information:

  1. The KASLR offset (i.e. KERNELOFFSET), which the debugger needs in order to compute the correct addresses of kernel symbols.
  2. The kernel release and build ID are critical for identifying the specific debuginfo files necessary for debugging.
  3. The physical load address of the kernel, i.e. phys_base on x86_64, can be useful when virtual address mappings aren't available.
  4. For kernel debugging without DWARF debuginfo (e.g. using kallsyms and CTF/BTF), the addresses of the symbols that describe the kallsyms table, which are needed to load the symbol table.

Without this information, a kernel debugger won't be able to do much. For example, when using GDB to debug a QEMU Linux guest over the gdbstub, you must either boot with nokaslr on the kernel command line or explicitly provide GDB with the load address of the kernel. The first option isn't ideal for production diagnostic use cases, and the second adds quite a bit of friction. It may also be impossible to find the offset once a bug has occurred and a debugger is required. After all, Linux tries to keep its KASLR offset a secret!

All of the above information is provided in the Linux kernel's vmcoreinfo note. In order to better support kernel debugging over gdbstub, we propose extending the gdbstub protocol with a new query type, which returns the Linux kernel vmcoreinfo note.
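As a rough illustration of how a client might consume the note once it has been retrieved, the sketch below parses vmcoreinfo text, which is a sequence of KEY=VALUE lines, and pulls out the fields listed above. The function name parse_vmcoreinfo and all sample values are made up for this example; the key names are the ones recent kernels are assumed to emit.

```python
def parse_vmcoreinfo(data: bytes) -> dict[str, str]:
    """Split vmcoreinfo text (KEY=VALUE lines) into a mapping."""
    info = {}
    for line in data.decode("ascii").splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore blank or malformed lines
            info[key] = value
    return info


# Hypothetical sample; every value here is invented for illustration.
SAMPLE = (
    b"OSRELEASE=6.11.0-example\n"
    b"BUILD-ID=0123456789abcdef0123456789abcdef01234567\n"
    b"KERNELOFFSET=1a000000\n"
    b"NUMBER(phys_base)=16777216\n"
    b"SYMBOL(kallsyms_names)=ffffffff9c2a0000\n"
)

info = parse_vmcoreinfo(SAMPLE)
# KERNELOFFSET and SYMBOL(...) values are hexadecimal without a 0x prefix;
# NUMBER(...) values are decimal.
kaslr_offset = int(info["KERNELOFFSET"], 16)
phys_base = int(info["NUMBER(phys_base)"])
kallsyms_names = int(info["SYMBOL(kallsyms_names)"], 16)
```

A debugger would add kaslr_offset to the link-time addresses from the debuginfo, and use OSRELEASE and BUILD-ID to locate the matching debuginfo files.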

Protocol Changes

According to the GDB Serial Protocol: General Query Packets document:

Packets starting with ‘q’ are general query packets

And:

The names of custom vendor packets should use a company prefix, in lower case, followed by a period. For example, packets designed at the Acme Corporation might begin with ‘qacme.foo’ (for querying foos) or ‘Qacme.bar’ (for setting bars).

It stands to reason that for Linux kernel specific debugging information, we should adopt a prefix of linux., and for this specific note, we should use linux.vmcoreinfo. So we propose adding the following query:

q linux.vmcoreinfo

The response will be of the form:

Q linux.vmcoreinfo VMCOREINFO-DATA

Where VMCOREINFO-DATA is encoded in the escaped binary format documented in the GDB Remote Protocol Overview.
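For context, every packet in the remote protocol travels inside a `$payload#checksum` frame, where the checksum is the sum of the payload bytes modulo 256, written as two hex digits. The sketch below shows how a client might frame the proposed query and validate a received frame. The helper names are hypothetical, and the exact byte spelling of the query (`qlinux.vmcoreinfo`, following the `qacme.foo` pattern quoted above) is an assumption rather than something this proposal fixes.

```python
def frame_packet(payload: bytes) -> bytes:
    """Wrap an already-escaped payload as $payload#xx."""
    checksum = sum(payload) % 256
    return b"$" + payload + b"#" + b"%02x" % checksum


def unframe_packet(frame: bytes) -> bytes:
    """Check the framing and checksum, then return the raw payload."""
    if len(frame) < 4 or frame[:1] != b"$" or frame[-3:-2] != b"#":
        raise ValueError("malformed packet")
    payload = frame[1:-3]
    if int(frame[-2:], 16) != sum(payload) % 256:
        raise ValueError("checksum mismatch")
    return payload


# Assumed on-the-wire spelling of the proposed query packet.
QUERY = b"qlinux.vmcoreinfo"
wire = frame_packet(QUERY)
```

The payload returned by unframe_packet would still need to be unescaped from the binary format before the vmcoreinfo text can be read.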

Anticipated Changes

GDB

Add support for executing this query command via a GDB command like info vmcoreinfo or something similar.

QEMU

Implement support for this query command, relying on the VmCoreInfo device to locate the note.

Linux

Implement support for this query command in the kgdb gdb stub based on the built-in vmcoreinfo. This will require implementing an encoder for the escaped binary format (kgdb_mem2ebin()).

drgn

Add client support for this query command as part of the gdbstub support feature for kernel targets.

Background on Binary Data Encoding

GDB's protocol seems to have two main approaches for encoding binary data, "M-type" and "X-type" packets. You can see both implemented in gdbserver.

  • M type: each byte of data is transmitted as two hexadecimal digits.
  • X type: each byte is transmitted as-is, but the characters $#}* must be escaped. The encoding is described in Binary Data. GDB implements it for both read and write.

The X type seems to be the more appropriate option. Some connections may be rather slow, and the M type doubles the size of the data on the wire, whereas the X type adds only one byte per escaped character. We also don't expect this protocol to be used over any connection that is not 8-bit clean. Even if the connection is not 8-bit clean, the contents of vmcoreinfo are valid ASCII, and the escaping scheme doesn't introduce non-ASCII bytes.
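To make the comparison concrete, here is a minimal sketch of the escaped binary format as described in the GDB documentation: each of the bytes $ # } * is replaced by } followed by the original byte XORed with 0x20. The function names are hypothetical and the sample data is invented (real vmcoreinfo rarely contains these characters at all).

```python
ESCAPE = 0x7D  # '}'
MUST_ESCAPE = frozenset(b"$#}*")


def escape_binary(data: bytes) -> bytes:
    """Encode data in the X-type escaped binary format."""
    out = bytearray()
    for byte in data:
        if byte in MUST_ESCAPE:
            out += bytes((ESCAPE, byte ^ 0x20))
        else:
            out.append(byte)
    return bytes(out)


def unescape_binary(data: bytes) -> bytes:
    """Decode the escaped binary format back to the original bytes."""
    out = bytearray()
    it = iter(data)
    for byte in it:
        if byte == ESCAPE:
            try:
                byte = next(it) ^ 0x20  # the byte after '}' is XORed with 0x20
            except StopIteration:
                raise ValueError("dangling escape byte") from None
        out.append(byte)
    return bytes(out)


# Invented sample containing all four special characters.
sample = b"KERNELOFFSET=1a000000\nNOTE=$#}*\n"
escaped = escape_binary(sample)
hex_encoded = sample.hex().encode("ascii")  # M-type style encoding

x_overhead = len(escaped) - len(sample)      # one byte per escaped character
m_overhead = len(hex_encoded) - len(sample)  # one extra byte per input byte
still_ascii = all(b < 0x80 for b in escaped)
```

For this sample the X type adds four bytes while the hex encoding doubles the length, and the escaped output contains only ASCII bytes because XORing any of the four special characters with 0x20 stays below 0x80.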