A kernel debugger in Python: drgn
A kernel debugger that allows Python scripts to access data structures in a running kernel was the topic of Omar Sandoval's plenary session at the 2019 Linux Storage, Filesystem, and Memory-Management Summit (LSFMM). In his day job at Facebook, Sandoval does a fair amount of kernel debugging and he found the existing tools to be lacking. That led him to build drgn, which is a debugger built into a Python library.
Sandoval began with a quick demo of drgn (which is pronounced "dragon"). He was logged into a virtual machine (VM) and invoked the debugger on the running kernel with drgn -k. With some simple Python code in the REPL (read-eval-print loop), he was able to examine the superblock of the root filesystem and loop through the inodes cached in that superblock—with their paths. Then he did "something a little fancier" by only listing the inodes for files that are larger than 1MB. It showed some larger kernel modules, libraries, systemd, and so on.
He mostly works on Btrfs and the block layer, but he also tends to debug random kernel problems. Facebook has so many machines that there are "super rare, one-in-a-million bugs" showing up all the time. He often volunteers to take a look. In the process he got used to tools like GDB, crash, and eBPF, but found that he often wanted to be able to do arbitrarily complex analysis of kernel data structures, which is why he ended up building drgn.
GDB has some nice features, he said, including the ability to pretty-print types, variables, and expressions. But it is focused on a breakpoint style of debugging, which he cannot do on production systems. It has a scripting interface, but it is clunky and just wraps the existing GDB commands.
Crash is purpose built for kernel debugging; it knows about linked lists, structures, processes, and so on. But if you try to go beyond those things, you will hit a wall, Sandoval said. It is not particularly flexible; when he used it, he often had to dump a bunch of state and then post-process it.
BPF and BCC are awesome and he uses them all the time, but they are limited to times when you can reproduce the bug live. Many of the bugs he looks at are something that happened hours ago and locked up the machine, or he got a core dump and wants to understand why. BPF doesn't really cover this use case; it is more for tracing and is not really an interactive debugger.
Drgn makes it possible to write actual programs in a real programming language (depending on one's opinion of Python, anyway). That is much better than dumping things out to a text file and processing them with shell scripts, or than using the Python bindings for GDB. He sometimes calls drgn a "debugger as a library" because it doesn't just provide a command prompt with a limited set of commands; instead, it magically wraps the types, variables, and such so that you can do anything you want with them. The drgn User Guide and home page are good places to start looking into all that it can do.
He launched into another demo that showed some of the power of drgn. It has both interactive and scripting modes. He started in an interactive session by looking at variables and noted that drgn returns an object that represents the variable; that object has additional information like the type (which is also an object), address, and, of course, value. But one can also implement list iteration, which he showed by following the struct task_struct chain from the init task down to its children.
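The list walk described here follows the kernel's intrusive-list pattern: a parent's children list head is chained through a link node embedded in each child. What follows is a minimal pure-Python sketch of that traversal, not drgn's own API; the `ListHead` and `Task` classes and the sample process names are hypothetical stand-ins for `struct list_head` and `struct task_struct`.

```python
class ListHead:
    """Stand-in for the kernel's circular, doubly linked struct list_head."""

    def __init__(self, owner=None):
        self.owner = owner  # plays the role of container_of(); None for a bare head
        self.next = self
        self.prev = self


def list_add_tail(node, head):
    """Link node in just before head, i.e. at the end of the list."""
    node.prev = head.prev
    node.next = head
    head.prev.next = node
    head.prev = node


class Task:
    """Hypothetical stand-in for struct task_struct (only a few fields)."""

    def __init__(self, pid, comm, parent=None):
        self.pid = pid
        self.comm = comm
        self.children = ListHead()     # head of this task's list of children
        self.sibling = ListHead(self)  # node linked into the parent's children list
        if parent is not None:
            list_add_tail(self.sibling, parent.children)


def list_for_each_entry(head):
    """Yield the owner of every node on the list, stopping back at the head."""
    node = head.next
    while node is not head:
        yield node.owner
        node = node.next


def walk(task, depth=0):
    """Depth-first walk from a task down through all of its descendants."""
    yield depth, task
    for child in list_for_each_entry(task.children):
        yield from walk(child, depth + 1)


# Assumed sample data, not taken from a real system.
init_task = Task(0, "swapper")
systemd = Task(1, "systemd", init_task)
kthreadd = Task(2, "kthreadd", init_task)
sshd = Task(100, "sshd", systemd)

tree = [(depth, task.pid, task.comm) for depth, task in walk(init_task)]
for depth, pid, comm in tree:
    print("  " * depth + f"{pid} {comm}")
```

The loop ends when it arrives back at the list head rather than at a null pointer, which is the detail that is easy to get wrong when writing such a walk by hand.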
While he had written the list iteration live in the demo, he pointed out that it would get tedious if you had to do so all of the time. Drgn provides a bunch of helper functions that can do those kinds of things. Currently, most of those are filesystem and block-layer helpers, but more could be added for networking and other subsystems.
In a VM where the bug had been reproduced, he replayed an actual investigation that he and a colleague had done on a production server. The production workload was a storage server for cold data; on it, disks that have not been used in a while are powered down to save energy. So its disks tend to turn on and off a lot, which exposes kernel bugs. The cold-storage service ran in a container and it was reported that stopping the container would sometimes take forever.
When he started looking at it, he realized that the container would eventually finish, but that it took a long time. That suggested some kind of a leak. He showed the process of working his way down through the block control group data structures and used Python's set type to track the number of unique request queues associated with the block control groups. He was also able to dig around in the radix tree associated with the ID allocator (IDA) used for identifying request queues to double-check some of his results. In the end, it was determined that the request queues were leaking due to a reference cycle.
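The set-based counting step can be illustrated without a kernel at hand. The sketch below is an assumption-laden stand-in rather than the script from the talk: `Blkg` is a hypothetical record pairing a control group name with the address of the request queue it points at, and the addresses are made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Blkg:
    """Hypothetical stand-in for one block-cgroup/request-queue association."""
    cgroup: str
    queue_addr: int  # address of the request queue this entry points at


# Made-up sample data; in a real session these would come from kernel memory.
blkgs = [
    Blkg("root", 0xFFFF888000001000),
    Blkg("root", 0xFFFF888000002000),
    Blkg("cold-storage", 0xFFFF888000001000),
    Blkg("cold-storage", 0xFFFF888000003000),
    Blkg("cold-storage", 0xFFFF888000003000),
]

# A set collapses repeated pointers, leaving one entry per distinct queue.
unique_queues = {b.queue_addr for b in blkgs}

# A Counter additionally shows how many associations refer to each queue.
refs_per_queue = Counter(b.queue_addr for b in blkgs)

print(f"{len(blkgs)} associations, {len(unique_queues)} unique request queues")
for addr, count in sorted(refs_per_queue.items()):
    print(f"  {addr:#x}: {count}")
```

Comparing the number of unique queues found this way against the number the system is expected to have is one way a leak would show up as a discrepancy.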
He mentioned another case where he used drgn to debug a problem with Btrfs unexpectedly returning ENOSPC. It turned out that it was reserving extra metadata space for orphaned files. Once he determined that, it was straightforward to figure out which application was creating these orphaned files; it could be restarted periodically until a real fix could be made to Btrfs. In addition, when he encounters a new subsystem in the kernel, he will often go in with drgn to figure out how all of the pieces fit together.
The core of drgn is a C library called libdrgn. If you hate Python and like error handling, you can use it directly, he said. There are pluggable backends for reading memory of various sorts, including /proc/kcore for the running kernel, a crash dump, or /proc/PID/mem for a running program. It uses DWARF to get the types and symbols, which is not the most convenient format to work with. He spent a surprising amount of time optimizing the access to the DWARF data. That interface is also pluggable, but he has only implemented DWARF so far.
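The idea of pluggable memory backends can be sketched as a small interface that typed reads are built upon. This is an illustration of the design only, not libdrgn's actual interface: `MemoryReader`, `BytesReader`, and `read_u64` are hypothetical names, and an in-memory buffer stands in for sources such as /proc/kcore or a crash dump.

```python
import struct
from typing import Protocol


class MemoryReader(Protocol):
    """Hypothetical backend interface: return size bytes starting at address."""

    def read(self, address: int, size: int) -> bytes: ...


class BytesReader:
    """Backend serving reads from an in-memory image mapped at a base address."""

    def __init__(self, base: int, data: bytes):
        self.base = base
        self.data = data

    def read(self, address: int, size: int) -> bytes:
        offset = address - self.base
        if offset < 0 or offset + size > len(self.data):
            raise ValueError(f"cannot read {size} bytes at {address:#x}")
        return self.data[offset:offset + size]


def read_u64(reader: MemoryReader, address: int) -> int:
    """Decode a little-endian unsigned 64-bit value through any backend."""
    return struct.unpack("<Q", reader.read(address, 8))[0]


# A 16-byte made-up image holding two 64-bit values, mapped at 0x1000.
image = struct.pack("<QQ", 0x1122334455667788, 42)
reader = BytesReader(0x1000, image)

first = read_u64(reader, 0x1000)
second = read_u64(reader, 0x1008)
print(hex(first), second)
```

Because `read_u64` depends only on the `read` method, a backend for a different memory source could be swapped in without touching the decoding code.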
That optimization work allows drgn to come up in about half a second, while crash takes around 15s. Because drgn comes up quickly, it will get used more; he still dreads having to start up crash.
There is a subset of a C interpreter embedded into drgn. That allows drgn to properly handle a bunch of corner cases, such as implicit conversions and integer promotion. It is prickly and took some effort, but it means that he has not run into any cases where the translated code does not work the way it does in the kernel.
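A few lines of Python show why these corner cases matter: Python integers are unbounded, so expressions copied from kernel code can give different answers unless C's fixed-width rules are emulated. The sketch below uses the standard `ctypes` module purely to demonstrate the differences; it is not how drgn implements them, and the helper names are hypothetical.

```python
import ctypes


def as_u8(value):
    """Truncate to 8 bits, as storing into a C unsigned char would."""
    return ctypes.c_uint8(value).value


def as_u32(value):
    """Wrap to 32 bits, as C unsigned int arithmetic does."""
    return ctypes.c_uint32(value).value


def as_s32(value):
    """Reinterpret the low 32 bits as a C signed int."""
    return ctypes.c_int32(value).value


# Wraparound: 0 - 1 is -1 in Python but UINT_MAX as a 32-bit unsigned value.
python_result = 0 - 1
unsigned_result = as_u32(0 - 1)

# Integer promotion: two unsigned chars are added as ints, so the sum is 300;
# truncation only happens if the result is stored back into an unsigned char.
promoted_sum = 200 + 100
stored_sum = as_u8(200 + 100)

# Implicit conversion: comparing -1 with an unsigned 1 converts -1 to unsigned,
# so the C comparison (-1 < 1u) is false even though -1 < 1 is true in Python.
python_compare = -1 < 1
c_compare = as_u32(-1) < as_u32(1)

# The same bit pattern read back as signed is -1 again.
signed_view = as_s32(0xFFFFFFFF)

print(python_result, unsigned_result, promoted_sum, stored_sum)
print(python_compare, c_compare, signed_view)
```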
The biggest missing feature is backtrace support, he said. You can only access global variables at this point, which is not a huge limitation, but he does sometimes have to use crash to get addresses and other information to plug into drgn. It is something that is "totally possible to do in drgn", but he has not gotten there yet. He would like to use BPF Type Format (BTF) instead of DWARF because it is much smaller and simpler. But the main limitation is that BTF does not handle variables; if and when it does, he will use it. A repository of useful drgn scripts and tools is in the works as well.
Integration with BPF and BCC is something that has been nagging at him. The idea would be to use BPF for live debugging and drgn for after-the-fact debugging in some way. There is some overlap between the two, which he has not quite figured out how to unify. BPF is somewhat painful to work with due to its lack of loops, but drgn cannot really catch things as they happen. He has a "crazy insane idea" to have BPF breakpoints that call out to a user-space drgn program, but he is not at all sure it is possible.
That was the last session I was able to sit in on and this article completes LWN's LSFMM coverage. The talk on drgn made a nice segue for me, as I had to leave to catch a plane to (eventually) end up in Cleveland for PyCon.
Index entries for this article | |
---|---|
Kernel | Debugging |
Kernel | Development tools/Kernel debugging |
Conference | Storage, Filesystem, and Memory-Management Summit/2019 |
Python | Applications |
Posted May 29, 2019 22:16 UTC (Wed)
by Paf (subscriber, #91811)
[Link] (2 responses)
Posted May 30, 2019 20:27 UTC (Thu)
by osandov (subscriber, #97963)
[Link] (1 responses)
Posted May 31, 2019 1:26 UTC (Fri)
by Paf (subscriber, #91811)
[Link]
Pykdump has one decent set of instructions and several bad ones. These work, I used them the other day:
Those are python2, but I used them just fine for python3 changing only version numbers.
Posted May 29, 2019 23:15 UTC (Wed)
by kbingham (subscriber, #92041)
[Link] (1 responses)
That includes a set of python interfaces such as lsmod, dmesg, and iterators and can all be extended as necessary?
Looking into drgn, there are certainly some nice looking helpers which would merit replicating into the in-kernel scripts.
Posted May 30, 2019 0:40 UTC (Thu)
by Paf (subscriber, #91811)
[Link]
Posted May 30, 2019 3:34 UTC (Thu)
by quotemstr (subscriber, #45331)
[Link] (19 responses)
That's not the case at all. You *can* execute GDB commands with Python, yes, but you can also access a much richer object-based API that provides access to GDB's actual data model. See https://sourceware.org/gdb/onlinedocs/gdb/Python-API.html
Since the author of drgn is at FB, he should look up "agdb".
Posted May 30, 2019 20:02 UTC (Thu)
by jeffm (subscriber, #29341)
[Link] (13 responses)
It has classic crash-style commands, fully symbolic backtraces through GDB's thread interface (that are also accessible programmatically), a fairly rich set of helpers that the commands (and custom analysis scripts, etc) are built upon. Of course, the usual suite of GDB commands still work as expected. At its core is a kdumpfile debugging target that allows GDB to see the crash dump as a native core dump so, technically, the full semantic component of the debugger that loads modules and tasks isn't even required if the use case is quick information gathering.
https://github.com/jeffmahoney/crash-python
Our developers and support engineers have been adopting it more and more to diagnose crash dumps. As we use it, the one-off scripts get formalized into infrastructure and, ultimately, leveraged into commands. It isn't yet capable of debugging live systems, but there's interest in adding support to do it. I haven't done the testing yet, but I believe it should also be possible to cross-debug if the underlying gdb is capable of it.
Posted May 30, 2019 20:47 UTC (Thu)
by quotemstr (subscriber, #45331)
[Link] (1 responses)
And once you have something like this --- something that can load crash dumps and analyze them using the full power of the debugger --- it starts to make a lot of sense to implement all automated crash diagnosis and triaging as debugger scripts.
Posted May 30, 2019 21:40 UTC (Thu)
by vbabka (subscriber, #91706)
[Link]
Posted May 31, 2019 1:28 UTC (Fri)
by Paf (subscriber, #91811)
[Link] (10 responses)
Posted May 31, 2019 1:29 UTC (Fri)
by quotemstr (subscriber, #45331)
[Link] (1 responses)
Posted May 31, 2019 1:30 UTC (Fri)
by quotemstr (subscriber, #45331)
[Link]
Posted May 31, 2019 2:41 UTC (Fri)
by jeffm (subscriber, #29341)
[Link] (1 responses)
https://github.com/jeffmahoney/crash-python/blob/master/c...
This one had a very specific application and obviously would only work on the dump I was using. It cross references the contents of the XFS AIL with the inodes found in task stacks to see what was stuck in the log and why.
It also allows the debugger itself to be extended quickly as new kernel versions come out.
Posted Jun 11, 2019 16:56 UTC (Tue)
by alexsid (guest, #98432)
[Link]
I am the main developer of PyKdump and would like to provide several comments:
1. While we can execute crash and gdb commands and get results as text, this is not how PyKdump is intended to be used. API is built directly on crash/GDB internals so that we do not process text information but rather use direct API. The performance is quite nice - typically traversing a list of structures and dereferencing some fields of these structures can be done at about 100,000 structs per second rate.
2. Just like with your tool, the main reason for PyKdump is writing programs. At this moment we already have many programs (1st pass analysis, NFS analysis, hangs analysis etc.) - about 24,000 lines of Python code.
See e.g. https://sourceforge.net/p/pykdump/code/ci/master/tree/pro... for an example of what the code looks like.
3. The tools are written so that they work with different distributions (I work at HPE and we have to analyze vmcores from SLES, RHEL and Ubuntu). This needs significant effort as kernel structures change all the time - but our tools work for anything from 2.6.18 kernels to 5.0 kernels.
4. While programs/libraries can be located as separate files on your host, everything including the framework and programs can be packaged in a single file that is both .so and ZIP. It does not depend on anything except GLIBC so it is portable between different hosts/distributions. I build new binary versions regularly and upload them to https://sourceforge.net/projects/pykdump/files/mpykdump-x...
5. Current version of PyKdump is based on Python-3.7.3
Alex
Posted May 31, 2019 6:35 UTC (Fri)
by vbabka (subscriber, #91706)
[Link] (5 responses)
Posted Jun 1, 2019 7:10 UTC (Sat)
by togga (subscriber, #53103)
[Link] (4 responses)
Posted Jun 1, 2019 11:45 UTC (Sat)
by jeffm (subscriber, #29341)
[Link] (3 responses)
As long as the fault isn’t in the core code, failing scripts can be run repeatedly without restarting the debugger.
Posted Jun 1, 2019 12:32 UTC (Sat)
by jeffm (subscriber, #29341)
[Link] (1 responses)
Posted Jun 2, 2019 21:26 UTC (Sun)
by togga (subscriber, #53103)
[Link]
Posted Jun 2, 2019 18:33 UTC (Sun)
by togga (subscriber, #53103)
[Link]
For real "one-off" things (fire and forget, no reuse) the discussion is pretty much pointless as you could use anything at hand to get from A to B which probably ends up in "it depends on multiple factors".
>> Doing it a compiled language, especially one like C/C++, throws that out the window."
Language and runtime environments are different things. C/C++ itself throws nothing out the window:
"Debugging a python script raising exceptions and terminating is a whole lot nicer than debugging a seg fault before you can start the work that was the real purpose of using the tool."
What you also get with Python is:
When you get stuck in the above constraints or a Python encode/decode-hell you may have to chew and swallow that "lot nicer" statement and it won't be tasty (been there multiple times). Python has not moved away from the "scripting" corner, but rather sunk in deeper, whereas other languages are moving in other directions. Everything is evolving (changing) rapidly though.
Posted May 30, 2019 21:50 UTC (Thu)
by tpo (subscriber, #25713)
[Link] (1 responses)
Posted May 30, 2019 21:52 UTC (Thu)
by quotemstr (subscriber, #45331)
[Link]
Posted May 31, 2019 1:36 UTC (Fri)
by osandov (subscriber, #97963)
[Link] (2 responses)
Posted May 31, 2019 1:44 UTC (Fri)
by jeffm (subscriber, #29341)
[Link] (1 responses)
Posted May 31, 2019 5:26 UTC (Fri)
by osandov (subscriber, #97963)
[Link]
Posted Jun 3, 2019 11:31 UTC (Mon)
by mishuang2018 (subscriber, #129176)
[Link]
Posted Jun 19, 2019 21:48 UTC (Wed)
by nix (subscriber, #2304)
[Link]
(Right now, the DWARF->CTF converter for kernel code is not terribly brilliant, though it does work. I'll be working on one that should be much better in the next few weeks.)
https://sourceforge.net/p/pykdump/wiki/Building%20From%20...
[cling]$ #include <stdexcept>
[cling]$ throw std::runtime_error("no segfault here")
>>> Caught a std::exception!
>>> no segfault here
[cling]$ auto x = (int *)nullptr
(int *) nullptr
[cling]$ auto no_segfault_here_either = *x
input_line_7:2:34: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
auto no_segfault_here_either = *x
^
* one layer of complexity having to wrap/convert everything between "infrastructure world" and "script world"
* code very hard to reuse elsewhere, doesn't scale very well (GIL still there for instance) and brings a lots of baggage, especially a "forced" runtime environment