Debugging complex software stacks on Genode Linux
When dealing with large and especially ported software stacks on Genode, sophisticated debugging facilities become a must have. One way to achieve this is to develop or port your software stacks on Genode Linux and take advantage of the GNU Debugger (GDB). Of course this is not possible for low level software, like device driver or kernel, but in general useful for anything that does not access hardware directly. In this article I will describe a hands on experience example on how to debug Java using GDB.
While porting Java I used a run script with a demo Java application that I executed on Linux:
make KERNEL=linux run/java_demo
Suddenly Java triggered a segmentation fault:
[init -> java] OpenJDK 64-Bit Server VM warning: Reserved Stack Area not supported on this platform Kernel: Segmentation fault (signum=11), see Linux kernel log for details
A quick look into the system log revealed
ld.lib.so[16735]: segfault at 0 ip 00007fc6294094f3 sp 00007fc696dfefa8
the cause to be a memory access at null. The instruction and stack pointers are shown, but since Java is a dynamic application with shared libraries that can be loaded at arbitrary locations, this information is not sufficient for debugging.
So lets fire up GDB. The way to debug Genode applications is to attach GDB to a running Genode component, what makes it necessary to stop the component at a well defined state and attach GDB. Usually the main function is a good place to stop. Therefore, I added:
printf("Wait\n"); wait_for_continue();
to Java's main function. wait_for_continue stops Java and waits for a key press. This is only supported on Linux.
In case your run script uses a timeout with the run_genode_until function, change it to:
run_genode_until forever
Otherwise key presses will be ignored.
After restarting the scenario the following output appears:
[init -> java] Wait
Now we open a terminal and go to the debug directory of the Genode build:
cd <genode-dir>/build/x86_64/debug ln -s ld-linux.lib.so ld.lib.so
We have to create a link from the Linux specific linker name to the generic one, since all binaries are linked with the generic name. Next we need to find the process ID of Java
ps -ef | grep java user 24826 24816 0 11:08 pts/7 00:00:00 [Genode] init -> java
and are now able to attach GDB using the id:
gdb -p 24826 java GNU gdb (Gentoo 8.2 p1) 8.2 Copyright (C) 2018 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-pc-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://bugs.gentoo.org/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... SIGTRAP is used by the debugger. Are you sure you want to change it? (y or n) [answered Y; input not from terminal] Reading symbols from java...done. Attaching to program: /build/x86_64/debug/java, process 24826 [New LWP 24834] [New LWP 24835] pseudo_end () at /home/cbass/backup/src/genode/genode.git/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 29 ret /* Return to caller. */ (gdb)
We can check if everything is loaded correctly by inspecting shared libraries:
(gdb) info shared From To Syms Read Shared Object Library 0x00007fcd16c085e0 0x00007fcd16cad954 Yes ld.lib.so 0x0000000001010fa0 0x00000000010298ad Yes java.lib.so 0x000000000106e260 0x0000000001161034 Yes libc.lib.so 0x00000000011dff70 0x0000000001213e12 Yes vfs.lib.so 0x000000000122b5b0 0x000000000122f41e Yes jimage.lib.so 0x0000000001236520 0x0000000001241fe5 Yes jli.lib.so 0x0000000001247f40 0x00000000012560cf Yes zlib.lib.so 0x00000000012611b0 0x000000000126cd32 Yes jnet.lib.so 0x0000000001a30ce0 0x0000000002844677 Yes jvm.lib.so 0x0000000002d3db00 0x0000000002d41e48 Yes jzip.lib.so 0x0000000002d47710 0x0000000002d494f9 Yes libc_pipe.lib.so 0x0000000002d50a40 0x0000000002d6e37c Yes libm.lib.so 0x0000000002d7bbf0 0x0000000002d83911 Yes nio.lib.so 0x0000000002e10180 0x0000000002f01eaf Yes stdcxx.lib.so 0x0000000002f4bc20 0x0000000002f4d186 Yes management.lib.so 0x0000000002f52fe0 0x0000000002f6207b Yes vfs_lxip.lib.so 0x0000000002f9ab40 0x000000000304c6f0 Yes lxip.lib.so
Java and all its dependencies have successfully been loaded. At this point hardware breakpoints (hbreak) may be set. Since I want to use the next and step instructions, which require software breakpoints, this commit has to be applied to the working tree. It is meant for debugging purposes only.
In the first step I want to see a backtrace from where the segmentation fault occurred and simply continue.
(gdb) c Continuing
Switch back to the terminal where Genode is executed and hit <ENTER> to continue Java execution.
Thread 4 "ld.lib.so" received signal SIGSEGV, Segmentation fault. [Switching to LWP 25576] 0x00007fccb17334f3 in ?? ()
In the GDB terminal the fault can be observed. Lets look at the backtrace:
(gdb) bt #0 0x00007fccb17334f3 in ?? () #1 0x0000000000000202 in ?? () #2 0x0000000000000000 in ?? ()
Now that is not good, because there are no symbols at this address, meaning we are in compiled byte code. For demonstration purposes I check another thread:
(gdb) info threads Id Target Id Frame 1 LWP 24826 "ld.lib.so" pseudo_end () at /home/cbass/backup/src/genode/genode.git/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 2 LWP 24834 "ld.lib.so" pseudo_end () at /home/cbass/backup/src/genode/genode.git/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 3 LWP 24835 "ld.lib.so" pseudo_end () at /home/cbass/backup/src/genode/genode.git/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 * 4 LWP 25576 "ld.lib.so" 0x00007fccb17334f3 in ?? () 5 LWP 25577 "ld.lib.so" pseudo_end () at /home/cbass/backup/src/genode/genode.git/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 6 LWP 25578 "ld.lib.so" pseudo_end () at /home/cbass/backup/src/genode/genode.git/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29
There are currently six threads and here is the backtrace of thread 3:
(gdb) thread 3 [Switching to thread 3 (LWP 24835)] #0 pseudo_end () at /home/cbass/backup/src/genode/genode.git/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 29 ret /* Return to caller. */ (gdb) bt #0 pseudo_end () at /home/cbass/backup/src/genode/genode.git/repos/base-linux/src/lib/syscall/spec/x86_64/lx_syscall.S:29 #1 0x00007fcd16c3fc45 in lx_recvmsg (flags=0, msg=<optimized out>, sockfd=<optimized out>) at /home/cbass/backup/src/genode/genode.git/repos/base-linux/src/lib/syscall/linux_syscalls.h:163 #2 Genode::ipc_call (dst=..., snd_msgbuf=..., rcv_msgbuf=...) at /home/cbass/backup/src/genode/genode.git/repos/base-linux/src/lib/base/ipc.cc:435 #3 0x00007fcd16c80b12 in Genode::Capability<Genode::Signal_source>::_call<Genode::Signal_source::Rpc_wait_for_signal> (this=this@entry=0x7fcd16d956b0 <signal_handler_thread()::inst+208>, args=...) at /home/cbass/backup/src/genode/genode.git/repos/base/include/base/rpc_client.h:141 #4 0x00007fcd16c80f27 in Genode::Capability<Genode::Signal_source>::call<Genode::Signal_source::Rpc_wait_for_signal> (this=0x7fcd16d956b0 <signal_handler_thread()::inst+208>) at /home/cbass/backup/src/genode/genode.git/repos/base/include/base/capability.h:166 #5 Genode::Rpc_client<Genode::Signal_source>::call<Genode::Signal_source::Rpc_wait_for_signal> (this=0x7fcd16d956a8 <signal_handler_thread()::inst+200>) at /home/cbass/backup/src/genode/genode.git/repos/base/include/base/rpc_client.h:192 #6 Genode::Signal_source_client::wait_for_signal (this=0x7fcd16d956a8 <signal_handler_thread()::inst+200>) at /home/cbass/backup/src/genode/genode.git/repos/base/src/include/signal_source/client.h:28 #7 Genode::Signal_receiver::dispatch_signals (signal_source=signal_source@entry=0x7fcd16d956a8 <signal_handler_thread()::inst+200>) at /home/cbass/backup/src/genode/genode.git/repos/base/src/lib/base/signal.cc:304 #8 0x00007fcd16c813ca in Signal_handler_thread::entry (this=0x7fcd16d955e0 <signal_handler_thread()::inst>) at /home/cbass/backup/src/genode/genode.git/repos/base/src/lib/base/signal.cc:47 #9 0x00007fcd16c8534a in Genode::Thread::_thread_start () at /home/cbass/backup/src/genode/genode.git/repos/base-linux/src/lib/base/thread_linux.cc:80 #10 0x00007fcd16c99d5c in thread_start () at /home/cbass/backup/src/genode/genode.git/repos/base-linux/src/lib/syscall/spec/x86_64/lx_clone.S:62
Looks good and symbols are found. But this does not help with our problem. Lets disassemble the faulting instruction pointer:
(gdb) x/i 0x00007fccb17334f3 => 0x7fccb17334f3: mov (%rsi),%eax
rsi surely contains zero here, next check the instruction before:
(gdb) x/i 0x00007fccb17334f1 0x7fccb17334f1: xor %rsi,%rsi
Someone clears rsi on purpose! All that is left to do now is to locate the code that generates these instructions in the JVM to answer the question of why it is generated that way.
At last here a list of my most used GDB commands
-
print or p : print variable
-
x : examine memory
-
break or b : break at address, function, or line
-
continue or c : continue execution
-
next or n : step over function
-
step or s : step in function
-
info registers : dump CPU register state
-
backtrace or bt : backtrace from current position
-
frame : switch to specified frame (you bt to display frame numbers)
-
watch : set a watch point which stops when value or locations changed
-
info threads : show currently active threads
-
thread : switch to another thread
-
layout asm : show assembly