ServerDebugging

Debugging the Xserver

This minihowto attempts to explain how to debug the X server, particularly in the case where the server crashes. It assumes a basic familiarity with Unix and a willingness to risk deadlocking the machine.

Just as a warning, if you try this with a closed-source driver, the output is not likely to be very useful.

Prerequisites

You'll really want to have a second machine around. It's very difficult to debug the X server from within itself; when it stops and returns control to the debugger, you won't be able to send events to the xterm running your debugger. ssh is your friend here. If you don't have a second machine, see the Debugging with one machine section, and good luck.

Your gdb needs to be reasonably recent; 5.3 or later is probably good.

And of course, you'll need a reproducible way of crashing the X server, but if you've read this far you've probably got that already. This is your testcase.

Debug support

If you're using a modern distribution, it probably already has 'debuginfo' packages available. These packages (usually quite large) include the debugging symbols for the software you have installed, which makes tools like gdb much more useful. Refer to your distro's documentation for details on how to install these. You'll probably want at least the debuginfo for the X server itself, and for the video driver you're using. For example, on a Fedora machine, you'd say:

debuginfo-install xorg-x11-server-Xorg xorg-x11-drv-ati

On Debian or Ubuntu you'd say

apt-get install xserver-xorg-core-dbg xserver-xorg-video-ati-dbg

Otherwise, if you're building X yourself, you'll need to have built X with debugging information. To pass compiler flags in at build time, say:

  CFLAGS='-O0 -g3' ./configure --prefix=...

All the normal configure options should work as expected. You may want to put your debuggable server in a different prefix. Be careful of ModulePath and other such path statements in your xorg.conf.

Remember that if you're trying to debug into a driver, you'll want to repeat this step for the driver as well as for the server core.
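Before spending time on a testcase, it can be worth checking that a self-built server or driver really carries debugging information. The following is a minimal sketch, assuming binutils' readelf is installed; the function name has_debug_info and the paths in the usage comment are made up for illustration. It only inspects the file itself, so it will report nothing for a stripped distribution binary whose symbols live in a separate debuginfo package.

```shell
#!/bin/sh
# Succeed if the ELF file given as $1 lists a .debug_info section,
# i.e. it was built with -g and has not been stripped.
has_debug_info() {
    readelf -S "$1" | grep -q '\.debug_info'
}

# Hypothetical usage (substitute your own prefix and driver path):
#   has_debug_info /opt/xorg-debug/bin/Xorg && echo "server has debug info"
#   has_debug_info /opt/xorg-debug/lib/xorg/modules/drivers/ati_drv.so \
#       || echo "driver was built without -g"
```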

The basics

Start the server normally. Go over to your second machine and ssh into the first one. su root, and type

gdb /opt/xorg-debug/Xorg $(pidof Xorg)

or

gdb /usr/bin/Xorg $(pidof X)

depending on your setup.
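Since the process name differs between setups, a small helper can try both names. This is only a sketch: find_x_pid is a hypothetical name, and it assumes a single X server is running (pidof prints every matching PID, which would confuse gdb if there are several).

```shell
#!/bin/sh
# Print the PID of the running X server, trying both common process names.
find_x_pid() {
    for name in Xorg X; do
        pid=$(pidof "$name") && { echo "$pid"; return 0; }
    done
    echo "no running X server found" >&2
    return 1
}

# Hypothetical usage, as root on the ssh session:
#   gdb /usr/bin/Xorg "$(find_x_pid)"
```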

Note that even when you are connected over ssh, X might cripple the console. You can avoid this by passing this option to the server:

  -keeptty         don't detach controlling tty (for debugging only)

gdb will attach to the running server and spin for a while reading in symbols from all the drivers. Eventually you'll reach a (gdb) prompt. Notice that the X server has halted; type cont at the gdb prompt to continue executing.

Go back to the machine running X, and run your testcase. This time, instead of the server crashing, it should freeze, and gdb should tell you the server got a signal (usually SIGSEGV), as well as the function and line of code where the problem happened. An example looks like:

  Program received signal SIGSEGV, Segmentation fault.
  0x403245a3 in fbBlt (srcLine=0xc1a1c180, srcStride=59742, srcX=0,
                dstLine=0x4240cb6c, dstStride=1152, dstX=0, width=32960, height=764,
                alu=-1046602744, pm=1111538028, bpp=32, reverse=0, upsidedown=0)
                at fbblt.c:174
  174     *dst++ = FbDoDestInvarientMergeRop(*src++);

This by itself is pretty helpful, but there's more info out there. At the gdb prompt, type bt f for a full stack backtrace. (Warning, this will be long!) This dumps out the full call chain of functions from main() on down, as well as the arguments they were called with and the value of all local variables. Keep hitting enter until you get back to the gdb prompt.

Get your mouse out, copy all the output from "Program received..." on down, and paste it into a file on your second machine. Type detach at the gdb prompt to detach gdb from the server and let it finish crashing. Go to http://bugs.freedesktop.org/ and file a new bug describing the testcase. Attach the gdb output to the bug (please don't just paste it into the comments section).
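If copying by mouse is awkward, the same attach, continue, backtrace, detach sequence can be scripted so that gdb's output lands in a file. The sketch below is an assumption-laden alternative, not part of the procedure above: write_gdb_cmds and the /tmp file names are hypothetical, the handle lines are borrowed from the one-machine script later on this page, and it relies on gdb's --batch and -x options.

```shell
#!/bin/sh
# Write a gdb command file that resumes the server, waits for it to stop on
# a fatal signal, dumps a full backtrace, and then detaches.
write_gdb_cmds() {
    cat > "$1" << 'EOF'
handle SIGUSR1 nostop
handle SIGUSR2 nostop
handle SIGPIPE nostop
continue
bt full
detach
quit
EOF
}

# Hypothetical usage, as root over ssh (adjust the paths to your setup):
#   write_gdb_cmds /tmp/xorg-attach.gdb
#   gdb --quiet --batch -x /tmp/xorg-attach.gdb /usr/bin/Xorg "$(pidof Xorg)" \
#       > /tmp/xorg-backtrace.txt 2>&1
# Then run the testcase on the X machine and attach the resulting file to the bug.
```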

All the gdb commands you'll ever need

For any gdb command, you can say "help" followed by the command name (for example, "help backtrace") at the (gdb) prompt to get a (hopefully informative) explanation.

Note that most commands can be used in an abbreviated version (e.g. n instead of next). Just try it yourself!

Things that can go wrong

The biggest thing to watch out for is attempting to print memory contents when that memory is located on the video card. It won't work, on x86 anyway, for some not-very-interesting reasons. You'll know when you did it because the machine will deadlock and you'll have to reboot. See the DebuggingHints file (below) for workarounds.

Some issues with running X under gdb may be resolved by passing the -dumbSched option to the X server. This worked for me to resolve crashes of gdb 6.3 and strange loops in gdb 5.3. You'll know if you need this option because gdb will get very confused by SIGALRM. Even if gdb isn't misbehaving, the -dumbSched option can be very helpful to avoid the SIGALRM periodically interrupting your debugging session.
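As an illustration of passing -dumbSched when the server is started under gdb, here is a small wrapper. It is a sketch only: gdb_run_x is a made-up name, the default display :1 and the /usr/bin/Xorg fallback are assumptions, and it relies on gdb's --args option to hand the remaining arguments to the server.

```shell
#!/bin/sh
# Load the X server into gdb with -dumbSched on its command line.
# $1 is the display to use (default :1); XSERVER overrides the binary path.
gdb_run_x() {
    gdb --quiet --args "${XSERVER:-/usr/bin/Xorg}" "${1:-:1}" -dumbSched
}

# Hypothetical usage, as root from the second machine:
#   gdb_run_x :1
# and then type "run" at the (gdb) prompt to start the server.
```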

Likewise, some gdb versions crash when an X server started under gdb attempts to run xkbcomp. This is, amazingly enough, a bug in the kernel's DRM code for suppressing some signals; it should be fixed in 2.6.28 if not earlier. You can disable XKB by passing the -kb option on the server's command line; obviously if you're trying to debug XKB this may cause you some problems and you're probably better off attaching gdb to a running X instead. Alternatively, disable DRI, but again, if DRI is the thing you're trying to debug, that won't help.

When you compile with optimization, the values printed by bt can sometimes be confusing. Some variables can get optimized out of existence, some variables occupy the same position on the stack during different parts of a function's execution, and some functions might not show up on the stack at all. Also, single-stepping can be confusing because the function might get executed in a different order than listed in the source if the compiler determines that's safe to do. gcc 4.0 seems to be much more aggressive at confusing the debugger than earlier versions, although it does emit more debugging information such that you'll at least know when variables have been optimized away. As always, lowering the optimization level improves debuggability.

Further information

There is a DebuggingHints file available online. It contains a lot of helpful (if very dated) information on how to debug the server, including how to dump PCI memory without deadlocking the machine. In particular, you'll want to read this if you're trying to debug a server older than 6.9.

Debugging with one machine

Version 1

The script below allows you to run the server in gdb and catch the gdb output in a file. You cannot control gdb interactively, but the Xserver should not end up stopped inside the debugger where you cannot reach it from a terminal. Store the following script in some file (for example, /tmp/Xdbg):

#!/bin/sh

#GDB=...
#XSERVER=...

ARGS=$*
PID=$$

test -z "$GDB" && GDB=gdb
test -z "$XSERVER" && XSERVER=/usr/bin/Xorg

cat > /tmp/.dbgfile.$PID << HERE
file $XSERVER
set confirm off
set args $ARGS
handle SIGUSR1 nostop
handle SIGUSR2 nostop
handle SIGPIPE nostop
run
bt full
cont
quit
HERE

# Use POSIX redirection; "&>" is a bash extension that a plain /bin/sh may misparse.
$GDB --quiet --command=/tmp/.dbgfile.$PID > /tmp/gdb_log.$PID 2>&1

rm -f /tmp/.dbgfile.$PID
echo "Log written to: /tmp/gdb_log.$PID"

Then (as root) do:

chmod u+x /tmp/Xdbg
mv /usr/X11R6/bin/X /usr/X11R6/bin/X.org
ln -sf /tmp/Xdbg /usr/X11R6/bin/X

If you are using a module-aware debugger, remove the comment sign # from the line starting with #GDB and add the full path to your debugging gdb. You can now start your Xserver as usual. Note that if you use startx you should do so as root. When the Xserver crashes, its output should have been written to /tmp/gdb_log.<number> together with a backtrace. If your Xserver resides somewhere else, you can use the XSERVER environment variable to specify the path. To restore the previous setup do:

mv /usr/X11R6/bin/X.org /usr/X11R6/bin/X
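The log written by the Version 1 script mixes server output with gdb output. To pull out just the part that matters for a bug report, something like the following may help. It is a sketch: extract_crash is a hypothetical helper name, and it assumes gdb's "Program received signal" message starts at the beginning of a line in the log.

```shell
#!/bin/sh
# Print everything in a gdb log from the first "Program received signal"
# line through the end of the file.
extract_crash() {
    sed -n '/^Program received signal/,$p' "$1"
}

# Hypothetical usage (replace <number> with the PID suffix of your log):
#   extract_crash /tmp/gdb_log.<number> > /tmp/xorg-crash.txt
```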

Version 2

If you only have one machine available, you might be able to pry some useful information from the server when it crashes. The downside is that it will probably halt your machine entirely rather than just crashing X.

Edit your xorg.conf file and find the ServerFlags section. Uncomment the

  Option "NoTrapSignals"

line (or add it if it doesn't exist). This will prevent the server from catching fatal signals, which should cause core dumps instead. (You need to make sure you have core dumps enabled for the server by removing the appropriate ulimit; see the ulimit command in the bash man page for details.)
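For the ulimit part, a minimal sketch is shown here; enable_core_dumps is a made-up name. The limit only applies to the current shell and the processes it launches, so this helps when the server is started from that same shell (for example with startx); a server started by a display manager would need its limit raised some other way.

```shell
#!/bin/sh
# Remove the core file size limit for this shell and its children,
# reporting the outcome either way.
enable_core_dumps() {
    if ulimit -c unlimited 2>/dev/null; then
        echo "core size limit: $(ulimit -c)"
    else
        echo "could not raise core size limit (hard limit: $(ulimit -H -c))" >&2
        return 1
    fi
}

# Hypothetical usage, as root on the console:
#   enable_core_dumps && startx
```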

The problem here is the same as mentioned earlier; the core dump will attempt to include mmap()'d sections of card memory, which will make the machine freeze. Usually the core dump is informative enough to at least give a partial backtrace.

Once you've crashed the machine, find the core file and load it in gdb:

  gdb `which Xorg` /path/to/core/file

and try to bt f like normal. Fortunately at this point you can't make the machine crash again.
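To save the backtrace from a core file straight to disk instead of reading it at the prompt, gdb can be run non-interactively. This sketch assumes a gdb recent enough to accept the -ex option; core_backtrace and the output file name are illustrative only.

```shell
#!/bin/sh
# Print a full backtrace from a core file without entering the gdb prompt.
# $1 is the server binary, $2 is the core file.
core_backtrace() {
    gdb --quiet --batch -ex 'bt full' "$1" "$2"
}

# Hypothetical usage:
#   core_backtrace "$(command -v Xorg)" /path/to/core/file > /tmp/xorg-core-bt.txt 2>&1
```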

Debugging with gdbserver

Run X on the target using gdbserver, listening on (for example) port 2401:

  gdbserver :2401 /usr/bin/X

Attach to the running process from gdb, running it from an environment in which you have Xorg installed. In my case, this is a chroot environment. If I try to debug the program from the host environment, without chrooting into my Xorg build environment, gdb cannot find the symbols correctly.

root:/usr/src/xc-build# gdb
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
(gdb) file programs/Xserver/Xorg
Reading symbols from /usr/src/xc-build/programs/Xserver/Xorg...done.
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) target remote 192.168.0.134:2401
Remote debugging using 192.168.0.134:2401
0xb7fed7b0 in ?? ()
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0xb7a92524 in GXDisplayVideo (pScrni=0x828bd38, id=0xb7aa9490, offset=0x17,
    width=0x82a, height=0xe730, pitch=0xb7aa946c, x1=0x8289920, y1=0x0,
    x2=0x0, y2=0x0, dstBox=0x82ae680, src_w=0x82a, src_h=0xe794, drw_w=0x828,
    drw_h=0x8638) at amd_gx_video.c:849
849        GFX(set_video_enable(1));
(gdb)

Note in this example that I specify the program to be debugged with a gdb command to read the Xorg symbols:

  (gdb) file programs/Xserver/Xorg

This is simply an alternative to running gdb like this:

  gdb programs/Xserver/Xorg
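Both steps, loading the symbols and connecting to gdbserver, can also be given on the gdb command line. The wrapper below is a sketch under the assumption that your gdb supports -ex (commands given this way run after the program file has been loaded); gdb_remote is a hypothetical name, and the address in the usage comment is the one from the session above.

```shell
#!/bin/sh
# Start gdb on a local copy of the server binary and connect to a gdbserver.
# $1 is the binary with symbols, $2 is host:port of the gdbserver.
gdb_remote() {
    gdb -ex "target remote $2" "$1"
}

# Hypothetical usage, from inside the build environment:
#   gdb_remote programs/Xserver/Xorg 192.168.0.134:2401
# then type "c" at the (gdb) prompt to let the server run.
```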