7.2-RELEASE: DTrace stability?

NOTE: If this is a dupe, sorry, but I waited 10+ minutes for the post to appear. Never had a forum take more than 20 seconds to show a thread before, so... Hmm.

I'm having some major stability issues with DTrace under 7.2-RELEASE, on amd64. I followed the wiki to get it working, and it does work, for a few seconds at most.
(By the way, OT, it should be added to the wiki that debug symbols are required to not get an ugly error message!)

When using "opensnoop" from the DTrace toolkit (ports/sysutils/DTraceToolkit, for example), I get this panic:
Code:
Fatal trap 10: trace trap while in kernel mode
... some more info ...
cpuid = 0, apci id = 00
processor eflags: trace trap, nested task, IOPL = 3
current process = mktemp
panic: trace trap

I'm lazy, so I took a picture of the full screen instead, but being on a forum I hope you've got a graphical browser. :)
Full info from the second panic: http://www.imgur.com/2nCYu

Steps to replicate, for me:
1) Start opensnoop
2) Do anything that opens a file, it seems (in this case I just did find /var or something to that matter, and it panicked within a second).

/var/crash/info.1:
Code:
Dump header from device /dev/ad0s1b
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 729923584B (696 MB)
  Blocksize: 512
  Dumptime: Fri May  8 09:34:07 2009
  Hostname: chaos.exscape.org
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 7.2-RELEASE #1: Thu May  7 21:01:25 CEST 2009
    root@chaos.exscape.org:/usr/obj/usr/src/sys/DTRACE
  Panic String: trace trap
  Dump Parity: 224395085
  Bounds: 1
  Dump Status: good

I haven't done much poking around with other/custom scripts, although a simple
Code:
dtrace -n 'syscall:::entry { @a[probefunc] = count(); }'
appears stable.

I'm guessing this is related to DTrace not being fully implemented in userland or something like that?
Anyways, any help (or info, for that matter) is appreciated.
 
Sure. It took a while as I was building CURRENT when you posted, but here we go, a brand new crash:

Code:
[root@chaos /usr/obj/usr/src/sys/DTRACE]# kgdb kernel.debug /var/crash/vmcore.6
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 10: trace trap while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x8:0xffffffff812c7e40
stack pointer           = 0x10:0xfffffffebe7f7420
frame pointer           = 0x10:0xfffffffebe7f7510
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = trace trap, nested task, IOPL = 2
current process         = 819 (cron)
trap number             = 10
panic: trace trap
cpuid = 0
Uptime: 45s
Physical memory: 2031 MB
Dumping 183 MB: 168 152 136 120 104 88 72 56 40 24 8

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /bootdir/boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /bootdir/boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/smbfs.ko...Reading symbols from /bootdir/boot/kernel/smbfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/smbfs.ko
Reading symbols from /boot/kernel/libiconv.ko...Reading symbols from /bootdir/boot/kernel/libiconv.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/libiconv.ko
Reading symbols from /boot/kernel/libmchain.ko...Reading symbols from /bootdir/boot/kernel/libmchain.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/libmchain.ko
Reading symbols from /boot/kernel/dtraceall.ko...Reading symbols from /bootdir/boot/kernel/dtraceall.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/dtraceall.ko
Reading symbols from /boot/kernel/profile.ko...Reading symbols from /bootdir/boot/kernel/profile.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/profile.ko
Reading symbols from /boot/kernel/cyclic.ko...Reading symbols from /bootdir/boot/kernel/cyclic.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/cyclic.ko
Reading symbols from /boot/kernel/dtrace.ko...Reading symbols from /bootdir/boot/kernel/dtrace.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/dtrace.ko
Reading symbols from /boot/kernel/systrace.ko...Reading symbols from /bootdir/boot/kernel/systrace.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/systrace.ko
Reading symbols from /boot/kernel/sdt.ko...Reading symbols from /bootdir/boot/kernel/sdt.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/sdt.ko
Reading symbols from /boot/kernel/fbt.ko...Reading symbols from /bootdir/boot/kernel/fbt.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/fbt.ko
Reading symbols from /boot/kernel/dtmalloc.ko...Reading symbols from /bootdir/boot/kernel/dtmalloc.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/dtmalloc.ko
#0  doadump () at pcpu.h:195
195             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xffffffff80517f28 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xffffffff8051836c in panic (fmt=0xffffffff808ad39c "%s") at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xffffffff807e3e1c in trap_fatal (frame=0xffffff0010104a50, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:757
#4  0xffffffff807e4b0a in trap (frame=0xfffffffebe7f7370) at /usr/src/sys/amd64/amd64/trap.c:558
#5  0xffffffff807c8a93 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:209
#6  0xffffffff812c7e40 in vpanic_common () from /boot/kernel/dtrace.ko
#7  0xffffffff812b2127 in dtrace_panic () from /boot/kernel/dtrace.ko
#8  0xffffffff812b215d in dtrace_assfail () from /boot/kernel/dtrace.ko
#9  0x000000080052f4c0 in ?? ()
#10 0xfffffffebe7f7560 in ?? ()
#11 0xffffffff812b2200 in dtrace_copycheck () from /boot/kernel/dtrace.ko
Previous frame inner to this frame (corrupt stack?)
(kgdb) list *0xffffffff812c7e40
No source file for address 0xffffffff812c7e40.
Doesn't look too helpful, though... right? :/
 
To be clear, the above is also from 7.2-RELEASE. (We really need an edit function around here.)
 
You'll get an edit function once you're all grown up ;) (search forum, I believe 10 posts or 10 days).

I suppose you have deleted or not installed the .ko.symbols files? That would point to something more useful.
 
I haven't deleted them, nor installed them specially (I simply did make installkernel KERNCONF=DTRACE), but there are 504 .ko's and 504 .ko.symbols in /boot/kernel, so I suppose they're where they belong. (They also appear to be loaded in the kgdb output above, no? :))

This isn't really a big problem for me personally, I was rather looking to see if others could reproduce this. All you need (as it seems for me) is a DTrace capable kernel and opensnoop (or, I imagine, plenty of other scripts)... and a computer you're willing to crash. ;)
 
Actually, you need a lot less than the full opensnoop script. I cut it down to the bare essentials, and it still panics within a second or two. I'm not shocked, but it's easier to read. :)
Code:
## opensnoop.d
 syscall::open:entry
 {
        self->pathp = arg0;
 }

 syscall::open:return
 /self->pathp/
 {
         printf("%s\n", copyinstr(self->pathp));
 }

## end opensnoop.d

# Then, run
dtrace -s opensnoop.d
 
Ack, horizontal scrolls..

Just weird that there's no symbols in there. Too many dtrace_assfail to even start guessing. I lack the last condition to try and reproduce it :p
 
Well, the problem is no doubt in copyinstr(), as
Code:
#11 0xffffffff812b2200 in dtrace_copycheck () from /boot/kernel/dtrace.ko
suggests.
Running the same script but replacing
Code:
printf("%s\n", copyinstr(self->pathp));
with
Code:
printf("file opened\n");

... allows me to run find / with no crash.
 
Well, you need the pointers to see why this assertion fails:
Code:
dtrace_copycheck(uintptr_t uaddr, uintptr_t kaddr, size_t size)
{
        ASSERT(kaddr >= kernelbase && kaddr + size >= kaddr);

Since size can't be negative, the second part is always true, unless size is so large that it causes a rollover. More likely is that kaddr is 0, but from where?
 
Email the maintainer for dtrace (jb@ I believe) or file a pr. Panics especially one that's as well defined as this will probably be easy to fix.
 
Hello,

I run 7.2-RELEASE kernel with DTrace options enabled, and 'kldload dtraceall' ran successfully. But simple D program with profile tick-1sec doesn't seem to work. And DTraceToolKit dvmstat df -h caused the core dump. Wonder people could shed some light on things maybe I missed or the status of DTrace on 7.2-RELEASE provider implementation.

Thanks in advance.
Code:
BEGIN
{
    i = 0;
}

tick-1sec
{
    trace(++i);
}

END
{
    trace(i);
}
 
What happens when you try to run the tick script?
It works for me, but I'm on 8.0-CURRENT.

"dvmstat df -h" coredumps for me as well (Bus error). As does dvmstat with every other command I've tried. Bash gave a segfault instead.

dvmstat uses the vminfo provider, though, which isn't available on FreeBSD. I have no idea why DTrace doesn't say so, but I doubt you'll get the script to work.
 
dtrace says the script matched 3 probes, ^C displays
CPU ID FUNCTION:NAME
0 2 :END 0

btw, does hotkernel and procsystime work under 8-current also for you...?
 
hotkernel appears to work fine, but procsystime appears to overload the buffer a lot:
Code:
[root@chaos /usr/local/share/DTraceToolkit]# ./procsystime
Tracing... Hit Ctrl-C to end...
dtrace: 7625 dynamic variable drops with non-empty dirty list
dtrace: 26300 dynamic variable drops with non-empty dirty list
dtrace: 3723 dynamic variable drops with non-empty dirty list
^C
dtrace: 1526 dynamic variable drops with non-empty dirty list

Elapsed Times for all processes,

         SYSCALL          TIME (ns)
            kill               4484
       setitimer               6422
       setrlimit              25412
       getrusage              49211
          getgid              65686
          getuid              67407
         getegid              71267
       getrlimit              81887
I tried increasing the buffer and switchrate (#pragma D option bufsize/switchrate) but it didn't help, even at 1000hz and 8m (the maximum, it appears, I tried going for 256).
 
Back
Top