ZFS Panic vputx negative ref cnt

At my wit's end and looking for some assistance.

I updated a 10.1-STABLE system to R282574 last week, and within a couple of hours the system would randomly reboot. Long story short, I rolled back to various older revisions (back to R280900) over the past week to no avail. The panic is identical for the most part:

Code:
May 13 08:31:36 admin syslogd: restart 
May 13 08:31:36 admin syslogd: kernel boot file is /boot/kernel/kernel 
May 13 08:31:36 admin kernel: vputx: negative ref count 
May 13 08:31:36 admin kernel: 0xfffff8165f78f1d8: tag zfs, type VDIR 
May 13 08:31:36 admin kernel: usecount 0, writecount 0, refcount 0 mountedhere 0 
May 13 08:31:36 admin kernel: flags (VI_FREE) 
May 13 08:31:36 admin kernel: VI_LOCKed lock type zfs: EXCL by thread 0xfffff80118899000 (pid 73048, zfs, tid 101614) 
May 13 08:31:36 admin kernel: panic: vputx: negative ref cnt 
May 13 08:31:36 admin kernel: cpuid = 6
May 13 08:31:36 admin kernel: KDB: stack backtrace: 
May 13 08:31:36 admin kernel: #0 0xffffffff80974a10 at kdb_backtrace+0x60 
May 13 08:31:36 admin kernel: #1 0xffffffff809389a5 at panic+0x155 
May 13 08:31:36 admin kernel: #2 0xffffffff809da6c5 at vputx+0x2d5 
May 13 08:31:36 admin kernel: #3 0xffffffff809d3e49 at dounmount+0x659 
May 13 08:31:36 admin kernel: #4 0xffffffff81a10bb6 at zfs_unmount_snap+0x126 
May 13 08:31:36 admin kernel: #5 0xffffffff81a13d01 at zfs_ioc_destroy_snaps+0xc1 
May 13 08:31:36 admin kernel: #6 0xffffffff81a128d0 at zfsdev_ioctl+0x5f0 
May 13 08:31:36 admin kernel: #7 0xffffffff808257c9 at devfs_ioctl_f+0x139
May 13 08:31:36 admin kernel: #8 0xffffffff8098c675 at kern_ioctl+0x255 
May 13 08:31:36 admin kernel: #9 0xffffffff8098c370 at sys_ioctl+0x140 
May 13 08:31:36 admin kernel: #10 0xffffffff80d2a287 at amd64_syscall+0x357 
May 13 08:31:36 admin kernel: #11 0xffffffff80d0f9db at Xfast_syscall+0xfb
May 13 08:31:36 admin kernel: Uptime: 1h31m0s 
May 13 08:31:36 admin kernel: Copyright (c) 1992-2015 The FreeBSD Project.

I have crash dumps enabled, but there are no crash files.
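For anyone wanting to check the same thing: a dump ends up as a vmcore file only if both the dump device and savecore are set up, so it may be worth verifying the stock pieces are in place. A quick sanity check (paths are the FreeBSD defaults; adjust for your setup):

```shell
# Check that a dump device is configured -- dumpdev="AUTO" in rc.conf
# lets the kernel pick the swap device and savecore(8) write to /var/crash
grep -E 'dumpdev|dumpdir' /etc/rc.conf

# savecore drops vmcore.* / info.* here on the boot after a panic
ls -l /var/crash

# 1 = minidumps enabled (the default); full dumps need swap >= RAM
sysctl debug.minidump
```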

The pool is 7T with only 3% usage on a Supermicro SMC2108 RAID 6 array. I am using an Intel SSD PCI card (nvme/nvd) partitioned into a 4G log and a 64G cache. The server is a dual Xeon with 196G of RAM. I am using looping rsync scripts to sync the data to two other servers: each script takes a snapshot, syncs the snapshot, deletes it, sleeps for 30 minutes, and then starts again. The server also provides NFS to several other systems.
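For reference, the snapshot/sync/destroy cycle described above looks roughly like the sketch below. The dataset, snapshot prefix, and rsync target are made-up placeholders, not values from my setup:

```shell
#!/bin/sh
# Hypothetical sketch of the looping rsync script described above.
# DATASET and TARGET are placeholders, not the actual values used.
DATASET="tank/data"
TARGET="backup1:/backups/data/"

while :; do
    SNAP="rsync-$(date +%Y%m%d%H%M%S)"
    zfs snapshot "${DATASET}@${SNAP}"
    # .zfs/snapshot exposes a read-only, stable view of the snapshot
    rsync -a "/${DATASET}/.zfs/snapshot/${SNAP}/" "${TARGET}"
    zfs destroy "${DATASET}@${SNAP}"
    sleep 1800          # 30 minutes
done
```

Worth noting: the backtrace above ends in zfs_ioc_destroy_snaps/zfs_unmount_snap, so the `zfs destroy` of the just-synced snapshot is the step that appears to coincide with the panic.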

I noticed that the rsync scripts seem to be causing my ARC usage to increase, but the memory used is never returned. Would it be normal to see ARC ~140G in this scenario?
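As I understand it, with ~196G of RAM the default vfs.zfs.arc_max is most of physical memory, so an ARC around 140G may be plausible; the ARC is supposed to shrink under memory pressure rather than proactively. If capping it is desired, a loader.conf fragment would look like this (the 64G figure is purely an example, not a recommendation):

```shell
# /boot/loader.conf -- cap the ZFS ARC (value is an example only)
vfs.zfs.arc_max="68719476736"   # 64 GiB

# Current ARC size can be checked at runtime with:
#   sysctl kstat.zfs.misc.arcstats.size
```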

At this point I've svn'd up and built world/kernel to R282880, and I'm waiting to see whether the system panics and reboots on the new kernel/base.