Hi all,
I have a FreeBSD backup system with several disks in a ZFS "mirrored stripe" (a pool of mirrored vdevs). I was generating a lot of disk I/O on one of the filesystems, which seemed to cause the system to hang (by "a lot", I mean about 150 concurrent rm's in different parts of the filesystem, a load it has coped with before). By "hang", I mean the system responded to pings, but I couldn't ssh to it, nor get access via the KVM.
When rebooting, it reached the filesystem mounts and then just hung there; ^T showed it stuck on zio->io_cv most of the time. I rebooted into single-user mode and it was OK, and, of the three filesystems present on the ZFS pool, mounting two of them manually was fine. Mounting the one which previously had all of the I/O just causes it to hang again; I've left it for a few hours and it's still just sitting there. When I set the mountpoint of that filesystem to 'legacy' and booted as normal, the system is fine (but obviously I can't access that filesystem).
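For reference, this is roughly what the workaround looked like (the dataset names below are placeholders; I'd rather not post the real ones):
Code:
# Single-user mode: two of the three filesystems mount fine by hand.
zfs mount vault/misc
zfs mount vault/home

# Keep the problem filesystem from auto-mounting at boot so the
# machine can come up multi-user:
zfs set mountpoint=legacy vault/data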
I rebooted the system and ran the legacy mount in a screen session to try to debug it, but I'm not entirely sure what to do, or how to use zdb. I recently scrubbed the entire pool, which completed without error, but the same thing happens. While trying to mount this particular filesystem, zpool iostat -v shows about 700K/sec total, with the system appearing to read from all of the disks and no other processes requiring disk I/O. zfs get all works against the filesystem (noatime, noexec, nosuid, fletcher4, dedup=off, copies=1); literally the only thing I can't seem to do is mount it.
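The debugging session looks something like this (dataset name is a placeholder again):
Code:
# In a screen session: attempt the legacy mount by hand -- this is
# the command that hangs and can't be killed.
mount -t zfs vault/data /mnt/data

# From another terminal: watch per-vdev activity while it sits there.
# This is where I see ~700K/sec of reads spread across all the disks.
zpool iostat -v vault 5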
Does anyone have any suggestions as to what this might be? I'm not familiar enough with ZFS or zdb to know; could it be the ZFS intent log or something? Is there any way to tell? The mount process appears to be unkillable.
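In case it helps, these are the sorts of things I was planning to poke at next, though I don't know if I'm using zdb correctly (dataset name is still a placeholder):
Code:
# Show intent log (ZIL) entries for the dataset, if any are waiting
# to be replayed:
zdb -ivv vault/data

# Grab the kernel stack of the stuck mount process to see exactly
# what it's waiting on:
ps axl | grep mount
procstat -kk <pid of the hung mount>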
Various bits of system info:
Code:
[root@vault ~]# zpool get version vault
NAME   PROPERTY  VALUE    SOURCE
vault  version   28       default
Code:
[root@vault ~]# zpool status
  pool: vault
 state: ONLINE
  scan: resilvered 6.75T in 35h40m with 0 errors on Fri Dec 30 07:27:25 2011
config:

        NAME                STATE   READ WRITE CKSUM
        vault               ONLINE     0     0     0
          mirror-0          ONLINE     0     0     0
            da8.eli         ONLINE     0     0     0
            da0.eli         ONLINE     0     0     0
            label/bZp0.eli  ONLINE     0     0     0
          mirror-1          ONLINE     0     0     0
            da1.eli         ONLINE     0     0     0
            label/bBp2.eli  ONLINE     0     0     0
            ada1.eli        ONLINE     0     0     0
          mirror-2          ONLINE     0     0     0
            label/bZp2.eli  ONLINE     0     0     0
            da10.eli        ONLINE     0     0     0
            da2.eli         ONLINE     0     0     0
          mirror-3          ONLINE     0     0     0
            label/bZp3.eli  ONLINE     0     0     0
            da3.eli         ONLINE     0     0     0
            da11.eli        ONLINE     0     0     0
Code:
[root@vault ~]# uname -a
FreeBSD vault 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #16: Wed Dec 28 17:35:39 GMT 2011 root@:/usr/obj/usr/src/sys/vault amd64