ZFS: system gets frozen after a certain time during copy

I need to sync ~3TB of data from an NFS share back to my encrypted ZFS pool. I'm unable to do so, as the system freezes after ~400GB has been copied to the dataset. If I do a hard reset (nothing else is possible) and resume copying, it freezes again after another ~400GB is copied.

My HW configuration:
Intel S1200BT with a Xeon(R) E3-1240 V2 CPU @ 3.40GHz
32GB ECC RAM
LSI 9211-8i (IT firmware) with 6x WD Red 2TB disks + an Intel 120GB SSD

FreeBSD amd64 9.2-RELEASE r256438. The kernel is a custom build (GENERIC + PF + crypto).

ZFS setup:
The 6x2TB disks are in a raidz2 pool (pool portal) with the SSD acting as a read cache. I've created a 4TB zvol that is used with geli to create a fully encrypted pool (pool temple). This configuration was working fine, but I did hit the issue described here: http://forums.freebsd.org/viewtopic.php?f=48&t=45814 and I had to recreate it all from scratch.
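For clarity, the layering was built roughly like this (commands simplified from memory; the exact geli init options aren't listed here):
Code:
# 4TB zvol on the raidz2 pool, encrypted with geli(8), used as the sole vdev of pool "temple"
zfs create -V 4T portal/zvols/bolt00
geli init -s 4096 /dev/zvol/portal/zvols/bolt00    # prompts for the passphrase
geli attach /dev/zvol/portal/zvols/bolt00
zpool create temple /dev/zvol/portal/zvols/bolt00.eli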

The situation on the system is as follows:
Code:
# zpool status portal
  pool: portal
 state: ONLINE
  scan: scrub repaired 0 in 28h50m with 0 errors on Sun Apr  6 08:07:45 2014
config:

	NAME        STATE     READ WRITE CKSUM
	portal      ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    da0     ONLINE       0     0     0
	    da1     ONLINE       0     0     0
	    da2     ONLINE       0     0     0
	    da4     ONLINE       0     0     0
	    da5     ONLINE       0     0     0
	    da7     ONLINE       0     0     0
	cache
	  da6       ONLINE       0     0     0

errors: No known data errors
#
# zfs list portal/zvols/bolt00
NAME                  USED  AVAIL  REFER  MOUNTPOINT
portal/zvols/bolt00  4.01T  6.26T   837G  -
#
# geli status
                        Name  Status  Components
zvol/portal/zvols/bolt00.eli  ACTIVE  zvol/portal/zvols/bolt00
#
# zpool status temple
  pool: temple
 state: ONLINE
  scan: none requested
config:

	NAME                            STATE     READ WRITE CKSUM
	temple                          ONLINE       0     0     0
	  zvol/portal/zvols/bolt00.eli  ONLINE       0     0     0

errors: No known data errors
#
I'm using rsync to sync the data to the temple dataset:
Code:
rsync -avH /path/to/nfs/share /path/to/my/temple/dataset

The system gets into a state where I can still ping it and I even get the version string from sshd when I telnet to the machine on port 22. I can't log in on the console. Any window with an already-open terminal to this server only shows the prompt (i.e. when I hit enter with no command). If I enter a command, it hangs.

After I reboot the server and mount the dataset back, I see the data stops after ~400GB. If I resume the rsync, the system freezes again after another ~400GB (800GB synced in total now).
There's absolutely nothing in syslog and the disks show no errors at all. The kernel is configured with a watchdog, but due to the state the system gets into, it never triggers.

I was wondering if anybody can give me a hint on how to initiate a panic in this state so I have a crash dump available.
 
Re: ZFS: system gets frozen after a certain time during copy

Fire up top(1) on the console of the system running ZFS. Leave it running.

Start the rsync and wait for it to lock up. Then check the memory and ARC lines in top.

My guess is that free memory drops to 0, wired memory approaches 32 GB, and the ARC approaches 32 GB. And that it's deadlocking in a "need to free memory for the ARC, but something else is holding the lock I need, and that something else is waiting for memory to be freed" situation.

If that's the case, limiting the ARC to 50% of RAM should allow the copy to succeed. There's a bug (and fix) for this issue somewhere in the post-9.2 stable branch, I believe.
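If memory serves, the cap goes into /boot/loader.conf as a loader tunable, something along these lines (16G is just an example for a 32 GB box):
Code:
# /boot/loader.conf -- cap the ZFS ARC at roughly half of the 32 GB of RAM (example value)
vfs.zfs.arc_max="16G"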
 
Re: ZFS: system gets frozen after a certain time during copy

I recreated the crypto pool again, launched top and vmstat 5 | tee log in screen, and started the copy. As I get a rate of only ~50MB/s it took some time to reach this state. Right now the situation is as I've described before. What is strange, though, is that top in the screen terminal is still working and seems to be showing the current state (it's not hung). vmstat is frozen and does not show anything new.

top output:

Code:
Last pid: 15488;  load averages:  1.95,  1.89,  1.74                                                                                                     up 0+14:09:33  09:14:14
201 processes: 2 running, 199 sleeping
CPU:  0.1% user,  0.0% nice, 17.7% system,  0.1% interrupt, 82.0% idle
Mem: 185M Active, 22G Inact, 7805M Wired, 695M Cache, 1655M Buf, 437M Free
ARC: 3847M Total, 192M MFU, 2627M MRU, 303M Anon, 313M Header, 442M Other
Swap: 48G Total, 1700K Used, 48G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
15204 operator      1  21    0  3840K  1296K tx->tx  4  16:39 12.99% mv
15164 operator      1  27    0  3784K  1124K tx->tx  4   9:38 11.87% mv
15166 operator      1  28    0  3784K  1124K tx->tx  3   9:41 11.57% mv
15082 operator      1  28    0  3784K  1124K tx->tx  6  10:16  9.67% unlink
 3259 root          1  20    0 12052K   692K tx->tx  4  15:27  6.88% syslogd
 2985 root          1  20    0 12052K   684K tx->tx  5  15:24  6.79% syslogd
 3510 root          1  20    0 12052K   692K tx->tx  3  15:23  6.79% syslogd
 5914 root          1  23    0  9976K  1304K tx->tx  2  15:07  5.37% tee
 2193 root          1  24    0 12104K  1372K tx->tx  7  12:04  4.39% syslogd
15436 root          1  20    0 72200K  4712K CPU0    0   1:10  4.39% sshd
 5618 root          1  20    0 80716K 20060K select  6  37:51  0.00% rsync
 5620 root          1  25    0 68428K  3668K tx->tx  2  36:30  0.00% rsync
 2677 vbox         24  49    0  1286M  1078M select  5  29:00  0.00% VBoxHeadless
 2690 vbox         35  52    0   405M   178M select  7   3:39  0.00% VBoxHeadless
 5624 root          1  20    0 16628K  2740K CPU5    5   0:35  0.00% top
 5913 root          1  20    0 18324K  1508K pipewr  6   0:34  0.00% vmstat
 2336 vbox         10  20    0   132M 11400K uwait   5   0:19  0.00% VBoxSVC
 2333 vbox          1  20    0 68148K  7200K select  2   0:06  0.00% VBoxXPCOMIPCD
 5615 root          1  20    0 22984K  2416K select  0   0:02  0.00% screen
 3036 root          1  20    0   200M  8920K select  2   0:01  0.00% httpd
Last vmstat output:
Code:
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr ad0 da0   in   sy   cs us sy id
 0 0 4262912  1221356    16   0   0   0  2520   0 196 207 1407 3194 323163  0 15 85
 2 0 0 4262912  1221184    25   0   0   0  2700   0 188 218 1438 2996 340443  0 17 83
 0 0 0 4262912  1221328    16   0   0   0  2681   0 189 225 1458 3245 336634  0 16 84
 0 0 0 4262912  1221328    25   0   0   0  2972   0 183 259 1558 3024 343587  0 16 84
 0 0 0 4262912  1221324    16   0   0   0  3065   0 190 236 1484 3203 302536  0 15 85
 0 0 0 4262912  1221292    25   0   0   0  3169   0 207 245 1546 2989 334274  0 16 84
 0 0 0 4262912  1221188    16   0   0   0  2672   0 229 190 1483 3207 289230  0 13 87
 0 0 0 4262912  1221160    25   0   0   0  3091   0 194 261 1541 3044 255267  0 12 87
 0 0 0 4262912  1221312    17   0   0   0  2922   0 196 246 1476 3272 283628  0 13 87
 0 0 0 4262912  1221172    25   0   0   0  2619   0 201 209 1454 3019 268432  0 12 88
 1 0 0 4262912  1221292    16   0   0   0  2871   0 201 230 1522 3219 273285  0 13 86
 1 0 0 4262912  1221304    25   0   0   0  2566   0 179 212 1381 3022 330081  0 17 83
 1 0 0 4262912  1221320    16   0   0   0  2449   0 179 210 1384 3204 380255  0 18 82
 0 0 0 4262912  1221300    25   0   0   0  2539   0 151 228 1432 3019 400706  0 18 82
 0 0 0 4262912  1221292    16   0   0   0  2790   0 158 239 1444 3244 371716  0 17 83
It was able to copy ~850GB before it got to this state.
 
Re: ZFS: system gets frozen after a certain time during copy

SirDice said:
Instead of NFS have you tried scp(1) or sftp(1)? If you have to use NFS try turning off the ZIL: https://wiki.freebsd.org/ZFSTuningGuide#NFS_tuning
I thought about that, but copying to the NFS share didn't cause any problem. I can access the share via SSH, so that's an option too. According to the wiki, though, ZFS with NFS should "only" have a performance impact on the system.

I've updated to 9.2-RELENG r264312 and I'm going to retry the copy. If that fails I'll go with the rsync-over-SSH option.
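For the record, the rsync-over-SSH variant I have in mind is roughly this (the host name and paths are placeholders):
Code:
# pull the data over SSH instead of from the NFS mount; "nfs-server" and the paths are placeholders
rsync -avH -e ssh nfs-server:/path/to/share/ /path/to/my/temple/dataset/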
 
Re: ZFS: system gets frozen after a certain time during copy

So it got into this state again, even sooner (after ~180GB).

Code:
last pid:  4391;  load averages:  2.67,  2.59,  2.21                                                                                                     up 0+01:32:52  14:07:28
176 processes: 1 running, 175 sleeping
CPU:  0.1% user,  0.0% nice, 24.3% system,  0.1% interrupt, 75.5% idle
Mem: 150M Active, 22G Inact, 7009M Wired, 1060M Cache, 1655M Buf, 253M Free
ARC: 3828M Total, 10M MFU, 3228M MRU, 354M Anon, 193M Header, 65M Other
Swap: 48G Total, 968K Used, 48G Free

System behavior was again the same as before. The session where top was running remained active, though. I quit top and panicked the system from the command line (sysctl). This worked and I have a crash dump available.
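For reference, the sequence was roughly the following (debug.kdb.panic is the usual knob for forcing a panic; the vmcore number will differ):
Code:
# force a panic and dump from a still-responsive shell
sysctl debug.kdb.panic=1
# after the reboot, open the resulting dump (a kernel with debug symbols helps)
kgdb /boot/kernel/kernel /var/crash/vmcore.0
(kgdb) bt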

The backtrace shows that it panicked due to the sysctl-initiated panic (expected). This is not my cup of coffee, though; can somebody with more experience please steer me in the right direction on how to proceed (what to look for)?

EDIT: I've upgraded to 10.0 r264316. This didn't help; the same issue occurred.
Now I've disabled the ZIL on the dataset as @SirDice advised, with:

Code:
# zfs set sync=disabled temple/PORTAL
#

But I didn't do anything to the zvol itself. The sync is running now ..
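If the zvol itself needs the same treatment, I assume it would be something along these lines (for reference only; I haven't run this yet):
Code:
# check and, if needed, disable sync on the backing zvol as well
zfs get sync portal/zvols/bolt00
zfs set sync=disabled portal/zvols/bolt00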

EDIT2: Well, @SirDice, your advice was dead on. Disabling the ZIL did the "trick". I was able to move the data back with the ZIL disabled on this dataset.
 
Re: ZFS: system gets frozen after a certain time during copy

Well, I guess I spoke too soon. The second wave of data syncing killed the server again. During the copy I had top running on the console and I saw the same situation: rsync, along with any newly spawned processes, was stuck in the tx->tx state. The ZIL is disabled on the whole pool, including the dataset:

Code:
# zfs get sync temple temple/PORTAL
NAME           PROPERTY  VALUE     SOURCE
temple         sync      disabled  local
temple/PORTAL  sync      disabled  inherited from temple
#
The crash occurred after 480GB was written.

It seems 10.0 handles this better and panicked by itself after a while.

Code:
Unread portion of the kernel message buffer:
panic: I/O to pool 'temple' appears to be hung on vdev guid 12861146308360417584 at '/dev/zvol/portal/bolt00.eli'.
cpuid = 6
KDB: stack backtrace:
#0 0xffffffff808954e0 at kdb_backtrace+0x60
#1 0xffffffff8085cfc5 at panic+0x155
#2 0xffffffff81823073 at vdev_deadman+0x123
#3 0xffffffff81822f80 at vdev_deadman+0x30
#4 0xffffffff8181821e at spa_deadman+0x7e
#5 0xffffffff8087203c at softclock_call_cc+0x16c
#6 0xffffffff80872484 at softclock+0x94
#7 0xffffffff808313eb at intr_event_execute_handlers+0xab
#8 0xffffffff80831836 at ithread_loop+0x96
#9 0xffffffff8082f09a at fork_exit+0x9a
#10 0xffffffff80c5bd5e at fork_trampoline+0xe
Uptime: 4h20m4s
Dumping 3511 out of 32465 MB: (CTRL-C to abort) ..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%


(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff8085cc40 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:447
#2  0xffffffff8085d004 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:754
#3  0xffffffff81823073 in vdev_deadman (vd=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:3391
#4  0xffffffff81822f80 in vdev_deadman (vd=0xfffff800211a6800) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c:3365
#5  0xffffffff8181821e in spa_deadman (arg=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:483
#6  0xffffffff8087203c in softclock_call_cc (c=0xfffff80402d99a40, cc=0xffffffff814baf00, direct=0) at /usr/src/sys/kern/kern_timeout.c:682
#7  0xffffffff80872484 in softclock (arg=0xffffffff814baf00) at /usr/src/sys/kern/kern_timeout.c:810
#8  0xffffffff808313eb in intr_event_execute_handlers (p=<value optimized out>, ie=0xfffff80008300400) at /usr/src/sys/kern/kern_intr.c:1263
#9  0xffffffff80831836 in ithread_loop (arg=0xfffff8000830aca0) at /usr/src/sys/kern/kern_intr.c:1276
#10 0xffffffff8082f09a in fork_exit (callout=0xffffffff808317a0 <ithread_loop>, arg=0xfffff8000830aca0, frame=0xfffffe082cfc5ac0) at /usr/src/sys/kern/kern_fork.c:995
#11 0xffffffff80c5bd5e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:606
#12 0x0000000000000000 in ?? ()
(kgdb)
So some sort of deadlock. But why?
 
Re: ZFS: system gets frozen after a certain time during copy

It looks like IO is completely hanging. Do you use root on ZFS? If so, then that's probably why everything seems to stop but you can still get some response from running daemons. The system is actually still running, but anything that tries to access the file system will hang. On FreeBSD 10 (and possibly some other recent releases?) this triggers the deadman timer, as seen in the backtrace above, which panics the system if IO does not return within 1000 seconds.

There's some information about the deadman timer around if you search the net. It panics the system because the device drivers should either fail or complete within that time. ZFS can't just carry on, because if the driver does eventually return, and the system has moved on to something else, the returning driver could end up writing to memory that is now being used for something else.
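If you want to poke at it, the deadman behaviour should be tunable via sysctl; on 10.0 I believe the relevant knobs look something like this (names may differ slightly between releases):
Code:
# check whether the deadman timer is enabled and what its timeout is (in milliseconds)
sysctl vfs.zfs.deadman_enabled
sysctl vfs.zfs.deadman_synctime_ms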

This is generally blamed on hardware drivers or device firmware because, as mentioned, all the lower layers really should have completed or failed within 1000 seconds. I would be interested to see if the problem still occurs if you use ZFS->GELI->UFS instead of ZFS on ZFS, though.

If you can find newer firmware for the LSI card it may be worth trying that. It could be a driver bug, which would be unfortunate because FreeBSD is supposed to have pretty good LSI drivers. For more help debugging exactly what's causing the IO hang, it may be worth asking on the freebsd-fs mailing list.
 
Re: ZFS: system gets frozen after a certain time during copy

Yes, I have a ZFS-only setup. I don't think this is a hardware-related problem, though. It seems to be a problem with NFS only (or better said: NFS and ZFS are the key players here). If I do an rsync update via SSH I see no problem at all. This is the option I'm using right now, as I need to sync that data back during the weekend.

As you said, it seems some extra code (the deadman timer) was added in 10.0, which at least panicked the system. On 9.2 the system would stay waiting for I/O indefinitely. Kernel debugging is not my cup of coffee; I have zero practical experience with it. I thought I would at least be able to see the reason for the state (ran out of memory, etc.), but it's over my head.

I can't experiment too much on this hardware, but I'll try to see if I can reproduce this state in a VM (ZFS->geli->ZFS). It may be worth submitting a PR, but I need to read the howto first (though I've been using FreeBSD for over 10 years, I've never submitted one yet).

EDIT: I'm having a hard time syncing this data. rsync has a problem too, but it's not fatal for the server: it just exits due to memory issues. I've had to relaunch it twice already. According to top, though, there's no problem with memory itself. There's also no log of any kind in syslog.

Code:
..
Write failed: Cannot allocate memory
rsync: connection unexpectedly closed (1381712044384 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [receiver=3.1.0]
rsync: connection unexpectedly closed (320362 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [generator=3.1.0]
#

..
Write failed: Cannot allocate memory
rsync: connection unexpectedly closed (727390791572 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [receiver=3.1.0]
rsync: connection unexpectedly closed (405289 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [generator=3.1.0]
#
 
Running 10.0 here and I unpredictably get the "I/O to pool 'XXXXX' appears to be hung on vdev guid" problem on my 4-disk RAIDZ-1 backup server. It always happens when doing automated rsyncs from other boxes. NFS and iSCSI are also running, but are not writing at the time of day the crash usually happens. It also only seems to happen with disk #3, which is a month old and may have issues.

I will try updating firmware as this is an HP server and I can update all available firmware in one shot with an HP DVD.
 
@kairu0 Please do share the results afterwards. In my opinion it's not the firmware of the hardware, though. I think this is a bug in FreeBSD (I have not hit this bug on a Solaris server, so I can't say it's necessarily a ZFS bug).
I'm still fighting other 10.0 bugs which are making my life harder than it should be, so I haven't done deeper tests on this issue.
 
Is there any update on that?

I have a similar issue with a Dell using a <Dell PERC H710P Mini (monolithics)> controller.

These are my pools; mfid15, 16, 17 and 18 are SSDs.

Code:
  pool: tank
state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            mfid0   ONLINE       0     0     0
            mfid1   ONLINE       0     0     0
            mfid2   ONLINE       0     0     0
            mfid3   ONLINE       0     0     0
            mfid4   ONLINE       0     0     0
            mfid5   ONLINE       0     0     0
            mfid6   ONLINE       0     0     0
            mfid7   ONLINE       0     0     0
            mfid8   ONLINE       0     0     0
            mfid9   ONLINE       0     0     0
            mfid10  ONLINE       0     0     0
            mfid11  ONLINE       0     0     0
            mfid12  ONLINE       0     0     0
            mfid13  ONLINE       0     0     0
            mfid14  ONLINE       0     0     0
        logs
          mirror-1  ONLINE       0     0     0
            mfid15  ONLINE       0     0     0
            mfid16  ONLINE       0     0     0
        cache
          mfid17    ONLINE       0     0     0

errors: No known data errors

Code:
  pool: zroot
state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mfid18p3  ONLINE       0     0     0

errors: No known data errors

Code:
panic: I/O to pool 'tank' appears to be hung on vdev guid at '/dev/mfid0'.

cpuid = 9
KDB: stack backtrace:
#0 0xffffffff808e7dd0 at kdb_backtrace+0x60
#1 0xffffffff808af8b5 at panic+0x155
#2 0xffffffff81863073 at vdev_deadman+0x123
#3 0xffffffff81862f80 at vdev_deadman+0x30
#4 0xffffffff81862f80 at vdev_deadman+0x30
#5 0xffffffff8185821e at spa_deadman+0x7e
#6 0xffffffff808c492c at softclock_call_cc+0x16c
#7 0xffffffff808c4d74 at softclock+0x94
#8 0xffffffff80883cdb at intr_event_execute_handlers+0xab
#9 0xffffffff80884126 at ithread_loop+0x96
#10 0xffffffff8088198a at fork_exit+0x9a
#11 0xffffffff80c758ce at fork_trampoline+0xe
Uptime: 41d14h41m16s
 
Nope, nobody replied to me on this issue. I had to use the workaround with scp, as mentioned above.
 
I apologize if it's against the rules to bump an old post, but it appears this issue still exists in FreeBSD 11.2:

https://forums.freebsd.org/threads/...c-i-o-to-poll-zroot-appears-to-be-hung.67015/

Martin, can you tell me if you ever tried to remove GELI from the equation, or if you found an answer aside from a workaround? Like Apsaras suggested, I reduced my ARC to 2GB (of 16GB), but the system still failed with the "panic: I/O to pool 'zroot' appears to be hung" error.

I have a similar setup where I thought it was related to PostgreSQL, but I managed to crash my system by simulating a lot of I/O interrupts without touching the database. I'm beginning to suspect it's GELI related, so that will be my next area of focus.
 
freebsdinator I actually still use the very same setup today, running on 11.2. Unfortunately I don't remember what tests I did, but I don't think I tested without geli, as it was (and is) an essential part of that setup. I had to use the workaround just to sync the data. I was not happy about it, but I didn't have time to play around for too long.

I didn't, and still don't, use any special ZFS-related tunables in my setup.

Nowadays I have NFS clients which don't put heavy traffic on those shares, so it's OK. But a week ago I had an incident with similar symptoms to this issue. I was fighting the Java KVM for so long that I had to reboot the server without knowing what the issue was. I wonder if the issue will reoccur; if it does, I'll share it here for sure.
 
Hey _martin, I was able to isolate the issue. On one pool, I had the compression level set too high initially (gzip-9). I changed it to lz4 after performing the install and added more data. When I added data to the existing pool too quickly, it would crash.

When I added 2 more disks for another pool, with lz4 from the beginning, I had no issues with data transfer speeds, specifically writes. This clued me in to destroying and recreating the original pool with lz4 compression from the start (after transferring my data off). That is what solved it for me.
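In other words, setting the compression at pool creation time rather than afterwards; something like this (the pool name and devices are placeholders):
Code:
# create the pool with lz4 compression from the start; "newpool" and the devices are placeholders
zpool create -O compression=lz4 newpool mirror ada1 ada2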
 