ZFS V28 delete causes loss of network

Hi

I have a system with ZFS V28 on FreeBSD 8.2-STABLE which I use primarily as a media & backup server. I have a RAIDZ pool with 5 x 2 TB disks; the machine is x64 with 8 GB RAM and boots off a separate UFS filesystem on a separate 1 TB disk. It has been running very stably except for the issue below.

I have now 2x had the same problem:
  • Create a backup share with dedup on, SHA256 + verify on, and compress=gzip (see the sketch after this list).
  • Back up many large files onto it (in this case around 70 files of 2 GB each).
  • Delete the files (or the directory).
  • After a while the server is no longer accessible (Apache web server not responding, SSH remote access not working).
  • It is still possible to "ping" the server.
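Roughly, the setup looks like this (a minimal sketch only; "tank/backup" stands in for my actual pool and dataset names):
Code:
# sketch only - "tank/backup" is a placeholder, not my real dataset name
zfs create tank/backup
zfs set dedup=sha256,verify tank/backup   # SHA256 checksums with verify for dedup
zfs set compression=gzip tank/backup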

At first I thought this was maybe a problem related to Samba (the first time it happened was when I deleted the files on a Samba share). However, this time I was in a terminal window and did an rm -r of the directory.

Unfortunately I am not in a position to dump any information from the server at the moment. Previously a reboot managed to recover the system after some time, but this time it has not recovered yet (after about an hour). I am running a headless config, so I will have to move the server to connect a monitor & keyboard tomorrow some time, after work - here in the Middle East Sunday is a working day :\

Has anybody experienced something similar, or has any advice?

thanks
Malan
 
Hi

Could you check the I/O stats? I suspect you have an IRQ storm. I experienced a similar issue with an eSATA drive on my laptop, which can only be attached at SATA (1.5 Gb/s) and not at SATA II (3 Gb/s) - to be precise, it can be attached at SATA II, but then I have to wait several minutes until it starts working properly, and I don't know why; with FreeBSD 9 I have not noticed such behaviour. After plugging in my external hard drive at SATA II, my system hung for a few minutes. I was not able to execute new commands, but I was able to execute a previously prepared command, so I prepared one command on each of several consoles:
[CMD="shell#"]zpool iostat 1[/CMD]
[CMD="shell#"]iostat -x 1[/CMD]
[CMD="sh#"]for x in `seq 0 99999` ; do vmstat -i; done[/CMD]

What is strange is that you only see this after deleting some data.
 
You don't lose your network, you lose responsiveness - try increasing your remote shell's timeout interval.
Deduplication requires huge amounts of hash data to be stored and accessed in random order, so you will need a lot of memory to cache it if you want reasonable performance. There are some tunables that might help (zfs.max_arc_cache and zfs.max_l2arc_cache IIRC, can't check now), but in the long run using a _fast_ cache (either an SLC SSD or _a_lot_ of RAM) is the solution - or giving up on dedup.

http://blogs.oracle.com/bonwick/entry/zfs_dedup
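If I remember the FreeBSD names right, those tunables are set in /boot/loader.conf, roughly like this (example values only, size them for your own machine):
Code:
# /boot/loader.conf - example values only
vfs.zfs.arc_max="6G"          # cap the ARC so ZFS cannot wire all of RAM
vfs.zfs.arc_meta_limit="4G"   # allow more metadata (the dedup table lives there) in the ARC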
 
Hi

akil said:
Could you check the I/O stats?

Unfortunately I cannot check the I/O stats since I am still trying to recover my server. However, I was running gstat during the delete and, if I recall correctly, it was only about 15-20 KB/s of I/O to the disks - so nothing significant.

At some stage I will check again if I can reproduce this behaviour.

thanks
Malan
 
xibo said:
You don't lose your network, you lose responsiveness - try increasing your remote shell's timeout interval.
Deduplication requires huge amounts of hash data to be stored and accessed in random order, so you will need a lot of memory to cache it if you want reasonable performance. There are some tunables that might help (zfs.max_arc_cache and zfs.max_l2arc_cache IIRC, can't check now), but in the long run using a _fast_ cache (either an SLC SSD or _a_lot_ of RAM) is the solution - or giving up on dedup.

hi
I will try to reproduce the faulty behaviour - I am not sure whether it was only my shell that timed out. Previously I lost the server while doing this via a Samba share.

On previous file deletes I could actually see in Cacti how the cache size shrank as the dedup table was decreasing. I can accept that around 7 GB of cache might be too small, but I am not sure whether that should have any side effects beyond slowing some operations down.

Thanks for the suggestions.

Since my server is still experimental (and assuming I can recover it without a re-install), I will try to reproduce the error. Is anyone else willing to try to reproduce this?

thanks
Malan
 
Hi

Just for future reference, I will post how I recovered my pool.

  1. Rebooted the system into single user mode.
  2. # fsck
  3. # zpool export pool
  4. # reboot
  5. # zpool import -f pool

I am running:
Code:
FreeBSD 8.2-STABLE (GENERIC) #0: Sat Jul  9 00:34:27 GST 2011
At the moment I am not sure what caused the lock-up: the only way to recover the pool was with the network cable disconnected.

I will post under hardware to check if someone is aware of problems with the driver for the ethernet interface.

Malan
 
Hi

I had this problem re-occur today. This time I had a monitor and keyboard attached to the server, so I could at least partially observe what happened.

I was deleting a directory of around 260 GB from a deduped, compressed volume. The delete seemed to work OK, then I could see the system freeze. I could still switch between console windows with <alt>F1, F2, etc. When running # top I could see snmpd in the pfault state, and then ntpd and httpd also going into that state. At that point I lost all access and could not even issue a # reboot from the console, so I did a hard reset.

Anyway, I think this will be very difficult to trace. I will try to compile a newer kernel and hope the problem miraculously solves itself.

regards
Malan
 
In top, watch the RAM line, especially Wired and Free.

To delete a deduped filesystem, you need a *LOT* of RAM. Every block of data in the filesystem being deleted has to be found in the DDT, the DDT entry for that block updated to remove the reference, then all the metadata for that block removed (which may also incur DDT lookups and updates) and then the block removed. And for a delete operation, it really needs to keep the DDT in RAM, for some reason.

You'll see your Free RAM going down to almost 0, and your Wired RAM increasing up to vfs.zfs.arc_max. If you haven't limited it, then all RAM will be Wired ... at which point the system locks up because there's no RAM available for anything but ZFS.

If you don't have an L2ARC device in the system, add one. If you don't have over 16 GB of RAM in the system, add more RAM.
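Something along these lines will show how big the DDT actually is and whether the ARC is hitting its ceiling (a sketch; "pool" is your pool name and the cache device name is only an example):
Code:
# size and histogram of the dedup table
zdb -DD pool
# current ARC size versus the configured ceiling
sysctl kstat.zfs.misc.arcstats.size vfs.zfs.arc_max
# add an SSD partition as an L2ARC cache device - "ada1p1" is just an example
zpool add pool cache ada1p1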
 
phoenix said:
If you don't have an L2ARC device in the system, add one. If you don't have over 16 GB of RAM in the system, add more RAM.

hi Freddie

I have 16 GB RAM and a 40 GB L2ARC. Previously I was worried that it was memory, but it seems to be something else. The delete seems to succeed after running for a long time (maybe 10 minutes?), during which I monitored the disk activity with gstat. After I recovered the system, the directory had indeed been deleted.

I will try this again in about a month's time: I usually have one full disk image to delete at that point, and I will see if the newer 8.2 release I built today survives it :\

thanks
Malan
 
phoenix said:
In top, watch the RAM line, especially Wired and Free.

You'll see your Free RAM going down to almost 0, and your Wired RAM increasing up to vfs.zfs.arc_max. If you haven't limited it, then all RAM will be Wired ... at which point the system locks up because there's no RAM available for anything but ZFS.

I recompiled the kernel yesterday, so I have the latest 8.2 RELEASE. This morning I did another two deletes (around 50-60 GB each, one from the terminal and one over Samba) and there was no crash. I checked the RAM usage; Free went down from about 1200 MB to 450 MB during such a big delete. Only a small amount of swap got used, less than 100 MB at any time.

Still, it is too early to tell whether this has really solved it. When the previous crash happened I was deleting around 200 GB of deduped data, so it might be that I am not stressing the server enough ;)

I will post here again in a few weeks once I have a large enough image to delete.

Malan
 
Hi,

I deleted 257 GB of compressed, deduplicated data and the system ran out of memory - there was 15 MB free when it crashed. The value of vfs.zfs.arc_max was around 15 GB (the automatic setting). I have 16 GB of memory; I have now set vfs.zfs.arc_max to 12 GB.
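For reference, I set the limit via /boot/loader.conf (assuming that is the right place, since it is a boot-time tunable), along these lines:
Code:
vfs.zfs.arc_max="12G"   # cap the ARC at 12 GB on this 16 GB machine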

I will check again once I have enough files accumulated - that should take another month. It is a pity that the system just crashes; I would have expected the system defaults to prevent this type of thing from happening.

Malan
 
Hi

It still crashed. I was busy deleting around 109 GB of files, and it crashed with about 25 GB left to go, while showing 5 GB of RAM free. I am really at my wit's end - despite everything I have tried, I am seriously starting to doubt that the ZFS code is as robust as claimed.

regards
Malan
 