ZFS raidz2 keeps freezing 8.0 64bit system

What is your setting for vm.kmem_size?

In the days of 8.0 I was routinely adding
Code:
vm.kmem_size="12G"
to /boot/loader.conf, as otherwise weird things happened under load with ZFS. This was on a system with 8 GB of RAM. Now, with -STABLE, a few systems run without any tuning and without crashes.

It looks like you have some ZFS corruption, as you are getting those asserts from zdb. Perhaps a good idea would be to rebuild and re-populate your pool.
 
Thanks for that. I saw your post but hadn't tried it. Unfortunately, it gave the same error (Fatal trap 12) upon
# zpool import -a

Since the zpool started having these problems after a really big rsync transfer (thinking more about it there was a 93GB file in the bunch) I was actually wondering if I might be running out of memory and then swap, but maybe not..?

Prior to trying that, I also tried removing the /boot/zfs/zpool.cache file and importing again. The command executed, returning me to the prompt, but the system froze a few seconds later, also with the same error; however, this time it also printed a
Code:
bufwrite: buffer is not busy???
error to the console as part of the Fatal trap 12 error.

The Gravity Test is looking more and more appealing..
 
That does seem like a good match. I'm running 8.0-RELEASE-p4, and looking at /usr/src/sys/geom/eli/g_eli.c I see its version is 1.44.2.1.2.1, dated 10/25/09, so I assume my compiled version is the same? Since this was before the 4/15/2010 change, unless I'm missing something, it couldn't cause this error (livelocking)?

Thanks,
-bg
 
danbi said:
Have you tried booting recent OpenSolaris and trying to import the pool?

That may be what I'll end up doing.. it will be sort of a pain since I will have to decrypt the volumes with geli first.

Am I right to assume that the compiled version of the corresponding system file that I have is the same as the version of this source file /usr/src/sys/geom/eli/g_eli.c in my 8.0-RELEASE-p4 install? I'm not sure how to get the version otherwise..

Thanks!
-bg



EDIT : I installed the 8.0-RELEASE without updating (from the DVD) and had the same problem. Thus I can rule out the livelock issue.
 
A couple of other thoughts...

Sorry to keep kicking a dead pig, but I added

Code:
vm.kmem_size="12G"
vfs.zfs.arc_max="4G"
to /boot/loader.conf on 8.0-RELEASE, 8.1-RELEASE, and 8-STABLE, but had the same freeze.

I also typed
Code:
sysctl -a | grep vfs.zfs.zio.use_uma
And found that, on my installations of 8.1-RELEASE and 8-STABLE where it is tunable, this parameter is disabled (set to "0" by default). (I found some threads out there about issues with this creating system hangs if it is enabled.) So probably no issue there..

Incidentally, this zpool was probably 80% full, but another thing that occurs to me is that I've never emptied the trash, and I have deleted beyond the capacity of the raidz2 zpool volume. Searching another (living) ZFS volume, I can see where my deleted files go, but again it's unclear to me whether FreeBSD/ZFS will automatically purge my deleted files or not. Just wondering if perhaps what I'm seeing has to do with the volume being totally full..

Also, another thing that happened over time was that two of the six volumes somehow seem to have switched places on the controller.. I noticed because the geli keys I use to decrypt them before the zfs mount now decrypt two of them in reverse order. They do decrypt OK, but I was wondering if this might potentially wreak havoc on my zpool?

At this point most of this is academic as I have a pretty recent backup of everything and it seems like there's no way this volume will EVER come back online. The main thing I'd actually be interested in is simply getting a list of files and directories created/modified after a given date, so that I can make sure I don't actually lose anything (or at least know what I have lost) -- if there is a way to get this using ZDB or similar (?), that would be really helpful..
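If the pool can ever be held up long enough after an import, that modified-after-a-given-date list doesn't need zdb at all; plain find(1) with -newermt can do it. A minimal sketch (the real run would point at the pool's mountpoint, e.g. /tank, and a cutoff date of your choosing; both are assumptions here, and the block below demonstrates on a scratch directory instead):

```shell
#!/bin/sh
# Sketch: list files modified after a cutoff date. Against the real
# pool this would be something like `find /tank -newermt "2010-06-01"`;
# the path and date are placeholders. Demonstrated on a scratch dir:
scratch=$(mktemp -d)
touch -d "2009-01-01" "$scratch/before.txt"   # predates the cutoff
touch -d "2011-01-01" "$scratch/after.txt"    # postdates the cutoff
find "$scratch" -type f -newermt "2010-06-01" -print
```

Only after.txt is printed; -newermt compares modification times against a timestamp rather than against a reference file.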

Incidentally, are there any resources for how to actually use ZDB?
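For what it's worth, the zdb(8) man page seems to be about the only official documentation. A few starting invocations taken from it (the pool name "tank" is an assumption, and the block is guarded so it's a no-op on a box without ZFS):

```shell
#!/bin/sh
# Hypothetical starting points for poking at a pool with zdb; the
# flags are from the zdb(8) man page. Guarded so this does nothing
# where zdb is not installed.
if command -v zdb >/dev/null 2>&1; then
    zdb -C tank    # dump the cached pool configuration
    zdb -d tank    # list datasets with object counts
    zdb -h tank    # show the pool's command history
fi
```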

Thanks,
-bg
 
Final lesson: don't use ZFS + rsync if you value your data and time..

Of course, I may have to re-evaluate that down the road, but that really seems like the most likely culprit. Total bummer as I really like rsync..
 
ZFS + rsync works wonderfully. Been using it since ZFSv6 hit the FreeBSD 7.0 tree at work. rsync backups of 127 remote servers every night, each into their own sub-directory, snapshots taken every morning. Then rsync'd to another server across town during the day.

Main server is now FreeBSD 7.3 with ZFSv14; secondary server is now FreeBSD 8.1 with ZFSv14.

We restore files and directories via rsync on an almost daily basis. And use Frenzy/Knoppix + rsync to restore entire servers at least once a month.

What really matters is what rsync options you use.
 
I just saw this post.

Thanks for this - would you mind posting the rsync parameters you use routinely, as well as the parameters that tend to crash ZFS servers, in your experience? This would be really helpful for me. The parameters I use with rsync are simply '-av', and I've had problems with large (~500GB) transfers containing large files, such as disk images, which can individually be 80-100GB each. I don't recall any problems with smaller, quicker transfers, which led me to believe that my configuration options for ZFS memory use were probably incorrect/unsafe. I haven't been able to determine much about what has caused the crashes, and frankly, I've invested a foolishly large amount of time trying to find the reason when, for better or worse (and most likely due to my own inexperience), I don't seem to have the problem when I don't use rsync.

In the past and currently, I've used rsync on large 64-bit Fedora servers running very large (20TB) RAID volumes, both within the box and over networks miles away, and have never had any stability problems there, yet I have had unacceptably high failure rates using it on FreeBSD + ZFS, the details of which are posted earlier in this thread.

Thanks,
-bg
 
Shameless plug:
Thread on rsync backups

Our current set of rsync options are:
Code:
--archive --delete-during --delete-excluded --hard-links --inplace --numeric-ids --partial --stats

Using HPN-enabled net/openssh-portable in place of base OpenSSH, with the following options (only enable None if on a network you trust, and if both ends are using HPN; the buffer is a must):
Code:
-oHPNBufferSize=8192 -oNoneEnabled=yes -oNoneSwitch=yes
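Pulled together, a full command line might look roughly like the sketch below. The hostname and paths are placeholders (not our actual setup), the -e line only applies when both ends run the HPN-patched OpenSSH, and the runnable part at the bottom exercises just the local-copy options:

```shell
#!/bin/sh
# Sketch of the options above combined. Over the wire it would look
# roughly like this (host/paths are placeholders):
#   rsync --archive --delete-during --delete-excluded --hard-links \
#         --inplace --numeric-ids --partial --stats \
#         -e "ssh -oHPNBufferSize=8192 -oNoneEnabled=yes -oNoneSwitch=yes" \
#         root@backup-client:/ /backups/backup-client/
# Local demonstration of the same transfer options, guarded in case
# rsync is not installed:
if command -v rsync >/dev/null 2>&1; then
    src=$(mktemp -d); dst=$(mktemp -d)
    echo "hello" > "$src/file.txt"
    rsync --archive --delete-during --delete-excluded --hard-links \
          --inplace --numeric-ids --partial "$src/" "$dst/"
    cat "$dst/file.txt"
fi
```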
 
Thinking more deeply about this issue over the last year (but not having the time to rebuild this server; thankfully it was backed up right before death & I'm too busy), I doubt FreeBSD or the ZFS implementation (or their initial, ZFS-related parameters) were to blame at all. In fact, FreeBSD is a beacon of stability in a turbulent world.

There are several likely possibilities, each of which could have killed my zpool:

1) The power supply might have reset under high, sustained load - I wouldn't recommend http://www.newegg.com/Product/Product.aspx?Item=N82E16817148040. After more consideration, a good power supply is never a bad deal. The high-end ones also have very long warranties, like 7 years; the cheap ones break right after the warranty ends. Consider the $$$/year.
2) Disks might have switched IDs. As this was my first time with FreeBSD, and also a hodge-podge of leftover parts (except for the ZFS disks and power supply, which I had to buy), I swapped multiple controllers in this machine and suspect that two disks might have swapped at some point. I also used multiple system disks, connected to different controllers at various times. I'll be hardwiring SCSI IDs in the future.
3) Apparently pre-existing, weird behavior from the Fedora machines I was transferring to/from. Does anyone else have weird problems when transferring big files over a cheap switch between Fedora boxes? When the same machines transfer data from a USB connection (as opposed to SATA), this does not happen (still over the same cheap switch). I'm talking about ~400GB and more of data, mostly music files and large (80GB) disk images, all in the same rsync run.
4) A few power outages, like the one due to my idiot cousin (& that ratty power supply).
5) Untweaked rsync & ssh parameters.

Fortunately because of the backup I can just consider this a pretty good initial experience.

-BG
 
Hey big_girl,
I read through your ordeal with a bit of nostalgia. In 2008 I built a pair of ZFS servers for doing backups of Linux systems via rsync. Read my post for some background. In short, I was running 3 concurrent rsync processes. Why 3? Because empirical testing proved that 3 concurrent rsync processes could saturate my ZFS pools. More than 3 rsync processes put more memory pressure on the system and slowed everything down.

Each rsync process was backing up a Linux VPS to my ZFS-based backup servers. For moving around that much data, rsync + SSH was a non-starter; it just could not move enough data across the network to get 8,000 servers backed up in less than a week. We had a private network available, so I pushed all the backup traffic across that network unencrypted, using rsyncd. I didn't have to tweak rsync params at all. IIRC, excepting some minor tweaks for monitoring and reporting, I used the default rsync options that rsnapshot defaults to.
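For anyone curious, the rsyncd side of a setup like that needs very little configuration. A minimal /etc/rsyncd.conf sketch (the module name, path, and allowed network here are illustrative assumptions, not my real config):

```ini
# /etc/rsyncd.conf -- minimal unencrypted-rsyncd sketch; module name,
# path, and allowed network are made up for illustration
uid = root
gid = wheel
use chroot = yes

[vps-backups]
    path = /tank/backups
    read only = false
    hosts allow = 10.0.0.0/24
```

Clients then push with something like rsync -a /etc/ backupserver::vps-backups/hostname/ -- no ssh in the data path at all.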

I highly doubt #5 was related to your problem. One thing I can tell you though is that ZFS on FreeBSD behaves much, much better when running atop real RAID controllers. I had all sorts of problems with ZFS on OpenSolaris and FreeBSD 7 during initial testing. With 24 disks across 3 controllers, performance and stability were both terrible. When ZFS only had to stripe data across two 12-disk RAID volumes with each controller having 1GB of BBWC, it performed fairly well.
 
6) Make sure your outlets are wired correctly. My landlord is less than worthless, and I recently discovered that the circuit powering most of my equipment was wired by an amateur electrician and was not properly grounded. After getting shocked and having a computer fried (not the ZFS server, for better or worse) one day not too long ago, I bought a $3 outlet tester and discovered this. Always test outlets before plugging into them. This might have contributed to the instability.

So I got back into messing around with this box over the last few days... I bought a nice, efficient power supply with a 7yr warranty, rated at 910 watts with 12 SATA connectors. I would have preferred to get bona fide server equipment but I need frequent physical access to the machine and could not tolerate the noise.

I installed 9.0-RELEASE and set up everything as noted before in this thread. The idea was to chuck this pool and create a new one from a backup I have from just before the pool became unusable.

For the hell of it, I decided to try and import the pool with my new setup before erasing the disks. As is typical, I got some interesting results. After I typed the import command, CPU usage went to almost 100% for a few hours before the system froze. I rebooted again, decrypted the volumes, and [CMD=]zpool status -x[/CMD] returned some information. Paraphrasing (except where quoted), this is what it said:

State said the pool was online.
Status said the pool was older and should be upgraded.
Action said to upgrade the pool since it was older.
Scan said a scrub had been in progress since I tried to import this pool (last night): "3.78T scanned out of 3.85T at 1/s, (scan is slow, no estimated time) 0 repaired, 98.32% done"
Config showed the pool correctly with no errors.

Then the system froze.

I rebooted again, decrypted the disks, and once more typed
Code:
zpool status -x

Then it froze instantly, stayed frozen for a second, and then rebooted. There wasn't anything that seemed related in /var/log/messages.

I ran [CMD=]zdb -v tank[/CMD], which ran for a few minutes, then threw a nearly identical error to the one I described earlier in this thread (dnode.c error; see post 19) and returned me to the command line without any freeze or crash. Running it again causes the same crash/dump at exactly the same point (or at the same file, judging from the 'zdb' output).

I then upgraded [successfully] to v28 a second before the system froze and then rebooted. Running 'zdb' again on the upgraded pool bonked on exactly the same file as before. The 'status' command then returned 'All pools are healthy' before she froze a few seconds later and rebooted.

What seems to be the case here is that there is a file or region on this raidz2 volume that chokes ZFS. E.g., each time I use the 'status' command, it appears to cause ZFS to resume its scrub where it previously left off (at 98.32%); it then hits the rough patch and causes a kernel panic. Since this seems to correspond to a file or region, is there any way I can tweeze this file or group of files out for removal?

It seems like the pool is mostly fine and that I might be able to recover it, but then again I've spent countless hours already, making me think I should probably kill this f-ing thing once and for all and start over..

:)

EDIT: I was able to then export the pool, but now the system freezes upon import.
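One thing that might still be worth trying before wiping the disks, now that the pool is at v28: importing it read-only. A read-only import cannot write to the pool, so the 98%-complete scrub should not resume, which looks like the panic trigger here; it might hold the pool up long enough to copy the file list off. A sketch, untested against this particular pool and guarded so it's a no-op on a box without zpool ("tank" is this thread's pool name):

```shell
#!/bin/sh
# Hedged sketch: import the pool read-only so the interrupted scrub
# cannot resume, then list its datasets. No-op where zpool is absent.
if command -v zpool >/dev/null 2>&1; then
    zpool import -o readonly=on -f tank
    zfs list -r tank
fi
```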
 