ZFS panics the system

Hi

I moved from an OpenSolaris box to FreeBSD to serve files from a ZFS pool. Very, very bad idea. I'm using
Code:
FreeBSD -snip- 9.0-RELEASE FreeBSD 9.0-RELEASE #0. AMD64

First of all, I recompiled the ZFS module with the patch to avoid panicking on ls -la. It mounted the volumes OK, and I upgraded my pools to v28. I then installed Samba 3.6.4 and shared files on ZFS.
Edit: the patch is http://people.freebsd.org/~pjd/patches/zfs_sid.h.patch.
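
For anyone wanting to reproduce this, the steps were roughly as follows (the source tree location and patch options here are assumptions; adjust to your setup):
Code:
# fetch and apply the patch against the ZFS sources
cd /usr/src
fetch http://people.freebsd.org/~pjd/patches/zfs_sid.h.patch
patch < zfs_sid.h.patch

# rebuild and reinstall the zfs kernel module
cd /usr/src/sys/modules/zfs
make clean all install

# after rebooting onto the new module, upgrade all pools to v28
zpool upgrade -a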

It worked OK on my test machine under batch stress tests. The day I put it in production, it started panicking every couple of minutes. The problem could be related to concurrency, I suppose.

I read all the forums and tried all the sysctl configurations. Nothing worked, only panics. Attached is the panic screen, which refers to smbd. I also tried compiling smbd with both GCC and Clang, each time with and without optimizations.
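
For the record, the tuning I experimented with was along these lines; the values below are only illustrative examples for a box with plenty of RAM, not recommendations:
Code:
# /boot/loader.conf - common ZFS tuning knobs of the 8.x/9.x era
vm.kmem_size="8G"
vfs.zfs.arc_max="4G"
vfs.zfs.prefetch_disable="1"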

No luck. Any ideas?

This is one of three big problems I have with ZFS on FreeBSD, doing the same things that worked fine on OpenSolaris. I'll describe the other problems in separate posts. With Samba out of service, I switched to zvols exported via iSCSI to a Windows machine, with some hiccups too.
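
For reference, a zvol-backed iSCSI export on 9.0 boils down to something like this, using net/istgt from ports (the pool, volume, and target names here are made-up examples):
Code:
# create a 500 GB zvol; it appears as /dev/zvol/tank/vol0
zfs create -V 500G tank/vol0

# excerpt from /usr/local/etc/istgt/istgt.conf mapping the zvol to a LUN
[LogicalUnit1]
  TargetName disk0
  Mapping PortalGroup1 InitiatorGroup1
  UnitType Disk
  LUN0 Storage /dev/zvol/tank/vol0 Auto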
 

Attachments

  • zfs_dump.jpg (panic screen, 95.8 KB)
Wow. This situation is so unprofessional of the FreeBSD devs. I was about to create a thread about another slowness on FreeBSD 9.0 where SXCE goes four times as fast on the same hardware and the same disks (512-byte sectors, by the way; good old fast SAS disks). Now I won't even bother, and I'll keep using OpenSolaris or Solaris Express. The numbers don't add up regarding ZFS on FreeBSD, which by any other measure is pretty good and stable; I use it for a number of things and it has been good. But back to my complaint (and I don't mean to hijack the thread): I wonder why FreeBSD has been trumpeting ZFS as 'stable' on http://www.freebsd.org since 8.2 when it is obviously incomplete, demonstrably slow, and now panics too. Just DON'T lie! ZFS on FreeBSD is not ready for production, so don't lie on the front page; it hurts credibility terribly.
 
lucinda said:
Wow. This situation is so unprofessional of the FreeBSD devs. ... ZFS on FreeBSD is not ready for production, so don't lie on the front page; it hurts credibility terribly.

ZFS on FreeBSD works beautifully on all kinds of setups, from my lowly P4 at home to our Opteron-based storage boxes at work. Plenty of people have been using it "in production" without issues. Some people use it "in testing" with all kinds of issues. Just like any other piece of software. (Or are you going to say that Windows 7 is not "production quality" because many people have issues with it?)

I've been using it since ZFSv6 on FreeBSD 7-STABLE, and our latest setup can saturate a gigabit link (915 Mbps) doing zfs send | zfs recv between similar boxes, using plain SATA controllers and plain SATA drives.
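
The basic form, for anyone curious (the pool, snapshot, and host names are examples):
Code:
# snapshot a filesystem and replicate it to another box over ssh
zfs snapshot tank/data@backup1
zfs send tank/data@backup1 | ssh otherbox zfs recv -F tank/data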

And, just how is it "incomplete"?

The only one who's hurting their credibility here is you. Signed up this week. Posted a whole 2 times. Both posts disparaging ZFS with hardly any information in either post. Yeah, there's a lot of credibility there. Guess all the FreeBSD devs should just pack it in and retire.
 
lucinda said:
Wow. This situation is so unprofessional of the FreeBSD devs. ... ZFS on FreeBSD is not ready for production, so don't lie on the front page; it hurts credibility terribly.

I'm not a benchmark geek, and I use various Solaris versions too. In my opinion Solaris is somewhat faster as well, but having no numbers, and considering the different hardware, I just can't say.
My first considerations in using ZFS are affordability and flexibility. Doing a single zfs send | zfs receive through mbuffer, I can easily saturate my Gbit connection on FreeBSD, and that is enough for me.
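
Something like this, with the buffer sizes picked as examples rather than recommendations:
Code:
# on the receiving box: listen on TCP port 9090 and feed zfs recv
mbuffer -s 128k -m 1G -I 9090 | zfs recv tank/data

# on the sending box: stream the snapshot through mbuffer
zfs send tank/data@backup1 | mbuffer -s 128k -m 1G -O otherbox:9090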
I would just like to get back to the point I was at on Solaris: creating a storage box and forgetting about it. It is clear FreeBSD is not at that level today, but filing bugs (and PRs) and giving maximum support to the devs is the only way to get it there.
Using an entire OS (Solaris) just as a filesystem driver seems somewhat bizarre to me nowadays; we NEED a replacement.
 
lucinda said:
Wow. This situation is so unprofessional of the FreeBSD devs. ... ZFS on FreeBSD is not ready for production, so don't lie on the front page; it hurts credibility terribly.

Well, I wouldn't word it so strongly [you might make enemies that way... for example, phoenix is a VERY knowledgeable guy who posts all around the forums helping people; you want him on your side], but I agree that the situation needs to be fixed or explained better, especially for anyone new to FreeBSD. If I were in charge of it, I would simply add warnings everywhere you see 9.0, and in the OS itself: if you are using 9.0-RELEASE and hit any instability, and you don't have the resources to test and solve it all yourself [such as in a small business], you should immediately move to 8-STABLE or 8.3-*, because that really is stable (I personally haven't tried 8.3-* yet). And unless you honestly plan on testing and experimenting to be sure it is ready, you should not put it into production at all, even to try it out, unless you don't need to rely on it as much (e.g. a second DNS or DHCP server, a Nagios server, etc.).

It is common knowledge in the FreeBSD community that you should avoid an x.0 release on any production server until it has had extensive testing, but this is not stated plainly for newcomers to see. That I see as the biggest issue. (And I think 8.2 was an exception to the rule... it was very bad for ZFS without being an x.0 release, but 8-STABLE csupped on 2012-02-04 is working fine for me, with only a few problems, only half of which are ZFS related.)
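
For anyone unfamiliar with tracking a stable branch, the routine is roughly this (using the stock example supfile, edited to point at the RELENG_8 tag):
Code:
# fetch the 8-STABLE sources (after setting tag=RELENG_8 in the supfile)
csup -g -L 2 /usr/share/examples/cvsup/stable-supfile

# rebuild world and kernel from the updated sources
cd /usr/src
make buildworld buildkernel
make installkernel
# reboot, then:
make installworld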

I can't use FreeBSD on my desktop... there are too many things I need that aren't available (ext2 [hangs the system], ext3 and ext4 support [incomplete], ntfs [way too slow at 50% write]), and I assume there are more that I'm not aware of. But if I did, I would put 9.0 on it (until I got fed up with it), just to be part of the bug-discovery experience. That is the only place I would put 9.0... I wouldn't put it on a server. And I am not complaining that something is terribly wrong with the devs, just explaining my understanding of the process, and complaining that it isn't well known enough to newcomers.

I have my Feb. 2012 8-STABLE build on four large 36-disk file servers (only one fully populated with disks), and one test VirtualBox server (which will become production once the proper hardware is delivered; since 8.3-RELEASE is out, I'll use that when it comes). All are pure ZFS.
 
LoZio said:
First of all, I recompiled the ZFS module with the patch to avoid panicking on ls -la. ... Edit: the patch is http://people.freebsd.org/~pjd/patches/zfs_sid.h.patch.
I recognize that patch;)

But our problem was a bit different. Your panics are related to Samba, while our problem wasn't protocol-specific: after mounting any ZFS filesystem, it panicked even if you went near it with a 30-foot pole:) After recompiling with the patch, it has never panicked again, and we are now using it with Samba, connected to an AD domain, saturating 1GbE at times. I call that "creating a storage box and forgetting about it".
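
For reference, an AD member setup with Samba 3.6 boils down to something like this (the realm, workgroup, share path, and idmap range are site-specific examples, not our actual values):
Code:
# excerpt from /usr/local/etc/smb.conf (Samba 3.6 as an AD member)
[global]
    security = ads
    realm = EXAMPLE.COM
    workgroup = EXAMPLE
    idmap config * : backend = tdb
    idmap config * : range = 10000-99999

[storage]
    path = /tank/storage
    read only = no

# then join the domain once:
#   net ads join -U Administrator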

/Sebulon
 
Sebulon said:
But our problem was a bit different. Your panics are related to Samba, while our problem wasn't protocol-specific... I call that "creating a storage box and forgetting about it".

I referenced the patch to make the case reproducible; I read the discussion and what it was meant to fix. Also, in that case the problem was easy to reproduce, while in my case I had a test box running heavily stressed for days with no hassle, yet when two real users connected, the box was gone in less than a minute.
 
@peetaur

Yepp, several 9.0-RELEASE boxes in production. Haven't had any issues so far, knock on wood:)

/Sebulon
 
peetaur said:
Well, I wouldn't word it so strongly... but I agree that the situation needs to be fixed or explained better, especially for anyone new to FreeBSD. ... I have my Feb. 2012 8-STABLE build on four large 36-disk file servers (only one fully populated with disks), and one test VirtualBox server... All are pure ZFS.

peetaur, thank you so much for the down-to-earth, kind message, the hard data, and the useful information given without first looking at who I might be or how many posts I have. I can only wish that half the people in any forum were like you. Maybe it won't satisfy me, but I will certainly make another attempt with 8.3.

I agree that the mechanics of the releases are unclear; however, I am the type to read all the documentation before moving, and I knew that 9.0-RELEASE is more dangerous. I was only testing, but precisely because I have come to expect stability and speed from FreeBSD, I wanted to stress the implementation in 9.0. I saw errors about some secondary label being corrupt (sorry, I don't remember them exactly) while just using it, reading and writing, without pulling disks out or stressing it seriously; and it is also slower than Solaris on exactly the same hardware. This was my complaint. Then I come here and see that these problems are common, and there is even (admittedly, maybe) a crash with ZFS involved. The fact that it works for some never precludes the fact that it doesn't for others. There is this elitist attitude about problems: that I must be doing something wrong, that I haven't twisted and poked it enough. This attitude repels people asking questions and attracts people accommodating the status quo.

Back to my words: I was upset precisely because I know FreeBSD works well; I have been using it for a little more than a year now, since 2011-03-02 according to my archives. And if the posts in this forum are half true, ZFS on FreeBSD should carry a warning that it is not ready for production. Not seeing this warning anywhere, I was certainly upset. ZFS is complex, and bugs and oddities are to be expected, but when it's not ready, it's not ready. Retorting that because all software can always have yet one more bug we should bear them with biblical patience is too poor for my standards.

I will have a second look, because I prefer FreeBSD to Solaris for being free, and it is also more likely to install anywhere than Solaris or its derivatives; but when there are no knobs to turn, neither in the HBA driver nor in ZFS (amd64 autotunes, according to the Handbook), I expect it to perform nominally.

Looking around a bit, I see the slowness is pretty much taken as a given nowadays. I have not seen results as abysmal as the ones in http://www.zfsbuild.com/2010/09/10/freenas-vs-opensolaris-zfs-benchmarks/, but I would be grateful if someone could point to better news.

Regards
 
lucinda said:
...and if the posts in this forum are half true, ZFS on FreeBSD should carry a warning that it is not ready for production. Not seeing this warning anywhere, I was certainly upset.
I've been using FreeBSD + ZFS heavily since 8.1, and in production (32 TB per system) since 8.2. Aside from a couple of instabilities (none of which caused data loss) when ZFS v28 was committed, it has been rock solid. I did have a hardware problem where my ZIL drive failed and any attempt to write to the pool would panic the system, but that was before ZFS v19 or so, when ZIL devices could not yet be removed.
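
(From pool v19 onward, a failed dedicated log device can simply be dropped; the pool and device names here are examples:)
Code:
# remove a dedicated log (ZIL) device from a pool - possible from v19 on
zpool remove tank gpt/zil0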

Looking around a bit, I see the slowness is pretty much taken as a given nowadays. I have not seen results as abysmal as the ones in http://www.zfsbuild.com/2010/09/10/freenas-vs-opensolaris-zfs-benchmarks/, but I would be grateful if someone could point to better news.
Each of my systems can do 500 MByte/sec writes for days on end; take a look at the attached graph from benchmarks/iozone.
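
(For what it's worth, a sequential-throughput run of that sort looks roughly like this; the file size should exceed RAM so the ARC can't hide the disks:)
Code:
# sequential write/rewrite (-i 0) and read/reread (-i 1), 128k records, 64 GB file
iozone -i 0 -i 1 -r 128k -s 64g -f /tank/testfile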
 
LoZio said:
First of all, I recompiled the ZFS module with the patch to avoid panicking on ls -la. It mounted the volumes OK, and I upgraded my pools to v28. I then installed Samba 3.6.4 and shared files on ZFS.
Edit: the patch is http://people.freebsd.org/~pjd/patches/zfs_sid.h.patch.

I just read that for the first time. I have never had issues with ls -la on a mounted ZFS filesystem while browsing through its contents. So the question is whether this is really necessary, or whether it depends on something more, e.g. 32- vs. 64-bit?

Just wondering whether I should also add it, just to prevent any panics.
 
lucinda said:
...and if the posts in this forum are half true, ZFS on FreeBSD should carry a warning that it is not ready for production. Not seeing this warning anywhere, I was certainly upset.

I wouldn't word it like that. I think it is very production-ready; you get various bugs here and there, depending on your specific use case. For example, it was lately discovered that (with 9.0 or the new NFS server, which I don't use) there are memory leaks when removing very many files on an NFS share backed by ZFS. There is a patch for that now. There are other issues when importing a pool from Solaris (which I don't do); I've seen two patches to fix such things. These things don't affect me, so I could conclude that it is very solid. But your use case might differ.

lucinda said:
Looking around a bit, I see the slowness is pretty much taken as a given nowadays. I have not seen results as abysmal as the ones in http://www.zfsbuild.com/2010/09/10/freenas-vs-opensolaris-zfs-benchmarks/, but I would be grateful if someone could point to better news.

That benchmark is FreeNAS vs. Solaris, not FreeBSD. It is a terrible comparison... FreeNAS's base system is many releases behind; in the case of the above article, FreeNAS was built on FreeBSD 7 or so. And those benchmarks sound like they are using iSCSI, so unless you are using iSCSI too, they are not directly relevant to your use case.
 