ZFS Panic Galore

I'm running a 32-bit version of FreeBSD with ZFS. There is 4 GB of RAM in this machine.

Lately it has started crashing every month or so. I had more RAM added and upgraded to 8.3. I tried the PAE kernel, but it would crash after an hour or so.

So I backed down to the regular GENERIC kernel, but now it stays up for an hour if I'm lucky.

[attachment: crash.jpg, screenshot of the kernel panic]


Any insight into what might be wrong?
 

Looks like an HDD problem to me...
Are you sure that the system is not having HDD time-outs, meaning it is losing the connection with the HDD? Look through all the messages in the system's logs for any hint of what's going on.

Install and run sysutils/smartmontools.
# smartctl -a /dev/ada0
will show how many and what type of errors the HDD has had.
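If smartmontools is not installed already, building it from ports should be as simple as something like:
Code:
cd /usr/ports/sysutils/smartmontools && make install clean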
 
There are no hints in the system logs.

Here is some of the output:
Code:
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Blue Serial ATA
Device Model:     WDC WD5000AAKS-00V1A0
Serial Number:    WD-WCAWF3101822
LU WWN Device Id: 5 0014ee 1027d88ad
Firmware Version: 05.01D05
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Fri May  4 14:36:31 2012 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

snip...

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART overall-health self-assessment test result: PASSED

To me it would appear that the drive is in good health.
 
Here is another dump.

[attachment: crash1.jpg, screenshot of the kernel panic]


Hardware problem? I just had the RAM replaced with fresh stuff and it happened again.
 

Have you done any tuning of the vm.kmem_size* tunables? On i386 the recommendation is to raise both vm.kmem_size and vm.kmem_size_max to at least 512M for stable operation of ZFS:

http://wiki.freebsd.org/ZFSTuningGuide#i386

As the page states, if you need more than 512 MB of kmem you'll have to compile your own custom kernel with an increased KVA_PAGES setting.
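If I remember the guide correctly, that boils down to adding something like the following to a custom kernel configuration and rebuilding (512 pages roughly doubles the default kernel virtual address space on i386):
Code:
options         KVA_PAGES=512
after which vm.kmem_size and vm.kmem_size_max can be raised further in /boot/loader.conf.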
 
The kernel is a GENERIC 8.3-RELEASE-p1 as of May 4th 2012.

The only startup options are
Code:
vm.kmem_size="512M"
vm.kmem_size_max="512M"
vfs.zfs.arc_max="160M"

These were taken from the ZFSTuningGuide. I also tried recompiling the kernel with the KVA_PAGES option, but it would panic immediately. My assumption is that a GENERIC kernel has the best chance of being consistent with the rest of the world.

I am having serious second thoughts about continuing to run ZFS on a production machine.
 
You would probably have better luck with the amd64 version of FreeBSD; all of this tuning would be mostly unnecessary except for the vfs.zfs.arc_max setting.

I would run both short and long self-tests on the disk drive with smartctl(8) to make sure the drive is OK; sometimes clean SMART stats do not tell the whole story about the drive's health.
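For reference, the self-tests can be started and the results read back roughly like this (assuming the drive is still /dev/ada0; the long test can take an hour or more):
# smartctl -t short /dev/ada0
# smartctl -t long /dev/ada0
# smartctl -l selftest /dev/ada0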
 
Those messages look too much like a hardware error, and I propose you first eliminate any such possibility from your system.
If you are sure that the HDD is fine, try memtest, preferably from a Linux CD, or else sysutils/memtest.
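The port can be installed the usual way, something along the lines of:
Code:
cd /usr/ports/sysutils/memtest && make install clean
and then left running against as much free RAM as possible while the machine is otherwise idle (its manual page describes the exact invocation).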
 
All of the SMART tests passed. I have arranged to have a fresh hard drive installed with a fresh copy of 64-bit FreeBSD on it. I'll see if I can use that to read the ZFS partitions.
 
We completely replaced the hardware. Same issue. I suspect it may be a bug in ZFS where it's unable to deal with some sort of corruption in the pool. There are now two drives in the system: one with 64-bit FreeBSD, and the original. Any suggestions on how I might try to mount this filesystem from within the 64-bit FreeBSD system? Of course the fresh install doesn't know how to find the drive with the old ZFS pools.
 
Sorry to have wasted your time on the hardware side; better to be safe than sorry, though, I think...

Can you tell us where your swap is? Is swap part of the ZFS pool or on its own slice?

Any suggestions on how I might try to mount this filesystem from within the 64-bit FreeBSD system?
# zpool import -f -R /media/rescue <poolname>
-f to force the import, -R to specify where you want it mounted (altroot).
You can also add -o canmount=noauto to the above command to prevent automatic mounting of the datasets, then mount the datasets by hand using
# mount -t zfs pool/dataset <mountpoint>
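Put together for your case it might look something like this (the old pool name is assumed to be email here and the mount point is arbitrary); the zfs list is just to confirm which datasets came in before copying anything off:
# zpool import -f -R /media/rescue email
# zfs list -r email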
 
The swap was on a slice. There was only 1 ZFS partition.

Code:
FreeBSD cl-t153-284cl 8.3-RELEASE FreeBSD 8.3-RELEASE #0: Mon Apr  9 21:23:18 UTC 2012     
root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

Code:
zpool import -f -R /dev/ad16s1g email
Unexpected XML: name=stripesize data="0"
Unexpected XML: name=stripeoffset data="977305088"
---SNIP---
Unexpected XML: name=stripesize data="0"
Unexpected XML: name=stripeoffset data="32256"
cannot import 'email': pool is formatted using a newer ZFS version

The 32-bit and 64-bit versions of the OS were cvsup'd within minutes of each other, so I have no idea why they show different versions. Is it possible that the on-disk format has changed with the 64-bit version? I understood this was one of the strengths of ZFS: a uniform format across all platforms.
 
# zpool upgrade -v
Will give you the version you are running. For details, see zpool(8). The mountpoint in import needs to be a directory name, not a device.
# zpool import -f -R /media email
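If it helps compare the two, the on-disk version of a pool that is already imported can also be read from its version property, e.g.:
# zpool get version <poolname>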
 
It would appear that I'm running version 14. I thought I read somewhere that 8.3 was supposed to be version 28. I was planning to upgrade to 9 anyway, so I'll try that and see if it's able to read the 32-bit filesystem that way.

Just for kicks I tried to rebuild the kernel (dual-booted to 32-bit) so maybe I could get it to run long enough to query what ZFS version it is. If I load the zfs module on that, I get about 5-10 seconds before a kernel panic...
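If I can catch it inside that window, I'm hoping something like this will report the supported pool version without touching the pool at all (assuming that sysctl is present on this build):
# sysctl vfs.zfs.version.spa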
 
I'm still having problems with this ZFS thing. I upgraded to 9.0-RELEASE.

The replacement filesystem has a ZFS pool called email. I need to find a way to access the email ZFS pool on the old disk, which is installed in the system. It's on /dev/ad16s1g, whereas the "good" ZFS system is on /dev/ad6s1g and is up and running properly.

The closest I've gotten is this:
Code:
zpool import email ad16s1g
cannot import 'email': pool may be in use from other system, it was last accessed by mail (hostid: 0xb486f72a) on Sat May  5 22:14:07 2012
use '-f' to import anyway

I'm a little afraid at this moment that it's going to do bad things to my existing pool that has the same name. How will I know which pool is which? I want to mount it, copy the data off, and unmount it.
 
OK, I bit the bullet. I backed up my /email and tried it. Looks like there must be a bug in the kernel.

# zpool import -f email ad16s1g

[attachment: crash2.jpg, screenshot of the kernel panic]
 

Leave out the ad16s1g from the import command; the pool is detected by on-disk metadata, and you cannot give device names as part of the import command like that.
 
How do you tell ZFS to look on that disk for the partition? If I do a zfs list, here is what I get:
Code:
zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
email   118M   356G   118M  /email

That's my new email ZFS filesystem living on /dev/ad0s1g.

As you can see from my feeble attempts above, I really want to work with the one on /dev/ad16s1g so I can copy the data over to this one.
 
The detection is based solely on on-disk metadata; you cannot tell zpool(8) to use a specific device for the import.

If you run just this as root, it should list all pools that are available for import:

# zpool import
 
Running import lists only 1 pool.
Code:
 pool: email
    id: 10433152746165646153
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        email       ONLINE
          ada1s1g   ONLINE

I'm not sure anymore which pool this is...
 
Would you suggest I run the following?
# zpool import -f 10433152746165646153

Nearly every command I've tried so far on this thing has resulted in a kernel panic. So far I've tried 8.3 32-bit, 8.3 64-bit, and now 9.0 64-bit, with very consistent panics.
 
Try this; it will import the pool forcibly and mount it under the temporary mount point /altroot so you can take a look at what the pool contains (and so you avoid a situation where the pool gets mounted over the system directories):

# zpool import -f -R /altroot 10433152746165646153
 
Code:
zpool import -f -R /mnt 10433152746165646153
cannot import 'email': pool already exists

Code:
zpool import -f -R /altroot 10433152746165646153
cannot import 'email': pool already exists

Wasn't sure if the altroot was an option or a path...

Code:
zpool import -f -R /altroot 10433152746165646153 olddata

Did some googling... As soon as I ran this, I got the all-too-familiar kernel panic.

I'll see if I can bring the image home with me.

Code:
dd if=/dev/ad16s1g > zfsimage.dat
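
If the raw file turns out to be too unwieldy to carry around, I may pipe it through gzip instead, something like:
Code:
dd if=/dev/ad16s1g bs=1m | gzip > zfsimage.dat.gz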

I don't think I have time to go to bsdcon but if this problem persists I might have to go over there and see if any of the kernel experts can help.
 