Howdy,
This is my first official "big question" on the forums. I've had little luck with bizarre problems like this on the mailing lists, likely because I don't always know which list is best (e.g., a panic in softdep: is that -stable, -fs, or -scsi if I'm not sure whether it's a driver or FS issue?).
Anyhow, we've got a number of older Supermicro boxes that are dated but usable. All of them have a few quirks with anything newer than 4.11 that make them not quite production-grade. I'm trying to get a Supermicro P4DPR-6GM+ board (dual 1.8GHz Xeons, 2GB RAM, 2x137GB U320 SCSI drives) to play nice with FreeBSD 8.0. It's had issues with 6.x, 7.x and now 8.0. The BIOS is at the latest revision. On everything newer than 4.11 it has some combination of these problems: panicking as soon as the APIC stuff is enumerated on warm boots, serial ports disappearing after boot, or hanging on a reboot after syncing the disks. It gets somewhat more reliable when Hyperthreading is disabled in the BIOS.
I've been able to get a gmirror setup running and recently wiped it and got a working "Root on ZFS" install going. The box was up for a few months and cruised through a ton of reboots before going into production as a testbed/devel box. While doing some benchmarking, I noticed this oddity:
Code:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 1 core(s) x 2 HTT threads
It's claiming one processor with HTT enabled, when in reality there are TWO packages and NO HTT threads. I went to the colo tonight to verify this: there are in fact two CPUs installed, the BIOS sees both, and it notes on POST that HTT is disabled.
I enabled HTT in the BIOS, and the box would simply panic like so:
Code:
real memory = 2146959360 (2047 MB)
avail memory = 2087059456 (1990 MB)
MPTable: < Kings Canyon>
AP #1 (PHY# 6) failed!
panic y/n? [y]
I decided to give up on HTT and put things back the way they were. Since then, I'm getting a hang when trying to mount root. It's not a hard lockup: if I give it the three-finger salute, it will reboot after about a minute:
Code:
GEOM_MIRROR: Device mirror/swap launched (2/2).
Trying to mount root from zfs:zroot
ct_to_ts([2010-06-05 01:50:36]) = 1275702636.000000000
start_init: trying /sbin/init
(I hit Ctrl-Alt-Del here after 5-10 minutes; 1-2 minutes later the output resumes)
GEOM_MIRROR: Device swap: provider mirror/swap destroyed.
GEOM_MIRROR: Device swap destroyed.
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...0 0 0 0 0 0 0 0 0 done
All buffers synced.
Uptime: 2m7s
fatxkbdc0: kwake_per c digsableod wak
e for \_SB_.PCI0.LPC0.KBC0 (S5)
unknown: wake_prep disabled wake for \_SB_.PCI0.LPC0.MSE0 (S5)
uhci0: wake_prep disabled wake for \_SB_.PCI0.USB1 (S5)
uhci1: wake_prep disabled wake for \_SB_.PCI0.USB2 (S5)
uhci2: wake_prep disabled wake for \_SB_.PCI0.USB3 (S5)
Rebooting...
cpu_reset: Stopping other CPUs
That's from a verbose boot. I can't tell if it's really timing out on the root mount or if it's waiting on something else. The last line before the "hang" is the kernel trying to start init. I don't know if that means that it has mounted root or not. The "ct_to_ts" line that shows up only in a verbose boot is interesting as well.
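On the ct_to_ts line: as far as I can tell, the number on the right is just the bracketed clock time converted to seconds since the Unix epoch. Here's a quick sanity check in Python. It assumes the bracketed time is read as UTC (that's my assumption, and the variable names are mine), so take it as a sketch rather than a statement about what the kernel does internally:

```python
from datetime import datetime, timezone

# Values copied from the ct_to_ts line in the verbose boot output.
clock_text = "2010-06-05 01:50:36"
logged_seconds = 1275702636

# Assumption: the bracketed time is the clock reading interpreted as UTC.
parsed = datetime.strptime(clock_text, "%Y-%m-%d %H:%M:%S")
parsed = parsed.replace(tzinfo=timezone.utc)
computed_seconds = int(parsed.timestamp())

# Compare the computed epoch seconds with the value the kernel printed.
print(computed_seconds, computed_seconds == logged_seconds)
```

The two values agree under that assumption, so the line by itself may be harmless. It still doesn't tell me what the box is waiting on after start_init.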
I think ZFS is a total red herring on this - we had similar weirdness on 6.x and 7.x using UFS2.
I'm kind of thinking of blaming USB, but I'll be damned if I can figure out how to forcibly disable uhci and any other USB-related stuff at the loader prompt. In general, I think Supermicro has a pretty screwy BIOS that's confusing things. Oh, and USB can't be turned off in the BIOS. The only USB-related option is to enable/disable legacy mode (which I've tried).
This machine also netboots well, so I can muck around more next time I'm at the colo.
Anyone want to take a stab at this?
Full verbose boot is here:
http://paste.pocoo.org/show/222120/