server won't reboot or shutdown

Hi. I've installed FreeBSD 8.1 on a server (incidentally, it has a ZFS root filesystem). When I send either a reboot or shutdown command, it stops services and prepares to shut down, prints out the uptime, and then nothing. It remains on, but I can't ping it, I can't enter any commands, and I have to hit the power button to make it reboot or power off. I called customer support for the hardware; they had not heard of this problem before. Another co-worker set up a similar server with FreeBSD 8.1 on ZFS and he's having the same issue. Has anyone ever seen this before? Is this a product of ZFS, or is there possibly something else going on here?
 
The machines were both ordered from Silicon Mechanics. The phone support technician said that he'd never heard of anything like this, but that he'd check with a guy there who was more familiar with FreeBSD.

Anyway, the specs on the two different machines experiencing the same inability to reboot or shut down:


Code:
    Rackform iServ R143

    Details:
    Motherboard:   SuperMicro X8SIE-F
    CPU:  Intel Xeon X3430 Quad-Core 2.40GHz, 8MB Cache, 95W, 45nm
    RAM:  4GB (2 x 2GB) Operating at 1333MHz Max (DDR3-1333 ECC Registered 2R DIMMs)
    NIC:  Dual Gigabit Ethernet NICs (Intel 82574L) - Integrated
    Management:  Integrated IPMI 2.0 & KVM with Dedicated LAN
    PCIe x16 2.0:  No Item Selected
    Hot-Swap Drive - 1:  250GB Western Digital RE3 (3Gb/s, 7.2K RPM, 16MB Cache) 3.5" SATA
    Hot-Swap Drive - 2:  250GB Western Digital RE3 (3Gb/s, 7.2K RPM, 16MB Cache) 3.5" SATA
    Optical Drive:  Low-Profile DVD-ROM Drive
    Power Supply:  350W Power Supply with PFC - 80 PLUS Gold Certified
    RAID:  3Ware 9750-4i 6Gb/s SAS/SATA RAID (4-Port Int) 512MB Cache





Code:
    Storform iServ R513.v2.1

    Details:
    Motherboard:   (I'm not sure exactly what model, but it's one of the following) SuperMicro X8DT3 / X8DTi / X8DT3-F / X8DTi-F / X8DT3-LN4F / X8DTi-LN4F
    CPU:  2 x Intel Xeon E5620 Quad-Core 2.40GHz, 12MB Cache, 5.86GT/s QPI, 80W, 32nm
    RAM:  6GB (6 x 1GB) Operating at 1333MHz Max (DDR3-1333 ECC Unbuffered DIMMs)
    NIC:  Dual Intel 82574L Gigabit Ethernet Controller - Integrated
    Management:  Integrated IPMI 2.0 with KVM over LAN
    Ext. SAS Connector:  External SAS / SATA Connector for JBOD Expansion (SFF-8088) - Integrated
    Hot-Swap HDD:  12 x 1TB Seagate Constellation ES (6Gb/s, 7.2K RPM, 16MB Cache) 3.5" SAS
    System Volume:  60GB Boot Volume (Carved from RAID Array)
    LP PCIe 2.0 x8 - 1:  3ware 9750-4i, 6Gb/s SAS/SATA RAID (4-Port Int) 512MB Cache & BBU
    Power Supply:  920W High-Efficiency (94+%) Power Supply with PMBus - 80 PLUS Platinum Certified
    Configured Power:  446 W, 457 VA, 1521 BTU/h, 4.2 Amps (110V), 2.2 Amps (208V)

If you need more information from me, just ask.
Thanks.
 
I've done some more troubleshooting, and I've found something interesting:

The server actually does reboot; it just waits 60 minutes to do so. Just before it reboots, it spits out the following:

Code:
(da0:tws0:0:0:0): Synchronize cache failed, status == 0xb, scsi status == 0x0
Rebooting...
cpu_reset: Stopping other CPUs

Is this a problem with the 3ware driver? I'm not very well-versed in troubleshooting these sorts of errors in FreeBSD. Any help would be greatly appreciated.
 
I contacted LSI about this problem and was told that this is a known problem with the version of ZFS in FreeBSD; they asked me to look into updating ZFS. I'm running the current stable release, so there's no more recent version available. Can anyone confirm whether ZFS is on the radar for being updated anytime soon?
 
Did they mention which version of ZFS they are talking about? ZFSv6 (FBSD 7.0), ZFSv13 (FBSD 7.1, 7.2), ZFSv14 (FBSD 7.3, 8.0, 8.1), ZFSv15 (FBSD 8-STABLE aka 8.2) are all currently available on FreeBSD. And there are experimental patches that enable ZFSv28 on FreeBSD 9-CURRENT.

Which controller are you using? Does it have the latest firmware installed?

How are the drives configured (JBOD, Single, etc.)? How is the controller configured (cache enabled, BIOS enabled, etc.)?

How is the ZFS pool configured?
# zpool iostat -v
# zpool status

32-bit install or 64-bit install of FreeBSD?

If you boot off a LiveCD like Frenzy without ZFS enabled, can you reboot and shutdown properly?

If you boot off a LiveCD and import the pool, can you shutdown/reboot?
# /etc/rc.d/hostid start
# zpool import <poolname>

Just want to narrow down where the issue is (FreeBSD, ZFS, RAID controller, etc).
 
I have a few ZFS-only servers with similar specs to the first one (same motherboard, same processor, more RAM) and they have never experienced anything like this. The HBA, however, is an LSI1068E. Different systems use SAS and/or SATA drives.

This looks like a 3ware issue, perhaps a driver/firmware problem. Does disabling the controller cache resolve the issue?
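With 3ware's tw_cli utility, something like this should do it (a sketch; I am assuming the controller shows up as /c0 and the unit as /u0):

Code:
# tw_cli /c0/u0 show              (current unit settings, including cache state)
# tw_cli /c0/u0 set cache=off     (disable the unit's write cache)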
 
Firstly, thanks for your reply, phoenix.

phoenix said:
Did they mention which version of ZFS they are talking about? ZFSv6 (FBSD 7.0), ZFSv13 (FBSD 7.1, 7.2), ZFSv14 (FBSD 7.3, 8.0, 8.1), ZFSv15 (FBSD 8-STABLE aka 8.2) are all currently available on FreeBSD. And there are experimental patches that enable ZFSv28 on FreeBSD 9-CURRENT.
They were not specific, but I'll see if they can give me an answer about the minimum version they know to be functional.

Which controller are you using? Does it have the latest firmware installed?
As I mentioned before, both machines are using LSI 3ware 9750-4i controllers. Yes, they are using the current version of the firmware: FH9X 5.12.00.007.
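For reference, the running firmware is also visible in dmesg (the output line below is illustrative for the 4-port card; yours may differ slightly):

Code:
# dmesg | grep tws
tws0: Controller details: Model 9750-4i, 4 Phys, Firmware FH9X 5.12.00.007, BIOS BE9X 5.11.00.006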

How are the drives configured (JBOD, Single, etc.)? How is the controller configured (cache enabled, BIOS enabled, etc.)?
The two machines have different drive configurations: one has always been a single unit, while the other has gone through various changes ranging from single to RAID (described below). The controller on one is basically stock; I tried a number of configuration options and nothing made any difference, so I ended up changing it back to the defaults. The other machine has had no changes made to its controller's configuration.

How is the ZFS pool configured?
Here are the results from one machine. I can get more info from the other if you need it, but I'm not sure what state it is currently in.
# zpool iostat -v
Code:
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zroot       2.28G   222G      0      1  2.63K  14.6K
  da0s1a    2.28G   222G      0      1  2.63K  14.6K
----------  -----  -----  -----  -----  -----  -----
# zpool status
Code:
  pool: zroot
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          da0s1a    ONLINE       0     0     0

errors: No known data errors

32-bit install or 64-bit install of FreeBSD?
64-bit on both machines


If you boot off a LiveCD like Frenzy without ZFS enabled, can you reboot and shutdown properly?

If you boot off a LiveCD and import the pool, can you shutdown/reboot?
# /etc/rc.d/hostid start
# zpool import <poolname>
One machine is in production right now, so taking it down just to boot off a LiveCD might cause problems. And while I don't personally have physical access to the other machine right now, I can shed some more light on what's going on.

My coworker reinstalled FreeBSD a number of times just to troubleshoot the situation. There's no problem with a "plain vanilla install" of FreeBSD with tws.ko and opensolaris.ko loaded as long as there are no zpools defined. As soon as a zpool is defined, the machine hangs at reboot.
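Roughly, the reproduction looks like this (a sketch of what he did; the device name is an example):

Code:
# echo 'tws_load="YES"' >> /boot/loader.conf
# echo 'zfs_load="YES"' >> /boot/loader.conf    (pulls in opensolaris.ko as well)
# reboot                                        (completes normally)
# zpool create testpool da0
# reboot                                        (hangs after printing the uptime)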

Again, thanks for your help, phoenix. I felt like my tone throughout may have sounded a little curt, but I really do appreciate you lending your time to help me figure this out, and I like that your questions have a "just the facts, buddy" underpinning.
 
danbi said:
I have a few ZFS-only servers with similar specs to the first one (same motherboard, same processor, more RAM) and they have never experienced anything like this. The HBA, however, is an LSI1068E. Different systems use SAS and/or SATA drives.

This looks like a 3ware issue, perhaps a driver/firmware problem. Does disabling the controller cache resolve the issue?

As I replied to phoenix, I tried a number of configuration options regarding read and write caches in one of the controllers, and it made no difference.

Regarding the caches, I stumbled upon this blog post: Back in the sandbox…ZFS flushing shenanigans revisted. - Jason’s .plan. He says that disabling ZFS cache flushing via zfs_nocacheflush=1 resolved an issue for him. I have two questions about this:
  • Will this have any effect on my problem?
  • His directions are for Solaris. I'm still a little new to FreeBSD, so would it be set in /etc/sysctl.conf as vfs.zfs.cache_flush_disable=1?
 
Set it in /boot/loader.conf so that the loader sets it at boot, before the kernel initializes.

Double-check the output of # sysctl vfs.zfs | grep cache to get the exact sysctl name, then copy/paste it into loader.conf with =1 at the end.
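Something like this, for example (the illustrated output is what I'd expect on 8.1; verify on your system):

Code:
# sysctl vfs.zfs | grep cache
vfs.zfs.cache_flush_disable: 0
# echo 'vfs.zfs.cache_flush_disable=1' >> /boot/loader.conf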
 
Hi,

I'm basically having the same problem on two different servers running 8.1-RELEASE, one amd64, one i386. Before loading the zfs module, everything was working fine. Now I am unable to reboot; it just stops at the "Uptime" message, as it does for the original poster of this thread.

I have upgraded both servers to RELENG_8, but this didn't help either.

I suspect there is some problem with unloading the zfs module from the kernel at the end, but I don't know how to fix it. I have already tried patching the zfs rc script, setting the cache tunable, disabling ACPI, etc.; none of it helped. So it seems a code change is needed here :(

Or is there anything else to try?

I can provide any information, don't hesitate to ask.

Thanks in advance,
Gabriel
 
I just wanted to add a "Me Too" to this thread.

I am running FreeBSD 8.2-PRERELEASE from December 21st and as soon as I create a ZFS mount, the machine will hang forever after shutting down filesystems and showing uptime.

Tried,
Code:
vfs.zfs.cache_flush_disable=1
in /boot/loader.conf as well as disabling the caching on my 3ware 9750-8i, 8 Phys, Firmware FH9X 5.12.00.007, BIOS BE9X 5.11.00.006 card.

Is there a way to force creation of a ZFS pool with an older rev of ZFS? I know 8.X has the ability to work with older versions, and I assume those older versions work.
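I was thinking of something along these lines, if zpool create honors the version property on 8.X (I haven't verified that it does):

Code:
# zpool create -o version=13 tank da0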


Thanks,
Nicole
 
Starting to sound like an issue with the combination of tws(4) and ZFS.

We currently use nothing but 3Ware controllers with our ZFS boxes, using the twa(4) driver. No problems with rebooting on those systems, whether it's FreeBSD 7.0-7.3 or FreeBSD 8.0-8.2 RC3.

Maybe it's something new in the 9700-series hardware? And/or the tws(4) driver?

If possible, can you move the drives to the motherboard SATA ports and create a pool on there? If that allows you to reboot without issues, then it's almost certainly something in the combination of ZFS+tws.
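Something like this, as a quick test (pool and device names are just examples):

Code:
# zpool create testpool ad10
# shutdown -r now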

You could try downloading a snapshot ISO of FreeBSD 9.0 (aka -CURRENT), doing a test install, creating a pool, and seeing how things work.

Or, use a snapshot of 8-STABLE with ZFSv28 and see if the problem persists.
 
unixgirl said:
Is there a way to force creation of a ZFS pool with an older rev of ZFS? I know 8.X has the ability to work with older versions, and I assume those older versions work.

Hi, Nicole. From what I understand from 3ware's support, it's the newer versions of ZFS where this problem is resolved. I'm not saying it shouldn't be tested, but I wouldn't want you to put the time into downgrading with a good chance of it not working any better... especially when upgrading has a good chance of working for a similar amount of time/effort.

I had tried to contact 3ware again to find out exactly which version is needed to make it work properly, but I never received a response, and I no longer work at the organization with that server. As far as I was concerned, I was willing to wait for FreeBSD to include a newer version of ZFS, because, theoretically, this server wasn't going to be restarted often, and if it was, I'd be around to take care of it anyway. Understanding the cause was enough to keep me happy that it wasn't going to have destructive consequences.

So, if you do upgrade ZFS, can you please post whether or not it works, and which version of ZFS works?
 
Yes, it is definitely the 3ware tws driver.
1) Creating a ZFS pool on a non-tws disk does not prevent a reboot.

2) After making a gjournal on the tws-based disk, the devices vanish after a reboot. However, if I go into sysinstall and try to reallocate the slices (data/journal), it fails. But after exiting sysinstall, the devices are visible and mountable again.

I will contact 3ware; however, if anyone has better connections to them, please contact them as well. I guess I can see why I had to add the driver to the system myself: it is the only 3ware driver that does not come built in by default.


Thanks!

Nicole
 
Hi Forum,

I have a problem with the 3ware 9750-8i, too.

I'm running a freshly installed FreeBSD 8.2.

The mainboard's controller has two disks: ad8 and ad10.
FreeBSD is installed on ad8.

The 3ware 9750 controller has 8 SATA disks.

Short question: how can I access the 8 disks on the 3ware 9750?

dmesg shows the following:
Code:
dmesg | grep tws
tws0: <LSI 3ware SAS/SATA Storage Controller> port 0xe800-0xe8ff mem 0xfbffc000-0xfbffffff,0xfbf80000-0xfbfbffff irq 17 at device 0.0 on pci8
tws0: [ITHREAD]
tws0: Using legacy INTx
tws0: Controller details: Model 9750-8i, 8 Phys, Firmware FH9X 5.12.00.007, BIOS BE9X 5.11.00.006
tws0: <LSI 3ware SAS/SATA Storage Controller> port 0xe800-0xe8ff mem 0xfbffc000-0xfbffffff,0xfbf80000-0xfbfbffff irq 17 at device 0.0 on pci8
tws0: [ITHREAD]
tws0: Using legacy INTx
tws0: Controller details: Model 9750-8i, 8 Phys, Firmware FH9X 5.12.00.007, BIOS BE9X 5.11.00.006
tws0: INFO: (0x04: 0x001A): Drive inserted: phy=4
tws0: INFO: (0x04: 0x001A): Drive inserted: phy=5
tws0: INFO: (0x04: 0x001A): Drive inserted: phy=6
tws0: INFO: (0x04: 0x001A): Drive inserted: phy=0
tws0: INFO: (0x04: 0x001A): Drive inserted: phy=3
tws0: INFO: (0x04: 0x001A): Drive inserted: phy=2
tws0: INFO: (0x04: 0x001A): Drive inserted: phy=1
tws0: INFO: (0x04: 0x001A): Drive inserted: phy=7
tws0: <LSI 3ware SAS/SATA Storage Controller> port 0xe800-0xe8ff mem 0xfbffc000-0xfbffffff,0xfbf80000-0xfbfbffff irq 17 at device 0.0 on pci8
tws0: [ITHREAD]
tws0: Using legacy INTx
tws0: Controller details: Model 9750-8i, 8 Phys, Firmware FH9X 5.12.00.007, BIOS BE9X 5.11.00.006


sysinstall fdisk shows me only ad8 and ad10.

I'd like to use ZFS on the 9750's disks; they should be used as JBOD. I haven't created a RAID array in the controller's BIOS and haven't created any units. 3DM2 shows the status of all disks as OK.
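Would creating one Single unit per drive with tw_cli be the right approach? Something like the following, assuming the card shows up as /c0 (repeated for each port):

Code:
# tw_cli /c0 add type=single disk=0
# tw_cli /c0 add type=single disk=1    (and so on through port 7)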

Any idea?
 
New driver seems to solve the ZFS issue

As an update: I finally got an updated driver from 3ware/LSI to test, and it seems to have solved the issue with ZFS. I'm not sure when it will be available for download on their site. The driver I received was driver-10.80.00.003.tgz.

If it is not available, you should be able to get it via their technical support (which I have always found to be superb).


Nicole
 