FreeBSD refuses to boot with ZFS disks attached

I'm building a fileserver for someone. I installed FreeBSD on one disk, added three 2 TB disks, and put them in a zpool. The thing is that the server refuses to boot as long as these disks are attached. There was no problem before I created the zpool, so the problem has to be with ZFS.
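Roughly, the pool was created along these lines; the pool name, the raidz layout and the device names below are just placeholders, since I don't have the exact command in front of me anymore:
Code:
# create a pool from the three 2 TB disks (name, layout and devices are examples)
zpool create tank raidz ada1 ada2 ada3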

This is the error I get:
Code:
ahcich1: Error while READ LOG EXT
ahcich1: Timeout on slot 5 port 0
ahcich1: is 00000000 cs 00000020 ss 00000000 rs 00000020 tfd 50 serr 00000000 cmd 0004c517
ahcich1: Error while READ LOG EXT
ahcich1: Timeout on slot 6 port 0
ahcich1: is 00000000 cs 00000040 ss 00000000 rs 00000040 tfd 50 serr 00000000 cmd 0004c617
ahcich1: Error while READ LOG EXT
ahcich1: Timeout on slot 7 port 0
ahcich1: is 00000000 cs 00000080 ss 00000000 rs 00000080 tfd 50 serr 00000000 cmd 0004c717
I have no idea why it won't work, let alone how to fix it. :(
 
Do you only get these errors appearing for ahcich1?
If so, I would suggest removing that disk first (if you're lucky it'll be the second port on the controller/motherboard) and seeing if the system will boot without it. If your pool layout is redundant, you should be able to pull the one disk and replace it if needed fairly easily.
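Something like the following should show whether the layout can survive losing that disk, and let you take it offline cleanly before pulling it (pool and device names are just examples):
Code:
# check pool layout and health (pool name is an example)
zpool status -v tank
# if the layout is redundant, take the suspect disk offline before pulling it
zpool offline tank ada1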

I don't think this is a ZFS problem. I just think it's trying to read information from the disk during pool import that it should normally be able to read, but is failing. Possibly even dodgy SATA cables.

Also what version of FreeBSD is installed?
 
This is a brand new motherboard. I seriously hope the SATA controller is fine.

I also have a second SATA controller in a PCI-e slot, with 2 SATA ports. The disk with FreeBSD is connected to that controller, on port 0 (since that is always ada0, and FreeBSD won't boot if it isn't on ada0, for some reason). On port 1 is one ZFS disk. If I disconnect that disk and connect it to a free SATA port on the motherboard, I don't get the ahcich1 errors. Instead the system just hangs during boot and gives no errors at all.
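Side note: I suspect the ada0 requirement just comes from /etc/fstab pointing at /dev/ada0p2. Something like the following, using a GPT label instead of the raw device name, should presumably make the root mount independent of which port the disk sits on, though I haven't tried it and the partition index is a guess:
Code:
# label the root partition (index 2 is a guess) so it shows up as /dev/gpt/rootfs
gpart modify -i 2 -l rootfs ada0
# then in /etc/fstab:
# /dev/gpt/rootfs   /   ufs   rw   1   1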

By the way: I left the server on, spitting ahcich1 errors, and after a while I was able to log in via SSH. Everything seems to work fine, but I obviously won't put this server to work in this condition.
 
Alright. I connected the disk with FreeBSD to port 0 on the PCI-e SATA controller and connected the 3 ZFS disks to SATA ports on the motherboard. During boot it kept hanging for a while at this message (right before it should mount ada0p2, where FreeBSD is located):

Code:
ums0: 5 buttons and [XYZ] coordinates ID=0

But after a while it booted anyway. After a reboot it booted smoothly, without any errors or pauses.

I suppose it's not possible to spread the ZFS disks over multiple SATA controllers. :\

Edit: Ignore the above. I see I didn't have the ZFS disks attached at all. Apparently I need more coffee...

So, the current status is that FreeBSD hangs during boot, right before the point where it should mount ada0p2, which is the FreeBSD partition.
 
With all the ZFS disks attached to the motherboard, it will hang for a few minutes at
Code:
ums0: 5 buttons and [XYZ] coordinates ID=0
Then it tries to mount the FreeBSD partition for about half an hour and eventually boots. This happens every time the server is booted. I didn't have this problem before the three 2 TB disks were added to a zpool, so I seriously think this is a ZFS issue.
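For reference, a quick way to double-check which disk ends up on which controller after moving cables around is camcontrol, which lists every detected disk together with the bus it sits on:
Code:
# list all detected disks and the controller/bus each one is attached to
camcontrol devlist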

@usdmatt. I'm running an up-to-date version of FreeBSD 9.0
 
I think I found the issue. This thread got me on the right track.

The motherboard supports SATA hotplug, so I disconnected the ZFS disks and booted the system. Then I connected the ZFS disks and tried to import the zpool. This took several minutes, but not as long as it did during boot earlier. Everything seemed to work fine, but I decided to do a check with smartmontools. This revealed several errors on one of the 2 TB disks. Just one day old! Coincidentally (is it? IS IT?), this also happened to be the disk that was attached to port 1 of the PCI-e SATA controller, spitting out a lot of ahcich1 errors.
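For completeness, the steps were roughly the following; the pool name and the device name of the bad disk are from memory and may be off:
Code:
# after hotplugging the disks: list importable pools, then import
zpool import
zpool import tank
# check the SMART health of each disk (ada1 is just an example)
smartctl -a /dev/ada1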

I'm sending the drive back. Hopefully everything will work smoothly once it is replaced with a new disk.

Update: With the faulty disk removed, the system boots fine and imports the zpool without any problems. Well, the zpool is obviously degraded, but I'm glad I finally figured out the source of the problem :)
 
It was a WD Caviar Green WD20EARX.

I know, I know. Not really the first choice for a (file)server. But this one is intended for home use. Not enterprise/heavy use. So I decided to go for the green disks. And in case one crashes, well, that's why I use ZFS ;)
 
I had the same problem when one of my Samsung drives started dying; disconnecting it let the system boot with the pool in degraded mode without any problems.
So maintaining uptime upon disk failure can be harder than you think :P
 
I got a new drive today. I was able to resilver the degraded zpool and everything works like a charm. Even when it's connected to the PCI-e SATA controller. :beergrin
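In case anyone runs into the same thing, the replacement boiled down to something like this (pool and device names are examples; the exact commands may have differed slightly):
Code:
# rebuild onto the new disk; if it shows up under the old device name,
# a single device argument is enough
zpool replace tank ada1
# watch the resilver progress
zpool status tank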
 