Hi
Two of my FreeBSD ZFS servers rebooted after one of the hard disks in a RAID1 mirror failed. To reproduce the problem, I set up an identical server at the office and found that failing an HDD causes a FreeBSD kernel panic, after which the server reboots automatically.
Both servers are Supermicro 1U units with 4 disk bays, but the hardware is otherwise completely different (Xeon 1230v2 and Xeon 5405, 8 GB RAM, 4 x 2 TB SATA HDDs, AHCI with hot-swap enabled).
The server is running ZFS RAID10 (I am in the middle of replacing the failed HDD).
Code:
vol                         DEGRADED     0     0     0
  mirror-0                  ONLINE       0     0     0
    gpt/data-disk0          ONLINE       0     0     0
    gpt/data-disk1          ONLINE       0     0     0
  mirror-1                  DEGRADED     0     0     0
    replacing-0             UNAVAIL      0     0     0
      14892520269246031058  UNAVAIL      0     0     0  was /dev/gpt/data-disk2
      gpt/data_disk2        ONLINE       0     0     0  (resilvering)
    gpt/data-disk3          ONLINE       0     0     0
logs
  gpt/slog-disk0            ONLINE       0     0     0
cache
  gpt/l2arc-disk0           ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: resilvered 2.01G in 0h4m with 0 errors on Wed Oct 16 10:49:53 2013
config:

NAME              STATE     READ WRITE CKSUM
zroot             ONLINE       0     0     0
  mirror-0        ONLINE       0     0     0
    gpt/os-disk0  ONLINE       0     0     0
    gpt/os-disk1  ONLINE       0     0     0
  mirror-1        ONLINE       0     0     0
    gpt/os_disk2  ONLINE       0     0     0
    gpt/os-disk3  ONLINE       0     0     0
Below are the messages I extracted from /var/log/messages from just before the server crashed.
Code:
Oct 13 00:20:10 storage kernel: ahcich2: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Oct 13 00:21:58 storage kernel: ahcich2: Timeout on slot 27 port 0
Oct 13 00:21:58 storage kernel: ahcich2: is 00000000 cs 08000000 ss 00000000 rs 08000000 tfd 80 serr 00000000 cmd 00004017
Oct 13 00:21:58 storage kernel: (aprobe1:ahcich2:0:15:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
Oct 13 00:21:58 storage kernel: (aprobe1:ahcich2:0:15:0): CAM status: Unconditionally Re-queue Request
Oct 13 00:21:58 storage kernel: (aprobe1:ahcich2:0:15:0): Error 5, Retry was blocked
Oct 13 00:21:58 storage kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Oct 13 00:21:58 storage kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
Oct 13 00:21:58 storage kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
Oct 13 00:21:58 storage kernel: ahcich2: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Oct 13 00:21:58 storage kernel: ahcich2: Poll timeout on slot 27 port 15
Oct 13 00:21:58 storage kernel: ahcich2: is 00000000 cs 08000000 ss 00000000 rs 08000000 tfd 80 serr 00000000 cmd 0000c017
Oct 13 00:21:58 storage kernel: (aprobe1:ahcich2:0:15:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
Oct 13 00:21:58 storage kernel: (aprobe1:ahcich2:0:15:0): CAM status: Command timeout
Oct 13 00:21:58 storage kernel: (aprobe1:ahcich2:0:15:0): Error 5, Retries exhausted
Oct 13 00:21:58 storage kernel: ahcich2: Timeout on slot 27 port 0
Oct 13 00:21:58 storage kernel: ahcich2: is 00000000 cs 08000000 ss 00000000 rs 08000000 tfd 80 serr 00000000 cmd 0000c017
Oct 13 00:21:58 storage kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Oct 13 00:21:58 storage kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
Oct 13 00:21:58 storage kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
Oct 13 00:21:58 storage kernel: (pass3:(ada2:ahcich2:0:ahcich2:0:0:0:0): passdevgonecb: devfs entry is gone
Oct 13 00:21:58 storage kernel: 0): lost device
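In case it helps anyone compare against their own logs, here is a rough sketch in Python that tallies the kinds of messages above per AHCI channel. It is only an illustration: the function name `summarize` and the three event labels are made up for this example, and the matching is plain substring matching on the message text shown above, not any official log format.

```python
import re
from collections import Counter

# Matches an AHCI channel name such as "ahcich2" anywhere in a log line.
CHANNEL_RE = re.compile(r"\b(ahcich\d+)\b")


def summarize(lines):
    """Count reset, timeout and lost-device messages per AHCI channel.

    Hypothetical helper: the event labels are illustrative, and lines
    that do not mention an ahcich channel are skipped.
    """
    counts = Counter()
    for line in lines:
        match = CHANNEL_RE.search(line)
        if match is None:
            continue
        channel = match.group(1)
        lowered = line.lower()
        if "timeout" in lowered:
            counts[(channel, "timeout")] += 1
        elif "ahci reset" in lowered:
            counts[(channel, "reset")] += 1
        elif "devfs entry is gone" in lowered or "lost device" in lowered:
            counts[(channel, "device lost")] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the log above; pass the lines of your own
    # log file to summarize() to check a real system.
    sample = [
        "Oct 13 00:20:10 storage kernel: ahcich2: AHCI reset: device not ready after 31000ms (tfd = 00000080)",
        "Oct 13 00:21:58 storage kernel: ahcich2: Timeout on slot 27 port 0",
        "Oct 13 00:21:58 storage kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout",
    ]
    for (channel, event), n in sorted(summarize(sample).items()):
        print(channel, event, n)
```

On the log above, every matching line points at ahcich2, which is the channel of the disk that dropped out.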