Hi, I am running FreeBSD 11.0-CURRENT #3 r296473M (built Mon Mar 7 16:53:46 PST 2016) on this AMD box:
Code:
hw.machine: amd64
hw.model: AMD FX(tm)-8350 Eight-Core Processor
hw.ncpu: 8
hw.byteorder: 1234
hw.physmem: 16870957056
hw.usermem: 8616914944
Code:
# camcontrol devlist -vvv
scbus0 on ahcich0 bus 0:
<WDC WD4000F9YZ-09N20L0 01.01A01> at scbus0 target 0 lun 0 (ada0,pass0)
<> at scbus0 target -1 lun ffffffff ()
scbus1 on ahcich1 bus 0:
<WDC WD4000F9YZ-09N20L0 01.01A01> at scbus1 target 0 lun 0 (ada1,pass1)
<> at scbus1 target -1 lun ffffffff ()
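Going by the devlist above, ahcich0 carries ada0 and ahcich1 carries ada1, so errors on a channel point at one specific disk. Here is a minimal Python sketch that pulls that mapping out of the pasted output. The helper name channel_to_disks is my own (hypothetical), and it assumes the devlist format exactly as shown above.

```python
import re

# Output of "camcontrol devlist -vvv" as pasted above.
DEVLIST = """\
scbus0 on ahcich0 bus 0:
<WDC WD4000F9YZ-09N20L0 01.01A01> at scbus0 target 0 lun 0 (ada0,pass0)
<> at scbus0 target -1 lun ffffffff ()
scbus1 on ahcich1 bus 0:
<WDC WD4000F9YZ-09N20L0 01.01A01> at scbus1 target 0 lun 0 (ada1,pass1)
<> at scbus1 target -1 lun ffffffff ()
"""


def channel_to_disks(devlist: str) -> dict[str, list[str]]:
    """Map each AHCI channel (ahcichN) to the adaN devices on its CAM bus."""
    bus_to_chan: dict[str, str] = {}
    result: dict[str, list[str]] = {}
    for line in devlist.splitlines():
        # Bus header lines: "scbusN on ahcichM bus 0:"
        m = re.match(r"(scbus\d+) on (\S+) bus \d+:", line)
        if m:
            bus_to_chan[m.group(1)] = m.group(2)
            result.setdefault(m.group(2), [])
            continue
        # Device lines end in "(periph,periph,...)"; empty "()" placeholders are skipped.
        m = re.search(r"at (scbus\d+) target \S+ lun \S+ \(([^)]*)\)", line)
        if m and m.group(2):
            chan = bus_to_chan[m.group(1)]
            result[chan].extend(p for p in m.group(2).split(",") if p.startswith("ada"))
    return result


print(channel_to_disks(DEVLIST))
```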
I've long been ignoring these errors, which show up in dmesg every 30 minutes or so:
Code:
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000400 tfd 50 serr 00000000 cmd 00046a17
ahcich0: Timeout on slot 23 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00800000 tfd 50 serr 00000000 cmd 00047717
ahcich0: Timeout on slot 4 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000010 tfd 50 serr 00000000 cmd 00046417
ahcich0: Timeout on slot 9 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000200 tfd 50 serr 00000000 cmd 00046917
ahcich0: Timeout on slot 28 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 10000000 tfd 50 serr 00000000 cmd 00047c17
ahcich0: Timeout on slot 1 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000002 tfd 50 serr 00000000 cmd 00046117
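Each "is ... cmd ..." line is a register dump printed right after the matching "Timeout on slot N" line. Below is a minimal Python sketch that decodes two of them. It assumes, from my reading of ahci(4) and the AHCI spec and not verified against the source, that cs is the commands-issued register, ss the NCQ SActive register, rs the driver's running-slot bitmask, tfd the task file data (low byte = ATA status), and bits 12:8 of cmd the current command slot. Under that reading, for the pasted lines the bit set in rs and the slot in cmd both match the slot named in the preceding Timeout line, cs, ss and serr are all 0, and the status byte 0x50 has neither BSY (0x80) nor ERR (0x01) set. In other words, the drive is not flagging an error for the command that timed out.

```python
import re

# Two register dumps copied from the dmesg excerpt above; each one follows a
# "Timeout on slot N port 0" line (slots 23 and 4 respectively).
DUMP = [
    "ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00800000 tfd 50 serr 00000000 cmd 00047717",
    "ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000010 tfd 50 serr 00000000 cmd 00046417",
]

DUMP_RE = re.compile(
    r"(ahcich\d+): is (\w+) cs (\w+) ss (\w+) rs (\w+) tfd (\w+) serr (\w+) cmd (\w+)"
)


def set_bits(mask: int) -> list[int]:
    """Indices of the bits set in a 32-bit slot mask (bit N = slot N)."""
    return [i for i in range(32) if (mask >> i) & 1]


def decode_dump(line: str) -> dict:
    m = DUMP_RE.search(line)
    if m is None:
        raise ValueError(f"not an ahci register dump: {line!r}")
    is_, cs, ss, rs, tfd, serr, cmd = (int(g, 16) for g in m.groups()[1:])
    status = tfd & 0xFF  # ATA status byte: 0x80 = BSY, 0x01 = ERR
    return {
        "channel": m.group(1),
        "running_slots": set_bits(rs),     # assumption: driver's running-slot mask
        "issued_slots": set_bits(cs),      # assumption: commands-issued register
        "ncq_active_slots": set_bits(ss),  # assumption: NCQ SActive register
        # Assumption: bits 12:8 of the port command register = current command slot.
        "current_slot": (cmd >> 8) & 0x1F,
        "status_busy": bool(status & 0x80),
        "status_error": bool(status & 0x01),
        "is": is_,
        "serr": serr,
    }


for line in DUMP:
    print(decode_dump(line))
```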
But what actually got my attention were the following errors:
Code:
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 3 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000008 tfd 50 serr 00000000 cmd 00046317
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
ahcich0: Timeout on slot 4 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000010 tfd 50 serr 00000000 cmd 00046417
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retry was blocked
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD4000F9YZ-09N20L0 01.01A01> s/n WD-WCC4A0011551 detached
(ada0:ahcich0:0:0:0): Periph destroyed
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD4000F9YZ-09N20L0 01.01A01> ATA8-ACS SATA 3.x device
ada0: Serial Number WD-WCC4A0011551
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 3815447MB (7814037168 512 byte sectors)
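Reading the sequence above: the IDENTIFY probe (ATA command 0xEC, the first ACB byte) times out three times, CAM gives up with error 5 (EIO), and then ada0 is destroyed and re-attached with the same serial number. A small Python sketch that tallies this from an abridged copy of the log follows. The helper name summarize_probe is hypothetical, and treating the first ACB byte as the ATA command register is my assumption about CAM's output format.

```python
import errno
import re

# Probe/detach messages abridged from the dmesg excerpt above.
LOG = """\
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Retrying command
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retry was blocked
ada0: <WDC WD4000F9YZ-09N20L0 01.01A01> s/n WD-WCC4A0011551 detached
ada0: Serial Number WD-WCC4A0011551
"""

ATA_IDENTIFY_DEVICE = 0xEC  # ATA command code for IDENTIFY DEVICE


def summarize_probe(log: str) -> dict:
    """Tally IDENTIFY attempts, timeouts, CAM errors and detach/re-attach events."""
    summary = {"identify_attempts": 0, "timeouts": 0, "errors": [],
               "detached": [], "reattached": []}
    for line in log.splitlines():
        # Assumption: the first ACB byte is the ATA command register.
        m = re.search(r"ACB: ([0-9a-f]{2})", line)
        if m and int(m.group(1), 16) == ATA_IDENTIFY_DEVICE:
            summary["identify_attempts"] += 1
        if "CAM status: Command timeout" in line:
            summary["timeouts"] += 1
        m = re.search(r"Error (\d+),", line)
        if m:
            # Map the number to its errno name (5 is EIO on FreeBSD and Linux).
            summary["errors"].append(errno.errorcode.get(int(m.group(1)), "?"))
        m = re.search(r"s/n (\S+) detached", line)
        if m:
            summary["detached"].append(m.group(1))
        m = re.search(r"Serial Number (\S+)", line)
        if m:
            summary["reattached"].append(m.group(1))
    return summary


print(summarize_probe(LOG))
```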
And here is my zpool status:
Code:
# zpool status
  pool: zroot
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 12K in 0h0m with 0 errors on Fri Apr 22 04:48:54 2016
config:

        NAME                               STATE     READ WRITE CKSUM
        zroot                              ONLINE       0     0     0
          mirror-0                         ONLINE       0     0     0
            diskid/DISK-WD-WCC4A0011551p3  ONLINE       0     0     0
            diskid/DISK-WD-WCC4A0012485p3  ONLINE       0     0     0
So it looks like a disk dropped off the bus, got detected again, and ZFS resilvered it.
I see both types of errors on both ahcich0 and ahcich1.
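To compare how often each channel times out, here is a small Python sketch that counts "Timeout on slot N port M" messages per channel. The helper name timeouts_per_channel is hypothetical. The sample below holds only ahcich0 lines from the excerpt above. Feeding it the full dmesg or /var/log/messages text should show whether ahcich1 sees as many timeouts as ahcich0.

```python
import re
from collections import Counter

# Matches lines such as "ahcich0: Timeout on slot 23 port 0".
TIMEOUT_RE = re.compile(r"^(ahcich\d+): Timeout on slot \d+ port \d+$", re.MULTILINE)


def timeouts_per_channel(text: str) -> Counter:
    """Return a Counter mapping 'ahcichN' to its number of slot timeouts."""
    return Counter(TIMEOUT_RE.findall(text))


# A few lines from the dmesg excerpt above (ahcich0 only).
SAMPLE = """\
ahcich0: Timeout on slot 23 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00800000 tfd 50 serr 00000000 cmd 00047717
ahcich0: Timeout on slot 4 port 0
ahcich0: Timeout on slot 9 port 0
"""

print(timeouts_per_channel(SAMPLE))
```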
I'm attaching the smartctl -a output for both disks (ada0_smartctl.txt and ada1_smartctl.txt). I think something is wrong with them, because smartctl(8) can't retrieve the data due to an I/O error.
Do I have failed disks, or is it bad firmware?
I'd appreciate any help. I can provide more info/data. Thanks in advance.
(BTW, I have taken the necessary backups.)