Disk errors on FreeBSD 11/amd box

Hi, I am running FreeBSD 11.0-CURRENT (#3 r296473M, built Mon Mar 7 16:53:46 PST 2016) on this AMD box:
Code:
hw.machine: amd64
hw.model: AMD FX(tm)-8350 Eight-Core Processor
hw.ncpu: 8
hw.byteorder: 1234
hw.physmem: 16870957056
hw.usermem: 8616914944

Code:
# camcontrol devlist -vvv
scbus0 on ahcich0 bus 0:
<WDC WD4000F9YZ-09N20L0 01.01A01>  at scbus0 target 0 lun 0 (ada0,pass0)
<>  at scbus0 target -1 lun ffffffff ()
scbus1 on ahcich1 bus 0:
<WDC WD4000F9YZ-09N20L0 01.01A01>  at scbus1 target 0 lun 0 (ada1,pass1)
<>  at scbus1 target -1 lun ffffffff ()

I have long been ignoring these errors, which show up in dmesg every 30 minutes or so:
Code:
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000400 tfd 50 serr 00000000 cmd 00046a17
ahcich0: Timeout on slot 23 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00800000 tfd 50 serr 00000000 cmd 00047717
ahcich0: Timeout on slot 4 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000010 tfd 50 serr 00000000 cmd 00046417
ahcich0: Timeout on slot 9 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000200 tfd 50 serr 00000000 cmd 00046917
ahcich0: Timeout on slot 28 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 10000000 tfd 50 serr 00000000 cmd 00047c17
ahcich0: Timeout on slot 1 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000002 tfd 50 serr 00000000 cmd 00046117
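
In the dumps above, the single bit set in the rs field lines up with the slot number in the Timeout line printed just before it (for example, rs 00800000 has bit 23 set and follows "Timeout on slot 23"). The Python sketch below checks that pairing across a longer dmesg capture. The function names are made up for illustration, and reading rs as a per-slot bitmask is an inference from this log, not something taken from the driver documentation.

```python
import re

# Patterns for the two message shapes seen in the log above.
TIMEOUT_RE = re.compile(r"^(ahcich\d+): Timeout on slot (\d+) port (\d+)")
DUMP_RE = re.compile(r"^(ahcich\d+): is ([0-9a-f]{8}) .*\brs ([0-9a-f]{8})\b")


def set_bits(mask):
    """Return the positions of the bits set in a 32-bit mask."""
    return [bit for bit in range(32) if mask & (1 << bit)]


def pair_timeouts(lines):
    """Yield (channel, slot, rs_bits) for each Timeout line followed by a dump.

    Assumption: a register dump belongs to the most recent Timeout line
    seen on the same channel.
    """
    pending = {}
    for line in lines:
        line = line.strip()
        m = TIMEOUT_RE.match(line)
        if m:
            pending[m.group(1)] = int(m.group(2))
            continue
        m = DUMP_RE.match(line)
        if m and m.group(1) in pending:
            slot = pending.pop(m.group(1))
            yield m.group(1), slot, set_bits(int(m.group(3), 16))


# Two pairs copied from the dmesg output above.
sample = """\
ahcich0: Timeout on slot 23 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00800000 tfd 50 serr 00000000 cmd 00047717
ahcich0: Timeout on slot 4 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000010 tfd 50 serr 00000000 cmd 00046417
""".splitlines()

for channel, slot, rs_bits in pair_timeouts(sample):
    print(channel, "slot", slot, "rs bits", rs_bits)
```

If the reported slot is ever missing from the decoded rs bits, that mismatch would be worth including when reporting the problem.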

But what made me take notice is the following errors:
Code:
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 3 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000008 tfd 50 serr 00000000 cmd 00046317
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
ahcich0: Timeout on slot 4 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000010 tfd 50 serr 00000000 cmd 00046417
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Error 5, Retry was blocked
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD4000F9YZ-09N20L0 01.01A01> s/n WD-WCC4A0011551 detached
(ada0:ahcich0:0:0:0): Periph destroyed
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD4000F9YZ-09N20L0 01.01A01> ATA8-ACS SATA 3.x device
ada0: Serial Number WD-WCC4A0011551
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 3815447MB (7814037168 512 byte sectors)

And my zpool status:
Code:
# zpool status
  pool: zroot
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
  still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
  the pool may no longer be accessible by software that does not support
  the features. See zpool-features(7) for details.
  scan: resilvered 12K in 0h0m with 0 errors on Fri Apr 22 04:48:54 2016
config:

  NAME                                STATE     READ WRITE CKSUM
  zroot                               ONLINE       0     0     0
    mirror-0                          ONLINE       0     0     0
      diskid/DISK-WD-WCC4A0011551p3   ONLINE       0     0     0
      diskid/DISK-WD-WCC4A0012485p3   ONLINE       0     0     0

It looks like a disk dropped off the bus, was detected again, and ZFS resilvered it.

I see both types of errors on both ahcich0 and ahcich1.
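
To compare how often each channel is affected, the two kinds of messages could be tallied per channel from saved dmesg text. This is only a rough sketch: the function name and the "slot"/"cam" labels are hypothetical, and the sample lines are copied from the ahcich0 output in this post.

```python
import re
from collections import Counter

# "ahcichN: Timeout on slot S port P" lines from the AHCI channel.
SLOT_RE = re.compile(r"^(ahcich\d+): Timeout on slot \d+ port \d+")
# "(periph:ahcichN:b:t:l): CAM status: Command timeout" lines from CAM.
CAM_RE = re.compile(r"^\(\w+:(ahcich\d+):\d+:\d+:\d+\): CAM status: Command timeout")


def count_timeouts(lines):
    """Count slot timeouts and CAM command timeouts per channel.

    Returns a Counter keyed by (channel, kind), where kind is the
    made-up label "slot" or "cam".
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        m = SLOT_RE.match(line)
        if m:
            counts[(m.group(1), "slot")] += 1
            continue
        m = CAM_RE.match(line)
        if m:
            counts[(m.group(1), "cam")] += 1
    return counts


# Lines copied from the dmesg output in this post.
sample = """\
(aprobe0:ahcich0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich0:0:0:0): CAM status: Command timeout
(aprobe0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 3 port 0
ahcich0: is 00000002 cs 00000000 ss 00000000 rs 00000008 tfd 50 serr 00000000 cmd 00046317
""".splitlines()

for (channel, kind), n in sorted(count_timeouts(sample).items()):
    print(channel, kind, n)
```

Running the same function over a full dmesg capture would show whether ahcich0 and ahcich1 time out at similar rates. Similar rates on both channels would point away from a single bad disk, though that is a heuristic and not a diagnosis.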

I am attaching the smartctl -a output for both disks (ada0_smartctl.txt and ada1_smartctl.txt). I think something is wrong with them, since smartctl(8) cannot read the data because of an I/O error.

Do I have failed disks, or bad firmware?

I'd appreciate any help. I can provide more info/data. Thanks in advance.

(BTW, I have taken the necessary backups. :))
 

Wait what? I am not asking for "support". I am asking for hints/insights if anyone has any.

Well, I can always take this to a mailing list. Let me know.
 
Note that discussion concerning the use of FreeBSD-CURRENT takes place there.
 
We are not exactly providing support here, since no one is paid to do so, but do note that FreeBSD-CURRENT is the developers' playground and can break unexpectedly in many ways. The number of variables involved in troubleshooting can be tenfold compared to the release or stable branches, and the causes of the problems are often beyond the skill and knowledge of the people who frequently post here.
 