I started out with FreeBSD 8.2-RELEASE on a Supermicro X8DTi-LN4F motherboard (Intel 5520 chipset) as a backup server, with the ZFS disks attached to an Intel SASUC8I in IT mode.
The FreeBSD OS is on a Crucial M4 SSD (zroot), with two Intel 80 GB SSDSA2CW080G3 SSDs for swap and ZFS L2ARC.
The SSDs are all connected to the on-board SATA ports on the mobo.
About three weeks ago I started getting kernel panics due to timeouts on these SATA devices.
The kernel messages were not conclusive as to which drive was the culprit.
I tried to eliminate potential sources of the issue by methodically replacing SATA cables, checking drives, checking memory, swapping the mobo for an identical one, etc.
I finally installed FreeBSD 9.0-RELEASE, which ran problem-free for over a week until I got an ahcich timeout this morning.
I have the following in /boot/loader.conf:
Code:
ahci_load="YES"
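# See ahci(4)
# sata_rev=1 limits each channel to SATA revision 1 (1.5 Gbps)
hint.ahcich.0.sata_rev=1
hint.ahcich.1.sata_rev=1
hint.ahcich.2.sata_rev=1
hint.ahcich.3.sata_rev=1
# pm_level=1 lets the device initiate link power management (host stays passive)
hint.ahcich.0.pm_level=1
hint.ahcich.1.pm_level=1
hint.ahcich.2.pm_level=1
hint.ahcich.3.pm_level=1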
I also set up a script to disable NCQ for my drives (ada0 through ada3).
Code:
#!/bin/sh
CAMCONTROL=/sbin/camcontrol
# Limit each drive to a single tag, which effectively disables NCQ
$CAMCONTROL tags ada0 -N 1 > /dev/null
$CAMCONTROL tags ada1 -N 1 > /dev/null
$CAMCONTROL tags ada2 -N 1 > /dev/null
$CAMCONTROL tags ada3 -N 1 > /dev/null
exit 0
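To check that the tag limit actually took effect, the current settings can be read back with `camcontrol tags <dev> -v`. A minimal sketch, assuming the same four drives; the `DRIVES` variable and the `show_tags` function are names I made up for illustration, not part of the script above:

```sh
#!/bin/sh
# Sketch: print the current tag (queue depth) settings for each drive.
# CAMCONTROL and DRIVES are hypothetical knobs; override them as needed.
CAMCONTROL=${CAMCONTROL:-/sbin/camcontrol}
DRIVES=${DRIVES:-"ada0 ada1 ada2 ada3"}

show_tags() {
    for d in $DRIVES; do
        echo "== $d =="
        # -v asks camcontrol for the verbose tag details of the device
        $CAMCONTROL tags "$d" -v
    done
}
```

Calling `show_tags` after the NCQ script has run should make it obvious if one of the drives did not pick up the single-tag limit.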
I'm pretty sure that my issue is a bad SSD, but I wanted feedback on what I'm seeing in my smartctl output.
None of these drives show SMART events.
I ran the following commands for each drive (full output attached):
Code:
smartctl -a /dev/blah
smartctl -l devstat /dev/blah
smartctl -l sataphy /dev/blah
smartctl -l ssd /dev/blah
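For reference, running those four commands against every drive can be wrapped in a small loop. This is only a sketch: the `smart_report` and `collect_reports` function names, the `DRIVES` list, and the `smart-<dev>.txt` output file names are my own choices, and it assumes `smartctl` is on the PATH.

```sh
#!/bin/sh
# Sketch: collect the same smartctl reports for each drive into one file per drive.
# SMARTCTL and DRIVES are hypothetical knobs; override them as needed.
SMARTCTL=${SMARTCTL:-smartctl}
DRIVES=${DRIVES:-"ada0 ada1 ada2 ada3"}

smart_report() {
    # $1 is a device name without the /dev/ prefix, e.g. ada0
    $SMARTCTL -a "/dev/$1"
    for log in devstat sataphy ssd; do
        $SMARTCTL -l "$log" "/dev/$1"
    done
}

collect_reports() {
    for d in $DRIVES; do
        # Keep stderr too, so unsupported logs show up in the report
        smart_report "$d" > "smart-$d.txt" 2>&1
    done
}
```

Running `collect_reports` leaves one text file per drive in the current directory, which makes it easier to diff the drives against each other.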
The only odd thing I'm seeing (as far as I can tell from the patterns) is that two of the devices show a high number of ASR events and hardware resets relative to the other devices.
Code:
ada0 - FreeBSD zroot disk
0 ASR events
0 hardware resets
ada1 - swap
0 ASR events
0 hardware resets
ada2 - FreeBSD zroot disk
43 ASR events
160 hardware resets
ada3 - L2ARC
25607 ASR events
180 hardware resets
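To pull just these two counters out of the devstat log instead of reading the full output, a filter along these lines might do. It is a sketch that assumes the relevant lines contain the phrases "Hardware Resets" and "ASR Events" (matched case-insensitively); the exact wording may differ between smartmontools versions, and `transport_counters` is a name I made up.

```sh
#!/bin/sh
# Sketch: keep only the hardware reset and ASR event lines from devstat output.
# Assumes the log lines contain "Hardware Resets" / "ASR Events" in some case.
transport_counters() {
    # Reads `smartctl -l devstat` output on stdin
    grep -i -E 'hardware resets|asr events'
}
```

Usage would be something like `smartctl -l devstat /dev/ada3 | transport_counters`, repeated for each drive.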
Going by the proportions, I would assume that ada2 is a little wonky and ada3 is a problem.
I haven't been able to find anything that elaborates on ASR events and hardware resets as reported by smartmontools, so I'm looking for feedback on whether I'm on the right track or not.
Any help, direction, etc. would be much appreciated.