Other: AHCI SSD issue after updating FreeBSD from 10 to 11

TheFeaR


[UPDATE 04/08/2019]
kern.cam.ada.%d.quirks is not a sysctl variable; it is a loader tunable.
You should add it to /boot/loader.conf.
I confirm that it works.
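
For example, a minimal sketch of the entry, assuming the affected drive is ada1 (as in the dmesg output below) and that it has 512-byte sectors, so the value 2 from the release notes applies. Adjust the unit number and value for your own drive:

```
# /boot/loader.conf
# Assumption: the drive is ada1 with 512-byte sectors.
# Per the release notes, use 3 instead of 2 for 4096-byte sectors.
kern.cam.ada.1.quirks="2"
```

The setting is read at boot, so a reboot is needed for it to take effect.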



Hello! Let me describe my problem and how I fixed it. I hope Google will index it and that it helps someone.
Yesterday I decided to update my HP Microserver from FreeBSD 10 to 11 and ran into some strange issues. After the new kernel was installed, I noticed severe system freezes. In dmesg(8) I found the following messages:
Code:
ahcich2: is 00000000 cs 00000000 ss 00000800 rs 00000800 tfd 40 serr 00000000 cmd 000ceb17
(ada1:ahcich2:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 01 00 00 00 40 00 00 00 00 00 00
(ada1:ahcich2:0:0:0): CAM status: Command timeout
(ada1:ahcich2:0:0:0): Retrying command
ahcich2: Timeout on slot 17 port 0
ahcich2: is 00000000 cs 00000000 ss 00020000 rs 00020000 tfd 40 serr 00000000 cmd 000cf117
(ada1:ahcich2:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 01 00 00 00 40 00 00 00 00 00 00
(ada1:ahcich2:0:0:0): CAM status: Command timeout
(ada1:ahcich2:0:0:0): Retrying command
ahcich2: Timeout on slot 23 port 0
ahcich2: is 00000000 cs 00000000 ss 00800000 rs 00800000 tfd 40 serr 00000000 cmd 000cf717
(ada1:ahcich2:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 01 00 00 00 40 00 00 00 00 00 00
(ada1:ahcich2:0:0:0): CAM status: Command timeout
(ada1:ahcich2:0:0:0): Retrying command
ahcich2: Timeout on slot 29 port 0
ahcich2: is 00000000 cs 00000000 ss 20000000 rs 20000000 tfd 40 serr 00000000 cmd 000cfd17
(ada1:ahcich2:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 01 00 00 00 40 00 00 00 00 00 00
(ada1:ahcich2:0:0:0): CAM status: Command timeout
(ada1:ahcich2:0:0:0): Retrying command
ahcich2: Timeout on slot 3 port 0
ahcich2: is 00000000 cs 00000000 ss 00000008 rs 00000008 tfd 40 serr 00000000 cmd 000ce317
(ada1:ahcich2:0:0:0): SEND_FPDMA_QUEUED DATA SET MANAGEMENT. ACB: 64 01 00 00 00 40 00 00 00 00 00 00
(ada1:ahcich2:0:0:0): CAM status: Command timeout
(ada1:ahcich2:0:0:0): Error 5, Retries exhausted
The system couldn't write anything to the SSD, but the ZFS pool didn't become degraded because the device was still present in the system. I think it would have been better if the zpool had "lost" the device, but it didn't.


I googled a lot, and every thread I found said "your SSD is probably dead".

My SSD is a Netac N530S. I thought the SSD might really be almost dead, so I swapped it for a known-good Samsung SSD and got exactly the same problem.

After more searching I found that FPDMA is related to NCQ and tried to disable it through camcontrol(8), but that didn't help: NCQ was disabled, but Receive & Send FPDMA Queued, which causes this problem, was not.

Code:
pass5: <Netac SSD 120GB V2.5> ACS-3 ATA SATA 3.x device
pass5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 512bytes)

protocol              ATA/ATAPI-10 SATA 3.x
device model          Netac SSD 120GB
firmware revision     V2.5
serial number         G14273J029238
WWN                   502b2a201d1c1b1a
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         234441648 sectors
LBA48 supported       234441648 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             non-rotating

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes    yes
write cache                    yes    yes
flush cache                    yes    yes
overlap                        no
Tagged Command Queuing (TCQ)   no    no
Native Command Queuing (NCQ)   yes        32 tags
NCQ Queue Management           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    yes
SMART                          yes    yes
microcode download             yes    yes
security                       no    no
power management               yes    yes
advanced power management      yes    no    254/0xFE
automatic acoustic management  no    no
media status notification      no    no
power-up in Standby            no    no
write-read-verify              no    no
unload                         no    no
general purpose logging        yes    yes
free-fall                      no    no
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks       yes              8
DSM - deterministic read       yes              zeroed
Host Protected Area (HPA)      no
Finally I found the release notes at https://www.freebsd.org/releases/11.0R/relnotes.html
which contain the following lines:
Code:
The ahci(4) driver has been updated to add NCQ TRIM support for drives that support it. [r298002] (Sponsored by Netflix)

Note:
Drives that advertise this feature but do not properly support it have been blacklisted. Systems experiencing traffic problems with NCQ TRIM enabled can set the kern.cam.ada.%d.quirks tunable to 2 for 512k sectors or 3 for 4096k sectors, replacing %d with the drive number.
I tried to set kern.cam.ada.1.quirks, but there is no such variable in my sysctl.


In the end I downloaded the FreeBSD 10 sources, replaced the ahci(4) driver sources in my FreeBSD 11 tree with them, and recompiled and installed the kernel. Now everything works as expected.


Is there any other way to disable or blacklist NCQ TRIM for a specific device?
 

dR3b


Try one of these:
Code:
vfs.unmapped_buf_allowed=0
or
Code:
hint.ahci.X.msi=0
Replace X with the unit number of your AHCI controller.
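
Both of these are boot-time tunables, so as a rough sketch (assuming they are placed in /boot/loader.conf, and leaving X as a placeholder that you must fill in yourself) they could look like this. Try them one at a time:

```
# /boot/loader.conf (try one setting at a time, then reboot)
vfs.unmapped_buf_allowed="0"
# X below is a placeholder for the AHCI controller unit number;
# replace it before uncommenting the line.
#hint.ahci.X.msi="0"
```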
 