Other Kernel: ahcich: Timeout in slot

WCSN

Hello!
After upgrading to FreeBSD 10.1-RELEASE-p23 I started having a problem. System details:
Code:
> uname -a
FreeBSD wfid78-172 10.1-RELEASE-p23 FreeBSD 10.1-RELEASE-p23 #0: Thu May 14 13:35:13 UTC 2015
root@amd64-builder.pcbsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

> sudo dmidecode -t 2
# dmidecode 2.12
SMBIOS 2.4 present.

Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
        Manufacturer: Gigabyte Technology Co., Ltd.
        Product Name: GA-880GA-UD3H
        Version: x.x

Jun  3 12:28:42 wfid78-172 kernel: CPU: AMD Phenom(tm) II X4 925 Processor (2812.51-MHz K8-class CPU)
Jun  3 12:28:42 wfid78-172 kernel: real memory  = 34359738368 (32768 MB)
Jun  3 12:28:42 wfid78-172 kernel: avail memory = 33271947264 (31730 MB)
Messages like these appear periodically:
Code:
May 30 11:43:29 wfid78-172 kernel: ahcich3: Timeout on slot 24 port 0
May 30 11:43:29 wfid78-172 kernel: ahcich3: is 00000008 cs 00000000 ss 00000000 rs 01000000 tfd 40 serr 00000000 cmd 00207817
May 30 11:43:29 wfid78-172 kernel: (ada2:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 03 bc f5 1e 40 00 00 00 00 00 00
May 30 11:43:29 wfid78-172 kernel: (ada2:ahcich3:0:0:0): CAM status: Command timeout
May 30 11:43:29 wfid78-172 kernel: (ada2:ahcich3:0:0:0): Retrying command
May 30 11:43:29 wfid78-172 kernel: ahcich1: Timeout on slot 15 port 0
May 30 11:43:29 wfid78-172 kernel: ahcich1: is 00000008 cs 00000000 ss 00000000 rs 00008000 tfd 40 serr 00000000 cmd 00206f17
May 30 11:43:29 wfid78-172 kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 02 bd f5 1e 40 00 00 00 00 00 00
May 30 11:43:29 wfid78-172 kernel: (ada1:ahcich1:0:0:0): CAM status: Command timeout
May 30 11:43:29 wfid78-172 kernel: (ada1:ahcich1:0:0:0): Retrying command
The computer stays powered on, but the disk subsystem stops responding and everything freezes.
After pressing Reset, the system boots normally.
Then, after a day or two (the computer is never turned off), the messages appear again:
Code:
Jun  3 12:23:32 wfid78-172 kernel: ahcich3: Timeout on slot 14 port 0
Jun  3 12:23:32 wfid78-172 kernel: ahcich3: is 00000008 cs 00000000 ss 00000000 rs 00006000 tfd 40 serr 00000000 cmd 00206e17
Jun  3 12:23:32 wfid78-172 kernel: (ada2:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 04 c9 25 1c 40 0f 00 00 00 00 00
Jun  3 12:23:32 wfid78-172 kernel: (ada2:ahcich3:0:0:0): CAM status: Command timeout
Jun  3 12:23:32 wfid78-172 kernel: (ada2:ahcich3:0:0:0): Retrying command
Jun  3 12:23:32 wfid78-172 kernel: ahcich1: Timeout on slot 13 port 0
Jun  3 12:23:32 wfid78-172 kernel: ahcich1: is 00000008 cs 00000000 ss 00000000 rs 00003000 tfd 40 serr 00000000 cmd 00206d17
Jun  3 12:23:32 wfid78-172 kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 04 c9 25 1c 40 0f 00 00 00 00 00
Jun  3 12:23:32 wfid78-172 kernel: (ada1:ahcich1:0:0:0): CAM status: Command timeout
Jun  3 12:23:32 wfid78-172 kernel: (ada1:ahcich1:0:0:0): Retrying command
The file system is ZFS raidz1 on ada0, ada1, and ada2.
Code:
> zpool list -v
NAME         SIZE  ALLOC   FREE   FRAG  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
zpool0      2,70T   194G  2,51T    12%         -     6%  1.04x  ONLINE  -
  raidz1    2,70T   194G  2,51T    12%         -
    ada0p2      -      -      -      -         -
    ada1p2      -      -      -      -         -
    ada2p2      -      -      -      -         -

> zpool status -v
  pool: zpool0
state: ONLINE
  scan: scrub repaired 0 in 1h10m with 0 errors on Wed Jun  3 02:27:02 2015
config:

        NAME        STATE     READ WRITE CKSUM
        zpool0      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
            ada2p2  ONLINE       0     0     0

errors: No known data errors
The system is not heavily loaded and does no very intensive work.
It runs at nominal settings and is not overclocked.
Typical status:
Code:
> top
last pid:  5205;  load averages:  0.54,  0.42,  0.41 up 0+00:24:38  12:52:14
181 processes: 1 running, 179 sleeping, 1 zombie
CPU:  0.9% user,  0.0% nice,  0.5% system,  0.1% interrupt, 98.5% idle
Mem: 2153M Active, 1097M Inact, 2262M Wired, 9748K Cache, 26G Free
ARC: 1476M Total, 642M MFU, 784M MRU, 226K Anon, 11M Header, 39M Other
Swap: 17G Total, 17G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
1514 root        1 -21  r31  1148M   204M select  1   0:49   2.78% Xorg
2206 wcsn        2  28    0   524M    98M select  1   0:07   2.29% kdeinit4
2287 wcsn       12  28    0  1244M   357M uwait   2   0:17   0.20% chrome
2300 wcsn       12  33    0  1067M   187M uwait   2   0:08   0.20% chrome
2293 wcsn       12  28    0  1072M   190M uwait   0   0:08   0.20% chrome
smartctl does not report any critical problems or drive errors.

What should I do? How can I find the problem?
The same thing happens on another system that is also AMD-based.
That one uses ZFS too, but with a striped pool.
Could the problem be in the AHCI driver?
 

Terry_Kennedy

WCSN wrote:
> Messages like these appear periodically:
> Code:
> May 30 11:43:29 wfid78-172 kernel: ahcich3: Timeout on slot 24 port 0
> ...
> May 30 11:43:29 wfid78-172 kernel: ahcich1: Timeout on slot 15 port 0
> ...
Please post the contents of /var/run/dmesg.boot so we can see what AHCI controller is used. Also, the "slot 24" and "slot 15" make me think there might be a port multiplier in there somewhere.
 
WCSN

OK. Here is everything it says about the disk controllers and disks:
Code:
> cat /var/run/dmesg.boot
...
ahci0: <AMD SB7x0/SB8x0/SB9x0 AHCI SATA controller> port 0xff00-0xff07,0xfe00-0xfe03,0xfd00-0xfd07,0xfc00-0xfc03,0xfb00-0xfb0f mem 0xfe02f000-0xfe02f3ff irq 19 at device 17.0 on pci0
ahci0: AHCI v1.20 with 6 6Gbps ports, Port Multiplier supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
...
ahci1: <JMicron JMB363 AHCI SATA controller> mem 0xfdbfe000-0xfdbfffff irq 17 at device 0.0 on pci5
ahci1: AHCI v1.00 with 2 3Gbps ports, Port Multiplier supported
ahci1: quirks=0x1<NOFORCE>
ahcich6: <AHCI channel> at channel 0 on ahci1
ahcich7: <AHCI channel> at channel 1 on ahci1
cd0 at ahcich7 bus 0 scbus7 target 0 lun 0
cd0: <Optiarc DVD RW AD-7280S 1.01> Removable CD-ROM SCSI-0 device
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
cd0: cd present [2091474 x 2048 byte records]
...
atapci0: <JMicron JMB363 UDMA133 controller> port 0xaf00-0xaf07,0xae00-0xae03,0xad00-0xad07,0xac00-0xac03,0xab00-0xab0f irq 17 at d$
ata2: <ATA channel> at channel 0 on atapci0
...
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <ST31000524AS JC45> ATA-8 SATA 3.x device
ada0: Serial Number 6VPBX6H5
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <ST31000524AS JC45> ATA-8 SATA 3.x device
ada1: Serial Number 6VPBANNF
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6
ada2 at ahcich3 bus 0 scbus3 target 0 lun 0
ada2: <ST31000524AS JC45> ATA-8 SATA 3.x device
ada2: Serial Number 6VPBWKNY
ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada2: Previously was known as ad10
The connections are ahcich1 - ada1 and ahcich3 - ada2.

Three days have passed since I wrote the post. I disabled eSATA on that controller in the BIOS (possibly this is the SATA hot-plug feature? The eSATA connectors are separate, on the back panel). Currently there are no error messages in the system log.
The system has not rebooted and has not frozen.
Maybe it is still a bug in the driver? Maybe I should turn off SATA 3.0 :( ?
I am leaving the system running. We will see after the weekend...
 

tingo

FWIW, those "slot XX" and "slot YY" messages refer to traffic slots (or virtual channels if you will) in SATA communication, not physical slots in the hardware. Yes, this is confusing. :)
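
For readers decoding these dumps by hand, here is a minimal sketch in Python. It rests on two assumptions that nobody in this thread confirms: that `rs` is a bitmask of the command slots the driver still considers running, and that bits 8 to 12 of `cmd` hold the controller's current command slot, as in the AHCI port command register. The helper name `decode_dump` is made up for this example.

```python
import re

# Matches the "rs" and "cmd" fields of an ahcich register dump line.
DUMP_RE = re.compile(r"rs (?P<rs>[0-9a-f]{8}) .*cmd (?P<cmd>[0-9a-f]{8})")


def decode_dump(line):
    """Return the slots implied by one register dump line, or None if it does not match."""
    m = DUMP_RE.search(line)
    if m is None:
        return None
    rs = int(m.group("rs"), 16)
    cmd = int(m.group("cmd"), 16)
    return {
        # Assumption: each set bit in rs is one command slot still outstanding.
        "running_slots": [bit for bit in range(32) if rs & (1 << bit)],
        # Assumption: bits 12:8 of the port command register give the current command slot.
        "current_slot": (cmd >> 8) & 0x1F,
    }


line = ("ahcich3: is 00000008 cs 00000000 ss 00000000 rs 01000000 "
        "tfd 40 serr 00000000 cmd 00207817")
print(decode_dump(line))
```

On the two dumps from the first post this yields slots 24 and 15, which agrees with the "Timeout on slot 24" and "Timeout on slot 15" lines printed just before them.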
 

Terry_Kennedy

WCSN wrote:
> Maybe it is still a bug in the driver? Maybe I should turn off SATA 3.0 :( ?
> I am leaving the system running. We will see after the weekend...
This looks like something developer mav@ could help with. He is here on the forums, though I'm not sure if just tagging him as I did will alert him to this topic.
 

diizzy

I've seen similar timeout issues on AMD controllers, seems harmless but still....
//Danne
 
WCSN

Terry_Kennedy wrote:
> This looks like something developer mav@ could help with. He is here on the forums, though I'm not sure if just tagging him as I did will alert him to this topic.
It all happened again. As I found after rebooting, nothing made it into the log. As I said, after pressing Reset the system boots normally (ZFS :)). On the console (cons0) I see messages like the ones quoted earlier, plus messages about ahcich0 ..., and at the end it reports that ada1 and ada2 were detached. The pool is raidz1, so it cannot work with 2 disks missing.

Maybe I should lower the SATA rate from 600 to 300?
I hope mav@ takes a look here :).
 

tingo

Changing the SATA speed from 600 to 300 should not be necessary, unless you have old hard drives which can't figure out the correct speed themselves.
Have you tried changing cables?

Oh, and one more thing: have you checked if you have the newest firmware for your hard drives? Sometimes you get unlucky and get hard drives with faulty firmware.
 

protocelt

Maybe this is a possible problem with the SATA controller itself. FWIW, I very vaguely remember seeing some issues with that JMicron chipset/controller on a FreeBSD mailing list and I think as well on FreeNAS forums.
 
WCSN

tingo wrote:
> Changing the SATA speed from 600 to 300 should not be necessary, unless you have old hard drives which can't figure out the correct speed themselves.
> Have you tried changing cables?
>
> Oh, and one more thing: have you checked if you have the newest firmware for your hard drives? Sometimes you get unlucky and get hard drives with faulty firmware.
Yes, I tried, but switching from 600 to 300 did not work. FreeBSD ignores the BIOS setting and still attaches the disks as SATA 3.
Update the firmware? I will check, but the drives are Seagate and Toshiba, on this system (a motherboard with the "new" 880 chipset) and on the other one (a motherboard with the "old" 790 chipset), and the ahcich error is identical.
That is strange and still points to a problem in the ahci driver.

protocelt wrote:
> Maybe this is a possible problem with the SATA controller itself. FWIW, I very vaguely remember seeing some issues with that JMicron chipset/controller on a FreeBSD mailing list and I think as well on FreeNAS forums.
No, the hard drives are attached to the chipset (AMD) controller. Only the DVD drive is attached to the JMicron controller (ahcich6, ahcich7), and it works normally.

diizzy wrote:
> All HDDs are attached to the AMD chipset, that said. There may be a BIOS/BIOS mod around with newer AMD AHCI ROM modules.
> http://www.win-raid.com/t7f13-AHCI-amp-RAID-ROM-Modules.html
> //Danne
Solve the problem that radically? :) The AHCI in the chipset is not a separate controller. I understand that replacing the ROM module should be possible, but I have never done it.
Somehow I am not mentally ready for that. :)

Latest report:
Code:
Jun 10 06:54:33 wfid78-172 kernel: ahcich1: Timeout on slot 27 port 0
Jun 10 06:54:33 wfid78-172 kernel: ahcich1: is 00000008 cs 00000000 ss 00000000 rs 0c000000 tfd 40 serr 00000000 cmd 00007b17
Jun 10 06:54:33 wfid78-172 kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 04 e1 fb 98 40 14 00 00 00 00 00
Jun 10 06:54:33 wfid78-172 kernel: (ada1:ahcich1:0:0:0): CAM status: Command timeout
Jun 10 06:54:33 wfid78-172 kernel: (ada1:ahcich1:0:0:0): Retrying command
Jun 10 06:54:33 wfid78-172 kernel: ahcich3: Timeout on slot 5 port 0
Jun 10 06:54:33 wfid78-172 kernel: ahcich3: is 00000008 cs 00000000 ss 00000000 rs 00000020 tfd 40 serr 00000000 cmd 00006517
Jun 10 06:54:33 wfid78-172 kernel: (ada2:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 04 e1 fb 98 40 14 00 00 00 00 00
Jun 10 06:54:33 wfid78-172 kernel: (ada2:ahcich3:0:0:0): CAM status: Command timeout
Jun 10 06:54:33 wfid78-172 kernel: (ada2:ahcich3:0:0:0): Retrying command
Jun 10 06:56:03 wfid78-172 kernel: ahcich1: Timeout on slot 27 port 0
Jun 10 06:56:03 wfid78-172 kernel: ahcich1: is 00000002 cs 00000000 ss 00000000 rs 08000000 tfd 50 serr 00000000 cmd 00007b17
...
After the timeouts, ada1 and ada2 are detached one after the other.
Thanks for the pointer to the FreeNAS forum.
These look like similar problems:
1. https://forums.freenas.org/index.php?threads/concerned-about-getting-ahcichx-timeout-on-xx-port-0.26468/#post-167977
2. https://forums.freebsd.org/threads/ahci-device-timeouts-while-performing-zfs-scrub.24189/

I found this reported solution:
Boris Samorodov wrote on 27.09.2013 17:38:
>> reverting those two commits solved the issue.
>
> In my case just rebuilding and restarting of sysutils/hal helped.

Rebuilding and restarting hald solved the issue on non-reverted kernel.
Thank you, Boris!
--
Regards, Ruslan
T.O.S. Of Reality
Yes... I changed the SATA cables... :( No difference.
I searched Google for information... no answers :(
 
WCSN

Here is another interesting thing: when the system is under heavy load (disk copies, backups, video encoding, etc.), everything works and there are no faults. No errors! But if I leave the computer idle for a long time and come back in the morning to start working, the same drive disconnection problem is there.

The drive firmware is the latest. I checked with Seagate.
 
WCSN

Next report:
Code:
Jun 12 05:11:00 wfid78-172 kernel: ahcich1: Timeout on slot 28 port 0
Jun 12 05:11:00 wfid78-172 kernel: ahcich1: is 00000008 cs 00000000 ss 00000000 rs 10000000 tfd 40 serr 00000000 cmd 00207c17
Jun 12 05:11:00 wfid78-172 kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 77 85 de 8c 40 0a 00 00 00 00 00
Jun 12 05:11:00 wfid78-172 kernel: (ada1:ahcich1:0:0:0): CAM status: Command timeout
Jun 12 05:11:00 wfid78-172 kernel: (ada1:ahcich1:0:0:0): Retrying command
Jun 12 05:11:00 wfid78-172 kernel: ahcich3: Timeout on slot 11 port 0
Jun 12 05:11:00 wfid78-172 kernel: ahcich3: is 00000008 cs 00000000 ss 00000000 rs 00000800 tfd 40 serr 00000000 cmd 00206b17
Jun 12 05:11:00 wfid78-172 kernel: (ada2:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 78 84 de 8c 40 0a 00 00 00 00 00
Jun 12 05:11:00 wfid78-172 kernel: (ada2:ahcich3:0:0:0): CAM status: Command timeout
Jun 12 05:11:00 wfid78-172 kernel: (ada2:ahcich3:0:0:0): Retrying command
Jun 12 05:12:30 wfid78-172 kernel: ahcich3: Timeout on slot 11 port 0
Jun 12 05:12:30 wfid78-172 kernel: ahcich3: is 00000002 cs 00000000 ss 00000000 rs 00000800 tfd 50 serr 00000000 cmd 00206b17
Jun 12 05:12:30 wfid78-172 kernel: (aprobe1:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jun 12 05:12:30 wfid78-172 kernel: (aprobe1:ahcich3:0:0:0): CAM status: Command timeout
Jun 12 05:12:30 wfid78-172 kernel: (aprobe1:ahcich3:0:0:0): Retrying command
Jun 12 05:12:30 wfid78-172 kernel: ahcich1: Timeout on slot 28 port 0
Jun 12 05:12:30 wfid78-172 kernel: ahcich1: is 00000002 cs 00000000 ss 00000000 rs 10000000 tfd 50 serr 00000000 cmd 00207c17     
Jun 12 05:12:30 wfid78-172 kernel: (aprobe0:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00                 
Jun 12 05:12:30 wfid78-172 kernel: (aprobe0:ahcich1:0:0:0): CAM status: Command timeout                                           
Jun 12 05:12:30 wfid78-172 kernel: (aprobe0:ahcich1:0:0:0): Retrying command                                                       
Jun 12 05:12:30 wfid78-172 kernel: ahcich1: Timeout on slot 28 port 0                                                             
Jun 12 05:12:30 wfid78-172 kernel: ahcich1: is 00000002 cs 00000000 ss 00000000 rs 10000000 tfd 50 serr 00000000 cmd 00207c17     
Jun 12 05:12:30 wfid78-172 kernel: (aprobe0:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00                 
Jun 12 05:12:30 wfid78-172 kernel: (aprobe0:ahcich1:0:0:0): CAM status: Command timeout                                           
Jun 12 05:12:30 wfid78-172 kernel: (aprobe0:ahcich1:0:0:0): Error 5, Retries exhausted                                             
Jun 12 05:12:30 wfid78-172 kernel: ahcich3: Timeout on slot 11 port 0                                                             
Jun 12 05:12:30 wfid78-172 kernel: ahcich3: is 00000002 cs 00000000 ss 00000000 rs 00000800 tfd 50 serr 00000000 cmd 00206b17     
Jun 12 05:12:30 wfid78-172 kernel: (aprobe1:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00                 
Jun 12 05:12:30 wfid78-172 kernel: (aprobe1:ahcich3:0:0:0): CAM status: Command timeout                                           
Jun 12 05:12:30 wfid78-172 kernel: (aprobe1:ahcich3:0:0:0): Error 5, Retries exhausted                                             
Jun 12 05:12:30 wfid78-172 kernel: ahcich3: Timeout on slot 11 port 0                                                             
Jun 12 05:12:30 wfid78-172 kernel: ahcich3: is 00000002 cs 00000000 ss 00000000 rs 00000800 tfd 50 serr 00000000 cmd 00206b17     
Jun 12 05:12:30 wfid78-172 kernel: (aprobe1:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00                 
Jun 12 05:12:30 wfid78-172 kernel: (aprobe1:ahcich3:0:0:0): CAM status: Command timeout                                           
Jun 12 05:12:30 wfid78-172 kernel: (aprobe1:ahcich3:0:0:0): Error 5, Retry was blocked                                             
Jun 12 05:12:30 wfid78-172 kernel: ahcich1: Timeout on slot 28 port 0                                                             
Jun 12 05:12:30 wfid78-172 kernel: ahcich1: is 00000002 cs 00000000 ss 00000000 rs 10000000 tfd 50 serr 00000000 cmd 00207c17     
Jun 12 05:12:30 wfid78-172 kernel: ada2 at ahcich3 bus 0 scbus3 target 0 lun 0                                                     
Jun 12 05:12:30 wfid78-172 kernel: ada2: <ST31000524AS JC45> s/n 6VPBWKNY detached                                                 
Jun 12 05:12:30 wfid78-172 kernel: (aprobe0:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00                 
Jun 12 05:12:30 wfid78-172 kernel: (aprobe0:ahcich1:0:0:0): CAM status: Command timeout                                           
Jun 12 05:12:30 wfid78-172 kernel: (aprobe0:ahcich1:0:0:0): Error 5, Retry was blocked
Jun 12 05:12:30 wfid78-172 kernel: ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
Jun 12 05:12:30 wfid78-172 kernel: ada1: <ST31000524AS JC45> s/n 6VPBANNF detached
:(
 
WCSN

I ran self-tests on my hard drives:
ada0:
Code:
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     26691         -
# 2  Extended offline    Interrupted (host reset)      00%     26686         -
# 3  Short offline       Completed without error       00%     14403         -
# 4  Short offline       Completed without error       00%     14380         -
# 5  Short offline       Completed without error       00%     14356         -
# 6  Short offline       Completed without error       00%     14332         -
ada1:
Code:
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     26778         -
# 2  Extended offline    Interrupted (host reset)      00%     26773         -
# 3  Short offline       Completed without error       00%     14491         -
# 4  Short offline       Completed without error       00%     14467         -
# 5  Short offline       Completed without error       00%     14443         -
# 6  Short offline       Completed without error       00%     14420         -
ada2:
Code:
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     28137         -
# 2  Extended offline    Interrupted (host reset)      00%     28133         -
# 3  Short offline       Completed without error       00%     25236         -
 
WCSN

Try this in your /boot/loader.conf
Code:
hint.ahci.0.msi=0
Code:
> cat /var/run/dmesg.boot | grep ahci
ahci0: <AMD SB7x0/SB8x0/SB9x0 AHCI SATA controller> port 0xff00-0xff07,0xfe00-0xfe03,0xfd00-0xfd07,0xfc00-0xfc03,0xfb00-0xfb0f mem 0xfe02f000-0xfe02f3ff irq 19 at device 17.0 on pci0
ahci0: AHCI v1.20 with 6 6Gbps ports, Port Multiplier supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ahci1: <JMicron JMB363 AHCI SATA controller> mem 0xfdbfe000-0xfdbfffff irq 17 at device 0.0 on pci5
ahci1: AHCI v1.00 with 2 3Gbps ports, Port Multiplier supported
ahci1: quirks=0x1<NOFORCE>
ahcich6: <AHCI channel> at channel 0 on ahci1
ahcich7: <AHCI channel> at channel 1 on ahci1
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
cd0 at ahcich7 bus 0 scbus7 target 0 lun 0
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada2 at ahcich3 bus 0 scbus3 target 0 lun 0
My system has two SATA controllers...
Do I need both of these?
Code:
hint.ahci.0.msi=0
hint.ahci.1.msi=0

What does hint.ahci.0.msi=0 do?
 

dR3b

Yes that's right. See the ahci(4) man page for details:
Code:
hint.ahci.X.msi
    controls Message Signaled Interrupts (MSI) usage by the specified controller.

    0   MSI disabled;
    1   single MSI vector used, if supported (default);
    2   multiple MSI vectors used, if supported;
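
To make the per-controller nature of the hint concrete, here is a minimal sketch in Python that reads loader.conf-style text and reports the MSI mode for each controller. It uses only the three values from the man page excerpt above; the function name `msi_settings` and the simple `name=value` parsing are assumptions made for this example, not how the loader itself reads the file.

```python
import re

# Meanings copied from the ahci(4) excerpt quoted in this post.
MSI_MODES = {
    0: "MSI disabled",
    1: "single MSI vector used, if supported (default)",
    2: "multiple MSI vectors used, if supported",
}

# Accepts hint.ahci.N.msi=V with optional double quotes around V.
HINT_RE = re.compile(r'^\s*hint\.ahci\.(\d+)\.msi\s*=\s*"?(\d+)"?\s*$')


def msi_settings(conf_text, controllers):
    """Return {controller unit: mode description}; unset controllers keep the default (1)."""
    modes = {unit: 1 for unit in controllers}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0]  # ignore trailing comments
        m = HINT_RE.match(line)
        if m:
            modes[int(m.group(1))] = int(m.group(2))
    return {unit: MSI_MODES.get(mode, "unknown value") for unit, mode in modes.items()}


example = "hint.ahci.0.msi=0\n"
for unit, meaning in sorted(msi_settings(example, controllers=[0, 1]).items()):
    print(f"ahci{unit}: {meaning}")
```

With only `hint.ahci.0.msi=0` present, ahci1 stays at the default, so a second line is needed to change both controllers.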
 
WCSN

I ran an experiment. Since I have ZFS raidz1 and the hard drives are the same model, I reconnected them, swapping channels and cables.
hint.ahci.0.msi was left at its default (1).
Code:
Jun 19 01:11:21 wfid78-172 kernel: ahcich2: Timeout on slot 23 port 0
Jun 19 01:11:21 wfid78-172 kernel: ahcich2: is 00000008 cs 00000000 ss 00000000 rs 00c00000 tfd 40 serr 00000000 cmd 00207717
Jun 19 01:11:21 wfid78-172 kernel: (ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 04 94 8e d2 40 22 00 00 00 00 00
Jun 19 01:11:21 wfid78-172 kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jun 19 01:11:21 wfid78-172 kernel: (ada2:ahcich2:0:0:0): Retrying command
Jun 19 01:11:21 wfid78-172 kernel: ahcich1: Timeout on slot 25 port 0
Jun 19 01:11:21 wfid78-172 kernel: ahcich1: is 00000008 cs 00000000 ss 00000000 rs 03000000 tfd 40 serr 00000000 cmd 00207917
Jun 19 01:11:21 wfid78-172 kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 04 95 8e d2 40 22 00 00 00 00 00
Jun 19 01:11:21 wfid78-172 kernel: (ada1:ahcich1:0:0:0): CAM status: Command timeout
Jun 19 01:11:21 wfid78-172 kernel: (ada1:ahcich1:0:0:0): Retrying command
Jun 19 01:12:51 wfid78-172 kernel: ahcich2: Timeout on slot 23 port 0
Jun 19 01:12:51 wfid78-172 kernel: ahcich2: is 00000002 cs 00000000 ss 00000000 rs 00800000 tfd 50 serr 00000000 cmd 00207717
Jun 19 01:12:51 wfid78-172 kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jun 19 01:12:51 wfid78-172 kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
Jun 19 01:12:51 wfid78-172 kernel: (aprobe0:ahcich2:0:0:0): Retrying command
Jun 19 01:12:51 wfid78-172 kernel: ahcich1: Timeout on slot 25 port 0
Jun 19 01:12:51 wfid78-172 kernel: ahcich1: is 00000002 cs 00000000 ss 00000000 rs 02000000 tfd 50 serr 00000000 cmd 00207917
Jun 19 01:12:51 wfid78-172 kernel: (aprobe1:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jun 19 01:12:51 wfid78-172 kernel: (aprobe1:ahcich1:0:0:0): CAM status: Command timeout
Jun 19 01:12:51 wfid78-172 kernel: (aprobe1:ahcich1:0:0:0): Retrying command
Jun 19 01:12:51 wfid78-172 kernel: ahcich1: Timeout on slot 25 port 0
Jun 19 01:12:51 wfid78-172 kernel: ahcich1: is 00000002 cs 00000000 ss 00000000 rs 02000000 tfd 50 serr 00000000 cmd 00207917
Jun 19 01:12:51 wfid78-172 kernel: (aprobe1:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jun 19 01:12:51 wfid78-172 kernel: (aprobe1:ahcich1:0:0:0): CAM status: Command timeout
Jun 19 01:12:51 wfid78-172 kernel: (aprobe1:ahcich1:0:0:0): Error 5, Retries exhausted
Jun 19 01:12:51 wfid78-172 kernel: ahcich2: Timeout on slot 23 port 0
Jun 19 01:12:51 wfid78-172 kernel: ahcich2: is 00000002 cs 00000000 ss 00000000 rs 00800000 tfd 50 serr 00000000 cmd 00207717
Jun 19 01:12:51 wfid78-172 kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jun 19 01:12:51 wfid78-172 kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
Jun 19 01:12:51 wfid78-172 kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retries exhausted
Jun 19 01:12:51 wfid78-172 kernel: ahcich1: Timeout on slot 25 port 0
Jun 19 01:12:51 wfid78-172 kernel: ahcich1: is 00000002 cs 00000000 ss 00000000 rs 02000000 tfd 50 serr 00000000 cmd 00207917
Jun 19 01:12:51 wfid78-172 kernel: (aprobe1:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jun 19 01:12:51 wfid78-172 kernel: (aprobe1:ahcich1:0:0:0): CAM status: Command timeout
Jun 19 01:12:51 wfid78-172 kernel: (aprobe1:ahcich1:0:0:0): Error 5, Retry was blocked
Jun 19 01:12:51 wfid78-172 kernel: ahcich2: Timeout on slot 23 port 0
Jun 19 01:12:51 wfid78-172 kernel: ahcich2: is 00000002 cs 00000000 ss 00000000 rs 00800000 tfd 50 serr 00000000 cmd 00207717
Jun 19 01:12:51 wfid78-172 kernel: ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
Jun 19 01:12:51 wfid78-172 kernel: ada1: <ST31000524AS JC45> s/n 6VPBANNF detached
Jun 19 01:12:51 wfid78-172 kernel: (aprobe0:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
Jun 19 01:12:51 wfid78-172 kernel: (aprobe0:ahcich2:0:0:0): CAM status: Command timeout
Jun 19 01:12:51 wfid78-172 kernel: (aprobe0:ahcich2:0:0:0): Error 5, Retry was blocked
Jun 19 01:12:51 wfid78-172 kernel: ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
Jun 19 01:12:51 wfid78-172 kernel: ada2: <ST31000524AS JC45> s/n 6VPBWKNY detached
The problem is not in the hard drives.
Strangely enough, the problem always shows up on AHCI channels numbered higher than 0! :(
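
An observation like this can be checked against a whole saved log rather than by eye. The following is a minimal Python sketch that counts "Timeout on slot" lines per channel; the function name `timeouts_per_channel` is made up, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches only the driver's own timeout line, not the CAM status lines.
TIMEOUT_RE = re.compile(r"kernel: (ahcich\d+): Timeout on slot (\d+) port (\d+)")


def timeouts_per_channel(log_text):
    """Count 'Timeout on slot' messages for each ahcich channel in the given log text."""
    counts = Counter()
    for line in log_text.splitlines():
        m = TIMEOUT_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


sample = """\
Jun 19 01:11:21 wfid78-172 kernel: ahcich2: Timeout on slot 23 port 0
Jun 19 01:11:21 wfid78-172 kernel: (ada2:ahcich2:0:0:0): CAM status: Command timeout
Jun 19 01:11:21 wfid78-172 kernel: ahcich1: Timeout on slot 25 port 0
Jun 19 01:12:51 wfid78-172 kernel: ahcich2: Timeout on slot 23 port 0
"""
for channel, n in sorted(timeouts_per_channel(sample).items()):
    print(channel, n)
```

A channel that never appears in the result had no timeouts in that log, which is a quick way to see whether ahcich0 really stays clean.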

Now I have set
Code:
hint.ahci.0.msi=0
hint.ahci.1.msi=0
(the DVD drive is attached to ahci1).

Let's see what happens.

These parameters can be set in loader.conf or device.hints.
 

diizzy

I think these are AMD AHCI controller quirks; Linux, for instance, has a few "odd" workarounds in its driver.
//Danne
 
WCSN

Yes. Before that, my system ran with two hard drives, but they were connected independently, without RAID, and used UFS.
I worked mainly on ada0, and ada1 was just storage.
In fact, I have another system configured in a similar way but with an Intel chipset, and such errors do not happen there.

I hope mav@, the author of the ahci driver, will be able to look at this problem.
 
WCSN

With
Code:
hint.ahci.0.msi=0
hint.ahci.1.msi=0
it has worked so far.
But ... turning MSI off is still not quite the right fix :(.
 
WCSN

After setting msi=0 ...
Code:
> uptime
19:17  up 7 days,  9:11, 5 users, load averages: 0,75 0,83 0,58
I am looking at the ahci.c/ahci.h sources ... I will think about it.
mav@ says nothing and does not respond to emails... he probably has no time.
 
WCSN

If the problem is in the handling of "events" around MSI, and it is specific to AMD chipsets (I have no such problems on computers with Intel chipsets; by the way, a Marvell controller also gives the same error), could this be related to the hot-plug capability for SATA hard drives?
 