3726 port multiplier regression in 9.0?

I've recently upgraded an amd64 disk-to-disk backup server from 8.2 to 9.0 and run into a serious problem that I'm hoping can be resolved. We have three SiI 3132 dual-port eSATA cards. Each of those six ports connects to an SiI 3726 (possibly 4726, but detected as 3726) port multiplier, which in turn drives four disks. This worked brilliantly on 8.x with a total of 24 disks running reliably for quite some time.

Upon upgrading to 9.0-RELEASE, this has ceased to work. On the standard kernel, the relevant startup messages are these:

Code:
Jan 30 13:32:06 chimney kernel: siis0: <SiI3132 SATA controller> port 0x9c00-0x9c7f mem 0xfaefe000-0xfaefe07f,0xfaef8000-0xfaefbfff irq 26 at device 0.0 on pci3
Jan 30 13:32:06 chimney kernel: siisch0: <SIIS channel> at channel 0 on siis0
Jan 30 13:32:06 chimney kernel: siisch1: <SIIS channel> at channel 1 on siis0

Jan 30 13:32:06 chimney kernel: siis1: <SiI3132 SATA controller> port 0xac00-0xac7f mem 0xfaffe000-0xfaffe07f,0xfaff8000-0xfaffbfff irq 30 at device 0.0 on pci4
Jan 30 13:32:06 chimney kernel: siisch2: <SIIS channel> at channel 0 on siis1
Jan 30 13:32:06 chimney kernel: siisch3: <SIIS channel> at channel 1 on siis1

Jan 30 13:32:06 chimney kernel: siis2: <SiI3132 SATA controller> port 0xbc00-0xbc7f mem 0xf73fe000-0xf73fe07f,0xf73f8000-0xf73fbfff irq 48 at device 0.0 on pci131
Jan 30 13:32:06 chimney kernel: siisch4: <SIIS channel> at channel 0 on siis2
Jan 30 13:32:06 chimney kernel: siisch5: <SIIS channel> at channel 1 on siis2

Jan 30 13:32:06 chimney kernel: siisch0: port is not ready (timeout 1000ms) status = 001f0000
Jan 30 13:32:06 chimney kernel: siisch1: port is not ready (timeout 1000ms) status = 001f0000
Jan 30 13:32:06 chimney kernel: siisch2: port is not ready (timeout 1000ms) status = 001f0000
Jan 30 13:32:06 chimney kernel: siisch3: port is not ready (timeout 1000ms) status = 001f0000
Jan 30 13:32:06 chimney kernel: siisch4: port is not ready (timeout 1000ms) status = 001f0000
Jan 30 13:32:06 chimney kernel: siisch5: port is not ready (timeout 1000ms) status = 001f0000
At that point, nothing at all has been detected. Next I tried building a custom kernel with the "siis" driver removed, so that I could load it manually. If I then run "kldload siis" followed by "camcontrol rescan all", after a lengthy delay I get "Re-scan of bus n was successful" for each n from 0 to 14 (this machine has other types of controllers as well). However, the kernel log shows:
Code:
ata2: SiI 3726 (rev=1706) Port Multiplier with 6 (5) ports
ata2: SiI 3726 (rev=1706) Port Multiplier with 6 (5) ports
ata6: SiI 3726 (rev=1706) Port Multiplier with 6 (5) ports
ata6: SiI 3726 (rev=1706) Port Multiplier with 6 (5) ports
ata7: SiI 3726 (rev=1706) Port Multiplier with 6 (5) ports
ata7: SiI 3726 (rev=1706) Port Multiplier with 6 (5) ports
Unfortunately this doesn't result in anything useful. "camcontrol devlist -v" still shows nothing for these six ports. Finally, here is the kernel output from "kldload siis" and "camcontrol rescan all" with debug.bootverbose=1:

<see attachment>

That was "kldload siis", now for "camcontrol rescan all" (trimmed to just the first 3132 card):
<see second attachment>

I'll be grateful for any assistance. This has me stumped, as the hardware configuration is unchanged from what worked with 8.2, and others seem to report that the 3132->3726 configuration works fine in 9.0.

Thanks,
Allen
 

Attachments

  • kldload-siis.txt (9.7 KB)
  • camcontrol-rescan-all2.txt (14.7 KB)
To add one additional data point, I booted from the 9.0 memstick image and had the same results. So this isn't coming from some sort of bizarre bad upgrade.

I'm looking to try the same thing with 8.2, but this machine has some problems with the 8.2 memstick image.

Thanks,
Allen
 
I'm not sure this is your case, but port multipliers have a documented quirk: if there is no disk on the first port of the multiplier, the controller gets no ready status from it and reports a reset timeout. For ahci(4) this causes an additional 30-second probe delay. For siis(4) I don't remember exactly what the result should be. Have you changed anything in your hardware setup besides updating the system? Please show your full verbose dmesg.
 
Hi Mav, thanks for asking! I made changes to the hardware several days prior to upgrading (added another mpt card, moved one siis card to another slot to make room) but then continued to run 8.2-RELEASE after that with no issues. The upgrade to 9.0 took place with no further changes.

Attached is a verbose dmesg; the beginning got cut off, presumably because the kernel message buffer isn't large enough. I think it contains all of the important bits, but let me know if I need to enlarge the buffer and try again. This was made while booting the 9.0 GENERIC kernel, so the siis driver was compiled in.

I've also attached the last boot of 8.2, retrieved from /var/log/messages. It's not verbose, but perhaps it will be useful.

Thanks,
Allen
 

Attachments

  • verbose.bz2 (7.9 KB)
  • lastboot82.bz2 (5.4 KB)
Strange situation. I can't guess what could differ in siis in 9.0 to cause that. It may be an interrupt delivery problem, because I can see proper port multiplier signatures in the statuses while the driver reports timeouts. You can check the vmstat -i output for interrupts from the siis controllers. Each completed command, including a soft reset, should generate one. If there are none, I would start looking at what could have changed there.
 
Hi Mav, I think you've hit upon something important. Here's the output of vmstat -i:
Code:
interrupt                          total       rate
irq20: hpet0                     7577788        102
irq23: uhci0 ehci0                   751          0
irq275: cxgbc0                    117539          1
irq276: cxgbc0                       647          0
irq277: cxgbc0                     29220          0
irq278: cxgbc0                      5130          0
irq279: cxgbc0                      4332          0
irq280: cxgbc0                      1223          0
irq281: cxgbc0                      6151          0
irq282: cxgbc0                      2480          0
irq283: ahci0                      99786          1
irq284: mpt0                       17412          0
irq285: mpt1                        2352          0
irq286: mpt2                       13720          0
Total                            7878531        106
Notice that there is no entry for the siis driver!
 
They are possibly sharing interrupts with other devices, and their names may not fit into the buffer. You can find which interrupts they use in dmesg. You may also try vmstat -ia, in case they haven't generated any interrupts and that's why they weren't shown.
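
A minimal sketch of the dmesg lookup, assuming attach lines in the format shown in the first post ("siis0: <...> ... irq 26 at device 0.0 on pci3"). The function name driver_irqs is hypothetical, not part of any FreeBSD tool.

```shell
#!/bin/sh
# driver_irqs is a hypothetical helper, not a FreeBSD command.
# It reads dmesg-style text on stdin and prints "<device> <irq>" for each
# controller attach line of the given driver (default: siis).
driver_irqs() {
    drv="${1:-siis}"
    # Attach lines look like:
    #   siis0: <SiI3132 SATA controller> port ... irq 26 at device 0.0 on pci3
    # Channel lines (siisch0: ...) carry no "irq N" and are skipped.
    sed -n "s/^.*\(${drv}[0-9][0-9]*\): <.*> .* irq \([0-9][0-9]*\) at device .*\$/\1 \2/p"
}

# Typical use on the live system:
#   dmesg | driver_irqs siis
```

The IRQ numbers it prints can then be matched against the "irqN:" labels in the vmstat -ia output.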
 
Indeed, no interrupts are being generated:

Code:
# vmstat -ia | grep -A1 siis
irq26: siis0                           0          0
stray irq26                            0          0
--
irq30: siis1                           0          0
stray irq30                            0          0
--
irq48: siis2                           0          0
stray irq48                            0          0

These IRQs match the ones shown being assigned in dmesg. It's like interrupt generation isn't being enabled on the 3132.
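
The same check can be scripted. This is a rough sketch that assumes the vmstat -ia column layout shown above (label, device names, total, rate); silent_irqs is a hypothetical helper name.

```shell
#!/bin/sh
# silent_irqs is a hypothetical helper, not a FreeBSD command.
# It reads "vmstat -ia"-style text on stdin and prints "<device> <irqN>" for
# every device of the given driver whose interrupt total is zero.
silent_irqs() {
    drv="${1:-siis}"
    awk -v drv="$drv" '
        # Only "irqN:" rows; "stray irqN" rows start with "stray" and are skipped.
        $1 ~ /^irq[0-9]+:$/ && $(NF-1) == 0 {
            # Device names sit between the "irqN:" label and the two counters,
            # so a shared IRQ may list more than one name.
            for (i = 2; i <= NF - 2; i++)
                if (index($i, drv) == 1) {
                    sub(/:$/, "", $1)
                    print $i, $1
                    break
                }
        }'
}

# Typical use on the live system:
#   vmstat -ia | silent_irqs siis
```

Empty output would mean every matching controller has delivered at least one interrupt.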

Thanks,
Allen
 
Can you try connecting a disk directly to any of these controllers? If the result is the same, I doubt it is a siis(4) driver problem, because there were not many changes and the problem would also have been noticed by others. I suppose it could be something with legacy interrupt routing, as most of the other devices in your system use MSI interrupts, but I don't know how to check that. Unfortunately, MSIs do not work properly on the SiI3132. I just don't remember whether they don't work at all or die after some load. You may try enabling them as an experiment.
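
For the MSI experiment, a hedged sketch: it assumes the per-controller hint.siis.X.msi tunable described in siis(4), and that unit numbers 0 to 2 match the three cards in the dmesg above. The helper name siis_msi_hints is made up for illustration.

```shell
#!/bin/sh
# siis_msi_hints is a hypothetical helper, not a FreeBSD command.
# Usage: siis_msi_hints <value> <unit>...
# Prints one loader hint per controller unit; value 1 requests MSI, 0 disables it.
# Assumes the hint.siis.X.msi tunable documented in siis(4).
siis_msi_hints() {
    value="$1"
    shift
    for unit in "$@"; do
        printf 'hint.siis.%s.msi="%s"\n' "$unit" "$value"
    done
}

# Example: print hints for the three cards and review them before adding
# them to /boot/loader.conf by hand:
#   siis_msi_hints 1 0 1 2
```

The hints only take effect when the driver attaches, so a reboot (or reloading the module) is needed after changing them.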
 
Hi Mav, that's exactly what I ended up trying last night. Forcing MSI initially makes everything appear to work. The pmp devices are detected immediately, as are the drives behind them. Unfortunately it's clearly unreliable, as any attempt to read from those disks results in a hang & timeout after the first second or so.

A couple of days ago I tried reverting to the 8.2 siis driver source and got the same results, so I agree that it's not anything in the driver. Most likely something changed in the kernel relating to how legacy irqs are set up.

I've got a bug open on this, kern/164694, so we'll see what happens. I added the results of testing MSI but they've not come through as of just now.

Thanks for your help in looking at this! My workaround for now will be either reinstalling 8.2 or tracking down some eSATA controllers which aren't based on the 3132.

Allen
 
Just a test idea. Right now the siis(4) cards are the only ones in your system using legacy IRQs above 23; all the rest seem to use MSIs. Can you disable MSI for one of the others to check whether legacy IRQs above 23 work for it?
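
To see at a glance which devices sit on which kind of interrupt, here is a small sketch over vmstat -ia output. It rests on two assumptions that are not stated in this thread: that MSI vectors on FreeBSD x86 are numbered from 256 upward, and that IRQs 0 to 23 belong to the first I/O APIC. The name irq_kinds is hypothetical.

```shell
#!/bin/sh
# irq_kinds is a hypothetical helper, not a FreeBSD command.
# It reads "vmstat -ia"-style text on stdin and prints "<kind> <irq> <devices>".
# Assumptions: IRQ numbers >= 256 are MSI vectors; 0..23 are first-I/O-APIC pins.
irq_kinds() {
    awk '
        $1 ~ /^irq[0-9]+:$/ {
            n = $1
            gsub(/[^0-9]/, "", n)   # keep only the IRQ number
            n += 0
            if (n >= 256)    kind = "msi"
            else if (n > 23) kind = "legacy-high"
            else             kind = "legacy-low"
            # Join all device names sharing this IRQ with commas.
            names = $2
            for (i = 3; i <= NF - 2; i++) names = names "," $i
            print kind, n, names
        }'
}

# Typical use on the live system:
#   vmstat -ia | irq_kinds | grep legacy-high
```

If only the siis units show up as legacy-high, moving another device off MSI gives a second data point for whether those IRQs are delivered at all.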
 