Solved ASM1153E Causing system Lockup?

Issue: System will sometimes lock up during file transfers to an external USB drive.
HDD: Seagate IronWolf 4TB, GELI encrypted with a ZFS pool on it
USB chipset: ASM1153E
Memory: 32GB DDR4 PNY
CPU: Ryzen 5 Pro 2400GE


Description:
An external USB 3.1 drive using the ASM1153E IC locks up the entire server while transferring files to and from it, but only after a certain amount of time/data. At first I assumed that a jail I had running was eating up system memory, but that turned out to be a dead end. I end up having to hard reset the server when it locks up, so I don't get any dump data or other interesting logs. This is on a Lenovo ThinkCentre M715q that I'm using as a home server.

I'm looking for ideas on what the cause could be and what to try.

What I've Tried:
  • Checking /var/log for info.
  • Setting the sysctl for xhci debugging while using the external drive.
  • Limiting transfer speed by using rsync with --bwlimit=20000.
  • Shutting down all other user processes while testing.
  • Reading the SMART data for the drive (it passes).
  • Checking usbconfig dumps.
  • Setting the xhci streams sysctl to 1.
  • Searching for USB bugs related to the ASM1153 and ASM1153E. I didn't find anything, and I can't even find any datasheets so far.
  • Checking iostat while transferring files; nothing looked crazy.

I've attached the output from some of the stuff I've messed with.
 

Attachments

  • dmesg.txt (20.7 KB)
  • usbconfig_desc.txt (4.7 KB)
  • usbconfig_stats.txt (323 bytes)
Here's the output from smartctl and zfs get all, plus the messages log.
 

Attachments

  • usbconfig.txt (539 bytes)
  • da0_smartctl.txt (5.1 KB)
  • zfs_get_all_nas_dataset.txt (7.1 KB)
  • messages.txt (36.2 KB)
So, I was doing more testing today to see what I could get going.
After setting the sysctl tunable for umass devices (sysctl hw.usb.umass.throttle=4), I ran this script:
Code:
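# Print interrupt counters and ZFS memory stats every 5 seconds.
while true
do
    sleep 5
    vmstat -i       # per-device interrupt totals and rates
    zfs-stats -M    # ZFS memory information
done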

I watched the interrupt rate for xhci0 (my external HDD) rise slowly, all the way up to 1015, before the SSH session froze and the server locked up.
Here's the last output I could get:
Screenshot from 2022-11-16 18-28-26.png


Will update once I've run more tests. The output from xhci with debugging set to 16 didn't show any errors when I sent it to be logged, so I'm assuming xhci is probably not the issue.
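Since the box needs a hard reset every time, the last few samples are easy to lose. Here is a rough sketch of a variant of the loop above that appends each sample to a file and flushes it, so the final readings before a lockup have a better chance of surviving. The log path and the function names (log_sample, xhci_rate) are made up for illustration, and the log should live on a disk other than the USB drive under test. xhci_rate assumes the rate is the last column of the xhci0 line in vmstat -i output.

```shell
#!/bin/sh
# Sketch only: LOGFILE, log_sample and xhci_rate are hypothetical names.
LOGFILE="${LOGFILE:-/var/tmp/usb_lockup_monitor.log}"

# Run a command, append its timestamped output to LOGFILE, then flush to disk.
log_sample() {
    {
        printf '=== %s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
        "$@" 2>&1
    } >> "$LOGFILE"
    sync
}

# Read `vmstat -i` output on stdin and print the rate column (last field)
# of the xhci0 line.
xhci_rate() {
    awk '/xhci0/ { print $NF }'
}

# Example use (not run here, it loops forever):
#   while true; do
#       log_sample vmstat -i
#       log_sample zfs-stats -M
#       sleep 5
#   done
```

After a lockup and reboot, the tail of the log file should show the last interrupt rates that made it to disk, and something like `vmstat -i | xhci_rate` gives just the xhci0 number.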
 
Had another lockup today when running rsync on two separate USB 3.1 external HDDs, both using the same IC. So now I'm testing the first HDD over a USB 2.0 port. So far, what sticks out to me is the UE_BULK_FAIL line for ugen1.2.

After running a ZFS scrub to observe I/O via usbdump on ugen2.2, I can't see anything indicating an issue.

I'll start looking more into umass to see what I can find, and will update with results or a solution.



Code:
jake@thinkcentre001 /u/h/jake [1]> sudo usbconfig -d ugen1.2 dump_stats
Password:
ugen1.2: <RSH-339 ASM1153E> at usbus1, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (0mA)

{
    UE_CONTROL_OK       : 35
    UE_ISOCHRONOUS_OK   : 0
    UE_BULK_OK          : 246
    UE_INTERRUPT_OK     : 0
    UE_CONTROL_FAIL     : 0
    UE_ISOCHRONOUS_FAIL : 0
    UE_BULK_FAIL        : 32
    UE_INTERRUPT_FAIL   : 0
}

jake@thinkcentre001 /u/h/jake> sudo usbconfig -d ugen2.2 dump_stats
ugen2.2: <vendor 0x05e3 USB2.0 Hub> at usbus2, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (100mA)

{
    UE_CONTROL_OK       : 12667
    UE_ISOCHRONOUS_OK   : 0
    UE_BULK_OK          : 0
    UE_INTERRUPT_OK     : 2
    UE_CONTROL_FAIL     : 0
    UE_ISOCHRONOUS_FAIL : 0
    UE_BULK_FAIL        : 0
    UE_INTERRUPT_FAIL   : 0
}

jake@thinkcentre001 /u/h/jake>
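To put the UE_BULK_FAIL count in proportion, here is a small sketch that turns dump_stats output into a failure percentage. The function name bulk_fail_pct is made up for illustration. The counters are cumulative since the device attached, and the number alone doesn't say why the transfers failed, so treat it as a rough indicator only.

```shell
#!/bin/sh
# Sketch only: bulk_fail_pct is a hypothetical helper name.
# Reads `usbconfig -d ugenX.Y dump_stats` output on stdin and prints the
# percentage of bulk transfers that failed, or "n/a" if there were none.
bulk_fail_pct() {
    awk -F: '
        /UE_BULK_OK/   { ok = $2 + 0 }
        /UE_BULK_FAIL/ { fail = $2 + 0 }
        END {
            total = ok + fail
            if (total == 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * fail / total
        }'
}

# Example use:
#   sudo usbconfig -d ugen1.2 dump_stats | bulk_fail_pct
```

For the ugen1.2 numbers above (246 OK, 32 failed), that works out to about 11.5% of bulk transfers failing.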
 
After some looking, I think I might have an idea of what the problem could be.
I think the power coming into the building is dirty, or at the very least not power-factor corrected.

I'm going to just call this solved for now, because until I can set up a data logger for my single-phase AC, I don't think I will be able to know for sure.
 
Further testing:

After experiencing yet another lockup of the system while playing around with zpools, I came across my solution by chance.
I set
Code:
kern.smp.disabled="1"
in my /boot/loader.conf file.

For some reason, disabling symmetric multiprocessing (SMP) prevents the issues with external USB drives. It does mean I'm stuck using a single core, however. But I'm fairly confident that it solved the stability issue: I transferred close to a TB of data between two external USB drives without issue.

As for the power issue, I also tested a known-working PSU that has more than enough ampacity, and it made no difference when SMP was re-enabled.
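When flipping SMP on and off between test runs, it's easy to lose track of which state the box will boot into. A minimal sketch for checking the loader.conf setting before a reboot follows; the function name smp_disabled_in is made up for illustration, and it only looks at the one file it is given.

```shell
#!/bin/sh
# Sketch only: smp_disabled_in is a hypothetical helper name.
# Returns 0 if the given loader.conf-style file sets kern.smp.disabled to 1
# on an uncommented line, non-zero otherwise.
smp_disabled_in() {
    conf="${1:-/boot/loader.conf}"
    grep -Eq '^[[:space:]]*kern\.smp\.disabled[[:space:]]*=[[:space:]]*"?1"?[[:space:]]*(#.*)?$' "$conf"
}

# Example use:
#   if smp_disabled_in /boot/loader.conf; then
#       echo "next boot: SMP disabled"
#   else
#       echo "next boot: SMP enabled"
#   fi
#
# After the reboot, the running state can be read with:
#   sysctl kern.smp.disabled kern.smp.cpus
```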
 
For some reason removing simultaneous multi-processing prevents the issues with external usb drives.
I would report this as a bug. It might be specific to Ryzen, or the driver for the chipset, or a combination of both. Either way, a developer should take a closer look at it. It sounds like it's fairly easy to reproduce, which should help with tracking down the issue.
 
SirDice, I'll do that if it happens again. I just did a fresh install this morning with 13.1-RELEASE (same as before). Once I set up NFS again, I'll hammer the I/O and see if I can reproduce the issue.

I think you are probably right about it being a Ryzen/chipset issue, but I want to make sure it's not something that was wrong with my previous setup. Wish me luck in testing.
:)
 