Solved ASM1153E Causing system Lockup?

Jake0162 · Nov 15, 2022

Issue: The system sometimes locks up during file transfers to an external USB drive.
HDD: Seagate IronWolf 4TB, GELI-encrypted, with a ZFS pool on it
USB chipset: ASM1153E
Memory: 32GB DDR4 PNY
CPU: Ryzen 5 Pro 2400GE

Description:
An external USB 3.1 drive using the ASM1153E IC locks up the entire server while transferring files to and from it, but only after a certain amount of time/data. At first I assumed a jail I had running was eating up system memory, but that turned out to be a dead end. I end up having to hard reset the server when it locks up, so I don't get any dump data or other interesting logs. This is on a Lenovo ThinkCentre M715q that I'm using as a home server.

I'm looking for ideas on what the cause could be and what to try.

What I've Tried:

Checking /var/log for info.
Setting the xhci debugging sysctl while using the external drive.
Limiting the transfer rate using rsync with --bwlimit=20000.
Shutting down all other user processes while testing.
Reading the SMART data for the drive (it passes).
Checking usbconfig dumps.
Setting the xhci streams sysctl to 1.
Searching for USB bugs related to the ASM1153 and ASM1153E. I didn't find anything, and I can't even find any datasheets so far.
Checking iostat while transferring files; nothing looked crazy.
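Because the lockup kills the SSH session and forces a hard reset, one option is to write each round of these checks to a file on local disk instead of only to the terminal. Below is a minimal sketch, assuming the drive still shows up as ugen1.2 and that /var/tmp/usb-diag.log is an acceptable log location; snapshot_diag is a hypothetical helper name, not an existing tool.

```sh
#!/bin/sh
# Hypothetical helper: append one timestamped round of the checks listed
# above to a log file, then sync so the data has a better chance of being
# on disk before a hard reset. The USB device name ugen1.2 is assumed.
snapshot_diag() {
    log=${1:-/var/tmp/usb-diag.log}
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S')"
        for cmd in "vmstat -i" "iostat -x" "usbconfig -d ugen1.2 dump_stats"; do
            echo "--- $cmd"
            # $cmd is left unquoted on purpose so it splits into words;
            # a failing command is noted in the log instead of aborting.
            $cmd 2>&1 || echo "(exit status $?)"
        done
    } >> "$log"
    sync
}
```

Run as root, something like "while :; do snapshot_diag; sleep 5; done" leaves a trail on disk; whether the last few entries survive depends on how abruptly the machine hangs.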

I've attached the output from some of the stuff I've messed with.

Jake0162 · Nov 15, 2022

Here's the output from the SMART check, zfs get all, and /var/log/messages.

Jake0162 · Nov 17, 2022

I did more testing today to see what I could find.
After setting the umass sysctl tunable hw.usb.umass.throttle=4, I ran this script:

Code:

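# Every 5 seconds, print per-device interrupt counts and a memory summary
while true
do
    sleep 5
    vmstat -i       # interrupt totals and rates per device (including xhci0)
    zfs-stats -M    # memory summary (zfs-stats is a separate package)
done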

Then I watched the interrupt rate for xhci0 (the controller my external HDD is on) climb slowly all the way up to 1015 before the SSH session froze and the server locked up.
Here's the last output I could get:

I'll update once I've run more tests. The xhci debug output (debug level 16) didn't show any errors when I sent it to the log, so I'm assuming xhci is probably not the issue.
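One thing worth checking: as far as I know, the "rate" column of a one-shot vmstat -i on FreeBSD is the total divided by uptime, i.e. an average since boot, so a slowly climbing rate can hide a much sharper jump in what the controller is doing right now. Below is a minimal sketch that instead diffs the "total" column between samples; irq_total and watch_irq are hypothetical helper names, and the column layout is an assumption based on the usual vmstat -i output.

```sh
#!/bin/sh
# Sketch: report how many interrupts a device took in each interval.
# Assumption: in FreeBSD `vmstat -i` output the last two columns are
# "total" and "rate", where "rate" is total / uptime (a since-boot average).

# irq_total DEV: read `vmstat -i` text on stdin and print the sum of the
# "total" column over every line that contains DEV (0 if none match).
irq_total() {
    awk -v dev="$1" 'index($0, dev) { sum += $(NF-1) } END { printf "%.0f\n", sum }'
}

# watch_irq [DEV] [SECONDS]: print DEV's interrupts per second for each
# SECONDS-long interval until interrupted with Ctrl-C.
watch_irq() {
    dev=${1:-xhci0}
    interval=${2:-5}
    prev=$(vmstat -i | irq_total "$dev")
    while :; do
        sleep "$interval"
        cur=$(vmstat -i | irq_total "$dev")
        echo "$(date +%T) $dev: $(( (cur - prev) / interval )) interrupts/s"
        prev=$cur
    done
}
```

After sourcing the file, "watch_irq xhci0 5 | tee -a /var/tmp/irq.log" shows the per-interval rate and also keeps a copy on disk.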

Jake0162 · Nov 17, 2022

I had another lockup today while running rsync on two separate USB 3.1 external HDDs, both using the same IC. Now I'm testing the first HDD on a USB 2.0 port. So far, what sticks out to me is the UE_BULK_FAIL count for ugen1.2.

After running a zfs scrub and watching the I/O with usbdump on ugen2.2, I can't see anything indicating an issue.

I'll start looking more into umass and will update with results or a solution.

Code:

jake@thinkcentre001 /u/h/jake [1]> sudo usbconfig -d ugen1.2 dump_stats
Password:
ugen1.2: <RSH-339 ASM1153E> at usbus1, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (0mA)

{
    UE_CONTROL_OK       : 35
    UE_ISOCHRONOUS_OK   : 0
    UE_BULK_OK          : 246
    UE_INTERRUPT_OK     : 0
    UE_CONTROL_FAIL     : 0
    UE_ISOCHRONOUS_FAIL : 0
    UE_BULK_FAIL        : 32
    UE_INTERRUPT_FAIL   : 0
}

jake@thinkcentre001 /u/h/jake> sudo usbconfig -d ugen2.2 dump_stats
ugen2.2: <vendor 0x05e3 USB2.0 Hub> at usbus2, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=SAVE (100mA)

{
    UE_CONTROL_OK       : 12667
    UE_ISOCHRONOUS_OK   : 0
    UE_BULK_OK          : 0
    UE_INTERRUPT_OK     : 2
    UE_CONTROL_FAIL     : 0
    UE_ISOCHRONOUS_FAIL : 0
    UE_BULK_FAIL        : 0
    UE_INTERRUPT_FAIL   : 0
}

jake@thinkcentre001 /u/h/jake>
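To see whether the UE_BULK_FAIL count climbs in step with the lockups, one option is to poll it while a transfer runs. Below is a minimal sketch, assuming the "NAME : value" layout of the dump_stats output above; stat_value and watch_bulk_fail are hypothetical helper names.

```sh
#!/bin/sh
# Sketch: watch one counter from `usbconfig -d DEV dump_stats` and report
# whenever it increases. Assumes the "NAME : value" layout shown above.

# stat_value NAME: read dump_stats text on stdin and print NAME's value.
stat_value() {
    awk -F: -v name="$1" '{ key = $1; gsub(/[ \t]/, "", key) }
        key == name { v = $2; gsub(/[ \t]/, "", v); print v; exit }'
}

# watch_bulk_fail [DEV] [SECONDS]: poll DEV and log each rise in UE_BULK_FAIL.
watch_bulk_fail() {
    dev=${1:-ugen1.2}
    interval=${2:-5}
    prev=$(usbconfig -d "$dev" dump_stats | stat_value UE_BULK_FAIL)
    while :; do
        sleep "$interval"
        cur=$(usbconfig -d "$dev" dump_stats | stat_value UE_BULK_FAIL)
        if [ -n "$cur" ] && [ -n "$prev" ] && [ "$cur" -gt "$prev" ]; then
            echo "$(date +%T) $dev: UE_BULK_FAIL $prev -> $cur"
        fi
        prev=${cur:-$prev}   # keep the old value if a read failed
    done
}
```

Like the usbconfig commands above, "watch_bulk_fail ugen1.2 5" needs to run as root.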

Jake0162 · Nov 22, 2022

After some digging, I think I might have an idea of what the problem could be.
I think the power coming into the building is dirty, or at the very least not power-factor corrected.

I'm going to call this solved for now, because until I can set up a data logger on my single-phase AC supply, I don't think I can know for sure.

Jake0162 · Jan 25, 2023

Further testing:

After experiencing yet another system lockup while playing around with zpools, I came across my solution by chance.
I set

Code:

kern.smp.disabled="1"

in my /boot/loader.conf file.

For some reason, disabling symmetric multiprocessing (SMP) prevents the issues with external USB drives. It does mean I'm stuck using a single core, however. But I'm fairly confident that it solved the stability issue: I transferred close to a TB of data between two external USB drives without issue.

As for the power theory, I also tested a known-good PSU with more than enough ampacity, and it made no difference once SMP was re-enabled.
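For anyone wanting to try the same workaround, the tunable only needs to appear once in /boot/loader.conf and takes effect at the next boot. Below is a minimal sketch of an idempotent way to add it; add_loader_tunable is a hypothetical helper name, and pointing it at /boot/loader.conf requires root.

```sh
#!/bin/sh
# Hypothetical helper: add NAME="VALUE" to a loader.conf-style file unless
# NAME is already set there, so running it twice does not duplicate lines.
add_loader_tunable() {
    file=$1 name=$2 value=$3
    # awk exits 0 only if some line's text before "=" is exactly NAME;
    # a missing file makes awk fail, so the line is appended (creating it).
    if ! awk -F= -v n="$name" '$1 == n { found = 1 } END { exit !found }' \
            "$file" 2>/dev/null; then
        printf '%s="%s"\n' "$name" "$value" >> "$file"
    fi
}
```

Usage: add_loader_tunable /boot/loader.conf kern.smp.disabled 1. On FreeBSD, sysrc -f /boot/loader.conf kern.smp.disabled=1 should do the same, and after a reboot sysctl kern.smp.disabled hw.ncpu can confirm the tunable is set and how many CPUs the kernel is using.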

SirDice · Jan 25, 2023

Jake0162 said:
For some reason, disabling symmetric multiprocessing (SMP) prevents the issues with external USB drives.

I would report this as a bug. It might be specific to Ryzen, or the driver for the chipset, or a combination of both. Either way, a developer should take a closer look at it. It sounds like it's fairly easy to reproduce, which should help with tracking down the issue.

Jake0162 · Jan 25, 2023

SirDice, I'll do so if it happens again. I just did a fresh install this morning with 13.1-RELEASE (same as before). Once I set up NFS again, I'll hammer the I/O and see if I can reproduce the issue.

I think you're probably right about it being a Ryzen/chipset issue, but I want to make sure it's not something that was wrong with my previous setup. Wish me luck in testing.
