panic when connecting or disconnecting A/C power

Hi all,

My Ryzen 4700-powered HP ProBook, running CURRENT as of July 10, is now capable of suspend and resume (I assume thanks to drm-devel from the same date). I did a "pkg update" + "pkg upgrade" just today.

Whenever I connect or disconnect A/C power, it panics immediately and drops me into kdb.

Since I'm dropped into the console at that moment, I only see part of the stack backtraces, two of them to be precise:
The first seems to be repeated several times; I see two instances of it. It starts with
Code:
WARNING !drm_modeset_is_locked(plane->mutex) failed at ...
#0 0xf.... at linux_dump_stack+0x23
...

the second one has cpuid and a timestamp and then:
Code:
KDB: stack backtrace:
<stuff that looks fairly standard trap handling>
--- trap 0xc, rip ... --
amdgpu_pm_acpi_event_handler() at amdgpu_pm_acpi_event_handler+0x69/frame 0x...
amdgpu_acpi_event() at amdgpu_acpi_event+0x62/frame 0x...
linux_handle_acpi_acad_event() at linux_handle_acpi_acad_event+0x3c/frame 0x...
...

I think it may be relevant that I'm seeing tons of these three messages in the logs and on the text console:

Code:
Jul 31 18:02:30 hbeast kernel: Firmware Error (ACPI): AE_AML_PACKAGE_LIMIT, Index (0x000000005) is beyond end of object (length 0x5) (20210604/exoparg2-569)
Jul 31 18:02:30 hbeast kernel: ACPI Error: Aborting method \_TZ.GTTP due to previous error (AE_AML_PACKAGE_LIMIT) (20210604/psparse-689)
Jul 31 18:02:30 hbeast kernel: ACPI Error: Aborting method \_TZ.CHGZ._TMP due to previous error (AE_AML_PACKAGE_LIMIT) (20210604/psparse-689)

... roughly every ten seconds.
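
For anyone reading along, here is a minimal Python sketch of what the first message is saying, assuming I'm reading it right: the AML code asks for element 5 of a package that only has 5 elements, so the last valid index would be 4. The function name and regex are just mine for illustration, nothing from ACPICA.

```python
import re

# Pattern for the "package limit" firmware error quoted above.
# The regex and the helper name are illustrative, not part of any ACPI tool.
PACKAGE_LIMIT = re.compile(
    r"AE_AML_PACKAGE_LIMIT, Index \((0x[0-9a-fA-F]+)\) "
    r"is beyond end of object \(length (0x[0-9a-fA-F]+)\)"
)

def parse_package_limit(line):
    """Return (index, length) from a package-limit log line, or None."""
    match = PACKAGE_LIMIT.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)

line = ("Jul 31 18:02:30 hbeast kernel: Firmware Error (ACPI): "
        "AE_AML_PACKAGE_LIMIT, Index (0x000000005) is beyond end of object "
        "(length 0x5) (20210604/exoparg2-569)")

index, length = parse_package_limit(line)
last_valid = length - 1  # packages are indexed from 0
print(index, length, last_valid)  # prints: 5 5 4
```

So the requested index is exactly one past the end of the package.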

I'd be grateful for any pointers or advice to get my power management fixed.

regards & TIA
Michael
 
A quick peruse of the web shows:
This seems to point to a BIOS/UEFI issue. Any updates available? That's your first port of call.
 
mark_j thank you - I was hoping I'd get around it since AFAICT these messages are not present when I'm using the other OS ... (and since that's not the one from Redmond, HP only offers an update from the BIOS itself, which I'm a bit hesitant about, as I don't have a FAT partition ... )

cheers!
 
I guess the ACPI errors refer to a thermal zone (TZ). If they show up every 10 seconds, maybe the temperature is reported to the OS every 10 seconds or so, and it is reporting an invalid value.
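
One way to check the "every 10 seconds" guess is to measure the gaps between the repeated messages in the log. A rough Python sketch follows; it assumes the usual syslog timestamp at the start of each line, and the sample lines after the first one are made up (only the 18:02:30 timestamp is from the post above). If I remember correctly, acpi_thermal(4) documents a hw.acpi.thermal.polling_rate sysctl that defaults to 10 seconds, which would fit.

```python
from datetime import datetime

def message_intervals(lines, needle):
    """Seconds between consecutive log lines that contain `needle`."""
    stamps = []
    for line in lines:
        if needle in line:
            # syslog lines start with e.g. "Jul 31 18:02:30" (15 characters)
            stamps.append(datetime.strptime(line[:15], "%b %d %H:%M:%S"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

# Hypothetical sample: only the first timestamp appears in the thread,
# the later ones are invented to show the calculation.
sample = [
    "Jul 31 18:02:30 hbeast kernel: ACPI Error: Aborting method \\_TZ.GTTP due to previous error",
    "Jul 31 18:02:40 hbeast kernel: ACPI Error: Aborting method \\_TZ.GTTP due to previous error",
    "Jul 31 18:02:50 hbeast kernel: ACPI Error: Aborting method \\_TZ.GTTP due to previous error",
]

print(message_intervals(sample, "_TZ.GTTP"))  # prints: [10.0, 10.0]
```

If the gaps on the real log match the polling rate, the errors are most likely triggered by the periodic temperature poll rather than by something the firmware pushes on its own.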

The kernel backtrace shows an amdgpu event handler that should handle ACAD events (AC adapter events). I would think that the bug is in these event handlers.


Does the laptop boot with the cable unplugged?

I would make a PR.
 
Hi George,
thx for your feedback. The laptop boots with and without a plugged-in power cable.

btw: even removing the power cable when laptop is suspended causes the panic upon resume.
 
another update: I've updated to the latest BIOS, no (real) change - I haven't looked at the new stack traces when the laptop panics, but panic it definitely does (the ACPI error messages look the same)
 
Maybe you could boot without the graphics drivers and see whether the problem persists (to find out if it's really amdgpu-related). But either way, create a bug report. ;D I don't think anything can be done except writing a patch.
 
Regarding the issue with tons of ACPI error messages, here are my patched ACPI files to avoid the error:

It resolves the error that acpi_tz_thread hits when it reads the temperature of the last thermal zone. Originally hw.acpi.thermal.tz5.temperature didn't contain a correct temperature; after the fix it works fine.
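
To check whether a zone reports a sane value before and after patching, something like the following Python sketch could be run over the output of "sysctl hw.acpi.thermal". The sample text and its temperatures are made up for illustration, and the 0 to 120 degree "plausible" range is just my assumption, not anything from the ACPI spec.

```python
import re

# Matches lines such as "hw.acpi.thermal.tz0.temperature: 45.1C"
TEMP_LINE = re.compile(
    r"^hw\.acpi\.thermal\.(tz\d+)\.temperature:\s*(-?\d+(?:\.\d+)?)C[ \t]*$",
    re.MULTILINE,
)

def zone_temperatures(sysctl_text):
    """Map thermal zone name to temperature in degrees Celsius."""
    return {zone: float(value) for zone, value in TEMP_LINE.findall(sysctl_text)}

def implausible_zones(temps, low=0.0, high=120.0):
    """Zones whose reading falls outside an assumed plausible range."""
    return sorted(zone for zone, temp in temps.items() if not low <= temp <= high)

# Hypothetical sysctl output; the values are invented for this example.
sample = """\
hw.acpi.thermal.polling_rate: 10
hw.acpi.thermal.tz0.temperature: 45.1C
hw.acpi.thermal.tz5.temperature: -273.2C
"""

temps = zone_temperatures(sample)
print(implausible_zones(temps))  # prints: ['tz5']
```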

Hope it helps somebody.
 