CoreBoot ACPI and FreeBSD suspend

I am trying to get reliable suspend and resume on my ThinkPad T430s after flashing coreboot with the SeaBIOS payload.
The problem is that the system shuts down after resume because it thinks the temperature is too high:
Code:
acpi_tz0: WARNING - current temperature (128.1C) exceeds safe limits
The laptop was suspended for less than a minute and is not hot to the touch.
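For reference when reading the acpi_tz0 message: ACPI thermal zones report temperature in tenths of a kelvin, and the driver converts that to Celsius for the log. The sketch below shows that conversion. The function names are made up for illustration, and the 2731 offset is my assumption about the constant the driver uses; under that assumption a log value of 128.1C corresponds to a raw reading of 4012.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed offset: 0 C expressed in tenths of a kelvin (273.15 K, truncated). */
#define ZERO_C_DECIKELVIN 2731

/* Convert a raw reading in tenths of a kelvin to tenths of a degree Celsius. */
int decikelvin_to_decicelsius(int decikelvin)
{
    assert(decikelvin >= 0);   /* a kelvin reading cannot be negative */
    return decikelvin - ZERO_C_DECIKELVIN;
}

/* Print a raw reading in the same style as the log line, e.g. "128.1C". */
void print_celsius(int decikelvin)
{
    int dc = decikelvin_to_decicelsius(decikelvin);
    printf("%s%d.%dC\n", dc < 0 ? "-" : "", abs(dc) / 10, abs(dc) % 10);
}
```

Calling print_celsius(4012) prints the same figure as the warning, which may help when comparing against raw values returned by the DSDT's _TMP method.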

Not sure if it is relevant, but I noticed that the output of acpidump from base cannot be recompiled with iasl:
Code:
/usr/sbin/acpidump -dt > acpi_dt.asl
/usr/sbin/iasl acpi_dt.asl

Intel ACPI Component Architecture
ASL+ Optimizing Compiler/Disassembler version 20181003
Copyright (c) 2000 - 2018 Intel Corporation

Compiler aborting due to parser-detected syntax error(s)
acpi_dt.asl   6008:             }
Error    6126 -   syntax error ^

acpi_dt.asl   6028:             }
Error    6126 -   syntax error ^

acpi_dt.asl   6063:             }
Error    6126 -   syntax error ^

acpi_dt.asl   6445:
Error    6126 - syntax error and premature End-Of-File

ASL Input:     acpi_dt.asl - 6445 lines, 207014 bytes, 1574 keywords

Compilation complete. 4 Errors, 0 Warnings, 0 Remarks, 0 Optimizations

Another try with acpidump from ports:
Code:
/usr/local/bin/acpidump -b
/usr/local/bin/iasl dsdt.dat

Intel ACPI Component Architecture
ASL+ Optimizing Compiler/Disassembler version 20181213
Copyright (c) 2000 - 2018 Intel Corporation

File appears to be binary: found 4624 non-ASCII characters, disassembling
Binary file appears to be a valid ACPI table, disassembling
Input file dsdt.dat, Length 0x387B (14459) bytes
ACPI: DSDT 0x0000000000000000 00387B (v02 COREv4 COREBOOT 20110725 INTL 180810)
Pass 1 parse of [DSDT]
Pass 2 parse of [DSDT]
Parsing Deferred Opcodes (Methods/Buffers/Packages/Regions)

Parsing completed
Disassembly completed
ASL Output:    dsdt.dsl - 146025 bytes

So, at least I can use that DSDT:
Code:
cp dsdt.dat /boot/dsdt.aml
/boot/loader.conf:
acpi_dsdt_load="YES"
acpi_dsdt_name="/boot/dsdt.aml"

Code:
dmesg | grep -i 'acpi error'
ACPI Error: No handler for Region [ERAM] (0xfffff80003580d00) [EmbeddedControl] (20181003/evregion-288)
ACPI Error: Region EmbeddedControl (ID=3) has no handler (20181003/exfldio-428)
ACPI Error: Method parse/execution failed \134_SB.PCI0.LPCB.EC.BAT0._STA, AE_NOT_EXIST (20181003/psparse-677)
ACPI Error: No handler for Region [ERAM] (0xfffff80003580d00) [EmbeddedControl] (20181003/evregion-288)
ACPI Error: Region EmbeddedControl (ID=3) has no handler (20181003/exfldio-428)
ACPI Error: Method parse/execution failed \134_SB.PCI0.LPCB.EC.BAT1._STA, AE_NOT_EXIST (20181003/psparse-677)
ACPI Error: AE_NOT_FOUND, While resolving a named reference package element - \134_PR_.CP00 (20181003/dspkginit-579)
ACPI Error: AE_NOT_FOUND, While resolving a named reference package element - \134_PR_.CP01 (20181003/dspkginit-579)
ACPI Error: AE_NOT_FOUND, While resolving a named reference package element - \134_PR_.CP02 (20181003/dspkginit-579)
ACPI Error: AE_NOT_FOUND, While resolving a named reference package element - \134_PR_.CP03 (20181003/dspkginit-579)
ACPI Error: AE_NOT_FOUND, While resolving a named reference package element - \134_PR_.CP00 (20181003/dspkginit-579)
ACPI Error: AE_NOT_FOUND, While resolving a named reference package element - \134_PR_.CP01 (20181003/dspkginit-579)
ACPI Error: AE_NOT_FOUND, While resolving a named reference package element - \134_PR_.CP02 (20181003/dspkginit-579)
ACPI Error: AE_NOT_FOUND, While resolving a named reference package element - \134_PR_.CP03 (20181003/dspkginit-579)
ACPI Error: uhub0: Method parse/execution failed <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
ACPI Error: uhub1: Method parse/execution failed \134_SB.PCI0.LPCB.EC.AC._PSR, AE_NOT_FOUND (20181003/psparse-677)
ACPI Error: Method parse/execution failed \134PNOT, AE_NOT_FOUND (20181003/psparse-677)
ACPI Error: Method parse/execution failed \134_SB.PCI0.LPCB.EC.AC._PSR, AE_NOT_FOUND (20181003/psparse-677)
ACPI Error: Method parse/execution failed \134PNOT, AE_NOT_FOUND (20181003/psparse-677)
ACPI Error: Method parse/execution failed \134_SB.PCI0.LPCB.EC.AC._PSR, AE_NOT_FOUND (20181003/psparse-677)
ACPI Error: Method parse/execution failed \134PNOT, AE_NOT_FOUND (20181003/psparse-677)
Full "dmesg.boot" file is attached.
But now, with the unmodified DSDT loaded as an override, my system has lost the dev.cpu sysctl tree:
Code:
sysctl: unknown oid 'dev.cpu'

I have also tried booting Linux (which has no problems with resume after suspend) and dumping the ACPI tables from there, but that was not successful (IIRC, the problem was that FreeBSD was unable to parse the DSDT dumped from Linux).

I am using FreeBSD 12.0-RELEASE amd64 with a small patch so that my battery is recognized (the resume problem is the same with the GENERIC kernel):
https://bugs.freebsd.org/bugzilla/attachment.cgi?id=196523&action=diff

Unfortunately I don't currently have the stock BIOS ACPI tables (at the time I didn't know about the acpica-tools package), but if needed I can flash the stock BIOS and dump them.
The situation is the same with 13-CURRENT.
Any ideas on how to proceed with debugging?
 

Attachments

  • acpi_errors_after_custom_dsdt.txt (2.1 KB)
  • msg-after-failed-resume.txt (3.4 KB)
  • dmesg.boot-default-dsdt.txt (90 KB)
  • dmesg.boot-override-dsdt.txt (68 KB)
The problem occurs whether or not acpi_ibm is loaded.

For now this seems to "fix" the problem (at least for resume from suspend), but it needs more testing:
Code:
diff --git a/sys/dev/acpica/acpi_thermal.c b/sys/dev/acpica/acpi_thermal.c
index fa1c2e81cc2..b6a6b5eec4f 100644
--- a/sys/dev/acpica/acpi_thermal.c
+++ b/sys/dev/acpica/acpi_thermal.c
@@ -65,7 +65,7 @@ ACPI_MODULE_NAME("THERMAL")
 #define TZ_POLLRATE    10
 
 /* Make sure the reported temperature is valid for this number of polls. */
-#define TZ_VALIDCHECKS 3
+#define TZ_VALIDCHECKS 30
 
To get rid of the AE_NOT_FOUND ACPI errors in dmesg, add this to /boot/loader.conf:

#get rid of acpi messages (AE_NOT_FOUND)
debug.acpi.disabled="thermal"


Tested working on

<slemke@besta>/home/slemke # uname -a
FreeBSD besta 12.0-RELEASE-p4 FreeBSD 12.0-RELEASE-p4 GENERIC amd64
<slemke@besta>/home/slemke #
 
I am a FreeBSD beginner. I used Skulls to put coreboot on my X230 and hit the same problem.
Do I need to rebuild the kernel after making the changes? I think I should follow the instructions here: https://www.freebsd.org/doc/handbook/kernelconfig-building.html
and set MODULES_OVERRIDE to acpi_thermal.
Or is there a way to rebuild only the acpi_thermal module?
Please forgive my stupid question.

BTW, I think it would be great to expose this option as a tunable, just as `hw.acpi.thermal.polling_rate` controls the polling rate.
 
I think that code ends up in the kernel itself, not in a module, because neither of these commands shows anything:
Code:
% kldstat -v | grep acpi_thermal
% ls -1 /boot/kernel | grep acpi_thermal

After applying the fix (or whenever a new kernel patch arrives) I just run:
Code:
make buildworld && make buildkernel

# optional: create new boot environment:
beadm create <name>
beadm mount <name> /mnt/be
beadm activate <name>
export DESTDIR=/mnt/be

# install
make installkernel
make installworld
reboot
 
Thank you, this solution is working for me!

I have a Lenovo X1 Carbon Gen1 with coreboot 4.7 and the TianoCore payload, and I have had this issue since FreeBSD 11. Happy to have found this post, thanks a lot! :D
 
OP Patch Risk:
I think it is important to point out a risk with your patch (if I understand correctly how acpi_thermal.c works).
C:
/* Check for temperature changes every 10 seconds by default */
#define TZ_POLLRATE    10

/* Make sure the reported temperature is valid for this number of polls. */
#define TZ_VALIDCHECKS    3
Changing the valid checks from 3 to 30 tells the system to wait for 30 checks to come back with "bad" values before it takes action to protect the hardware. At the default of 10 seconds per check, that means waiting about 5 minutes before shutting down instead of the standard 30 seconds. Since these checks read the CPU temperature, the machine can run at dangerous temperatures for that whole time before it protects itself, which can lead to damaged hardware.
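To make the effect of TZ_VALIDCHECKS concrete, here is a small userland sketch of the counting logic as I understand it. It is not the driver's code: the function name is hypothetical, and the reset on a normal reading is my assumption about how the driver treats a single spurious value.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the 1-based poll number at which a shutdown would be requested,
 * or 0 if the run of over-limit readings never reaches valid_checks.
 * over_limit[i] is nonzero when poll i reported a temperature above the
 * safe limit.
 */
size_t poll_that_triggers_shutdown(const int *over_limit, size_t npolls,
                                   int valid_checks)
{
    int consecutive = 0;

    assert(valid_checks > 0);
    for (size_t i = 0; i < npolls; i++) {
        if (over_limit[i]) {
            if (++consecutive == valid_checks)
                return i + 1;
        } else {
            consecutive = 0;   /* assumption: a normal reading resets the count */
        }
    }
    return 0;
}
```

With valid_checks set to 3, isolated bad readings are ignored but three in a row trigger the shutdown; with 30, the same three bad readings right after resume are ignored, and so is a genuinely hot CPU for the first 29 polls.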

Alternative Fix:
Increase the polling interval using the hw.acpi.thermal.polling_rate sysctl described in acpi_thermal(4). It can be set to anything above the default of 10 seconds; anything lower will not help with this issue.

Add the following line to /etc/sysctl.conf
Code:
hw.acpi.thermal.polling_rate=15
Risk: about 45 seconds (3 checks at 15 seconds each) for the system to take action, rather than the default 30 seconds.
Disclaimer: I did not do any tuning to arrive at 15, so a lower value might work too. I set it months ago and have had no issues since.
Your post gave me some great info on what to look at and ways to fix it. As far as I can tell, TZ_VALIDCHECKS is hardcoded and cannot be changed without a kernel patch (I could be wrong here), but the polling interval (TZ_POLLRATE by default) can be changed at runtime as described in acpi_thermal(4). So instead of changing how many checks must fail before the system acts, just increase the time between checks. For me, raising it from the default 10 seconds to 15 seconds fixed the issue. This works because the whole issue stems from how fast the thermals are being checked. It is less risky because my system will still act within about 45 seconds if something goes wrong. And for a laptop that doesn't get pushed to its limits, the risk is basically negligible.
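The trade-off between the two approaches comes down to simple arithmetic: the upper bound on the reaction time is the polling interval multiplied by the number of required checks. A minimal sketch, with a hypothetical function name and assuming the driver needs exactly that many polls in a row:

```c
#include <assert.h>

/*
 * Upper bound, in seconds, between the onset of overheating and the
 * shutdown request: one polling interval per required check.
 */
int worst_case_shutdown_delay(int polling_rate_s, int valid_checks)
{
    assert(polling_rate_s > 0 && valid_checks > 0);
    return polling_rate_s * valid_checks;
}
```

Under this model the stock settings (10 s, 3 checks) give 30 seconds, the TZ_VALIDCHECKS patch (10 s, 30 checks) gives 300 seconds, and a polling_rate of 15 with the stock 3 checks gives 45 seconds.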
 
Yes, you are right. 5 minutes of CPU cooking before shutdown:
Code:
Apr 14 08:51:53 hostname kernel: acpi_tz0: INFO [check 1/30] current temperature: 101.1C
Apr 14 08:52:03 hostname kernel: acpi_tz0: INFO [check 2/30] current temperature: 102.1C
Apr 14 08:52:13 hostname kernel: acpi_tz0: INFO [check 3/30] current temperature: 102.1C
Apr 14 08:52:23 hostname kernel: acpi_tz0: INFO [check 4/30] current temperature: 102.1C
Apr 14 08:52:33 hostname kernel: acpi_tz0: INFO [check 5/30] current temperature: 102.1C
Apr 14 08:52:43 hostname kernel: acpi_tz0: INFO [check 6/30] current temperature: 102.1C
Apr 14 08:52:53 hostname kernel: acpi_tz0: INFO [check 7/30] current temperature: 103.1C
Apr 14 08:53:03 hostname kernel: acpi_tz0: INFO [check 8/30] current temperature: 103.1C
Apr 14 08:53:13 hostname kernel: acpi_tz0: INFO [check 9/30] current temperature: 104.1C
Apr 14 08:53:22 hostname kernel: acpi_tz0: INFO [check 10/30] current temperature: 104.1C
Apr 14 08:53:33 hostname kernel: acpi_tz0: INFO [check 11/30] current temperature: 104.1C
Apr 14 08:53:43 hostname kernel: acpi_tz0: INFO [check 12/30] current temperature: 104.1C
Apr 14 08:53:53 hostname kernel: acpi_tz0: INFO [check 13/30] current temperature: 104.1C
Apr 14 08:54:03 hostname kernel: acpi_tz0: INFO [check 14/30] current temperature: 104.1C
Apr 14 08:54:13 hostname kernel: acpi_tz0: INFO [check 15/30] current temperature: 104.1C
Apr 14 08:54:23 hostname kernel: acpi_tz0: INFO [check 16/30] current temperature: 104.1C
Apr 14 08:54:33 hostname kernel: acpi_tz0: INFO [check 17/30] current temperature: 104.1C
Apr 14 08:54:43 hostname kernel: acpi_tz0: INFO [check 18/30] current temperature: 104.1C
Apr 14 08:54:53 hostname kernel: acpi_tz0: INFO [check 19/30] current temperature: 103.1C
Apr 14 08:55:03 hostname kernel: acpi_tz0: INFO [check 20/30] current temperature: 104.1C
Apr 14 08:55:12 hostname kernel: acpi_tz0: INFO [check 21/30] current temperature: 104.1C
Apr 14 08:55:23 hostname kernel: acpi_tz0: INFO [check 22/30] current temperature: 104.1C
Apr 14 08:55:33 hostname kernel: acpi_tz0: INFO [check 23/30] current temperature: 104.1C
Apr 14 08:55:43 hostname kernel: acpi_tz0: INFO [check 24/30] current temperature: 104.1C
Apr 14 08:55:53 hostname kernel: acpi_tz0: INFO [check 25/30] current temperature: 104.1C
Apr 14 08:56:03 hostname kernel: acpi_tz0: INFO [check 26/30] current temperature: 104.1C
Apr 14 08:56:13 hostname kernel: acpi_tz0: INFO [check 27/30] current temperature: 104.1C
Apr 14 08:56:23 hostname kernel: acpi_tz0: INFO [check 28/30] current temperature: 103.1C
Apr 14 08:56:32 hostname kernel: acpi_tz0: INFO [check 29/30] current temperature: 104.1C
Apr 14 08:56:32 hostname root[81305]: WARNING: system temperature too high, shutting down soon!
Apr 14 08:56:42 hostname kernel: acpi_tz0: INFO [check 30/30] current temperature: 104.1C
Apr 14 08:56:42 hostname kernel: acpi_tz0: WARNING - current temperature (104.1C) exceeds safe limits
Apr 14 08:56:42 hostname kernel: .

Until now I had not had any thermal shutdowns; the idea of my kludge was to prevent the laptop from going to standby when using coreboot's ACPI tables.
Code:
Feb 27 17:22:52 hostname acpi[2759]: suspend at 20210227 17:22:52
<resume>
Feb 27 17:23:23 hostname kernel: acpi_tz0: INFO [check 1/30] current temperature: 128.1C
Feb 27 17:23:23 hostname kernel: acpi_tz0: INFO [check 2/30] current temperature: 128.1C
Feb 27 17:23:23 hostname kernel: acpi_tz0: INFO [check 3/30] current temperature: 128.1C
 