Solved Very long boot times on new servers

Hi,

At work we recently got some new servers from Hetzner [1] running FreeBSD. While they are running everything seems fine, but restarting them takes a very long time (about 10 to 15 minutes), which is annoying and not really acceptable as downtime. If we boot the FreeBSD installation medium in rescue mode (10.3), the boot takes under a minute.
These new servers have several things in common: they all have 64 GB of DDR4 RAM, a Skylake i7-6700 CPU, SSDs, and a lot of these ACPI errors in syslog (new ones every 10 seconds):
Code:
Jun 10 14:38:50 s36 kernel: ACPI Error: [\_SB_.PCI0.LPCB.H_EC.ECAV] Namespace lookup failure, AE_NOT_FOUND (20150515/psargs-391)
Jun 10 14:38:50 s36 kernel: ACPI Error: Method parse/execution failed [\_TZ_.TZ00._TMP] (Node 0xfffff8000b403340), AE_NOT_FOUND (20150515/psparse-552)
Jun 10 14:38:50 s36 kernel: ACPI Error: [\_SB_.PCI0.LPCB.H_EC.ECAV] Namespace lookup failure, AE_NOT_FOUND (20150515/psargs-391)
Jun 10 14:38:50 s36 kernel: ACPI Error: Method parse/execution failed [\_TZ_.TZ01._TMP] (Node 0xfffff8000b403200), AE_NOT_FOUND (20150515/psparse-552)

The only thing I could find about long boot times is this thread: https://forums.freebsd.org/threads/42596/ but it shouldn't affect 10.X installations?

Maybe one of you has an idea?


[1]
https://www.hetzner.de/us/hosting/produkte_rootserver/ex41ssd
https://www.hetzner.de/us/hosting/produkte_rootserver/ex51ssd
 
Hi, I have a lot of servers at Hetzner and use only the PX series; they have worked fine for years with FreeBSD. I tried the EX series in the past and had numerous problems running FreeBSD on them (mostly stability issues), so I never use the EX series with FreeBSD. From what I have learned, the EX series is made to run Linux, and it works really well with it. If you want to run FreeBSD, get the PX series.
 
Thanks for your answer. We used the EX series in the past, but the problem only emerged with the new hardware configuration (64GB of RAM and the new CPU). Most of the other EX servers we have come with 16GB or 32GB of RAM and Intel Xeon Haswell CPUs (E3-1246).

Maybe we will switch to another series in the future but we also need to fix this problem if possible.
 
Did you find a solution yet? I'm running into the same issue… Other than the slow boot and the ACPI messages, the box seems to behave fine, though.
 
If you have a lot of RAM in the system, then you will experience slow boots, as the kernel/loader scans all the RAM to make sure it's all there, map it out, yadda yadda. It's most noticeable on systems with 128+ GB of RAM, but can be seen on some systems with 64 GB.

There is a /boot/loader.conf tunable that can be set to mitigate this, although I do not recall off-hand what it is.

If you are seeing the spinning cursor for a minute+ during the boot, or just after the kernel version is shown, then you are experiencing this "problem".

There are also issues with some UEFI implementations that lead to a very slow boot.

Thread 53511 also has some suggestions for similar issues.
 
Thanks for the quick reply. Do you mean hw.memtest.tests="0"? That didn't help.
I'm currently looking at approx. 8 minutes for a reboot with 64GB of RAM. But Thread 53511 looks promising; I will look into it, thanks!
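
For anyone who wants to try the tunable anyway, here is a minimal sketch of setting a loader tunable idempotently. The helper name set_tunable is hypothetical (it is not a FreeBSD tool); it only rewrites the file it is given, and the setting takes effect on the next boot.

```shell
#!/bin/sh
# set_tunable FILE NAME VALUE
# Hypothetical helper: ensure FILE contains exactly one NAME="VALUE" line.
set_tunable() {
    file=$1 name=$2 value=$3
    tmp="${file}.tmp.$$"
    touch "$file"
    # Copy every line that does not already assign NAME ...
    awk -v n="$name" 'index($0, n "=") != 1' "$file" > "$tmp"
    # ... then append the new assignment in loader.conf syntax.
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Usage would be `set_tunable /boot/loader.conf hw.memtest.tests 0` as root; after the reboot, `kenv hw.memtest.tests` should show the value the loader passed to the kernel.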
 
I was under the impression that hw.memtest.tests has been disabled by default since FreeBSD 10, but I can't find any sources for that anymore. So maybe that's worth trying. I'll report back after our next reboot :)
 
We rebooted one of our servers with hw.memtest.tests disabled, but it didn't change a thing. Still more than 10 minutes..

About the ACPI messages: they are still there and not really a concern as long as they don't influence the boot time.

Here's the log part of the reboot (of course without anything helpful?):
Code:
...
Jul  4 15:05:59 s07 reboot: rebooted by admin
Jul  4 15:05:59 s07 kernel: Jul  4 15:05:59 s07 reboot: rebooted by admin
Jul  4 15:06:00 s07 syslogd: exiting on signal 15
Jul  4 15:16:43 s07 syslogd: restart
Jul  4 15:16:43 s07 syslogd: kernel boot file is /boot/kernel/kernel
Jul  4 15:16:43 s07 kernel: Copyright (c) 1992-2016 The FreeBSD Project.
...
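
To put a number on the gap, the two syslogd lines can be subtracted. A small sketch, assuming the standard syslog timestamp format shown above and that shutdown and restart fall on the same day; the function name boot_gap is made up for this example.

```shell
#!/bin/sh
# boot_gap [FILE...]
# Print the seconds between each "syslogd: exiting" line and the next
# "syslogd: restart" line. Reads stdin when no file is given.
boot_gap() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,   p) { split(t, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }
        /syslogd: exiting on signal/ { down = secs($3); seen = 1 }
        /syslogd: restart/ && seen   { print secs($3) - down; seen = 0 }
    ' "$@"
}
```

Run as `boot_gap /var/log/messages`. For the excerpt above it prints 643, i.e. 10 minutes 43 seconds between syslogd stopping and starting again. That interval also includes the shutdown and the firmware's own startup, not just the loader.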
 
The only useful thing visible seems to be that the delay happens before the kernel boots; if so, that hints at the loader as the source of the delay.

Another hint in that direction was your report about booting in single-user/rescue mode ... so look at the differences between single-user and multi-user boot.

EDIT: Sorry, I meant rescue mode / multiuser default.
 
Sorry, I see where that could be misunderstood:
The rescue mode I was talking about is the rescue mode from Hetzner, which means that we booted the server from the FreeBSD live CD / install medium. When we get a new server or have to restart into this rescue system again, I'll have a look at loader.conf.
 
Hi, as I already said, don't use the EX series (Hetzner had an EX series with Xeon and ECC in the past, but they rebuilt their server line, and now it's called PX). EX is for Linux now: just look at the supported software, you will not find FreeBSD there. Then check the PX series: FreeBSD is listed there, of course, and it works just fine. I have many leased servers there.
 
And as I answered: Maybe we'll switch, but fixing this existing problem would be very good. We have those servers now and don't intend to move the running stuff to new servers. Except for the long boot times they run very stable in fact.

Also, I just checked: one of our newest servers is a PX61-NVMe, and it also has those long boot times and ACPI errors. And Hetzner in general isn't very supportive about FreeBSD..
 
Are you using UEFI or legacy? There's a problem with the bootloader on legacy with ZFS that causes very slow boot until the kernel takes over. It's not a problem with UEFI.
 
Yeah, there are serious problems with the ZFS bootloader with buffering. It's related to the fact that the boot blocks are running as real mode x86 code (you know that 1970s technology that should have died a quick death ages ago..) and that environment is seriously restricted. The ZFS bootcode is complicated because it has to be able to probe every connected disk for the ZFS disk labels and also take into account partitions vs. non-partitioned disks. All that results in very bad performance, especially with multi-disk ZFS pools.
 
I never really bought that explanation. During an upgrade I took the disks from an i7 920 system (~5 years old, relatively slow) and transplanted them into a Skylake system, and that was the first time I hit this issue. The old system booted at about the same speed with or without ZFS, but the newer, faster one was hit by the very slow bootloader. UEFI is definitely the fix, though.
 
We're using UEFI..
I also have a PX61-NVMe server from Hetzner and had the very same problem. I did some testing lately and I think you are not using UEFI..

The default BIOS setting for such servers is "UEFI and Legacy" (you can check it under Advanced -> CSM Configuration after entering the BIOS). It's set to this value because the rescue console provided by Hetzner uses mfsBSD, which doesn't seem to support UEFI yet [1]. So this is the first thing.

Another thing is how you installed your system. If your installation were on UFS, you wouldn't have this issue with long boot times, as UFS uses Legacy and doesn't support UEFI (or does it, and I'm wrong here?). So I assume you used ZFS. If your installation were on ZFS configured with UEFI, your server wouldn't boot at all.. and you wouldn't have this issue either :). My guess is that you went through the standard installation provided by bsdinstall(8) and didn't configure ZFS with UEFI. That's why you get the long boot time: the BIOS settings and the boot configuration on your disks don't match. In that situation you can:
- Try to add (somehow) a new EFI partition and write the /boot/boot1.efifat image to it. If you keep the "UEFI and Legacy" BIOS setting untouched (which I recommend, because changing it to "UEFI only" loses you the ability to run the rescue console provided by Hetzner), you also need to keep the existing freebsd-boot partition untouched. Keeping both the EFI and freebsd-boot partitions solved the issue with the long boot time (well, at least for me, but I performed a new system installation, as fortunately I didn't have to change anything on existing servers).
- Keep the "UEFI and Legacy" BIOS setting untouched and reinstall your system, creating both EFI and freebsd-boot partitions and installing all necessary boot loaders. In that case you will be able to use the rescue console provided by Hetzner easily. However, you will have to do some additional work, as I haven't seen an easy option in bsdinstall(8) to create both partitions and install all necessary boot loaders.
- Change the BIOS setting to "UEFI only" and reinstall the system using bsdinstall(8), but this time configure ZFS with UEFI. This solution has one huge drawback: if you want to use the rescue console provided by Hetzner, you will need to ask for a LARA console, change the BIOS setting to "UEFI and Legacy" or "Legacy only", boot mfsBSD, do your work, and change the BIOS setting back to "UEFI only" before rebooting into your normal system.
- Leave it as it is and make peace with the long boot time...

NOTE: From my tests, if you use ZFS without UEFI, changing BIOS setting to "Legacy only" doesn't solve this issue.
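
The first option could be sketched roughly as below. This is a dry run under stated assumptions: the function only prints the commands instead of running them, and the disk name ada0, the partition index 4, the 800k partition size, and a GPT layout with free space left on the disk are all assumptions of this example, not details from the servers in this thread. The function name efi_plan is made up.

```shell
#!/bin/sh
# efi_plan [DISK] [INDEX]
# Print (do not run) the commands for adding an EFI partition next to an
# existing freebsd-boot partition. DISK and INDEX defaults are hypothetical.
efi_plan() {
    disk=${1:-ada0}
    index=${2:-4}
    # Inspect the current layout first; freebsd-boot must stay untouched.
    echo "gpart show ${disk}"
    # Create a small EFI system partition in free space.
    echo "gpart add -t efi -s 800k -i ${index} ${disk}"
    # Write the FAT image containing boot1.efi into it.
    echo "dd if=/boot/boot1.efifat of=/dev/${disk}p${index}"
}
```

Calling `efi_plan ada1 3` prints the three commands for that disk and index so they can be reviewed against the real `gpart show` output before anything is executed by hand.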

[1] https://github.com/mmatuska/mfsbsd/issues/73
 
Thanks for your answers (especially kiela).

We've already upgraded one of our servers to FreeBSD 11, and the boot time dropped to well under a minute. Hopefully that will repeat itself on all the other servers.


Edit: It seems that the update to FreeBSD 11 solved this problem on all our servers. I'm marking this thread as solved. Thank you again for all the replies.
 
I am wondering whether you still set debug.acpi.disabled="thermal" to avoid the ACPI errors. I recently installed an EX51 server and am seeing this message about every 10 seconds without it... Did you solve it some other way?
 
No. We couldn't get rid of this message. Setting debug.acpi.disabled="thermal" also didn't work for us. For now we have:
Code:
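# Everything except ACPI and devd messages goes to all.log
!-ACPI,devd
*.*                                             /var/log/all.log
!*
# ACPI messages only
!ACPI
*.*                                             /var/log/acpi.log
# devd messages only
!devd
*.*                                             /var/log/devd.log
# Reset the program filter for any rules that follow
!*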
in our /etc/syslog.conf..
 