Solved: FreeBSD (and other BSDs) fails to boot if the system has more than 5 devices

markmarques

New Member

Reaction score: 2
Messages: 11

I only recently started trying BSD systems, but to my dismay, whenever I boot a system that has more than 5 devices (HDDs, SSDs, USB drives), it locks up (freezes) during device enumeration.
I have seen this behaviour with several flavours (FreeBSD, OpenBSD, XigmaNAS, GhostBSD) on several machines, with both live systems (NomadBSD) and pre-installed systems: once the number of devices is higher than 5, it locks up during the boot sequence.
It is reproducible every time the number of attached devices is higher than 5 at system startup ...

My BSD knowledge is a bit limited; nonetheless, how can I start to help with this?
 

SirDice

Administrator
Staff member
Moderator

Reaction score: 12,046
Messages: 38,506

I have systems with 8 or more disks attached. And I'm sure there are people here with even more disks attached (think large storage systems). Nothing "extra" needs to be done for this to work. I suspect it's something else that's causing problems for you.

What version of FreeBSD? What kind of system do you have? Mainboard? Controllers? What exactly happens?
 
OP

markmarques

New Member

Reaction score: 2
Messages: 11

I have an Asus P8P67 board with both Intel and Marvell chipset controllers; another machine is a SuperMicro X8DTN4LF-i server ...
I have confirmed this behaviour in FreeBSD 12.1, OpenBSD 6.7, NomadBSD, GhostBSD and XigmaNAS (FreeBSD 12.1 based) ...

During boot, when enumerating the HDDs, the system locks up on reaching the 6th HDD ...
I once waited more than 10 minutes, with no change whatsoever ....
No kernel panic, no stack trace, only the blinking cursor at the 6th device ....

If I remove any one device (chosen at random), it boots perfectly ...
It does not matter whether the system was booted from a live environment such as NomadBSD or was already fully installed ...
Once the board has 6 devices connected, I get the same lockup during HDD enumeration ...


In XigmaNAS (because the Supermicro system does allow HDD hot-swap), my trick is to start up with only some of the HDDs and then add the rest later on ...

Although I do confess that this issue sounds a bit far-fetched ...
 

SirDice

Administrator
Staff member
Moderator

Reaction score: 12,046
Messages: 38,506

markmarques said:
During boot, when enumerating the HDDs the system locks up when reaching the 6 HDD ...
This happens in POST before any OS is loaded. So the OS is not relevant.

I have a SuperMicro X8DT3-LN4F. There are 6 drives attached, 2 SSD and 4 SAS disks. No problems booting it.
 
OP

markmarques

New Member

Reaction score: 2
Messages: 11

(Perhaps I did not make myself clear ...)
After POST, when BSD starts, it enumerates the different block devices; that is the moment when I get the system lockup ...

According to the documentation, the problem occurs during Stage Three, the "loader" segment, "where it starts to probe the hardware" ....
What I get is similar to:
Code:
BTX loader 1.00 BTX version is 1.02
Consoles: internal video/keyboard
BIOS drive C: is disk0
BIOS drive D: is disk1
BIOS drive E: is disk2
BIOS drive F: is disk3
BIOS drive G: is disk4
BIOS drive H: is disk5
BIOS drive F: is disk6

As a side note, other (non-BSD) OSes start up without any issues whatsoever....
As I stated initially, I see this behaviour on different hardware and with different BSD flavours ...
 

SirDice

Administrator
Staff member
Moderator

Reaction score: 12,046
Messages: 38,506

Are the drives attached to the SATA controller? Are they set to AHCI in the BIOS? Does your board have an additional mpt(4) controller like mine does (there are a couple of variations of that X8DT board)?
 

olli@

Daemon
Developer

Reaction score: 1,252
Messages: 1,140

That’s very strange. I also witnessed systems during my FreeBSD career that had more (sometimes much more) than 6 devices and booted without any problems whatsoever.

Just for the heck of it I just plugged 8 flash sticks into my USB hub and booted. No problem, all of them were probed and work fine.

If I understood you correctly, the lockup happens in the boot loader, not in the kernel? Please share a screenshot of the lockup situation that you are experiencing. That might be helpful. Edit – Sorry, didn’t see your last post.
 

SirDice

Administrator
Staff member
Moderator

Reaction score: 12,046
Messages: 38,506

markmarques said:
Accordingly to the documentation the problem is during the Stage Three - "loader" segment
The output you've posted is from stage 1 actually. That's the output from the bootloader.
 
OP

markmarques

New Member

Reaction score: 2
Messages: 11

I started this topic because on both systems, no matter which BSD I try to start, I only get to that point (Stage 1, as stated by SirDice) where the HDDs are enumerated ...

Sometimes those devices are USB connected; at other times they are connected to the LSI PCI-E board or even directly to the onboard SATA connectors ....
It locks up during the Stage 1 enumeration, at the listing of the 5th HDD ...
No kernel stack trace, nothing at all ... simply a blinking cursor ...

As I wrote before, in both cases, if I remove a single device the system boots flawlessly ....

At first I thought it could be a problem with the live USB (using NomadBSD), but afterwards I got it with XigmaNAS (FreeBSD 12.1 based) ...
Afterwards I tried GhostBSD, with the same result, and lastly I installed and tried OpenBSD 6.7, with the same behaviour ...

Where can I start in order to help with this?
 
OP

markmarques

New Member

Reaction score: 2
Messages: 11

I added the HDDs and restarted one of the machines (Supermicro X8DTN board) and got this:

Code:
BTX loader 1.00 BTX version is 1.02
Consoles: internal video/keyboard
BIOS drive A: is fd0
BIOS drive C: is disk0
BIOS drive D: is disk1
BIOS drive E: is disk2
BIOS drive F: is disk3
BIOS drive G: is disk4
BIOS drive H: is disk5
BIOS drive I: is disk6
BIOS drive J: is disk7
BIOS 630kB/3135808kB available memory
FreeBSD/x86 bootstrap loader, Revision 1.1
int=0000000d err=00002500 efl=00010246 eip=0003e500
eax=00091fe8 ebx=0005cea0 ecx=00000002 edx=00000000
esi=0000015c edi=bd95ad40 ebp=00092420 esp=00091fa4
cs=002b ds=0033 es=0033 fs=0033 gs=0033 ss=0033
cs:eip=17 1a a5 6c 8d e7 cc f6-3e 9b 93 f8 0d 9d 45 65
       5f 66 79 a2 f2 f0 c8 cc-8c e2 0c 28 0f 6a a2 51
ss:esp=03 25 03 00 a0 ce 05 00-e8 1f 09 00 ff ff ff ff
       ff ff ff ff f1 e0 04 00-21 2f 03 00 00 20 41 1f
BTX halted

Any ideas?
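
For anyone reading a dump like the one above, the register lines can be decoded mechanically. The following is only a minimal sketch: it assumes the `int=` field holds the x86 exception vector, and the names `X86_EXCEPTIONS`, `parse_btx_registers` and `describe_fault` are made up for this illustration, not part of any FreeBSD tool. Under that assumption, `int=0000000d` is vector 13, which Intel defines as a general protection fault.

```python
import re

# Names for a few x86 exception vectors as defined by Intel (not exhaustive).
X86_EXCEPTIONS = {
    0x00: "divide error",
    0x06: "invalid opcode",
    0x08: "double fault",
    0x0C: "stack-segment fault",
    0x0D: "general protection fault",
    0x0E: "page fault",
}

# Matches pairs such as "int=0000000d"; the lookbehind skips the
# "cs:eip=" and "ss:esp=" memory-dump lines, which hold raw bytes.
REGISTER_RE = re.compile(r"(?<![:\w])([a-z]{2,3})=([0-9a-fA-F]+)")


def parse_btx_registers(dump: str) -> dict:
    """Collect every name=hexvalue pair from a BTX register dump."""
    return {name: int(value, 16) for name, value in REGISTER_RE.findall(dump)}


def describe_fault(dump: str) -> str:
    """Summarise the exception vector and the faulting address."""
    regs = parse_btx_registers(dump)
    vector = regs["int"]
    name = X86_EXCEPTIONS.get(vector, "unknown")
    return f"vector {vector:#04x} ({name}) at cs:eip={regs['cs']:04x}:{regs['eip']:08x}"


# Register lines copied from the dump posted in this thread.
SAMPLE = """\
int=0000000d err=00002500 efl=00010246 eip=0003e500
eax=00091fe8 ebx=0005cea0 ecx=00000002 edx=00000000
esi=0000015c edi=bd95ad40 ebp=00092420 esp=00091fa4
cs=002b ds=0033 es=0033 fs=0033 gs=0033 ss=0033
"""

print(describe_fault(SAMPLE))
```

This only names the fault; it does not say which BIOS call or device triggered it.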
 

ekvz

Well-Known Member

Reaction score: 278
Messages: 431

I've noticed something similar on the desktop I am trying to set up. As soon as I try to boot from the HD with the USB dongle I installed from still connected (but not set to boot before the HD, or even at all), the loader comes up, spins for 1-2 seconds and freezes, with the machine becoming unresponsive and toggling the power switch on the supply unit being the only option.

I am not sure if it's really the USB dongle or just the number of devices, but in any case I wouldn't think it's FreeBSD's fault here. This is some seriously weird OEM hardware, to the point where even getting it to boot from USB is kind of an adventure. So something like the hang while booting with the dongle connected doesn't surprise me at all.
 
OP

markmarques

New Member

Reaction score: 2
Messages: 11

The previous entry is from a boot attempt with a USB device ...
Another system (P8P67) has a similar lockup when booting from an HDD only, but it does not display the stack dump ...

Later on I will try another system reinstall (removing any USB block device afterwards) to reproduce that detail....
 
OP

markmarques

New Member

Reaction score: 2
Messages: 11

Meanwhile, it turned out that I did not need to re-install ...
During the weekend I remembered that I had changed hardware and swapped a small SATA PCI-E controller between the machines in question.

It is a 2-port SATA controller identified in Linux as:
SATA controller: Marvell Technology Group Ltd. 88SE9125 PCIe SATA 6.0 Gb/s controller (rev 11) .
Only after some more tests during the day did I remember that small detail ...

Apparently it only locks up when an HDD is connected to that controller ...
and it does not give any stack dump ...

The stack dump in my earlier post was taken on the first boot, when the server machine still had the PCI-E controller connected ...

Nonetheless, what else can I do to help solve this?

And thank you for the fast support ...
 

mark_j

Daemon

Reaction score: 682
Messages: 1,192

My first point to you would be to stop jumping around between other OSes. All this OpenBSD, GhostBSD, *BSD is confusing.
After all, this is a FreeBSD forum.
So, given that, what is the makeup of the drives and devices attached to one machine? Just pick one to deal with.
How did the install work? Did it halt when scanning devices?
(And as others have said, there's plenty of FreeBSD systems running multiple disks in excess of yours. Our database server has 32 attached)
 

Jose

Daemon

Reaction score: 904
Messages: 1,109

I must be imagining this:
Code:
# geom disk list | grep 'Geom name'
Geom name: ada0
Geom name: ada1
Geom name: da0
Geom name: da1
Geom name: da2
Geom name: da3
Geom name: da4
Geom name: da5
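
The same count can be scripted. This is only a sketch that assumes the output has the `Geom name:` line format shown above; the helper name `geom_names` is invented for the example, and the sample text is the listing from this post rather than live output.

```python
def geom_names(output: str) -> list[str]:
    """Return the disk names found on 'Geom name:' lines."""
    names = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Geom name:"):
            # Keep only the text after the first colon, e.g. "ada0".
            names.append(line.split(":", 1)[1].strip())
    return names


# Listing copied from the post above; on a real system this text would
# come from running `geom disk list`.
SAMPLE = """\
Geom name: ada0
Geom name: ada1
Geom name: da0
Geom name: da1
Geom name: da2
Geom name: da3
Geom name: da4
Geom name: da5
"""

disks = geom_names(SAMPLE)
print(len(disks), "disks:", ", ".join(disks))
```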
 

ralphbsz

Son of Beastie

Reaction score: 2,310
Messages: 3,214

My only theory is that one of the drives happens to be defective, in a fashion that either outright breaks or just "delays" the boot. For example, I used to run servers that had 400...800 disks attached (not under FreeBSD, but under Linux), and some of the disks can be so broken that booting becomes impossible. In many cases you don't even get to the boot loader, but end up hanging in the BIOS. The reason is that the bootloader, the BIOS, and the firmware in the various HBAs have to find the "bootable" disk, which requires test-reading all disks. And that code is typically written without very good error handling, so it will sometimes hang or crash on errors. And once you have *many* disks attached, the probability of having one with a fault that causes this hanging increases.
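
To put a rough number on that last point: if each disk independently has the same small chance p of carrying such a fault, the chance that at least one of n disks has it is 1 - (1 - p)^n. The sketch below only illustrates that formula; the independence assumption and the per-disk value of 0.001 are hypothetical, not measured figures, and `p_any_faulty` is a name made up for the example.

```python
def p_any_faulty(n_disks: int, p_fault: float) -> float:
    """Probability that at least one of n_disks has the fault,
    assuming each disk fails independently with probability p_fault."""
    return 1.0 - (1.0 - p_fault) ** n_disks


# Hypothetical per-disk probability, chosen only for illustration.
P_FAULT = 0.001

for n in (6, 32, 800):
    print(f"{n:4d} disks: {p_any_faulty(n, P_FAULT):.4f}")
```

With half a dozen disks the result stays close to the per-disk value, which fits the point that a lockup at exactly 6 devices is unlikely to be a disk-count effect.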

I would take the disks out one at a time, and try again.

I would also definitely standardize on one OS, and get it to work, before switching OSes. This is not an OS bug. I think Terry Kennedy has been known to boot FreeBSD with several hundred disks (the number is from memory), and most SuperMicro motherboards are capable of that too (with the right HBAs), so it is not a fundamental limitation.
 
OP

markmarques

New Member

Reaction score: 2
Messages: 11

As I wrote before, after some trial and error I found that the problem is related to the PCI-E SATA Marvell controller....


I swapped the (non-bootable) disks connected to that specific controller and noticed that, once the system starts to boot, it always locks up if any HDD is connected to it.
Before writing the initial post, I had installed FreeBSD, but without the extra PCI-E SATA controller ...
It seems that none of the latest BSD flavours is able to boot with that hardware.

For the time being I am leaving the PCI-E SATA controller attached to the Asus P8P67 machine ...

Noob question: should I change this post's title?
 

olli@

Daemon
Developer

Reaction score: 1,252
Messages: 1,140

ralphbsz said:
My only theory is that one of the drives happens to be defective, in a fashion that either outright breaks or just "delays" the boot.
Either that, or a strange bug in the BIOS. This could be either the mainboard’s BIOS, or a BIOS on a PCI controller card that “hooks” into the main BIOS (anybody remember INT 13h?).

FreeBSD’s bootloader calls certain BIOS functions in order to detect devices, because the bootloader does not contain real drivers to access the hardware directly, like the kernel does. Bugs in BIOS functions are not unusual (that’s why there are BIOS updates sometimes). So, it’s quite possible that a BIOS function locks up under certain circumstances. Or it returns invalid values that are unexpected by the bootloader, causing it to lock up.
 

T-Daemon

Daemon

Reaction score: 831
Messages: 1,700

markmarques said:
If I remove ( randomly ) any device it boots perfectly ...

What file systems are on those devices?

I have a similar delay in the BTX loader stage. In my case this happens only with USB devices. In the presence of one or more USB devices, the BTX loader does not advance immediately after the disks are enumerated; it stays stuck, the underscore prompt blinks, and the progress indicator above the underscore moves one character at a time ( - \ | / - ) with long delays in between.

If the USB devices have a freebsd-* partition the delay is longer. For example if a FreeBSD installation USB disk is inserted the delay is 40 seconds. If one USB device with ext4 is plugged in, that delays the loader ~10 sec, two ~20 sec.

This happens on a two disk box, with installed 12.1-RELEASE on HDD ( system UFS, ZFS partition, ext4 partitions ) and SSD ( system ZFS, ZFS partition, ext4 partitions ). The motherboard is a Gigabyte H61M-D2-B3 F10 ( BIOS ).

The same BTX delay can be observed from both FreeBSD installations. When no USB devices are plugged in, there is no delay.

Curiously, if I boot a FreeBSD installation image from a USB stick there is no delay at all. The BTX loader stage, after enumerating the disks, is passed immediately, regardless of how many USB devices are plugged in.

So far I couldn't determine a relevant difference between the installation image and the hard disk installations regarding the files involved in the boot process.

There is an older thread describing a BTX loader delay involving ZFS:
 
OP

markmarques

New Member

Reaction score: 2
Messages: 11

I agree with your ( olli@ ) last comment ...
Somehow the BIOS is giving invalid values that were unexpected by the bootloader ...

To answer T-Daemon's question, the filesystems are assorted, ranging from ext4 and NTFS to ZFS ...
I did notice that some HDDs (with ZFS) took a little more time before the loader passed to the next device, but nothing special ...

At the moment I can keep the PCI-E controller unused (by disconnecting 1 or 2 HDDs ....), so the system boots properly.
But what about the long run? How can I help to sort this out?
 

SirDice

Administrator
Staff member
Moderator

Reaction score: 12,046
Messages: 38,506

Most BIOSes can disable the option ROMs. Have you tried that? That option ROM is usually only needed if you need to boot from that controller. It's not required when using it as an additional controller.
 

T-Daemon

Daemon

Reaction score: 831
Messages: 1,700

markmarques said:
I agree with your ( @olli@ ) last comment ...
Somehow the BIOS is giving invalid values that were unexpected by the bootloader ...

In my case that wouldn't explain why an installation image has no BTX delay but an installed system has, which doesn't mean it can't apply to your case.

I have the habit when installing OS's on different disks, to disconnect all but the one the OS is installed on. I remember vaguely, if I had all disks plugged in during installation of FreeBSD, there was no delay at the BTX loader, but it has been a while, maybe my memory doesn't serve me well.

But I can test it. Recently I trashed my root-on-ZFS installation on my second hard disk (a hard reset in the middle of the boot process, after it had been stuck for more than 10 minutes; now zpool import reports metadata corruption, but no harm done, proper backups are available).
 

olli@

Daemon
Developer

Reaction score: 1,252
Messages: 1,140

SirDice said:
Most BIOS's can disable the option ROMs. Have you tried that? That option ROM is usually only needed if you need to boot from that controller. It's not required when using it as an additional controller.
It might also be worth trying to switch off USB compatibility mode in the BIOS setup. It’s only required when booting from a USB device, if I recall correctly.
 
OP

markmarques

New Member

Reaction score: 2
Messages: 11

As indicated by SirDice, the answer was in the BIOS settings ... (thank you for that tip) ...
Once I disabled the Marvell ROM in the BIOS settings, everything worked as expected on the P8P67 system ...

Nonetheless, during these experiments I realized that the problem might be related to the fact that the system (P8P67 board) has 2 Marvell devices (with different ROMs and chipsets) that might confuse the bootloader.

On the SuperMicro machine nothing else has changed, but (without the Marvell PCI-E controller) everything apparently works as expected.

I have not yet tried the "USB compatibility" mode setting indicated by olli@, as everything else was on internal HDDs ...
I am going to explore FreeBSD even more in the next few weeks ...

Thank you for your fast replies and great support ....
 