FreeBSD 11.1 installation fails and the system reboots

Rajesh

Member

Thanks: 3
Messages: 45

#1
I am trying to install FreeBSD 11.1 on an AMD-based board, but the installation fails and the system reboots. This happens repeatedly.

Here is what I am doing:
  • I have created a bootable USB media with 11.1 memstick image (checksum verified)
  • I chose USB as my first boot device in my BIOS (Aptio v2.18 - UEFI based)
  • I see the FreeBSD boot loader menu (with options for multi-user, single-user, etc.)
  • I chose multi-user boot. The hardware probe starts, and shortly afterwards the system reboots before I reach the "Welcome" menu (which has the install option). I see a trace flashing before the system reboots (but couldn't capture it).
Note:
I see the same behavior when I try single-user mode or set safe boot to "on". I don't see the "ACPI support" option in the "Boot options" menu, so I assume it's already disabled (please correct me if I am wrong).
Also, I tried setting hint.acpi.0.disabled=1 from the boot loader prompt. In this case, the system panics with the message "running without a device atpic requires a local APIC".

Questions:
  1. Could this be related to BIOS? If so, how? I tried with 10.4 UEFI memstick image also. Same behaviour seen.
  2. Is there any way I can capture the hardware probe logs to a file? (The logs scroll by quickly and then the system reboots.)
  3. Is there any way I can prevent the reboot (if there is a panic) from the boot loader prompt? I tried setting boot_pause to 1, but when I continue booting, the system hangs.
Please let me know if you need any other details.
 

bds

Member

Thanks: 9
Messages: 37

#3
Try asking for a serial console at the loader prompt, and capture the output on another machine connected with a null modem cable.
 

BSDAppentic3

Guest


#4
Please try to give us some more information, not only about your software but also about your hardware.
 

BSDAppentic3

Guest


#5
No. Don't try to use old versions of FreeBSD. Use them at your own risk, because I don't want to enumerate ALL the changes that have occurred since 10.4.
And this:
I see a trace flashing before the system reboots (but couldn't capture it).
The quote above makes me think that the problem is your hardware, that is, your equipment.
I have some experience from fighting and dealing with failures in W1nd0w5, and I remember some of them.
Please give us information about which hardware you have, because I think the problem is not only in the system that you are trying to install...
 

Rajesh

Member

Thanks: 3
Messages: 45

#6
Sorry about the delayed response. I couldn't work on this for a couple of days.

The hardware is a new development board, so I can't share many details here. If any specific details are needed, I can check whether they can be shared. Sorry about this. The board doesn't have a serial port at the moment, which is why I asked about ways to save the boot logs to a file. Fortunately, I was able to record the trace through a management console.

It looks like the panic is triggered during a "nexus_add_irq" call made from "hpet_attach". As a test, I disabled the HPET timer in the BIOS and tried again, but I still run into the same issue. Weird!!

freebsd_11_1_install_issue_1.jpg


Also, as mentioned earlier, ACPI auto configure is disabled in the BIOS. But I can still see some ACPI-related logs (as below) during the hardware probe.
cpu0 (XX): APIC ID : 0
cpu (XX): APIC ID: 1 (disabled)
You can see ACPI references in the image above as well. Are they expected and normal?

As suggested in the handbook, I tried setting hint.acpi.0.disabled=1 from the boot loader prompt and booting. But the system panics, saying "running without a device atpic requires a local APIC". I assume the APIC driver should be present in FreeBSD 11.1 by default.

Since the HPET backtrace is seen even after disabling HPET in the BIOS, I suspected the BIOS and tried an updated BIOS. But this time, the system hangs just after the FreeBSD copyright messages appear. It neither prints anything on the screen nor reboots.
 

Rajesh

Member

Thanks: 3
Messages: 45

#8
VladiBG, thanks for the reference you pointed to.

I tried hw.pci.enable_msix=0, hw.pci.enable_msi=0 and hw.pci.realloc_bars=1, but these did not solve the issue. I found a bug report along similar lines, although the issue there seems to be in the VGA driver. I have added my issue to that report as well to get input.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221350

Additionally, I tried a different board of the same kind (one known to be good), but I still face the same issue. With this board, I see that if I disable the device that causes the panic, the system panics in some other device. I see the message "_OSC returned error 0x10" every time, but sometimes the system just hangs without rebooting after this message.

So, I suspect this issue could be due to an incompatibility between FreeBSD and the board. I am looking for suggestions on how to proceed.
 

Rajesh

Member

Thanks: 3
Messages: 45

#9
In the image attached below (also mentioned in bug 221350, comment #49), there are three things of concern:

freebsd_11_1_install_issue_2.jpeg

1. _OSC returned error 0x10 - I am not sure what this means. Is it really an issue?

2. We see the message "Unable to map MSI-X table" before the panic. Looking at the code, this message is printed when the following call fails in xhci_pci_attach. What could be the reason for this failure? Does it have anything to do with PCI BAR mappings, such as not having enough resources for the device?

bus_alloc_resource_any(self, SYS_RES_MEMORY, &msix_table, RF_ACTIVE);

3. Since the MSI-X table mapping fails, the driver falls back to allocating MSI (pci_alloc_msi). This is where the panic happens, after "rman_manage_region" fails in nexus_add_irq. This again seems related to PCI BAR mapping.
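
A hedged note on point 1: as far as I can tell from the ACPI specification, the first DWORD that _OSC returns is a status field in which bit 1 means "_OSC failure", bit 2 "unrecognized UUID", bit 3 "unrecognized revision" and bit 4 "capabilities masked". Assuming the 0x10 in the message is that status DWORD, it would decode to "capabilities masked", meaning the firmware declined some of the capabilities the OS requested. The user-space sketch below only decodes such a value; the names osc_flags and osc_decode_status are made up for illustration and are not FreeBSD kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Status bits of the first _OSC return DWORD (assumed from the ACPI spec). */
struct osc_flag {
	uint32_t mask;
	const char *name;
};

static const struct osc_flag osc_flags[] = {
	{ 1u << 1, "_OSC failure" },
	{ 1u << 2, "unrecognized UUID" },
	{ 1u << 3, "unrecognized revision" },
	{ 1u << 4, "capabilities masked" },
};

/*
 * Hypothetical helper: write the names of all set status bits into buf,
 * separated by ", ", and return how many known bits were set.
 */
static int
osc_decode_status(uint32_t status, char *buf, size_t len)
{
	size_t used = 0;
	int nset = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(osc_flags) / sizeof(osc_flags[0]); i++) {
		if ((status & osc_flags[i].mask) == 0)
			continue;
		if (used < len) {
			/* snprintf always NUL-terminates within len - used. */
			int n = snprintf(buf + used, len - used, "%s%s",
			    nset > 0 ? ", " : "", osc_flags[i].name);
			if (n > 0)
				used += (size_t)n;
		}
		nset++;
	}
	return nset;
}
```

Calling osc_decode_status(0x10, buf, sizeof(buf)) with a 64-byte buffer leaves "capabilities masked" in buf. If that reading is right, the message by itself may be informational rather than the cause of the panic, but that is an assumption to verify against the bridge driver source.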

Overall, is there something wrong with the PCI BAR mappings here? What could I check with respect to that?
 

BSDAppentic3

Guest


#10
Right, you want us to help you, but you can't give us the simple information of which hardware you have... Well, despite that, there's still one thing that ShelLuser brings to my mind: read the handbook.
Sorry, but I believe I can speak for all of us: nobody will be able to help you if you aren't in a position to give information about your hardware. Not only me, because the little that I know is mostly software, but also a Daemon, or someone who has some experience with hardware. Don't forget that there are only 2 parts: hardware and software.
Until now, you have been telling us about your system, fine. Now, if you really aren't in a position to say which equipment you're using, I recommend that you:
1) Make sure that the problem isn't the system (for this you MUST KNOW EXACTLY what you're doing: you must be a computing professional);
2) Once you have checked that the OS is not the problem, go and read about how to repair hardware (then you will know a lot about electronics).
 

Rajesh

Member

Thanks: 3
Messages: 45

#11
Hi BSDAppentic3, I agree that without hardware details it's hard for the forum to help. Sorry about that. It's not that I won't provide any details; I can provide specific details where possible. The hardware I am using is an AMD EPYC processor based board with 128GB of memory and a single 500GB HDD. I am trying to install FreeBSD 11.1 on this setup and am running into the issue described. As I said, I have been using the handbook and other sources to debug so far. Please let me know if you need any specific hardware details.
It seems the problem starts when PCI enumeration begins. As you can see in the image in comment #9, we get the message "_OSC returned error 0x10" while the first host-PCI bridge is walked. After that, we see "no driver attached" messages from the PCI buses attached in the hierarchy. So it seems the PCI configuration from the root is not proper, and when we try to attach a PCI end device (in this case, the xhci USB controller) to the bus, the system panics. In addition, as I mentioned earlier, "bus_alloc_resource_any" fails (meaning that allocating the resource from the parent bus failed) during MSI-X table mapping, and "rman_manage_region" fails during MSI allocation (possible reasons are that the region overlaps another, falls outside the valid address range, or memory for the region couldn't be allocated). I couldn't get the exact reason here, as we just get "nexus_add_irq: failed" when rman_manage_region returns an error code.
I suspected it has something to do with the PCI BAR settings made by the BIOS. So I tried setting the following variables at the loader prompt:
hw.pci.clear_bars=1
hw.pci.clear_buses=1
hw.pci.clear_pcib=1
hw.pci.realloc_bars=1
so that the PCI bridge and bus drivers ignore the firmware settings and do everything from scratch. But I am still facing the issue.
I really doubt whether the settings I make at the loader prompt take effect, because I even tried
hw.pci.enable_msi=0
hw.pci.enable_msix=0
from the loader prompt, but the panic still happens while trying to allocate an MSI vector.
Also, at the loader prompt, I don't see the default values for the PCI driver tunables. For example, hw.pci.enable_msi should be '1' by default, but when I run "show hw.pci.enable_msi" (before setting it to zero), I see that the variable itself is not set.
Note: The hardware seems to be good, as I could see another operating system being installed and booted properly.
So now I suspect that something is fundamentally wrong, missing, or incompatible with respect to PCI. Please let me know your thoughts. Also, let me know if you need any specific hardware details.
 

Rajesh

Member

Thanks: 3
Messages: 45

#12
I tried compiling the head branch (I faced a compilation issue with ToT, so I used CS 333773) and used the resulting 12.0-CURRENT image for testing. With this image, I am NOT seeing the "_OSC returned error 0x10" message, but I still see "no driver attached" and the panic at the same point. Sometimes the system hangs before the panic point, when the first PCI bus is walked; sometimes the flow gets past the panic point and hangs after that. But all of the misbehavior happens during the PCI bus walk.

Note: This observation is with the same hw.pci.<tunables> set from the loader prompt.
 

Π 5C15

Guest


#13
The problem is that, since I have no crystal ball and I'm not a clairvoyant, I have to play detective and try to guess what you've done wrong.
Well, this:
Could this be related to BIOS? If so, how? I tried with 10.4 UEFI memstick image also. Same behaviour seen.
And this:
I have created a bootable USB media with 11.1 memstick image (checksum verified)
Makes me think that you're doing something weird here. Why are you using versions so far apart? Do you have BIOS? Do you have UEFI?
 

Rajesh

Member

Thanks: 3
Messages: 45

#14
Makes me think that you're doing something weird here. Why are you using versions so far apart? Do you have BIOS? Do you have UEFI?
Sorry about the delayed response.

To answer your questions first: I am using UEFI and am just trying to get FreeBSD booted on my board. I started with the currently supported 11.1, then rolled back to 10.4 (another currently supported release) after facing the issue with 11.1. Unfortunately, 10.4 gives me the same issue. I tried both versions to confirm that the problem is specific to my board.

Note: By "checksum verified" I meant that I confirmed the image file isn't corrupt.

Adding to this, I had a doubt about whether the 11.1 image is compatible with UEFI. That's why I asked whether the issue could be related to the BIOS (by BIOS, I meant UEFI here). To clear up this doubt, I tried the 10.4 UEFI memstick image, but ran into the same issue. Later I understood that in the 10.x series, separate images are built for legacy and UEFI based systems, whereas the 11.1 images should work with both legacy BIOS and UEFI.

Another reason for my question regarding the BIOS is that I suspect the PCI BAR mapping done by the BIOS, because the issue happens during PCI enumeration. So I am looking for any pointers for debugging with respect to the BIOS.
 

Rajesh

Member

Thanks: 3
Messages: 45

#15
Continuing from my observations in comment #11, I did some more debugging by adding a few logs to the kernel and rebuilding it.

1. I couldn't find a way to determine why "bus_alloc_resource_any" fails when trying to map the MSI-X table.

2. I added logs to find the return value of "rman_manage_region". The return value is 0x10 (16, EBUSY). As per the man page, this value is returned when the region being added overlaps an existing region. Since this happens during MSI allocation, I assume that the MSI vectors being allocated overlap with already allocated ones. Please correct me if I am wrong.

I went through the "rman_manage_region" function and see two places where it returns EBUSY:
a) Overlap with the current region --> In my case, this is what is happening (confirmed with a log added in this path)
b) Overlap with the next region
I am not very familiar with the concept of regions in FreeBSD, so it would be helpful if I could get some pointers related to regions.
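
To make the EBUSY case concrete, here is a minimal user-space sketch of the overlap test, assuming inclusive [start, end] ranges. It is a simplified model and not the kernel's rman code: the names model_rman and model_manage_region are hypothetical, there is no locking or merging of adjacent regions, and a fixed-size array stands in for the kernel's region list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_REGIONS 16

/* One managed region; both bounds are inclusive. */
struct model_region {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical stand-in for a resource manager and its region list. */
struct model_rman {
	uint64_t rm_start;	/* lowest value the manager may cover */
	uint64_t rm_end;	/* highest value the manager may cover */
	struct model_region regions[MODEL_MAX_REGIONS];
	size_t count;
};

/*
 * Add [start, end] to the manager.
 * Returns 0 on success, EINVAL if the range is outside the manager's
 * bounds, ENOMEM if the table is full, and EBUSY if the range overlaps
 * a region that is already managed.
 */
static int
model_manage_region(struct model_rman *rm, uint64_t start, uint64_t end)
{
	assert(rm != NULL);
	if (start > end || start < rm->rm_start || end > rm->rm_end)
		return EINVAL;
	if (rm->count == MODEL_MAX_REGIONS)
		return ENOMEM;
	for (size_t i = 0; i < rm->count; i++) {
		const struct model_region *s = &rm->regions[i];

		/* Two inclusive ranges overlap if each starts before the other ends. */
		if (start <= s->end && end >= s->start)
			return EBUSY;
	}
	rm->regions[rm->count].start = start;
	rm->regions[rm->count].end = end;
	rm->count++;
	return 0;
}
```

In this model, after managing 0-23 and 24-47, a request for 16-30 returns EBUSY, and so does a second request for 0-23. So one thing worth checking with an extra log is whether the same IRQ range is being added twice.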

Two questions at this point:
1) What could be the reason for the failure in mapping MSI-X table?
2) What could be the reason for such an overlap during MSI allocation?
 