Frequent kernel panic and ZFS mount problems

Hello everybody,
I installed the new disks (I used the existing cables and connectors).
I installed FreeBSD 13.2-RELEASE with ZFS, using the two new disks as a mirror. Then I installed all the packages I needed, but in the end the computer crashed so badly that I landed directly in the ZFS mountroot problem, and from there I wasn't able to recover the core.txt files.
Then I installed FreeBSD 13.2-RELEASE again with bsdinstall, using the new disks as a mirror at installation time, but this time, after the installation, I created a new zpool with the two old disks as a mirror. With this ZFS configuration the crashes started immediately. I barely managed to install the gdb package, but this time I was able to get some core.txt files, which I have attached here.
While connecting the new disks to the existing cables I noticed that all four disks are actually connected to the HBA controller through a single connector on one port.
The two old disks are DELL 600 GB 10k SAS ISE 12 Gbps drives (actually labeled as Toshiba).
The two new disks are two Samsung PM1643a 960GB SAS SSDs.
 

Attachments

  • core.txt.0.txt
    84 bytes
  • core.txt.1.txt
    66.9 KB
  • core.txt.2.txt
    84.7 KB
  • fbsdv_kru.txt
    39 bytes
Thanks for replying, VladiBG.
I've changed the memory frequency in the BIOS from DDR4-2666 MHz to DDR4-2133 MHz.
Immediately after that there was another crash. I attach the resulting core.txt file. I also tested the new disks with sg3_utils, as previously recommended (da0, da1), and I attach the corresponding files, just in case.
 

Attachments

  • core.txt.3.txt
    71.3 KB
  • sga_logs_all_da0.txt
    6.3 KB
  • sga_logs_all_da1.txt
    6.3 KB
  • sga_logs_all_da2.txt
    5.4 KB
  • sga_logs_all_da3.txt
    5.4 KB
It looks like a memory problem to me, but it could also be some incompatibility. The best approach is to strip the computer down to its minimum configuration: 1 memory module, 1 SATA hard disk attached directly to the motherboard, no other PCI-E cards or VGA adapters, and a known-good power supply. Then install an OS which you know and test again by running some basic benchmarks to put the system under stress (CPU, HDD, memory, etc.). When you are sure that those components are working, gradually attach the other components until the system starts giving issues again. That way you will identify the component that is faulty or incompatible with this build.

So remove the LSI card, leave only 1 memory module and put 1 normal SATA disk on the motherboard, install FreeBSD with UFS and play with it to see if it gives you any issues. Make sure that you disable any AI overclocking in the BIOS. This will prove that those components are OK; then you can add the second memory module and see if it continues to work, and so on until you find the faulty component. It's a long process of trial and error, and it's usually done in a service shop where you have many parts lying around and can afford to swap them and test.
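
The add-one-part-at-a-time procedure above can be sketched in Python, purely as an illustration. The function name `find_culprit`, the `is_stable` callback and the part labels are all hypothetical; in practice `is_stable` is you booting the machine and stress-testing it by hand, not something a script can decide.

```python
def find_culprit(baseline, extras, is_stable):
    """Add parts one at a time to a known-good baseline.

    Returns the first part whose addition makes the build unstable,
    or None if every part could be added without trouble.
    """
    config = list(baseline)
    if not is_stable(config):
        raise ValueError("baseline itself is unstable; strip it down further")
    for part in extras:
        config.append(part)
        if not is_stable(config):
            return part
    return None


# Simulated check for demonstration only: pretend "part_B" is the bad one.
def simulated_check(config):
    return "part_B" not in config


culprit = find_culprit(
    baseline=["1 memory module", "1 SATA disk"],
    extras=["part_A", "part_B", "part_C"],
    is_stable=simulated_check,
)
print(culprit)
```

The sketch stops at the first offender and assumes a failure shows up reliably within one test run. With intermittent crashes, each step needs a longer soak before the next part goes in.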
 
I don't think I could find an appropriate service nearby. They are all self-sufficient Windows-prone individuals not willing or able to learn anything new. Looking back, and knowing what I know now, I regret that I didn't build this computer from scratch myself.

But that's it. I'll gather all my courage and I'll dive deep into the computer's guts to find the "culprits".
Thanks again, forum, great community, and if I get a positive result I'll come back to share it.
 
Regarding the core files, a very short summary, using the core.txt numbering:
0 - unusable as gdb was not installed
1, 2 - crash on bogus address 0x20000000000, same rip, same issue, not ZFS related
3 - crash on unmapped address, ZFS related

As those crashes are not recurring at the same location, it's safe enough to say it's not a SW bug (it always can be, but at first glance it doesn't seem so).
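
To make that reasoning concrete, here is a minimal Python sketch that groups crash dumps by faulting location. The labels `rip_A` and `rip_B` are placeholders, not the real register values from the attached dumps, and `group_by_location` is a made-up helper name.

```python
from collections import defaultdict


def group_by_location(crashes):
    """Group crash dump names by the instruction pointer they faulted at."""
    groups = defaultdict(list)
    for name, rip in crashes:
        groups[rip].append(name)
    return dict(groups)


# Placeholder labels only: dumps 1 and 2 share a rip, dump 3 differs.
crashes = [
    ("core.txt.1", "rip_A"),
    ("core.txt.2", "rip_A"),
    ("core.txt.3", "rip_B"),
]
groups = group_by_location(crashes)
distinct_locations = len(groups)
print(groups, distinct_locations)
```

A single location shared by every dump would point towards one specific code path. Several unrelated locations, as here, fit flaky hardware better, although three dumps is a small sample.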

I suggest doing these tests:
- avoid using this HBA completely during the tests. Remove it from the board, attach the disks directly to the board, and try again
- use a new version of memtest and run the full memtest until it shows "PASS" on the screen

If those two points above don't help, try to eliminate a possibly faulty memory module. Remove all but one, boot FreeBSD, and try to trigger a crash. Test all modules like this. If you can't trigger a crash with a single module installed, try testing every slot: move the module to another slot, boot and test.
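
The module and slot elimination can be written down as a checklist generator. This is only a sketch: `isolation_plan`, the module names and the slot names are hypothetical, and it assumes the first module passed phase 1 so it can be reused for the slot tests.

```python
def isolation_plan(modules, slots):
    """Build an ordered list of (module, slot) boots to try.

    Phase 1: every module on its own in the first slot.
    Phase 2: one module (assumed good) moved through the remaining slots.
    """
    plan = [(module, slots[0]) for module in modules]
    plan += [(modules[0], slot) for slot in slots[1:]]
    return plan


# Hypothetical labels; substitute whatever the board silkscreen says.
plan = isolation_plan(["DIMM1", "DIMM2"], ["A1", "A2", "B1", "B2"])
for step, (module, slot) in enumerate(plan, start=1):
    print(f"{step}. only {module} in slot {slot}: boot and stress test")
```

This gives len(modules) + len(slots) - 1 boots rather than every module in every slot, which keeps the number of reboots manageable.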

But then... if memory is faulty, you should be able to trigger the crash or encounter an issue under any OS. Running a hard stress test under Windows will show HW issues too.
 